


Posts posted by Mercury

  1. Some further thoughts: There is no possibility of deadlock if only one thread at a time acquires a lock while holding another. A global JS mutex can designate that thread.

    When using mutexes the primary danger is deadlock: one thread holds mutex A while attempting to lock mutex B, while another thread holds B while attempting to lock A.

    JS needs a global mutex, and JS code will need to lock arbitrary compiled-side data.

    So we can't prevent the thread which holds the JS mutex from locking additional mutexes at the same time, which creates half of a deadlock.

    What we can do is not provide the other half. The compiled code could be modified so that all shared data is protected by a mutex, with the rule that nested locks are prohibited. If in some parts of the code it is not possible to take one mutex at a time without exposing an object with incomplete state, we can use std::lock to lock multiple mutexes atomically. The resulting code will, I think, not be too complicated, and the rule of no nested mutexes is relatively easy to verify. As long as only the thread which currently holds the JS mutex is allowed to take nested locks, there is no possibility of deadlock.
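    A minimal sketch of the std::lock case described above. `Guarded`, `Transfer` and `RunOpposingTransfers` are hypothetical names made up for illustration, not engine code. Two threads lock the same pair of mutexes in opposite argument order, which is the classic deadlock setup with nested locks; std::lock acquires both with a deadlock-avoidance algorithm instead.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical shared object: each instance is protected by its own mutex.
struct Guarded
{
    std::mutex mutex;
    int value = 0;
};

// Moves `amount` between two objects whose states must change together.
// std::lock acquires both mutexes without risking deadlock, whatever the
// argument order; the lock_guards then adopt ownership and release on exit.
void Transfer(Guarded& from, Guarded& to, int amount)
{
    assert(&from != &to); // locking the same std::mutex twice is undefined
    std::lock(from.mutex, to.mutex);
    std::lock_guard<std::mutex> lockFrom(from.mutex, std::adopt_lock);
    std::lock_guard<std::mutex> lockTo(to.mutex, std::adopt_lock);
    from.value -= amount;
    to.value += amount;
}

// Two threads transfer in opposite directions, so they name the mutexes
// in opposite order. Returns the combined value after both have finished.
int RunOpposingTransfers(int iterations)
{
    Guarded a, b;
    a.value = 1000;
    b.value = 1000;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) Transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) Transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.value + b.value; // safe to read: both threads have been joined
}
```

    With C++17 available, `std::scoped_lock lock(from.mutex, to.mutex);` does the same thing in one line.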


    EDIT: I forgot to mention that this plan also requires a message queue. That reduces the nested mutex cases which need to be addressed to direct access via CmpPtr; all Post and Broadcast calls can be made safe automatically.

  2. 11 hours ago, Stan` said:

    My bad, don't think one can easily overload C++ functions, the roundtrip between js and c++ could be more costly than just js. Also you cannot extend interfaces with more functions, nor can you change the C++ code that use it. Then you have to overwrite those C++ functions with js ones too.

    We wouldn't need to switch to js just to check for existence of a file. Also the result can be cached.

    Regarding the initial point of the thread though: today I'm thinking a mutex is overkill and we can use std::atomic. We just need to figure out which data needs to be marked atomic.
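    For reference, a small sketch of what marking a value atomic buys. The counter name and the helper function are assumptions for illustration only. Several threads increment one shared integer with no mutex and no lost updates.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared statistic written by several threads at once.
std::atomic<int> g_ProcessedEntities{0};

// Starts `threads` workers that each increment the counter `perThread`
// times, waits for them, and returns the final count.
int CountInParallel(int threads, int perThread)
{
    assert(threads > 0 && perThread >= 0);
    g_ProcessedEntities.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
    {
        pool.emplace_back([perThread] {
            for (int i = 0; i < perThread; ++i)
                g_ProcessedEntities.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (std::thread& worker : pool)
        worker.join();
    return g_ProcessedEntities.load();
}
```

    Note that std::atomic only makes each individual read or write indivisible. An object whose fields must change together still needs a lock or some other scheme.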

  3. 16 hours ago, Stan` said:

    Rewriting JS code into C++ means more performance, but less moddability and breaking all the mods that depend on that...

    Well isn't the whole point of a message system to be asynchronous?

    In an extreme case we could have C++ which could optionally be replaced by js. Probably not needed. I hope.

    The purpose of a mutex is to prevent memory collisions: when threads try to write or read memory in the middle of another thread writing the memory. Regardless of the details of where threads split and join the mutex is needed, even just to prevent the main thread reading data while the simulation thread is writing it.

  4. Thanks.

    Looks like maybe 2% of users are on two core / two thread Celeron processors. It may still work better for them to separate simulation from graphics, since the OS doesn't take that much time*, but it's still not great. Is there a way we can test on the very low end?

    The mutex belongs to the data being accessed, not to the code which is accessing it.

    *Particularly when using allocators.

    EDIT: just checked and found my laptop's A9-9425 is also 2 core / 2 thread :) so I guess I can test this. Also, about 1% of users are on 2 thread AMD machines.

  5. Quote

    a recource might be mutated by multiple ComponentTypes

    I'm not sure I understand. Each access of a component (or other data we need to protect?) would require acquiring either a read lock or a read-write lock. Where the access is coming from shouldn't matter.

    Dynamic message subscription is on my todo list.

    Added static message types to the list as well, thanks.

    Anyway, regardless of the (very much non-trivial) difficulties of multithreading the simulation itself, just separating it from the main thread seems like it would give a very large gain in simulation performance while keeping graphics smooth under load. 20fps * 16ms/frame = 320ms/second spent on rendering, so a dedicated thread raises the simulation budget from 680ms to 1000ms per second: a 47% increase in simulation time budget! At least for any machine that has two or more cores (4+ threads). And that is just considering simulation lag. If we consider animation lag, then each simulation turn only gets ~34ms to run in (at 20 fps). In a dedicated thread this is not an issue at all: frame rate remains constant despite simulation lag. Do we have any data on what fraction of the users have a single core machine?

  6. The first phase would just be to separate simulation from graphics. This alone is worth the trouble.

    I don't understand what you mean regarding mutexes in an event based system. What sorts of problems do you have in mind?

    Regarding the more ambitious project of multithreading the simulation itself, the JavaScript is one issue to deal with, but I think not an insurmountable one. Some code which is currently in JS might have to be rewritten in C++; we would have to consider other engine users as well, of course. The threading model I'm thinking of for the second phase is to split into multiple threads during certain expensive tasks and then continue as a single thread soon after, so in some cases JS isn't involved at all. One option is to generate a priority queue, sorted in some deterministic order (entity ID maybe), multithreaded in C++ and then pass it to JS. It's not an easy problem.
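    A rough sketch of the split-then-join idea, under stated assumptions: `Result`, `ExpensiveTask` and `ComputeDeterministic` are hypothetical, `entity_id_t` here is a local stand-in typedef, and the per-entity work is assumed to have no side effects. Each worker writes only to its own bucket, and the merged list is sorted by entity ID so the order handed onward does not depend on thread count or scheduling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

using entity_id_t = std::uint32_t; // stand-in for the engine typedef

struct Result
{
    entity_id_t entity;
    int value; // hypothetical per-entity result
};

// Placeholder for an expensive, side-effect-free computation.
int ExpensiveTask(entity_id_t entity)
{
    return static_cast<int>((entity * 2654435761u) % 100);
}

std::vector<Result> ComputeDeterministic(const std::vector<entity_id_t>& entities, unsigned workers)
{
    assert(workers > 0);
    std::vector<std::vector<Result>> partial(workers);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
    {
        pool.emplace_back([&, w] {
            // Worker w handles every workers-th entity and touches only partial[w].
            for (std::size_t i = w; i < entities.size(); i += workers)
                partial[w].push_back({entities[i], ExpensiveTask(entities[i])});
        });
    }
    for (std::thread& worker : pool)
        worker.join();

    // Back on a single thread: merge the buckets, then sort by entity ID.
    std::vector<Result> merged;
    for (const std::vector<Result>& bucket : partial)
        merged.insert(merged.end(), bucket.begin(), bucket.end());
    std::sort(merged.begin(), merged.end(),
        [](const Result& a, const Result& b) { return a.entity < b.entity; });
    return merged;
}
```

    The sorted vector is what would then be passed to the single-threaded side. How it crosses into JS is not covered here.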

  7. I'm thinking about a strategy to add more threading. Components would be protected from memory collisions by mutex*. Is there any other data that needs mutex protection? Any data which is both written to and read from during a game?


    *incrementing read/readwrite, at the ComponentType level of granularity
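    One way a read / read-write lock at ComponentType granularity could look, sketched with C++17's std::shared_mutex. `ComponentStore` and `WriteWhileReading` are hypothetical names standing in for "all components of one type". This is an assumption about the shape, not a description of existing engine code.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical container for the data of one ComponentType.
class ComponentStore
{
public:
    // Read lock: any number of readers may hold it at the same time.
    int Sum() const
    {
        std::shared_lock<std::shared_mutex> lock(m_Mutex);
        int total = 0;
        for (int v : m_Values)
            total += v;
        return total;
    }

    // Read-write lock: exclusive, blocks readers and other writers.
    void Add(int value)
    {
        std::unique_lock<std::shared_mutex> lock(m_Mutex);
        m_Values.push_back(value);
    }

private:
    mutable std::shared_mutex m_Mutex;
    std::vector<int> m_Values;
};

// One thread writes while another reads the same store concurrently.
int WriteWhileReading(int writes)
{
    assert(writes >= 0);
    ComponentStore store;
    std::thread writer([&] { for (int i = 0; i < writes; ++i) store.Add(1); });
    std::thread reader([&] { for (int i = 0; i < writes; ++i) (void)store.Sum(); });
    writer.join();
    reader.join();
    return store.Sum();
}
```

    The lock lives inside the store, next to the data it protects, so callers cannot forget to take it.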

  8. Ah I see now. I was under the impression that graphics and simulation ran in separate threads but have since learned otherwise. We should revisit this after that issue is resolved.

    • Doing some expensive tasks only on every other turn is good but running them on turns where we have extra time to work with is better
    • Repathing is probably the hard limit on what we can do here before things look off.
    • There is some simulation overhead checking those timers and running those queries. I don't know how much either. Also some network overhead.
  9. After being informed how things work, I'll revise my claims to a modest performance boost, maybe around 10%, and a reduction in network traffic of around 10-18%, depending on player APM. Those things may be worth 50ms in queued command lag / single player input lag.

    500ms sounds like too much. The difference between 2 turns per second and 5 turns per second (300ms) is much more than between 5 and 4 (50ms).

  10. Currently the default simulation turn length (DEFAULT_TURN_LENGTH) is 200ms. A higher number would reduce the number of turns per second and thus our total simulation load. For example, setting 250ms would reduce simulation cost by 20%. The disadvantages I see are increased lag on input and between a unit finishing one task and beginning another. The input lag can be counteracted in multiplayer by reducing the number of turns that commands are delayed (COMMAND_DELAY_MP). For example, reducing COMMAND_DELAY_MP from 4 turns to 3 turns would balance out increasing DEFAULT_TURN_LENGTH to 250ms. Single player COMMAND_DELAY is 1, so it can't be reduced; we would just have to accept up to 50ms of extra input lag there. I played a quick single player game and found both input lag and lag between queued tasks unnoticeable.


    This patch:



    Implements these changes. Try it and see if it feels weird?
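    The turn-length arithmetic in this post can be checked with a few lines of standalone C++. This is only a sketch: the function names are made up for illustration and nothing here is engine code. The inputs are the figures quoted above (200ms and 250ms turns, 4 and 3 turns of multiplayer command delay).

```cpp
#include <cassert>

// Number of simulation turns run per second of game time.
int TurnsPerSecond(int turnLengthMs)
{
    assert(turnLengthMs > 0);
    return 1000 / turnLengthMs;
}

// Delay between issuing a command and the turn it executes on.
int CommandDelayMs(int turnLengthMs, int delayTurns)
{
    return turnLengthMs * delayTurns;
}

// Percentage drop in turns simulated per second when the turn
// length grows from oldMs to newMs.
int LoadReductionPercent(int oldMs, int newMs)
{
    assert(newMs > 0);
    return 100 - (100 * oldMs) / newMs;
}
```

    With those figures: 200ms gives 5 turns per second and 250ms gives 4, a 20% reduction. The multiplayer command delay goes from 4 * 200 = 800ms to 3 * 250 = 750ms, so it comes out slightly lower rather than higher.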

  11. @smiley @wraitii
    Thanks for those links.

    Reading through those threads gave me the impression that this was abandoned due to memory 'leaking' because of temporary entities.

    But maybe temporary entities aren't really a problem? An empty vector in 32 bit is only 16 bytes. 1,000,000 entities would be 16 MB. If a game had 10,000 units spawned and every unit averaged 100 projectiles in its lifetime, the cost would be 16 MB. That seems safe.

  12. @wraitii Definitely related optimizations, but ultimately I think they are parallel.

    This data structure proposal is agnostic about actual storage location, but providing an iterator yielding a series of sequential pointers should be quite fast.


    Maybe this stuff should be grouped together in a ComponentDataManager class which is responsible for both allocator and pointer look-up data structures?

    It could have an interface like addComponentToEntity, removeComponentFromEntity, componentsByType, componentsByInterface, componentByEntityAndInterface, etc.

    Just my personal bias against long files, feel free to ignore :)

  13. @nani ~~entity_id_t is used as the index of the vector in the second case.~~


    Disregard that, I was thinking of the first case. In the second case the entity ID would have to be stored in the linked list, I guess. So a custom linked list instead of std::list? Or a tuple holding the entity ID and an IComponent*? Hmm, but in that case there is no performance gain to be had vs. std::map. Maybe I'll just try m_ComponentsByEntity then and leave m_ComponentsByInterface alone. Maybe entity_id could be added to IComponent?

  14. I have been thinking lately about how to optimize the component manager system. I have an idea that seems worth trying and wanted to get any feedback possible from people who are more familiar with the code base than I am.

    std::vector<std::unordered_map<entity_id_t, IComponent*> > m_ComponentsByInterface; // indexed by InterfaceId

    This data structure seems to be doing two things at once, and because of that could be optimized more.

    This structure serves two logic paths which seem to me to be pretty hot: QueryInterface and getEntitiesWithInterface/getEntitiesWithInterfaceUnordered. (These are all functions within ComponentManager.cpp.)

    If we create a new data structure like:

    std::vector<std::vector<IComponent*>> m_ComponentsByEntity; // outer index: entity, inner index: InterfaceId


    to serve QueryInterface, we could remove the relatively expensive std::unordered_map::find call it contains and replace it with a fast vector dereference.
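    A minimal sketch of that lookup, under loud assumptions: `ComponentIndex` is a hypothetical wrapper, `entity_id_t` and `IComponent` are local stand-ins rather than the engine's definitions, and entity IDs are assumed to be small and dense enough to use directly as vector indices (an ID scheme with large gaps would need remapping first).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using entity_id_t = std::uint32_t;                      // stand-in typedef
struct IComponent { virtual ~IComponent() = default; }; // stand-in base class

// Hypothetical wrapper around the proposed m_ComponentsByEntity.
class ComponentIndex
{
public:
    explicit ComponentIndex(std::size_t interfaceCount) : m_InterfaceCount(interfaceCount) {}

    // Creation cost: grow the outer vector and allocate one slot per interface.
    void Add(entity_id_t ent, std::size_t iid, IComponent* cmp)
    {
        assert(iid < m_InterfaceCount);
        if (ent >= m_ComponentsByEntity.size())
            m_ComponentsByEntity.resize(static_cast<std::size_t>(ent) + 1);
        std::vector<IComponent*>& slots = m_ComponentsByEntity[ent];
        if (slots.empty())
            slots.resize(m_InterfaceCount, nullptr);
        slots[iid] = cmp;
    }

    // Two bounds checks and two index operations, no hashing.
    IComponent* QueryInterface(entity_id_t ent, std::size_t iid) const
    {
        if (ent >= m_ComponentsByEntity.size() || iid >= m_ComponentsByEntity[ent].size())
            return nullptr;
        return m_ComponentsByEntity[ent][iid];
    }

    // Death cost: release the inner storage, leaving an empty vector behind.
    void RemoveEntity(entity_id_t ent)
    {
        if (ent < m_ComponentsByEntity.size())
            std::vector<IComponent*>().swap(m_ComponentsByEntity[ent]);
    }

private:
    std::size_t m_InterfaceCount;
    std::vector<std::vector<IComponent*>> m_ComponentsByEntity;
};
```

    The index does not own the components, it only stores pointers to them. The empty inner vector left behind by a dead entity stays in the outer vector in this sketch.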

    getEntitiesWithInterface could continue to use the existing data structure, or the map could become a std::list. This would give marginal performance gains but also allow the minor logical gain of replacing calls to getEntitiesWithInterfaceUnordered with getEntitiesWithInterface, since the insertion order is preserved automatically. There would also be some significant performance gain for the ordered variety, but I think that one isn't used much anyway?

    There is some minimal cost when units are created and a more significant cost when units die. Still, it seems very much worth it. There is also some RAM cost, which is pretty negligible: maybe 17 MB per 1000 units in a worst case scenario with 256 components?

    After doing this maybe an experiment with using a naked array of component pointers in m_ComponentsByEntity to save the vector overhead is worth trying as a further upgrade.

    Thank you for your time reading this and for any feedback you might have.

  15. Hello, this is my first post here.

    I have been poking around in the relevant bits of code. Have not found a fix yet.

    I did find out however that this bug is not specific to water units but rather to units which die in the water and have blood.

    I saw the same behavior by killing infantry and cavalry on the beach at the surf line in Corinthian Isthmus.


    I did not see the behavior killing a battering ram.

    In a25 blood is drawn under the water for land units. No blood that I can see for whales / fishing boats etc.

    In a26 blood is drawn at the water level height, this seems to be the origin of the regression.

    One problem I am having is understanding the conventions of the C++ side. Can anyone tell me how to find where AddEntity is defined?

    I tried interacting with Engine.AddEntity through the in game console. Engine exists in that environment but AddEntity does not. Any solutions?

    Hopefully these notes help point someone in the right direction.

