Jump to content

Optimization proposal for ComponentManager.cpp data structure.


Recommended Posts

I have been thinking lately about how to optimize the component manager system. I have an idea that seems worth trying and wanted to get any feedback possible from people who are more familiar with the code base then I am.

std::vector<std::unordered_map<entity_id_t, IComponent*> > m_ComponentsByInterface; // indexed by InterfaceId

This data structure seems to be doing two things at once, and because of that could be optimized more.

This structure serves two logic paths which seem to me are pretty hot: QueryInterface and getEntitiesWithInterface/getEntitiesWithInterfaceUnordered. (These are all functions within ComponentManager.cpp).

if we create a new data structure like:

std::vector<std::vector<iComponent*>> m_ComponenentsByEntity; //entity(outer) and component(inner) indexed by InterfaceID

 

to serve QueryInterface we could remove the relatively expensive std::map.find call it contains and replace with fast vector de-reference.

getEntitiesWithInterface could continue to use the existing data structure or the std::map could become a std::list. This would give marginal performance gains but also allow the minor logical gain of replacing calls to getEntitiesWithInterfaceUnordered with getEntitiesWithInterface, since the insertion order is preserved automatically. Also some significant performance gain for the ordered variety, but I think that one isn't used much anyway?

There is some minimal cost when units are created and a more significant cost when units die. Still it seems very much worth it. Also some RAM cost, pretty negligible. Maybe 17mb per 1000 units in a worst case scenario with 256 components?

After doing this maybe an experiment with using a naked array of component pointers in m_ComponentsByEntity to save the vector overhead is worth trying as a further upgrade.

Thank you for your time reading this and for any feedback you might have.

  • Thanks 1
Link to comment
Share on other sites

Posted (edited)

@nani~~entitiy_id_t is used as the index of the vector in the second case.~~

 

Disregard that, I was thinking of the first case. in the second case entity id would have to be stored in the linked list i guess. So a custom linked list instead of std:list? Or a tuple holding entity IID and *IComponent? Hmm, but in that case there is no performance gain to be had vs. std:map. Maybe Ill just try m_ComponentsByEntity then and leave m_ComponentsByInterface alone. Maybe entity_id could be added to IComponent?

Edited by Mercury
wrong
Link to comment
Share on other sites

Posted (edited)

@wraitiiDefinitely related optimizations but ultimately I think they are parallel.
 

This data structure proposal is agnostic about actual storage location, but providing an iterator yielding a series of sequential pointers should be quite fast.

 

Maybe this stuff should be grouped together in a ComponentDataManager class which is responsible for both allocator and pointer look-up data structures?

It could have an interface like addComponentToEntitiy, removeComponentFromEntitiy, componentsByType, componenetsByInterface, componentByEntityAndInterface, etc.

Just my personal bias against long files, feel free to ignore :)

Edited by Mercury
:
Link to comment
Share on other sites

I recall seeing this on a patch attached to a ticket from long ago. EntityID as index which I guess is the gist of this. Ultimately it wasn't merged, so if the ticket could be found, there might be noteworthy discussion there. If my memory serves me right it also replaced the map with a vector.

(Might just be a goose chase, I recall stumbling upon the ticket sometime back in 2018)

 

Link to comment
Share on other sites

Posted (edited)

 @smiley@wraitii
Thanks for those links.

Reading through those threads gave me the impression that this was abandoned due to memory 'leaking' because of temporary entities.

But maybe temporary entities aren't really a problem? An empty vector in 32 bit is only 16 bytes. 1,000,000 entities would be 16MB. If a game had 10,000 units spawned and every unit averaged 100 projectiles in it's lifetime the cost is 16mb. That seems safe.

Edited by Mercury
removed my own iterative ramblings
Link to comment
Share on other sites

@smiley Yeah this has been an area of interest for me for a long while.

@Mercury On my end anyways, the problem has mostly been about convincing myself that:

  • The changes are correct and won't crash/OOS in odd ways
  • The changes are actually faster.

One difficulty is that profiling this type of change is difficult, since they tend to change the game determinism. So you kind of have to get a feel for it.

In the particular case of your suggestion, maybe it's straightforward with a replay, but it would still need to be shown. Changes for the sake of change are bad for 0 A.D., particularly 'optimisations' which tend to make things more complex. What I'm saying is that 'conceptually faster' might be 'totally irrelevant' or even not actually faster -> only hard, cold numbers matter.

In the case of my 'sparse vector + dense vector' approach of D3186, the performance degrades over game time and I haven't been sure that it's certainly better than just an unordered_map or things like that. There are slightly less brute-force approach that might make those cases better, but slower in other situations. It's tricky is what I'm saying.

10 hours ago, Mercury said:

This data structure proposal is agnostic about actual storage location,

The problem is that you can't be _that_ agnostic about the actual storage location. Getting more performance at this point generally means thinking about it, at least.

--

A few other things:

  • ComponentManager::QueryInterface is mostly used by JS code, where I think its performance cost is largely dwarfed by the performance cost of C++ <-> JS interop. The C++ code usually goes through the component cache.
  • I am not sure getEntitiesWithInterface is used all that much, I think most JS code that does something similar actually goes through the Range Manager.

--

Finally, a quick overview of some things that can definitely be optimised still:

  • Posting (global) messages is probably slower than it needs to be because of the std::map iteration. That's mostly what my diff above set out to fix, but we might gain from using specific structures for that instead.
  • Memory layout of components is fragmented (using new), and we would gain from denser layout. I do not know if we should favour putting components of the same entity together or components of the same type together.
  • We can probably gain by having more dynamic message subscriptions. The game already allows dynamic global subscriptions, but not dynamic local subscriptions.
  • The fact that we support several component types per interface slows everything down and prevents a number of optimisations, by basically adding a layer of indirection. We only use that feature for UnitMotion vs UnitMotionFlying, and to be honest I am not sure it's a good thing to support. But I also haven't really been able to convince myself of that, and tear the feature out.
    • I think an approach here would be to split unit motion into a 'navigation' component that mostly has the current unitMotion interface, and more specific components 'WalkMotion', 'FlyingMotion' etc. that handle the nitty-gritty of moving. But that sounds kind of complicated, and would probably only be a benefit if we remove the component-interface split.

 

Link to comment
Share on other sites

Too much holes is still non optimal though. A proper O(1) access container for sparse data would be best. A basic first step could just be chunk based.

And memory usage in the vector case would be count*sizeof(ptr) + sizeof(vector<ptr>) I suppose. And I am fairly certain sizeof(vector<T>) is not standard even in debug vs release on some platforms, but that's besides the point as the underlying buffer is the point here which should be ptrs.

Having an unnecessarily large block might also have some negative perf impact on the os level from the MMU.

For the general case, this would still probably be faster than map though. map<incrementing id, components> ignores some easy optimizations.

Edited by smiley
Link to comment
Share on other sites

  • 1 month later...

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...