Optimization proposal for ComponentManager.cpp data structure.

Mercury · May 18, 2022

I have been thinking lately about how to optimize the component manager system. I have an idea that seems worth trying and wanted to get any feedback possible from people who are more familiar with the code base then I am.

std::vector<std::unordered_map<entity_id_t, IComponent*> > m_ComponentsByInterface; // indexed by InterfaceId

This data structure seems to be doing two things at once, and because of that could be optimized more.

This structure serves two logic paths which seem to me are pretty hot: QueryInterface and getEntitiesWithInterface/getEntitiesWithInterfaceUnordered. (These are all functions within ComponentManager.cpp).

if we create a new data structure like:

std::vector<std::vector<iComponent*>> m_ComponenentsByEntity; //entity(outer) and component(inner) indexed by InterfaceID

to serve QueryInterface we could remove the relatively expensive std::map.find call it contains and replace with fast vector de-reference.

getEntitiesWithInterface could continue to use the existing data structure or the std::map could become a std::list. This would give marginal performance gains but also allow the minor logical gain of replacing calls to getEntitiesWithInterfaceUnordered with getEntitiesWithInterface, since the insertion order is preserved automatically. Also some significant performance gain for the ordered variety, but I think that one isn't used much anyway?

There is some minimal cost when units are created and a more significant cost when units die. Still it seems very much worth it. Also some RAM cost, pretty negligible. Maybe 17mb per 1000 units in a worst case scenario with 256 components?

After doing this maybe an experiment with using a naked array of component pointers in m_ComponentsByEntity to save the vector overhead is worth trying as a further upgrade.

Thank you for your time reading this and for any feedback you might have.

nani · May 18, 2022

Easiest way to find out is to test the change :=)
Also where is entity_id_t stored in the second case?

Mercury · May 18, 2022

@nani~~entitiy_id_t is used as the index of the vector in the second case.~~

Disregard that, I was thinking of the first case. in the second case entity id would have to be stored in the linked list i guess. So a custom linked list instead of std:list? Or a tuple holding entity IID and *IComponent? Hmm, but in that case there is no performance gain to be had vs. std:map. Maybe Ill just try m_ComponentsByEntity then and leave m_ComponentsByInterface alone. Maybe entity_id could be added to IComponent?

Edited May 18, 2022 by Mercury
wrong

**wraitii** · May 18, 2022

This sounds a little like what I've done with D3186. I need to get around to profiling that better.

**Stan`** · May 18, 2022

Or maybe @Mercury could try that if they feel comfortable This way you can work on other stuff and guide them. About RAM usage keep in mind we are limited to 32bits on windows for now. See this.

Mercury · May 18, 2022

@wraitiiDefinitely related optimizations but ultimately I think they are parallel.

This data structure proposal is agnostic about actual storage location, but providing an iterator yielding a series of sequential pointers should be quite fast.

Maybe this stuff should be grouped together in a ComponentDataManager class which is responsible for both allocator and pointer look-up data structures?

It could have an interface like addComponentToEntitiy, removeComponentFromEntitiy, componentsByType, componenetsByInterface, componentByEntityAndInterface, etc.

Just my personal bias against long files, feel free to ignore

Edited May 18, 2022 by Mercury
:

**smiley** · May 18, 2022

I recall seeing this on a patch attached to a ticket from long ago. EntityID as index which I guess is the gist of this. Ultimately it wasn't merged, so if the ticket could be found, there might be noteworthy discussion there. If my memory serves me right it also replaced the map with a vector.

(Might just be a goose chase, I recall stumbling upon the ticket sometime back in 2018)

**smiley** · May 18, 2022

https://trac.wildfiregames.com/ticket/1860 (there seems to be two separate proposals in here, including this particular one)

https://trac.wildfiregames.com/ticket/4046

wraitii posted in both tickets as well.

Edited May 18, 2022 by smiley

Mercury · May 18, 2022

@smiley @wraitii
Thanks for those links.

Reading through those threads gave me the impression that this was abandoned due to memory 'leaking' because of temporary entities.

But maybe temporary entities aren't really a problem? An empty vector in 32 bit is only 16 bytes. 1,000,000 entities would be 16MB. If a game had 10,000 units spawned and every unit averaged 100 projectiles in it's lifetime the cost is 16mb. That seems safe.

Edited May 19, 2022 by Mercury
removed my own iterative ramblings

**wraitii** · May 19, 2022

@smiley Yeah this has been an area of interest for me for a long while.

@Mercury On my end anyways, the problem has mostly been about convincing myself that:

The changes are correct and won't crash/OOS in odd ways
The changes are actually faster.

One difficulty is that profiling this type of change is difficult, since they tend to change the game determinism. So you kind of have to get a feel for it.

In the particular case of your suggestion, maybe it's straightforward with a replay, but it would still need to be shown. Changes for the sake of change are bad for 0 A.D., particularly 'optimisations' which tend to make things more complex. What I'm saying is that 'conceptually faster' might be 'totally irrelevant' or even not actually faster -> only hard, cold numbers matter.

In the case of my 'sparse vector + dense vector' approach of D3186, the performance degrades over game time and I haven't been sure that it's certainly better than just an unordered_map or things like that. There are slightly less brute-force approach that might make those cases better, but slower in other situations. It's tricky is what I'm saying.

10 hours ago, Mercury said:

This data structure proposal is agnostic about actual storage location,

The problem is that you can't be _that_ agnostic about the actual storage location. Getting more performance at this point generally means thinking about it, at least.

--

A few other things:

ComponentManager::QueryInterface is mostly used by JS code, where I think its performance cost is largely dwarfed by the performance cost of C++ <-> JS interop. The C++ code usually goes through the component cache.
I am not sure getEntitiesWithInterface is used all that much, I think most JS code that does something similar actually goes through the Range Manager.

--

Finally, a quick overview of some things that can definitely be optimised still:

Posting (global) messages is probably slower than it needs to be because of the std::map iteration. That's mostly what my diff above set out to fix, but we might gain from using specific structures for that instead.
Memory layout of components is fragmented (using new), and we would gain from denser layout. I do not know if we should favour putting components of the same entity together or components of the same type together.
We can probably gain by having more dynamic message subscriptions. The game already allows dynamic global subscriptions, but not dynamic local subscriptions.
The fact that we support several component types per interface slows everything down and prevents a number of optimisations, by basically adding a layer of indirection. We only use that feature for UnitMotion vs UnitMotionFlying, and to be honest I am not sure it's a good thing to support. But I also haven't really been able to convince myself of that, and tear the feature out.
- I think an approach here would be to split unit motion into a 'navigation' component that mostly has the current unitMotion interface, and more specific components 'WalkMotion', 'FlyingMotion' etc. that handle the nitty-gritty of moving. But that sounds kind of complicated, and would probably only be a benefit if we remove the component-interface split.

**smiley** · May 19, 2022

Too much holes is still non optimal though. A proper O(1) access container for sparse data would be best. A basic first step could just be chunk based.

And memory usage in the vector case would be count*sizeof(ptr) + sizeof(vector<ptr>) I suppose. And I am fairly certain sizeof(vector<T>) is not standard even in debug vs release on some platforms, but that's besides the point as the underlying buffer is the point here which should be ptrs.

Having an unnecessarily large block might also have some negative perf impact on the os level from the MMU.

For the general case, this would still probably be faster than map though. map<incrementing id, components> ignores some easy optimizations.

Edited May 19, 2022 by smiley

**Stan`** · May 19, 2022

Do we need a NOSQL database with sharding ? :p

Mercury · May 19, 2022

@Stan`We will need it eventually: the World of 0AD MMO is gonna be Webscale^tm!

Spoiler

Mercury · June 20, 2022

@wraitiiThanks for all that info! I made a patch based on your suggestions about memory layout and iteration. 12.3% improvement in my testing.

https://code.wildfiregames.com/D4718

**Stan`** · June 20, 2022

12% percent of what?

Mercury · June 20, 2022

Hi Stan

Reduction in the sum of the 'total' metric for a replay when comparing two sets of 10 runs, each set being averaged.

Profile data and commands.txt are attached to the diff.

Mercury · June 20, 2022

Is there something wrong with the windows build pipeline? It's saying the diff failed unit tests but the error is :

E:\Jenkins\workspace\vs2015-differential\cxxtest_debug.xml was length 0

**Stan`** · June 21, 2022

I think you might be triggering a debug assertion with your code.

Optimization proposal for ComponentManager.cpp data structure.

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation