[PATCH] SpatialSubdivision performance optimization

Kay · June 17, 2014

Hi,

Spring engine developer here peeking over the garden walls to a fellow RTS project.

While profiling 0AD I noticed certain functions relying on SpatialSubdivision::Get{Near,InRange} showed up alarmingly often and consumed much more CPU than seemed justified from the functionality provided by them. After quickly diving into your codebase I discovered that SpatialSubdivision::GetInRange calls make_unique on its SpatialQueryArray argument, which in turn issues a std::sort followed by a std::unique (!) just to handle the problem of buildings spanning multiple subdivision tiles. This is an *extremely* heavy-handed way to avoid duplicate entity ID's, not to mention a textbook case on how to kill performance since spatial queries are one of the most frequently executed (certainly in Spring's case) and hence time-critical parts of any RTS engine. So I fixed it, following the old adage that 99% of computing is an exercise in caching: 0AD-SpatialSubdivisionPerfPatch-v3.diff

This trades O(n log n) time for O(n) space which I would consider an acceptable deal since space is generally cheap but time almost never is. However, it also introduces a bit of double bookkeeping in SpatialSubdivision, which may not be to everyone's taste. Either way, with these changes the relevant functions dropped to neglible percentages, so I'm calling it an hour well spent. (AFAICS no calling code actually cares about the ID's being sorted, nor should it)

This patch was created against commit d14b8363fcd143d43a4888e7126e97fddf0015ab of your git mirror (sorry, not touching svn with a ten-foot pole), and is small enough that it shouldn't present any difficulty to apply.

ps: if there's still a need, I might also be tempted to take a look at your pathfinder issues... for a price

Edited June 17, 2014 by Kay

**sanderd17** · June 17, 2014

Hi Kay, normally, it's best to upload patches on Trac (http://trac.wildfiregames.com/), the patches get a nicer formatting there, so we don't even have to download them to read them.

As for the spatial subdivisions, that code is likely to change, as there are a number of problems with it (s.a. a fixed, and easy to reach limit on the number of entities).

I'll take a look at it.

**sanderd17** · June 17, 2014

I like your code, but it's not following our code conventions wrt code blocks (http://trac.wildfiregames.com/wiki/Coding_Conventions)

Also, why do you have a note about "uint32_t that should be entity_id_t" when entity_id_t is just defined to be u32. It would be clearer to leave out the note there, and just use entity_id_t.

"Magic" also isn't the most descriptive term. Something like "latestRangeCall" would be better I think. As it tells you what the latest range call was that had this entity. And if the latest range call is equal to the current range call, it means you already added it, so you don't have to add it again.

But apart from that, very nice.

Kay · June 17, 2014

I like your code, but it's not following our code conventions wrt code blocks

Ah, seems I'm too used to Spring's conventions.

It would be clearer to leave out the note there, and just use entity_id_t.

Spatial.h doesn't know the definition of entity_id_t (and treats id's as uint32_t's itself), so I did that for consistency and to avoid pulling in another header.

Something like "latestRangeCall" would be better I think.

Done and re-attached: 0AD-SpatialSubdivisionPerfPatch-v2.diff (I'll create a trac account for any future patches, right now this is a side-interest)

Edited June 17, 2014 by Kay

iNcog · June 17, 2014

this topic is pure gibberish to me

but i just want to thank Kay for his hour well spent anyway

**sanderd17** · June 17, 2014

Spatial.h doesn't know the definition of entity_id_t (and treats id's as uint32_t's itself), so I did that for consistency and to avoid pulling in another header.

It pulls in Component.h -> CmpPtr.h -> Entity.h. So it should know entity_id_t. But anyway, doesn't matter a lot. As you say, that entire file should be using the same types, so the entire file should be cleaned.

Radagast. · June 17, 2014

Hey Kay. You surprised us greatly.

I would be able to donate 5 euro a month one year long if our pathfinder probs get fixed, but I think it's a bad idea to pay devs directly (it was incredible big pressure for RedFox, our last paid developer).

Instead I would prefer a reward for each milestone:

For example:

- finish the speedier new short+long range pathfinder 50€.

- Alternatively if someone comes up with performance fixes for the current non-uniform pathfinder (which allows roads, but there are workaround thus we should really go Philip's route)

- loyalty + conversion of units / besieging buildings (10€)

- walkable bridges (10€) (extension: all buildings' roofs walkable? or even walkable meshes but that should give more reward?)

- walkable walls (5€ for Sander for the awesome current prop rework + 10€ for the next one who makes it possible that units really can walk ontop of walls.

- improve renderer 5€

- improve performance significantly at any point (depending on the significance 2..20€).

and so on. This way, noone is required to do anything and noone will have to bear the full community + team expectation pressure like it happened with RedFox.

I'm sure Philip won't mind reinforcement since Philip + Sander are needed at several frontiers, historic_bruno is busy with Saving + loading games and other issues, leper is busy with the Mod Configurator, Josh is into the GUI code, wraitii into the Renderer, trompetin into the Atlas Editor, and lost RedFox in one of our recent not-so-lucky battles (the kickstarter). Others I don't know. Perhaps there are more C++ developers I don't know yet.

If we introduce a reward system, then I will donate each month my 5€. In addition to my 5€ we might use the kickstarter income but I'm not sure how the general plan is and I don't want to interfere. It's only an idea.

Radagast. · June 17, 2014

Same for the art team:

- if in team + any of the below, then 1€ extra (to motivate people to try to get into the team).

submit

- 3D model (properly UV unwrapped) 1€

- complete new texture for a 3D model 1€

- animation 1€ (as there are many different required animations this is where most reward can be earned)

Anybody must take the reward but me, I will exclude myself from the reward system as I plan to create plenty of 3D models and it would be unfair if I proposed this reward system with my knowledge of me creating many 3d models.

Modders + community won't be paid out of the kickstarter as long as they are not in the team. Once they are in the team their recent efforts to get in the team might be rewarded, i.e. get the reward from the kickstarter income.

If 0AD wants to go this way and only pay team members or pay once the contributors eventually become team members, then it's fine to me and I would then use my 5€/month to create such a system for modders.

To make people desire to learn the whole process: 3D modeling + UV unwrapping + texturing + committing I would propose to only pay half the reward unless the artists (modders in this case) complete the whole cycle of 3D modeling + UV + texturing + git committing. (of course they will be guided. the advantage is that the next time the artists know it all themselves and our output + effectivity increases)

Radagast. · June 17, 2014

When to get the Reward:

Rewards are only payable once more than 5€ are reached. Then you can always ask us to pay via paypal or what ever.

**sanderd17** · June 17, 2014

The current decision is to not pay per task, to not give us the administrative overhead when reviewing patches. Reviewing takes a great deal of our time.

Anyway, about this patch, I've been pointed out that there's an other patch in trac too about removing that costly sort (and a more complete rewrite): http://trac.wildfiregames.com/ticket/2430

We'll at least have to compare the two, as #2430 also fixes the issue with the limited number of entities in a query (see http://trac.wildfiregames.com/ticket/2573).

Kay · June 17, 2014

I am naturally biased toward my own patch, but SpatialQueryArray having a fixed size would be trivial to fix: just replace items[MAX_COUNT] by a std::vector (reserve()'d to some initial size) and modify copy_items accordingly. If desired I can provide a v3.diff implementing that change as well.

While I am obviously unfamiliar with common 0AD development practices and don't want to take anything away from the work in 2430, a months-old unapplied patch lying around for an (apparently?) long-standing major problem suggests it contains an uncomfortable level of complexity (or perhaps simple indecisiveness ) and this is an area where KISS is paramount.

Edited June 17, 2014 by Kay

**sanderd17** · June 17, 2014

The issue isn't with that patch, it's with the the changes to Spatial that got in 9 months ago. Of which parts are reverted, but not everything.

And there's still no decision on whether we want to continue with the current version (and fix the issues with it), or rather rewrite it.

in any case, I thank you for this patch, as it's a nice solution for the duplication problems.

Kay · June 17, 2014

No problem, and I'll leave it at v3 then (see first post) for easier comparison. Good luck with your project.

Edited June 17, 2014 by Kay

**sanderd17** · June 17, 2014

Thanks, good luck with spring too

**wraitii** · June 19, 2014

This is the kind of beautiful, simple idea that would probably never have occurred to me (who did the other rewrite). It's probably a better solution than mine too. I might just update my patch to work with that method and revert to the dynamic sized template. AFAIK we have found no better system than a simple grid.

iNcog · June 19, 2014

In layman's terms, what exactly did Kay fix anyway?

agentx · June 19, 2014

> In layman's terms, what exactly did Kay fix anyway?

It's not a fix, it is an improvement. It caches results and prevents superfluous recalculation.

Wraitii, any success with this turbo switch?

**Ykkrosh** · June 20, 2014

AFAICS no calling code actually cares about the ID's being sorted, nor should it

ExecuteActiveQueries calls set_difference on the list returned by GetNear, so that it can find the (usually very small) lists of entities that entered and exited the range since the last turn. set_difference requires the lists to be sorted - if they're not then it will give incorrect results. (And if we do have to sort them anyway, the extra make_unique is essentially free.)

If I remember correctly, RedFox made a similar change once and it made ExecuteActiveQueries a lot faster, but it actually made the game slower, because it ended up sending many duplicate entity IDs to scripts which had to do a lot of work before rejecting the out-of-range entities. Am I missing something in your patch, or does it have the same problem?

Part of the need for this set_difference thing is that we don't really do any filtering in RangeManager - if an idle unit is waiting for an enemy to come into range, RangeManager will typically find all the hundreds of trees and friendly units that are always in range, and we don't want the UnitAI scripts to be looping over those hundreds of entities every turn to detect whether one is an enemy. (The notion of "enemy" is complicated because of diplomacy etc, and currently our C++ code is completely ignorant of that concept, so it can only do very generic filtering). How does Spring handle this kind of thing?

Kay · June 20, 2014

Ouch, looked at CmpRangeManager again and I indeed stepped on a small landmine there regarding the "no calling code actually cares about the ID's being sorted" part. That raises a few questions though: [edit: just realized this is all purely an overhead limitation strategy]

1) Does any game logic really *need* to know which entities entered or left range since the last turn? e.g. if I am a static defense turret, do I care specifically about new enemies that walked *into* my firing range when I can just call GetInRange again and grab all the ones that are in range *now* at much smaller cost? (depending on tile size and object count, but currently the sort's form a clear bottleneck)

2) Is ExecuteActiveQueries a frequent or infrequent (conditional) callee? If it's not an oft-visited code path, then inserting std::sort's before the std::set_difference's would fix its needs and still grant the common case performance improvement, otherwise the benefit of not having to sort at all obviously goes out the window. Could this be refactored such that it would only need to be invoked optionally by parties wanting to know exact diffsets?

RedFox made a similar change once and it made ExecuteActiveQueries a lot faster, but it actually made the game slower, because it ended up sending many duplicate entity IDs to scripts which had to do a lot of work before rejecting the out-of-range entities. Am I missing something in your patch, or does it have the same problem?

With my patch (or wraitii's) there would be no more duplicate ID's per individual query (two or more queries overlapping in area might still contain duplicates, of course), though this comment makes me more inclined to think 0AD's default grid resolution is too coarse since a simple Euclidean distance (or w/e) out-of-range check shouldn't break the bank but sending thousands of unfiltered entities to JS per turn certainly would (especially if the overhead isn't carefully managed by sending data across in batches) and from that perspective those set_diff's are understandable. Have you guys tried experimenting with this?

Part of the need for this set_difference thing is that we don't really do any filtering in RangeManager - if an idle unit is waiting for an enemy to come into range, RangeManager will typically find all the hundreds of trees and friendly units that are always in range, and we don't want the UnitAI scripts to be looping over those hundreds of entities every turn to detect whether one is an enemy. (The notion of "enemy" is complicated because of diplomacy etc, and currently our C++ code is completely ignorant of that concept, so it can only do very generic filtering). How does Spring handle this kind of thing?

Spring splits objects into separate classes (trees/rocks/etc are map features, ie. specialized entities) so queries don't have to iterate over anything the caller (engine or script) does not indicate it is interested in. There's no diplomacy aspect involved, but I'm assuming that it probably doesn't make much sense for a unit to wait by default for trees or rocks to venture into its range in 0AD's world either... and from what I've seen of your maps they tend to be densely packed with them so selection at the gates wouldn't hurt.

Edited June 20, 2014 by Kay

**Ykkrosh** · June 20, 2014

(The notion of "enemy" is complicated because of diplomacy etc, and currently our C++ code is completely ignorant of that concept, so it can only do very generic filtering)

Hmm, I misremembered the code a bit - it already does filter by player ID in the C++ (and the JS gives it a list of all the enemy player IDs). But it does that filtering *after* getting the sorted de-duplicated list of every entity owned by any player. That's certainly a bit silly.

(I don't know whether Gaia (owner of trees and rampaging lions) counts as an enemy in these queries...)

1) Does any game logic really *need* to know which entities entered or left range since the last turn? e.g. if I am a static defense turret, do I care specifically about new enemies that walked *into* my firing range when I can just call GetInRange again and grab all the ones that are in range *now* at much smaller cost? (depending on tile size and object count, but currently the sort's form a clear bottleneck)

I'd be surprised if it's ever a much smaller cost - sorting a list of ints in C++ is O(n log n) but the constant factors are pretty small, while converting a list of ints into a JS array and then iterating over that list in JS and checking whether each entity is an enemy is O(n) but huge constant factors. The set_difference thing was added specifically to minimise the JS cost.

2) Is ExecuteActiveQueries a frequent or infrequent (conditional) callee?

If I remember correctly, it's pretty much the only significant user of GetNear. (...not counting PickEntitiesAtPoint which currently uses GetNear but really needs to be redesigned anyway)

... this comment makes me more inclined to think 0AD's default grid resolution is too coarse since a simple Euclidean distance (or w/e) out-of-range check shouldn't break the bank but sending thousands of unfiltered entities to JS per turn certainly would (especially if the overhead isn't carefully managed by sending data across in batches). Have you guys tried experimenting with this?

Do you mean experimenting with different divisionSizes in SpatialSubdivision? I think I tested a few values when I first wrote this code (too many years ago) and the current size (8*8 terrain tiles) worked best in the scenario I ran, but it wasn't at all careful or comprehensive testing, and the game has changed a lot since then.

**sanderd17** · June 20, 2014

Ouch, looked at CmpRangeManager again and I indeed stepped on a small landmine there regarding the "no calling code actually cares about the ID's being sorted" part. That raises a few questions though:
1) Does any game logic really *need* to know which entities entered or left range since the last turn? e.g. if I am a static defense turret, do I care specifically about new enemies that walked *into* my firing range when I can just call GetInRange again and grab all the ones that are in range *now* at much smaller cost? (depending on tile size and object count, but currently the sort's form a clear bottleneck)

I believe the UnitAI los code only checks new units to see if there are hostile units visible. Less data to be send by messages also means less data transformation though, as you need to clone it for every message you send, you don't know what the message receivers will do with it.

2) Is ExecuteActiveQueries a frequent or infrequent (conditional) callee? If it's not an oft-visited code path, then inserting std::sort's before the std::set_difference's would fix its needs and still grant the common case performance improvement, otherwise the benefit of not having to sort at all obviously goes out the window. Could this be refactored such that it would only need to be invoked optionally by parties wanting to know exact diffsets?

I believe it's executed once per turn (200 or 500 ms). So it should be rather fast.

With my patch (or wraitii's) there would be no more duplicate ID's per individual query (two or more queries overlapping in area might still contain duplicates, of course), though this comment makes me more inclined to think 0AD's default grid resolution is too coarse since a simple Euclidean distance (or w/e) out-of-range check shouldn't break the bank but sending thousands of unfiltered entities to JS per turn certainly would (especially if the overhead isn't carefully managed by sending data across in batches) and from that perspective those set_diff's are understandable. Have you guys tried experimenting with this?

Not sure what you mean by our grid. We have a terrain grid, but that's only for rendering and the long range pathfinder. Entity positions are stored as fixed-point numbers, and are precise up to a few micro-meters iirc. The range manager uses those positions directly.

Spring splits objects into separate classes (trees/rocks/etc are map features, ie. specialized entities) so queries don't have to iterate over anything the caller (engine or script) does not indicate it is interested in. There's no diplomacy aspect involved, but I'm assuming that it probably doesn't make much sense for a unit to wait by default for trees or rocks to venture into its range in 0AD's world either... and from what I've seen of your maps they tend to be densely packed with them so selection at the gates wouldn't hurt.

The range manager can split by owner (so scripts can give a list of players if they only want enemies, allies, ...). And by implemented component: all units have the IID_UnitAI interface, resources have IID_ResourceHolder, anything possibly hostile has IID_Attack, ...

Which mostly filters out the not-wanted entities at the gate (apart for the los query, as that's used for multiple purposes, s.a. finding enemies, finding new resources, finding a dropsite, ...)

Radagast. · June 20, 2014

This implies we could get rid of the sorting as we have the chance to filter out trees by excluding units owned by Gaia/Nature. Though Philip said that would even be more costly than doing it right away on the C++ side ...

As a result it seems we can't go without it unless we put entities in bins by classes, but I fear it might complicate things..

[PATCH] SpatialSubdivision performance optimization

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation