[DISCUSS] Performance Improvements

**wraitii** · September 11, 2013

I think fixing the range manager will fix this definitely (the entity map patch)

jcantero · September 14, 2013

Excuse me for the question, I've read the discussion about Octrees/Quadratrees, and I'd like to ask: are you already using a Sweep&Prune* algorithm (or have you considered to use it) for the broad phase of collision/pathfinding/rangefinding? The Octree/Quadtree/k-d tree/... way is expensive, only worthwhile if you plan to filter a large number of objets (for example to reduce hundreds of thousands to thousands polygons in a 3D rendering engine).

Sweep&Prune only requires two sorted arrays (or vectors) of entities, one for each coordinate x and y, and keep those arrays sorted when the positions of the entities change. And this can be very efficient using the appropriate sorting algorithm. Especially if you treat the entities as a simple point on the map (you can also treat it as a AABB).

* http://en.wikipedia.org/wiki/Sweep_and_prune

**RedFox** · September 14, 2013

Hi jcantero. For the sort of problem we're having, it's really hard to find an ideal solution. After a week of hassling with this problem, I have ascertained for sure that the current system is definitely the wrong way to do it.

Let me explain the problem of RangeManager alone. We also have ObstructionManager that relies on the same Spatial subsystem, but it's not the focus of patch 1707.

RangeManager:

1) Entities have AABB's, we interpret these as approximated world-aligned BB's in SpatialSubdivision class.

2) Entities have RangeQueries - meaning they want to know about other entities in their LOS or Range.

3) RangeManager provides each RangeQuery Entity with a list of Added and Removed entities.

I think this is simplest way I can put it. RangeManager doesn't deal with collisions (that's up to ObstructionManager). It deals only with range based queries.

Herein lies the problem though - all of this when put into code:

Iterate over all RangeQueries: O(N).
- Perform Query:
  - SpatialSubdivision::GetInRange® -> M, gets entities in range: O(r/64 + 1) ~> C1
    - Sort and remove duplicate entities (buildings span several tiles): O(Mlog2M) + O(M)
    [*]Iterate over results: O(C2*M) where C2 is constant non-trivial time. It takes work to validate results.
    [*]Validated results are pushed to results R.
  [*]Find Added (entities that entered range) by calculating difference from Rold: O(2(Rold+Rnew)) -> A
  [*]Find Removed (entities that left range) by calculating difference from Rold: O(2(Rold+Rnew))
  [*]Sort Added by distance - O(C3*Alog2A), where C3 is a non-trivial constant. Each comparison is expensive due to our bad implementation of a fixed-point number library. Our CFixedVector2::CompareLength() function shows up very strong in performance profiler.

Since I'm not the author of the original code, it took me about a week to analyze all of this. The code is definitely non-trivial, for example we need to sort the items by distance before the end result of each iteration can be applied, because a lot of code relies on this distance-sorted list of new entities.

If we put all of the above together and do some very general approximations:

O(N) * [ O(C1*Mlog2M) + O(C2*M) + O(8R) + O(C3*Alog2A)]

Now, O(8R) can be considered rather trivial, the same actually applies to O(C3*Alog2A) unless we have a huge amount of Entities that enter range? That rarely happens. So if we remove those:

O(N) * [ O(C1*Mlog2M) + O(C2*M) ]

C1 and C2 are both non-trivial constants, so we have to account for them. C1 is the number of Subdivision tiles explored, which is usually 4 tiles in the actual game. However, if we assume that copying entity id's takes trivial amount of time, we could forget about C1. Though it's nice to know that traversing the SpatialSubdivision also takes some time.

C2 is the precision test we do to validate which Entities are in range. It's definitely non-trivial and should be accounted for.

Now if we take all of it together, the most important arrays are N and M. Queries and the Results:

O(N*Mlog2M) + O(N*M)

We know that M is always smaller than N, since M is a subset of N. However, there are N subsets. The only place we can put this algorithm is somewhere between O(n^2) and O(nlog2n).

So what's killing the performance here? The sorting we do in SpatialSubdivision::GetInRange® and the validation of those results. This is all under PerformQuery. We should be returning an already distance sorted list of entities + turn validation of those entities into something trivial.

Here's a comparison chart from bigocheatsheet.com:

jcantero · September 15, 2013

Impressive work Red Fox.

Let me point to the source code using Github mirror (GH let do that) to refer lines of code: https://github.com/0ad/0ad/blob/3a5c4008e95a7919bde5a46fdfb61a629d25658a/source/simulation2/helpers/Spatial.h#L201-L202

I've been researching if std::sort algorithm could impact the performance. According to this Wikipedia entry http://en.wikipedia.org/wiki/Sort_%28C%2B%2B%29, the worst case is O(n log n) (it's not QuickSort, but a hybrid algorithm to avoid QS worst-case scenario). This doesn't mean it can't be improved, but then I was thinking why it should be sorted in first place. OK, it's required in order to use later std:unique(). But why the need of remove duplicate entities, when an entity should not be in two cells (in two std:vectors) at the same time. Am I right? If it's true, then you could safely remove these 2 lines. Except that these results are expected sorted by the calling methods, of course.

Another solution is to use sets instead of vectors to avoid duplicates, but that only has sense if the std:set is more efficient (I don't know), and again if they don't need to be sorted later.

**RedFox** · September 15, 2013

Hi again jcantero.

All of your statements regarding what /should/ be done is correct, however the current implementation pretty much forces these silly constraints. Basically they're bugs and issues with the approach/algorithm itself. And to fix the issues, the original programmer just threw CPU cycles at the issue to solve it. Hence we have std::sort + std::unique.

Why do we need that? Because large buildings can exist in several tiles at the same time.... I know, sounds very silly, right? Well that's the only reason we need std::sort and std::unique.

I explore this issue in detail in my latest development report here: http://www.wildfiregames.com/forum/index.php?showtopic=17580entry274324

I would like to point out that, I really like the Sweep & Prune approach. I have a technique in mind that should solve the current issue and would automatically return distance sorted entities from the Sweep & Prune structure. Thanks for pointing out that method.

jcantero · September 15, 2013

First, thanks for your patience and time. And second, thanks also for your detailed reports, they really encourages to participate in the development of 0 A.D. I am really looking forward to see future improvements of performance in the simulation :-)

**wraitii** · February 24, 2014

Found a pretty comprehensive and well-written article about C++ good usage: http://www.agner.org/optimize/optimizing_cpp.pdf

It's slightly old, but most of the things I've read still apply, particularly about memory.

I've added it to the Wiki.

**Josh** · February 24, 2014

Found a pretty comprehensive and well-written article about C++ good usage: http://www.agner.org/optimize/optimizing_cpp.pdf
It's slightly old, but most of the things I've read still apply, particularly about memory.
I've added it to the Wiki.

It's really good! It is not actually that old, it refers to "today" as 2013.

**wraitii** · February 24, 2014

Some parts are older than others, I've found (several references to Pentium 4 processors, but it also mentions Sandy bridge in the next paragraph…)

Anyhow I'm about 70% through reading it and I'm finding it really amazing so far, particularly about optimizations that the compiler does on its own, which really highlights how clever modern compilers are.

edit: there also are some assembly guidelines later on which I haven't yet looked at but might be interesting for performance critical code.

agentx · March 8, 2014

I have two proposals regarding A* in terrain-analysis-pathfinder.js: Inside the main while loop, values should not get fetched via 'this.' because there is an existence test behind and the ternary (?

performs better than 'if/then' if branching is not intended. Estimated improvement >50%.

I haven't seen all maps, maybe a bi-directional A* would help too.

Radagast. · March 14, 2014

AgentX, you are becoming our JavaScript mastermind.

Still I would love to see the 1x1 pathfinder come that Philip worked on.

agentx · April 23, 2014

I'm testing asm.js with F30, here is a module it compiles, featuring 0 A.D. cos() implementation:

HANNIBAL = (function(H){  H.ASM = (function(stdlib, foreign, heap) {    "use asm";    var PI   = 3.141592653589793,        arr  = new stdlib.Int8Array(heap),        sqrt = stdlib.Math.sqrt,        abs  = stdlib.Math.abs;    function cos (a){      a = +a;            var b = 0.0, c = 0.0;      a = (a + PI) % (2.0*PI);      a = +abs((2.0*PI + a) % (2.0*PI) - PI);      b = (a-PI/2.0) + +abs(a-PI/2.0);      b = b / (b + +(1e-30)); // normalize b to one while avoiding divide by zero errors.      a = b * PI - a;       c = 1.0 - 2.0 * b; // sign of the output      return +(c * (1.0 - a*a*(0.5000000025619951 - a*a*(1.0/24.0 - a*a*(1.0/720.0 - a*a*(1.0/40320.0 - a*a*(1.0/3628800.0 - a*a/479001600.0)))))));    }    function sin (a){      a = +a;      return +(cos(a - PI/2.0));    }    function distance (a0, a1, b0, b1){      a0 = +a0;      a1 = +a1;      b0 = +b0;      b1 = +b1;            var dx = 0.0, dy = 0.0;                dx = +a0 - +b0;      dy = +a1 - +b1;      return +sqrt(dx * dx + dy * dy);    }    return {      cos: cos,      sin: sin,      distance: distance    };  }(window));return H; }(HANNIBAL));

From there one might call H.ASM.cos(1.234). I'll report back, when I got a speed test running in 0AD and get map data into this module.

scroogie · July 24, 2014

I did a new profile on r15552 using a replay of a 1 human, 3 petra bot match, 20713 turns.

Time spent in modules:

55.9% pyrogenesis

32.3% libmozjs (javascript)

6.3% libc

4.1% libstdc++

0.7% libnspr4

The previously found (minor) hotspots are gone, the PathFinder appears more prominently now, but there were also a lot more units in the game. Looking at the subtree analysis, CCmpPathfinder::ProcessShortRequests() takes 46.4% and CCmpPathfinder::ProcessLongRequests() takes 15.2% (of course relative to the 55.9% spent in pyrogenesis). The only other subtree that appears prominently is BroadcastMessages

Looking closer:

- in BroadcastMessages the cost is mainly caused by the RangeManager (GetNear and UpdateVisibilityData) and CCmpPosition::MoveAndTurnTo (itself due to posting messages)
- in ComputeShortPath, the most costly subtrees are the EdgeSort and the CheckVisibility functions (there apparently is a huge amount of edges to iterate over)

- in the long range pathfinder, CalculateHeuristic and DistanceToGoal call isqrt64 awfully lot

So actually, there is nothing new, except that the situation has improved imho. Apparently there are no low hanging fruit anymore. :/

I'll keep the results, in case someone is interested.

agentx · July 25, 2014

Would you mind to run a test comparing maps with different amount of trees and both little water? 1 human, 3 petra bot is good

scroogie · July 25, 2014

Which maps would you like to compare? Do you have specific ideads? To have "realistic" human behaviour in there, I played the game myself and used the sim_log for the profiling. You can also attach commands.txt and I run it through the profiler if you want.

thamlett · July 25, 2014

hmm...

agentx · July 26, 2014

I have the suspicion pathfinder lags are closely related to the amount of trees on a map. Here are my ten largest commands.txt https://dl.dropboxusercontent.com/u/354885/0ad/comms.txt.ax.zip I can extract the amount/density of trees from the maps, should show some correlation.

scroogie · July 26, 2014

Well, thinking about it, as it creates collision edges for all obstructions, I'm quiet certain its related to the amount of all obstructions on the map. Maybe Ykkrosh can explain how Trees affect it specifically.

agentx · July 26, 2014

I think trees are a special case of obstruction. They are smaller than a tile and probably only an issue if they come as a forest. And then it depends on their distance/density. Effectively they can turn a map into a complex labyrinth breaking any pathfinder. Some maps have thousands of trees and since pathfinder have to look at both sides of any obstruction this leads to a combinatorial explosion very quickly. Probably there should be tool and/or best practice guide for map maker. It is easy to create maps which are a challenge to any pathfinder. In that way the map is slow and not the pathfinder.

**Stan`** · July 26, 2014

IIRC In Age of mythology you couldn't go through too complex forests. Maybe defining trees as a group for pathfinder would be better ?

**wraitii** · July 26, 2014

AoM had very dense forests compared to us.

agentx · July 26, 2014

> Maybe defining trees as a group for pathfinder would be better ?

It would, but the amount of freedom for map maker in 0AD should remain, imho. I'd prefer the editor indicates problematic regions, in red, like the other helpful obstructions/passability overlays. So map makers can choose to design for efficient multiplayer maps or visual pleasing single player maps and players do not wonder why the 12,000 trees giant map isn't suitable for an 8 player combat with a Pentium class machine.

**wraitii** · July 26, 2014

Trees aren't really an issue with the updated long-range pathfinder.

Not that supporting Pentium is realistic.

**Stan`** · July 26, 2014

> Maybe defining trees as a group for pathfinder would be better ?
It would, but the amount of freedom for map maker in 0AD should remain, imho. I'd prefer the editor indicates problematic regions, in red, like the other helpful obstructions/passability overlays. So map makers can choose to design for efficient multiplayer maps or visual pleasing single player maps and players do not wonder why the 12,000 trees giant map isn't suitable for an 8 player combat with a Pentium class machine.

I meant not on the graphical side but on the code side. Detecting small obstacles and making a group, like if it was a big box.

agentx · July 26, 2014

> Detecting small obstacles and making a group, like if it was a big box.

Got it. Problem is trees are dynamic, too many boxes to update. Otherwise yes.

@wraitii: the long range pathfinder dealt with trees? What do you mean by not an issue? Facts would help to understand.

[DISCUSS] Performance Improvements

Recommended Posts

Link to comment

Share on other sites

Top Posters In This Topic

Popular Days

Top Posters In This Topic

Popular Days

Popular Posts

RedFox

agentx

Yves

Posted Images

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation