[DISCUSS] Performance Improvements



Excuse me for the question: I've read the discussion about Octrees/Quadtrees, and I'd like to ask: are you already using a Sweep & Prune* algorithm (or have you considered using it) for the broad phase of collision/pathfinding/rangefinding? The Octree/Quadtree/k-d tree approach is expensive, and only worthwhile if you plan to filter a very large number of objects (for example, to reduce hundreds of thousands of polygons to thousands in a 3D rendering engine).

Sweep & Prune only requires two sorted arrays (or vectors) of entities, one for each coordinate (x and y), which are kept sorted as the positions of the entities change. This can be very efficient with the appropriate sorting algorithm, especially if you treat each entity as a simple point on the map (you can also treat it as an AABB).

* http://en.wikipedia.org/wiki/Sweep_and_prune
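To make the idea concrete, here is a hypothetical standalone JavaScript sketch of a sweep-and-prune broad phase (entity fields `id`/`x`/`y` and the range parameter `r` are my assumptions, not engine code):

```javascript
// Minimal sweep-and-prune broad phase over point entities within range r.
function sweepAndPrune(entities, r) {
  // Sort by x; with near-sorted data from the previous frame, an
  // insertion sort would keep this step cheap.
  const byX = entities.slice().sort((a, b) => a.x - b.x);
  const pairs = [];
  for (let i = 0; i < byX.length; i++) {
    // Only look ahead while the x-distance can still be within range
    // (this is the "prune" part of the sweep).
    for (let j = i + 1; j < byX.length && byX[j].x - byX[i].x <= r; j++) {
      if (Math.abs(byX[j].y - byX[i].y) <= r)
        pairs.push([byX[i].id, byX[j].id]); // candidate pair for the narrow phase
    }
  }
  return pairs;
}
```

The surviving pairs would still need the exact (e.g. circular) range test in a narrow phase.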


Hi jcantero. For the sort of problem we're having, it's really hard to find an ideal solution. After a week of wrestling with this problem, I have at least ascertained that the current system is definitely the wrong way to do it.

Let me explain the problem of RangeManager alone. We also have ObstructionManager that relies on the same Spatial subsystem, but it's not the focus of patch 1707.

RangeManager:

1) Entities have AABBs; we interpret these as approximate world-aligned bounding boxes in the SpatialSubdivision class.

2) Entities have RangeQueries - meaning they want to know about other entities in their LOS or Range.

3) RangeManager provides each RangeQuery Entity with a list of Added and Removed entities.

I think this is simplest way I can put it. RangeManager doesn't deal with collisions (that's up to ObstructionManager). It deals only with range based queries.

Herein lies the problem though - all of this when put into code:

  • Iterate over all RangeQueries: O(N).
    • Perform Query:
      • SpatialSubdivision::GetInRange(r) -> M, gets entities in range: O(r/64 + 1) ~> C1
      • Sort and remove duplicate entities (buildings span several tiles): O(M log2 M) + O(M)
      • Iterate over results: O(C2*M), where C2 is a non-trivial constant; it takes work to validate each result.
      • Validated results are pushed to the results R.
      • Find Added (entities that entered range) by calculating the difference from Rold: O(2(Rold+Rnew)) -> A
      • Find Removed (entities that left range) by calculating the difference from Rold: O(2(Rold+Rnew))
      • Sort Added by distance: O(C3*A log2 A), where C3 is a non-trivial constant. Each comparison is expensive due to our bad implementation of a fixed-point number library; our CFixedVector2::CompareLength() function shows up very prominently in the performance profiler.
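The Added/Removed diff can be done in a single merge-style pass over the two sorted result lists, which matches the O(Rold+Rnew) cost above. A hypothetical standalone JavaScript sketch (the engine's actual code is C++; the names are mine):

```javascript
// Given the previous and current query results as sorted arrays of entity
// ids, compute which entities entered and left range in one linear pass.
function diffResults(oldR, newR) {
  const added = [], removed = [];
  let i = 0, j = 0;
  while (i < oldR.length || j < newR.length) {
    if (j >= newR.length || (i < oldR.length && oldR[i] < newR[j]))
      removed.push(oldR[i++]);   // only in the old result: left range
    else if (i >= oldR.length || newR[j] < oldR[i])
      added.push(newR[j++]);     // only in the new result: entered range
    else { i++; j++; }           // in both: still in range
  }
  return { added, removed };
}
```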

Since I'm not the author of the original code, it took me about a week to analyze all of this. The code is definitely non-trivial, for example we need to sort the items by distance before the end result of each iteration can be applied, because a lot of code relies on this distance-sorted list of new entities.

If we put all of the above together and do some very general approximations:

O(N) * [ O(C1*M log2 M) + O(C2*M) + O(8R) + O(C3*A log2 A) ]

Now, O(8R) can be considered rather trivial, and the same actually applies to O(C3*A log2 A) unless a huge number of entities enter range at once, which rarely happens. So if we remove those:

O(N) * [ O(C1*M log2 M) + O(C2*M) ]

C1 and C2 are both non-trivial constants, so we have to account for them. C1 is the number of subdivision tiles explored, which is usually 4 tiles in the actual game. However, if we assume that copying entity ids takes a trivial amount of time, we can forget about C1. Still, it's good to know that traversing the SpatialSubdivision also takes some time.

C2 is the precision test we do to validate which Entities are in range. It's definitely non-trivial and should be accounted for.

Now if we take all of it together, the most important quantities are N and M, the queries and their results:

O(N*M log2 M) + O(N*M)

We know that M is always smaller than N, since each result set M is a subset of the N entities. However, there are N such subsets, which places this algorithm somewhere between O(n log2 n) and O(n^2).

So what's killing the performance here? The sorting we do in SpatialSubdivision::GetInRange(r) and the validation of those results. This is all under PerformQuery. We should be returning an already distance-sorted list of entities and turning the validation of those entities into something trivial.

Here's a comparison chart from bigocheatsheet.com:

[image: big-o-complexity.png]


Impressive work Red Fox.

Let me point to the source code using the GitHub mirror (GitHub lets you link to specific lines): https://github.com/0ad/0ad/blob/3a5c4008e95a7919bde5a46fdfb61a629d25658a/source/simulation2/helpers/Spatial.h#L201-L202

I've been researching whether the std::sort algorithm could impact performance. According to this Wikipedia entry http://en.wikipedia.org/wiki/Sort_%28C%2B%2B%29, the worst case is O(n log n) (it's not plain quicksort, but a hybrid algorithm that avoids quicksort's worst-case scenario). This doesn't mean it can't be improved, but then I started wondering why the results should be sorted in the first place. OK, it's required in order to use std::unique() afterwards. But why the need to remove duplicate entities at all, when an entity should not be in two cells (in two std::vectors) at the same time? Am I right? If so, then you could safely remove these 2 lines. Unless, of course, the calling methods expect these results to be sorted.

Another solution is to use sets instead of vectors to avoid duplicates, but that only makes sense if std::set is more efficient here (I don't know), and again only if the results don't need to be sorted later.
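For illustration, if duplicates have to go but no ordering is needed afterwards, a membership set removes them in roughly O(M) rather than sort+unique's O(M log M). A hypothetical standalone JavaScript sketch (the engine itself is C++):

```javascript
// Remove duplicate entity ids in a single pass, preserving first-seen
// order, using a hash set for membership tests.
function uniqueIds(ids) {
  const seen = new Set();
  const out = [];
  for (const id of ids)
    if (!seen.has(id)) { seen.add(id); out.push(id); }
  return out;
}
```

The trade-off is the hashing and allocation cost per query versus the comparison cost of sorting; which wins depends on M and on how expensive the comparisons are.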


Hi again jcantero.

All of your statements regarding what /should/ be done are correct; however, the current implementation pretty much forces these silly constraints. Basically they're bugs and issues with the approach/algorithm itself, and to fix them, the original programmer just threw CPU cycles at the problem. Hence we have std::sort + std::unique.

Why do we need that? Because large buildings can exist in several tiles at the same time... I know, it sounds very silly, right? Well, that's the only reason we need std::sort and std::unique.

I explore this issue in detail in my latest development report here: http://www.wildfiregames.com/forum/index.php?showtopic=17580entry274324

I would like to point out that I really like the Sweep & Prune approach. I have a technique in mind that should solve the current issue and would automatically return distance-sorted entities from the Sweep & Prune structure. Thanks for pointing out that method.


  • 5 months later...

Some parts are older than others, I've found (there are several references to Pentium 4 processors, but it also mentions Sandy Bridge in the next paragraph…)

Anyhow, I'm about 70% through reading it and I'm finding it really amazing so far, particularly the parts about optimizations the compiler does on its own, which really highlight how clever modern compilers are.

edit: there are also some assembly guidelines later on which I haven't looked at yet, but they might be interesting for performance-critical code.


  • 2 weeks later...
I have two proposals regarding A* in terrain-analysis-pathfinder.js: inside the main while loop, values should not be fetched via 'this.' on every access, because each lookup incurs an existence test; and the ternary operator (?:) performs better than 'if/then' when no real branching is intended. Estimated improvement: >50%.
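To illustrate both proposals, here is a hypothetical JavaScript sketch (the function and variable names are mine, not from terrain-analysis-pathfinder.js): the property lookup is cached in a local before the loop, and a ternary selects a value instead of branching:

```javascript
// Pick the open node with the lowest g-score. 'gScore' is passed in (or
// cached from 'this.gScore' once, outside the hot loop) instead of being
// looked up through 'this.' on every iteration.
function findCheapest(open, gScore) {
  const score = gScore;            // one lookup, reused in the loop
  let best = open[0];
  for (let i = 1; i < open.length; i++) {
    const node = open[i];
    best = (score[node] < score[best]) ? node : best; // ternary, no if/else
  }
  return best;
}
```

Whether the ternary actually beats if/else depends on the JS engine; the lookup caching is the more reliably beneficial half of the proposal.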


I haven't seen all maps; maybe a bi-directional A* would help too.



  • 1 month later...
I'm testing asm.js with F30, here is a module it compiles, featuring 0 A.D. cos() implementation:
HANNIBAL = (function(H){
  H.ASM = (function(stdlib, foreign, heap) {
    "use asm";
    var PI   = 3.141592653589793,
        arr  = new stdlib.Int8Array(heap),
        sqrt = stdlib.Math.sqrt,
        abs  = stdlib.Math.abs;
    function cos (a){
      a = +a;
      var b = 0.0, c = 0.0;
      a = (a + PI) % (2.0*PI);
      a = +abs((2.0*PI + a) % (2.0*PI) - PI);
      b = (a-PI/2.0) + +abs(a-PI/2.0);
      b = b / (b + +(1e-30)); // normalize b to one while avoiding divide by zero errors
      a = b * PI - a;
      c = 1.0 - 2.0 * b; // sign of the output
      return +(c * (1.0 - a*a*(0.5000000025619951 - a*a*(1.0/24.0 - a*a*(1.0/720.0 - a*a*(1.0/40320.0 - a*a*(1.0/3628800.0 - a*a/479001600.0)))))));
    }
    function sin (a){
      a = +a;
      return +(cos(a - PI/2.0));
    }
    function distance (a0, a1, b0, b1){
      a0 = +a0;
      a1 = +a1;
      b0 = +b0;
      b1 = +b1;
      var dx = 0.0, dy = 0.0;
      dx = +a0 - +b0;
      dy = +a1 - +b1;
      return +sqrt(dx * dx + dy * dy);
    }
    return {
      cos: cos,
      sin: sin,
      distance: distance
    };
  }(window));
  return H;
}(HANNIBAL));

From there one can call H.ASM.cos(1.234). I'll report back when I've got a speed test running in 0 A.D. and can get map data into this module.


  • 3 months later...

I did a new profile on r15552 using a replay of a match with 1 human and 3 Petra bots, 20713 turns.

Time spent in modules:

55.9% pyrogenesis

32.3% libmozjs (javascript)

6.3% libc

4.1% libstdc++

0.7% libnspr4

The previously found (minor) hotspots are gone, and the pathfinder now appears more prominently, but there were also a lot more units in the game. Looking at the subtree analysis, CCmpPathfinder::ProcessShortRequests() takes 46.4% and CCmpPathfinder::ProcessLongRequests() takes 15.2% (relative, of course, to the 55.9% spent in pyrogenesis). The only other subtree that appears prominently is BroadcastMessages.

Looking closer:

- in BroadcastMessages, the cost is mainly caused by the RangeManager (GetNear and UpdateVisibilityData) and by CCmpPosition::MoveAndTurnTo (itself due to posting messages)

- in ComputeShortPath, the most costly subtrees are the EdgeSort and CheckVisibility functions (there is apparently a huge number of edges to iterate over)

- in the long-range pathfinder, CalculateHeuristic and DistanceToGoal call isqrt64 an awful lot
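One common way to cut such square-root calls, assuming a distance is only being compared rather than displayed, is to compare squared distances instead. A hypothetical JavaScript sketch (not the engine's fixed-point C++ code):

```javascript
// Decide which of two nodes is closer to the goal without taking any
// square root: squared distances preserve the ordering of distances.
function closerToGoal(a, b, goal) {
  const da = (a.x - goal.x) ** 2 + (a.y - goal.y) ** 2;
  const db = (b.x - goal.x) ** 2 + (b.y - goal.y) ** 2;
  return da < db ? a : b;
}
```

For an A* heuristic this trick is not directly usable (the heuristic must be in the same units as the path cost), but it works wherever only the ordering of distances matters.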

So actually there is nothing new, except that the situation has improved, imho. Apparently there are no low-hanging fruit anymore. :/

I'll keep the results, in case someone is interested.


Which maps would you like to compare? Do you have specific ideas? To get "realistic" human behaviour in there, I played the game myself and used the sim_log for the profiling. You can also attach a commands.txt and I'll run it through the profiler if you want.


I think trees are a special case of obstruction. They are smaller than a tile and probably only an issue when they come as a forest, and then it depends on their distance/density. Effectively they can turn a map into a complex labyrinth that breaks any pathfinder. Some maps have thousands of trees, and since pathfinders have to look at both sides of any obstruction, this leads to a combinatorial explosion very quickly. There should probably be a tool and/or best-practice guide for map makers. It is easy to create maps which are a challenge to any pathfinder; in that sense the map is slow, not the pathfinder.


> Maybe defining trees as a group for pathfinder would be better ?

It would, but the amount of freedom map makers have in 0 A.D. should remain, imho. I'd prefer the editor to indicate problematic regions, in red, like the other helpful obstruction/passability overlays. Then map makers can choose between designing efficient multiplayer maps or visually pleasing single-player maps, and players won't wonder why the 12,000-tree giant map isn't suitable for an 8-player battle on a Pentium-class machine.


> It would, but the amount of freedom for map maker in 0AD should remain, imho. I'd prefer the editor indicates problematic regions, in red [...]

I meant not on the graphical side but on the code side: detecting small obstacles and grouping them, as if they were one big box.
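The grouping idea could be sketched as merging the AABBs of a cluster of small obstacles into one enclosing box for the pathfinder. A hypothetical JavaScript illustration (the field names x0/y0/x1/y1 are my assumptions):

```javascript
// Merge a cluster of axis-aligned boxes into the smallest single box
// that encloses them all.
function mergeBoxes(boxes) {
  return boxes.reduce((acc, b) => ({
    x0: Math.min(acc.x0, b.x0), y0: Math.min(acc.y0, b.y0),
    x1: Math.max(acc.x1, b.x1), y1: Math.max(acc.y1, b.y1)
  }));
}
```

How the clusters are chosen in the first place (and kept up to date as trees are chopped) is the hard part, as noted below in the thread.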


> Detecting small obstacles and making a group, like if it was a big box.

Got it. The problem is that trees are dynamic: too many boxes to update. Otherwise, yes.

@wraitii: the long-range pathfinder dealt with trees? What do you mean by "not an issue"? Facts would help to understand.

