Paid Development: September 2013 Activity Status

**RedFox** · September 8, 2013

I think I might have overworked myself last week. Pulling 88 hour weeks isn't really healthy and has left me wanting for some free time. So the fifth week shines in its modest 32 hours.

Week #5 02.09 - 08.09

02.09 10-22 (12) - Performance profiling data, fancy graphs, screenshots, performance fixes.

03.09 17-20 (03) - Clipboard code on linux/osx fixed. Bugfixes.

04.09 18-20 (02) - Testing megapatch. ModelRenderer optimization.

05.09 23-06 (07) - ModelRenderer glSwapBuffers fixed. GUI occlusion culling.

06.09 13-17 (04) - GUI culling finished and tested. Fixing reported bugs.

07.09 17-19 (02) - Fixed bugs reported by historicbruno.

1. Performance Profiling Data

You probably noticed the thread with fancy profiling graphs and comparisons. It actually took me quite a while to get all the data, analyze it, and fix performance issues as they came along.

All in all, this testing cycle actually fixed some performance issues in Megapatch, so I consider it a significant improvement - especially in the GUI.

You can check the thread HERE. I won't be copy pasting all that.

2. Clipboard and UTF8 Bugfixes.

Mostly UTF8 support for API's on linux and OSX that previously worked in UCS-2.

3. ModelRenderer optimization

A while back I was forced to revert all my efforts on ModelRenderer. Now I've brought all those optimizations back, so the rendering is noticeably faster. I was also able to fix glSwapBuffers slowdown. Apparently because a bug caused models to be incorrectly sorted/batched, the GPU had to spend a lot more effort into rendering the scene.

The slowdown in glSwapBuffers was basically the CPU waiting on the GPU to finish rendering into the backbuffer.

Since we're already doing a combination of alpha blend and alpha testing for all our transparent models, we don't actually have to do any distance sorting of the models. This actually sped up the renderer a lot!

Before: Transparent models were sorted by distance. This made the code very complex.

After: No more sorting is done, everything is handled by the GPU.

4. GUI culling

This is an important fix for the lobby - in order to have acceptable framerate we have to do some additional culling of objects on-screen objects. This means we skip rendering and ticking GUI pages that are active but hidden from view.

This gives a really good performance increase for the lobby where we have thousands of lines of text to render - the previous rendering algorithm was very, very inefficient in such cases. The performance would gradually drop over time until everything became unplayable.

Before: 200 FPS that eventually degraded into 20 FPS or less in Match Setup screen.

After: Good and consistent 500 FPS in Match Setup screen.

5. Bugfixes

Historicbruno did some review on megapatch and found quite a lot of critical issues. I spent a few hours fixing those to ensure the game doesn't crash when switching to Atlas from in-game menu.

--------------

This is it for week #5 - the next one will be a lot bigger.

**fabio** · September 8, 2013

Thanks! What's the status of the patch, how far is to be in a committable status? It looks that every new feature added have some issues, making the merging more difficult. And this is just with a very limited amount of testers, exposing such a big patch in a release will probably reveal a whole lot of problems as it get used by users with many different systems and playing behaviors. Would be possible to stop adding features in the same patch and just concentrate on bug fixing? Even better some stable features could have a separate patch and eventually committed separately.

**wraitii** · September 8, 2013

In the interest of releasing A15 quickly, it would probably be good to commit this as soon as possible so it can be debugged. As far as I know, we've reached a fairly acceptable level of "it works" on most compilers/systems in the early testers.

Though I think it still needs some UTF8 support and stuffs?

**historic_bruno** · September 8, 2013

In the interest of releasing A15 quickly, it would probably be good to commit this as soon as possible so it can be debugged. As far as I know, we've reached a fairly acceptable level of "it works" on most compilers/systems in the early testers.
Though I think it still needs some UTF8 support and stuffs?

Until the patch is updated, I can't say what is fixed, but there were some serious release blockers found only a few days ago.

Philip has updated his gerrit server (http://86.137.32.221:8080/1) with the latest megapatch, I would suggest giving Philip some time to review it (because he is most likely to find errors that aren't trivially reproducible from e.g. compiling and loading a game for a few minutes), and that we review it as much as possible as a team before it's committed - so far I think only Philip and I have commented on it. It's actually a nice system for code review and much easier than dealing with such a huge patch on Trac or reviewing after it's committed.

**RedFox** · September 15, 2013

I started this week by debugging Megapatch and its UTF8 changes. However, since I wanted to really improve performance for A15, I decided to pick a bottleneck in the current code to give a noticeable algorithmic boost that would make the game, well - playable. It looked like RangeManager was an easy enough place to start...

Oh how wrong I was. The amount of research, debugging and despair in this week is hard to put into words. I was all up there with significant improvements and I also visited the deepest pits of despair when it all failed to deliver. It was a rollercoaster of fail that I will definitely remember as a lesson well learned - the hard way.

Enough with poetics, lets get down to business: I spent a lot of effort in performance profiling and research of spatial subdivision methods this week. The bad news: I failed to produce any noticeable results. The good news: I have ideas how to make it work. I spent 63 hours in total, so I definitely think it's not completely wasted.

Week #6 09.09 - 15.09

09.09 14-22 (08) - Bugfixes on megapatch UTF8 changes. Console debugging finished!

10.09 08-23 (15) - HPA* research. RangeManager and SpatialSubdivision optimization.

11.09 08-19 (11) - RangeManager optimization and testing

12.09 00-07 (07) - RangeManager testing.

13.09 20-06 (10) - RangeManager debugging. Profiling RangeManager bottlenecks.

14.09 20-03 (07) - Disappointment. Research, Sweep and Prune?, Analysis of RangeManager algorithm.

15.09 16-21 (05) - Weekly report. Megapatch.

1. Bugfixes and progress on Megapatch.

What's the status: UTF8 transition is almost complete. Only debugging remains.

I started the week by fixing critical issues reported by historicbruno. Finished a rewrite of the console code so that it works with UTF8 proper. Next step will be to make GUI CInput support UTF8, which will take a day for sure.

2. RangeManager and SpatialSubdivision optimization.

What's RangeManager?: This is the module that manages range queries. This is important for UnitAI so that archers and infantry can engage enemies that enter their line of sight.

What's SpatialSubdivision?: This is our old naive attempt at reducing the number of range comparisons. It's not very efficient.

What's the issue?: RangeManager has a huge bottleneck in ExecuteActiveQueries function. The time is mostly spent extracting sorted entity lists from SpatialSubdivision and then comparing the lengths.

So this issue, even though it sounds sort of trivial, is not trivial at all. There are a hundred ways to do this and most of those really bad. Unfortunately, we have one of those bad ones. In one of the many optimization threads I posted a through analysis of the whole algorithm. You can view the relevant post HERE.

We can also see that the Big-Oh notation is pretty useless to describe our folly actually, since operations that should be marked as O(1) are definitely non-trivial and have a constant multiplier to the amount of time spent.

With the rough estimate however, I concluded that our algorithm is somewhere between O(n^2) and O(nlog2n). Both of those are pretty bad cases, because our performance with SpatialSubdivsion should be less than O(nlog2n) if we wish for acceptable performance.

So, to get the actual idea of what's happening behind the scene, I decided to look for:

1) How many QUERIES per turn do we have?

2) How many MATCHES are retrived from SpatialSubdivision for filtering?

3) How many RESULTS we actually get after filtering?

Acropolis 1 IDLE


EAQ Queries: 209
EQ  Matches: 7947
EAQ Filtered Results: 1

We can see that we spend most of the time doing filtering on Matches. For every QUERY we get about 40 MATCHES for filtering. So in Acropolis, the algorithm looks more like O(40n*c1), where C1 is the constant time it takes to compare each result. We can't reduce N, however we can reduce possible MATCHES retrieved from SpatialSubdivision and we can also speed up C1 to make each comparison cheaper.

Combat Demo (Huge) IDLE



EAQ Queries: 2528
EQ  Matches: 685496 <<< !!!
EAQ Filtered Results: 0

Oh boy. It's scary how powerful CPU's are nowadays that we've never noticed the whopping 0.7 million distance comparisons done in Huge combat demo. It's obvious we're returning far too many "positive" results from SpatialSubdivision. We really need to reduce that number any way we can. If the algorithm was O(nlog2n), we would have ~ 28575 matches. If the algorithm was O(n^2), we would have ~ 6.3m. In reality there is no way to measure this with Big-Oh notation.

Combat Demo (Huge) COMBAT



EAQ Queries: 1676
EQ  Matches: 415393
EAQ Filtered Results: 11421 <<< !!!

Once the combat has been going for a while, the actual number of possible MATCHES is reduced - however, the filtered results are the second bottleneck in the algorithm and this part is much more expensive than filtering all the matches.

So, what have we learned? We have to:

1) Reduce the number of possible MATCHES.

2) Speed up distance filtering [CFixedVector2::CompareLength()].

3) Speed up processing of RESULTS.

-------

I've learned a lot from this and it's much easier to approach the problem now that I have actual relevant data instead of bogus Big-Oh graphs that don't really help at all. Oh how low CS has fallen.

3. RangeManager patch debugging. Disappointment.

What's the issue?: You were on the right path, right?: Well, sort of yes and no on this one. I was focusing on reducing allocations in general, however there were flaws in the optimizations and after correcting them, the results are less impressive.

So the first graph, (thanks to historicbruno), showed that we're looking at some pretty good performance improvements in general. It looked as if it would make RangeManager ~5x faster:

-----

However, this ended up being false. The patch caused an issue in ObstructionManager, which actually reduced performance of the other module! After fixing the bugs, the actual gain is not that impressive. Before idle: 17ms, After idle: 12ms. So barely any improvement at all!

Here's the graph by historicbruno, you can notice that even though the patch speeds up one module (red), the overall performance goes worse (green):

I was really devastated at the failure - we spent the whole week furiously debugging this with historicbruno and we ended up emptyhanded. I guess this is a lesson to show that optimization is not something trivial. It takes a huge amount of effort to optimize bad algorithms without breaking the existing logic.

My strategy was to avoid excessive sorting by returning MATCHES from SpatialSubdivision as unsorted blobs that were guaranteed to have unique handles. However, this immediately broke ObstructionManager, which relied on sorted and unique results. It also broke games with very large buildings like a Wonder - making units unable to attack it.

The performance gain was noticeable, but the subtle issues around the engine ensured that I couldn't have my way.

4. Analysis of RangeManager algorithm.

If you missed the link in the above wall of text, you can take a look at the attempt of algorithmic analysis of RangeManager EAQ. It really shows how bad Big-Oh is for actual performance profiling, since it doesn't really help us at all.

http://www.wildfireg...120#entry274300

5. Sweep and Prune.

In the Performance Optimizations thread, jcantero suggested using Sweep and Prune algorithm. It sounds actually really useful if we implement it properly. Right now I'm thinking each entry in the two arrays would have inner bounds for entity Bounding Box and outer bounds for entity range.

In order to avoid iterating constantly, I also plan on using a lookup entry which stores the last known approximate index of the object in the array. If the structure has changed, it would validate the index, which is very easy, since usually objects move very little.

Furthermore, this method would always return unique entity id's that are correctly distance sorted! Isn't this perfect? Even if this algorithm had O(nlog2n), it would be ridiculously fast because of its simplicity and all the extra work we're not doing!

Here's an article on Sweep & Prune: http://jitter-physic...sweep-and-prune

Expect to hear back about Sweep and Prune soon. And kudos to jcantero who suggested it.

----

This is it for now! I hope you liked this update, even though it's a testament to my week full of failures, we still learned a lot. Especially the fact that optimizing existing "finished" code is not easy and not trivial. In the future we should put 2 extra days of effort into our modules, to avoid weeks of optimization / debugging a year later.

**niektb** · September 16, 2013

Too bad the optimization didn't give any noticeable results, but still at least we have knowledge of the problems in the RangeManager. :bye:

Lion.Kanzen · September 16, 2013

In Svn is working after September 13 is hardly work. Congratulations Redfox

**Pureon** · September 16, 2013

Thanks for another detailed report Redfox. Some good news, and some bad, but you're still making progress!

**FeXoR** · September 16, 2013

Thanks for your hard work. I especially agree with you on:

[...] I hope you liked this update, even though it's a testament to my week full of failures, we still learned a lot. Especially the fact that optimizing existing "finished" code is not easy and not trivial. In the future we should put 2 extra days of effort into our modules, to avoid weeks of optimization / debugging a year later.

Keep it going (y) .

scroogie · September 16, 2013

I don't think that using a simple Sweep is a real answer to everything. It can be faster, but so can KD-Tree, RB-trees, QuadTrees, etc. pp. I liked the idea of kuranes, to first fix an abstract API of spatial partitioning, and then implement different techniques to see how they fare against each other in different situations.

jcantero · September 16, 2013

It can be faster, but so can KD-Tree, RB-trees, QuadTrees, etc.

A sidenote: the problem with those structures is that they don't help with the range objects computation, only with collision detection. If an entity has another 100 entities in line of sight, the 100 entities must be reported by the algorithm, whatever it is. In the case of tree-based partitions, you have the 100 entities separated in groups, but ultimately you have to add up these groups to build the 100 elements list to return it. There is no gain (the upper-level algorithm has to do the n*(n-1)/2 comparations yet) and now there is also the overhead of maintain the tree structure (balanced, or whichever the algorithm demands). So whatever solution would be, it can't be a spartial partition, that only serves to maintain an adjusted dinamic tiling to reduce collision checks below certain level.

scroogie · September 16, 2013

With that argument, you don't need any algorithm at all. It all depends on the situation. Sometimes its about the need to sort or not.

jcantero · September 16, 2013

Obviously, a broad phase is required, otherwise the algorithm should compare every entity in the map against each other. Now this broad phase is based on a fixed size grid (implemented with a std:vector). The question that remains is if another type of partition could accelerate the process, and I'm afraid that that doesn't work, not when the algorithm needs to resolve a "fixed" area, making the adaptative partition useless in practice. However, in the collision detection scenario it works (or it should work).

Sweep&Prune is a broad phase that takes a different aproach to the problem: it exploits the algorithmic advantages of sortered sequences (binary search in O(log n), etc). And the sorting criteria is precisely the spatial position of the objects. It doesn't divide the space in "areas" (squares, cubes, ...) so this is not an issue.

Edited September 16, 2013 by jcantero

scroogie · September 17, 2013

Yes, I know, but what does this change? A sweep is basically the most trivial algorithm one can come up with. Its just traversing all items in the way you keep them. The more interesting question is how to get them to this state, and thats where other algorithms get into the equation. Do you always sort all items, or do you keep them in groups, etc. Im afraid thats what all algorithmic geometry is about.

**RedFox** · September 17, 2013

A little preview for non-believers, I reckon. The idea is to drop the use of simple entity_id_t handles that we used in the past in the SpatialSubdivision class, where we had to do some searching for the entities own grid every time we did a lookup, which was kind of silly.

Also, a huge downside of using Spatial Subdivision grid, was with buildings - a building can take several tiles, which complicated the range query algorithm. We had to sort all the results and then call std::unique on the results to remove the duplicate building id's. Worst of all - the Spatial Subdivision grid class provided no actual distance data for us, which is bad.

With Sweep And Prune subdivision, the entities are sorted by their position in the world - perfect right? Sorting itself is very fast because entities never move too much and usually you just have to swap entities in the list to keep the axises sorted.

Furthermore, the biggest benefit is that SAP will only give us "perfect" results, so we won't have to do any additional filtering like we do now. That filtering is really expensive with our current Entity-Component System. Since we already have the position data and items are returned sorted closest-to-furthest, we won't have to do any additional work.

For standard collision schemes, SAP would perhaps not be the best choice, but for this concrete problem with Object size + Object range, SAP is pretty much perfect.

Here's the current prototype I'm working on:


/**
 * A simulated Entity object for this demo alone. The pyrogenesis Entity is completely different,
 * but its semantics remain the same. Entities have position, size and range.
 */
struct Entity 
{
    Vector2 pos;
    Vector2 size;
    float range;
    inline Entity(const Vector2& pos) : pos(pos), size(10.0f), range(10.0f)
    {
    }
};

/**
 * SAPDivision entity handle
 * Contains approximate X/Z axis index for fast random access and
 * the object's bounds data
 */
struct SAPEntity
{
    int approxX;        // approximate index in XAxis array (autocorrected)
    int approxZ;        // approximate index in ZAxis array (autocorrected)
    Entity* entity;     // pointer to the entity object with actual data

    // @note Objects are sorted by "xMin" and "zMin" in the Axis arrays
    float xMin, xMax;   // ------| object |------> axis : the object BOUNDS
    float zMin, zMax;    
};

/**
 * A range query structure for fast queries
 */
struct SAPRangeQuery
{
    enum { MAX_COUNT = 1024 };
    int count;
    SAPEntity* items[MAX_COUNT];

    inline SAPRangeQuery() : count(0) {}
    inline const SAPEntity* operator[](int index) const { return items[index]; }
    inline int size() const { return count; }
    inline void clear() { count = 0; }
};

/**
 * Sweep and Prune spatial subdivision - Great for rangefinding.
 */
class SAPDivision
{
    std::vector<SAPEntity*> m_XAxis;
    std::vector<SAPEntity*> m_YAxis;
    std::vector<SAPEntity*> m_FreeHandles; // reuse handles if possible

public:
    SAPDivision();

    /**
     * @brief Inserts a new entity into the subdivision map
     * @param entity Entity to insert
     */
    SAPEntity* Insert(Entity* entity);

    /**
     * @brief Removes an entity from the subdivision map.
     * @param handle SAPEntity handle to erase.
     */
    void Remove(SAPEntity* handle);

    /**
     * @brief Moves an entity by the given amount. Very cheap - simple shift in the Axis.
     * @note For small moves the entity is shifted in the SAP axises. Large moves perform Remove+Insert.
     * @param handle SAPEntity handle to move.
     * @param pos The new position where to move the entity.
     */
    void Move(SAPEntity* handle, Vector2 pos);

    /**
     * @brief Get entities in range of a BOUNDING BOX.
     * @param inrange [out] The resulting array of entities in range. Sorted from nearest to furthest.
     * @param min Bounding box minimum.
     * @param max Bounding box maximum.
     */
    void GetInRangeBB(SAPRangeQuery& inrange, Vector2 min, Vector2 max);

    /**
     * @brief Get entities in range of a RADII.
     * @param inrange [out] The resulting array of entities in range. Sorted from nearest to furthest.
     * @param pos Center of the range query.
     * @param range RADII of the range query.
     */
    void GetInRange(SAPRangeQuery& inrange, Vector2 pos, float range);
};

**RedFox** · September 24, 2013

So the previous week was actually really tough - I spent pretty much all the time working on implementing Sweep and Prune approach and testing all sorts of configurations. There were some obvious upsides to the algorithm at first, but unfortunately it has a worst case scenario that almost always plays out in our game engine. So my final through opinion is that we can't use it for the sort of queries we're doing.

So would I vote this week of 53 hours as wasted? Well, I could have used all this time for something else that didn't end up with no results.

Week #7 16.09 - 22.09

16.09 08-24 (16) - SAP subdivision implementation draft.

17.09 08-13 (05) - SAP subdivision, small Megapatch fixes.

18.09 11-23 (12) - SAP subdivision.

19.09 22-05 (07) - SAP testing and debugging.

20.09 16-21 (05) - SAP performance testing and results.

22.09 21-07 (10) - SAP implementation in Pyrogenesis and results. Parallel For.

1. Sweep and Prune implementation.

So, I spent pretty much the whole week working on creating a testbed for Sweep and Prune. The first results came in the middle of the week when I could measure its performance. It looked pretty good at first - slightly outperforming the current range implementation.

I think the best part of this implementation was its bounding-box precision that always gave near-perfect results. I think it was also nice how we could rely on the results being in sorted order and also got the squared distance in the result.

However, since our current simulation2 design requires random-access requests (get all entities in range of Entity123, etc..), it made moving along the axis a really expensive task.

The worst case scenario for SAP is when a large bunch of entities is stacked in X axis - and believe me, this happens a lot in 0AD. This causes a lot of iteration on the axis and discarding mismatched entries. Furthermore, it's the reason moving entities in the axis is really expensive.

In a sparse setup, when entities were not obviously aligned in X axis, the results were perfect 2000 entities in 0.97ms. However, when I arranged the entities closely in a large square formation, it became 19ms - which was unacceptable.

2. Sweep and Prune in Pyrogenesis

The last thing I needed to do was to implement it in our engine, which was a pretty trivial task. Initially, even with large formations SAP performed much better than our old subdivision - 3ms vs 12ms.

However, something I could not have anticipated was moving entities becoming a really expensive task. I kindly neglected the fact that when entities are stacked in X axis, the desired small movement of only 1 position in the axis, becomes an expensive case of shifting 100 entity handles instead.

So when entities started moving around - performance hit rock bottom. Clearly a failure. We can't implement an efficient random-access RangeManager with Sweep and Prune.

I will now kindly bin Sweep and Prune into the bucket of algorithms that are useful in many cases, yet useless in our case.

3. Parallel For

What is parallel for?: For many of you non-programmers, it's a very robust multi-threading approach of using several threads to iterate over some range of items. Usually you would do a fair bit of work in every loop iteration.

In our case, Parallel For is something we've been wanting for quite some time. There is barely any multithreading going on in 0AD, so modern 8-thread laptops (like mine), suffer from horrible single-core performance.

After implementing a very robust solution with posix pthreads, the speedup with 4 threads was ironically ~3-4x, which is the same amount of gain SAP would have given - however in this case it's consistent.

You can take a look at the patch in the RangeManager optimization ticket, I think Parallel For is a really great step for the codebase to really squeeze out multi-threading performance of modern systems.

http://trac.wildfire...com/ticket/1707

-------

This is it for now. I am now convinced that we do need a really, really complicated solution for RangeManager that only gives us perfect results. However, I won't be jumping into RangeManager spatial issues any longer, since A15 is nearing up and we need to finish up megapatch optimizations.

jcantero · September 24, 2013

However, something I could not have anticipated was moving entities becoming a really expensive task. I kindly neglected the fact that when entities are stacked in X axis, the desired small movement of only 1 position in the axis, becomes an expensive case of shifting 100 entity handles instead.

Is that code available somewhere yet?

**RedFox** · September 24, 2013

Is that code available somewhere yet?

I only have my personal prototype build (since I ditched the pyrogenesis port after it failed to give any desirable results). The code doesn't really explain much of the problems, but trying to implement random-access search with SAP was pretty much the reason it failed (since SAP is supposed to be sweeping, not random accessing).

However, there is no way around this with the current design of the simulation2 engine in 0AD - we need random access.

SAPDivision.h: http://pastebin.com/5Ld0QUA6

SAPDivision.cpp: http://pastebin.com/mL211s2k

**niektb** · September 25, 2013

Is SAP handled by the CPU / or the GPU?

Ik have read an article about handling SAP with the GPU with good results:

http://graphics.ewha.ac.kr/gSaP/

Edited September 25, 2013 by niektb

jcantero · September 25, 2013

SAPDivision.h: http://pastebin.com/5Ld0QUA6
SAPDivision.cpp: http://pastebin.com/mL211s2k

Thank you again, RedFox. I do not want to spend another minute of your valuable time with this, but I really want to know what's going on.

I owe you a few dozen programming hours. I'll see how I can return them.

scroogie · September 25, 2013

Hi RedFox,

this is what I meant with the cost of getting items in exactly the state you need for SP, but so be it. I'm quite sure your work was still valuable! For example your testbed implementations. Could that code be used by others that want to experiment with the RangeManager? E.g. did you isolate parts so that debugging or performance measurement is easier? I guess that would already help very much.

Cheers

**RedFox** · September 25, 2013

Hi RedFox,
this is what I meant with the cost of getting items in exactly the state you need for SP, but so be it. I'm quite sure your work was still valuable! For example your testbed implementations. Could that code be used by others that want to experiment with the RangeManager? E.g. did you isolate parts so that debugging or performance measurement is easier? I guess that would already help very much.
Cheers

The testbed used was my own private OpenGL engine and doesn't have anything special other than an interactive demo for debugging / testing.

I'm quite sure that my SAP implementation isn't completely optimized in the algorithmic sense. The most expensive parts are moving elements in the axis, which forces us to update the random-access indices. I thought I could make it work, but SAP isn't meant for random access. Even the GPU + SAP demo sweeps the entire set - which is very efficient, but totally different from what we need.

Now, I'm not saying our implementation is perfect - far from it actually. Because of existing code, we are forced into this random-access range detection. Furthermore, SAP is good for collision detection, but doesn't look like very good for range detection. I mean I somehow made it work for ranges - but it was a freak attempt IMHO.

Perhaps we could use regular grid subdivision and implement SAP in every subdivision grid? That would eliminate a lot of X-axis alignments for sure.

If we use subdivision, we have to ensure that entities only exist in one grid at a time, yet ensure that they will be correctly detected from neighboring grids. This simplified drawing illustrates the issue: Entity A, inside grid tile 1, queries for entities in range. Obviously entity B is in range, but it exists in grid tile 2. So in order to get accurate results, we must also include all entities in tile 2.

The current implementation in 0AD SVN would place entity B into both grid tile 1 and grid tile 2. This means we have to sort the results and call std::unique to remove duplicate entities.

I think we can improve the current implementation of Spatial Subdivision by improving on these issues. We should also use floating point numbers internally for faster Squared Distance calculation - meaning we could return accurate results and avoid any further calculations.

**feneur** · September 25, 2013

I don't know enough of programming to know if it would make any difference, but at least in theory I would assume part of the problem (regardless of implementation otherwise) could be remedied by actually implementing formations. I.e. only calculate distance between formations rather than for each individual unit, which should make quite a difference in the amount of calculations needed I would assume.

scroogie · September 25, 2013

Hi RedFox,

sorry I don't remember, does the RangeManager treat entities as points or as featuring some radius?

*edit*

I just had a look at the parallel for patch. Don't you think that is a quite painful way of doing multithreading in games? First, I'd assume it would be easier to use some existing implementation with threadpools etc., like even OpenMP. Secondly, I'd think the easiest (and most common afaik) way of doing it is completely putting one component in a seperate thread, like the AI. That way you don't need to make sure that every called method is threadsafe etc., which will be horrible to maintain over time, at least in my experience.

Edited September 25, 2013 by scroogie

**RedFox** · September 25, 2013

Hi RedFox,
sorry I don't remember, does the RangeManager treat entities as points or as featuring some radius?
*edit*
I just had a look at the parallel for patch. Don't you think that is a quite painful way of doing multithreading in games? First, I'd assume it would be easier to use some existing implementation with threadpools etc., like even OpenMP. Secondly, I'd think the easiest (and most common afaik) way of doing it is completely putting one component in a seperate thread, like the AI. That way you don't need to make sure that every called method is threadsafe etc., which will be horrible to maintain over time, at least in my experience.

Entities are 2D bounding boxes (since they can be buildings, units, flora, fauna...).

The PFor implementation isn't entirely horrible - though it has quite a few concurrency issues. We don't use OpenMP because we already implement pthreads and there is literally no useful cross-platform library with a PFor. The patch already implements thread pools if you didn't notice, so I wouldn't worry about the implementation efficiency.

Since we use C++03, the code isn't that nice. In C++11 we could get away with cool little lambdas:


ps::parallel_for(0, size, [&](int atIndex) =>
{
    printf("%d\n", atIndex);
});

As I measured speedup from multi-threading, my results were:

1 thread: 24ms/turn

2 threads: 14ms/turn

4 threads: 8ms/turn

If we push the code into a worker thread - something that works in background to the whole engine, we will definitely reduce the time taken, even on single core CPU's. However, we have to account for situations where calculating ranges takes more than 1 turn and how that will affect the simulation state. We will probably have to wait out the results from previous turn regardless. All in all, it might be worth the shot and it would have less concurrency issues than PFor.

I don't know enough of programming to know if it would make any difference, but at least in theory I would assume part of the problem (regardless of implementation otherwise) could be remedied by actually implementing formations. I.e. only calculate distance between formations rather than for each individual unit, which should make quite a difference in the amount of calculations needed I would assume.

It would definitely make it easier to do range checks between formations. However, once a formation is in range, we will still need to target individual soldiers which is again one of the slow cases. I wonder how Total War handles this problem?

Paid Development: September 2013 Activity Status

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites