Multi Threading for 0 A.D. simulation and less lag

JuKu96 · May 7, 2015

Currently, 0 A.D. doesn't support multithreading. I discussed this with some developers in the IRC channel some time ago.

I think this is one of the two main causes of lag, which are simulation lag and network lag.

The network lag is a given, but the simulation lag can be fixed.

If 0 A.D. supported multithreading for units, for example with each thread having an executor that executes tasks or calculates the movement of 50 units, the game could be made much faster, I think.

0 A.D. currently lags whenever the data has to be synced.

You don't need to support multithreading for everything in 0 A.D.; I think that isn't possible without rewriting more than 50% of the code. But there are many small things that can be optimized. The only problem is that synchronization is very difficult with multithreading. But asynchronous synchronization is also possible, I think. Each executor is responsible for, say, 50 units and synchronizes them independently of the other executors. You can do the same with the AI, but I think it's better if only the host is responsible for the AI: every client can compute data for the AI, but only the host says what the AI has to do.

One of the main problems is also the synchronization. Every client has to compute all the data and synchronize it with the host. Why doesn't only the host calculate the data? That way you could also avoid OOS. Or every client calculates the data, but if there are differences, the host's data is used.

You chose the current approach because you want to precalculate units and reduce the data that has to be sent when there is more network lag.

The balance was better in alpha 17, I think; you could play alpha 17 without lag, but not alpha 18.

It's also caused by formations, but that is another problem.

Would it be an idea to add a setting in multiplayer mode where the host can enable or disable formations?

I know it's very difficult to change 0 A.D.'s structure because of its history.

But new CPUs don't have higher clock frequencies; they have more cores.

For example, in the past a CPU had a clock frequency of 3.00 GHz; nowadays a CPU has only 1.80 GHz but 8 cores on a new i7, to save power and so on, and the new CPUs are much faster than the old ones.

I think it's the right way to try to support multithreading at least a little.

But this is only a discussion.

I would also like to help develop 0 A.D., but I have talked with a developer, and I think it isn't so easy to get started with the game engine.

Excuse my English; I am from Germany and my English is not the best.

Edited May 7, 2015 by JuKu96

**niektb** · May 7, 2015

That's funny, overall the performance is noticeably higher in A18 (compared to A17). I can play A18 without (simulation) lag if I keep the popcap reasonable; I couldn't with A17.

**sanderd17** · May 7, 2015

If only the host would calculate the data, that would mean he has to send the entire state to all other players every turn.

And the gamestate can be huge. A zipped OOS log (which contains the game state and is compressed) normally has a few hundred kB, up to even MB. Since a turn in multiplayer takes 500ms, that would mean the host has to be able to send at a speed of at least 16Mbps for one opponent. And 112Mbps to support 8 players (these are way above the normal speeds for regular customers). Only to have the data arriving 500ms later at the clients than at the host, so giving the host half a second more reaction time.
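A quick check of the bandwidth figures above, as a sketch. It assumes the worst case of 1 MB of compressed state per turn (the upper end of the quoted range), decimal units (1 MB = 8 Mbit), and that "8 players" means the host plus 7 receivers. The function name is made up for illustration.

```python
def required_upload_mbps(state_mb: float, turn_ms: int, receivers: int) -> float:
    """Upload rate in Mbit/s needed to send the full state to every receiver each turn."""
    turns_per_second = 1000 / turn_ms
    megabits_per_state = state_mb * 8
    return megabits_per_state * turns_per_second * receivers


# 1 MB per 500 ms turn, first to a single opponent, then to 7 other players.
one_opponent = required_upload_mbps(1.0, 500, 1)
seven_opponents = required_upload_mbps(1.0, 500, 7)
print(one_opponent, seven_opponents)
```

With a state of only a few hundred kB the numbers shrink proportionally, but they stay well above what sending only the commands would need.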

Next to that, it would also give the host the ability to cheat by modifying the game state.

So non-synced gameplay isn't really an option.

Threading would indeed help, but the simulation should always stay in one thread, to guarantee determinism. Other things (like rendering or AI) can be split off to different threads.

kanetaka · May 7, 2015

sanderd17 said:

Threading would indeed help, but the simulation should always stay in one thread, to guarantee determinism. Other things (like rendering or AI) can be split off to different threads.

I think giving each unit's pathfinding calculation its own thread would be effective. The pathfinding calculation depends on the previous turn, but doesn't depend on other units' ongoing movement. And some paths take much longer to compute than others.

Lion.Kanzen · May 7, 2015

Is this a technical discussion?

Loki1950 · May 8, 2015

Yes Lion

Enjoy the Choice

JuKu96 · May 8, 2015

niektb said:

That's funny, overall the performance is much higher in A18 (compared to A17) (noticeable). I can play A18 without (simulation) lag if I keep the popcap reasonable, I couldn't with A17.

You can look in the lobby and ask people.

You could play alpha 16 with the civ limit set to unlimited.

Many players say that performance was better in alpha 17 than in alpha 18.

But I think the user experience also depends on the network connection and the CPU.

Most users have a good internet connection, 16k+.

Normally, 0 A.D. is developed for games with a civ limit of 300.

Most games in the 0 A.D. lobby are hosted with a civ limit of 300.

A civ limit of 200 isn't enough, and 300 is not huge either for such a complex and really good economy simulation.

sanderd17 said:

If only the host would calculate the data, that would mean he has to send the entire state to all other players every turn.
And the gamestate can be huge. A zipped OOS log (which contains the game state and is compressed) normally has a few hundred kB, up to even MB. Since a turn in multiplayer takes 500ms, that would mean the host has to be able to send at a speed of at least 16Mbps for one opponent. And 112Mbps to support 8 players (these are way above the normal speeds for regular customers). Only to have the data arriving 500ms later at the clients than at the host, so giving the host half a second more reaction time.
Next to that, it would also give the host the ability to cheat by modifying the game state.
So non-synced gameplay isn't really an option.
Threading would indeed help, but the simulation should always stay in one thread, to guarantee determinism. Other things (like rendering or AI) can be split off to different threads.

I know it isn't possible for only the host to calculate the data.

I had thought that every client could precalculate the data and the host would send the full data every x turns.

But this is only an idea.

If the game state is several MB, this isn't possible.

And non-synced play is also not possible, I think.

But pathfinding can be optimized with multithreading.

kanetaka said:

I think giving threads to each pathfinding calculation for single unit is effective. The pathfinding calculation depends on previous turn, but doesn't depend on other units on-going movement. And some path results takes much longer.

The pathfinding calculation depends on the previous turn and on the formations, doesn't it?

If you give every unit a thread, that means 4 x 300 threads for 4 players with a civ limit of 300.

It is better to use an executor, isn't it?

The executor has a queue holding, for example, the pathfinding tasks for 200 units.

So you can split the one thread into, for example, 6 threads with 6 executors.
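The executor idea can be sketched roughly as follows. This is only an illustration of the pattern in Python, not 0 A.D. code: `find_path` is a trivial stand-in pathfinder, and all names and numbers are assumptions. Note that in standard CPython, threads do not speed up CPU-bound work because of the global interpreter lock; the point here is the fixed pool and the result ordering.

```python
from concurrent.futures import ThreadPoolExecutor


def find_path(request):
    """Stand-in for a real pathfinder: walk along x first, then along y."""
    unit_id, (x, y), (goal_x, goal_y) = request
    path = []
    while x != goal_x:
        x += 1 if goal_x > x else -1
        path.append((x, y))
    while y != goal_y:
        y += 1 if goal_y > y else -1
        path.append((x, y))
    return unit_id, path


def compute_paths(requests, workers=6):
    """Run all path requests on a fixed pool of worker threads.

    executor.map yields results in the order of `requests`, not in the order
    the workers finish, so the output does not depend on thread timing.
    """
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(find_path, requests))


# 1200 units (4 players x 300) share 6 threads instead of getting one thread each.
requests = [(unit_id, (0, 0), (unit_id % 5, unit_id % 3)) for unit_id in range(1200)]
results = compute_paths(requests)
```

Collecting the results in request order is what keeps the outcome identical on every machine, as long as each path calculation only reads the state of the previous turn.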

Lion.Kanzen said:

This a technical discussion?

Yes.

Edited May 8, 2015 by JuKu96

**niektb** · May 8, 2015

JuKu96 said:

[...]
You could play alpha16 with civ limit unlimited.
[...]

Really not. (But that is also because I had a different PC back then.)

I find 300 too high to play with (ain't fun no more). I myself prefer 100~150 (roughly the same amount I had with AoK).

kanetaka · May 8, 2015

JuKu96 said:

I think the path finding calculation depends on the previous turn and the formations, or?

Yes, formations are also a factor, and as you know they are one of the performance killers. I am not familiar with the formation code, but a single turn consists of something like the following:

Independent units move
Formation leaders move
Formation followers move

JuKu96 said:

If you give every unit a thread, this means 4x300 Threads for 4 players with civ limit 300.
It is better to use an executor, or?
This executor has an queue, with for example tasks for path finding for 200 units.
So you can split the 1 thread in for example 6 threads with 6 executors.

Yes, that sounds practical. I had never heard of the executor pattern. You know more about multithreading than I do.

**sanderd17** · May 8, 2015

The formation process is actually: first move the formation center along the pre-calculated route, then let all units walk to their new formation position.

But the biggest issue with the formations is that the pathfinder doesn't understand that formations have a size that's bigger than their obstruction.

So if a formation walks along an obstacle (like along a shore), then the pathfinder keeps the formation center next to the shore, which causes half of the units to try to reach unreachable positions.

If units can stay in their formation positions, calculating their next position is a rather simple calculation, and would be even faster than without formation, as the long-range calculation only needs to be done once for the formation.

So the pathfinder should estimate the width of a formation, and see how it can avoid most obstacles for a certain route.
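As a rough illustration of that last point, here is a sketch of how formation positions and a formation-wide clearance could be derived from the formation center. It assumes a simple rectangular, unrotated formation; the function names, the spacing and the unit radius are made up and do not come from the 0 A.D. code.

```python
import math


def slot_offsets(unit_count, columns, spacing):
    """Offsets of each unit relative to the formation center (rectangular grid)."""
    rows = math.ceil(unit_count / columns)
    offsets = []
    for index in range(unit_count):
        row, col = divmod(index, columns)
        dx = (col - (columns - 1) / 2) * spacing
        dy = (row - (rows - 1) / 2) * spacing
        offsets.append((dx, dy))
    return offsets


def slot_positions(center, offsets):
    """World position of every formation slot for a given center position."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


def clearance_radius(offsets, unit_radius):
    """Radius of a circle around the center that contains every unit."""
    return max(math.hypot(dx, dy) for dx, dy in offsets) + unit_radius


offsets = slot_offsets(unit_count=6, columns=3, spacing=2.0)
positions = slot_positions((10.0, 10.0), offsets)
radius = clearance_radius(offsets, unit_radius=0.5)
```

If the long-range search treated the formation as an obstruction of this radius instead of a single unit, the center would stay far enough from a shore for every slot to remain reachable. A circle is a pessimistic estimate for wide, shallow formations, so this is only a starting point.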

serveurix · May 9, 2015

sanderd17 said:

So if a formation walks along an obstacle (like along a shore), then the pathfinder keeps the formation center next to the shore, which causes half of the units to try to reach unreachable positions.

I suppose this question has already been raised, but is there a way to distinguish between movable and non-movable obstacles, so the units don't try to go to a point they have no chance to reach?

**sanderd17** · May 9, 2015

The current pathfinder already makes a difference between moveable and unmoveable obstacles. The long range pathfinder doesn't even consider moveable obstacles (since it plans a long time ahead, and anything moveable would probably already be changed by then).

But even with non-moveable obstacles, the pathfinder still tries to guide the units to the closest reachable position. Which is a heavy calculation that it shouldn't do if the formation centre is following a more reasonable path.

JuKu96 · May 12, 2015

sanderd17 said:

The current pathfinder already makes a difference between moveable and unmoveable obstacles. The long range pathfinder doesn't even consider moveable obstacles (since it plans a long time ahead, and anything moveable would probably already be changed by then).
But even with non-moveable obstacles, the pathfinder still tries to guide the units to the closest reachable position. Which is a heavy calculation that it shouldn't do if the formation centre is following a more reasonable path.

It may also be a good start for multithreading in 0 A.D. to move all units that are not in a formation, for example all units that are gathering resources, to another thread.

They don't have dependencies and can be calculated in parallel with the formation units, can't they?

Can you tell me where the pathfinding code is, for example in which classes or files?

If the pathfinding calculation has all the data about sizes and obstacles, why doesn't the pathfinder split the units up, for example when they are going around a house?

mifritscher · May 31, 2015

For me the speed hasn't changed much between Alpha 15 and 18. I'm playing mostly 1vs1 with max. 150 civ. The biggest problems are rendering -> patches -> render terrain base -> unlogged and rendering -> models -> rendering bucketed submissions (Windows Vista, Core2Duo, x3100, lowest graphics settings within GUI, needed to disable GLSL because it can't compile the shaders), so rendering is the biggest problem for me ;-)

Edit: Under Ubuntu (14.04, 64 bit, same computer, GLSL enabled) it looks different: here the problem is render -> render submissions -> clear buffer

Even if the game state is a few 100 kB (which sounds rather big to me to be honest; perhaps it could be way more efficient with a binary encoding. Yes, there is a binary form of JSON), it could be worth transferring only the differences.

What are the biggest problems in path planning now? To be honest, it surprises me that this should be the bottleneck, even on big maps with many (>1000) civs. Some random ideas for optimisation (I don't know if they are already used):

* If a civ runs the same way (source -> target point) several (>1?) times, assume it is free -> no recalculation of the path. If there is a new obstacle, recalculate once when the obstacle is (almost) hit

* I hope that it is not recalculated every 50 ms or so?

* Stick to "static" path planning, only recalculate if new obstacles appear on the path

* First try to find local circumventions around obstacles instead of recalculating the whole path

* Formations: As already said: do the global path planning only for one person (master) in the formation; the others only run a local "follow the master" algorithm. This global path planning could even estimate the width of the formation (as already said)

* Formations: If formations are crossing each other but shouldn't interact (e.g. by attacking), try to keep the masters a bit apart so the "follow the master" algorithm doesn't need to calculate circumventions very dynamically

Edited May 31, 2015 by mifritscher

**sanderd17** · May 31, 2015

As you have an integrated graphics card, there's not a lot we can do to speed up the rendering. Most users complain about CPU calculation speed, which can indeed be improved a lot (or so we hope). But rendering can only be marginally improved.

Transferring differences still requires players to start with the same state. If the state deviates at some point, all subsequent states will be different. So it's just the same as only sending the commands.

Wrt the pathfinding: most things you mention are either already implemented or don't work.

Caching paths is not an option: too many paths are calculated, and obstructions change too often (think about units moving, buildings being built and trees being cut down).

The long paths are already static; they're only calculated once, without taking units into account. Then the short paths are calculated between long-path waypoints. The short paths have to be calculated on the fly, as they also take other units into account.

There's no guarantee that a local circumvention will result in a valid path. It's perfectly possible to end up in a dead end. So you need to re-evaluate the entire path (at least between two valid points) to guarantee a valid path.

Formations already only calculate one long path, and units follow short paths to their new formation position (which is most often a straight line). Crossing formations indeed aren't taken into account, but this doesn't happen that often.

**Yves** · May 31, 2015

sanderd17 said:

But rendering can only be marginally improved.

I don't agree here. I have a Radeon R9 270, which is a relatively modern and fast graphics card. Still, I get less than 20 FPS in some places on the map Deep Forest, even without any AI players and with just a couple of units. In the profiler, I often see 30-40 ms in the renderer.

It's much better on Nvidia cards, but I'd still say it's something that we can improve and not solely because AMD's drivers suck. OpenGL has improved since our renderer was written and there are new extensions and approaches that are designed to reduce driver overhead. I've started working on such a new renderer, but it's definitely not a "quick win" and will still take a lot of time until we see the first improvements (and even more until the renderer takes full advantage of OpenGL 4+).

mifritscher · May 31, 2015

@rendering: one finding for me is that the profiling result for rendering is totally different on Linux and Windows (on Linux it makes no difference whether GLSL is enabled or not). The other thing is that the bottlenecks are either in "unlogged" things (as on Windows) or in clearing (and switching) buffers. The unlogged part could be something overlooked that is worth profiling; the buffer part could be a surprising inefficiency in a specific function.

@Transferring differences vs. transferring commands: transferring commands means that the non-host computers need to calculate more, which could lead to problems if e.g. something gets out of sync (timing-wise) or the computers do slightly different calculations (e.g. floating point, which you partly try to avoid because of this). To help mitigate the problem of deviating states (which shouldn't occur if only the host calculates the "true" data and the other computers only calculate things for interpolation, e.g. for smoother display), checksums could be used; if any difference is caught, the raw data is requested, perhaps only section by section.
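The checksum idea could look roughly like this. It is a sketch under assumptions: the state is split into named sections, serialised as JSON with sorted keys and hashed with SHA-256. The section names, the state layout and the function names are invented for illustration and are not how 0 A.D. hashes its state.

```python
import hashlib
import json


def section_checksums(state):
    """Return one SHA-256 hex digest per top-level section of the state."""
    sums = {}
    for name, section in state.items():
        # sort_keys makes the serialised text independent of dict insertion order.
        encoded = json.dumps(section, sort_keys=True).encode("utf-8")
        sums[name] = hashlib.sha256(encoded).hexdigest()
    return sums


def sections_to_resend(host_sums, client_sums):
    """Names of the sections whose checksum differs from the host's."""
    return sorted(name for name, digest in host_sums.items()
                  if client_sums.get(name) != digest)


# Integer coordinates are used so that no floating point rounding is involved.
host = {"units": {"u1": [10, 20], "u2": [30, 40]}, "resources": {"wood": 500}}
client = {"units": {"u1": [10, 20], "u2": [30, 41]}, "resources": {"wood": 500}}
stale = sections_to_resend(section_checksums(host), section_checksums(client))
print(stale)
```

Only the digests travel every turn; the raw data of a section is sent only when its digest differs.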

@Pathfinding: caching paths could be useful e.g. for gathering resources. The civs walk much more often than resources are exhausted or new buildings get built ;-) Local circumvention could be used as a first attempt and as a way to get back on the planned track as quickly as possible. If that doesn't work within e.g. 3..5 seconds, a global replanning could be triggered.
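A minimal sketch of such a path cache with invalidation, to make the idea concrete. Everything here is hypothetical: `straight_path` is a stand-in that ignores obstacles, and the cache only knows about grid cells.

```python
def straight_path(start, goal):
    """Stand-in pathfinder: cells from start to goal, x first, then y."""
    x, y = start
    cells = [(x, y)]
    while x != goal[0]:
        x += 1 if goal[0] > x else -1
        cells.append((x, y))
    while y != goal[1]:
        y += 1 if goal[1] > y else -1
        cells.append((x, y))
    return cells


class PathCache:
    """Remembers paths per (start, goal) and drops those crossing a changed cell."""

    def __init__(self, pathfinder):
        self._pathfinder = pathfinder
        self._paths = {}
        self.misses = 0  # how often the real pathfinder had to run

    def get(self, start, goal):
        key = (start, goal)
        if key not in self._paths:
            self.misses += 1
            self._paths[key] = self._pathfinder(start, goal)
        return self._paths[key]

    def obstruction_changed(self, cell):
        """Forget every cached path that passes through `cell`; return how many."""
        stale = [key for key, path in self._paths.items() if cell in path]
        for key in stale:
            del self._paths[key]
        return len(stale)


cache = PathCache(straight_path)
first = cache.get((0, 0), (2, 1))   # computed
second = cache.get((0, 0), (2, 1))  # served from the cache
```

This only catches new obstructions on a cached path. A removed obstruction (a tree being cut down) may open a shorter route that the cache will not notice, and scanning every cached path on each change has its own cost when obstructions change often.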

JuKu96 · November 19, 2015

Is there any plan for the next versions?

**wraitii** · November 19, 2015

I think threading needs to be considered mostly for two things:

-short-range paths

-AI

The good thing is that these two problems tend to be for different game modes.

The AI is mostly separate and could probably be threaded somewhat easily, but it is generally not the biggest issue in SP anymore. SP doesn't lag too badly unless you experience graphics lag, which is a separate problem.

Pathfinding can still be slow, and has been somewhat designed with threading in mind. The benefit would mostly be felt in MP, where 500ms turns mean a lot of pathfinding requests can accumulate, particularly short-range pathfinding requests, and lag becomes significant. The game will appear to freeze every 500ms, which is annoying. Threading pathfinding would alleviate this effect somewhat.

Another solution would be to reduce turn length in MP which would slightly decrease the pathfinding problems.

Of course, we still have a lot of optimizations that can be done on the pathfinding itself, so threading should probably not be a priority.

cRaZy-biScuiT · April 21, 2016

Any news about multithreading? I could see several threads in htop. Still only one of them consumes CPU power. I don't have the fastest GPU (560ti) but still the performance should be much better. With maxed out graphics as well as with low graphics the fps are falling to <20 fps when attacking. Only one core of my Q6600 is being used.

Loki1950 · April 21, 2016

Multi-threading is not the same as multi-core usage. The engine does not have multi-core support yet; that requires a full refactoring of the entire code base, which is not trivial with an all-volunteer dev team, as it's not a sexy task.

Enjoy the Choice

cRaZy-biScuiT · April 21, 2016

Hey Loki1950,

I'm a dev myself and I'm interested in helping out. So there are no plans at all atm? Multithreading is already there in some form, but the idea of multithreading is kind of senseless if the load is not balanced between the threads at all, which means no multi-core support.

**sanderd17** · April 21, 2016

@cRaZy-biScuiT, the threads are indeed only used to have some non-blocking IO AFAIK, not to speed up the calculations.

The biggest problem we have with multi-threading is that the simulation must be deterministic. All players on a networked game must have the same state after the same commands. So these heavy calculations like the pathfinder will be pretty hard to parallelize, as paths can depend on each other. What would be possible is to split off other parts. Like the 3D renderer and the GUI renderer are mostly independent from the simulation. And the AIs could also be calculated in a simultaneous thread, as long as they're synced again each turn.
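Running the AIs in parallel while syncing them again each turn could be sketched like this. It is a toy illustration in Python, not engine code: `simulate` and `think` are hypothetical stand-ins, and the state is a flat dictionary so that a shallow copy is enough for the snapshot.

```python
from concurrent.futures import ThreadPoolExecutor


def run_turn(state, ai_players, simulate, think, executor):
    """Advance one turn while the AIs think in parallel on a snapshot of the state."""
    snapshot = dict(state)  # the AIs only ever read this copy
    futures = {player: executor.submit(think, player, snapshot)
               for player in ai_players}
    new_state = simulate(state)  # the simulation stays on the calling thread
    # Sync point: wait for every AI and collect its command in player order,
    # so the result does not depend on which thread finished first.
    commands = [futures[player].result() for player in sorted(ai_players)]
    return new_state, commands


def simulate(state):
    """Toy simulation step: returns a new state with the turn counter advanced."""
    return {**state, "turn": state["turn"] + 1}


def think(player, snapshot):
    """Toy AI: reports which turn it based its decision on."""
    return (player, snapshot["turn"])


with ThreadPoolExecutor(max_workers=2) as pool:
    next_state, ai_commands = run_turn({"turn": 4}, [2, 1], simulate, think, pool)
```

The AI commands computed during one turn are applied in the following turn, in a fixed order, which is what keeps all players on the same state.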

cRaZy-biScuiT · April 22, 2016

@sanderd17 That already sounds like a good idea. It would probably be good to aim for that. The question is now: how much impact does each of the tasks have?

21 hours ago, sanderd17 said:

What would be possible is to split off other parts. Like the 3D renderer and the GUI renderer are mostly independent from the simulation. And the AIs could also be calculated in a simultaneous thread, as long as they're synced again each turn.

**sanderd17** · April 22, 2016

It depends a lot on the system you're on. On my system (with quite slow intel graphics), 3D rendering causes the main slowdown in the start of the game. In the later game, there are more units around, so the simulation part (pathfinding, attack-damage calculations, target selection, ...) causes more slowdown, while the graphics stays pretty much constant (depends on the number of tris in your view, but not a lot more).

You can easily see the impact of different components when you enable the profiler (F11).

Another advantage of splitting off the rendering would be that we could have fancier graphics without causing extra lag, as it could be calculated at the same time as the simulation.
