RedFox

[DISCUSS] Performance Improvements


Hey guys, I haven't been active for a few years, but I've been tinkering with the pyrogenesis source lately and made some performance improvements:

  • Removed CFixed<> and replaced it with float. All fixed::IsZero checks were replaced with ::IsEpsilon(float), which accounts for float imprecision in some cases and removes thousands of CFixed -> float and float -> CFixed conversions. Improves overall performance by ~20%.
  • Replaced boost::unordered_map<key, value> with std::unordered_map<key, value> and implemented std::hash<> for the required keys. This gave only a small performance increase of ~5%, but it was worth a shot, since unordered_map is part of the C++ standard. Most notably, the bad performance was due to unimplemented hash functors (!). Performance can be improved further by giving every std::unordered_map<> instance a proper hash functor. As it turns out, however, most of these maps can be removed altogether.
    Result: before 37fps, after 48fps on my i7-720QM and Radeon HD5650 laptop.

It took modifying some 250 files and some minor redesign of the math library, but the benefit was worth it: a faster game and more maintainable code.
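For reference, the missing hash functors mentioned above can be supplied by specializing std::hash<> for the key type. This is only a sketch; TextureKey is a made-up composite key, not one of the engine's actual types:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical composite key; the engine's real keys differ.
struct TextureKey {
    std::string path;
    int mipLevel;
    bool operator==(const TextureKey& o) const {
        return mipLevel == o.mipLevel && path == o.path;
    }
};

namespace std {
template <> struct hash<TextureKey> {
    size_t operator()(const TextureKey& k) const {
        // Combine the member hashes (boost::hash_combine-style mixing).
        size_t h = hash<string>()(k.path);
        return h ^ (hash<int>()(k.mipLevel) + 0x9e3779b9 + (h << 6) + (h >> 2));
    }
};
}
```

With the specialization in place, std::unordered_map<TextureKey, V> works out of the box instead of falling back on a broken or missing functor.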

Right now the engine has a lot of inefficiencies that are mostly caused by naive implementations:

1. Inefficient Renderer:

  • Problem: The rendering system currently constructs a sorted map of models and their shaders every frame. Not only is it bad that such a large map and vector are recreated every frame; the same effect can be achieved without recreating any maps or vectors at all.
  • Solution: The models can be sorted beforehand, during context initialization (!), and grouped under their shaders and materials.
  • Result: This should give a huge (I'm not kidding) performance boost, so this will be the next thing I'll implement.

2. Entity/Actor/Prop XMLs:

  • Problem: Not only is XML inefficient to parse, it's also much harder to actually read and edit. To make it worse, the filesystem overhead for these hundreds of files is huge (!). Loading times on weaker filesystems can be very long.
  • Solution: Use a simplified text parser for entity/actor/prop parsing to increase speed. Group common actors/entities into single text files (e.g. units_athenian.txt) while still retaining the moddability of the game. All current data can be easily converted from XML to the desired (custom) format.
  • Result: Loading times and memory usage will decrease dramatically (which is a pretty awesome thing, I might add).
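A line-based parser for such a format could be very small. The "unit <name>" / "key value" layout below is made up for illustration; the real format would be whatever the conversion tool emits:

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Sketch of a parser for a hypothetical "units_athenian.txt" format:
// "unit <name>" starts an entry, "key value" lines fill it in,
// and lines starting with ';' are comments.
typedef std::map<std::string, std::map<std::string, std::string>> UnitTable;

UnitTable ParseUnits(std::istream& in) {
    UnitTable units;
    std::string line, current;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string key;
        if (!(ls >> key) || key[0] == ';')
            continue;                        // blank line or comment
        std::string value;
        std::getline(ls >> std::ws, value);  // rest of the line is the value
        if (key == "unit")
            current = value;                 // start a new entry
        else if (!current.empty())
            units[current][key] = value;
    }
    return units;
}
```

Each parsed entry could then be handed straight to the matching component for deserialization, with no DOM in between.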

3. String Translation:

  • Problem: It would seem as if this had no effect on performance or memory, but think again. A translation (multilingual) system offers us a cheap way to group all the strings together into one huge memory block.
  • Solution: Load game/unit strings into a large memory block and simply share out const pointers to the strings. Something like this: "unit_javelinist\0Generic Peltast\0unit_javelinist_descr\0Peltasts are light infantry...\0".
  • Result: Every heap allocation costs some memory for heap bookkeeping; on Win32 it's around ~24 bytes per allocation. If you have 1000 strings of 24 bytes each, the bookkeeping adds another 24KB on top of the 24KB of string data, totaling 48KB. With the translation system this would remain ~24KB. Of course, there are many more strings in the game, so the difference would be noticeable.
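A minimal sketch of such a pool, assuming the block is built once at load time and only then shared out (names are illustrative):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// One contiguous block holds every string value; lookups hand out const
// pointers into it, so there is no per-string heap allocation.
class StringPool {
    std::vector<char> m_block;                        // "value\0value\0..."
    std::unordered_map<std::string, size_t> m_index;  // key -> offset of value
public:
    void Add(const std::string& key, const std::string& value) {
        m_index[key] = m_block.size();
        m_block.insert(m_block.end(), value.begin(), value.end());
        m_block.push_back('\0');
    }
    const char* Get(const std::string& key) const {
        std::unordered_map<std::string, size_t>::const_iterator it = m_index.find(key);
        return it == m_index.end() ? 0 : &m_block[it->second];
    }
};
```

One caveat: the vector can reallocate while strings are still being added, so in practice the block should be fully built (or reserved) during loading before any pointers are handed out.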

4. Entity-Component System is inefficient:

  • Problem: While some would say that a map<int, map<int, *>> shouldn't have any real effect on performance (since the profiler(?) says so), I would recommend reconsidering and rethinking the algorithm of the entire system. Maps are very memory intensive, too many vectors are created/destroyed constantly, and component look-up remains slow.
  • Solution: Give the good old ECS pattern a slight object-oriented overhaul. Create an IEntity interface and give concrete implementations like UnitEntity : IEntity a strong reference to each Component in their class definition. This removes the need for component look-up. The message system can also be redesigned to be more direct. Finally, managers can be divided into slots across frames, since a lot of the data can be updated asynchronously.
  • Result: This overhaul will streamline the simulation engine, making it more maintainable, easier to program, and much, much faster and more memory efficient.
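The "strong reference" idea can be sketched like this; the class and component names are hypothetical, not the engine's:

```cpp
// Components held by value; no map lookups at the use sites.
struct CmpPosition { float x = 0, y = 0; };
struct CmpHealth   { int hp = 100; };

// Hypothetical interfaces standing in for the proposed design.
class IEntity {
public:
    virtual ~IEntity() {}
    virtual void Update(float dt) = 0;
};

class UnitEntity : public IEntity {
public:
    CmpPosition position;  // strong reference: resolved at compile time
    CmpHealth health;

    void Update(float dt) override {
        position.x += 1.0f * dt;  // direct member access, no component lookup
    }
};
```

Compared to map<int, map<int, *>>, the component address is known at compile time and the data sits contiguously inside the entity, which is also friendlier to the cache.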

5. Naive Pathfinder:

  • Problem: The current implementation runs a very time consuming algorithm over the entire span of the region, making it very inefficient.
  • Solution: Redesign the pathfinder to include a long-distance inaccurate path and a short-distance accurate path. The accurate path is only updated incrementally and for very short distances, making this method suitable for an RTS.
  • Result: A huge performance improvement for moving units.
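To make the two-tier idea concrete, here is only the long-distance pass as a sketch: a plain BFS over a coarse passability grid, whose waypoints the accurate short-distance pass would then refine locally. Grid layout and cost model are illustrative, not the engine's:

```cpp
#include <queue>
#include <vector>

// Coarse long-distance pass: BFS over a w*h grid of passable cells.
// Returns the cell path in goal -> start order, or empty if unreachable.
std::vector<int> CoarsePath(const std::vector<int>& passable, int w, int h,
                            int start, int goal) {
    std::vector<int> prev(w * h, -1);  // -1 == unvisited
    std::queue<int> q;
    q.push(start);
    prev[start] = start;
    while (!q.empty()) {
        int c = q.front(); q.pop();
        if (c == goal) break;
        int x = c % w, y = c / w;
        const int nx[4] = { x - 1, x + 1, x, x };
        const int ny[4] = { y, y, y - 1, y + 1 };
        for (int i = 0; i < 4; ++i) {
            if (nx[i] < 0 || nx[i] >= w || ny[i] < 0 || ny[i] >= h) continue;
            int n = ny[i] * w + nx[i];
            if (!passable[n] || prev[n] != -1) continue;
            prev[n] = c;
            q.push(n);
        }
    }
    std::vector<int> path;
    if (prev[goal] == -1) return path;  // unreachable
    for (int c = goal; c != start; c = prev[c]) path.push_back(c);
    path.push_back(start);
    return path;
}
```

A real implementation would use A* with terrain costs, but the division of labor is the point: the coarse pass is cheap because its grid is small, and only the first few waypoints ever need accurate refinement.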

6. Naive Collision/Range/Obstruction detection:

  • Problem: Currently a very naive subdivision scheme is used for collision, range and obstruction. Even worse is that all of these modules are decoupled, duplicating a lot of data and wasting time on multiple updates of the same kind.
  • Solution: A proper Quadtree embedded into the Entity Component System would help keep track of all the entities in a single place (!) and the Quadtree structure itself would make ANY(!) range testing trivial and very fast.
  • Result: Performance would increase by at least 10 times for the Collision/Range/Obstruction detection components.
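A quadtree range query in miniature, to show why range testing becomes trivial: the query only descends into nodes whose bounds overlap the rectangle. Types and split thresholds are illustrative:

```cpp
#include <memory>
#include <vector>

// Minimal point quadtree: entities inserted by position; a rectangular
// range query visits only nodes whose bounds overlap the rect.
struct Ent { float x, y; int id; };

class Quadtree {
    float cx, cy, half;                  // node bounds: center + half-size
    std::vector<Ent> items;              // entities stored at this node
    std::unique_ptr<Quadtree> child[4];  // null until the node splits
    enum { SPLIT = 4 };
public:
    Quadtree(float cx_, float cy_, float half_) : cx(cx_), cy(cy_), half(half_) {}

    void Insert(const Ent& e) {
        if (!child[0] && items.size() < (size_t)SPLIT) { items.push_back(e); return; }
        if (!child[0]) Subdivide();
        child[Quadrant(e)]->Insert(e);
    }

    void Query(float x0, float y0, float x1, float y1, std::vector<int>& out) const {
        if (x1 < cx - half || x0 > cx + half || y1 < cy - half || y0 > cy + half)
            return;                      // rect misses this node entirely
        for (size_t i = 0; i < items.size(); ++i) {
            const Ent& e = items[i];
            if (e.x >= x0 && e.x <= x1 && e.y >= y0 && e.y <= y1)
                out.push_back(e.id);
        }
        if (child[0])
            for (int i = 0; i < 4; ++i)
                child[i]->Query(x0, y0, x1, y1, out);
    }

private:
    int Quadrant(const Ent& e) const { return (e.x >= cx) + 2 * (e.y >= cy); }
    void Subdivide() {
        float h = half / 2;
        child[0].reset(new Quadtree(cx - h, cy - h, h));
        child[1].reset(new Quadtree(cx + h, cy - h, h));
        child[2].reset(new Quadtree(cx - h, cy + h, h));
        child[3].reset(new Quadtree(cx + h, cy + h, h));
        for (size_t i = 0; i < items.size(); ++i)
            child[Quadrant(items[i])]->Insert(items[i]);
        items.clear();
    }
};
```

Collision, range and obstruction queries would all go through the same tree, removing the duplicated per-module data the problem statement describes (a production version would also need removal/move support and a depth limit).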

7. AI is a script:

  • Problem: I can't begin to describe my bafflement when I saw UnitAI.js. AI is something that should be streamlined into the engine as a decoupled yet maintainable module: a component of the ECS, yet in a little world of its own.
  • Solution: Translate UnitAI to C++ and redesign it. For an FSM like UnitAI.js, the controller pattern would decouple actions from the FSM logic and keep it maintainable.
  • Result: AI performance would increase drastically, and an overhauled system would be far more maintainable in the future, leaving room for improvements.

--------------------------------------------------------------------------------------------------------------------

This is not exactly a duplicate of the old Performance Optimisations thread - I've made these observations and conclusions based on the current source and my past experience as a C++ developer.

Furthermore, the reason why I'm bringing out all these problems, is because I intend to solve all of them and I also have the required time and skills to do it. Michael as the Project Leader has invited me to take up as a full-time developer for 0AD after my mandatory service in the Defense Forces finishes in May, so right now I'll be focusing on small changes and getting fully acquainted with the source.

As for me, my name is Jorma Rebane, I'm a 22 year old Software Developer of real-time systems. My weapon of choice is C++, though often C and assembler is required. I've worked with several proprietary and open-source 3D engines in past projects as a graphics and gui programmer, so I'm very comfortable around DirectX and OpenGL.

-----

Hopefully these points will bring out a constructive discussion regarding 0AD performance and improvements. Cheers! ^_^


Sounds generally good. I have nothing against changing XML to something else as long as readability is kept (and thus moddability too), and most/all features are too. Redesigning the way components are handled seems fair; the use of maps and intertwined calls to other components slows the thing down and makes it awkward to understand. Again, as long as moddability is kept, sounds good.

The pathfinder is a permanent WIP that stalled a year ago. I recommend you check out Philip's work; he had basically done what you describe (he still has to give away the code, though). I'll add that this should, as much as possible, use functions that could eventually be called by the AI to obtain paths (even if that implies having to wait for the next turn), as that is a serious slowdown for AIs.

If at all possible, a proper quadtree for entity positions should probably be linked with the renderer to speed up frustum culling for models, though that might link rendering and simulation a bit too much.

UnitAI is slow. However, I believe it's wanted to remain mostly JS to allow it to be modded fairly extensively. Moving it to C++ would hamper that completely.

(same deal with the actual opponent AIs code)

(also: we need to be sure that removing cFixed won't cause OOS errors in MP, but I'm sure you have that in mind)

(edit: btw, as the kind of guy that makes crap code that needs to be fixed (do NOT look at water manager.cpp right now. I can't believe how ugly it is, and I made it), glad to have you on board :) )


2. Entity/Actor/Prop XMLs:

  • Problem: Not only is XML inefficient to parse, it's also much harder to actually read and edit. To make it worse, the filesystem overhead for these hundreds of files is huge (!). Loading times on weaker filesystems can be very long.
  • Solution: Use a simplified text parser for entity/actor/prop parsing to increase speed. Group common actors/entities into single text files (e.g. units_athenian.txt) while still retaining the moddability of the game. All current data can be easily converted from XML to the desired (custom) format.
  • Result: Loading times and memory usage will decrease dramatically (which is a pretty awesome thing, I might add).

In the distribution version, I believe everything is packed into one zip blob. If this still isn't efficient enough, maybe take a look at the PBF format: http://code.google.com/p/protobuf/ The reading speed of PBF is about 6 times that of XML, and the files are a lot smaller.

As a drawback, PBF isn't editable, so it would probably only fit in the official releases. Translation of XML into PBF goes pretty quick too.

And for the pathfinder, take into account that a lot of paths often go to the same place (like how many times a citizen soldier needs to go to a certain dropsite), but with different starting positions.

So flowfields (backwards pathfinding + storing) could help with that. Flowfields are mentioned on the forums here.


Sounds generally good. I have nothing against changing XML to something else as long as readability is kept (and thus moddability too), and most/all features are too.

Thanks! Migrating from XML would be a simple task, and since all the data is still kept in human-readable form, it can be easily modded and new units can be added in new unit files.

As a drawback, PBF isn't editable, so it would probably only fit in the official releases. Translation of XML into PBF goes pretty quick too.

Well, in our case, writing and reading raw binary would be quite trivial, even in C:


struct Vector3 { float x, y, z; };
Vector3 randExample[10] = { { 0.0f, 0.0f, 0.0f } };
// .... let's write the array to a file:
FILE* f = fopen("myfile.bin", "wb");
fwrite(randExample, sizeof(randExample), 1, f);
fclose(f);
// And the whole array of vectors was written to the file in binary...

Redesigning the way components are handled seems fair; the use of maps and intertwined calls to other components slows the thing down and makes it awkward to understand. Again, as long as moddability is kept, sounds good.

It would streamline the system by allowing explicit knowledge of the type of entities and their statically defined components. This is not to say that an entity can't contain a 'ScriptComponent'. For now I'm thinking of implementing all of the core components straight in C++ and leaving the 'moddable' implementation to the ScriptComponent. The script itself can choose the number of components it defines. As far as the API is concerned, there is only a 'ScriptComponent' as the moddable implementation.

If at all possible, a proper quadtree for entity positions should probably be linked with the renderer to speed up frustum culling for models, though that might link rendering and simulation a bit too much.

Yes, the actual position/rotation/scale should be in a single Transformation component (not in separate components), implemented as a 4x4 matrix. This way, the matrix can be sent directly to the shader without any hot-potatoing (a situation where data is passed from module to module and copied numerous times before it reaches its destination).
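A minimal sketch of such a component, assuming the column-major layout OpenGL expects (so the pointer could go straight into something like glUniformMatrix4fv):

```cpp
// Single Transformation component: one column-major 4x4 matrix that can
// be handed to the shader as-is, instead of separate position/rotation/
// scale components recombined every frame.
struct CmpTransform {
    float m[16] = { 1,0,0,0,  0,1,0,0,  0,0,1,0,  0,0,0,1 };  // identity

    void SetTranslation(float x, float y, float z) {
        m[12] = x; m[13] = y; m[14] = z;  // translation column (column-major)
    }
    const float* Data() const { return m; }  // pointer passed to the shader
};
```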

UnitAI is slow. However, I believe it's wanted to remain mostly JS to allow it to be modded fairly extensively. Moving it to C++ would hamper that completely.

(same deal with the actual opponent AIs code)

The AI can be turned into a component-like system, where one of the components is a ScriptAIComponent (for example). This would allow us to implement the AI in C++ while still making it possible to add extra AI functionality. The core AI code shouldn't be in JS, because that leaves out a plethora of optimization opportunities that are definitely needed for an efficient and maintainable system. The current UnitAI.js is unfortunately neither of those two.

(also: we need to be sure that removing cFixed won't cause OOS errors in MP, but I'm sure you have that in mind)

The CFixed<> was replaced by float and the serialization method remains the same, thus the hashes are deterministic. It will work as long as client and server use this new version, which is probably a given :).

Edited by RedFox


In the distribution version, I believe everything is packed into one zip blob. If this still isn't efficient enough, maybe take a look at the PBF format: http://code.google.com/p/protobuf/ The reading speed of PBF is about 6 times that of XML, and the files are a lot smaller.

I agree. If the goal is performance, you might as well take the full step and use a binary format. I would suggest retaining XML as the 'source'/'authoring' format, and then having some mechanism in the engine to convert it to binary "on the fly", and then store the binary result in the game's cache. That makes for a nice balance between optimization and mod-friendliness that is also used for textures etc.

So flowfields (backwards pathfinding + storing) could help with that. Flowfields are mentioned on the forums here.

I'm a bit dubious that flowfields can really be used like that. What if you have a loop in the flow? A proper cache in combination with the new pathfinder algorithm seems like a better course to me.


I agree. If the goal is performance, you might as well take the full step and use a binary format. I would suggest retaining XML as the 'source'/'authoring' format, and then having some mechanism in the engine to convert it to binary "on the fly", and then store the binary result in the game's cache. That makes for a nice balance between optimization and mod-friendliness that is also used for textures etc.

That is a really interesting idea, and it can easily be done, too. But for rapid development, the system would have to be 'doubled'. This is what I mean by such an implementation (also used in Rome: Total War, mind you):

1) Search 'packs' path for *.pack

2) Search 'data' path for 'loose' (unpacked) input files *.txt, *.dds, *.etc

If a 'loose' file is found, it always has higher priority, so 'data/test.txt' would be used instead of 'packs/test.pack::/test.txt'.

The packs can be compressed, though that wouldn't affect much, since the data is already very dense.
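The lookup priority described above boils down to a two-step resolve; here is a self-contained sketch where in-memory maps stand in for the directory listing and the pack index:

```cpp
#include <string>
#include <unordered_map>

// path -> file contents; stand-in for a directory index or pack index.
typedef std::unordered_map<std::string, std::string> FileIndex;

// A loose file in data/ always wins over the same path inside a pack.
const std::string* Resolve(const std::string& path,
                           const FileIndex& loose, const FileIndex& packed) {
    FileIndex::const_iterator it = loose.find(path);
    if (it != loose.end()) return &it->second;   // loose file has priority
    it = packed.find(path);
    if (it != packed.end()) return &it->second;  // fall back to the pack
    return 0;                                    // not found anywhere
}
```

During development the loose files shadow the pack, and a release build simply ships with an empty (or absent) loose tree.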


It's most likely some variation on the best way to go for a moddable game. I think Civilization IV uses a similar system ("loose", readable files are cached into binary files for later use). 0 A.D. should definitely allow something like that. Given the architecture of the "mods" folder, I'm thinking of keeping a separate cached "pack" for each mod. The engine would need to be able to handle either the original files (and cache them), or the packs, or a combination of both. That keeps moddability, readability, and efficiency.

(obviously takes more time to do, but for the sake of argument, I'll assume your time is infinite :) )

If a 'loose' file is found, it always has higher priority, so 'data/test.txt' would be used instead of 'packs/test.pack::/test.txt'.

Indeed, I believe this is how the engine currently handles textures etc.

Edited by zoot


It's most likely some variation on the best way to go for a moddable game. I think Civilization IV uses a similar system ("loose", readable files are cached into binary files for later use). 0 A.D. should definitely allow something like that. Given the architecture of the "mods" folder, I'm thinking of keeping a separate cached "pack" for each mod. The engine would need to be able to handle either the original files (and cache them), or the packs, or a combination of both. That keeps moddability, readability, and efficiency.

(obviously takes more time to do, but for the sake of argument, I'll assume your time is infinite :) )

Hmm, that is another thing that most (if not all) MMOs do. Of course, their 'loose' files arrive over broadband... :)

Since the user-provided 'mod' files are in plaintext, they will have to be deserialized before caching as raw binary... However, if the plaintext files are kept in the development folder, then the cache loses its point. To the point: we have to provide a tool (built into pyrogenesis.exe, maybe?) that converts plaintext loose files into straight binary. This is something that could be done during release deployment. Otherwise the game would be caching stuff and wasting time.


Basically the current implementation seems fine: only "pack" the files for releases. The engine already has support for that; we just need to change the way it packs.


Basically the current implementation seems fine: only "pack" the files for releases. The engine already has support for that; we just need to change the way it packs.

Exactly! Luckily, serializing/deserializing in binary is about 3 lines of code when only POD types (simple types like float, int, struct{float,int}) are concerned. However, for more complex types like std::string, a custom implementation is required... And if pointers enter the picture, things get messy.
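To illustrate the difference: POD members can be copied byte-for-byte, while a std::string needs a length prefix followed by its characters. A buffer-based sketch (not the engine's serializer):

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Raw byte copy: fine for POD types only.
static void WritePod(std::vector<char>& out, const void* p, size_t n) {
    const char* b = static_cast<const char*>(p);
    out.insert(out.end(), b, b + n);
}

// Strings need a length prefix; a raw copy of std::string would be wrong.
static void WriteString(std::vector<char>& out, const std::string& s) {
    uint32_t len = static_cast<uint32_t>(s.size());
    WritePod(out, &len, sizeof(len));
    out.insert(out.end(), s.begin(), s.end());
}

static std::string ReadString(const char*& p) {
    uint32_t len;
    std::memcpy(&len, p, sizeof(len)); p += sizeof(len);
    std::string s(p, p + len);         p += len;
    return s;
}
```

Once pointers or cross-references appear, each one has to be rewritten as an index or id on write and patched back on read, which is exactly where "things get messy".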

Furthermore, the items being serialized are usually entity templates, which means a very complex binary format that soon begins to look like text data.

In the end it might be easier to just use a new serialization method on top of more condensed data files (like units_athens.txt). Let's leave it at that for now; it can be reimplemented later :) Right now, getting good performance out of the parsing would be the main goal.

For example, a current component in XML:


<Identity>
  <Civ>gaia</Civ>
  <GenericName>Gaia</GenericName>
</Identity>

Could be instead represented by:

identity gaia

The textual representation for gaia would be looked up from translation as:

; english.txt
gaia_descr Gaia

Introducing translation tables would make things so much easier. Furthermore, the component can be easily parsed with:

std::string cmpIdStr;
file >> cmpIdStr;
IComponent* c = CreateComponent(cmpIdStr); // throws an error on an invalid component id
c->Deserialize(file); // parses till end of line for its specific data and fails gracefully

Edited by RedFox


I agree. If the goal is performance, you might as well take the full step and use a binary format. I would suggest retaining XML as the 'source'/'authoring' format, and then having some mechanism in the engine to convert it to binary "on the fly", and then store the binary result in the game's cache. That makes for a nice balance between optimization and mod-friendliness that is also used for textures etc.

I think this would be a very good idea. XML is easy to read, create, and modify, but loading times are much too slow currently.

Regarding UnitAI: I don't think it's much of a bottleneck currently, and it will only get faster with newer SpiderMonkey releases (rough guess: 10% faster with the SpiderMonkey 17 upgrade, another 15% with the next one (22? - whatever will have IonMonkey)). Also... it's a bit of a disaster currently, so I'm very hesitant to rewrite it in C++, which is a much less maintainable language than JavaScript.


Also... it's a bit of a disaster currently, so I'm very hesitant to rewrite it in C++, which is a much less maintainable language than JavaScript.

With a proper implementation it can be both maintainable and extensible. The trick is to encapsulate logic into action/decision sequences, much like the Entity-Component system, but actually very different. If a scripting module is implemented, it can simply expose the current 'decisions' or 'actions' of an entity to the script. The script can then implement whatever logic it wants.


The CFixed<> was replaced by float and the serialization method remains the same, thus the hashes are deterministic. It will work as long as client and server use this new version, which is probably a given :).

From what I have read, floats will not be identical between systems once you start using different compilers and different architectures.

5. Naive Pathfinder:

  • Problem: The current implementation runs a very time consuming algorithm over the entire span of the region, making it very inefficient.
  • Solution: Redesign the pathfinder to include a long-distance inaccurate path and a short-distance accurate path. The accurate path is only updated incrementally and for very short distances, making this method suitable for an RTS.
  • Result: A huge performance improvement for moving units.

Do you know about the stalled pathfinder rewrite at #1756?

6. Naive Collision/Range/Obstruction detection:

  • Problem: Currently a very naive subdivision scheme is used for collision, range and obstruction. Even worse is that all of these modules are decoupled, duplicating a lot of data and wasting time on multiple updates of the same kind.
  • Solution: A proper Quadtree embedded into the Entity Component System would help keep track of all the entities in a single place (!) and the Quadtree structure itself would make ANY(!) range testing trivial and very fast.
  • Result: Performance would increase by at least 10 times for the Collision/Range/Obstruction detection components.

I had a look at this. The naive subdivision isn't what causes the performance hit; de-duplicating results that cross region boundaries and using an inefficient set for lookups are much bigger factors. Unification like you say might be a good idea, though.


From what I have read, floats will not be identical between systems once you start using different compilers and different architectures.

The opcodes and precision might be different, depending on the CPU (FPU?) of the device or the compiler's software float library, but the overall usability of floating point numbers remains the same. Even though errors do occasionally occur with floats, I included proper methods for testing float equality, something similar to:


inline bool Equal(float a, float b) {
    return fabs(a - b) < M_EPSILON;
}

Where M_EPSILON is a very small float (e.g. 0.000000001f ).

Do you know about the stalled pathfinder rewrite at #1756?

I've talked to Philip about this; unfortunately, he doesn't have time to finish it. I wouldn't benefit much from his solution either, since it would take me as long to implement a new one as to completely understand the existing one.

I had a look at this. The naive subdivision isn't what causes the performance hit; de-duplicating results that cross region boundaries and using an inefficient set for lookups are much bigger factors. Unification like you say might be a good idea, though.

This is all something that can be fixed with a custom Quadtree implementation. An entity would have a pointer to its Quadtree cell and vice versa. This lets an entity get a list of pointers to the objects in the same cell. Notice how this, without any complex lookup whatsoever, shrinks a huge list (like 200 entries) to just a handful (usually never more than 8).

If needed, the parent cell can be checked, and so on.

Edited by RedFox


The opcodes and precision might be different, depending on the CPU (FPU?) of the device or the compiler's software float library, but the overall usability of floating point numbers remains the same. Even though errors do occasionally occur with floats, I included proper methods for testing float equality, something similar to:


inline bool Equal(float a, float b) {
    float diff = fabs(a - b);
    return diff < M_EPSILON;
}

Where M_EPSILON is a very small float (e.g. 0.000000001f ).

Very similar isn't good enough. We need exactly the same result, or a player will go out of sync. You can't rely on rounding, because you will occasionally get a value right on the boundary, so it rounds differently on each system. Very carefully designed code might be able to force stability, but it would be really hard to do, and it would need to be done all over the simulation, so it is basically infeasible.


Very similar isn't good enough. We need exactly the same result, or a player will go out of sync. You can't rely on rounding, because you will occasionally get a value right on the boundary, so it rounds differently on each system. Very carefully designed code might be able to force stability, but it would be really hard to do, and it would need to be done all over the simulation, so it is basically infeasible.

That is a valid point that I didn't consider. Though in this case, the float can be rounded to a near value during serialization (which happens before the string is hashed...). That method is utter nonsense, though. The component itself should have a GetHash() method that provides a reliable hash for comparing two component objects.

I see no reason to revert to fixed point. Bad design choices spur an onslaught of derivative Whiskey-Tango-Foxtrot, leading to decreased maintainability and performance.


That is a valid point that I didn't consider. Though in this case, the float can be rounded to a near value during serialization (which happens before the string is hashed...). That method is utter nonsense, though. The component itself should have a GetHash() method that provides a reliable hash for comparing two component objects.

I see no reason to revert to fixed point. Bad design choices spur an onslaught of derivative Whiskey-Tango-Foxtrot, leading to decreased maintainability and performance.

I already explained that rounding doesn't work. One CPU will give 1.49999, the other will give 1.50000; one player will get 1, the other will get 2, and they will go out of sync. Rounding simply decreases the probability of an out-of-sync problem. Unless you make every algorithm that uses floats avoid every rounding boundary, you cannot solve this. So using fixed is a good design decision, because it makes keeping the game in sync manageable.


I already explained that rounding doesn't work. One CPU will give 1.49999, the other will give 1.50000; one player will get 1, the other will get 2, and they will go out of sync. Rounding simply decreases the probability of an out-of-sync problem. Unless you make every algorithm that uses floats avoid every rounding boundary, you cannot solve this. So using fixed is a good design decision, because it makes keeping the game in sync manageable.

Even though that is rather an edge case, something like that can be circumvented by a special rounding function during hash generation, which converts the float to an int:


inline __int64 roundedFloatHash(float f) {
    return (__int64(f * 100000.0f) >> 2);
}
// 0.00154 -> 154 -> 38
// 0.00523 -> 523 -> 130

In either case, I don't see why the game engine has to suffer a 20% performance loss when float hashing could be implemented on demand. It is a matter of precision that can easily be decided based on world size. All games use methods like this. It's a well-used pattern.

Edited by RedFox


Even though that is rather an edge case, something like that can be circumvented by a special rounding function during hash generation, which converts the float to an int:


inline __int64 roundedFloatHash(float f) {
    return (__int64(f * 100000.0f) >> 2);
}
// 0.00154 -> 154 -> 38
// 0.00523 -> 523 -> 130

In either case, I don't see why the game engine has to suffer a 20% performance loss when float hashing could be implemented on demand. It is a matter of precision that can easily be decided based on world size. All games use methods like this. It's a well-used pattern.

The game has to suffer a 20% performance loss because we don't want random out-of-sync errors popping up. You say most games use a pattern like this; do you have sources for that statement? This article http://gafferongames.com/networking-for-game-programmers/floating-point-determinism/ has numerous quotes saying that either they used the same compiler, or replays and multiplayer would not work. The hashing is irrelevant; the little differences will sometimes cause a genuine divergence in the game state. A single multiplayer game could easily have more than a billion floating point operations happening.


I know this is probably off topic, but since this thread touches the renderer: does any of this affect OpenGL ES compatibility? I think the new generation of Android consoles could be a boon for 0ad and a great way to get publicity. As I understand it, Android by default only supports OpenGL ES. It would be great if, in rewriting the code, we could keep this in mind.

Thoughts?


I know this is probably off topic, but since this thread touches the renderer: does any of this affect OpenGL ES compatibility? I think the new generation of Android consoles could be a boon for 0ad and a great way to get publicity. As I understand it, Android by default only supports OpenGL ES. It would be great if, in rewriting the code, we could keep this in mind.

Thoughts?

Well, Ogre3D supports OpenGL ES 1.1 or higher, so this would probably make porting easier.

Hmm, the more I think about it, the more it seems it would be better to use an established rendering engine like Ogre.


It's definitely a very interesting long-term prospect, particularly as the renderer is quite limited.

On the topic of floats vs. fixed: unless serialization and OOS checks can be efficiently changed to disregard all rounding errors, fixed point is something that needs to be kept. Perhaps it can be optimized a little, however (20% seems fairly big). Anyway, there are other areas where optimizations are interesting, so that's nothing too important.

Edit: this is probably impossible, but another solution would be for SP to use floats and MP to use CFixed. That might require two binaries though, à la CoD and perhaps some other games.


Well, Ogre3D supports OpenGL ES 1.1 or higher, so this would probably make porting easier.

Hmm, the more I think about it, the more it seems it would be better to use an established rendering engine like Ogre.

It does seem so. I have substantial time today to assess the amount of code that would need reimplementing if such a switch were made. I wouldn't jump into something this drastic without analyzing the current state, but the graphics engine does require a major overhaul, so it's something I definitely need to take up!

The game has to suffer a 20% performance loss because we don't want random out-of-sync errors popping up. You say most games use a pattern like this; do you have sources for that statement?

Yes, for example in the source of Doom 3. And all of these games have managed with it. Having a few small places where float imprecision is taken into account is much, much better than using CFixed everywhere. The floating-point math standard is now over 28 years old and holding strong. We are aware of the rounding errors, and if all else fails, we can just compile with /fp:precise.

This article http://gafferongames...nt-determinism/ has numerous quotes saying that unless they used the exact same compiler, replays and multiplayer would not work. The hashing is irrelevant: the little differences will sometimes cause a genuine divergence in the game state. A single multiplayer game could easily involve more than a billion floating-point operations.

Yes, and it also brings out the reasons why fully deterministic systems like that are fragile: even a slight change in version number or entity tags will break sync. IMO a different sync model should be considered in this case, deviating toward a more network-intensive solution.

The article also notes that using /fp:precise works (as stated by several compiler vendors too: MSVC++, GCC) and provides fully deterministic results. Just so you know, 0 A.D. already uses /fp:precise; even though precise floats are slower (since they are rounded), they still perform much better than CFixed<>.

It's definitely a very interesting long-term prospect, particularly as the renderer is quite limited.

On the topic of floats vs fixed: unless serialization and OOS checks can be efficiently changed to disregard all rounding errors, this is something that needs to be kept. Perhaps it can be optimized a little, though (20% seems fairly big). Anyway, there are other areas where optimizations are interesting, so this isn't too pressing.

Right now the main SVN branch will stay as it is regardless, since these changes are too big to integrate into the game immediately. And as mentioned, multiplayer sync may be broken.

Edit: this is probably impossible, but another solution would be for SP to use floats and MP to use CFixed. Might require two builds though, à la CoD and perhaps some other games.

I'd say we should just forget about CFixed<> and use /fp:precise. It's better to use a language built-in feature than to maintain a custom library that slows down the game considerably.

Edited by RedFox


From what I know of the code, implementing Ogre fully means changing the entirety of the GUI folder, and perhaps also converting the GUI files to use Ogre (if we actually switch the GUI, it's a full change).

It means a complete change of the renderer folder.

It means tons of updates in the graphics folder, but that actually shouldn't be too much of a problem. It might require some changes to the COLLADA code (I can't recall if Ogre handles COLLADA natively).

ps/ probably has files that would need changing, as it's the core of Pyrogenesis, but that also shouldn't be a problem given your experience.

The issue is maths/. If we use the Ogre-provided matrix and vector types, it would require changes everywhere in the code. Perhaps a simpler solution would be to typedef those types (which would maintain the usage of CWhatever for class names).

The rest should be mostly left as is.

However, Ogre was not chosen before, and we need to know why, to check whether that reasoning still applies. I suggest you start a new topic explaining what we would gain (in terms of speed, flexibility, and features) and how much time you think it will take, after assessing the necessary changes. Then we'll be able to make a sensible decision.

