Jump to content

Leaderboard

Popular Content

Showing content with the highest reputation on 2013-04-27 in all areas

  1. Hey guys, haven't been active for a few years, but I've been tweaking around with the pyrogenesis source lately and made some performance improvements: Removed CFixed<> and replaced it with float. All fixed::IsZero checks replaced with ::IsEpsilon(float). This accounts for float imprecision in some cases and removes thousands of conversions from CFixed -> float and float -> CFixed. Improves overall performance by ~20%. Replaced boost::unordered_map<key, value> with std::unordered_map<key, value>. Implemented std::hash<> for the required keys. This gave only a small performance increase of ~5%, but it was worth the shot, since unordered_map is a part of the C++ standard. Most notably, the bad performance was due to unimplemented hash functors (!) . Right now performance can be further improved if all std::map<> instances have a proper hash method implemented. As it turns out however, most of these maps can be removed overall. Result: before 37fps, after 48fps on my i7-720QM and Radeon HD5650 laptop. It took some 250 files to modify and some minor redesign of the math library, but the benefit was worth it: faster game, more maintainable code. Right now the engine has a lot of inefficiencies that are mostly caused by naive implementations: 1. Inefficient Renderer: Problem: Currently the rendering system constructs a sorted map of models and their shaders Every Frame. Not only is this bad that such a large map and vector is created every frame, it could be redesigned to achieve the same effect with no maps or vectors recreated. Solution: the models can be sorted beforehand, during context initialization (!) and grouped under specified shaders and materials. Result: This should give a huge (I'm not kidding) performance boost, so this will be the next thing I'll implement. 2. Entity/Actor/Prop XMLs: Problem: Not only is XML inefficient to parse, it's also much harder to actually read and edit. To make it worse, the filesystem overhead for these hundreds of files is huge (!). Loading times for weaker filesystems can be very long. Solution: use a simplified text parser for entity/actor/prop parsing to increase speed. Group common actors/entities into single text files (e.g. - units_athenian.txt) while still retaining the 'modability' of the game. All current data can be easily converted from XML to the desired (custom) format. Result: Loading times and memory usage will decrease dramatically (which is a pretty awesome thing I might add). 3. String Translation: Problem: It would seem as if this had no effect for performance or memory, but you should think again. Creating a translation (multilingual) system offers us a cheap ability to group all the strings together into a huge memory block. Solution: Load game/unit strings into a large memory block and simply share out const pointers to the strings. Something like this: "unit_javelinist\0Generic Peltast\0unit_javelinist_descr\0Peltasts are light infantry...\0". Result: Every heap allocation costs some memory for the system heap bookkeeping and on Win32 platform it's around ~24 bytes. If you have 1000 strings of size 24, that will add up to 24KB and totaling at 48KB used. With the translation system this would remain ~24KB. Of course, there are much more strings in the game, so the result would be noticeable. 4. Entity-Component System is inefficient: Problem: While some would say its having a map<int, map<int, *>> shouldn't have any real effect on performance (since the profiler(?) says so), I would recommend to reconsider and rethink the algorithm of the entire system. Maps are very memory intensive and too many vectors are created/destroyed constantly while the look-up of Components remains slow. Solution: Give the good old ECS pattern a slight Object-Oriented overhaul. Create an IEntity interface, give concrete implementations like UnitEntity : IEntity a strong reference to a Component in its class definition. This will remove the need for component look-up. The message system can also be redesigned to be more direct. And finally, Managers can be divided into frame slots across frames, since a lot of the data can be asynchronous. Result: This will give a huge overhaul and streamline the simulation engine; making it more maintainable, easier to program and much much faster and memory efficient. 5. Naive Pathfinder: Problem: The current implementation runs a very time consuming algorithm over the entire span of the region, making it very inefficient. Solution: Redesign the pathfinder to include a long-distance inaccurate path and a short-distance accurate path. The accurate path is only updated incrementally and for very short distances, making this method suitable for an RTS. Result: A huge performance improvement for moving units. 6. Naive Collision/Range/Obstruction detection: Problem: Currently a very naive subdivision scheme is used for collision, range and obstruction. Even worse is that all of these modules are decoupled, duplicating a lot of data and wasting time on multiple updates of the same kind. Solution: A proper Quadtree embedded into the Entity Component System would help keep track of all the entities in a single place (!) and the Quadtree structure itself would make ANY(!) range testing trivial and very fast. Result: Performance would increase by at least 10 times for the Collision/Range/Obstruction detection components. 7. AI is a script: Problem: I can't begin to describe my bafflement when I saw UnitAI.js. AI is something that should be streamlined into the engine as a decoupled yet maintainable module - a component of ECS, yet in a little world of its own. Solution: Translate UnitAI to C++ and redesign it. In a FSM case (such as UnitAI.js), the controller pattern would be used to decouple actions from the FSM logic and make it maintainable. Result: AI performance would increase drastically and an overhauled system wouldbe far more maintainable in the future, leaving room for improvements. -------------------------------------------------------------------------------------------------------------------- This is not exactly a duplicate of the old Performance Optimisations thread - I've made these observations and conclusions based on the current source and my past experience as a C++ developer. Furthermore, the reason why I'm bringing out all these problems, is because I intend to solve all of them and I also have the required time and skills to do it. Michael as the Project Leader has invited me to take up as a full-time developer for 0AD after my mandatory service in the Defense Forces finishes in May, so right now I'll be focusing on small changes and getting fully acquainted with the source. As for me, my name is Jorma Rebane, I'm a 22 year old Software Developer of real-time systems. My weapon of choice is C++, though often C and assembler is required. I've worked with several proprietary and open-source 3D engines in past projects as a graphics and gui programmer, so I'm very comfortable around DirectX and OpenGL. ----- Hopefully these points will bring out a constructive discussion regarding 0AD performance and improvements. Cheers!
    1 point
  2. ok, first pass at playlist functionality is in. Let me know if there are any problems.
    1 point
×
×
  • Create New...