tuan kuranes

Community Members
Everything posted by tuan kuranes

  1. @Thanduel: Using the memory analyzer from http://blog.makingartstudios.com/?page_id=72, run 0ad with it in debug mode, switch to the "fragmentation" tab, and you'll get nice memory block maps showing just that. (The other tabs are interesting too: you should be able to spot all the recurring allocation hotspots in the code, and the other memory problems.)
  2. @scroogie: nothing interesting/visible yet; I've just separated the pathfinding code into a lib, behind a facade interface. This weekend I'm working on the visualization (just a grid and colored squares in OpenGL, not very different from the minimap, just flexible enough to handle multiple grid levels like tile/vertex), all in a wxWidgets GUI (built with wxFormBuilder). Then, probably next weekend, I'll have to make all the changes to the pathfinding lib interface needed to support the "classic" tool set of a pathfinder app (load/save/edit, algo change, step by step/play/pause/stop/start/end, benchmarks, etc.). Then testing will begin.
  3. That adds other good points for upgrading: - the latest sm version will support source maps (which allow easier use of js "flavors" or any other language that can be transpiled to js) - asm.js support in a future sm release (which means really faster typed vars). Both points, plus the "C++-like" orientation of the current js code, could lead to easier js code production, either using transpilers like lljs (I would recommend that, because it can transpile to asm.js, handles memory C-style in a very efficient way, and is very strict), or by writing asm.js directly (but that has to wait for the sm upgrade to a version supporting asm.js). On a side note, the current js code could benefit from adding js code quality tools (jsvalidate, jshint, qunit, plato, benchmarkjs, etc.). Some nice things can be done/automated using gruntjs (like setting up a "grunt watch" task which runs in the background and lints any change against those js tools while you edit js files). (From reading a bit of the js code, there's a lot of room for js optimisation.)
  4. Looking at the actual long pathfinding rewrite patch, that would help even just there, easing the transition and collaboration. (The patch is a lot of new files mixed with other components, lots of defines in order to keep the old methods, a "b" folder, etc.) That's why I'm advocating a test app with run-time selection of the algo: experimenting made easy, discussion based on facts. @wraitii: the crowd techniques I'm referencing, loa/rvo, do that kind of thing. And secondly, French indeed...
  5. @historic_bruno: s/should rather target/could consider/... pardon my French. Short-range pathfinding is the perf killer as soon as lots of entities are in the game (end of game, or a 6-player game), and A*/jps is perf limited: too much computation, too many memory moves, etc. What I pointed at in the crowd techniques is just the "local obstacle avoidance" (loa) part. What's interesting is that it doesn't "compute" a path, but rather "reacts" and "adjusts" direction when moving to the next waypoint (those are computed once by the long-range pathfinder), on a very short range, and only if there's an obstacle (it does nothing otherwise). It handles all other moving units (including "reciprocal velocity obstacles" (rvo), which are very helpful for formation handling, as seen in the nice video in my post, which uses loa/rvo techniques), and lets each unit have its own avoidance technique (unit size, speed and even tactical style). It's very low computation, which is why crowd techniques use it, and it is therefore very scalable (~15,000 units moving in real time on a current cpu). (Here's a "sort of" loa/rvo demo with code and source: http://software.inte.../samples/colony, 65000 units at 20ms/frame using threads, or 200ms without.) - Another (related) point is that it's currently very hard and slow to code/test/bench the pathfinding code. Currently, making changes to the pathfinder is an all-or-nothing thing... I would propose to consider moving all pathfinding code into a static external lib, with a clear and nice interface, but with much more flexibility under the hood, such as the ability to select/tweak pathfinder algos at runtime. (See the reference pathfinding library "abstraction" https://code.google.com/p/hog2/ and how researchers use it to test code: https://code.google....n%2Ftrunk%2Fsrc) Then we could add a 2D testing/dev wxWidgets app that uses it. It would be a top-down 2D grid view, with real-time pathfinding technique selection and tests, scenario loading, map loading, benchmarks, etc.
(Here's an example of what I'm thinking about for the gui and capabilities: http://www.cgf-ai.co...acastarexplorer ) Once we have the pathfinding code as a more flexible lib, plus a testing app, it's a matter of adding algos to the lib, then experimenting and discussing algos that everyone can test. (We could also experiment with threading/scheduling pathfinding there.) Once everyone agrees, we just make the algo the "default" in 0ad, minimizing breaking changes but still allowing them. So adding/testing any other technique (or formations, or tactical reasoning) would just mean adding it to the pathfinding lib, then testing/benching it against all the scenarios we consider mandatory for the pathfinder to solve (we have to build a nice repository of maps/scenarios/cases demonstrating the needs), adding it to svn, and then launching a discussion thread.
  6. For short-range "pathfinding", where performance is the most lacking, especially with lots of moving units, we should rather target "crowd" techniques. Those techniques revolve around "collision/obstacle avoidance". Here are the most relevant papers: http://gamma.cs.unc....esearch/crowds/ http://grail.cs.wash...ts/crowd-flows/ Here's a nice implementation of local obstacle avoidance, with lots of posts and links about it: http://digestingduck...-search-results On formations, a nice idea/feature is "formation sketching", which would give nice "commander/strategic" capabilities: http://graphics.cs.u...on-preprint.pdf
  7. OK, got the point. It's now documented in the forum why mozilla-js rather than v8. @alpha123: agreed. I just want to make sure alternatives are known/considered. Docs, support & community are a big part of a reliable 3rd-party library; better to change sooner than later and have more work to do at upgrade time... I didn't know that js code could be SpiderMonkey-specific. Perhaps "polyfills" (like es5/es6 polyfills) could help there? Not sure about getting the same perf, as to my understanding the mozilla-js optimisations were mostly "tracing" (spotting and optimising hot paths) while v8's were more "compilation" (type inference) oriented. @Yves: I know I'm late. I just wanted to be sure that alternatives are considered (especially when hitting a wall). Yeah, the GDB thing is just stack trace naming; still useful though, notably in cpu profilers that get lost in the js stack... My secret point is that mozilla-js is really a pain to get running, especially under win64... I still haven't succeeded here (I would like to be able to release a 64-bit exe, much faster, notably on the fixed sqrt calls...)
  8. "Parsing using regex" was a bad way to describe it, sorry. It's more of a "preprocessor": you find #define, #pragma, #include inside the glsl and replace them with values, the content of other glsl files, etc. Here's an example of glsl include support using boost, which shows what I meant. I do agree that real parsing and compiling is not really useful at runtime, only for offline tools like aras_p's glsl optimizer.
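To make the "preprocessor" idea in post 8 concrete, here is a minimal sketch of include expansion. It is not the boost example referred to above and not 0ad code: the names `ShaderSources` and `ExpandIncludes`, the in-memory map standing in for shader files, and the nesting limit of 16 are all assumptions made for illustration, and it uses std::regex rather than boost.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical in-memory "file system": include name -> GLSL source text.
using ShaderSources = std::map<std::string, std::string>;

// Replaces every line of the form  #include "name"  with the text of the named
// source. Nested includes are expanded recursively; the depth check is only a
// crude guard against include cycles.
std::string ExpandIncludes(const std::string& source, const ShaderSources& files, int depth = 0)
{
    assert(depth < 16 && "include nesting too deep (cycle?)");
    static const std::regex includeLine(R"re(^\s*#include\s+"([^"]+)"\s*$)re");

    std::istringstream in(source);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line))
    {
        std::smatch m;
        if (std::regex_match(line, m, includeLine))
        {
            const auto it = files.find(m[1].str());
            if (it != files.end())
            {
                out << ExpandIncludes(it->second, files, depth + 1);
                continue;
            }
            // Unknown include name: fall through and keep the line untouched,
            // so the GLSL compiler reports it.
        }
        out << line << '\n';
    }
    return out.str();
}
```

The same line-by-line loop could also rewrite #version or inject #define lines before the source is handed to the driver; that part is left out here to keep the sketch short.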
  9. I guess this has been covered a lot (couldn't find it in forum search, though), and I'm very late here, but why not the V8 js engine ( https://code.google.com/p/v8/ )? There's a bigger community around it than around mozilla js afaik, lots of apps, docs & tutorials (http://athile.net/library/wiki/index.php?title=Library/V8/Tutorial), nicer docs ( https://developers.google.com/v8/embed ), and it also has built-in profiling/debugging capabilities (even gdb support https://code.google.com/p/v8/wiki/GDBJITInterface or, with some work, webkit/chrome dev tools like nodejs did here https://github.com/dannycoates/node-inspector ). Some nice working examples of c++ interfacing already exist: https://code.google.com/p/nasiu-scripting/ (that one got me with the "std::vector covered"), and the persistent case is indeed covered ( https://developers.google.com/v8/embed#dynamic ).
  10. If everyone agrees, I would propose creating small, simple steps/tickets, so that people know where things stand and can start contributing without colliding. Here's a modest proposal of tickets that could be created, in order and with dependencies:
      1. Wipe all non-shader and ARB code.
      2. These could be done somewhat in parallel with each other ("somewhat" because using svn instead of git is a pain... for any merging):
         - Get rid of all deprecated OpenGL immediate-mode calls (they are deprecated, and removing them is mandatory for OpenGL ES support), turning them into vbo calls (yes, even drawing a texture using a quad; it should lead to faster rendering).
         - Remove the current SDL "os window" handling and handle it directly. (This makes 0ad able to select different OpenGL profiles.)
         - Get rid of fixed-function glmatrix calls (deprecated, and removing them is mandatory for OpenGL ES and OpenGL 4.0 support); we already compute most matrices anyway (in selection/pathfinding/etc.). It's just a matter of using uniforms for those matrices (worldmatrix, worldviewmatrix, modelmatrix, etc.; note that discussing/defining a uniform naming scheme so that all shaders share the same names would ease things there, see next point).
         - Add GLSL high-level handling code: parsing glsl using regex to get 0ad #defines, #pragma, etc. (handle debug lines, #import/include pragmas to concatenate files and make the glsl code much more DRY, change #precision and #version pragmas at runtime, add #defines at runtime, etc.), and add the ability to reload/validate shaders (faster shader debugging/coding). The idea is to be able to have a shared, reusable glsl code library (easier to maintain, smaller codebase).
      A very good tool for those steps is the gremedy OpenGL profiler/debugger, as its analyzer gives a nice and precise list of deprecated calls per frame or per run (and lots of other nice OpenGL helpers).
      Once 1 and 2 are done, a much easier next move would be:
      3. Total simulation2/graphics separation using command buffers. In 0ad's case, we could do it at a higher level than OpenGL commands: that would be something like taking advantage of "renderSubmit" and the list of submitted render entities, which would end up being the "command buffer" given to the graphics/render worker thread (faster rendering as soon as 2 cores are available, which is pretty standard nowadays).
      4. Add new renderers: different OpenGL profiles, OpenGL ES, debug/test mock renderer, deferred, forward, etc. (The hidden cost here is defining a scheme to handle per-renderer *materials/shaders* in a nice way; deferred and forward don't use the same shaders.)
  11. Just stopping by to list a nice article on stack allocation: http://geidav.wordpress.com/2013/03/21/anatomy-of-dynamic-stack-allocations/
  12. I think that if you carefully project all frustum corner points (including the near-plane corner points) onto the 2D "terrain" plane, you then get the bounding rectangle containing all projected frustum points, thus preventing any possible popping in front. In fact, it's rather on the conservative side, drawing a bit more than needed. The worst case would be non-visible terrain tiles and small objects being submitted for rendering when the camera angle is near an FPS view, but that's a "rare" use case anyway.
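The projection described in post 12 could look roughly like the sketch below. It assumes that "projecting onto the terrain plane" means dropping the height coordinate, that y is the up axis, and that the 8 corners are already available in world space; `Vec3`, `Rect2D` and `ProjectFrustumToGround` are illustrative names, not 0ad types.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };                   // world-space point, y is up (assumption)
struct Rect2D { float minX, minZ, maxX, maxZ; };  // axis-aligned rectangle on the ground plane

// Drops the height of the 8 frustum corners (4 near + 4 far) and returns the
// axis-aligned rectangle that contains all of them. Because the frustum is the
// convex hull of its corners, everything inside it also lands in this rectangle.
Rect2D ProjectFrustumToGround(const std::array<Vec3, 8>& corners)
{
    Rect2D r{corners[0].x, corners[0].z, corners[0].x, corners[0].z};
    for (const Vec3& c : corners)
    {
        r.minX = std::min(r.minX, c.x);
        r.minZ = std::min(r.minZ, c.z);
        r.maxX = std::max(r.maxX, c.x);
        r.maxZ = std::max(r.maxZ, c.z);
    }
    assert(r.minX <= r.maxX && r.minZ <= r.maxZ);
    return r;
}
```

The resulting rectangle is what would be handed to a 2D range query; as said above, it is conservative and grows a lot when the camera looks towards the horizon.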
  13. Thanks for the graph link. It's definitely needed on the wiki. Can I at least copy/paste it there, even if it's not exhaustive? That would help when searching for it. In CCmpRangeManager: ExecuteQuery, ResetActiveQuery, GetEntitiesByPlayer, etc. All those methods pass std::vector to and from js, and are called very frequently. It does show here with the "Very Sleepy" profiler: js-related string, malloc and free calls are among the top calls. Not that it beats the huge "EconomyManager.prototype.buildDropsites" perf gap in the aegis bot, but memory fragmentation is the reason for 0ad's overall slowdown over time.
  14. We definitely need a discussion thread? Here's another nice c++11 Great Three part list. Rewriting/copying web pages and Scott Meyers' books might make it tedious to read, so that in the end it doesn't get read; that's why I went for just listing strict "do that or do not commit"-like guidelines. Perhaps we can do a listing + other wiki pages explaining each item. I would go for guidelines + example + link to an in-depth wiki page?
  15. A first step would be to use the current spatial code for culling, just projecting the 3D camera frustum onto the 2D terrain and calling getRange on that. (It would be much faster than doing 3D culling against all frustum planes, and would let us reuse the same current 2D code.) Imho, perf-wise, the current spatial algo's problems are: 1: duplicates: better to have each entity in one and only one tile, thus removing the very costly sort/uniq in getRange. 2: contiguous memory: better to have a single vector for the whole structure, rather than vectors of vectors. 3: getRange allocating a std::vector on the heap each time. All of those can be solved in the current spatial code, with some simplifications, but they must be addressed in some way in a new partitioning scheme: 1: inserting based on the entity center or a point. 2: an algo to rearrange a subpart of the huge single vector when adding/removing, keeping each tile in a contiguous range (sparse vector). 3: passing the range vector as a parameter, kept as a static/member of the class that makes the call. The clear advantage of a tree is that it would nicely solve the problem with the rect-to-point simplification, which could make huge structures be ignored when they sit on the border of a range query (depending on its aabox size, an object stays on higher quadtree nodes instead of ending up in the leaves). An octree is certainly overkill: lots of memory and lots of branching per node for next to no benefit in a 2.5D game, and it rules out using the same code for range & pathfinder (using CFVector2D) and culling (an octree would need to use 3D vectors). A quadtree would perhaps give some perf improvements. Note that those algos are easy to implement and test if you agree on keeping the same interface as the current spatial code, and once the spatial code is also shared with frustum culling, it's just a matter of subclassing. I would even add a kd-tree, geohash and Hilbert curve to the tests, and even test them separately (a kd-tree would be very fast for static obstacles (costly add/remove, but very fast queries), and a loose quadtree would be very fast for moving units).
Btw, on another performance improvement topic: following the current code, does anyone know which requirement makes it re-compute the local AABox for animated+prop objects? Couldn't we use the object's static aabox, not taking animation and/or props into account? It's not as if we need that much precision? (It's very cpu intensive to do all the vertex transformations, especially if you don't need them at all CPU side, when gpu skinning is enabled.) If it's really needed in some cases (a mechanical crane?), could that be made optional for those cases?
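The three points of post 15 (one tile per entity, a single contiguous vector, a caller-owned result vector) can be sketched as follows. This is not the current 0ad spatial code: `Entity`, `FlatGrid`, `Build` and `GetInRange` are hypothetical names, and instead of the incremental rearranging suggested above, the sketch simply rebuilds the whole structure with a counting sort, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Entity { std::uint32_t id; float x, z; };  // position is the entity center

class FlatGrid
{
public:
    FlatGrid(int tilesPerSide, float tileSize)
        : m_N(tilesPerSide), m_TileSize(tileSize),
          m_Start(static_cast<std::size_t>(tilesPerSide) * tilesPerSide + 1, 0)
    {
        assert(tilesPerSide > 0 && tileSize > 0.0f);
    }

    // Rebuilds the structure with a counting sort: every entity lands in exactly
    // one tile, and all ids live in one vector ordered by tile index.
    void Build(const std::vector<Entity>& entities)
    {
        std::fill(m_Start.begin(), m_Start.end(), 0);
        for (const Entity& e : entities)
            ++m_Start[TileOf(e.x, e.z) + 1];
        for (std::size_t i = 1; i < m_Start.size(); ++i)
            m_Start[i] += m_Start[i - 1];

        m_Ids.resize(entities.size());
        std::vector<std::size_t> cursor(m_Start.begin(), m_Start.end() - 1);
        for (const Entity& e : entities)
            m_Ids[cursor[TileOf(e.x, e.z)]++] = e.id;
    }

    // Fills 'out' with the ids of all entities in tiles touched by the rectangle.
    // 'out' is cleared, not reallocated, so a caller-owned vector keeps its
    // capacity between calls. No sort/unique is needed since ids are never duplicated.
    void GetInRange(float x0, float z0, float x1, float z1, std::vector<std::uint32_t>& out) const
    {
        out.clear();
        const int tx0 = Clamp(x0), tx1 = Clamp(x1), tz0 = Clamp(z0), tz1 = Clamp(z1);
        for (int tz = tz0; tz <= tz1; ++tz)
        {
            // Tiles tx0..tx1 of one row are adjacent in m_Ids: copy them as one range.
            const std::size_t row = static_cast<std::size_t>(tz) * m_N;
            const auto first = static_cast<std::ptrdiff_t>(m_Start[row + tx0]);
            const auto last = static_cast<std::ptrdiff_t>(m_Start[row + tx1 + 1]);
            out.insert(out.end(), m_Ids.begin() + first, m_Ids.begin() + last);
        }
    }

private:
    int Clamp(float v) const
    {
        return std::min(m_N - 1, std::max(0, static_cast<int>(v / m_TileSize)));
    }
    std::size_t TileOf(float x, float z) const
    {
        return static_cast<std::size_t>(Clamp(z)) * m_N + Clamp(x);
    }

    int m_N;
    float m_TileSize;
    std::vector<std::size_t> m_Start;   // m_Start[t]..m_Start[t+1] is tile t's range in m_Ids
    std::vector<std::uint32_t> m_Ids;   // all entity ids, grouped by tile
};
```

The query works at tile granularity, so the caller still has to do the exact distance or rectangle test on the returned ids. Whether a full rebuild per turn beats incremental updates is exactly the kind of question a test/bench app would answer.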
  16. A note/request while you're on the js-c++ interface: with the current code, we cannot make any method with const or const& parameters, and many of them are strings or std::vector, thus a lot of unnecessary allocations and copies are made on many js/c++ calls.
  17. @enrique: Once inside the 3D viewer, try the bottom "chained link" icon and click on "embed" in the popup; it gives you a bbcode you can just insert in this forum's text area (click on the "switch" in the text area toolbar). Paste the bbcode they generated, and that gives the nice image+link to the viewer, which automatically gives this for your model: sketchfab test.blend (click to view in 3D) @sanderd17: it seems that using dae directly from the repository would need hacking one of the exporter scripts to upload/sync all models painlessly (and resolve texture paths on the fly).
  18. There's a lot of great, very high quality 3D modeling in 0ad and mods, but it's hard to really view and contemplate each model (or comment on it, etc.). It would be great if we could see them in 3D, and even better if they could show up inside forum/blog posts. A simple way to do that would be uploading your work to www.sketchfab.com (free; dae is supported out of the box in the upload form, and there are also exporter plugins for 3D software), and then just copy/pasting the bb embed link auto-generated by sketchfab into the forum, which gives: maur_civic_centre.dae (click to view in 3D) (edit: changed to sanderd17's uploaded model as demo) Best would be to go the way other forums did and allow iframes from sketchfab, making it possible to embed it directly in the forums (no new window), but that needs forum admin work ( https://sketchfab.com/faq#embed ). It would be nice for all the glorious art discussing/sharing/viewing. (It could also lead to nice art galleries on blogs/sites, by civilisation, by artist, by mod, etc.) (You need a good browser with webgl enabled for a good 3D view, like chrome or firefox.)
  19. @Thanduel: maybe we could go for a new discussion thread on C++11 usage? It has many benefits and is supported across all the latest compilers. ( http://c0de517e.blogspot.fr/2013/05/integrating-c11-in-your-diet.html ) @redfox: indeed, a lot of the patches I posted did just that, converting pass by value to pass by reference in hotspot zones. Pathfinding is but one example that emphasizes the complexity of not using pointers and not handling memory ourselves, leading to complex memory fragmentation and lots of hidden object copies. There's a lot of other code around with a std::vector allocated on the stack inside a method, but still reserving memory on the heap, all of it freed at the end of the method call. Note that there are also a lot of new/delete calls we surely can avoid. What's the idea on them, can we patch that? Allocating at the maximum possible size, and reusing them as much as possible? (inside calculateTerritories, los texture computation, lots of grid<> and sparse grids, etc.) And there's also the free/malloc in the profiler that seems mostly related to js. Not sure what we can do there. Is passing a const string& from a c++ to a js method even possible?
  20. @quantumstate: sry, fixed the stack/heap inversion. As you said, the computeShortPath containers are themselves allocated on the stack, but their content is on the heap, and the containers are removed from the stack at the end of the method call. Then the heap "allocation space" slot must be returned for general "malloc/free" usage, and therefore might be filled before the next "reserve" or "push_back" call. (And here the A* priority queue can allocate inside the loop too.) The point is that pool allocation does guarantee that we use the same spot, that if it's allocated big enough at start it has a good chance of being contiguous, and it also has the benefit of saving a lot of copy constructor calls (copy from stack allocated to heap allocated, and then to another edge list, etc.). Here on windows, free/new/delete[] and lots of internal stl calls (push_back) are at the top of the profiler's list (Very Sleepy profiler). ( https://docs.google....dit?usp=sharing ) Memory fragmentation is my explanation for the slowdown over time, but I don't know the code as well as you do, and perhaps it is caused by something else? @wraitii: removed the I. I linked it from code quality in order to avoid a too huge page, but I can merge it. I'll try to do a detailing pass on the descriptions, but I was afraid it would look like a copy/paste from the linked sites.
  21. @scroogie: I've re-uploaded all patches, after testing each of them once on linux (with "svn revert --depth=infinity . && svn patch perf.patch && make -C build/workspaces/gcc"), just to make sure. But I didn't get that error? (Wouldn't it be easier and less OT if you could post on the patch tracker page?)
  22. @scroogie: posted new patch(es), tested under linux, but I didn't get C++0x errors, just c++11 warnings in non-patched code? Created a specific memory performance thread here: http://www.wildfiregames.com/forum/index.php?showtopic=17310
  23. Following the performance thread and my performance patch/work, I'm creating this thread to discuss how to solve that specific problem. As a first, simplest step, we could agree on basic performance guidelines. The draft is open for corrections/additions/etc., but I feel it's important to have a set of rules guiding code quality regarding performance and memory. There are lots of web pages (gotw) and books (Scott Meyers' books, Effective C++, Effective STL) that list those. Now, as stated in the wiki page, the game slowing down over time is due to the current memory model (nearly everything is heap allocated inside containers), which implies a lot of new/malloc/delete/free during runtime, which even shows up in the profiler. The problem is called memory fragmentation. It gives slower perf over time, not only due to slower memory access, but also due to slower malloc/new (the time to find a free memory block of the optimal size grows over time), and it can lead to crashes ("heap allocation failed"). Once all code follows those steps, that will leave us with the current "direct allocation" scheme, which uses objects allocated on the heap rather than reusable object pointers. For instance, the pathfinding computeShortPath code fills 6 different std::vector<Edge>, making heap allocations for those Edges (0ad does not use any custom stl allocator), then copy allocations and copy operations along the way. That means, for a simple path with a minimum range of 16, a minimum average of 6*16*16=1536 allocations, and even more object copy operations, over a single A* step (that explains the memory fragmentation on its own). Using pointers, as in std::vector<Edge*>, would reduce those to allocations done by ourselves, and cut down all the "copy allocations", as each Edge allocated here is duplicated and stored at least in the main EdgeList and, depending on position, copied into EdgesLeft, EdgesRight, EdgesTop, etc. Using a memory pool, there would be no allocation at all (no slowing down over time).
That's why I recommend moving to pointers, and even better, memory pools. [last edit] s/stack/heap/
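As a rough illustration of the pointers + memory pool recommendation in post 23, here is a minimal sketch. The `Edge` layout, the `EdgePool` class and the `SplitEdges` helper are made up for the example and are not the actual computeShortPath code; the pool is a simple fixed-size block that is reset between calls, which assumes a known worst-case number of edges.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Edge { float x0, z0, x1, z1; };  // stand-in for the pathfinder's edge type (assumed layout)

// Fixed-capacity pool: one block allocated up front, slots handed out in order,
// everything made reusable again with Reset().
class EdgePool
{
public:
    explicit EdgePool(std::size_t capacity) : m_Storage(capacity), m_Used(0) {}

    // Copies 'value' into the next unused slot; no heap allocation happens here.
    Edge* Allocate(const Edge& value)
    {
        assert(m_Used < m_Storage.size() && "pool exhausted: size it for the worst case");
        m_Storage[m_Used] = value;
        return &m_Storage[m_Used++];
    }

    // Makes every slot reusable; the memory block itself is kept.
    void Reset() { m_Used = 0; }

    std::size_t Used() const { return m_Used; }

private:
    std::vector<Edge> m_Storage;  // never resized after construction, so pointers stay valid
    std::size_t m_Used;
};

// Hypothetical stand-in for sorting edges into EdgesLeft/EdgesRight: each edge is
// stored once in the pool, and the lists only hold pointers to it. The list
// vectors are owned by the caller, so clear() keeps their capacity across calls.
void SplitEdges(EdgePool& pool, const std::vector<Edge>& input, float splitX,
                std::vector<Edge*>& left, std::vector<Edge*>& right)
{
    left.clear();
    right.clear();
    for (const Edge& e : input)
    {
        Edge* p = pool.Allocate(e);
        (p->x0 < splitX ? left : right).push_back(p);
    }
}
```

A caller would keep one pool and the list vectors alive for the whole game (for example as members of the pathfinder component), call Reset() at the start of each path request, and after a warm-up no further heap traffic should come from this code path.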
  24. @scroogie: thx for trying. I made a note on the patch page about that, most notably about the lack of testing under linux. I'll do the linux test/compilation changes next weekend, along with separating the patch in three and adding a code style check. Sry for that.
  25. - Sorry about the network OT discussion; I was really just laying down clearly the conditions for changing from fp to float, and pointing out that determinism is a real complexity problem for software dev. - About memory and performance, is it best that we make another thread then? (List current memory hotspots? Vote on how to remove them?)