Posts posted by scroogie

  1. There are two kinds of concerns.

    The first is that some of our code gets used rarely (like random map code for a specific map or a specific trigger script) and it could cause unnecessary bloat to keep the compiled versions in memory. The other concern is that this setting is meant just for testing and it could have unwanted side-effects like keeping code that isn't accessible anymore. Running one non-visual replay worked quite well though. The memory usage was somewhere between the two graphs of v24 and v31+GGC.

    I should probably investigate a bit further and ask the SpiderMonkey developers. My concern is that they design the heuristics for keeping JIT code just for the use case of Firefox and not for a case like ours where most of the code should be kept.

    If the memory consumption is between the old and the new version, that would be of no concern, would it? Even v24 was well below 100 MB most of the time, if I'm not misreading your graph. Or are there many complaints about 0 A.D. being memory-hungry?

  2. I did a new profile on r15552 using a replay of a match with 1 human and 3 Petra bots, 20713 turns.

    Time spent in modules:

    55.9% pyrogenesis

    32.3% libmozjs (javascript)

    6.3% libc

    4.1% libstdc++

    0.7% libnspr4

    The previously found (minor) hotspots are gone; the pathfinder appears more prominently now, but there were also a lot more units in the game. Looking at the subtree analysis, CCmpPathfinder::ProcessShortRequests() takes 46.4% and CCmpPathfinder::ProcessLongRequests() takes 15.2% (of course relative to the 55.9% spent in pyrogenesis). The only other subtree that appears prominently is BroadcastMessages.

    Looking closer:

    - in BroadcastMessages, the cost is mainly caused by the RangeManager (GetNear and UpdateVisibilityData) and by CCmpPosition::MoveAndTurnTo (itself due to posting messages)

    - in ComputeShortPath, the most costly subtrees are the EdgeSort and CheckVisibility functions (there is apparently a huge number of edges to iterate over)

    - in the long-range pathfinder, CalculateHeuristic and DistanceToGoal call isqrt64 an awful lot

    So actually, there is nothing new, except that the situation has improved, imho. Apparently there is no low-hanging fruit anymore. :/

    I'll keep the results, in case someone is interested.
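Since the isqrt64 calls in CalculateHeuristic and DistanceToGoal stood out, here is a minimal sketch of the usual workaround: when a distance is only compared against a range, compare squared distances and skip the square root entirely. The names (DistSq, WithinRange) are made up for illustration and are not pyrogenesis APIs:

```cpp
#include <cassert>
#include <cstdint>

// Squared distance between two points. For typical map coordinates the
// product comfortably fits in 64 bits.
static inline int64_t DistSq(int32_t x0, int32_t y0, int32_t x1, int32_t y1)
{
    const int64_t dx = int64_t(x1) - x0;
    const int64_t dy = int64_t(y1) - y0;
    return dx * dx + dy * dy;
}

// "Is (x,y) within 'range' of the goal?" without any integer square root:
static inline bool WithinRange(int32_t x, int32_t y,
                               int32_t gx, int32_t gy, int32_t range)
{
    return DistSq(x, y, gx, gy) <= int64_t(range) * range;
}
```

Note that this only helps for threshold checks; a heuristic that gets added to linear path costs still needs an actual distance (or a sqrt-free approximation such as an octile estimate).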

  3. The thing with performance improvements is that they often only change performance by a small margin, say 1%, but if you do that in a hundred commits, it adds up. :)

    And I've often seen it work like that: you benchmark/profile, address one issue for a 1% gain, then you benchmark/profile again and address the next hotspot, and so on.
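To make the compounding concrete: treating each commit as an independent 1% throughput gain, a hundred of them multiply to roughly a 2.7x overall speedup (close to e, since (1 + 1/n)^n approaches e). A trivial sketch, illustrative only:

```cpp
#include <cassert>
#include <cmath>

// Overall speedup factor from 'commits' changes, each improving
// throughput by 'perCommit' (e.g. 0.01 for 1%).
double CompoundGain(double perCommit, int commits)
{
    return std::pow(1.0 + perCommit, commits);
}
```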

    But anyway, this is actually quite off-topic here. ;)

    Just out of curiosity, what *are* the actual disadvantages of doing the move? I think even on LTS distributions like RHEL it's no problem to install e.g. GCC 4.7 in case it's required.

  4. You can read more about issues and possible solutions in countless papers. I'd suggest these as a starter:

    HPA* (Near-Optimal Hierarchical Pathfinding): http://www.cs.ualberta.ca/~mmueller/ps/hpastar.pdf

    It explains the idea and implementation of long-range / short-range pathfinding.

    Jump-Point Search: https://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3761/4007

    The algorithm that was chosen for the (incomplete) rewrite.

    Apparently improvements will soon be published: http://www.nicta.com.au/pub?id=7771

    As mentioned in the paper, it could potentially be combined with swamps as well: http://leibniz.cs.huji.ac.il/tr/1188.pdf

    Another approach that seems to be getting a lot of attention is SUB: http://idm-lab.org/bib/abstracts/papers/icaps13.pdf
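For readers new to the topic, the "long-range" search that HPA* accelerates is plain A* run on an abstracted cluster graph instead of the raw grid. As background, here is a minimal 4-connected grid A* with a Manhattan heuristic, purely as an illustration (not 0 A.D. code):

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <queue>
#include <vector>

struct Node { int f, g, idx; };
struct Cmp { bool operator()(const Node& a, const Node& b) const { return a.f > b.f; } };

// Returns the path length in steps, or -1 if unreachable.
// grid: row-major, 0 = free, 1 = blocked.
int AStar(const std::vector<int>& grid, int w, int h,
          int sx, int sy, int gx, int gy)
{
    auto idx  = [w](int x, int y) { return y * w + x; };
    auto heur = [&](int x, int y) { return std::abs(x - gx) + std::abs(y - gy); };
    std::vector<int> best(grid.size(), INT_MAX);
    std::priority_queue<Node, std::vector<Node>, Cmp> open;
    open.push({heur(sx, sy), 0, idx(sx, sy)});
    best[idx(sx, sy)] = 0;
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    while (!open.empty()) {
        Node n = open.top(); open.pop();
        int x = n.idx % w, y = n.idx / w;
        if (x == gx && y == gy) return n.g;
        if (n.g > best[n.idx]) continue;            // stale queue entry
        for (int d = 0; d < 4; ++d) {
            int nx = x + dx[d], ny = y + dy[d];
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
            int ni = idx(nx, ny);
            if (grid[ni] || n.g + 1 >= best[ni]) continue;
            best[ni] = n.g + 1;
            open.push({best[ni] + heur(nx, ny), best[ni], ni});
        }
    }
    return -1;
}
```

HPA* runs exactly this kind of search, but on a much smaller graph of cluster entrances, and only refines the corridor it actually needs.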

  5. Hi Yves,

    let me say thanks for your tremendous effort on this; it's always a pleasure to read about your progress!

    KeepJitCode was the option where JIT code is not garbage collected, right? Do the Mozilla devs plan to make this safe in ESR31? I honestly don't know how this could work with a moving GC.

    @FeXoR: To the contrary, all ESR31 versions are actually faster. And quite substantially as well (biggest improvement seems to be around 18% by visual judgement).

  6. If I may say so as an outsider, I think the buildings look fantastic, you Art guys are really doing a splendid job. :)

    What's described as "cartoonish" here could perhaps really be the colors / contrast / saturation. Maybe some of the textures are too perfect and could use some muckiness, and others could be toned down a bit. But I really think it looks great. You certainly know better than me, just my 2 cents.

  7. Wow, these are really great. The run animation also seems very realistic compared to what can be found on YouTube.

    Although in that fast hunting run, everything is a bit more emphasised, which makes it look more powerful, e.g. the hind legs seem to jump further to the front and there is larger muscle movement. Nevertheless, I think the level of detail is already great. Btw, it's amazing to see how this comes together. The art section is really a pleasure to watch. You are all doing great work.

  8. Entities are 2D bounding boxes (since they can be buildings, units, flora, fauna...).

    CCmpRangeManager deals with points rather than bounding boxes, so it only has to put entities into a single cell (except for the rare cases where an entity is precisely on the edge between cells, which we could avoid by being more careful about the edge conditions).

    So I'm a bit confused now. I guess entities have an extent, but the RangeManager ignores it (which is what I meant to ask)? Does the range vary very much or is it quite uniform?

    The PFor implementation isn't entirely horrible - though it has quite a few concurrency issues. We don't use OpenMP because we already implement pthreads and there is literally no useful cross-platform library with a PFor. The patch already implements thread pools if you didn't notice, so I wouldn't worry about the implementation efficiency.

    Ah, yes, I didn't see the static initialization. I'm still not sure it's nicer than using OpenMP, though. The latter would also allow tasks (if 3.0 were an option), which might even be better suited sometimes. But anyhow, I would still tend towards the component-wise solution.

    If we push the code into a worker thread - something that works in background to the whole engine, we will definitely reduce the time taken, even on single core CPU's. However, we have to account for situations where calculating ranges takes more than 1 turn and how that will affect the simulation state. We will probably have to wait out the results from previous turn regardless. All in all, it might be worth the shot and it would have less concurrency issues than PFor.

    Yes. Regarding the issue of waiting for the results, it's basically a decision between blocking the whole game to wait for a result or living with an old result for some time. Also, you gain some time that is currently unused because of the sequential processing. If we take the AI again as an example, it could already crunch away while the engine is busy with other stuff. It's in no way trivial, but I guess maintaining a bunch of parallel sections isn't trivial in the long run either?
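As a sketch of that "consume last turn's result" idea, here is a toy turn loop that kicks off the range computation asynchronously and only reads back the previous turn's result, using std::async as a stand-in for a real worker pool. All names are illustrative:

```cpp
#include <cassert>
#include <future>
#include <vector>

// Placeholder for the expensive range computation.
std::vector<int> ComputeRangeQueries(int turn)
{
    return std::vector<int>{turn, turn + 1};
}

// Runs 'turns' simulation turns; each turn launches the range computation
// in the background and consumes the result of the *previous* turn, so the
// engine never blocks mid-turn waiting for it.
std::vector<int> RunTurns(int turns)
{
    std::future<std::vector<int>> pending;
    std::vector<int> lastResult;
    for (int turn = 0; turn < turns; ++turn) {
        if (pending.valid())
            lastResult = pending.get();  // previous turn's ranges
        pending = std::async(std::launch::async, ComputeRangeQueries, turn);
        // ... the engine would do other simulation work here ...
    }
    return pending.get();                // drain the final computation
}
```

The trade-off is exactly the one described above: consumers see range data that is one turn stale, in exchange for never stalling the turn on the computation.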

    Anyway, I digress. I hope you don't mind if "externals" chime in; it's an interesting topic. :)

  9. Hi RedFox,

    sorry, I don't remember: does the RangeManager treat entities as points or as having some radius?


    I just had a look at the parallel-for patch. Don't you think that is a quite painful way of doing multithreading in games? First, I'd assume it would be easier to use some existing implementation with thread pools etc., like even OpenMP. Secondly, I'd think the easiest (and most common, afaik) way of doing it is putting one component completely in a separate thread, like the AI. That way you don't need to make sure that every called method is thread-safe etc., which will be horrible to maintain over time, at least in my experience.

  10. Hi RedFox,

    this is what I meant by the cost of getting items into exactly the state you need for SP, but so be it. I'm quite sure your work was still valuable! For example, your testbed implementations. Could that code be used by others who want to experiment with the RangeManager? E.g. did you isolate parts so that debugging or performance measurement is easier? I guess that would already help very much.


  11. Just to add to sanderd's good explanations: if you look at the short-range pathfinder, it is basically a points-of-visibility algorithm with high-precision obstruction edges connecting corners. I think there was also an article in a GPG issue that explained PoV pathfinding. Maybe it helps to understand the idea behind it. Apart from the mismatches between the two systems, the problem seems to be that it is simply too expensive.

    I don't know how well a NavMesh approach will fare, especially because of the modifications of the obstructions, online replanning, and unit clearance (i.e. point 3), but it's definitely worth a try. I assume you'd need some additional subdivision to floodfill (or however you build the navmesh) only parts of the map.
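For anyone unfamiliar with the points-of-visibility idea mentioned above: two corners "see" each other iff the segment between them crosses no obstruction edge, so the core primitive is a segment intersection test. A minimal integer version, illustrative only (not the engine's actual CheckVisibility):

```cpp
#include <cassert>
#include <cstdint>

// Signed area test: which side of line o->b does point a lie on?
static int64_t Cross(int64_t ox, int64_t oy, int64_t ax, int64_t ay,
                     int64_t bx, int64_t by)
{
    return (ax - ox) * (by - oy) - (ay - oy) * (bx - ox);
}

// True if segments p1-p2 and q1-q2 properly cross (strict intersection;
// shared endpoints or collinear touching do not count as crossing).
bool SegmentsCross(int64_t p1x, int64_t p1y, int64_t p2x, int64_t p2y,
                   int64_t q1x, int64_t q1y, int64_t q2x, int64_t q2y)
{
    int64_t d1 = Cross(q1x, q1y, q2x, q2y, p1x, p1y);
    int64_t d2 = Cross(q1x, q1y, q2x, q2y, p2x, p2y);
    int64_t d3 = Cross(p1x, p1y, p2x, p2y, q1x, q1y);
    int64_t d4 = Cross(p1x, p1y, p2x, p2y, q2x, q2y);
    return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
}
```

A visibility query then tests the candidate segment against every nearby obstruction edge, which is exactly why the edge count dominates the short-range pathfinder's cost.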

  12. Yes, I know, but what does this change? A sweep is basically the most trivial algorithm one can come up with. It's just traversing all items in the order you keep them. The more interesting question is how to get them into that state, and that's where other algorithms enter the equation. Do you always sort all items, or do you keep them in groups, etc.? I'm afraid that's what all algorithmic geometry is about. ;)
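To illustrate the point: in a classic sweep, the sweep itself is a single linear pass, and all the real cost sits in preparing (sorting) the events first. A small, self-contained example counting the maximum number of simultaneously overlapping intervals:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Counts the maximum number of intervals overlapping at any point.
int MaxOverlap(std::vector<std::pair<int, int>> intervals)
{
    std::vector<std::pair<int, int>> events; // (coordinate, +1 open / -1 close)
    for (const auto& iv : intervals) {
        events.push_back({iv.first, +1});
        events.push_back({iv.second, -1});
    }
    // The expensive part: O(n log n). At equal coordinates, closes (-1)
    // sort before opens (+1), so touching intervals don't count as overlap.
    std::sort(events.begin(), events.end());

    // The trivial part: one O(n) pass over the prepared events.
    int active = 0, best = 0;
    for (const auto& e : events) {
        active += e.second;
        best = std::max(best, active);
    }
    return best;
}
```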
