Yves
WFG Retired · Posts: 1,135 · Days Won: 25
Everything posted by Yves

  1. The PPA repository is updated: https://launchpad.ne...fg/+archive/0ad The official Ubuntu repositories usually aren't updated (only for new versions of Ubuntu).
  2. Well done, transparency is very important when dealing with money!
  3. According to the comment, this is already meant to be a performance improvement: it just avoids looping through all entities when creating the collection. I didn't measure whether it makes a difference in practice. It will look like a mess in C++ too. We should only move things to C++ if we can prove that it makes an important difference. Even if we had entity collections in C++, we would still need to query them from JS, create temporary objects and so on. If this transition between JS and C++ happens too often, we could even end up with lower performance.
  4. I'm sorry, it looks like the big performance problem in "Shared Apply Entities Delta" was caused by replacing a for..in loop with Array.prototype.forEach, which loops through all the undefined indices between entity ids: if you have one entity with id 3000, it will scan 3000 indices. Now it's between 3 and 5 ms, which is actually still too much for a finished game, but probably not an issue for us right now. The "AI Script" section still uses way too much time, though. Entities isn't an array, it's an object: the entities are indexed by their id, which becomes non-consecutive as soon as an entity gets destroyed. Deleting entities can actually be a real performance problem, because the JIT compiler can't apply certain optimizations and a function calling delete won't be JIT compiled. A sketch of the difference is below.
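     A minimal sketch of the two iteration styles, with made-up entity data. forEach skips holes, but the engine still has to check every index from 0 to length - 1 for a hole, so the scan is O(length); for..in only visits keys that actually exist:

        // Sparse array: one entity, but length is 3001.
        var entities = [];
        entities[3000] = { id: 3000 };
        entities.forEach(function (ent) {
            // called exactly once (for index 3000), but the engine still
            // walked all 3001 indices to find it
        });

        // Object keyed by entity id: for..in visits only existing keys.
        var entitiesObj = { 3000: { id: 3000 } };
        for (var id in entitiesObj) {
            // visits exactly one key: "3000"
        }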
  5. Currently it's entirely in JS. Check binaries/data/mods/public/simulation/ai/common-api-v3/shared.js, somewhere around line 230. One reason why it takes longer is that at some point the AIs start spamming masses of soldiers. This function basically updates all entity collections for each changed property of each entity, using a slow for..in loop, roughly like the pattern sketched below.
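     A rough, hypothetical sketch of that update pattern; changedEntities, collections and updateEnt are illustrative names, not the actual shared.js identifiers:

        // Illustrative data; the real values come from the engine each turn.
        var changedEntities = { 3000: { position: true, health: true } };
        var collections = {
            idleWorkers: { updateEnt: function (id, prop) { /* re-check membership */ } },
            soldiers:    { updateEnt: function (id, prop) { /* re-check membership */ } }
        };

        // For every changed property of every changed entity, every
        // collection re-checks whether the entity still belongs to it.
        for (var id in changedEntities) {          // slow for..in over sparse ids
            var changes = changedEntities[id];
            for (var prop in changes) {
                for (var name in collections)
                    collections[name].updateEnt(id, prop);
            }
        }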
  6. Hmm, not sure how well saved games work at the moment. If it works, it's a very good idea for some other tests I need to do where the simulation behavior is slightly different: I can just save a game at turn 10000 and measure a short period, to keep the changes from affecting the result too much. Thanks for that idea.
  7. Definitely some very nice improvements, keep up the good work! How long did the game run in these graphs?
     One thing I'd like to point out is that currently the major bottlenecks are the AI and the pathfinding. You won't see any of these problems if you only test a few seconds in-game, but they bring the performance down from 60 FPS to about 3 FPS with 4 AI players, all graphics settings disabled and zoomed in as much as possible. The problem is that, because most players seem to play in singleplayer mode, the user experience will only improve significantly if we solve these problems too. Try something like this: set the game to run at "insane" speed and see how it performs after around 10000-15000 turns.

        ./pyrogenesis -quickstart -autostart="Oasis 04" -autostart-ai=1:qbot-wc -autostart-ai=2:qbot-wc -autostart-ai=3:qbot-wc -autostart-ai=4:qbot-wc

     I think the main reason is the shared script applying the entities delta: at this point there are so many entity changes in each turn that it takes forever. I've posted a graph showing that here. As far as I know, wraitii is already thinking about some changes to address this problem. I will finish the Spidermonkey upgrade first before getting into more trouble.
  8. I've updated to revision 13766 and did some cleanups to reduce the patch size from 23791 lines to 21858 lines. My plan is to split the patch up into several independent patches that can be tested and committed separately: working with such a huge patch has become a real pain, and I spend way too much time merging changes. The main changes in this patch are:
       • Updated to rev 13766.
       • Integrated stwf's changes. I had nearly all of his changes related to the sound manager and the scripting interface in my patch (since I needed them and they hadn't been committed yet); he has committed them in the meantime, so I could remove them from the patch.
       • Moved the changes from the qbot-wc folder to the aegis folder, because it was renamed.
       • Removed modifications of tabs and spaces that had been added because I forgot to disable a very annoying feature of Code::Blocks.
       • Merged the serialization changes by historicbruno. I haven't tested them yet, so they could be partially broken in the patch.
     EDIT: I've tested the random map generation again to check whether I can use h4writer's workaround, which requires far fewer code changes. I noticed that no workaround is required at all with the current version (v26); apparently they fixed the problems with arguments objects.

        ./pyrogenesis -autostart=alpine_lakes -autostart-random=123 -autostart-size=256

     v26 with the randInt1/randInt2/randFloat1/randFloat2 workaround:
        Load RMS: 7.03415 s
        Load RMS: 7.10919 s
        Load RMS: 7.07667 s
     v1.8.5 unmodified:
        TIMER| Load RMS: 19.958 s
        TIMER| Load RMS: 19.9252 s
        TIMER| Load RMS: 20.0246 s
     v26 unmodified:
        TIMER| Load RMS: 7.07735 s
        TIMER| Load RMS: 7.04358 s
        TIMER| Load RMS: 7.08466 s

     What a nice improvement, and what a pity that this is the only spot where v26 currently performs better than v1.8.5. I've even compared the minimaps on a screenshot: it really generates the same map. I replaced the attached patch with a new one without the random map workaround and with some other cleanups (now 21408 lines).
     Yes, that was a helpful "rediscovery": this profiling tool has been available for a long time, but everyone seems to have forgotten about it somehow.
     patch_26_v0.6.diff
  9. OK, here's the updated patch (still WIP and with many known problems). I'm currently using the mozilla-central version 142981:a71cedddadd1 with a small patch that is already included in newer versions of Spidermonkey (but those could contain other API changes). Check the step-by-step instructions from the previous patch; they should still apply (except for the Spidermonkey version used, of course). Spidermonkey patch to compile with GCC:

        diff -r a71cedddadd1 js/src/vm/Stack-inl.h
        --- a/js/src/vm/Stack-inl.h  Sat Aug 17 19:50:37 2013 -0700
        +++ b/js/src/vm/Stack-inl.h  Sun Aug 25 14:01:49 2013 +0200
        @@ -228,9 +228,12 @@
         uint8_t *
         InterpreterStack::allocateFrame(JSContext *cx, size_t size)
         {
        -    size_t maxFrames = cx->compartment()->principals == cx->runtime()->trustedPrincipals()
        -                       ? MAX_FRAMES_TRUSTED
        -                       : MAX_FRAMES;
        +    size_t maxFrames;
        +    if (cx->compartment()->principals == cx->runtime()->trustedPrincipals())
        +        maxFrames = MAX_FRAMES_TRUSTED;
        +    else
        +        maxFrames = MAX_FRAMES;
        +
             if (JS_UNLIKELY(frameCount_ >= maxFrames)) {
                 js_ReportOverRecursed(cx);
                 return NULL;

     patch_26_v0.3.diff
  10. I made some measurements concerning the topic we discussed today. We really need a better solution for applying entities delta. This was a 2vs2 AI game that ran for about 15000 turns.
  11. I've decided to get back to Spidermonkey issues. I uncommented the iteration order workaround and measured performance again. The simulation state at the end isn't exactly the same because the workaround is disabled, but it shouldn't differ too much: in the end we have 3120 entities with 1.8.5 and 3133 entities with the new version.
     Unfortunately, the results I got were bad. Before my simulation state fixes, we had a performance improvement over 1.8.5 at the end of the game in many functions (see previous measurements). This improvement is gone now, and I assume it was only due to bugs that caused the AI to develop worse. Now most functions have more or less the same performance graphs with 1.8.5 and the new version (v26 at the moment). There are some tiny differences where sometimes one version is better and sometimes the other, and there are a few peaks where v26 performs much worse; those are the ones I intend to fix.
     I have officially stopped expecting better performance from v26 now. The tracelogging graphs have shown that there aren't many places left where JIT compiling is broken for some reason, and the profiling has shown that there's currently only one place where we really get a significant improvement. We have seen that IonMonkey performs very well for the random map script, where it now takes about 1/3 of the time to generate the map. Unfortunately, after seeing so much other code that doesn't get improved, my only conclusion can be that this only applies to some very rare cases that are probably common in benchmarks but not in real-world applications. The random map script is much like a typical benchmark that does the same thing many, many times in a loop; the improvements that work in these cases apparently don't apply to more common real-world scenarios.
     All I try to achieve now is getting the same performance as 1.8.5 again. If there are any hints that there could still be a general problem somewhere in our code, I will of course investigate that. I will also help as best I can if the Mozilla people have questions or want to do more analysis with our code (I will also post the updated patch soon). The Spidermonkey upgrade is still required, even if it's only to make future updates possible and to avoid being stuck with an old, unsupported version. To get back to 1.8.5 performance it looks like only a few spots need to be fixed, mainly these:
       • ExecuteItems (already analyzed a bit and reported in bug 897926)
       • BuildNewDropsites (EDIT: this also calls map-module.js:39, so it's probably also bug 897926)
  12. I've spent the past weeks finding several causes of simulation state differences. As long as v1.8.5 and the new version don't produce the same simulation state at the end of the game, it's difficult to tell whether I've introduced any bugs, and the performance measurements could be affected too. At the moment I have found most of the possible differences, and in the test setup I use, both versions produce the same simulation state. There are small differences because some properties are set to false instead of being deleted, but they don't cause any difference in gameplay and only appear in the serialized state because I haven't adjusted the serialization function yet.
     I'm going to describe one of the most annoying differences, which was quite hard to analyze. I'm looking at entity collections, one of the core components of the AI. Entity collections are used a lot, so making them faster is important. They are essentially just objects containing other objects in numerically indexed properties, and they must support removal of entities. The main problems are:
       1. Using delete on object properties is not supported by IonMonkey, and the function or loop where it is used can't be JIT compiled.
       2. Iterating over sparse arrays (arrays with non-consecutive indexing) or using for..in to iterate over object properties is slow.
     What I tried first, to avoid deleting properties, was simply setting them to false; when iterating over the properties, I ignore those whose value is false. At first this seems a bit ugly and cumbersome, but fortunately most of it is hidden in the entity collection's forEach implementation. However, there's a very subtle difference that caused a lot of problems (and still isn't completely solved). The ECMA standard specifies that the iteration order of for..in is undefined, meaning you can't be sure you get the lowest index first or anything like that. Well, if the standard says the order is undefined, why do we depend on a specific order? We don't really depend on it; we just have to be sure it's the same order each time. For example, there are places where we just pick the first X entities and do something with them, such as assigning the first X idle workers to chop trees. We don't care which workers come first, but it can make a difference! At the moment Spidermonkey iterates in ascending order over numeric properties with indices smaller than a certain limit (I don't know exactly where that limit is); above it, iteration follows the insertion order of the properties, FIFO (first in, first out). The tricky bit is that the order is different if you add a property, delete it and add it again, compared to adding it, setting it to false and then setting it back to the same object (see the sketch a bit further below).
     There are two problems with this workaround:
       1. It must be applied to 1.8.5 and the new version to make the simulation states identical.
       2. It completely destroys the performance (about 18 times slower in this example). It adds peaks of several seconds and causes the graph to scale in a way that makes any other performance analysis impossible. I don't yet understand why it's so bad for performance; it iterates over the properties twice, but that should only make it about half as fast.
     Since it looks like I have to adjust the 1.8.5 version anyway, I could just as well add the whole change to avoid deleting properties and test whether 1.8.5 behaves the same way.
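     A minimal sketch of the iteration order difference, assuming both indices are above the dense-index limit mentioned above (runnable in the js shell; the exact output depends on the engine version):

        // delete + re-add: the key moves to the end of the insertion order
        var a = {};
        a[10000] = "x";
        a[20000] = "y";
        delete a[10000];
        a[10000] = "x";
        for (var k in a)
            print(k);    // with the behaviour described above: 20000, then 10000

        // set to false + reassign: the key keeps its original insertion slot
        var b = {};
        b[10000] = "x";
        b[20000] = "y";
        b[10000] = false;
        b[10000] = "x";
        for (var k in b)
            print(k);    // 10000, then 20000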
     Another approach would be to find all places where we depend on the iteration order of for..in and change the code to be explicit about which entities should be used. For assigning idle workers we could, for example, use the closest one instead of just picking an arbitrary worker. @h4writer: I wanted to fix the iteration order problem before posting the current patch, but unfortunately I couldn't come up with a good solution today. Do you have an idea how we could store entity collections more efficiently?
  13. I think we currently lack a concept of what the advantages and disadvantages of the different food resources are meant to be. It's nice to simply have more variety of food resources, but it should also have some impact on gameplay. In Age of Kings there was basically hunting for a boost at the beginning of the game and the berry bushes for a fast start; after that, these resources were more or less useless. Fishing was kind of an experiment, and whether the investment paid off in the end depended on how the game developed. The main food source was farms. All resources were limited and depleted after a while (you could rebuild farms, but it cost wood). The only infinite resource income was trade.
     In our case I feel like we're mixing and mashing these concepts together without a plan for what it should do to the gameplay. Making farms, berries and fish infinite decreases the value of trade. Trade is still useful because currently only food is an infinite resource, and the bartering rates become very bad if only food is used to barter for other resources. Fields now only support a limited number of gatherers (5), and you get food faster if you use fewer workers per field. The idea was to increase the value of territory and add a disadvantage to farms (you need a lot of territory to use them). In my opinion this doesn't have enough impact at the moment, because you need to expand for other resources anyway and have plenty of space to place your fields.
     This is related to the question about the way resources regenerate. In my opinion the sigmoid approach is too complicated for players and doesn't add much value. It could be different if, for example, you could set a "gathering policy" like "clearing" or "sustainable" and your workers would automatically use the resources according to this policy. If you have to manually micro-manage your units, that's too much IMO. Such a policy approach would be doable, but as long as we don't have the big picture it doesn't make sense. We're at a point where we should start approaching the final gameplay, and therefore we need such a "big picture" before we can tune little aspects like the resource regeneration algorithm.
     Some of the advantages/disadvantages of resources we could use:
       • Required space (you need to expand, and it's more difficult to defend your workers).
       • Required micro-management (we should generally avoid micro-management as much as possible, at least for the long-term resources; it could be part of the design for e.g. hunting at the beginning of the game).
       • Available at which stage? (Currently all resources are available from the beginning.)
       • Required number of workers.
       • What kind of workers are required (females are better for fields, but they can't be used for fighting).
       • Does it require a special resource or can it be used anywhere? (fish, berries, farmland etc.)
       • Can the resource be depleted?
       • Is the maximal gather rate limited? (I mean something like regenerating berry bushes compared to fields, where you can build as many as you want.)
       • How high is the initial cost to enable this kind of gathering? (Building a dock plus a fishing boat is quite expensive if you don't need the dock anyway.)
       • How easily can the enemy damage your economy? (What if, for example, we made fields much more expensive and vulnerable to fire arrows?)
     We could add some very interesting strategic aspects if we put more thought into the different sources of food and their pros and cons for the player.
  14. I bought Rome Total War yesterday (not Rome II). It's quite fun to play although the units have some difficulties with pathfinding in cities.
  15. I think it could be very difficult to make it look natural, and after thinking about it a bit more I'm not sure it's going to work even if you make the animations perfect. If you'd like to try it, that would certainly be interesting, but don't do it just because I suggested it.
  16. I've tested the bounds-check patch. The test script doesn't bail out multiple times anymore, and in shared.js:350 and shared.js:360 it only records two bailouts. Unfortunately that only seems to mask the real issue, because the performance is still as bad as before and the tracelogging stats don't improve either (most of the time is still spent in ion_compile or script baseline). Here's a graph of the profile section "Run Workers", which spends nearly all of its time in shared.js:360 (getMetadata) and shared.js:350 (setMetadata); a rough sketch of what these two functions do is at the end of this item.
     Tracelogging stats before the patch (filename, times called, times compiled, time total, ...):
        simulation/ai/common-api-v3/shared.js:350  16752  13  0.04%
           ion_compile: 75.03%, script ionmonkey: 20.44%, script baseline: 4.44%, script interpreter: 0.09%
        simulation/ai/common-api-v3/shared.js:360  10728  10  0.02%
           ion_compile: 18.50%, script ionmonkey: 0.04%, script baseline: 80.61%, script interpreter: 0.85%
     Tracelogging stats after the patch (filename, times called, times compiled, time total, ...):
        simulation/ai/common-api-v3/shared.js:350  5679  5  0.00%
           script ionmonkey: 3.54%, script interpreter: 3.03%, ion_compile: 53.20%, script baseline: 40.24%
        simulation/ai/common-api-v3/shared.js:360  14771  10  0.03%
           script ionmonkey: 17.34%, script interpreter: 0.69%, ion_compile: 2.29%, script baseline: 79.69%
     Some strange observations about the data:
       • Spidermonkey 25 takes about 5-6 times longer on average (not counting the peaks, just looking at the baseline).
       • "Run Workers" isn't a big fraction of the whole turn duration; we're talking about 1-2 ms, and you can see that it's pretty much the combined duration of getMetadata and setMetadata. Now, if the whole turn takes roughly 20 ms (rounded up), 1-2 ms is between 5% and 10%, not 0.3% or 0.6% as the "time total" column shows. I don't know what that column actually shows, but it's probably not very relevant to the real performance we get in the end.
       • The value in the "times called" column is a bit strange: our simulation is deterministic, so the functions should be called exactly the same number of times. Maybe it's related to inlining of the functions.
     EDIT: attached another ion.cfg (not directly related to this post): ion_entity-js-456.cfg.zip
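     For context, a hedged sketch of what setMetadata and getMetadata roughly do; the real implementations live in common-api-v3/shared.js, and the internal names here are assumptions. Per-entity AI metadata is kept in an object keyed by sparse entity ids, which is exactly the kind of structure that is hard on the JIT:

        // Illustrative only; not the actual shared.js implementation.
        var entityMetadata = {};

        function setMetadata(entId, key, value) {
            var meta = entityMetadata[entId];
            if (!meta)
                meta = entityMetadata[entId] = {};
            meta[key] = value;   // object keyed by non-consecutive entity ids
        }

        function getMetadata(entId, key) {
            var meta = entityMetadata[entId];
            return meta ? meta[key] : undefined;
        }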
  17. I've attached the log with zoom level 1000000. Your patch for bug 899051 is applied, but not the workaround for the typebarrier issue. I think it already looks much better, but I haven't measured the difference yet. I'm still figuring out the best way to send you the 5.6 GB of raw logs you requested. out6.html.zip
  18. How do you start the game exactly? Do you use the launcher in Unity or do you run it from the terminal? I remember an issue with relative paths if you start it the wrong way (don't remember how exactly).
  19. I was able to find a workaround for the bounds-check bailouts. Apparently IonMonkey has some problems with arrays that contain holes. I knew these kinds of arrays aren't optimal, but it's a bug if they cause Spidermonkey to keep bailing out. I didn't fix all occurrences of these bailouts, because in some locations it's more difficult to work with arrays (or objects with numeric properties) that don't contain holes; that issue should also be fixed in Spidermonkey to cover the remaining occurrences.
     Unfortunately I was disappointed by the result again. The improvement is hardly visible on the graph, and I expected a lot more when I saw the "70% execution time" (which is now less than 5% with the workaround). One reason could be that a lot of time is spent in these functions before the first turn even starts, which means it isn't visible in the graph if these functions execute faster. I've measured that the time until the first turn starts decreased from about 4 seconds to about 3 seconds in release mode with parallel compiling enabled; with parallel compiling disabled, it decreases from about 13 seconds to 4 seconds. It doesn't make a difference whether parallel compiling is enabled or disabled later on, so that doesn't explain where the rest of the improvement "hides". Maybe it's a measuring issue with debug mode (I created the first tracelog in debug mode because I didn't know that tracelogging is also enabled in release mode). I've also attached the updated tracelog, created in release mode and including the workaround.
     EDIT: I investigated a little further and figured out that the same problem that causes the typebarrier issue also seems to cause the bounds-check issue. The problem only occurs if the loop is inside fun; if I pass i as the first argument to fun instead of rnd, it only bails once, so the non-consecutive access matters too.

        var obj = {};
        fun = function (index, value) {
            // the bailouts only occur when this loop is present
            for (var i = 0; i < 4000; i++)
                i -= 0.1;
            obj[index] = value;   // write to a random, non-consecutive index
        };

        for (var i = 0; i < 20; i++) {
            var rnd = Math.floor(Math.random() * 3000);
            fun(rnd, true);
        }

     patch_bound_check_bailouts_v1.0.diff
     bounds_check_fixed.html.zip
  20. Check premake4.lua in 0ad/build/premake. If you change anything in this file, you have to run update-workspaces again, which regenerates the makefiles with the new flags. The other way, for temporary tests, is to edit the makefiles directly in build/workspaces/gcc.
  21. H4writer suggested creating a tracelog. It's a new feature in Spidermonkey 25 and should allow us to identify code that has problems with JIT compiling. There were a few difficulties generating the tracelog, but finally it worked. It creates a graph like the one attached (notice the information it prints when hovering over a block), with the following legend:
       • white: interpreter
       • ionmonkey compilation
       • ionmonkey running
       • baseline running
       • GC
     It also prints some very useful statistics. The full HTML file is attached.
     H4writer has figured out the cause of the typebarrier issue in the meantime, and he knows a way to work around it in JS code, but there's no patch yet to fix it in Spidermonkey; the issue is tracked in bug 897926. He also created a meta-bug for 0 A.D. performance improvements: fixes that we discover during debugging and profiling, but which aren't specific to 0 A.D. and could slow down other applications too. I think we are aiming for v25 or v26 now, because v24 is already out of the development branch and we need to get our fixes into Spidermonkey; that seems better than applying a lot of JS workarounds for v24.
     EDIT: I realized that I had forgotten to set limitScriptSize to false in Ion.h, so large scripts didn't get Ion compiled. I've attached a new tracelog with this setting set to false (iontrace_no_limit_script_size.zip). It confirms the issue in terrain-analysis-pathfinder.js:146, which we had already discovered (70.50% total time), and also the type-inference issue in map-module.js:39 (3.3% total time). I'm going to look at the issue in queue-manager.js:128 next (4.97%).
     iontrace.zip
     iontrace_no_limit_script_size.zip
  22. Indeed, the animations look nice. A bit off-topic, but I've seen these animals recently; I think they are called "springbok" in English. It would be nice to have them animated (though people might find the animations look unnatural).
  23. Tuan Kuranes has posted this link, which covers some general performance guidelines. Avoiding delete for properties is very important in my experience and probably won't change any time soon (see the sketch at the end of this item). The other issues we discovered are mostly issues in Spidermonkey that should be fixed soon. Everything seems to change quite rapidly, so overly specific guidelines for Spidermonkey would probably be outdated a few weeks or months later. I could write some instructions on how to detect performance problems, e.g. using Ion spew, if I eventually succeed at it myself.
     EDIT: I've attached the Ion.cfg for the typebarrier issue in map-module.js:39 (oh, what a stack of file extensions). ion.cfg.tar.lzma.txt
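     A minimal sketch of the "avoid delete" guideline, using the set-to-false convention described for entity collections in item 12 (names are illustrative):

        var collection = {};
        collection[42] = { id: 42 };

        // Slow: delete keeps the function it appears in from being Ion compiled.
        // delete collection[42];

        // Faster: keep the property and mark the entry as removed instead.
        collection[42] = false;

        for (var id in collection) {
            if (collection[id] === false)
                continue;             // skip removed entries
            // process collection[id]
        }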
  24. The bailout inside the loop mentioned above is caused by a typebarrier:

        [Bailouts] bailing from bytecode: loopentry, MIR: typebarrier [73], LIR: typebarrier [85]

     I've found a good article that describes some of the basics of type inference. Then I asked on the #jsapi channel what could cause this issue, and jandem had a look at it. Inside the loop, the compiled code encounters a type it didn't expect and has to be regenerated; this shouldn't happen again and again, so it's probably a bug. I'll try to provide some more information, like the ion.cfg log and maybe a standalone script to reproduce the issue. He also pointed me to bug 894852, which has a patch that should fix the GetPropertyPolymorphicV bailout issues. I tested the performance difference and the number of bailouts with the v25 development branch and the different patches:
       • v25 branch without patches: 336021 bailouts
       • v25 branch + LCallGeneric patch: 8057 bailouts
       • v25 branch + LCallGeneric patch + PropertyLookup patch: 2819 bailouts
     There are far fewer bailouts now, but unfortunately the performance difference is so small that it's hard to tell whether there's any difference at all. I still think that fixing some more bailout issues will improve performance: it depends a lot on which part of the code causes the bailouts, how often that code is called and how well it could be optimized otherwise. For the typebarrier issue we know that there is a performance problem in this specific loop; a conceptual sketch of what a type barrier guards against is below.
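     A conceptual sketch, not from the original discussion, of the kind of situation a type barrier guards against: IonMonkey compiles the loop assuming the element types observed so far and has to bail out when a value of a new type shows up.

        var arr = [];
        for (var i = 0; i < 10000; i++)
            arr[i] = 1;              // observed element type so far: int32
        arr[10000] = "surprise";     // a string widens the type set

        var sum = 0;
        for (var i = 0; i <= 10000; i++)
            sum += arr[i];           // reads are guarded by a type barrier;
                                     // hitting the string forces a bailout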