Jump to content
Sign in to follow this  
Yves

[Discussion] Spidermonkey upgrade

Recommended Posts

I've decided to get back to Spidermonkey issues. I uncommented the iteration order workaround and measured performance again. The simulation state in the end isn't exactly the same because the workaround is disabled, but it shouldn't differ too much. In the end we have 3120 entities with 1.8.5 and 3133 entities with the new version.

The results I got were bad unfortunately. Before my simulation state fixes, we had a performance improvement compared to 1.8.5 in the end of the game in many functions (see previous measurements). This Improvement is gone now and I assume it was only due to bugs that caused the AI to develop worse.

Now most functions have more or less the same performance graphs with 1.8.5 and the new version (v26 at the moment). There are some tiny differences where sometimes one version is better and sometimes the other.

There are a few peaks where v26 performs much worse and these are the ones I intend to fix.

I officially stopped expecting a better performance from v26 now. The tracelogging graphs have shown that there aren't many places left where JIT compiling is broken for some reason and the profiling has shown that there's currently only one place where we really get a significant improvement. We have seen that Ion Monkey performs very well for the random map script where it now takes about 1/3 of the time to generate the map. Unfortunately after seeing so much other code that doesn't get improved, my only conclusion can be that this only applies to some very rare cases that are probably common for benchmarks but not for real world applications. The random map script is more like a usual benchmark that does the same thing many many times in a loop. The improvements that work in these cases apparently don't apply to more common real world scenarios.

All I try to achieve now is getting the same performance as 1.8.5 again. If there are any hints that there could still be a general problem somewhere in our code, I will investigate that of course. I will also help as good as I can if the Mozilla people have questions or want to do more analysis with our code (will also post the updated patch soon).

The Spidermonkey upgrade is still required, even if it's only to be updated again and to avoid being stuck with an old unsupported version.

To get back to 1.8.5 performance it looks like only a few spots need to be fixed, mainly these:

ExecuteItems (already analyzed a bit and reported in bug 897926)

post-7202-0-14365900-1377361369_thumb.gi

BuildNewDropsites (EDIT: this also calls map-module.js:39, so it's probably also bug 897926)

post-7202-0-93125300-1377361369_thumb.gi

  • Like 1

Share this post


Link to post
Share on other sites

OK, here's the updated patch (still WIP and with many known problems).

I'm currently using the the mozilla central version 142981:a71cedddadd1 with a small patch that is already included in newer versions of Spidermonkey (but these could contain other API changes).

Check the step by step instruction from the previous patch. It should still apply (except the Spidermonkey version used of course).

Spidermonkey patch to compile with GCC:


diff -r a71cedddadd1 js/src/vm/Stack-inl.h
--- a/js/src/vm/Stack-inl.h Sat Aug 17 19:50:37 2013 -0700
+++ b/js/src/vm/Stack-inl.h Sun Aug 25 14:01:49 2013 +0200
@@ -228,9 +228,12 @@
uint8_t *
InterpreterStack::allocateFrame(JSContext *cx, size_t size)
{
- size_t maxFrames = cx->compartment()->principals == cx->runtime()->trustedPrincipals()
- ? MAX_FRAMES_TRUSTED
- : MAX_FRAMES;
+ size_t maxFrames;
+ if (cx->compartment()->principals == cx->runtime()->trustedPrincipals())
+ maxFrames = MAX_FRAMES_TRUSTED;
+ else
+ maxFrames = MAX_FRAMES;
+
if (JS_UNLIKELY(frameCount_ >= maxFrames)) {
js_ReportOverRecursed(cx);
return NULL;

patch_26_v0.3.diff

Share this post


Link to post
Share on other sites

How do you get these graphs, btw, Yves? Is it possible to do some more specific stuffs? I don't really know how to handle the in-game profiler…

Share this post


Link to post
Share on other sites

How do you get these graphs, btw, Yves? Is it possible to do some more specific stuffs? I don't really know how to handle the in-game profiler…

I've described that in the wiki.

Share this post


Link to post
Share on other sites

Thanks a ton for this wiki page, Yves', I'm getting pretty cool graphs for AI profiling which is really helping me see what needs to be done and what doesn't.

Share this post


Link to post
Share on other sites

I've updated to revision 13766 and did some cleanups to reduce the patch-size from 23791 lines to 21858 lines.

My plan is to split the patch up into several independent patches that can be tested and committed separately.

Working with such a huge patch has become a real pain and I spend way too much time merging changes.

So the main changes in this patch are:

  • Updated to rev 13766
  • Integrated stwf's changes. I had nearly all of his changes related to the soundmanager and the scripting interface in my patch (since I needed them and they hadn't been committed yet). He has committed them in the meantime, so I could remove them from the patch.
  • Moved the changes from the qbot-wc folder to the aegis folder because it was renamed
  • Removed modifications of tabs and spaces that were added because I forgot to disable a very annoying feature of codeblocks.
  • Merged the serialization changes by historicbruno. I haven't tested it yet, so it could be partially broken in the patch.

EDIT:

I've tested the random map generation again to check if I can use h4writer's workaround which requires much less code changes. I noticed that there's no workaround required at all with the current version (v26). Apparently they fixed the problems with arguments objects.


./pyrogenesis -autostart=alpine_lakes -autostart-random=123 -autostart-size=256

v26 Workaround randInt1, randInt2, randFloat1, randFloat2:
Load RMS: 7.03415 s
Load RMS: 7.10919 s
Load RMS: 7.07667 s

v1.8.5 unmodified:
TIMER| Load RMS: 19.958 s
TIMER| Load RMS: 19.9252 s
TIMER| Load RMS: 20.0246 s

V26 unmodified
TIMER| Load RMS: 7.07735 s
TIMER| Load RMS: 7.04358 s
TIMER| Load RMS: 7.08466 s

What a nice improvement. :)

What a pity that this is the only spot where v26 currently performs better than v1.8.5.

I've even compared the minimap on a printscreen. It really generates the same map.

I replaced the attached patch with a new one without the random map workaround and with some other cleanups (now 21408 lines).

Thanks a ton for this wiki page, Yves', I'm getting pretty cool graphs for AI profiling which is really helping me see what needs to be done and what doesn't.

Yes, that was a helpful "rediscovery". This profiling tool has been available since a long time but everyone seems to have forgotten it somehow.

patch_26_v0.6.diff

  • Like 2

Share this post


Link to post
Share on other sites

H4writer fixed the typebarrier issue in Bug 897926.

Unfortunately the same function still records bailouts very often and the peaks are still there.

The bailout signature before the patch was:

[Bailouts]  bailing from bytecode: loopentry, MIR: typebarrier [73], LIR: typebarrier [85]

Now it doesn't bailout less but there are two different signatures:

[Bailouts]  bailing from bytecode: getaliasedvar, MIR: typebarrier [818], LIR: typebarrier [1308]
[Bailouts] bailing from bytecode: nop, MIR: constant [0], LIR: osipoint [1272]

I'll try writing a small testcase for these issues this weekend.

In my logs there are 85 bailouts in map-module.js:39


grep "Bailing to baseline simulation/ai/aegis/map-module.js:39" Bailout_log_debug.txt | wc -l
85

27 of 85 Bailouts have the signature "bailing from bytecode: nop ..." and are recorded after mapmodule.js:123


grep "Resuming after pc offset 1889 (op getprop) (line 123) of simulation/ai/aegis/map-module.js:39" Bailout_log_debug.txt | wc -l
27
grep -B 80 "Resuming after pc offset 1889 (op getprop) (line 123) of simulation/ai/aegis/map-module.js:39" Bailout_log_debug.txt > filtered_by_mapmodule_39_ln123_bail.txt
grep "Bailing to baseline simulation/ai/aegis/map-module.js:39" filtered_by_mapmodule_39_ln123_bail.txt | wc -l
27
grep "bailing from bytecode: nop" filtered_by_mapmodule_39_ln123_bail.txt | wc -l
27

4 of 85 bailouts are caused by "somehting else"... I don't care at the moment because it's only 4 bailouts.


grep "(line 120) of simulation/ai/aegis/map-module.js:39" Bailout_log_debug.txt | wc -l
4

46 of 85 Bailouts have the signature "bailing from bytecode: getaliasedvar, MIR: typebarrier ..." and are recorded after mapmodule.js:136


grep "Resuming at pc offset 1966 (op getgname) (line 136) of simulation/ai/aegis/map-module.js:39" Bailout_log_debug.txt | wc -l
46
grep -B 80 "Resuming at pc offset 1966 (op getgname) (line 136) of simulation/ai/aegis/map-module.js:39" Bailout_log_debug.txt > filtered_by_mapmodule_39_ln136_bail.txt
grep "Bailing to baseline simulation/ai/aegis/map-module.js:39" filtered_by_mapmodule_39_ln136_bail.txt | wc -l
46
grep "bailing from bytecode: getaliasedvar, MIR: typebarrier" filtered_by_mapmodule_39_ln136_bail.txt | wc -l
46

Now we have covered 77 of 85. Some bailouts are normal, so I will analyze only those that happen very often.

Share this post


Link to post
Share on other sites

I was able to write a testcase to reproduce these two bailout issues.

Apparently bailouts are caused each time a new object is passed as "gamestate" parameter into map-module.js:39.

The gamestate is a "new" object in each turn because it's passed as a structured clone from the engine to the scripts.

EDIT: Reported as bug 914255.


function Map(val)
{
this.val = val;
}
Map.test = function(global) {
for (var j=0; j<4000; j++) {
j -= 0.1
}
var test2 = function() {
var map = new Map(global);
print(map.val);
}
test2(global);

};
obj1 = {};
obj2 = {};
obj3 = {};
obj4 = {};
// results in four bailouts with the following signature
// [Bailouts] bailing from bytecode: getaliasedvar, MIR: typebarrier [52], LIR: typebarrier [57]
// [BaselineBailouts] Resuming after pc offset 50 (op lambda) (line 12) of tests/getaliasedvar_bailout.js:6
// ... and three bailouts with the following signature:
// [Bailouts] bailing from bytecode: nop, MIR: constant [0], LIR: label [0]
// [BaselineBailouts] Resuming at pc offset 0 (op zero) (line 8) of tests/getaliasedvar_bailout.js:6
// I assume it bails each time it gets a new object. the number of bailouts depends on how many new objects you pass into
// Map.test(); 0 A.D. creates a new object for the gamestate parameter in map-module.js:39 in each turn using structured clone.
for (var i=0; i<40; i++)
{
var arg;
if (i<10) // minimum 8 iterations per arg for some reason
arg = obj1
else if (i<20)
arg = obj2
else if (i<30)
arg = obj3
else
arg = obj4
Map.test( arg );
}

//You can also just copy and paste this code causing one bailout each time...
//but I'm not sure if this reproduces the problem correctly because it's new code each time
/*
for (var i=0; i<8; i++)
{
Map.test( {} );
}*/

outnew.html.tar.lzma.txt

Share this post


Link to post
Share on other sites

Thanks for working on this. OpenBSD currently has Spidermonkey 17 as the default, which means we have to use the internal Spidermonkey for 0AD. Looking forward to when we can use the system lib instead.

  • Like 1

Share this post


Link to post
Share on other sites

We don't know yet which version of Spidermonkey we will use in the end.

I think we missed the deadline for V26 already and h4writer is going to be away more or less the next three weeks. After that I'm going to be away for three weeks too.

I think fixing bug 914255 and the performance issues in map-module.js:39 is a must have.

Unfortunately it looks like fixing bug 914255 won't be enough because with the test patch it runs nearly 100% in Ion mode and still is much slower.

The attachment from the last post is an updated tracelogging graph by the way.

You can view it if you put basic.js and style.css from here into the same directory and open the extracted html file with a web browser.

Share this post


Link to post
Share on other sites

Yay, good news! :)

Since the last post in this thread I've been working on integrating as many changes from the patch into svn as possible.

When working on ticket #2241 I noticed that the changes there brought more or less the same performance problems the whole patch causes.

I suspected the wrapping again and started to look for alternatives.

In a discussion on #jsapi, Boris (bz) suggested using the module pattern to achieve a similar separation we currently get by using different global objects. That was more or less just a keyword for me in the beginning, but after some experimenting I got it working in ticket #2322.

Today I've combined all parts of the puzzle and made the measurements. The results are exactly as I hoped and (kind of) expected!

I've measured four different versions.

  1. The unmodified svn version at revision r14386
  2. The patch from ticket 2241 applied to r14386
  3. The patch from ticket 2322 applied to r14386
  4. The patches from ticket 2241 and 2322 combined. Also based on r14386.

All these versions use SpiderMonkey 1.8.5. It's a non-visual replay on the map Oasis 04 with 4 aegis bots.

It runs about 15'000 simulation turns and (most importantly) all four replays ended with the same simulation state hash.

post-7202-0-52227500-1388230802.png

What you can see on this graph:

  • The second measurement (2241) suffers from significant performance problems due to the wrapping.
  • The patch that moves the AI to one global (2322) doesn't change much compared to the unmodified SVN version. This is expected because our current SVN version can use multiple global in one compartment to avoid wrapping overhead. The theoretical benefits from loading the scripts only once and analyzing them more efficiently with the JIT compiler aren't significant. I expect this to be more important for the new SpiderMonkey version because its JIT compiler is more heavy-weight. On the other hand the JIT compiler is still designed to cause little overhead, so it's probably still not significant.
  • When both patches are combined, we can avoid wrapping overhead and achieve about the same result as our current SVN version.

Because my measurements for the new SpiderMonkey also contained this wrapping overhead, it's now quite open again what performance benefits it will bring. Well, we can say that it won't make much of a difference in the first 4000 turns or so, but that's where it only starts getting interesting! :)

post-7202-0-52227500-1388230802_thumb.pn

  • Like 3

Share this post


Link to post
Share on other sites

Ticket #2322 is committed. The next big step is getting #2241 in.

After that I can update the SpiderMonkey patch again and test the performance.

But first I need help with testing and reviewing of #2241. It's a huge patch and a much more complicated one than the last one.

Unfortunately I couldn't split it up more.

  • Like 2

Share this post


Link to post
Share on other sites

Why are we using Javascript for the game ? I've learnt some at school and did some stuff on web pages (so maybe i could help), but I don't see why actually. Anyway I know a very talented guy called Mefistolis he is retro engineering a game called dungeon keeper :

http://keeper.lubiki.pl/

You can see the source of the game here

http://code.google.com/p/keeperfx/source/list

Contact him by mail at : mefistolis@gmail.com

He may able to fix your stuff and a lot more. He is really patient, and i owe him all I know about C++ he is a professionnal programmer

Edited by stanislas69
  • Like 1

Share this post


Link to post
Share on other sites

Why are we using Javascript for the game ? I've learnt some at school and did some stuff on web pages (so maybe i could help), but I don't see why actually.

I've written something about scripting in 0 A.D. here. Maybe it answers your question. :)

Anyway I know a very talented guy called Mefistolis he is retro engineering a game called dungeon keeper [...]

If he wants to help with 0 A.D., he's welcome.

But when he is working on his own project, he probably prefers working that and doesn't have time for other things. Anyway, for me it doesn't look like he uses Javascript or even SpiderMonkey in his project. Some exchange with other SpiderMonkey embedders would be interesting.

Share this post


Link to post
Share on other sites

I've written something about scripting in 0 A.D. here. Maybe it answers your question. :)

If he wants to help with 0 A.D., he's welcome.

But when he is working on his own project, he probably prefers working that and doesn't have time for other things. Anyway, for me it doesn't look like he uses Javascript or even SpiderMonkey in his project. Some exchange with other SpiderMonkey embedders would be interesting.

Okay, so it is mostly for portability issues ?

I'll ask him maybe he wants to change his mind. He's not really my friend though, we have working together thats it

  • Like 1

Share this post


Link to post
Share on other sites

Why are we using Javascript for the game ? I've learnt some at school and did some stuff on web pages (so maybe i could help), but I don't see why actually.

Keep in mind the decision was made long ago and basically is set in stone, but here is an insight into why JavaScript was chosen: http://www.wildfiregames.com/forum/index.php?showtopic=15068

Share this post


Link to post
Share on other sites

Keep in mind the decision was made long ago and basically is set in stone, but here is an insight into why JavaScript was chosen: http://www.wildfiregames.com/forum/index.php?showtopic=15068

I was not trying to change anything ^^. I had never seen a video game using Javascript. but lua a lot. Thank you for this all my questions were answered ;)

Share this post


Link to post
Share on other sites

I guess we are now finally at the point where we can really draw a conclusion about SpiderMonkey performance.

After seeing the big improvements from moving the AI to one global in the previous profile, I hoped again that SpiderMonkey would raise from the ashes like a Phoenix(or maybe like a burning fox?) and impress us with some nice performance improvements.

The bad news is, it doesn't. The good news is, all of the big performance drawbacks we saw in the previous profiles are gone too and there are only a few smaller peaks left that shouldn't stop us from doing the upgrade.

The fact that both graphs are so close together also seems to prove that hidden performance problems affecting the measurement are quite unlikely.

I'm disappointed that this is the result of highly talented experts working for about two and a half year on Javascript performance improvements but it also tells me that this way of improving peformance is a lost cause.

Here are the graphs:

post-7202-0-01581800-1389120404.png

The hash in the end isn't the same but since both graphs are so similar, it's extremely unlikely that the simulation state ends up significantly different.

The setup is the same as in the last graph.

Now the next steps are:

  1. Figuring out if there are significant problems that keep us from using the packaged ESR24 version and then decide which version should be used for the upgrade
  2. Fix the remaining problems and bring the patch to a committable state

post-7202-0-01581800-1389120404_thumb.pn

  • Like 2

Share this post


Link to post
Share on other sites
The fact that both graphs are so close together also seems to prove that hidden performance problems affecting the measurement are quite unlikely.

I'm disappointed that this is the result of highly talented experts working for about two and a half year on Javascript performance improvements but it also tells me that this way of improving peformance is a lost cause.

Yeah, that is quite unsatisfying given all the work and thorough analysis put into it... It's certainly an interesting finding for JS engine devs which goes against most expectations. I don't know if there is a venue to share some of the insights as a way of sharing the fruits of your labour?

Hopefully the work makes it somewhat easier to transition to newer SpiderMonkey versions in the future which may deliver some marginal gains.

  • Like 1

Share this post


Link to post
Share on other sites
I don't know if there is a venue to share some of the insights as a way of sharing the fruits of your labour?

I'm unsure what you mean. The SpiderMonkey devs know about these results and also gave some tips what could improve performance.

They don't seem to judge the results as so unexpected that they'd spend much time doing their own analysis though.

To be fair, I believe it when they say that there were reports of performance improvements for other applications than the most well known benchmarks.

I also measured performance improvements for random map generation for example.

My guess is that the improvements work well for simple code with a lot of repetition. If you have one function that runs 80% of the time and repeats the same steps over and over again and only works with simple and static objects, this is likely to run faster. If there's more complicated code working on more complicated objects with less of repetition, the concept doesn't work out and the performance doesn't improve. Especially our entities and entity collections are quite complex, generic and polymorphic objects that can't be put in a compact array or optimized well in another way.

That should be one of the next subjects to improve on our side because improving it on the JS engine side is much more difficult and also more limited.

Hopefully the work makes it somewhat easier to transition to newer SpiderMonkey versions in the future which may deliver some marginal gains.

It definitely makes it easier. But there are also additional changes comping soon like dropping support for stack scanning which means a lot of work will have to be put into rooting all jsvals and objects properly.

Share this post


Link to post
Share on other sites

I have updated/downgraded the patch for version ESR24.

Here's a comparison of the different versions. The keepjitcode profiles are with a flag set in JSRuntime.cpp which disables garbage collection of JIT code.

It improves performance but can't be used that way at the current state.

I've attached the profile files because on a screenshot you can't see much. You can open graph.html in a browser, zoom with the mouse wheel and drag the graph with the mouse.

You can also edit data.js to hide some graphs.

What's your opinion? Should we update to v24 now which is an official ESR release and gets packaged for most Linux distributions? Should we use a non-ESR version which is faster but not officially released as standalone library and doesn't get packaged by most Linux distributions (or if they package it, then only for us).

Or, the third option, should we/I continue updating the patch until the ESR 31 version gets released (around July 2014)?

profile version decision.zip

Share this post


Link to post
Share on other sites

I can't say much about the first two options, I believe we relyon our spidermonkey on OSX so I don't really care.

Are there big changes in ESR 31 compared to ESR 24?

(I'm assuming that ESR is extended support release).

Also, I find it amazing that the newer spidermonkey is always slower than current SVN. It's so counter-intuitive.

Share this post


Link to post
Share on other sites

Are there big changes in ESR 31 compared to ESR 24?

(I'm assuming that ESR is extended support release).

Yes, it stands for extended support release. There are no standalone packages for SpiderMonkey provided for other versions, but they can be created manually relatively easy.

I did the "backport" of my patch in less than a day, so the API difference between ESR24 and the later dev version I was using (something around v26) was relatively small. We don't know if there will be bigger API changes until ESR31 is ready though. Maybe they will force exact stack rooting which will probably require quite a lot of work to get it right.

The biggest problem I see with ESR31 is the additional work required for keeping the patch in sync. Your commit yesterday introduced some missing vars in for .. in loops for example. You would have gotten error messages if the update was done already.

I don't think the work would be massive but it should be considered.

I don't know about any interesting new features. I haven't seen a public roadmap with planned features either.

I prefer updating to ESR24 first and leper favoured that approach too. Any other opinions?

Share this post


Link to post
Share on other sites

I agree that we should just go ahead with the update. Logistically, this patch is way to big to just leave on trac for an extended period of time.

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Sign in to follow this  

×
×
  • Create New...