
SpiderMonkey ESR31 upgrade



This thread is for discussions and updates about the SpiderMonkey upgrade to ESR31

I've decided to create a new thread because getting the upgrade to ESR24 into SVN was an important milestone and the previous thread was already quite long.

Ticket: 2462

Check it out from my github branch (read the readme!).

Short summary of what happened so far

The upgrade from the old SpiderMonkey version v1.8.5 (released in March 2011) was quite a big task due to the API changes.

The conceptual changes (like having a one-to-one relation between global objects and compartments) were the biggest problem. Another problem was inconsistencies and duplication in our own code. The smallest problems were the simple API changes. These issues are sorted out now and we can continue with the upgrade to ESR31.

Performance

We expected a lot from the upgrade to ESR24 because most of the significant SpiderMonkey changes from v1.8.5 to v24 were solely to improve performance.

There's a new baseline compiler, a new JIT compiler with enhanced optimization capabilities called IonMonkey, improved type inference and a lot more. There was a whole team of programmers working on performance improvements for years. The commonly used benchmarks show a big improvement, but unfortunately we found that this does not translate into better performance for us; performance actually got a little worse with the upgrade.

It's not a big secret that these benchmarks get "special care" from the developers of JavaScript engines such as SpiderMonkey or V8. Browser market share is the main factor deciding how much money these companies get from e.g. search providers for adding their search as the default entry in the browser. Benchmark performance improvements seem to be important for marketing, which is where the term "benchmarketing" comes from.

The benchmarks were originally designed to be a good representation of real-world application performance, but now one could say they are no longer related to real-world performance because of all the tuning and benchmark-specific optimizations.

Still, I trust Mozilla that the fundamental engine changes will also improve performance for embedders like us at some point (and probably they already do). What they did first after integrating big changes like IonMonkey or the baseline compiler was making sure they don't decrease benchmark performance. Now that this goal is reached, they can start fixing performance issues which don't show up in these benchmarks but are important for other applications. H4writer fixed some of these problems we reported during the upgrade to v24 and apparently they continued fixing others in the meantime.

ESR31 schedule

ESR31 is still in development. We have time until the end of the month to get changes in, so I'm starting early with the work on the upgrade (well, as early as I could).

That should also help detecting performance regressions and ease the integration of API changes.

The final version should be released in July and I'm aiming for Alpha 17.

Exact rooting, moving GC and GGC

While upgrading to v31 I'm also working on the transition to exact stack rooting and on support for a moving garbage collector (GC).

V31 will probably be the last version supporting conservative stack scanning (the opposite of exact rooting). This change should bring some performance improvements, in theory, but we'll see what happens in practice. This transition will allow us to use features such as generational garbage collection (GGC) and compacting garbage collection in the future. The progress is tracked in #2415.
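To illustrate what the transition means in code, here's a minimal sketch (not actual 0 A.D. code; the function and property names are made up):

```cpp
#include "jsapi.h"

// With conservative stack scanning, a raw JSObject* local was found by scanning the
// C++ stack, but the GC could never move the object it points to. With exact rooting,
// every GC thing held on the stack is registered explicitly, so a moving GC can
// update the pointer in place.
static bool ReadName(JSContext* cx, JSObject* rawObj)
{
    JS::RootedObject obj(cx, rawObj);   // registers this stack slot with the GC
    JS::RootedValue val(cx);

    // JSAPI functions take JS::Handle/JS::MutableHandle arguments: a Rooted converts
    // implicitly to a Handle, and operator& yields a MutableHandle.
    return JS_GetProperty(cx, obj, "name", &val);
}
```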

Performance: current status

V31 really brings some improvements compared to V24. Unfortunately I can't compare it to v1.8.5 easily anymore.

I've used a non-visual replay with 4 Aegis bots on the map Oasis 04 over around 15000 turns and measured how long the whole replay takes.

This shows the difference more clearly than the graphs I've used before.

I've found a bug with incremental GC causing problems with off-thread compilation of JIT code. I've reported the incremental GC problem on the JSAPI channel and they created bug 991820 for it.

I've also confirmed once more that disabling garbage collection of JIT code makes a difference. Unfortunately there are currently only functions meant for testing purposes which allow us to do this and I'll ask for a proper solution soon.

Here's the comparison of the different variants concerning the problems mentioned above:

[Attached graph: total replay time for each variant]

Edited by Yves
Adjusted the graph to have a matching order with the legend
  • Like 5
Link to comment
Share on other sites

So the conclusion is: All ESR31 versions are slower.

So why should we go for ESR31 anyway?

No, they are faster!

The graph shows how many seconds the 15000 turn replay took, so shorter bars are better!

Edit: I have no idea why LibreOffice Calc orders the items in the legend bottom-up and the graphs top-down, but the colors are what matters.

Edit2: I've fixed the order in the graph

Link to comment
Share on other sites

Hi Yves,

let me say thanks for your tremendous effort on this, it's always a pleasure to read about your progress!

KeepJitCode was the one where JIT code is not garbage collected, right? Do the Mozilla devs plan to make this safe in ESR31? I honestly don't know how this could work with a moving GC.

@FeXoR: To the contrary, all ESR31 versions are actually faster. And quite substantially as well (biggest improvement seems to be around 18% by visual judgement).

Link to comment
Share on other sites

KeepJitCode was the one where JIT code is not garbage collected, right? Do the Mozilla devs plan to make this safe in ESR31? I honestly don't know how this could work with a moving GC.

H4writer pointed me to bug 984537 which is planned to get into ESR31.

I can test this patch and see if it really solves the problem for us, but according to him it should.

  • Like 1
Link to comment
Share on other sites

I assume that means asm.js extensions would be usable now too? ( http://asmjs.org/ ) Would that be of any help with 0AD?

We try to put all performance-sensitive code in C++, which is still faster than asm.js. Especially when you consider that converting variables between regular JS and asm.js costs time, just as conversion to C++ data types does.

Also, asm.js is meant as a compilation target, not as a language to write by hand. That means we could just as easily rewrite the performance-sensitive algorithms directly in C++ rather than in some other language that has to be compiled to JS separately.

  • Like 1
Link to comment
Share on other sites

hmm ok, C++ functions seem to make more sense in most cases then.

What about modding though? If you don't want to modify core 0AD but still have some performance problems, using some compiled asm.js javascript code might be helpful, no?

Link to comment
Share on other sites

... incredible. This is complicated. They now use exact rooting, but the compiler couldn't detect it dependably. So dynamic stack examination is (still) done to cover those cases. And then static stack examination too, as the dynamic method might destroy pointers ...

Then even the pointers' target locations can be moved. Ohoho. I hope all this moving isn't taking too much time.

Really a giant set of dangerous and time-consuming changes. Once we have a new computing breakthrough this will all be rendered useless. Won't the conservative method (introduced in 2010) be fine for some time into the future?

Still, Mozilla has come up with a bearable solution by introducing the templates Handle<T> (essentially a double pointer; functions mostly use this indirection layer so that memory can be moved even if more than one pointer points at it) and Rooted<T> ..

Still incredibly complicated, as I would now tend to over-root all objects just to avoid them being removed by the garbage collector.

Excuse my nooby question: isn't this philosophy of 'developer is responsible for telling which is to be garbage collected' better solved by the philosophy 'developer is responsible for trashing invalid pointers directly'?

I must be understanding this all wrong.

Cool the performance update though. Still looking forward to how this turns out. Perhaps pretty useful in the end. Or our doom.

Link to comment
Share on other sites

I don't expect it to be extremely difficult and dangerous, but it's definitely not done in a few hours.

The basics of the rooting API are relatively easy, but there are some difficulties with special cases.

Templates and macros are a problem because you need different types depending on where they are used (JS::Rooted<T> for declaring new variables on the stack, JS::Handle<T> or JS::MutableHandle<T> for function arguments). Templates will just be instantiated with the type you pass in (JS::Rooted<T>), but arguments should be passed as handle types. Even if you define only a JS::Handle<T> specialization, the compiler will still complain that there's no specialization for JS::Rooted<T> instead of using the implicit conversion to a handle which is available. JS::MutableHandle<T> isn't a problem here because it uses the & operator, but JS::Handle<T> needs an explicit cast.
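Here's a condensed sketch of that template problem (the Describe helper is made up; it just mirrors the pattern of our conversion templates):

```cpp
#include "jsapi.h"

// Hypothetical conversion helper: only declared for the generic case,
// specialized for the handle type.
template<typename T> bool Describe(JSContext* cx, const T& v);   // no generic definition

template<>
bool Describe<JS::HandleObject>(JSContext* cx, const JS::HandleObject& v)
{
    return v.get() != nullptr;   // placeholder body
}

static void Caller(JSContext* cx)
{
    JS::RootedObject obj(cx);

    // Describe(cx, obj);   // deduces T = JS::Rooted<JSObject*>, so the handle
    //                      // specialization is never considered and linking fails.

    Describe<JS::HandleObject>(cx, obj);                // OK: implicit Rooted -> Handle conversion
    Describe(cx, static_cast<JS::HandleObject>(obj));   // OK: explicit cast at the call site
}
```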

It's more difficult in combination with macros because they implement generic code for types but we need special code (like these casts or the & operator) for these special types. This is a problem in the ScriptInterface::FromJSVal implementation. It either requires a lot of code duplication or a smarter solution which I haven't found yet.
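A rough sketch of the macro problem (the FromJSVal overloads here are just stand-ins for our ScriptInterface conversion functions):

```cpp
#include <string>
#include "jsapi.h"

// Hypothetical conversion helpers standing in for ScriptInterface::FromJSVal:
bool FromJSVal(JSContext* cx, JS::HandleValue v, int& out);
bool FromJSVal(JSContext* cx, JS::HandleValue v, std::string& out);
bool FromJSVal(JSContext* cx, JS::HandleValue v, JS::MutableHandleValue out);

// A generic "read one member" macro works for plain C++ types ...
#define READ_MEMBER(type, name, source) \
    type name; \
    if (!FromJSVal(cx, source, name)) \
        return false;

// ... but for GC thing types the expansion has to differ: the local must be a
// JS::Rooted and the output argument needs the & operator (or a handle cast),
// so one generic macro body can't cover both cases without duplication.
#define READ_GC_MEMBER(name, source) \
    JS::RootedValue name(cx); \
    if (!FromJSVal(cx, source, &name)) \
        return false;
```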

Another problem is places where JSObject pointers are used as identifiers and are mapped to some other values. Ticket #2428 is such an example.

isn't this philosophy of 'developer is responsible for telling which is to be garbage collected' better solved by the philosophy 'developer is responsible for trashing invalid pointers directly'?

This just follows the same approach the Javascript garbage collector does (marking what's still used).

Making the API user responsible for freeing unused variables wouldn't be much better because instead of crashing you could end up with memory leaks (values not being used anymore which haven't been marked as unused properly). It could also lead to problems or at least inconsistencies when passing values from JS to C++. They are JS values and therefore would be automatically collected and then you also have them on the C++ side where they would not be automatically collected (but both are actually the same value). You can also have a call stack going from C++ to JS and back to C++. In this case you would have to check the whole stack for values still in use from C++ (which is what the stack scanning did).

Link to comment
Share on other sites

Thanks for clarifying. Valuable insights.

As a summary:

If speed isn't needed, the stack scanning you explained is enough.

If performance matters then simply give the JavaScript side the pointer deleting task. If the JavaScript side has to deal with speed issues (complicated algorithms for optimization) then surely they have the capability to deal with this pointer tidying up issue too.

So is 'leaving the marking of unused variables via Rooted<> and Handle<> to the user/JavaScript side' just mirroring the C++-side marking of pointers for deletion?

If so, all is fine. Otherwise I wonder if the memory leaks that might occur, due to the JavaScript side accidentally not marking heap pointers as no longer used, are really worth the added complexity?

Generally, the more layers there are, the more actions (conversions, ...) where things might go wrong.

Still, I appreciate the option of not having to compile the JavaScript. So perhaps the new strategy in the upcoming SpiderMonkey ESR31 will really be worth it. Otherwise I tend towards simply using C++ and the wonderbuild compiler. Or simply the not-all-optimized anti-amalgamation:

Amalgamation: All C source code for UnQLite and Jx9 are combined into a single source file.

[UnQLite]

not-all-optimized anti-amalgamation: I imagine having a program where each new file you create can be compiled without the compiler having to check whether new optimizations can now be performed. Then compilation speed should increase and the JavaScript/interpreter languages would become obsolete at once (for an open system like ours; only non-open programs will need them, since they don't let modders compile because they don't share code and artwork).

So for us I think the interpreter language is nice only for those modders that write scripts and still are not aware of compilation. (and I guess those hardly exist as compilation of 0AD boils down to just two commands. ;) )

So getting rid of it would save us tons of issues (e.g. JavaScript debugging) and developer time (maintaining the translation from JavaScript to C++). Once we have translated our JavaScript files to C++, of course. But I guess there is a handy automatic conversion program out there.

Edited by Hephaestion
Link to comment
Share on other sites

I have to correct myself. A JavaScript to C++ converter might be rare (because JavaScript is so dynamic and typeless). Edit: For completeness' sake: there is JS_to_C. Example (UnityScript -> C#).

As I still hope for a technological breakthrough which would render our (not even that big, I would say) performance problems history, I prefer keeping things as they are.

But well, we have no choice as it's Mozilla's decision ..

Do you know if this will give yet another performance boost? Its effect should be very noticeable if your tests of ESR31 already gave 18%.

Glest is written in C++ if I'm not wrong (which I would prefer, as open source implies moddability already, so there's no need for an interpreter language; but well, I can live with it, 0 A.D. is worth it). Here they talk about us: https://forum.megaglest.org/index.php?topic=9425.0 (they note our complicated folder structure; I share this view .. we should not distinguish props that much, that's horror for my future plans.)

Is it still to be decided whether we continue with SpiderMonkey upgrades? What if it all turns out to be a pitfall? Then we'd have to change the JavaScript<->C++ interface once again.

I see three hypothetical roads:

  • no more upgrades. => not good for many entities in a small area (pathfinder).
  • upgrades => speed up => good for pathfinder, no dev has to revise it.
  • JavaScript -> C++ => same as directly above.
I like the pathfinder's robustness. From what I read about it, it's quite a solid algorithm (compare to Stronghold, where units go through each other).

Another reason for option 2 would be that Mozilla will ensure huge support and bugs are being fixed (like Yves showed). So they also profit from us like we do from them. => Good for the Open source world. => Good for the planet.

Edited by Hephaestion
Link to comment
Share on other sites

Is it still to be decided whether we continue with SpiderMonkey upgrades? What if it all turns out to be a pitfall? Then we'd have to change the JavaScript<->C++ interface once again.

SpiderMonkey has been around for almost 20 years, so I doubt it will be a trap. Besides that, it's open, so the worst thing that can happen is that it doesn't get updated anymore (but currently, it seems to be updated a bit too fast for our needs).

Also, translating the JS code to C(++) (if that even works) would most likely not give any performance advantage, as the code has to be interpreted very freely anyway. On top of that, the generated code file would be mostly unreadable.

Also note that JS debugging is no problem. On the contrary, since scripts don't have to be compiled, and are hotloaded, it's easier to debug than the C++ code, as you can just start and stop logging anything you want while the game is running.

Link to comment
Share on other sites

You're right on that. I also like the hotloading. Though I have plenty of issues with hotloading in other projects. Ever experienced statically classloaded (and not hot-reloaded) member variables lose their values? It's not so nice ... and not so easy to find.

Also note that JS debugging is no problem. On the contrary, since scripts don't have to be compiled, and are hotloaded, it's easier to debug than the C++ code, as you can just start and stop logging anything you want while the game is running.

I could also define an XML property for a logging setting which is constantly reread to set the logging mode. But that's still not ideal.

On top of that, the generated code file would be mostly unreadable.

I don't think so. Instead, the logic and variable names as well as comments are converted as they are. Only the types give some hassle, plus adapting the storage structures and their subsequent use.

It's definitely a time-saver, if only because it saves you from renaming our 14968 occurrences of var to the corresponding type.

Still, I also favour the option to just go with SpiderMonkey. But I think Yves has the last word here.

(but currently, it seems to be updated a bit too fast for our needs).

:D yes, provided their speed advantage really becomes as high as anticipated. That would be huge, as we could perhaps live with our pathfinder then. Otherwise, I like the crowd formations very much and will look into them in the following years. So no matter the SpiderMonkey outcome, we will have a solution for our many-units-in-one-place lag problem. And that's a good thing. (0 A.D. is mostly criticized for this performance lag in other forums ... though they probably don't know that we'll get rid of it :D)

To give some context, I'm reposting a formations and pathfinder research video here too:

Source: http://www.wildfiregames.com/forum/index.php?showtopic=13042&p=269340

Edited by Hephaestion
Link to comment
Share on other sites

Scripting allows features like automatic downloads of maps, missions and even AI scripts or GUI mods.

We can still restrict this to allow only some of these types, but compared to compiled native code, scripts can be delivered over the web in a relatively secure and fully portable way.

I don't see how this could be achieved with C++. C++ code is always compiled for a specific target platform (you can't load a Windows .dll on Linux). Also, it would be quite hard or impossible to limit the possibilities and make executing untrusted code relatively secure.

0 A.D. has always been designed for modding and scripting is also important to lower the barrier for new modders. C++ is not the preferred language for modding because it's quite hard to learn and easy to introduce bugs.

I'm quite sure we'll find a solution with JavaScript. Part one of the plan is figuring out how far we can push JavaScript performance and where the limit is.

The second part is finding solutions for these limits like providing native C++ functions for the scripts.
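To give an idea of what that can look like, here's a minimal sketch using the plain JSAPI (the function names are made up, and our ScriptInterface wraps this a bit differently):

```cpp
#include <cmath>
#include "jsapi.h"

// A native function callable from scripts; the name and the maths are made up,
// it just stands for "the heavy computation stays in C++".
static bool FastDistance(JSContext* cx, unsigned argc, JS::Value* vp)
{
    JS::CallArgs args = JS::CallArgsFromVp(argc, vp);
    double dx = args.get(0).toNumber();   // assumes the script passes two numbers
    double dy = args.get(1).toNumber();
    args.rval().setNumber(std::sqrt(dx * dx + dy * dy));
    return true;
}

static bool RegisterNativeHelpers(JSContext* cx, JS::HandleObject global)
{
    // After this, scripts can simply call FastDistance(dx, dy).
    return JS_DefineFunction(cx, global, "FastDistance", FastDistance, 2, 0) != nullptr;
}
```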

  • Like 1
Link to comment
Share on other sites

These are another two convincing arguments. Thx. Didn't think of the networking ... nor that it's quite difficult to learn C++.

I'm quite sure we'll find a solution with JavaScript. Part one of the plan is figuring out how far we can push JavaScript performance and where the limit is.

The second part is finding solutions for these limits like providing native C++ functions for the scripts.

thx for the hope. sounds good.
Link to comment
Share on other sites

  • 4 months later...

The currently planned schedule needs replanning. We have several options for the upgrade and related tasks and need to decide on the priorities.

Unfortunately, the two major bugs (bug 991820, bug 984537) have not been fixed yet, or at least not in time for v31. This means the performance improvement of the upgrade won't be as big as the best results in the first post of this thread (nearly 20% simulation performance), but if the first measurements are still valid, it should still be nearly 10%. These numbers are only valid for that specific testcase and can of course be different in other cases.

I was hoping that generational GC could mitigate the impact of these bugs, but the public API is not quite ready to control generational GC yet and I haven't yet tested if it really helps.

I wanted to finish exact stack rooting support (ticket #2415) before the upgrade because it needs a rebuild of SpiderMonkey with different flags (and new Windows binaries). It would also be good if our programmers got used to exact rooting now.

Most of the changes for exact stack rooting are done, but there's still quite a bit left. In particular, it needs careful testing. We have to test specifically for that because rooting issues are very hard to find by just playing the game. The same thing could work a thousand times and then suddenly fail because of garbage collection timing.

There's dynamic rooting analysis which can already be used and we might also need static rooting analysis, which is probably quite a bit of work to implement (ticket #2729). I still don't know enough to tell if we really need the static analysis though.

The SpiderMonkey 31 upgrade itself (without all the additional things like rooting) is a bit behind schedule. I haven't heard of any progress from Philip on the autobuilder (which we need for C++11 and SpiderMonkey) and I noticed that there's no bundled release from Mozilla yet. In addition there's still some work I have to do on the code. After all this is done, a testing period of at least one week is needed. I don't think this is all going to happen in less than two weeks (until the currently planned Alpha 17 release).

I would reduce it to two options now:

  1. We aim for Alpha 17. In this case we'd probably keep exact rooting disabled but would continue with the API transition. Developers could also enable it if they want and don't mind rebuilding SpiderMonkey. This also depends a lot on Philip and whether he can/wants to complete the autobuilder work in time. The Alpha 17 release would most likely get delayed. We would have a small performance improvement.
  2. We plan the upgrade for after the Alpha 17 release. In this case we have more time for the autobuilder and exact rooting and it could be committed with exact rooting already enabled. Most of the benefits from exact rooting won't be accessible before the next SpiderMonkey version, but it would still help a bit. Some JIT compiler bugs were related to the conservative stack scanner for example. I haven't yet done performance tests with only exact rooting enabled.

Here's a little visualization of the dependencies:

[Attached diagram: dependencies between the upgrade tasks]

  • Like 1
Link to comment
Share on other sites

It seems like pushing back to after A17 has no drawback other than "it makes A17 weaker"?

Given that Philip is afaik still looking for a job, it would be better to ask him when he estimates he'll be done with this. I haven't really followed your progress, but I don't think it'd be a good thing to push A17 back.

  • Like 1
Link to comment
Share on other sites

If we so wish, we can afford to push it to after A17 because (among other releases) A17 already has a number of performance improvements. With the school year starting after/around A17, it's going to be much harder to get algorithmic speed improvements in A18. It could be argued that moving ESR 31 to A18 would actually be beneficial.

  • Like 1
Link to comment
Share on other sites

If we so wish, we can afford to push it to after A17 because (among other releases) A17 already has a number of performance improvements. With the school year starting after/around A17, it's going to be much harder to get algorithmic speed improvements in A18. It could be argued that moving ESR 31 to A18 would actually be beneficial.

But if the school year starts during the development of A18, does that mean nobody will have time to work on it?

Link to comment
Share on other sites

But if the school year starts during the development of A18, does that mean nobody will have time to work on it?

I'm currently the only one working on the SpiderMonkey upgrade and I'm not at school at the moment, so that shouldn't be a problem. There will probably be a project at work which will require a lot of time soon, but there have been other projects in the past and I should still have enough time to complete the upgrade for Alpha 18.

  • Like 2
Link to comment
Share on other sites

  • 4 weeks later...

I've made some more performance measurements. I wanted to answer the following questions:
1. What effect does GGC (Generational Garbage Collection) currently have on performance and memory usage?
2. What has changed performance-wise since the first measurements in this thread? There have been many changes to 0 A.D., including Philip's performance improvements and AI changes.
There have also been some changes in SpiderMonkey since that measurement and in our SpiderMonkey related code. All that could have changed the relative performance improvement directly or indirectly.
3. Have the first measurements been accurate and can they be confirmed again? Based on the experience from the Measurement and statistics thread I suspected that parts of the results could be slightly different if measured again.
I didn't expect that difference to be greater than 3% at a maximum, but I wanted to confirm that and get some more accurate numbers by increasing the number of measurements and reducing the effects of other processes running on the system.
4. One mistake I made in the first measurement was measuring v24 with C++03 and v31 with C++11. I wanted to know how much of the improvement is related to SpiderMonkey and how much to C++11.

The measurement conditions were again:
r15626, 2vs2 AI game on "Median Oasis (4)", 15008 turns
Each measurement was executed 50 times on my notebook by a script (except the v24/C++11 measurement, which was executed 100 times).

I've posted the distribution graphs just to give a better idea of how close these measurements are in addition to the standard deviation value.


SpiderMonkey 24 with C++03
Average: 662.04 s
Median: 662 s
Standard deviation: 1.68 s

Distribution graph of the measured values:
[attachment: distribution graph]


SpiderMonkey 24 with C++11
Average: 662.43 s
Median: 662 s
Standard deviation: 2.47 s

Distribution graph of the measured values:
[attachment: distribution graph]

It was quite surprising to me that C++11 apparently didn't improve performance at all. At least it didn't get worse either.
I have to say that this probably depends a lot on the compiler, and I was using a relatively old GCC version which might not be able to take full advantage of C++11 yet.


SpiderMonkey 31, no GGC
Average: 661.08 s
Median: 661 s
Standard deviation: 1.44 s

Distribution graph of the measured values:
[attachment: distribution graph]


SpiderMonkey 31, GGC
Average: 659.94 s
Median: 660 s
Standard deviation: 1.48 s

Distribution graph of the measured values:
[attachment: distribution graph]

Memory usage graphs (only from 1 measurement each, comparing v31 and GGC with v24):
[attachment: memory usage graphs]

We see that the average and median values are lower for v31 compared to v24 and for GGC compared to no GGC.
However, the difference is so small that it could also be coincidence.

That's disappointing because, even though the difference in the first measurement was small too, I had hoped for a bigger improvement because of GGC.
We don't quite know if the first measurement was coincidence or if some of the changes made the difference smaller again.

The memory graphs look quite promising. With GGC there's quite a big decrease in memory usage. Here it's also important to note that this improvement did not have a negative impact on performance. We could have achieved a similar improvement by increasing the garbage collection frequency, but this would have had a bad impact on performance. The additional minor GCs that run between full GCs don't seem to be a performance problem.


SpiderMonkey 31, GGC, keepJIT code
Average: 572.98 s
Median: 573 s
Standard deviation: 3.72 s

Distribution graph of the measured values (there was probably a background job or something running... the standard deviation is a little bit higher than for the other measurements):
[attachment: distribution graph]

So far we have not seen significant performance improvements. Fortunately the impact of SpiderMonkey's problem with garbage collection of JIT code could be confirmed.
It's approximately a ~13.5% improvement in this case. There's an additional problem with incremental GC, but I didn't measure that because it's quite hard to set up conditions that reflect the performance of a real fix of this bug.
Anyway, "KeepJIT" means I've set the flag "alwaysPreserveCode" in vm/Runtime.cpp from false to true. It causes SpiderMonkey to keep all JIT compiled code instead of collecting it from time to time. Just setting that flag would be kind of a memory leak, so it's only valid for testing. However, SpiderMonkey's behaviour in regard to GC of JIT code has been improved in several changes, so there's a good chance that we'll get a proper fix for that performance problem (unfortunately not with v31).

  • Like 6
Link to comment
Share on other sites
