
Help needed: Optimizing 0 A.D. with Intel VTune Amplifier



@feneur Could you post it on Twitter and Facebook? :) Something along the lines of:

"We are glad to announce that 0 A.D. was recently chosen as a testing ground for Intel® VTune™ Amplifier, with the goal of finding potential bottlenecks in the game's code. You can find the detailed analysis here."

Maybe tag @intel on Twitter, with a few hashtags: #freesoftware #opensource #SoftwareTesting


  • 3 months later...

So I have now read through this thread, and found the point that I think is most relevant to 0 A.D.'s performance:

On 3/4/2019 at 8:14 PM, Alex from Intel said:

So what I believe is going on here is that these gaps between frames are coming from the game having to wait for the JavaScript. A couple of possibilities I can think of are that you might be interfacing the two languages too often, or you might be doing computations in the JavaScript that really belong in the C++.

Is anyone currently looking into the bottleneck of JavaScript computations that belong in C++? I would dive into it if someone is already on it and would like to split the work, so let me know.

Edited by ffffffff

41 minutes ago, ffffffff said:

Is anyone currently looking into the bottleneck of JavaScript computations that belong in C++? I would dive into it if someone is already on it and would like to split the work, so let me know.

I am, sort of. I used GetMicroseconds to profile some functions and look for easy optimizations. However, it's not easy to reduce the coupling, and one cannot cache QueryInterface pointers since they keep changing. It could be nice to implement workers so that each component could run in its own thread. Running the same profiling with SM45 (SpiderMonkey 45) would also be interesting. One might also try the TraceLogger to find potential bottlenecks.
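
For anyone unfamiliar with the approach: profiling with GetMicroseconds boils down to reading a clock before and after each call and accumulating the difference per component and function. A minimal C++ sketch of that idea, assuming std::chrono as the clock instead of the engine's own timer; Stats, ProfileTable and ScopedTimer are hypothetical names, not 0 A.D. code:

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical accumulator: total elapsed time and call count per function.
struct Stats {
    double totalMicroseconds = 0.0;
    long count = 0;
};

using Key = std::pair<std::string, std::string>;  // (component, function)

class ProfileTable {
public:
    // Creates the entry on first use.
    Stats& At(const std::string& component, const std::string& function) {
        return table_[{component, function}];
    }
    // Read-only lookup; the key must already have samples.
    const Stats& Get(const Key& key) const {
        auto it = table_.find(key);
        assert(it != table_.end() && "no samples recorded for this key");
        return it->second;
    }
    const std::map<Key, Stats>& All() const { return table_; }

private:
    std::map<Key, Stats> table_;
};

// RAII timer: adds the elapsed time to the table when it goes out of scope.
class ScopedTimer {
public:
    ScopedTimer(ProfileTable& table, std::string component, std::string function)
        : table_(table),
          component_(std::move(component)),
          function_(std::move(function)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        Stats& s = table_.At(component_, function_);
        s.totalMicroseconds +=
            std::chrono::duration<double, std::micro>(end - start_).count();
        ++s.count;
    }

private:
    ProfileTable& table_;
    std::string component_;
    std::string function_;
    std::chrono::steady_clock::time_point start_;
};
```

Placing `ScopedTimer timer(table, "UnitAI", "TimerHandler");` at the top of a function body records one sample per call, which is enough to produce totals and call counts like the ones posted further down.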


11 minutes ago, Stan` said:

each component could run in threads

I thought the same. I looked at some of the existing worker threads in the current code to learn how they work and how to implement one. As a first try, I wanted to thread the archive builder, so that the conversion work it does runs in parallel, and see whether that gives a gain when building archives for mods and the like. Then I would try to apply the same approach to the functions that are called most often while building a frame in the game. That's my current idea: first measure performance to isolate the functions that do the most work, then try to thread them, locking the data being worked on so the work can run in parallel.
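
A minimal sketch of the plan above, assuming each file's conversion is independent of the others. ConvertFile and ConvertAllParallel are hypothetical names, and ConvertFile only upper-cases text as a stand-in for the real conversion work. Workers take the next file from an atomic counter, do the expensive part without holding any lock, and only lock the mutex to store the result in the shared map:

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-file conversion work.
std::string ConvertFile(const std::string& contents) {
    std::string out = contents;
    std::transform(out.begin(), out.end(), out.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return out;
}

// Converts every (path, contents) pair on a small pool of worker threads.
std::map<std::string, std::string> ConvertAllParallel(
    const std::vector<std::pair<std::string, std::string>>& files,
    unsigned workerCount) {
    std::map<std::string, std::string> converted;
    std::mutex convertedMutex;          // guards `converted`
    std::atomic<std::size_t> next{0};   // index of the next file to claim

    auto worker = [&]() {
        for (std::size_t i = next++; i < files.size(); i = next++) {
            // Expensive part: runs in parallel, no lock held.
            std::string result = ConvertFile(files[i].second);
            // Cheap part: one thread at a time writes to the shared map.
            std::lock_guard<std::mutex> lock(convertedMutex);
            converted.emplace(files[i].first, std::move(result));
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < std::max(1u, workerCount); ++t)
        threads.emplace_back(worker);
    for (std::thread& th : threads)
        th.join();
    return converted;
}
```

Keeping the lock around the insertion only, rather than around the whole conversion, is what lets the workers actually overlap; with a lock held for the full call, the threads would simply take turns.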

Edited by ffffffff

Honestly, threading the archive builder won't help much because it's a one-off operation. I guess the sprintf thing could be sped up, but I haven't profiled it yet with https://code.wildfiregames.com/P187

IMHO there are some optimizations to be done in Auras, and maybe in the FSM (I wonder if we could move it to C++).
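
To illustrate what an FSM in C++ could look like, here is a minimal sketch of a state machine written as a plain transition function. The states and events are hypothetical and heavily simplified; they are not UnitAI's actual states, and a real port would also have to run the actions attached to each transition:

```cpp
#include <cassert>

// Hypothetical, simplified unit states and events (not the real UnitAI ones).
enum class State { Idle, Walking, Attacking };
enum class Event { WalkOrder, Arrived, EnemySeen, TargetDead };

// Pure transition function: given the current state and an event, return the
// next state. Events that do not apply leave the state unchanged.
State Transition(State s, Event e) {
    switch (s) {
    case State::Idle:
        if (e == Event::WalkOrder) return State::Walking;
        if (e == Event::EnemySeen) return State::Attacking;
        return s;
    case State::Walking:
        if (e == Event::Arrived) return State::Idle;
        if (e == Event::EnemySeen) return State::Attacking;
        return s;
    case State::Attacking:
        if (e == Event::TargetDead) return State::Idle;
        return s;
    }
    return s;
}
```

Whether a port like this pays off would depend on how much of the measured time goes into the state machine's bookkeeping versus the actions it triggers, which is something profiling would have to confirm first.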

On 11/8/2019 at 4:25 PM, Stan` said:

{"Component":"Health","FunctionName":"ExecuteRegeneration","TotalTime":1.8259999998263083,"Count":29,"Average":0.06297,"TurnAverageCount":0.00332378223495702}
{"Component":"Barter","FunctionName":"ProgressTimeout","TotalTime":3.0849999999336433,"Count":67,"Average":0.04604,"TurnAverageCount":0.007679083094555874}
{"Component":"Trigger","FunctionName":"DoAction","TotalTime":14.219999999999345,"Count":1,"Average":14.22,"TurnAverageCount":0.00011461318051575932}
{"Component":"AttackDetection","FunctionName":"HandleTimeout","TotalTime":22.20600000010745,"Count":6216,"Average":0.00357,"TurnAverageCount":0.71243553008596}
{"Component":"GarrisonHolder","FunctionName":"HealTimeout","TotalTime":34.932000000058906,"Count":503,"Average":0.06945,"TurnAverageCount":0.05765042979942694}
{"Component":"Capturable","FunctionName":"TimerTick","TotalTime":37.416999999777545,"Count":292,"Average":0.12814,"TurnAverageCount":0.03346704871060172}
{"Component":"StatisticsTracker","FunctionName":"UpdateSequences","TotalTime":49.37099999987913,"Count":118,"Average":0.4184,"TurnAverageCount":0.013524355300859598}
{"Component":"ResourceTrickle","FunctionName":"Trickle","TotalTime":52.41299999988587,"Count":3490,"Average":0.01502,"TurnAverageCount":0.4}
{"Component":"BattleDetection","FunctionName":"TimerHandler","TotalTime":64.0520000008255,"Count":3773,"Average":0.01698,"TurnAverageCount":0.43243553008595986}
{"Component":"BuildingAI","FunctionName":"FireArrows","TotalTime":96.48299999882875,"Count":929,"Average":0.10386,"TurnAverageCount":0.10647564469914039}
{"Component":"Pack","FunctionName":"PackProgress","TotalTime":811.1559999997262,"Count":1095,"Average":0.74078,"TurnAverageCount":0.12550143266475644}
{"Component":"DelayedDamage","FunctionName":"MissileHit","TotalTime":4515.770000000877,"Count":6767,"Average":0.66732,"TurnAverageCount":0.7755873925501432}
{"Component":"ProductionQueue","FunctionName":"ProgressTimeout","TotalTime":7443.771000001156,"Count":6729,"Average":1.10622,"TurnAverageCount":0.7712320916905444}
{"Component":"UnitAI","FunctionName":"TimerHandler","TotalTime":34888.44400000893,"Count":155744,"Average":0.22401,"TurnAverageCount":17.85031518624642}

Measured with GetMicroseconds, without any optimizations, on the same replayed match (since the players were AIs, it's not exactly the same, but I believe the data is still good).
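
The columns in the profile above appear to be related by Average = TotalTime / Count and TurnAverageCount = Count / turns. Every row is consistent with about 8725 simulation turns (for example ResourceTrickle: 3490 / 0.4 = 8725). This is inferred from the numbers, not stated in the post, and the time unit is not stated either. A small sketch of those derived metrics; ProfileRecord and the helper functions are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One row of the profile output.
struct ProfileRecord {
    std::string component;
    std::string functionName;
    double totalTime;  // unit of the timer used (not stated in the post)
    long count;        // number of calls
};

// Mean time per call.
double Average(const ProfileRecord& r) {
    assert(r.count > 0);
    return r.totalTime / static_cast<double>(r.count);
}

// Mean number of calls per simulation turn.
double TurnAverageCount(const ProfileRecord& r, long turns) {
    assert(turns > 0);
    return static_cast<double>(r.count) / static_cast<double>(turns);
}

// Fraction of the summed TotalTime that one record accounts for.
double ShareOfTotal(const ProfileRecord& r, const std::vector<ProfileRecord>& all) {
    double sum = 0.0;
    for (const ProfileRecord& x : all) sum += x.totalTime;
    assert(sum > 0.0);
    return r.totalTime / sum;
}
```

Applied to the rows above, the TotalTime values sum to about 48035, of which UnitAI's TimerHandler accounts for roughly 73%.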

