todor943

Same FPS and weird CPU pattern between very different GPUs

So, I've noticed that there is a considerable FPS drop when you move the camera. I tend not to trust my AMD ATI driver on Linux, so I switched to the Intel GPU. Both yielded about the same FPS in game, which I found weird, until I looked at the screenshots I made of CPU use, which I am attaching below. On the left side of Figure 1 you can see how the Intel Core i7-4810MQ tries to juggle single-thread performance around when using the dedicated AMD FirePro W4170M. On the right side, with the same FPS, the game is running on the Intel graphics. The CPU usage pattern, however, is different. To me as a dev this looks like a scheduling as well as a threading problem. I have the steps to replicate it, so how can I profile the runtime to see what is causing this particular issue: momentary stutter/lag when moving around, with all effects on low and no units? Is there an issue already tracked for this?

 

Figure 1: perfshot.png (attached screenshot)


AFAIK the game is not threaded in any way. Sim, rendering, and the net client all run on one thread. There are plans to move the net client to a separate thread, but that hasn't been done yet. Rendering and sim are tightly interdependent so it probably wouldn't do much good to run them on separate threads. I do have plans to try to reduce the CPU-side costs of rendering, but it'll be a while before I make any real progress on it. Keep in mind that Intel graphics steals its memory from main memory, so it wouldn't be strange to see different or weird CPU usage patterns from it, not to mention that it uses a totally different driver.

See https://trac.wildfiregames.com/wiki/EngineProfiling

13 hours ago, aeonios said:

Rendering and sim are tightly interdependent so it probably wouldn't do much good to run them on separate threads.

But it shouldn't be interdependent. We need to split rendering into 2 steps: submitting and drawing. Without threading the split would be less powerful but still useful, because we wouldn't need to wait until post-processing and buffer swapping have finished.

31 minutes ago, vladislavbelov said:

But it shouldn't be interdependent. We need to split rendering into 2 steps: submitting and drawing. Without threading the split would be less powerful but still useful, because we wouldn't need to wait until post-processing and buffer swapping have finished.

Yes but if render is running much faster than sim then it will sit there and spam enumerateObjects at sim. Synchronizing that to eliminate the wasted requests would be a serious pain. I do, however, plan on separating submission and visibility testing and centralizing visibility testing in the renderer. I suspect that on large maps the cost of visibility testing becomes very high, so using a hierarchical system could improve performance by a lot.
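To illustrate the hierarchical idea, here is a minimal sketch in Python, not engine code. It assumes a simple two-level scheme: objects (reduced to 2-D points) are grouped into grid cells, and a whole cell is rejected with a single overlap test before any of its members are checked. All names (`build_cells`, `cull_hierarchical`, the rectangular view) are hypothetical, and a real renderer would test against a frustum rather than a rectangle.

```python
from collections import defaultdict


def overlaps(a_min, a_max, b_min, b_max):
    """2-D axis-aligned overlap test between two rectangles."""
    return all(a_min[i] <= b_max[i] and b_min[i] <= a_max[i] for i in range(2))


def build_cells(points, cell_size):
    """Group points into square grid cells keyed by integer cell coordinates."""
    cells = defaultdict(list)
    for p in points:
        key = (int(p[0] // cell_size), int(p[1] // cell_size))
        cells[key].append(p)
    return cells


def cull_hierarchical(cells, cell_size, view_min, view_max):
    """Return (visible points, number of overlap tests performed)."""
    visible, tests = [], 0
    for (cx, cy), members in cells.items():
        tests += 1
        cell_min = (cx * cell_size, cy * cell_size)
        cell_max = ((cx + 1) * cell_size, (cy + 1) * cell_size)
        if not overlaps(cell_min, cell_max, view_min, view_max):
            continue  # the whole cell is culled with one test
        for p in members:
            tests += 1
            if overlaps(p, p, view_min, view_max):
                visible.append(p)
    return visible, tests


# 1600 objects on a 40x40 map, camera looking at one corner.
points = [(x + 0.5, y + 0.5) for x in range(40) for y in range(40)]
cells = build_cells(points, cell_size=10)
visible, tests = cull_hierarchical(cells, 10, (0, 0), (9, 9))
print(len(visible), "visible after", tests, "tests instead of", len(points))
```

Whether this pays off in practice depends on how objects are distributed and how often they move between cells, which is the kind of thing only measurement can settle.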

59 minutes ago, aeonios said:

Yes but if render is running much faster than sim then it will sit there and spam enumerateObjects at sim.

The renderer shouldn't do this. It should wait if it has already done its work.

1 hour ago, aeonios said:

Synchronizing that to eliminate the wasted requests would be a serious pain.

No, it wouldn't. We only need a simple semaphore, because we need them to run at a 1:1 ratio.

1 hour ago, aeonios said:

I suspect that on large maps the cost of visibility testing becomes very high, so using a hierarchical system could improve performance by a lot.

I'm not sure that visibility checking improvements alone will save a lot of performance, but testing will show. Also, the hierarchical system should be pretty dynamic without much loss in asymptotic complexity.

19 minutes ago, vladislavbelov said:

The renderer shouldn't do this. It should wait if it has already done its work.

No, it wouldn't. We only need a simple semaphore, because we need them to run at a 1:1 ratio.

If the game on normal speed updates at 30fps and the renderer is trying to draw 60+fps for smooth camera scrolling then you won't be doing anybody any favors by forcing the renderer to be capped at 30fps. What we would need is a way for the renderer to know that it doesn't need to update animations or object positions if sim hasn't advanced, and then you might get less than smooth animation if the renderer draws right before the sim is about to advance. I don't know how that's usually handled.
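One common way to handle this is a fixed simulation timestep with render-side interpolation: the renderer keeps the previous and the current sim state and blends between them according to how far the frame falls between two sim updates. The Python sketch below only illustrates that general pattern and is not how the engine is known to do it; `State`, `sim_step`, and the 30 Hz step are assumptions made for the example.

```python
from dataclasses import dataclass

SIM_DT = 1.0 / 30.0  # assumed fixed simulation step (30 updates per second)


@dataclass
class State:
    x: float  # hypothetical 1-D object position


def sim_step(state, dt):
    """Advance the (hypothetical) simulation by one fixed step."""
    speed = 3.0  # arbitrary units per second
    return State(state.x + speed * dt)


def interpolate(prev, curr, alpha):
    """Blend between the two most recent sim states; alpha is in [0, 1)."""
    return State(prev.x + (curr.x - prev.x) * alpha)


def run(frame_times):
    """Render one frame per entry in frame_times (seconds per frame).

    Returns a list of (sim steps taken this frame, drawn x position).
    """
    prev = curr = State(0.0)
    accumulator = 0.0
    drawn = []
    for frame_dt in frame_times:
        accumulator += frame_dt
        steps = 0
        # Run as many fixed sim steps as the elapsed time allows.
        while accumulator >= SIM_DT:
            prev, curr = curr, sim_step(curr, SIM_DT)
            accumulator -= SIM_DT
            steps += 1
        # The leftover time says how far we are between prev and curr.
        alpha = accumulator / SIM_DT
        drawn.append((steps, interpolate(prev, curr, alpha).x))
    return drawn


# Rendering twice as fast as the sim: every other frame is interpolated only.
frames = run([SIM_DT / 2] * 4)
for steps, x in frames:
    print(steps, round(x, 3))
```

The trade-off is that the drawn positions lag the sim by up to one step, in exchange for smooth motion at any render rate.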

I don't think a semaphore would work that well either. The renderer has to make a synchronized call to enumerateObjects on the one hand, and then sim has to make a lot of synchronized calls to the renderer for submitting each object. Compared to simply reducing the CPU overhead of rendering I don't know if it'd be worth it given the amount of synchronization overhead that would be required. At the very least it'd require a synchronized queue on both ends.

21 minutes ago, vladislavbelov said:

I'm not sure that visibility checking improvements alone will save a lot of performance, but testing will show. Also, the hierarchical system should be pretty dynamic without much loss in asymptotic complexity.

I'm not so sure about that either, but I do know that we're using expensive bounding box tests rather than sphere tests, and it might potentially have to check thousands or even tens of thousands of objects every frame. At the moment I don't have any concrete numbers because there's no way to collect statistics, since visibility tests are done in simulation and sim doesn't have direct access to the renderer, but that's another thing that centralization would allow.

2 hours ago, aeonios said:

If the game on normal speed updates at 30fps and the renderer is trying to draw 60+fps for smooth camera scrolling then you won't be doing anybody any favors by forcing the renderer to be capped at 30fps. What we would need is a way for the renderer to know that it doesn't need to update animations or object positions if sim hasn't advanced, and then you might get less than smooth animation if the renderer draws right before the sim is about to advance. I don't know how that's usually handled.

You don't need 60fps if nothing has changed. And I'm not suggesting a fully separate thread, only a simple worker. That requires much less work from the current state.

2 hours ago, aeonios said:

I don't think a semaphore would work that well either. The renderer has to make a synchronized call to enumerateObjects on the one hand, and then sim has to make a lot of synchronized calls to the renderer for submitting each object. Compared to simply reducing the CPU overhead of rendering I don't know if it'd be worth it given the amount of synchronization overhead that would be required. At the very least it'd require a synchronized queue on both ends.

You need only 2 synchronizations in my variant, because after the submitting step you have a set of buckets that is independent of the sim state.
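As a rough illustration of such a two-synchronization handoff, here is a Python sketch using two semaphores. It is an assumption about what the variant could look like, not the engine's design: the bucket contents, `run_frames`, and `render_worker` are hypothetical, and "drawing" is replaced by recording what was received.

```python
import threading


def run_frames(num_frames):
    """Main thread submits buckets; a worker draws them, one frame at a time."""
    buckets_ready = threading.Semaphore(0)  # main -> worker: buckets submitted
    buckets_free = threading.Semaphore(1)   # worker -> main: slot may be reused
    shared = {"buckets": None}
    drawn = []

    def render_worker():
        for _ in range(num_frames):
            buckets_ready.acquire()  # sync point 1: wait for a submission
            buckets = shared["buckets"]
            drawn.append(sorted(buckets.items()))  # stand-in for drawing
            buckets_free.release()   # sync point 2: hand the slot back

    worker = threading.Thread(target=render_worker)
    worker.start()
    for frame in range(num_frames):
        # The sim update and submission can run here while the worker is
        # still drawing the previous frame.
        new_buckets = {"terrain": [frame], "models": [frame, frame + 1]}
        buckets_free.acquire()       # wait until the previous frame is drawn
        shared["buckets"] = new_buckets
        buckets_ready.release()
    worker.join()
    return drawn


drawn = run_frames(3)
print(drawn)
```

Because the buckets are a snapshot, the worker never touches sim state, so no other locking is needed in this sketch. It also means every submission is drawn exactly once, which is the 1:1 behaviour debated above.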

2 hours ago, aeonios said:

I'm not so sure about that either, but I do know that we're using expensive bounding box tests rather than sphere tests, and it might potentially have to check thousands or even tens of thousands of objects every frame.

Why do you call AABB expensive? It's usually faster than a sphere. A regular BB is slower, but not significantly (btw, it'd be good to compare).

2 hours ago, aeonios said:

At the moment I don't have any concrete numbers because there's no way to collect statistics since visibility tests are done in simulation and sim doesn't have direct access to the renderer, but that's another thing that centralization would allow.

You have profiler.txt for that.

3 hours ago, vladislavbelov said:

You don't need 60fps if nothing has changed. And I'm not suggesting a fully separate thread, only a simple worker. That requires much less work from the current state.

If the camera moved then something changed, i.e. the camera view.

3 hours ago, vladislavbelov said:

Why do you call AABB expensive? It's usually faster than a sphere. A regular BB is slower, but not significantly (btw, it'd be good to compare).

In what universe? Sphere requires only one signed distance calculation per frustum plane. AABB requires a lot more complicated math, again per frustum plane.

3 hours ago, vladislavbelov said:

You have profiler.txt for that.

Profiler.txt can't tell you anything that it doesn't know. I don't think there's any info recorded for objects that are culled by visibility testing. Renderer only knows about objects which are actually submitted.

1 hour ago, aeonios said:

If the camera moved then something changed, i.e. the camera view.

"camera moved" != "nothing was changed".

1 hour ago, aeonios said:

In what universe?

Obviously. And AABB vs. AABB is usually faster than sphere vs. sphere. As for frustum planes, the math is pretty simple for an AABB: you need only 2 dot products. Something bad is happening if you need complicated math for it.
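For comparison, both per-plane tests can be sketched in a few lines of Python. This is a generic illustration, not the engine's code. It assumes planes are stored as a unit normal pointing into the frustum plus an offset `d`, and it reads the "2 dots" as the usual trick of testing only the two box corners that are extreme along the plane normal. `Plane` and the function names are hypothetical.

```python
from typing import NamedTuple


class Plane(NamedTuple):
    normal: tuple  # unit normal (nx, ny, nz), assumed to point into the frustum
    d: float       # plane equation: dot(normal, p) + d = 0


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sphere_outside(plane, center, radius):
    """One signed distance per plane: outside if farther than radius behind it."""
    return dot(plane.normal, center) + plane.d < -radius


def aabb_outside(plane, box_min, box_max):
    """First dot product: test only the corner farthest along the normal."""
    p = tuple(box_max[i] if plane.normal[i] >= 0 else box_min[i] for i in range(3))
    return dot(plane.normal, p) + plane.d < 0


def aabb_straddles(plane, box_min, box_max):
    """Second dot product: the opposite corner tells inside from intersecting."""
    n = tuple(box_min[i] if plane.normal[i] >= 0 else box_max[i] for i in range(3))
    return (not aabb_outside(plane, box_min, box_max)
            and dot(plane.normal, n) + plane.d < 0)


# A single plane x >= 0 stands in for one side of the frustum.
plane = Plane((1.0, 0.0, 0.0), 0.0)
print(sphere_outside(plane, (-2.0, 0.0, 0.0), 1.0))      # fully behind the plane
print(aabb_outside(plane, (-3, -1, -1), (-1, 1, 1)))     # fully behind the plane
print(aabb_straddles(plane, (-1, -1, -1), (1, 1, 1)))    # crosses the plane
```

In this form the sphere test costs one dot product per plane, and the box test costs one dot product plus three comparisons to pick the corner (two dot products if inside and intersecting must be told apart). Which one is faster in practice would need measuring, as suggested above.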

1 hour ago, aeonios said:

Profiler.txt can't tell you anything that it doesn't know. I don't think there's any info recorded for objects that are culled by visibility testing. Renderer only knows about objects which are actually submitted.

Feel free to add new values to the profiler: https://trac.wildfiregames.com/wiki/EngineProfiling.
