Shader-Based Rendering


I committed some changes to use GL_ARB_fragment_program/etc for rendering (as discussed here).

It should be enabled by default, unless you explicitly changed renderpath in a .cfg file. You can change it at runtime by typing renderer.renderpath="shader" (the new one), renderer.renderpath="vertexshader" (the old one, which will probably be deleted since it's redundant now), or renderer.renderpath="fixed" (the oldest one, which will be kept for compatibility) into the console.

Probably the most important thing is to ensure compatibility. In theory it should look pretty much indistinguishable from the old version (except for a couple of trivial bugs). If you get any new error messages, or strange rendering where it used to work before this change, please let me know :)

About the only new feature is support for a brighter lighting model: in Atlas, create a new map (or load an old map and change the 'lighting model' in the Environment panel), then turn the sun overbrightness control up and everything should get much brighter. This only works with the new renderpath - lighting will be wrong in the old renderpaths, which is okay because they should only be used for compatibility with very old hardware. Old maps will typically need their ambient colours, sun colour and overbrightness adjusted to look okay in the new model, so they default to the old model for now. (The old model calculates lighting and then clamps the values to 256 before multiplying by the texture, so it can only make the texture darker, if I remember correctly. The new model effectively clamps lighting to 512 instead, so textures can become brighter. Otherwise they're exactly the same.)
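
To make the difference concrete, here's a rough sketch of the two models in C++, with lighting normalised so 1.0 corresponds to 255 (function names are illustrative, not the engine's actual code):

#include <algorithm>

// Old model: lighting is clamped at 1.0 (i.e. 256) before multiplying the
// texture, so the result can never be brighter than the texture itself.
float OldLighting(float lighting, float texel)
{
    return std::min(lighting, 1.0f) * texel;
}

// New model: lighting is effectively clamped at 2.0 (i.e. 512) instead,
// so the texture can be brightened by up to 2x; otherwise identical.
float NewLighting(float lighting, float texel)
{
    return std::min(lighting, 2.0f) * texel;
}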


Hmm, that sounds like an unpleasant slowdown - there aren't any changes that ought to affect it that much.

Is this 5-40% slower than the old default "vertexshader" path (rather than "fixed")?

Do you get the same performance difference if you change the renderpath in the .cfg file instead of through the console? (Changing it at runtime appears to cause bad fragmentation in the vertex buffers, so it could potentially distort the figures unfairly.)

I think Gallium always translates the old multi-texturing code into the same language as GL_ARB_fragment_program, so the drivers should be doing pretty much the same thing in both cases. (Running a debug Mesa with GALLIVM_DEBUG=tgsi and llvmpipe prints the shaders and they look pretty similar to me in both modes). So I'm not sure what else the problem could be :(


Hm, looks like performance has indeed taken a hit.

The profile says each of the three passes (reflections, refractions, patches) takes about 15 ms, basically all billed towards "compute batches".

This is on the default map at maximum zoom, on:
Win7 (6.1.7600)
Graphics Card : NVIDIA GeForce GTX 460
OpenGL Drivers : 4.1.0; nvoglv64.dll (8.17.12.6099), nvoglv32.dll (8.17.12.6099)

Vertexshader is about half as fast as shader (after switching at runtime).

However, fixed is about twice as fast as shader, with only 3-8 ms spent in "compute batches" (the large variance can be influenced by alt+tab / interacting with other windows that overlap the game).


Huh, odd. I see the same on Windows but not Linux. "compute batches" is purely CPU work, so it can't be related to shaders unless I'm incredibly confused. On Windows, when I first launch the game, regardless of the configured renderpath, the "compute batches" timer for terrain blends is about 12ms/frame. On Linux it's about 0.2ms/frame. Changing the renderpath at runtime on Windows sometimes makes it drop to about 2ms/frame, presumably because the different vertex buffers mean different amounts of batching.

Replacing it with a boost::unordered_map doesn't remove the Windows/Linux difference, so it's not just an STL problem. Xperf seems to say almost all the time is spent in memory allocation in the std::map. Is allocation really that much slower on Windows? I guess we should give it a pool allocator if that'd solve the problem without having to rewrite the algorithm.


Change the ambient colours and sun colour to be darker. They get added together, so typically you want their sum to be close to (255,255,255). (I changed _default.xml to have ambient (128,128,128) and sun (192,192,192) - the old default had sun (255,255,255) which made it much too bright when added to ambient.)


I assume you mean the way there's a light strip just under the right slope of the roof? I don't see any difference in shadowing between the different renderpaths when I switch between them (and there probably shouldn't be any difference, since they're all using basically the same shadowing code). That kind of problem is generally inevitable, though - the shadow texture map has a limited resolution, too low to accurately represent all shadows, so in this case it misses a bit under the roof. It's particularly bad at low camera angles, since the shadow map gets stretched across a much larger area. But there are a few things we should probably try to improve it (use tighter frustum culling; use higher-res shadow maps if that doesn't affect performance much; tweak the z-bias ("renderer.shadowZBias" in the console)), so it could get a bit better in the future.


Here are some better data:

Common settings in local.cfg:

windowed = true

I also added

renderpath = VALUE

where VALUE is shader, vertexshader or fixed.

Data taken on the first screen, pressing Shift+F and then F11, after starting 0ad with:

./0ad -quickstart -autostart=MAP


MAP      renderpath       FPS    render (msec/frame)

Miletus  def/shader       12     77-80
         vertexshader     13     72-73
         fixed            13-14  68-69
         def + comm shad  14     66-67

Oasis    def/shader       10     96-98
         vertexshader     12-13  77-78
         fixed            13-14  70-71
         def + comm shad  12     80-81

Let me know if you need any other data.

EDIT: added the "def + comm shad" data (default renderpath + commented-out shader lines) as requested in comment 16.

Edited by fabio

And here are the screenshots for Oasis, with render info shown by pressing F11 three times. It looks like there is some game randomization anyway for trees, stones and animals:

default

vertexshader

fixed

Yeah, that's the actor variations system, which isn't yet (fully) implemented for the current simulation system. It seems to work to some degree, as the different models/textures are displayed and not just one; it's the "remembering which variation is used for this actor/entity" part that obviously isn't implemented yet.


Would it be too much work to have an option to choose the texture size, i.e. 'Shadow Quality' (that's what it is, right?), so people with slower systems can use a small texture and people with better PCs a high-res texture?

Yeah, we should do that - currently the shadow texture size is computed from the screen size rounded up to a power of two, but we could add an option to let people choose half or double resolution, etc. (Another option we should add is (relatively) soft shadowing - currently it's sort of enabled for NVIDIA and Intel devices, but it should be possible to disable it to improve performance, and to enable it in a way that works on ATI too.)
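
A minimal sketch of how that size computation could look, assuming the behaviour described above (screen size rounded up to a power of two, plus a user-selectable quality factor; names are illustrative, not the engine's actual code):

#include <algorithm>

// Round x up to the next power of two (e.g. 1050 -> 2048).
static int RoundUpToPowerOfTwo(int x)
{
    int n = 1;
    while (n < x)
        n <<= 1;
    return n;
}

// qualityShift: -1 halves the shadow map for slow systems, 0 keeps the
// current behaviour, +1 doubles it for faster cards.
int ComputeShadowMapSize(int screenW, int screenH, int qualityShift)
{
    int size = RoundUpToPowerOfTwo(std::max(screenW, screenH));
    return qualityShift >= 0 ? size << qualityShift : size >> -qualityShift;
}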

Here are some better data

Thanks, that does look like a real difference :(. "shader" and "vertexshader" both work pretty similarly, so I still don't see why it'd change so much, and it's hard to debug performance remotely. The only thing I can think to try: If you look in binaries/data/mods/public/shaders/model_common.fp at the end of the file for the lines

TEX tex.a, fragment.texcoord[2], texture[2], 2D;
MUL color.rgb, color, tex.a;

and comment out both those lines (using "//"), does the "shader" mode go any faster? (That's new functionality compared to "vertexshader" mode, and it makes a ~10% difference when rendering with llvmpipe. I wouldn't expect it to be a bottleneck on a real GPU, but I could be wrong, and I can't think of anything else to try.)


On Windows, when I first launch the game, regardless of the configured renderpath, the "compute batches" timer for terrain blends is about 12ms/frame. On Linux it's about 0.2ms/frame.

I still have no idea why the memory allocation was a hundred times slower on Windows, but I changed all that code to use a pool allocator so it seems to be comparable to Linux now.
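
For reference, a minimal sketch of what "use a pool allocator" can look like for a node-based std::map, assuming boost's pool allocators (the key/value types here are placeholders, not the engine's actual batching types):

#include <map>
#include <functional>
#include <boost/pool/pool_alloc.hpp>

// std::map makes one heap allocation per node, which is where xperf said
// the time was going; a pool allocator hands out nodes from larger
// pre-allocated chunks instead of hitting the general-purpose heap on
// every insert.
typedef std::map<
    int, int, std::less<int>,
    boost::fast_pool_allocator<std::pair<const int, int> > > BatchMap;

int main()
{
    BatchMap batches; // placeholder key/value types, not the real batch data
    for (int i = 0; i < 1000; ++i)
        batches[i] = 2 * i;
    return 0;
}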


Thanks, that does look like a real difference :(. "shader" and "vertexshader" both work pretty similarly, so I still don't see why it'd change so much, and it's hard to debug performance remotely. The only thing I can think to try: If you look in binaries/data/mods/public/shaders/model_common.fp at the end of the file for the lines

TEX tex.a, fragment.texcoord[2], texture[2], 2D;
MUL color.rgb, color, tex.a;

and comment out both those lines (using "//"), does the "shader" mode go any faster? (That's new functionality compared to "vertexshader" mode, and it makes a ~10% difference when rendering with llvmpipe. I wouldn't expect it to be a bottleneck on a real GPU, but I could be wrong, and I can't think of anything else to try.)

I added it to comment 13. On Miletus this is the fastest combination, while on Oasis it is faster than the default shader but still slower than fixed and vertexshader.


Huh, interesting. I expect we're drawing much less than a million model fragments per frame, and the X1600 can apparently handle about 2 gigatexels/sec, so one extra texture fetch per fragment should cost well under a millisecond - I still don't really see how the minor increase in shader complexity could cost you 10-20 msec/frame. Also this isn't the first rendered thing that uses the LOS texture (the "texture[2]" in this shader; the terrain needs it first), so I think it can't be texture upload latency.

Maybe I'm just underestimating the amount of drawing and overestimating the GPU performance... It makes sense to skip this LOS texture multiplication for most models anyway - it's only really needed for large ones (buildings, maybe rocks, etc) that aren't already wholly visible. That's something that should probably be fixed along with various other fog-of-war graphics issues. So at least it sounds for now like it's not a fundamental problem with this new shader renderer :). (I don't know about the other minor variations, but I assume they're similar things we can optimise later, hopefully.)
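
A hypothetical sketch of that per-model test (the threshold and the Model type are placeholders, not engine code):

// Small props (trees, units, animals) are either wholly visible or wholly
// hidden by the LOS, so per-fragment LOS multiplication buys them nothing;
// only large models that can straddle the visible/fogged boundary need it.
struct Model
{
    float boundsRadius; // world-space bounding radius
};

bool NeedsLosModulation(const Model& model, float losThresholdRadius)
{
    return model.boundsRadius > losThresholdRadius;
}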


Can you run the r300c driver to compare? (I think that doesn't support GLSL so it won't do the vertexshader mode or fancywater, but the rest should work). (I can only run the (non-Gallium) i965 driver, where those TEX/MUL lines make no measurable difference, and Gallium llvmpipe where the difference is ~30msec/frame on Oasis with shadows/fancywater enabled (which is unsurprising since texture filtering in software will always be slow).)


Oh, and something else to try if it's not inconvenient: compile Mesa with --enable-debug and run the game with RADEON_DEBUG=info,fp,vp,pstat (I'm guessing those are the right commands from looking at the code) and see if that prints much stuff (and also with the TEX/MUL lines removed to see if that makes an unexpectedly large difference to the output).


Here's my info. Hope I did it right.

I manually set the renderpath in local.cfg before starting the game, to get as accurate info as possible:

Shader: 15 FPS, 62-65 msec/frame, 170-200 msec/turn (fairly stable)

VertexShader: 11 FPS, 85-87 msec/frame, 171-250 msec/turn (constant jumping between high and low)

Fixed: 13 FPS, 76-78 msec/frame, 155-230 msec/turn (constant jumping between high and low)

So shader has a better FPS and msec/frame, but it is beaten by fixed when it comes to msec/turn.


Oh, and something else to try if it's not inconvenient: compile Mesa with --enable-debug and run the game with RADEON_DEBUG=info,fp,vp,pstat (I'm guessing those are the right commands from looking at the code) and see if that prints much stuff (and also with the TEX/MUL lines removed to see if that makes an unexpectedly large difference to the output).

r300g (mesa master up to 4a7f013f9db793dab8dbc9f71646dab49f12ed2f) debug outputs attached. 0ad run with:

RADEON_DEBUG=info,fp,vp,pstat ./0ad -quickstart -autostart=Oasis > r300g-debug.txt 2>&1

r300g-debug.txt

r300g-debug-no-tex-mul.txt

Edited by fabio

Can you run the r300c driver to compare? (I think that doesn't support GLSL so it won't do the vertexshader mode or fancywater, but the rest should work). (I can only run the (non-Gallium) i965 driver, where those TEX/MUL lines make no measurable difference, and Gallium llvmpipe where the difference is ~30msec/frame on Oasis with shadows/fancywater enabled (which is unsurprising since texture filtering in software will always be slow).)

With r300c, trees and other objects (stone bases) are all black. Enabling shadows fixes the blacks, but shadows are completely borked.

With r300c with TEX and MUL disabled, trees are OK but stone bases are still black. Shadows are still broken.

I think these are r300c regressions (the shader compiler is shared between r300g/c, and nobody tests r300c any more when the compiler changes...). Taking numbers from this is probably meaningless.
