Shader-Based Rendering


Hmm, I don't see any obvious problem. The most relevant part is around lines 4600-4700, where it says

~~~~~~~~ FRAGMENT PROGRAM ~~~~~~~
~ 8 Instructions
~ 5 Vector Instructions (RGB)
~ 2 Scalar Instructions (Alpha)
~ 0 Flow Control Instructions
~ 2 Texture Instructions
~ 0 Presub Operations
~ 4 Temporary Registers
~~~~~~~~~~~~~~ END ~~~~~~~~~~~~~~

in no-tex-mul, and

~~~~~~~~ FRAGMENT PROGRAM ~~~~~~~
~ 10 Instructions
~ 6 Vector Instructions (RGB)
~ 2 Scalar Instructions (Alpha)
~ 0 Flow Control Instructions
~ 3 Texture Instructions
~ 0 Presub Operations
~ 5 Temporary Registers
~~~~~~~~~~~~~~ END ~~~~~~~~~~~~~~

in the other - it's just adding a couple of instructions and isn't doing anything unreasonably inefficient. So the compiler seems fine - I guess the performance difference must come from the hardware or something.

One odd thing, though, is that it seems to compile each fragment program twice, producing different output each time. (Compiling twice is probably expected, since each fragment program is used with two different vertex programs, but the result should be the same either way.) E.g. in the no-tex-mul case, it has two blocks that start with

0: TEX TEMP[0], IN[1], SAMP[0], 2D
1: TEX TEMP[1], IN[2], SAMP[2], SHADOW2D
2: MUL TEMP[2].xyz, IN[0], IMM[0].xxxx
3: MAD_SAT TEMP[1].xyz, TEMP[2], TEMP[1], CONST[1]
4: MUL TEMP[0].xyz, TEMP[0], TEMP[1]
5: TEX TEMP[3].w, IN[3], SAMP[3], 2D
6: MUL TEMP[0].xyz, TEMP[0], TEMP[3].wwww
7: MOV OUT[0].xyz, TEMP[0]
8: END

but diverge at the 'transform TEX' step: the first says

Fragment Program: after 'transform TEX'
# Radeon Compiler Program
0: TEX temp[0], input[1], 2D[0];
1: MOV temp[1], none.0000;
2: MUL temp[2].xyz, input[0], const[2].xxxx;
3: MAD_SAT temp[1].xyz, temp[2], temp[1], const[1];
4: MUL temp[0].xyz, temp[0], temp[1];
5: TEX temp[3].w, input[3], 2D[3];
6: MUL temp[0].xyz, temp[0], temp[3].wwww;
7: MOV_SAT output[0].xyz, temp[0];

while the second says

Fragment Program: after 'transform TEX'
# Radeon Compiler Program
0: TEX temp[0], input[1], 2D[0];
1: TEX temp[4], input[2], 2DSHADOW[2];
2: MOV_SAT temp[5].w, input[2].zzzz;
3: ADD temp[5].w, -temp[5].wwww, temp[4].xxxx;
4: CMP temp[1], temp[5].www1, none.0000, none.1111;
5: MUL temp[2].xyz, input[0], const[2].xxxx;
6: MAD_SAT temp[1].xyz, temp[2], temp[1], const[1];
7: MUL temp[0].xyz, temp[0], temp[1];
8: TEX temp[3].w, input[3], 2D[3];
9: MUL temp[0].xyz, temp[0], temp[3].wwww;
10: MOV_SAT output[0].xyz, temp[0];

It looks like the first version assumes the shadow test always fails: it uses "none.0000" (the constant zero) instead of sampling the 2DSHADOW texture. That appears to come from the RC_COMPARE_FUNC_NEVER path in radeon_program_tex.c.
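To make the difference between the two compilations concrete, here is a minimal Python model of the shadow part of the second listing (instructions 1-4) and of the constant-zero replacement in the first. It assumes CMP has the usual ARB/R300 meaning (take the second operand where the first is negative, otherwise the third). The meanings given to IN[0] (per-vertex diffuse light), IMM[0].x (a scale factor) and CONST[1] (ambient colour), and all function names, are assumptions for illustration; the debug output doesn't state them. Read this way, a fragment is lit when its clamped reference depth is less than or equal to the stored depth, which looks like a LEQUAL comparison.

```python
def saturate(x: float) -> float:
    """Clamp to [0, 1], like the _SAT suffix."""
    return min(max(x, 0.0), 1.0)


def cmp(src0: float, src1: float, src2: float) -> float:
    """Scalar CMP (assumed semantics): src1 if src0 < 0, otherwise src2."""
    return src1 if src0 < 0.0 else src2


def shadow_compare(stored_depth: float, ref_depth: float) -> float:
    """Second listing, instructions 2-4: 1.0 = lit, 0.0 = in shadow."""
    diff = -saturate(ref_depth) + stored_depth  # MOV_SAT, then ADD with temp[4].x
    return cmp(diff, 0.0, 1.0)                  # none.0000 / none.1111


def shadow_never(stored_depth: float, ref_depth: float) -> float:
    """First listing: 'MOV temp[1], none.0000' never looks at the shadow map."""
    return 0.0


def shade(texel: float, diffuse: float, scale: float, ambient: float,
          los_alpha: float, shadow: float) -> float:
    """One colour channel of the remaining instructions (assumed meanings)."""
    lighting = saturate(diffuse * scale * shadow + ambient)  # MUL, then MAD_SAT
    return saturate(texel * lighting * los_alpha)            # MUL, MUL, MOV_SAT
```

With shadow_never, the MAD_SAT term collapses to just the ambient constant, so every fragment is shaded as if it were in shadow.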

The weirder thing is the normal (not no-tex-mul) case, where it compiles

0: TEX TEMP[0], IN[1], SAMP[0], 2D
1: MOV TEMP[1].xyz, TEMP[0]
2: TEX TEMP[2], IN[2], SAMP[1], SHADOW2D
3: MUL TEMP[3].xyz, IN[0], IMM[0].xxxx
4: MAD_SAT TEMP[2].xyz, TEMP[3], TEMP[2], CONST[1]
5: MUL TEMP[1].xyz, TEMP[1], TEMP[2]
6: TEX TEMP[0].w, IN[3], SAMP[2], 2D
7: MUL TEMP[1].xyz, TEMP[1], TEMP[0].wwww
8: MOV OUT[0].xyz, TEMP[1]
9: END

first into

Fragment Program: after 'transform TEX'
# Radeon Compiler Program
0: TEX temp[0], input[1], 2D[0];
1: MOV temp[1].xyz, temp[0];
2: MOV temp[2], none.0000;
3: MUL temp[3].xyz, input[0], const[2].xxxx;
4: MAD_SAT temp[2].xyz, temp[3], temp[2], const[1];
5: MUL temp[1].xyz, temp[1], temp[2];
6: TEX temp[4], input[3], 2D[2];
7: MOV_SAT temp[5].w, input[3].zzzz;
8: ADD temp[5].w, -temp[5].wwww, temp[4].xxxx;
9: CMP temp[0].w, temp[5].www1, none.0000, none.1111;
10: MUL temp[1].xyz, temp[1], temp[0].wwww;
11: MOV_SAT output[0].xyz, temp[1];

It replaces the shadow texture with none.0000 as before, but then, in lines 6-9, it applies a depth comparison (which is meant for shadows) to the LOS texture (which shouldn't be depth-compared). In practice all of that just computes temp[0].w = 1, which is a total waste of time. The shadow replacement would make certain models (maybe the heads of typical units) dark (shadow-coloured) even when they're not in shadow, I think. Do you see any visible bugs like that?
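A small Python model of instruction 9 shows why lines 6-9 reduce to a constant. It assumes the usual CMP semantics (second operand where the first is negative, otherwise the third) and treats the '0'/'1' characters in a swizzle such as www1 as constant components, which is how the Radeon compiler dump appears to use them. The helper names are hypothetical.

```python
from typing import Tuple

Vec4 = Tuple[float, float, float, float]

_COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(v: Vec4, pattern: str) -> Vec4:
    """Apply a swizzle such as 'www1'; '0' and '1' are constant components."""
    return tuple(v[_COMPONENT[ch]] if ch in _COMPONENT else float(ch)
                 for ch in pattern)


def cmp4(src0: Vec4, src1: Vec4, src2: Vec4) -> Vec4:
    """Per-component CMP (assumed semantics): src1 where src0 < 0, else src2."""
    return tuple(b if a < 0.0 else c for a, b, c in zip(src0, src1, src2))


ZERO: Vec4 = (0.0, 0.0, 0.0, 0.0)  # none.0000
ONE: Vec4 = (1.0, 1.0, 1.0, 1.0)   # none.1111


def instruction_9(temp5: Vec4) -> float:
    """CMP temp[0].w, temp[5].www1, none.0000, none.1111 (only .w is written)."""
    result = cmp4(swizzle(temp5, "www1"), ZERO, ONE)
    return result[3]
```

Whatever the LOS lookup and the ADD leave in temp[5].w, the written .w channel only ever sees the constant 1, so the MUL in instruction 10 multiplies the colour by 1 and the LOS result is effectively discarded.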

I suppose this might indicate a compiler bug (unless it's a weird bug in the game instead). Perhaps you could try to find where in our application the shader is getting compiled like that: run the game with the normal shaders in gdb with a breakpoint like "b radeon_program_tex.c:141" (the "inst->U.I.Opcode = RC_OPCODE_MOV" line), and do a "bt full" when it's hit.

Sweet roller! That's got to be the single biggest bang for the buck in recent memory - you improved the framerate by a factor of ~10 :)

(reflect/refract/patches now take 1.4/1.6/0.7 ms)

Allocations are indeed expensive on Windows - VirtualAlloc is inexplicably slow (well, it does zero out the pages), and HeapAlloc is quite wasteful for small allocations, not to mention crazy complex (two indirect DLL calls, at least 12 sub-functions and hundreds of instructions, including at least one with a LOCK prefix).

That said, there is considerable room for improvement in pool_alloc (inlining, removing the assertions in release mode). Since it is now being called millions of times, I think it's worth tackling. Let's talk about that in tomorrow's meeting.
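pool_alloc itself isn't shown in this thread, so the following is only a hypothetical Python model of the general idea behind a bump-pointer pool: pay for one large allocation up front, then serve each small request by advancing an offset, with debug checks that vanish in an optimised build. Python strips assert statements under `python -O`, which is roughly analogous to NDEBUG removing assert() from a C++ release build. The class and method names are made up for illustration and are not 0 A.D.'s API.

```python
class BumpPool:
    """Hypothetical bump-pointer pool (not 0 A.D.'s pool_alloc)."""

    def __init__(self, capacity: int, alignment: int = 16) -> None:
        # Alignment must be a power of two for the rounding trick below.
        assert alignment > 0 and (alignment & (alignment - 1)) == 0
        self._storage = bytearray(capacity)  # the one expensive allocation
        self._alignment = alignment
        self.offset = 0

    def alloc(self, size: int) -> memoryview:
        # Debug-only sanity check; removed entirely under `python -O`.
        assert size > 0, "zero-sized allocation"
        # Round the current offset up to the next aligned position.
        start = (self.offset + self._alignment - 1) & ~(self._alignment - 1)
        end = start + size
        if end > len(self._storage):
            raise MemoryError("pool exhausted")
        self.offset = end
        return memoryview(self._storage)[start:end]

    def free_all(self) -> None:
        # Individual blocks are never freed; the whole pool is reset at once.
        self.offset = 0
```

The fast path is just a few arithmetic operations and a bounds check, which is exactly the kind of function where, in C++, inlining and compiling out assertions pay off when it's called millions of times.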

> It's replacing the shadow texture with none.0000 as before, but then it's sort of doing texture depth comparisons (which are meant for shadows) for the LOS texture (which doesn't want depth comparison) in lines 6-9, except actually that's all computing temp[0].w = 1 which is a total waste of time. That would result in certain models (maybe the heads of typical units) being dark (shadow-coloured) even when not in shadow, I think - do you see any visible bugs like that?

> I suppose this might be indicating a compiler bug (unless it's a weird bug in the game instead)... Perhaps you could try to find where in our application the shader is getting compiled like that: run with the normal shaders in gdb with a breakpoint like "b radeon_program_tex.c:141" (the "inst->U.I.Opcode = RC_OPCODE_MOV" line) and do a "bt full".

I also got black stone bases with the r300g driver, but they appear to be fixed now that I'm on 0ad r9179 with the same driver version. I then updated the r300g driver to cd2857fae16e1352f39b37f611797e66619d3fe5 and ran 0ad under gdb with a breakpoint at radeon_program_tex.c:155 (the line has moved in the meantime), but the breakpoint was never hit, with either the standard or the modified 0ad fragment program.

I'll attach the updated r300g debug output anyway.

It's also interesting that with classic r300 there is no FOW (fog of war): the whole map is visible from the start (though it's OK on the minimap). This is a driver bug, but I was expecting that, for performance reasons, the game engine forced FOW areas to black and never sent them to the graphics driver in any way.

r300g-debug.txt

r300g-debug-no-tex-mul.txt

Yeah, from the debug output it looks like it's only compiling the shader once now, and it's no longer producing that bogus version with "none.0000" that the breakpoint was meant to catch, so I guess they've fixed whatever that particular bug was :)

With FOW we don't render units that are hidden in it, so you shouldn't be seeing those, but we do render all the terrain regardless. I don't think it's a significant performance issue since if you're looking at a totally black part of the map you won't notice the framerate anyway :P. Better to optimise the worst case (visible map with lots of units) rather than the best case. Also there's a danger of graphical glitches like shadows suddenly popping into existence, if there's a mountain which we don't render until a unit gets close to it, so it seems safer to stick with always rendering it.
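The rendering policy described above fits in a small Python sketch: terrain patches are always submitted, while units are filtered by whether the player can currently see them. The Unit type, its visible flag and collect_draw_list() are hypothetical stand-ins for the engine's real LOS query and renderer, not its actual API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Unit:
    name: str
    visible: bool  # hypothetical: result of the player's LOS/FOW query


def collect_draw_list(terrain_patches: Sequence[str],
                      units: Sequence[Unit]) -> List[Union[str, Unit]]:
    # All terrain is drawn regardless of FOW (assumed to be darkened later
    # during shading), so mountains and their shadows never pop into view.
    draw: List[Union[str, Unit]] = list(terrain_patches)
    # Units hidden in the fog of war are skipped entirely.
    draw.extend(u for u in units if u.visible)
    return draw
```

Keeping terrain unconditional trades some wasted work in the best case (a mostly black view) for stable visuals, which matches the preference for optimising the worst case.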

It's pretty much all just boring technical details, so I'm not sure what kind of things you want explained :). What it means in terms of implementation is that instead of writing code like this, we write code like this, which is probably equally incomprehensible, but it makes it much easier to write more complex graphical effects and it allows us to make better use of the user's graphics hardware.

The difference with the lighting is also just boring technical details, with the consequence that it handles bright sunlight more accurately than the old approach.

Sorry for expressing myself in a slightly misleading way. I didn't mean that the look hadn't changed at all, simply that regardless of whether it's done with a shader or something else, light interacts with the 3D objects etc. It may look different, and it may be done in a slightly different way, but it's still light interacting with in-game objects. I wasn't being too serious and meant it more as a reaction to how Paul phrased it, but that's hard to get across in a forum post =) As for what a shader is, I believe it can be described as "a way to define how light interacts with the 3D objects", which is probably what Paul meant in the first place :P I was just being silly. Sorry.

I always believed it was the way light interacts with a 3D object's geometry and textures, to produce the full picture.

Hmm, I guess the important distinction is that what we're using now is programmable shaders.

Early graphics hardware (which OpenGL was originally designed for) just provided a fixed set of features that could be enabled or disabled or maybe hooked together in slightly different ways - there's a thing that calculates diffuse lighting per vertex, a thing that calculates specular lighting per vertex, a thing that computes fog, a thing that reads from a texture, maybe another for a second texture, etc, and they all connect together in a predefined order (originally because they were literally connected that way in the silicon, I expect). That's effectively doing shading, but the term shader usually isn't used in this context. You don't get much flexibility, so if you want to do a moderately complex graphical effect it usually requires some tricky combination of all the available features.

With (programmable) shaders on modern hardware (c. 2003), you replace all of that fixed functionality with something much more generic that can run a programming language. It's a pretty restrictive and highly specialised programming language (especially the old-fashioned language we're using), but it basically lets you do whatever lighting/texturing computations you want - it doesn't care that you're doing diffuse lighting, it just knows you're multiplying two values together and dividing by a constant and adding a third value or whatever. That gives far more flexibility. Instead of having to twist your implementation of a graphical effect into the predefined sequence of functions, you just write exactly what you want and the graphics card does exactly that.
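As a rough illustration of that difference, here is a Python sketch with scalar "colours": a fixed-function fragment stage whose steps can only be switched on or off in a predefined order, next to a programmable stage that just runs whatever function you give it. The stage order and fog blend are a simplified model of classic OpenGL behaviour, and all names are hypothetical.

```python
from typing import Callable, Dict


def fixed_function_fragment(texel: float, diffuse: float, specular: float,
                            fog_factor: float, fog_colour: float,
                            use_texture: bool = True,
                            use_fog: bool = True) -> float:
    # Stages can be enabled or disabled, but their order is baked in.
    colour = diffuse
    if use_texture:
        colour *= texel              # modulate by the texture
    colour += specular               # add specular after texturing
    if use_fog:
        # Blend towards the fog colour: f * colour + (1 - f) * fog.
        colour = fog_factor * colour + (1.0 - fog_factor) * fog_colour
    return colour


FragmentProgram = Callable[[Dict[str, float]], float]


def run_fragment_program(program: FragmentProgram,
                         inputs: Dict[str, float]) -> float:
    # The "hardware" doesn't know what the maths means; it just runs it.
    return program(inputs)


def example_program(i: Dict[str, float]) -> float:
    # Multiply two values, divide by a constant, add a third.
    return i["a"] * i["b"] / 2.0 + i["c"]
```

Any effect that doesn't fit the fixed order has to be faked with flag combinations in the first function, whereas the second just takes a different program.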

The new brighter lighting model might be achievable with our old fixed-function rendering code, but it would be non-obvious and fiddly, and it might not actually be possible at all. In the new programmable shader system, it's two short lines of code and that's it.

Note that on r300c it works perfectly with the fixed renderpath, but it shows the errors I described before with the shader renderpath. Are these problems really driver bugs, or is 0 A.D. at fault for trying to run in shader mode on a card without the required features (no GLSL, etc.)?

About default.cfg: Thanks, fixed.

Shader mode doesn't use GLSL shaders; it uses GL_ARB_fragment_program/GL_ARB_vertex_program, which those drivers claim to support (by advertising those extensions), so I don't see why it would fail. It's always possible there are bugs in the game, but since it seems to work on most other drivers, I'd assume it's just driver bugs. I don't know of any way to detect automatically that it's going to fail, so if this affects stable release versions of Mesa (not just recent Git versions), I guess we'll need to add something to hwdetect.js to force shader mode off on r300c drivers, which isn't nice since it really ought to work on those devices :(

I think all the "DRI R300 Project" devices reported here are r300c (vs "X.Org R300 Project" which are r300g, and "X.Org" which are r600g) (does that sound correct?). That looks like about 3% of users are still using r300c so it seems important to not break badly for them - hopefully it'll be sufficient to switch to the fixed mode and show a warning suggesting they update their system.
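A minimal Python sketch of the fallback being discussed, assuming the "DRI R300 Project" name is what the driver reports as its GL vendor string. The function name, return shape and warning text are hypothetical; hwdetect.js itself is JavaScript and its real interface isn't shown here. An exact match is used so that r300g ("X.Org R300 Project") keeps shader mode.

```python
from typing import Optional, Tuple

R300C_VENDOR = "DRI R300 Project"  # classic Mesa r300 driver (r300c)


def choose_renderpath(gl_vendor: str,
                      requested: str = "shader") -> Tuple[str, Optional[str]]:
    """Return (renderpath, warning) for a given GL vendor string."""
    if requested == "shader" and gl_vendor == R300C_VENDOR:
        warning = ("The r300c driver has known problems with the shader "
                   "renderpath; falling back to the fixed renderpath. "
                   "Please consider updating your system.")
        return "fixed", warning
    # r300g ("X.Org R300 Project"), r600g ("X.Org") and others keep their setting.
    return requested, None
```

Returning the warning alongside the renderpath lets the caller decide how to show it, rather than silently downgrading the user.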

> I think all the "DRI R300 Project" devices reported here are r300c (vs "X.Org R300 Project" which are r300g, and "X.Org" which are r600g) (does that sound correct?).

Correct.

> That looks like about 3% of users are still using r300c so it seems important to not break badly for them - hopefully it'll be sufficient to switch to the fixed mode and show a warning suggesting they update their system.

Since Mesa 7.9, r300g has been the default driver, so hopefully more users will switch to it. Adding the fallback for r300c is a good idea anyway, since r300g is still not supported on non-Linux Unixes.

  • 1 month later...
  • 3 weeks later...

shader-db: http://lists.freedesktop.org/archives/mesa-dev/2011-May/007694.html

Maybe 0 A.D.'s shaders could be submitted, so there's a chance that the Mesa drivers get optimized for them.

Eric just added some 0 A.D. shaders (commit):

> I've added their one GLSL shader pair from that link (http://trac.wildfiregames.com/browser/ps/trunk/binaries/data/mods/public/shaders). Unfortunately, ARB_fp/vp programs are hard to incorporate into this project since they don't have the linking API I use in order to get the debug information out of the driver (it would have to actually render a primitive, which means binding textures, providing appropriate vertex attributes, etc., to really get a generated program from the debug that represents what the shader would be when in actual use).