 
        Ykkrosh
WFG Retired- 
Everything posted by Ykkrosh
- 
	Collecting data from users
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	RAM graph. (Fun facts: the lowest reported value is 223MB; the highest is 49150MB; the second highest is 16080MB.)
	I would have expected much stronger steps in the graph. It looks like it's common for our measured value to slightly under-report the nominal RAM. On Linux the figure excludes 1.7% plus maybe the kernel image size (hence the step at ~2012MB before the step at ~2048MB). I guess the rest of the variation comes from other figures that get subtracted (AGP aperture size maybe?), or people sticking random 256MB/512MB pieces into their machines, or something else, or some combination of those things.
	Regardless of the reasons, this means trying to read off figures at exactly (e.g.) 1024MB isn't useful. I think the important figures are:
	* 99.5% of users have at least nearly 512MB RAM.
	* 95% of users have at least nearly 1GB RAM.
	* 80% of users have at least nearly 2GB RAM.
	* 50% of users have at least nearly 3GB RAM.
	So I think we can safely assume 1GB as a minimum, but should worry if we assume much more than that.
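To illustrate why reading the graph at exactly 1024MB is misleading, here is a minimal Python sketch that counts users against a threshold slightly below the nominal size. The function name, the 5% slack, and the sample reports are all made up for illustration; they are not the real collected data.

```python
def fraction_with_at_least(samples_mb, nominal_mb, slack=0.05):
    """Fraction of reports at or above nominal_mb minus a relative slack.

    The slack (hypothetical, 5% by default) absorbs the under-reporting
    described above, e.g. memory the kernel subtracts before reporting.
    """
    if not samples_mb:
        raise ValueError("no samples")
    threshold = nominal_mb * (1.0 - slack)
    return sum(1 for mb in samples_mb if mb >= threshold) / len(samples_mb)


# Invented reports: two machines with nominally 1GB report just under 1024MB.
reports = [223, 503, 1002, 1010, 2012, 2048, 3061, 16080]

exact = sum(1 for mb in reports if mb >= 1024) / len(reports)
tolerant = fraction_with_at_least(reports, 1024)
print(exact, tolerant)
```

With the exact cut-off the two nominally-1GB machines are counted as below 1GB; with the slack they are counted correctly.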
- 
	Looks like this. What kernel version are you running? (I wonder if the bug is triggered by something that changed very recently, since there were no reports until a few days ago...)
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Hmm, I don't see any obvious problem. The most relevant part is around line 4600-4700, where it says
	    ~~~~~~~~ FRAGMENT PROGRAM ~~~~~~~
	    ~ 8 Instructions
	    ~ 5 Vector Instructions (RGB)
	    ~ 2 Scalar Instructions (Alpha)
	    ~ 0 Flow Control Instructions
	    ~ 2 Texture Instructions
	    ~ 0 Presub Operations
	    ~ 4 Temporary Registers
	    ~~~~~~~~~~~~~~ END ~~~~~~~~~~~~~~
	in no-tex-mul, and
	    ~~~~~~~~ FRAGMENT PROGRAM ~~~~~~~
	    ~ 10 Instructions
	    ~ 6 Vector Instructions (RGB)
	    ~ 2 Scalar Instructions (Alpha)
	    ~ 0 Flow Control Instructions
	    ~ 3 Texture Instructions
	    ~ 0 Presub Operations
	    ~ 5 Temporary Registers
	    ~~~~~~~~~~~~~~ END ~~~~~~~~~~~~~~
	in the other - it's just adding a couple of instructions and isn't doing anything unreasonably inefficient. So the compiler seems fine - I guess the performance difference must come from the hardware or something.
	But one thing that's weird is that it seems to compile each fragment program twice, differently. (It probably should compile twice, since it's used with two different vertex programs, but should be the same either way). E.g. in the no-tex-mul case, it has two blocks that start with
	    0: TEX TEMP[0], IN[1], SAMP[0], 2D
	    1: TEX TEMP[1], IN[2], SAMP[2], SHADOW2D
	    2: MUL TEMP[2].xyz, IN[0], IMM[0].xxxx
	    3: MAD_SAT TEMP[1].xyz, TEMP[2], TEMP[1], CONST[1]
	    4: MUL TEMP[0].xyz, TEMP[0], TEMP[1]
	    5: TEX TEMP[3].w, IN[3], SAMP[3], 2D
	    6: MUL TEMP[0].xyz, TEMP[0], TEMP[3].wwww
	    7: MOV OUT[0].xyz, TEMP[0]
	    8: END
	but diverge at the 'transform TEX' step: the first says
	    Fragment Program: after 'transform TEX'
	    # Radeon Compiler Program
	    0: TEX temp[0], input[1], 2D[0];
	    1: MOV temp[1], none.0000;
	    2: MUL temp[2].xyz, input[0], const[2].xxxx;
	    3: MAD_SAT temp[1].xyz, temp[2], temp[1], const[1];
	    4: MUL temp[0].xyz, temp[0], temp[1];
	    5: TEX temp[3].w, input[3], 2D[3];
	    6: MUL temp[0].xyz, temp[0], temp[3].wwww;
	    7: MOV_SAT output[0].xyz, temp[0];
	while the second says
	    Fragment Program: after 'transform TEX'
	    # Radeon Compiler Program
	    0: TEX temp[0], input[1], 2D[0];
	    1: TEX temp[4], input[2], 2DSHADOW[2];
	    2: MOV_SAT temp[5].w, input[2].zzzz;
	    3: ADD temp[5].w, -temp[5].wwww, temp[4].xxxx;
	    4: CMP temp[1], temp[5].www1, none.0000, none.1111;
	    5: MUL temp[2].xyz, input[0], const[2].xxxx;
	    6: MAD_SAT temp[1].xyz, temp[2], temp[1], const[1];
	    7: MUL temp[0].xyz, temp[0], temp[1];
	    8: TEX temp[3].w, input[3], 2D[3];
	    9: MUL temp[0].xyz, temp[0], temp[3].wwww;
	    10: MOV_SAT output[0].xyz, temp[0];
	That looks like the first case is assuming the shadow test always fails (it uses "none.0000" (the constant zero) instead of looking at the 2DSHADOW texture, which looks to come from the RC_COMPARE_FUNC_NEVER path in radeon_program_tex.c).
	The weirder thing is the normal (not no-tex-mul) case, where it compiles
	    0: TEX TEMP[0], IN[1], SAMP[0], 2D
	    1: MOV TEMP[1].xyz, TEMP[0]
	    2: TEX TEMP[2], IN[2], SAMP[1], SHADOW2D
	    3: MUL TEMP[3].xyz, IN[0], IMM[0].xxxx
	    4: MAD_SAT TEMP[2].xyz, TEMP[3], TEMP[2], CONST[1]
	    5: MUL TEMP[1].xyz, TEMP[1], TEMP[2]
	    6: TEX TEMP[0].w, IN[3], SAMP[2], 2D
	    7: MUL TEMP[1].xyz, TEMP[1], TEMP[0].wwww
	    8: MOV OUT[0].xyz, TEMP[1]
	    9: END
	first into
	    Fragment Program: after 'transform TEX'
	    # Radeon Compiler Program
	    0: TEX temp[0], input[1], 2D[0];
	    1: MOV temp[1].xyz, temp[0];
	    2: MOV temp[2], none.0000;
	    3: MUL temp[3].xyz, input[0], const[2].xxxx;
	    4: MAD_SAT temp[2].xyz, temp[3], temp[2], const[1];
	    5: MUL temp[1].xyz, temp[1], temp[2];
	    6: TEX temp[4], input[3], 2D[2];
	    7: MOV_SAT temp[5].w, input[3].zzzz;
	    8: ADD temp[5].w, -temp[5].wwww, temp[4].xxxx;
	    9: CMP temp[0].w, temp[5].www1, none.0000, none.1111;
	    10: MUL temp[1].xyz, temp[1], temp[0].wwww;
	    11: MOV_SAT output[0].xyz, temp[1];
	It's replacing the shadow texture with none.0000 as before, but then it's sort of doing texture depth comparisons (which are meant for shadows) for the LOS texture (which doesn't want depth comparison) in lines 6-9, except actually that's all computing temp[0].w = 1 which is a total waste of time. That would result in certain models (maybe the heads of typical units) being dark (shadow-coloured) even when not in shadow, I think - do you see any visible bugs like that?
	I suppose this might be indicating a compiler bug (unless it's a weird bug in the game instead)... Perhaps you could try to find where in our application the shader is getting compiled like that: run with the normal shaders in gdb with a breakpoint like "b radeon_program_tex.c:141" (the "inst->U.I.Opcode = RC_OPCODE_MOV" line) and do a "bt full".
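To make the "that's all computing temp[0].w = 1" observation concrete, here is a small Python emulation of the MOV_SAT / ADD / CMP sequence from the dumps. It is a sketch under assumptions: CMP is taken to mean "pick the second operand where the first is negative, otherwise the third" (the GL_ARB_fragment_program definition, assumed to carry over to the Radeon compiler's internal form), and the `1` in the `.www1` swizzle is taken to be the constant one. The function names are invented for this example.

```python
def saturate(x):
    """Clamp to [0, 1], as the _SAT instruction suffix does."""
    return min(max(x, 0.0), 1.0)


def cmp_op(src0, src1, src2):
    """Component-wise CMP: src1 where src0 < 0, otherwise src2."""
    return [b if a < 0.0 else c for a, b, c in zip(src0, src1, src2)]


def emulated_shadow_compare(ref_z, depth_sample):
    """Emulate: MOV_SAT t.w, ref.zzzz; ADD t.w, -t.wwww, tex.xxxx;
    CMP dst, t.www1, none.0000, none.1111"""
    w = depth_sample - saturate(ref_z)
    src0 = [w, w, w, 1.0]  # swizzle .www1: three copies of w, then constant 1
    return cmp_op(src0, [0.0] * 4, [1.0] * 4)


lit = emulated_shadow_compare(ref_z=0.5, depth_sample=0.8)
shadowed = emulated_shadow_compare(ref_z=0.5, depth_sample=0.3)
print(lit, shadowed)
```

Under these assumptions the fourth component is 1 regardless of the inputs, so an instruction that writes only `.w` (as in line 9 of the last dump) always produces 1.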
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Oh, and something else to try if it's not inconvenient: compile Mesa with --enable-debug and run the game with RADEON_DEBUG=info,fp,vp,pstat (I'm guessing those are the right commands from looking at the code) and see if that prints much stuff (and also with the TEX/MUL lines removed to see if that makes an unexpectedly large difference to the output).
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Can you run the r300c driver to compare? (I think that doesn't support GLSL so it won't do the vertexshader mode or fancywater, but the rest should work).
	(I can only run the (non-Gallium) i965 driver, where those TEX/MUL lines make no measurable difference, and Gallium llvmpipe where the difference is ~30msec/frame on Oasis with shadows/fancywater enabled (which is unsurprising since texture filtering in software will always be slow).)
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Huh, interesting. I expect we're drawing much less than a million model fragments per frame, and the X1600 can apparently handle about 2 gigatexels/sec, so I still don't really see how the minor increase in shader complexity could cost you 10-20msec/frame. Also this isn't the first rendered thing that uses the LOS texture (the "texture[2]" in this shader; the terrain needs it first), so I think it can't be texture upload latency. Maybe I'm just underestimating the amount of drawing and overestimating the GPU performance...
	It makes sense to skip this LOS texture multiplication for most models anyway - it's only really needed for large ones (buildings, maybe rocks, etc) that aren't already wholly visible. That's something that should probably be fixed along with various other fog-of-war graphics issues. So at least it sounds for now like it's not a fundamental problem with this new shader renderer. (I don't know about the other minor variations, but I assume they're similar things we can optimise later, hopefully.)
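The back-of-envelope estimate behind that reasoning can be written out as below. This is only a rough sketch: it uses the post's upper bound of a million fragments and the quoted ~2 gigatexels/sec figure, and it assumes one extra texel fetch per fragment with no other costs (cache misses, bandwidth, overdraw).

```python
FRAGMENTS_PER_FRAME = 1_000_000    # upper bound mentioned in the post
TEXELS_PER_SECOND = 2_000_000_000  # "about 2 gigatexels/sec" for the X1600
EXTRA_FETCHES_PER_FRAGMENT = 1     # assumption: only the one added TEX counts

# Extra time per frame in milliseconds if texel rate were the only limit.
extra_ms = 1000.0 * FRAGMENTS_PER_FRAME * EXTRA_FETCHES_PER_FRAGMENT / TEXELS_PER_SECOND
print(f"{extra_ms:.2f} ms per frame")
```

That comes to half a millisecond per frame, well below the reported 10-20msec, which is why the slowdown looks surprising.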
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	I still have no idea why the memory allocation was a hundred times slower on Windows, but I changed all that code to use a pool allocator so it seems to be comparable to Linux now.
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Yeah, we should do that - currently the shadow texture size is computed from the screen size rounded up to a power of two, but we could add an option to let people choose half or double resolution etc. (Another option we should add is (relatively-)soft-shadowing - currently it's sort of enabled for NVIDIA and Intel devices, but it should be possible to disable it to improve performance, and to enable it in a way that works on ATI too.)
	Thanks, that does look like a real difference. "shader" and "vertexshader" both work pretty similarly, so I still don't see why it'd change so much, and it's hard to debug performance remotely. The only thing I can think to try: If you look in binaries/data/mods/public/shaders/model_common.fp at the end of the file for the lines
	    TEX tex.a, fragment.texcoord[2], texture[2], 2D;
	    MUL color.rgb, color, tex.a;
	and comment out both those lines (using "//"), does the "shader" mode go any faster? (That's new functionality compared to "vertexshader" mode, and it makes a ~10% difference when rendering with llvmpipe. I wouldn't expect it to be a bottleneck on a real GPU, but I could be wrong, and I can't think of anything else to try.)
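A minimal sketch of the shadow-texture sizing idea, in Python for brevity. The post only says the size comes from the screen size rounded up to a power of two; using the larger screen dimension for a square texture, and the `quality` parameter for the proposed half/double option, are assumptions made for this example rather than the game's actual code.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be positive)."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def shadow_map_size(screen_w, screen_h, quality=0):
    """Square shadow map edge length in texels.

    quality is a hypothetical user option: -1 halves the resolution,
    0 keeps the default, +1 doubles it.
    """
    base = next_power_of_two(max(screen_w, screen_h))
    if quality >= 0:
        return base << quality
    return max(1, base >> -quality)


print(shadow_map_size(1280, 1024), shadow_map_size(1280, 1024, quality=-1))
```

A real implementation would also need to respect the GPU's maximum texture size when doubling, which this sketch leaves out.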
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	I assume you mean the way there's a light strip just under the right slope of the roof? I don't see any difference in shadowing between the different renderpaths when I switch between them (and there probably shouldn't be any difference since they're all using basically the same shadowing code).
	That kind of problem is generally inevitable, though - the shadow texture map has a limited resolution, and it's too low resolution to accurately represent all shadows, so in this case it misses a bit under the roof. It's particularly bad at low camera angles since the shadow map gets stretched across a much larger area. But there are a few things we should probably try to improve it (use tighter frustum culling; use higher-res shadow maps if it doesn't affect performance much; tweak the z-bias ("renderer.shadowZBias" in the console)) so it could get a bit better in the future.
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Change the ambient colours and sun colour to be darker. They get added together, so typically you want their sum to be close to (255,255,255). (I changed _default.xml to have ambient (128,128,128) and sun (192,192,192) - the old default had sun (255,255,255) which made it much too bright when added to ambient.)
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Huh, odd. I see the same on Windows but not Linux. "compute batches" is purely CPU work so it can't be related to shaders unless I'm incredibly confused.
	On Windows, when I first launch the game, regardless of configured renderpath, the terrain blends "compute batches" step takes about 12ms/frame. On Linux it's about 0.2ms/frame. Changing renderpath at runtime on Windows sometimes makes it drop to about 2ms/frame, presumably due to the different vertex buffers meaning different amounts of batching.
	Xperf seems to say almost all the time is spent in memory allocation in the std::map. Replacing the std::map with a boost::unordered_map doesn't remove the Windows/Linux difference, so it's not just an STL problem. Is allocation really that much slower on Windows? I guess we should give it a pool allocator if that'd solve the problem without having to rewrite the algorithm.
- 
	Shader-Based Rendering
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Hmm, that sounds like an unpleasant slowdown - there aren't any changes that ought to affect it that much. Is this 5-40% slower than the old default "vertexshader" path (rather than "fixed")? Do you get the same performance difference if you change the renderpath in the .cfg file instead of through the console? (Changing it at runtime appears to cause bad fragmentation in the vertex buffers, so it could potentially distort the figures unfairly.)
	I think Gallium always translates the old multi-texturing code into the same language as GL_ARB_fragment_program, so the drivers should be doing pretty much the same thing in both cases. (Running a debug Mesa with GALLIVM_DEBUG=tgsi and llvmpipe prints the shaders and they look pretty similar to me in both modes). So I'm not sure what else the problem could be.
- 
	I committed some changes to use GL_ARB_fragment_program/etc for rendering (as discussed here). It should be enabled by default, unless you explicitly changed renderpath in a .cfg file. You can change it at runtime by typing one of these into the console:
	    renderer.renderpath="shader" (the new one)
	    renderer.renderpath="vertexshader" (the old one, which will probably be deleted since it's redundant now)
	    renderer.renderpath="fixed" (the oldest one, which will be kept for compatibility)
	Probably the most important thing is to ensure compatibility. In theory it should look pretty much indistinguishable from the old version (except for a couple of trivial bugs). If you get any new error messages, or strange rendering (where it used to work before this change), please let me know.
	About the only new feature is support for a brighter lighting model: In Atlas create a new map, or load an old map and change the 'lighting model' in the Environment panel, and then turn the sun overbrightness control up and everything should get much brighter (though only with the new renderpath - lighting will be wrong in the old renderpaths, which is okay because they should only be used for compatibility with very old hardware). Old maps will typically need their ambient colours and sun colour and overbrightness adjusted to look okay in the new model, so they default to the old model for now.
	(The old model calculates lighting then clamps the values to 256 before multiplying by the texture, so it can only make the texture darker, if I remember correctly. The new model effectively clamps lighting to 512 instead, so textures can become brighter. Otherwise they're exactly the same.)
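The difference between the two lighting models can be sketched for a single colour channel as follows. This is an illustration only: it treats 256 as "full brightness" in line with the figures in the post, uses a made-up function name, ignores the overbrightness control, and adds a final clamp to 255 on the assumption that the framebuffer cannot store anything brighter.

```python
def lit_channel(texel, ambient, sun, n_dot_l, lighting_clamp):
    """One colour channel: texture value scaled by clamped lighting.

    texel, ambient and sun are on a 0-255 scale; n_dot_l is the diffuse
    term in [0, 1]; lighting_clamp is 256 (old model) or 512 (new model).
    """
    lighting = min(ambient + sun * n_dot_l, lighting_clamp)
    return min(255.0, texel * lighting / 256.0)


# Example values: ambient 128, sun 192, surface facing the sun directly.
old_model = lit_channel(100, 128, 192, 1.0, lighting_clamp=256)
new_model = lit_channel(100, 128, 192, 1.0, lighting_clamp=512)
print(old_model, new_model)
```

In the old model the lighting factor never exceeds 1, so the result can at most equal the texture value; in the new model it can reach 2, so the lit texel can end up brighter than the texture itself.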
- 
	I get that water bug with the latest Intel drivers (but not with earlier ones, I think). Haven't looked in any detail to try debugging it, though.
- 
	Mouse Really Slow On Main Menu?
	Ykkrosh replied to liamdawe's topic in Game Development & Technical Discussion
	On what OS, what graphics card, what version of the game, etc?
- 
	Should be fixed (just an old typo that affected transparent models).
- 
	Graphics hardware compatibility
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	GLSL isn't required, and the game works without it - you probably had some other problem (e.g. no OpenGL acceleration at all), but it's hard to tell without more detail (e.g. what the error message said).
	I started playing with some GL_ARB_fragment_program stuff, and it seems to work cleanly enough. Currently it renders all the non-transparent models, with lighting and shadows (and a similar-looking fallback behaviour when GL_ARB_fragment_program_shadow isn't supported) and player colour and smooth fading into the FoW/SoD, with a single ~40-line fragment program and ~30-line vertex program, with hotloading and a preprocessor (copied from Ogre) and flexible name-based binding of textures and uniform parameters. Interestingly it makes the 'sun overbrightness' control in Atlas much more useful, because the intermediate calculations don't all get clamped to [0, 1] so it can result in strongly over-saturated colours that make everything look much brighter.
	Also started writing about the target requirements.
- 
	They're projects whose users are programmers, and whose developers are definitely programmers, so they don't seem particularly comparable to a game whose developers are often non-programmers. Also, they're far more active projects so there's much more value in them using a DVCS than in us using one. Almost nobody pulls the code just to play - players use releases, and developers will have to learn a load of new commands and a fundamentally different conceptual model.
- 
	Reducing Boost Dependency
	Ykkrosh replied to janwas's topic in Game Development & Technical Discussion
	Native filesystems don't handle any encodings, they just handle strings of '/'-separated bytes. (Similarly NTFS doesn't handle encodings, it just handles strings of 16-bit integers - it's highly conventional to interpret them as UTF-16 or perhaps UCS-2 rather than as anything else, but NTFS itself doesn't care). A path could be "\xEF\xBF\xBF/\xC3" and nothing in the OS would care. The only time encoding ever matters is user input and output, since it's nice to convert between bytes and a human-readable string, for which the convention is to use the current locale settings. If your filesystem has some weird messed-up combination of encodings then applications should still work correctly, they'll just look weird when they try displaying paths to you.
	If we implement locale support, I don't think there's any need to try UTF-8 first - the locale is more authoritative so we should just use that.
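The point that paths are just bytes, and that encoding only matters for display, can be demonstrated with the example path from the post. This Python sketch decodes with an explicit UTF-8 codec so the result does not depend on the machine's locale, which is a simplification: the post argues for using the locale, and on POSIX systems Python's own `os.fsdecode` uses the filesystem encoding with the same `surrogateescape` error handler.

```python
raw_path = b"\xEF\xBF\xBF/\xC3"  # the example path from the post

# The byte string splits into components without any decoding at all.
components = raw_path.split(b"/")

# Strict decoding fails, because the trailing 0xC3 is an incomplete sequence.
try:
    raw_path.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# For display, undecodable bytes can be smuggled through as lone surrogates
# and converted back to the exact original bytes afterwards.
display = raw_path.decode("utf-8", errors="surrogateescape")
round_tripped = display.encode("utf-8", errors="surrogateescape")

print(components, strict_ok, round_tripped == raw_path)
```

The displayed string may look odd, but the application can still open the file because the original bytes are recoverable.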
- 
	Collecting data from users
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	(I've had to disable the CPU report for now, though, since parsing tens of megabytes of JSON on every page load doesn't make my server happy. I'll try to restore it some time soon.)
- 
	The graphics changes aren't significant enough to cause serious instability - it'll be incremental additions, and you could edit the config file to switch back to the old mode if it's particularly broken. Also problems will be found and fixed much earlier if development is in the trunk. Extremely few users use SVN versions directly (compare feedback count before/after release) so they won't hit any problems that do exist. Also the graphics work will probably be finished before we could decide anything about Git. So I don't think this case is a reason to switch now.
	I like DVCSs in general, though, so it's still worth considering for the longer term. I think a much better approach would be:
	* Set up a read-only Git mirror of the SVN repository. (Already done). Maybe set up an Hg one too, for people who prefer that (e.g. me).
	* Ensure that mirror is always up-to-date (perhaps automatically via an SVN post-commit hook or similar) so people have no reason to prefer SVN.
	* Encourage people to check out from that mirror instead of from SVN. (We mention it somewhere here but not prominently.)
	* When people have changes they want to submit, they can push changes to their fork on GitHub and then post on Trac with a link to their changeset or attach a patch or whatever, which is pretty trivial to do.
	* Some WFG member can review the change, give feedback, wait for a new patch, make tweaks, etc, then apply it as a patch to SVN and commit it. (If the person in the previous step is a WFG member, they can skip the review and apply the patch and commit themselves, which is also pretty trivial - it's what I do anyway since I use a local Hg mirror for development.)
	That's far less effort and risk than switching all our development to Git, and I think it provides almost all the benefits. If it turns out that everyone loves the new approach, we could do a more permanent transition later; otherwise we haven't lost anything or wasted any significant time or caused anyone unnecessary bother.
- 
	Graphics hardware compatibility
	Ykkrosh replied to Ykkrosh's topic in Game Development & Technical Discussion
	Got lots more data now (from over a thousand users), so it's slightly more meaningful to look at numbers.
	There are some relatively common devices in the stats that aren't interesting:
	* "GDI Generic" - the useless Windows XP software fallback, never going to work.
	* Radeons with "OpenGL 1.4 ([...] Compatibility Profile Context)" - misconfigured with indirect rendering which causes poor performance; users should be told to configure their systems properly.
	* GeForce2 MX, GeForce3 MX, RAGE/SiS/VIA, probably Mesa DRI R100 - not enough texture units to easily support decent rendering even without shaders, and not enough users (~1% of total) to be worth expending effort on.
	The most relevant extension here is GL_ARB_fragment_shader. Excluding the things above, about 14% of users don't support that. That's mostly a load of old Intel chips plus Mesa R300 (fairly old Radeons with recent but feature-poor Linux drivers). I guess the R300 driver situation could improve over the next year or so, by people moving to the Gallium driver, but the long tail of old Intels and miscellaneous others won't disappear any time soon, so it'll still be maybe 5%-10% of users.
	Compare to GL_ARB_fragment_program, which (excluding the above) is only missing for 2% of users, and only on very old hardware.
	I'd conclude we definitely can't require GL_ARB_fragment_shader, since that would block over a tenth of our potential users, but it's widespread enough that I think it's worth optimising for GLSL shaders at the expense of that tenth. But I'm now thinking it probably would be worth supporting GL_ARB_fragment_program too, if it doesn't add huge code complexity (which I don't think it should): it'll allow us to have better performance and better graphical effects (water cheaply reflecting the sky map (not reflecting units/etc), shadows (I think), particle effects) for an extra ~12% of users, and the remaining ~2% of users will be on such terrible hardware that we just need to limp along and don't need to bother rendering properly (e.g. we could skip all lighting if that saves some code, as long as it remains playable).
	So, new (probably final) proposal, trying to be more concrete this time:
	* Implement a shader abstraction that can handle both GL_ARB_{fragment,vertex}_shader and GL_ARB_{fragment,vertex}_program, so the renderer can easily switch between whichever is appropriate.
	* Gradually reimplement all current behaviour using GL_ARB_{fragment,vertex}_program.
	* Gradually remove current fixed-function behaviour once we reimplement it, if it's not trivial to keep that code around and it's not critical for playability.
	* Prefer using GL_ARB_{fragment,vertex}_program when adding new effects. Use GL_ARB_{fragment,vertex}_shader only if it's impossible with _program, or if it would be awkward with _program and is an optional high-end feature (soft shadows, maybe normal mapping, etc).
	That sounds to me like the most reasonable compromise between making use of 'modern' (only 8 year old) graphics features, and retaining compatibility so we don't make a noticeable number of players unhappy, given that we apparently have significant interest from players with pretty old and low-end hardware.
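The selection policy in the proposal could look roughly like the Python sketch below. The extension names are the real OpenGL ones quoted in the post; everything else (the function name, the backend labels, the `wants_high_end` flag, and the choice to fall back to GLSL when only GLSL is available) is an assumption made for illustration, not the game's actual renderer code.

```python
GLSL_EXTENSIONS = {"GL_ARB_fragment_shader", "GL_ARB_vertex_shader"}
ARB_PROGRAM_EXTENSIONS = {"GL_ARB_fragment_program", "GL_ARB_vertex_program"}


def choose_backend(extensions, wants_high_end=False):
    """Pick a shader backend for an effect from the reported GL extensions.

    Returns "glsl", "arb" or "fixed" (hypothetical labels).
    """
    available = set(extensions)
    has_glsl = GLSL_EXTENSIONS <= available
    has_arb = ARB_PROGRAM_EXTENSIONS <= available

    # Optional high-end effects may use GLSL when the hardware has it.
    if wants_high_end and has_glsl:
        return "glsl"
    # Everything else prefers the more widely supported ARB programs.
    if has_arb:
        return "arb"
    # Assumed fallback for a driver that exposes GLSL but not ARB programs.
    if has_glsl:
        return "glsl"
    # Very old hardware: limp along with fixed-function rendering.
    return "fixed"


modern = GLSL_EXTENSIONS | ARB_PROGRAM_EXTENSIONS
print(choose_backend(modern), choose_backend(modern, wants_high_end=True))
```

Keeping the decision in one function like this is one way to let the rest of the renderer stay unaware of which extension family is in use.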
