
Ykkrosh (WFG Retired) - 4,928 posts, 6 days won

Everything posted by Ykkrosh

  1. The explanation here actually sounds pretty reasonable to me, and is quite similar to what we do for 0 A.D. Installing in "C:\Program Files" is bad for various reasons: it requires admin access to install and to update (some users don't have admin access at all, and the rest shouldn't trust dodgy insecure downloaded software like games enough to give them admin privileges), and it breaks in multi-user environments if one user is trying to play while another is trying to update. %LOCALAPPDATA% is the best place that avoids those problems. Seems like the main difference is that we let users choose to install in a different location if they want, whereas AoEO apparently expects you to use standard Windows mechanisms to move the entire Local directory at once.
  2. That sounds inconvenient. You might be able to force it into a different location with something like Steam Mover (not restricted to Steam - it just moves files from one location to another, and sets up a link so that it still looks like it's in the old location).
  3. That looks like the right calculation to do. But techSolid/shaderSolid are used for the solid black sides of the map, not for the actual terrain tiles - you should specify the uniform inside TerrainRenderer::PrepareShader so that it's available to all the base/blend/decal terrain rendering calls. Circular terrain is just square terrain with permanent shroud-of-darkness around the corners, so GetTilesPerSide() gives you the diameter.
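A minimal sketch of what that change might look like, using raw OpenGL rather than the engine's actual shader API (the uniform name "mapRadius" and the function shape are purely illustrative):

    #include <GL/glew.h>

    // Hypothetical stand-in for code added to TerrainRenderer::PrepareShader:
    // upload the value once per frame, so every base/blend/decal terrain
    // rendering call sees the same uniform.
    void PrepareTerrainShader(GLuint program, int tilesPerSide)
    {
        // Circular maps are square maps with shrouded corners, so
        // GetTilesPerSide() is the diameter and half of it is the radius.
        const float radiusInTiles = tilesPerSide * 0.5f;

        glUseProgram(program);
        GLint loc = glGetUniformLocation(program, "mapRadius");
        if (loc != -1)
            glUniform1f(loc, radiusInTiles);
    }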
  4. The shader code is in binaries/data/mods/public/shaders/ in SVN, though if you're using one of the alpha releases you'll have to look inside the installed public.zip to find it. It looks like you're using a slightly old TerrainRenderer.cpp either way - there were quite a few changes to the shader system recently so files might be in different places if you don't update to the latest SVN version. In the current SVN, shaders/arb/terrain_common.vp is the default vertex shader, or if you open the in-game console (F11, I think) and enter "renderer.preferGLSL=true" (or put preferglsl=true in the config file) then it'll use the GLSL version instead. There's also a bit of high-level documentation that may be relevant (though terrain uses shader effects directly, it doesn't use materials, so ignore that part).
  5. Done - hopefully that page makes some sense to people.
  6. That's possible, though it'd probably be more useful to swap to sprites, which should give more significant performance savings when zoomed out (since the overriding cost with a large number of units is probably the number of geometry batches, not the number of polygons). (See e.g. Rome Total War, which does lots of LOD with model resolutions and sprites since it supports extreme zooming.) The usual problem with LOD is that you can see the units flip between the different meshes/sprites/etc, which is kind of ugly, and it's more work (low-res models need artists, sprites need coding). We could also do other LOD-related stuff like reducing the temporal precision of animations, e.g. run all animations at 10fps (vs the current implementation, which is as precise as your framerate), and then if two units are at the same frame we only need to compute the skinning once and share it between them, which might help. In general, I think it's sensible to try optimising the simple high-quality approach first (which is what I was trying, and failing, to do here), and if that turns out to be insufficient then add performance hacks onto it later. I don't know whether the current approach is insufficient in practice (we have too many other performance problems that need to be fixed before it might become a bottleneck), and if so I don't know when will be a good time for "later" - I suppose it's not yet, so I'll try to avoid spending more time experimenting with this myself, but hopefully it wouldn't be too far away.
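To make the 10fps idea concrete, here's a rough C++ sketch (all names hypothetical, cache eviction ignored) of sharing skinning work between units that land on the same quantised animation step:

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct SkinnedVerts { std::vector<float> positions; };

    // Key: (animation id, quantised time step). Two units playing the same
    // animation at times that round to the same 10fps step share one entry,
    // so the expensive skinning runs once instead of twice.
    using SkinCache = std::unordered_map<uint64_t, SkinnedVerts>;

    const SkinnedVerts& GetSkinnedVerts(SkinCache& cache, uint32_t animId,
                                        float animTimeSeconds)
    {
        const float stepsPerSecond = 10.0f; // reduced temporal precision
        uint32_t step = static_cast<uint32_t>(animTimeSeconds * stepsPerSecond);
        uint64_t key = (static_cast<uint64_t>(animId) << 32) | step;

        auto it = cache.find(key);
        if (it == cache.end())
        {
            SkinnedVerts verts;
            // ... run the usual CPU skinning for animId at time
            // step / stepsPerSecond, filling verts.positions ...
            it = cache.emplace(key, std::move(verts)).first;
        }
        return it->second; // shared by every unit on the same step
    }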
  7. In almost all cases we do detect hardware capabilities and ignore options if they're not supported. The only exception is the new GLSL stuff, since it's a bit trickier (different shaders require different GLSL versions and can be mixed with non-GLSL shaders) and it's all disabled by default so it doesn't matter yet - that would need to be cleaned up before being properly supported. We already select default graphics settings based on hardware to some extent (here), for performance or to avoid bugs, but that's pretty crude since we don't have very relevant data. I think it'd probably be useful to add a benchmarking mode which can compare various settings (fixed-function vs ARB shaders vs GLSL shaders, shadows, dynamic reflections, resolution, antialiasing, various minor implementation details, etc) and report performance, so we can see if there are unexpected performance problems with various devices/drivers/OSes/etc and so we can pick sensible defaults.
  8. I committed the GPU skinning code now - you probably shouldn't test it, but if you really want to then you need to set preferglsl=true and gpuskinning=true in the config file, and need a device that supports OpenGL 3.0 (or GL_EXT_gpu_shader4). (Other combinations are likely to crash, which is intentional.)
  9. Tried that - arrays of 4 joints/weights are shared between each model, the CPU just computes the animated bone matrices (multiplied by inverse bind pose) and uploads those per model, and the vertex shader does all the weighted blending per vertex.

* Old mesh / GF560Ti: 7 msec/frame
* New mesh / GF560Ti: 35 msec/frame
* Old mesh / HD3000: 35 msec/frame
* New mesh / HD3000: 285 msec/frame

It's better than the previous approach for the old mesh (since there are fewer uniforms to upload), but worse for the new mesh (since there's the same number of uniforms and more computation in the vertex shader, especially on HD3000 which is seemingly slow at vertex shaders). So... if we were targeting fast GPUs (not Intel ones), and/or we had high-poly models (e.g. if this was an FPS game), it'd make sense to do as much work as possible in the vertex shader. Since we should optimise for slow GPUs, and we have low-poly models, it seems like sticking with CPU skinning (and optimising it a bit more) is best. So that's good to know.

No need to bother with that, I think - it shouldn't affect the conclusions much, and it'll probably break the method in post #2 entirely (there's a hardware limit of ~250 blend matrices which would likely be exceeded).

Yeah, I didn't intend it to be a realistic mesh - it's just to get a feeling for how performance varies with mesh complexity, and it's easier to see that when testing extremes. It's not necessary to test ones in between, since you can interpolate between those extremes to get a rough but adequate view: if ~200 units with 6K-tri meshes are alright, and ~1000 units with 0.4K-tri meshes are alright, then 1K-tri meshes should be perfectly fine for several hundred units on screen at once. So, please feel free to use a thousand triangles on unit meshes if you want, but not a lot more than that.
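For reference, the vertex shader used in that test would look something like this sketch, written here as a GLSL source string in C++ (attribute/uniform names invented; the real shader differs):

    // GLSL vertex shader for the 'blend in the shader' approach: the 4
    // joints/weights live in the shared mesh data, the CPU uploads only the
    // per-model bone matrices, and the weighting happens per vertex here.
    static const char* skinningVS = R"(
        #version 130
        uniform mat4 boneMatrices[64]; // animated bone * inverse bind pose
        uniform mat4 modelViewProj;

        in vec3 a_position;
        in vec4 a_jointIndices; // up to 4 influencing bones per vertex
        in vec4 a_jointWeights; // corresponding weights, summing to 1

        void main()
        {
            vec4 p = vec4(a_position, 1.0);
            vec4 skinned =
                  a_jointWeights.x * (boneMatrices[int(a_jointIndices.x)] * p)
                + a_jointWeights.y * (boneMatrices[int(a_jointIndices.y)] * p)
                + a_jointWeights.z * (boneMatrices[int(a_jointIndices.z)] * p)
                + a_jointWeights.w * (boneMatrices[int(a_jointIndices.w)] * p);
            gl_Position = modelViewProj * skinned;
        }
    )";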
  10. Hmm, I did a simple test with doing the vertex transforms in the (GLSL) vertex shader instead of on the CPU. Models share mesh data; the mesh data has a GL_UNSIGNED_BYTE blend-matrix index attribute per vertex (via glVertexAttribIPointer); blend matrices are uploaded per model into a "uniform mat4 blendMatrices[128]" with glUniformMatrix4fv. Both models have 30 bones (counting the implicit root bone). The low-res mesh has 121 blend matrices (the number of distinct combinations of bone weights). The high-res mesh has 30 (because it's using the buggy PMD exporter which only uses one bone per vertex).

Total frame times:
* Old mesh / GF560Ti: 16 msec/frame
* New mesh / GF560Ti: 28 msec/frame
* Old mesh / HD3000: 67 msec/frame
* New mesh / HD3000: 160 msec/frame

The only case that's faster is the new mesh on GF560Ti. I presume that's because it has few blend matrices and many vertices (unlike the old mesh), and the GPU has more processing power than the CPU (unlike the HD3000). With the old mesh on GF560Ti, the profiler says 31% of the time is computing the blend matrices, 15% is inside the blend matrix glUniformMatrix4fv, and most of the rest is in drivers. The uniform cost (and presumably much of the driver cost) could possibly be reduced with GL_ARB_uniform_buffer_object, but only new drivers support it; or for the old mesh (with many more blend matrices than bones) it could be reduced by uploading the bone matrices and weights and doing the blending inside the vertex shader, instead of blending on the CPU (which would also save that CPU time). I suppose I should try that too, to see if it makes the old mesh any faster.
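As a concrete illustration of that setup, here's a hedged C++ sketch of the per-model draw path (buffer layout and names are invented; error handling and the position/normal attribute setup are omitted):

    #include <GL/glew.h>

    // Assumes the shader declares "uniform mat4 blendMatrices[128]" and an
    // integer per-vertex attribute "a_blendIndex" (GL 3.0 / EXT_gpu_shader4).
    void DrawSkinnedModel(GLuint program, GLuint sharedMeshVBO,
                          const float* blendMatrices, GLsizei numBlendMatrices,
                          GLsizei numVertices)
    {
        glUseProgram(program);

        // Per model: upload the CPU-blended matrices (one per distinct
        // combination of bone weights).
        GLint matLoc = glGetUniformLocation(program, "blendMatrices");
        glUniformMatrix4fv(matLoc, numBlendMatrices, GL_FALSE, blendMatrices);

        // Shared between models: one GL_UNSIGNED_BYTE blend-matrix index per
        // vertex; glVertexAttribIPointer keeps it an integer in the shader.
        glBindBuffer(GL_ARRAY_BUFFER, sharedMeshVBO);
        GLint idxAttr = glGetAttribLocation(program, "a_blendIndex");
        glEnableVertexAttribArray(static_cast<GLuint>(idxAttr));
        glVertexAttribIPointer(idxAttr, 1, GL_UNSIGNED_BYTE, 0, nullptr);

        glDrawArrays(GL_TRIANGLES, 0, numVertices);
    }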
  11. Re this (split off from that thread to try to minimise disorganisation): Testing with 1024 animated actors (the new mesh vs the old mesh skeletal/m_tunic_short.dae, both with animation biped/inf_hoplite_walk.psa and texture skeletal/hele_isp_e_1.dds; no props etc), with all actors on screen. Empty map (no water etc, so no reflections/refractions rendered). Shadows enabled (so each model will be rendered twice). Core i5-2500K 3.3GHz, Windows, vsync disabled, ARB shader rendering, 1024x768 window. Ran on GeForce 560 Ti (pretty high-end compared to most users; ought to run at 60fps with no problem) and Intel HD Graphics 3000 (the current fastest Intel one; generally expected to be usable for gaming at low quality settings, so probably a realistic target for decent performance).

Old mesh:
* Triangles: 390
* Vertexes: 302
* Model triangles drawn: 798,720
* Vertex buffers allocated: 10,372,852 bytes

New mesh:
* Triangles: 6656
* Vertexes: 3402
* Model triangles drawn: 13,631,488
* Vertex buffers allocated: 112,016,048 bytes

Old mesh / GeForce 560 Ti:
* Total frame time: 12.5 msec/frame
* Time in "prepare models": 3.5 msec/frame
* Total frame time when paused: 2.5 msec/frame

New mesh / GeForce 560 Ti:
* Total frame time: 45.5 msec/frame
* Time in "prepare models": 38.0 msec/frame
* Total frame time when paused: 24.0 msec/frame

Old mesh / Intel HD Graphics 3000:
* Total frame time: 26 msec/frame
* Time in "prepare models": 3.5 msec/frame
* Total frame time when paused: 17 msec/frame

New mesh / Intel HD Graphics 3000:
* Total frame time: 145 msec/frame
* Time in "prepare models": 100 msec/frame
* Total frame time when paused: 130 msec/frame

There's 17x as many triangles in the new mesh, and 11x as many vertexes. Vertex buffers are 32 bytes per vertex, for each instance of the mesh. "Total frame time" is limited by the CPU or GPU, whichever is slower (since they run in parallel). "Time in "prepare models"" is the CPU cost of the skinning computation and vertex data upload - in the "New mesh / GeForce 560 Ti" case, "prepare models" is about 60% skinning and 40% upload. (Skinning should have the same cost in the Intel HD 3000 case, but the upload is much slower.) "Total frame time when paused" means the meshes aren't animating, so there's no skinning or vertex data upload - it's basically just the GPU cost of rendering all the triangles.

Based on the paused times, GF560Ti can render about 600M tri/sec, HD3000 can render about 100M tri/sec - those figures sound vaguely plausible so I'll assume they're right. If we want 30fps on HD3000, that means at most 3M tri/frame. With the new 6656-tri mesh (keeping shadows enabled, ignoring props and buildings and trees which will eat into the polygon count), we could have ~200 units on screen at once before hitting the triangle count limit. Half as many triangles would allow twice as many units. (The arithmetic is spelled out in a small sketch after this post.)

Independent of this triangle rendering, the CPU skinning takes about 25 msec/frame for these 1024 units. 200 units should therefore be ~5 msec/frame. This is a fairly fast CPU, so multiply by perhaps 2 for a reasonable lower-end CPU. Running at 60fps means we only have 16 msec/frame in total, and 5ms (or 10ms) is a big chunk. So I think we'd be primarily limited by CPU skinning cost, before being limited by triangle rendering cost, except on especially slow GPUs and fast CPUs.
Vertex data upload seems unpleasantly expensive; 100MB of vertex data per frame at 60fps is approaching the PCIe 16x bandwidth limit so that'll never work especially well, and with smaller numbers of units it's still a lot of bandwidth. I think our current vertex data upload code is somewhat inefficient (it updates lots of tiny chunks instead of throwing out the entire vertex buffer each frame, which'll probably prevent some driver optimisations) and could be improved, but that wouldn't solve the fundamental bandwidth problem. So... I don't think the 6656-tri mesh is obscenely high resolution, but it's a bit too much if we want 200 units on screen at once (and much too much if we want more). But what we should really try is to do skinning on the GPU instead of on the CPU - that wouldn't increase the GPU's maximum renderable tris/sec, but it would eliminate the CPU skinning cost (at the expense of putting more load on the GPU vertex shaders) and would also eliminate the vertex data upload. That shouldn't be technically complex (I hope), so I suppose I'll experiment with that to see how it influences performance. With that data it should be possible to make a more informed tradeoff between gameplay design (number of units) and art design (number of triangles per unit).
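Spelling out the triangle-budget arithmetic from above as a small C++ fragment (the inputs are the measured figures from this post, so the output is the same rough estimate, not new data):

    // HD3000 budget: ~100M tri/sec measured, 30fps target.
    constexpr double trisPerSecond = 100e6;
    constexpr double targetFPS     = 30.0;
    constexpr double trisPerFrame  = trisPerSecond / targetFPS; // ~3.3M

    // New mesh, drawn twice per frame because of shadows:
    constexpr double meshTris     = 6656.0;
    constexpr double shadowPasses = 2.0;
    constexpr double maxUnits     = trisPerFrame / (meshTris * shadowPasses);
    // maxUnits ~= 250, i.e. roughly the "~200 units on screen" figure once
    // props, buildings and trees eat into the same budget.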
  12. I suspect dynamically growing snow like that could be done by creating a new greyscale texture that determines the snowiness percentage at which it should be drawn white, e.g. if the building is 20% snowy then render white wherever the texture is >0.8, or something like that. (A rough shader sketch of this idea follows at the end of this post.) Wouldn't be technically complex, but it'd require more work from artists (and would presumably also require new UV unwrapping for buildings).

DDS does 4 channels (RGBA usually). The more important thing is the pixel formats supported by OpenGL textures, which have the same limit. But for more than that, you can just use multiple textures when rendering a model, which isn't much of a problem.

Users or developers shouldn't have to deal with DDS files - the input data should usually be PNG, and DDS is just generated and loaded by the engine (which is all platform-independent custom code, so the OS doesn't matter).
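A fragment-shader sketch of that threshold idea, written as a GLSL string in C++ (texture and uniform names are made up; this isn't existing engine code):

    // Greyscale mask stores, per texel, how snowy the building must be
    // before that texel turns white.
    static const char* snowFS = R"(
        #version 120
        uniform sampler2D baseTex;
        uniform sampler2D snowMaskTex; // greyscale, authored per building
        uniform float snowiness;       // 0.0 = no snow, 1.0 = fully snowy

        varying vec2 v_uv;

        void main()
        {
            vec4 base = texture2D(baseTex, v_uv);
            float mask = texture2D(snowMaskTex, v_uv).r;
            // e.g. at 20% snowy, texels whose mask is > 0.8 render white
            gl_FragColor = (mask > 1.0 - snowiness) ? vec4(1.0) : base;
        }
    )";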
  13. That sounds nice, and would be technically doable assuming you just want it to preserve the current locations of entities and not the rest of the gameplay state. (I guess the best UI would be an "open saved game" in Atlas, which is implemented by making it load the saved game then save as a temporary .pmp/.xml then load that again, so that it ends up in a clean state.)
  14. When we want normal maps, I imagine it might be easier for artists if we keep them in separate files from the specular maps, given the apparent troublesomeness of editing alpha channels; the game can then automatically combine them both into a single DXTC-compressed DDS file on first load, for efficient memory usage. (For better DXTC compression of normal+specular it actually seems common to use the RGBA channels to store not XYZS but something more like 0XSY, so the game will need to do some swizzling when importing the textures anyway.)
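A small C++ sketch of that import-time combination step (the 0XSY channel layout follows the post; the struct and function names are illustrative, and both inputs are assumed to be the same size):

    #include <cstdint>
    #include <vector>

    struct RGBA { uint8_t r, g, b, a; };

    // Merge a standalone normal map (XYZ in RGB) and a greyscale specular map
    // into one RGBA image laid out as 0XSY, ready for DXTC compression; the
    // shader later reconstructs normal Z from X and Y.
    std::vector<RGBA> PackNormalSpecular(const std::vector<RGBA>& normalMap,
                                         const std::vector<uint8_t>& specularMap)
    {
        std::vector<RGBA> packed(normalMap.size());
        for (size_t i = 0; i < normalMap.size(); ++i)
        {
            packed[i].r = 0;              // "0": unused channel
            packed[i].g = normalMap[i].r; // "X": normal X
            packed[i].b = specularMap[i]; // "S": specular intensity
            packed[i].a = normalMap[i].g; // "Y": normal Y
        }
        return packed;
    }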
  15. Hmm, odd - in that case I don't think there's much you can do about it.
  16. Day 18

I finished and committed those renderer changes (74 files changed, 2315 lines inserted, 3503 lines deleted). In general, there ought to be no visible changes. The exception is that I fixed the non-shader-based rendering mode so that its lighting matches the shader mode - the difference is that it allows the sunlight to be brighter than pure white (by a maximum factor of 2).

I've also been experimenting with specular lighting, so this seems like a good opportunity to show some vaguely pretty pictures of the lighting system. (This is all very technically simple - other games have been doing this for most of a decade, but at least we're advancing a little bit.)

The first component is the basic textures for models and terrain (click images for larger higher-quality versions). Then there's the diffuse lighting: surfaces that are facing towards the sun are bright, surfaces that are perpendicular to the sun or facing away are dark. The scenario designer can control the colour and brightness of the sun, which affects this diffuse lighting.

Surfaces that aren't lit directly by the sun shouldn't be totally black - they'd still be lit by light bouncing off nearby objects. As a (very rough) approximation, we add an ambient lighting component. The scenario designer can control the colour and brightness again, with separate values for terrain and for models to give them more control over the final appearance. Finally there's the shadows.

All these components get multiplied and added to produce the final result, which is what the game currently looks like. If you compare it against the first image, you can see that some parts of the scene are brighter than the unlit textures - that's what happens when the ambient plus diffuse lighting is brighter than pure white. (OpenGL generally clamps colours to the range [0, 1] so you can't exceed white, so what we actually do is compute all the ambient and diffuse lighting at 50% of its desired value and then multiply everything by 2 just before drawing it onto the screen - see the shader sketch at the end of this post.)

I also added some shader code to do specular lighting, to simulate the sun reflecting off shiny surfaces. For testing I've applied it to every model, and that gets added to all the previous lighting. Unlike diffuse lighting, specular depends on the position of the camera, so it looks better in motion. Also it obviously shouldn't be applied to every model, and should preferably be controlled by a new specular texture for models that want it (so e.g. the metal parts of a soldier's texture could be marked as highly reflective and the cloth parts as non-reflective), but that should be easy to add thanks to the changes I made to the renderer, and then it might allow some nicer artistic effects.

Performance of fancier lighting is a potential concern, since it does extra computation (in this case a vector normalisation and an exponentiation) for every single pixel that's drawn. In practice, with specular lighting applied across an entire 1024x768 screen, the extra cost on an Intel GMA 4500MHD on Linux (which is barely fast enough to run the game decently anyway) looks to be about 2 msec/frame, while on Intel HD Graphics 3000 on Windows it's too small to easily measure. So it should probably be optional to help the bottom-end hardware, but is fine for anything slightly more advanced than that.
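The whole composition can be summarised in a few lines of shader code. This is a simplified sketch (names invented, shadowing reduced to a single factor), not the game's actual shader:

    // GLSL fragment: texture * (ambient + diffuse * shadow) + specular,
    // computed at 50% brightness and doubled at the end, to allow
    // brighter-than-white sunlight despite OpenGL's [0, 1] clamp.
    static const char* lightingFS = R"(
        #version 120
        uniform sampler2D baseTex;
        uniform vec3 ambientColor; // scenario-controlled, pre-halved
        uniform vec3 sunColor;     // scenario-controlled, pre-halved
        uniform float shininess;

        varying vec2 v_uv;
        varying vec3 v_normal;
        varying vec3 v_sunDir;   // towards the sun
        varying vec3 v_halfVec;  // camera-dependent, for specular
        varying float v_shadow;  // 0 = shadowed, 1 = lit

        void main()
        {
            vec3 base = texture2D(baseTex, v_uv).rgb;
            vec3 n = normalize(v_normal);

            vec3 diffuse = sunColor * max(dot(n, v_sunDir), 0.0);
            vec3 color = base * (ambientColor + diffuse * v_shadow);

            float spec = pow(max(dot(n, normalize(v_halfVec)), 0.0), shininess);
            color += sunColor * spec * v_shadow;

            gl_FragColor = vec4(color * 2.0, 1.0); // undo the 50% scaling
        }
    )";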
  17. I assume this is the relevant Sony page, which only has the old drivers you've already got. This page looks like the appropriate Intel one, with the latest versions - they're not specific to your machine, so there are a few warnings to be careful about, but I believe they ought to work fine (and in the worst case you can just reinstall the old Sony ones).
  18. If you're on Linux, this bug caused poor Atlas behaviour for me and might be relevant - the display freezes until there's some kind of X event (e.g. mouse input). The patch is released in libxcb 1.8, but I haven't tested yet whether it fixes the problem.
  19. Those look like fairly old drivers - it'd be good to see if you can update to the latest Intel graphics drivers for your device and OS. (I think I had similar water problems with Intel drivers on Windows a while ago, but it looks fine when I test it now.) (My recent renderer changes should only affect the rendering of models, and shouldn't influence water at all, so there's no need to use a newer version of the game.)
  20. Those are all completed heavily-optimised games built by dozens of experienced developers, whereas 0 A.D. is currently an incomplete alpha release built mostly by a small number of inexperienced part-time volunteers, so there are a few areas in which we're not as good as them yet.
  21. That's the same - each model has an actor XML file (of which we have hundreds) that points at a material XML file (currently there are about four). The difference now is that the material XML file explicitly points at the shader effect XML file, whereas the old renderer had C++ code that picked which shader effect to use for each material.

That's what we currently do. The downside is that you have to draw every model twice - the first pass draws with alpha testing, and the second pass with alpha blending. That means twice as many draw calls and twice as many polygons to render, which hurts performance when there are a lot of transparent models.
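A sketch of that two-pass scheme in plain OpenGL state terms (the real renderer drives this through its shader techniques; the draw callback is a stand-in for the sorted back-to-front draw):

    #include <GL/glew.h>

    void RenderTransparentModels(void (*drawSortedModels)())
    {
        // Pass 1: alpha test only. The mostly-opaque texels write depth, so
        // they occlude correctly regardless of draw order.
        glEnable(GL_ALPHA_TEST);
        glAlphaFunc(GL_GREATER, 0.5f);
        glDepthMask(GL_TRUE);
        drawSortedModels();
        glDisable(GL_ALPHA_TEST);

        // Pass 2: alpha blend with depth writes off, to fill in the soft
        // semi-transparent fringes over the first pass.
        glEnable(GL_BLEND);
        glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
        glDepthMask(GL_FALSE);
        drawSortedModels();
        glDisable(GL_BLEND);
        glDepthMask(GL_TRUE);
    }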
  22. Days 16 and 17

Trying to do something other than pathfinding for a bit, since I'm not very good at concentrating on that (continued getting distracted by doing the release and reviewing and life and other stuff).

The game's current renderer is hard-coded to support three different materials:
* Plain old diffuse maps - a mesh has a single RGB texture, which is just multiplied by lighting and shadows etc and then drawn.
* Diffuse maps plus player-colouring - a mesh has an RGBA texture, where the A channel determines where to use the RGB channels and where to use the player colour (blue, red, etc) instead. (Most units use this.)
* Diffuse maps plus alpha-blending - a mesh has an RGBA texture, where the A channel determines where the mesh should be drawn as opaque or transparent or semi-transparent. (We use this a lot for trees and other vegetation.)

Also, each model is rendered in several different modes:
* The basic mode that draws a visible model onto the screen.
* Shadow-map generation: the scene is rendered from the direction of the sun, computing only the depth (i.e. distance from sun) of each pixel, not the colour, to support shadowing calculations.
* Silhouette blocking: to support silhouettes, i.e. units being rendered as solid colour when behind a building/tree/etc, the buildings/trees/etc are drawn to a 1-bit stencil buffer (no colour, it just wants to know which pixels were covered).
* Silhouette display: after rendering the blockers, it then renders the units that will display a silhouette, as a solid colour, using the depth and stencil buffers so it's only drawn when behind a blocker.

Different materials behave differently in each mode. E.g. in shadow-map generation we ignore colour, so non-alpha-blended models don't have to load their texture at all, which can improve performance; but alpha-blended models do have to load their texture so that the transparent areas don't cast a shadow. The renderer therefore has a "shader effect" per mode per material, where a shader effect defines a "shader technique" that defines how to render a mesh (i.e. what vertex data it needs, what textures it needs, what computation to perform to produce the colour of each pixel, what OpenGL state to set up (blending, depth test, masks), etc). The renderer stores a separate list of models for each material, and in each mode it renders each of those lists with the appropriate technique.

Alpha-blending is special because you have to draw polygons in order from furthest to nearest, to get the correct rendering: graphics cards store a depth buffer so that you can draw opaque objects in any order, and if you try to draw a pixel that's behind another previously-drawn nearer pixel then it will be rejected (so you'll end up with only the nearest object being visible). If the pixel in front is meant to be semi-transparent, you actually do want to draw behind it, but the hardware doesn't store enough data per pixel to be able to detect that case. In practice, you have to sort transparent objects by distance from camera and draw each one twice to get it working well enough, which is not fast.

As an extra complication, there's an "instancing" optimisation for non-animated models.
Animated models store a separate copy of their mesh data for every unit (since we compute the vertex positions on the CPU, and each unit will be in a slightly different animation state, so they can't share), but for non-animated models we only need to store a single copy of the mesh data and can easily tell the GPU to translate/rotate it as necessary for each instance, which saves memory and helps performance.

As yet another complication, we want to maintain support for old graphics cards that don't support shaders at all (or that have unusably buggy or slow support), since there's a non-trivial number of them. Every shader effect actually defines three techniques: one that doesn't use real shaders (for maximum compatibility), one that uses GLSL shaders (for compatibility with OpenGL ES, currently just for Android), and one that uses GL_ARB_fragment_program shaders (for typical use, since GLSL is less widely and more buggily supported than ARB shaders).

The problem with this rendering system is its inflexibility. Say we wanted to add specular lighting to some models to make them look shiny - that would be a new material, and it would require a significant number of changes to the C++ code and a new shader effect (with non-shader/GLSL/ARB variants). But actually we'd want to combine the specular lighting with all the other materials: diffuse+specular, diffuse+playercolour+specular, diffuse+alphablend+specular. Add in the instancing vs non-instancing versions of each of those, and the number of combinations explodes and becomes unmanageable.

The most useful new material (and what prompted me to work on this) would be one that uses alpha-testing instead of alpha-blending: that is, it has a texture with a 1-bit alpha channel and every rendered pixel is either fully opaque or fully transparent. (The image here gives an example - compare the sharp edges of the tree on the left, vs the softly faded edges of the tree on the right.) That means you avoid all the ordering problems of semi-transparent blending, so performance can be much better. If we could use that for most of the game's vegetation, framerates should improve significantly. The compromise is that artists probably have to be more careful to make it look good - light fluffy branches are generally out, but you can still do things like this/this/this/this/this/this etc (if I'm not mistaken) without alpha-blending, and it's what basically every other game seems to do.

As well as materials, we sometimes need to render models in slightly different modes. E.g. if you're constructing a building and dragging the placement preview object around the screen, it looks a lot like the normal rendering of a building, but if you drag it into fog-of-war or shroud-of-darkness then it shouldn't turn grey/black like normal buildings do (it should remain visible and bright red to indicate you can't build there). That would require a change to the shader code used to render that model (to stop it applying the LOS texture), but we currently have no way to implement that other than creating yet another variant of every single material and shader effect.

To improve on that, I've been changing the renderer to work more flexibly, and to be more data-driven rather than code-driven. Every model is associated with a single material (via its actor XML file). The material refers to a shader effect, and also defines a set of, uh, "definitions". (Need a better name for them...) E.g.
the material XML file might say

    <material>
      <shader effect="model"/>
      <define name="USE_PLAYERCOLOR" value="1"/>
    </material>

The material might include some dynamically-generated definitions too, e.g. "PLACEMENTPREVIEW=1". The C++ rendering code will have its own definitions, e.g. "MODE_SHADOWCAST=1" when it's rendering the shadow map. It will collect all models into one list, regardless of material. Then, for each model, it combines the mode definitions with the material definitions, and loads the material's shader effect. The "model.xml" shader effect file might say

    <effect>
      <technique>
        <require context="MODE_SHADOWCAST || MODE_SILHOUETTEOCCLUDER"/>
        <require shaders="glsl"/>
        <pass shader="glsl/model_solid"/>
      </technique>
      <technique>
        <require shaders="glsl"/>
        <pass shader="glsl/model_common"/>
      </technique>
    </effect>

so it will select the "model_solid" shader if one of the relevant modes was defined, else it'll pick the next suitable technique. Then the shader might say

    ...
    #ifdef USE_PLAYERCOLOR
      color *= mix(playerColor, vec3(1.0, 1.0, 1.0), tex.a);
    #endif
    ...

which depends on the USE_PLAYERCOLOR defined by the material. So the renderer loads the appropriate technique for each model based on the material and mode. Then it can group together models that use the same shader (to improve performance by minimising state changes), then group by mesh, then by texture, and render them all. There's lots of caching so that loading shaders for every model for every mode, every frame, has a very small cost (there's a rough sketch of this below). It's not perfectly fast, but it seems no worse (and sometimes better) than the old renderer implementation, and it allows much more flexibility, which is nice.

Still to do: clean up the code; merge with the non-shader-based code as much as possible; add the new materials (at least alpha-testing); document stuff; then it should probably be alright.
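To make the caching idea concrete, here's a rough C++ sketch of keying loaded techniques on (effect, merged definitions) - hypothetical types, not the engine's real classes:

    #include <map>
    #include <memory>
    #include <string>
    #include <utility>

    using Defines = std::map<std::string, std::string>; // e.g. USE_PLAYERCOLOR=1
    struct Technique { /* compiled shaders, GL state, ... */ };

    std::shared_ptr<Technique> LoadTechnique(const std::string& effectName,
                                             const Defines& defines)
    {
        // Mode defines and material defines were merged into 'defines' by the
        // caller; each distinct combination maps to one compiled technique.
        static std::map<std::pair<std::string, Defines>,
                        std::shared_ptr<Technique>> cache;

        auto key = std::make_pair(effectName, defines);
        auto it = cache.find(key);
        if (it == cache.end())
        {
            auto tech = std::make_shared<Technique>();
            // ... parse the effect XML, pick the first <technique> whose
            // <require> conditions match 'defines', and compile its shaders
            // with 'defines' as preprocessor definitions ...
            it = cache.emplace(key, std::move(tech)).first;
        }
        return it->second; // per-model lookups are now cheap map hits
    }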
  23. Prop polygons are cheaper than skinned polygons (since we don't have to run the skinning code for them), so we might need to be a bit careful here. It may help if someone could create some animated skinned demo units with varying triangle counts - the current dude.pmd looks like 246 triangles, so a range of roughly 500, 2000 and 5000 would be useful for testing. They don't need to look any good (they can be made with automatic smoothing or tessellation etc) and the animation doesn't need to be any good, but they should have a realistic number of bones (which should be as few as possible) and should be realistically skinned (in terms of the number of bones influencing each vertex, in particular), so we can stick a few hundred units on screen and see at what point performance becomes a real problem on modern hardware. (Incidentally, it's best if new meshes are fully closed - no holes in e.g. their necks or the bottoms of their feet to save a few polygons; I think some of the current ones do that - since closed meshes can help with some stuff like shadowing.)
  24. We already did Alpha 1 Argonaut, so I think we should avoid Jason as being too similar. (Also we don't need to honour Wijitmaker that many times, he might get big-headed.)