Everything posted by Ykkrosh

  1. If someone did want to start over, the first step would be to get a solid understanding of the current implementations, to learn from their technical requirements and their qualities and their mistakes. After that, finishing the current WIP implementation would probably not feel like so much more work than starting from scratch. (I'd still like to finish my WIP stuff, and I'm trying to make more time for 0 A.D. stuff recently, but I'm hopeless at committing to any schedules or anything.)

     Ignoring AI players (which do silly things like pathfinding in JS), I don't remember seeing JS as a particularly significant cost in profiles - usually something on the order of 10%. (I could be misremembering - better profiling data would be helpful.) In that case it's still nice to make it faster, but making it infinitely fast would only give a ~10% framerate increase, and the C++ bottlenecks will still need fixing either way.

     Isn't that table showing that 0.56% of the total per-frame time is in CTextRenderer::Render? That sounds plenty fast enough already to me. If I run with vsync disabled, I see CTextRenderer::Render going up to 6%, but then the menu is rendering at 1000fps so that's not really a problem. Anyway, I definitely like profiler usage, and the MSVC one is okay - you just need to be careful to apply it to the areas that really do need optimisation, and not get distracted by inelegant code that doesn't actually matter in practice.

     Rendering runs of text to textures at runtime (vs the current approach, which uses FreeType and Cairo to render all the individual glyphs to a texture and then draws a quad per glyph at runtime - see source/tools/fontbuilder2) would probably be nice for a few reasons - mainly nicer English text (proper kerning etc.) and much better support for non-English languages (no need to pre-render every glyph from every possible language in every font/size/weight, which is a lot of texture memory even for non-CJK languages; proper support for combining characters and ligatures and other substitutions that some scripts depend on; etc.). It'd probably be ideal to use Pango for text layout, since that'll deal with the i18n issues. One slight complication is that some of our fonts are drawn as a thick black outline with a white fill on top, and we'd probably want to continue supporting that kind of effect - I'm not sure if we could/should just embed Cairo and use it for font rasterisation like the offline fontbuilder2 does.
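     To make the Pango idea concrete, here's a minimal sketch (not engine code - the font, surface size and colour are arbitrary choices) of laying out and rasterising a run of text with Pango and Cairo, whose pixels could then be uploaded as a GL texture:

        // Minimal sketch: render a text run into a Cairo surface with Pango.
        // Pango handles kerning, combining characters, ligatures etc. here.
        #include <pango/pangocairo.h>

        unsigned char* RenderTextRun(const char* utf8, int width, int height)
        {
            cairo_surface_t* surface =
                cairo_image_surface_create(CAIRO_FORMAT_ARGB32, width, height);
            cairo_t* cr = cairo_create(surface);

            PangoLayout* layout = pango_cairo_create_layout(cr);
            pango_layout_set_text(layout, utf8, -1);

            PangoFontDescription* desc = pango_font_description_from_string("Serif 16");
            pango_layout_set_font_description(layout, desc);
            pango_font_description_free(desc);

            cairo_set_source_rgb(cr, 1.0, 1.0, 1.0); // white text
            pango_cairo_show_layout(cr, layout);

            g_object_unref(layout);
            cairo_destroy(cr);
            cairo_surface_flush(surface);
            // NB: the caller must keep 'surface' alive while using the pixels
            return cairo_image_surface_get_data(surface);
        }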
  2. There are quite a few that were noticed years ago and still haven't been fixed.

     fixed::fromInt(1) gets compiled to something like "mov ecx, 10000h", so it has no cost. fixed::fromFloat is more expensive but is used very rarely (CCmpRallyPointRenderer is the only non-trivial one I can see (and that looks like it would probably be better if it used fixed everywhere instead of mixing it with float and converting back and forth)). Most of the other operations on fixed get inlined into the equivalent operation on ints.

     What situation did you measure the 20% performance improvement in? If I run e.g. Combat Demo (Huge) and start a fight for a couple of minutes, then the profiler indicates about 35% of the total runtime is in CFixedVector2D::CompareLength, which is called by CCmpPathfinder::ComputeShortestPath calling std::sort(edgesAA...). Almost all the time in CompareLength is spent doing lots of expensive-on-x86 64-bit multiplies (see the sketch below), so that's the kind of thing that might well be made a lot faster by using floats (though I'd guess it could also be made quite a bit faster while sticking with ints, with some x86-specific code to do a 32x32->64-bit mul, or with SSE2 or something). But the real problem here is that the short-range pathfinder is terribly unscalable - it needs a different algorithm, which'll mean it won't do a crazy number of std::sort(edgesAA...) calls, and then the performance of CFixedVector2D::CompareLength will hardly matter. Were you measuring that performance issue or something else?

     The point is that you can't just add a compiler option like /fp:precise and get the same FP results on all platforms. I think it's unreasonable to require the game to always be built with SSE2 on x86 (particularly given how hard it is to control the build flags used by Linux distros), but then anyone running an x87 FP build of the game will eventually get out-of-sync errors. The way to avoid all those problems is to not use floats. (I think ARM should generally give the same results as x86 SSE2 (ignoring how compilers rearrange expressions), since its VFP does 32-bit/64-bit operations - the problem is just x87, which does 80-bit operations without rounding.)
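     The sketch mentioned above - roughly what a fixed-point squared-length comparison has to do (illustrative code, not the engine's actual CFixedVector2D implementation):

        // Comparing squared lengths of two fixed-point vectors: each side
        // needs two 32x32->64-bit multiplies, which are relatively
        // expensive in 32-bit x86 code. Names here are illustrative.
        #include <cstdint>

        struct FixedVec2 { int32_t x, y; }; // hypothetical 16.16 fixed-point vector

        int CompareLength(FixedVec2 a, FixedVec2 b)
        {
            int64_t la = (int64_t)a.x * a.x + (int64_t)a.y * a.y;
            int64_t lb = (int64_t)b.x * b.x + (int64_t)b.y * b.y;
            if (la < lb) return -1;
            if (la > lb) return 1;
            return 0;
        }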
  3. Fundamentally redesigning the entire multiplayer architecture isn't something that should be taken so lightly. Where precisely did that 20% come from? I'd expect it to be one or two call sites that are doing some relatively expensive maths, and it would be far easier to find and focus on optimising those areas than to make changes across the entire codebase with unpredictably harmful side-effects.

        #include <stdio.h>
        #include <float.h>
        static float x = 1.0f;
        static float y = FLT_EPSILON/2.0f;
        int main()
        {
            float t = x + y - x;
            printf("%g\n", t);
        }

        > cl test.cpp /O2 /fp:precise /arch:SSE2
        > test
        0

        > cl test.cpp /O2 /fp:precise /arch:SSE
        > test
        5.96046e-008

     (MSVC 2012, x86)

     I don't mean a debug build - I mean a release build that you are running in a debugger (e.g. when you press F5 in Visual Studio). (If I remember correctly, it happens any time IsDebuggerPresent() returns true when CreateProcess() is called, unless you set an environment variable _NO_DEBUG_HEAP=1.) (That caused me quite a bit of confusion when first trying to measure and optimise stuff.)

     It takes about 60 bytes per character just for the GL vertex arrays to draw the text, so the sizeof(wchar_t) bytes per character, plus overhead per string, to store all the currently-displayed text won't be a significant part of the total cache usage either.

     I mean the old one that's currently in the game. (The new not-currently-in-progress one uses the same short-range pathfinder; it just makes the long-range tile-based pathfinder faster and fixes some bugs due to mismatches between the two.)
  4. That'll cause OOS in multiplayer, because floats aren't consistent between different builds of the game. That's almost the whole point of using CFixed. (The rest of the point is that floats cause nasty imprecision issues, so you might find that e.g. pathfinding works okay at the bottom-left corner of the map but not at the top-right, because the coordinates are larger so the resolution is coarser. That's usually a very hard problem to reason about, whereas using fixed-point numbers makes it a non-issue.) And they're just ints, so almost all operations on them will be at least as fast as the same operations on floats.

     I think /fp:precise doesn't solve this, because the MSVC documentation says the compiler can still do intermediate calculations with 80-bit precision and only round at the end, which means it can give different answers to platforms that do 64-bit precision throughout (e.g. SSE2, or a debug build that spills things onto the stack). Also various library functions (sqrt, sin, etc.) differ between platforms. Also, requiring /fp:precise globally would make our graphics code (which doesn't really care about precise floats) slower. (And using different compiler options for different modules of the code would be insane and would probably break LTCG etc.)

     If I remember correctly, one of the parts of CFixed that was actually slow was CFixed::Sqrt, since that's hard to do with ints. I tried changing the implementation to use floating-point sqrt instead (roughly as in the first sketch at the end of this post), which made it much faster and appeared to give the same output in all the cases I tested, but I never quite trusted it enough to be fully portable. It probably wouldn't be infeasible to exhaustively test it on every interesting platform to see if it's safe enough, so that might be a nice thing to do. (I don't remember what was actually calling Sqrt, though - it might have just been the pathfinder computing Euclidean distances, and I think the new pathfinder design avoids that by computing octile distances instead.) Have you done any profiling to see which of the CFixed operations are actually causing a performance problem?

     Have you done any profiling of this? (Note that if you're running in MSVC you have to be careful to disable the debug heap (even in release builds), else a lot of the STL usage will be massively slower than when run outside the debugger.)

     When I implemented this, I think I measured the per-frame overhead as pretty insignificant (and it allowed a lot of flexibility with changing materials and effects and lighting etc. at runtime, without the hassle and bug-proneness of invalidating lots of cached data). It'd be nice to have numbers on how much it currently costs, to see what the potential benefits of improving it would be.

     We've solved that by caching the parsed XML in a binary format (XMB) that takes no time to load, and release packages have all of the files combined into a zip, so there is no filesystem overhead.

     Players have got at least half a gigabyte of RAM - why is 48KB worth any effort?

     I agree that's a problem - QueryInterface is a bit slow, and BroadcastMessage likely has terrible cache usage patterns. Is your suggestion to create a new class for every type of entity (UnitEntity, InfantryUnitEntity, MeleeInfantryUnitEntity, SpearmanMeleeInfantryUnitEntity, etc.)? That's the kind of inflexible hard-coded hierarchy that component-based designs are intentionally trying to get away from. There's a load of documentation from game developers over the past several years about how component systems make their lives much better. When an entity is accessing its own components, maybe a better way to minimise the QueryInterface cost without significantly changing the design would be to let a component cache the returned IComponent* the first time it does QueryInterface (after instantiation/deserialization) and use the pointer directly after that (see the second sketch at the end of this post), which is safe since components can't be deleted dynamically unless the whole entity is deleted. When accessing components of a different entity you've still got to do a lookup on the entity ID, though. (Storing pointers to other entities would be bad because it makes deserialization hard and makes dangling-pointer bugs much more likely.) Again, it'd be useful to do some profiling to see which uses of QueryInterface have a significant cost - I'd suspect it's probably a small number that would be easy to fix without redesigning anything much.

     It does that already (where "inaccurate" is "tile-based and ignoring mobile entities").

     The different modules store different data and have different access patterns - e.g. collision uses obstruction shapes and needs to do queries against typically short lines, whereas range tests use the centres of entities and need to do queries against huge circles with filtering on player ID. (And the renderer needs 3D bounding boxes of models for frustum culling, which is conceptually similar but practically very different again.) I do agree the current subdivision scheme is naive and could do with improving, but I don't think trying to use a single instance for all the different modules would help. I don't think the subdivision-data-structure queries are currently the slow part of the range tests - even if the queries were instant, they might return several hundred entities, and just trying to figure out which ones have entered or left each other entity's range is too expensive. Usually only a very small proportion have recently entered/left the range, so ideally we wouldn't have to even look at the majority of entities that have remained clearly inside/outside. I don't know how best to do that, though.

     I'd prefer to leave it in JS and redesign it. (I have no idea what design it should have, but its current design is certainly not great.) It needs to access a load of other components that are only implemented in JS and don't have C++ interfaces yet, so moving it to C++ would be a pain and would reduce the engine's flexibility, plus C++ is just generally more of a pain than JS to write in. I don't think I've ever noticed UnitAI performance as a serious problem - it doesn't run hugely frequently (given that it's all event-driven) and doesn't do any expensive computation, and we should be able to cope happily with a few thousand JS function calls per second. Is there some profiling data showing which parts of it are a problem?
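     First sketch: the floating-point Sqrt idea mentioned above. This is hypothetical code, not the engine's actual CFixed API, and the open question is exactly the one in the post - whether sqrt() rounds identically on every supported platform:

        // Hypothetical sketch of CFixed::Sqrt via floating point, assuming
        // a 16.16 fixed-point representation. If x = n/65536, then sqrt(x)
        // expressed in 16.16 units is sqrt(n/65536)*65536 = sqrt(n)*256.
        #include <cmath>
        #include <cstdint>

        int32_t FixedSqrt(int32_t n) // n >= 0, 16.16 fixed-point
        {
            // Simulation-safety concern: this is only OOS-safe if sqrt()
            // gives bit-identical results on every supported platform.
            return (int32_t)(std::sqrt((double)n) * 256.0);
        }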
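     Second sketch: the component-pointer caching idea, roughly. All names here (ICmpPosition, QueryInterface's signature, GetEntityId, IID_Position) are made-up stand-ins for whatever the real component API provides:

        // Sketch of caching a sibling component pointer after the first
        // QueryInterface, which is safe because a component can't be
        // deleted unless its whole entity is deleted. The cached pointer
        // must be reset on deserialization. Names are illustrative.
        class CCmpExample
        {
            ICmpPosition* m_CachedPosition = nullptr;

            ICmpPosition* GetPosition()
            {
                if (!m_CachedPosition)
                    m_CachedPosition = static_cast<ICmpPosition*>(
                        QueryInterface(GetEntityId(), IID_Position));
                return m_CachedPosition;
            }
        };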
  5. Yeah, I think it's valuable to have a single step for "make my game up-to-date", since it saves effort and reduces the risk of people reporting bugs due to mismatched file versions. Also it's valuable for programmers to be able to jump to a certain point in history and get the matching source and executables, to debug minidumps from crashes reported by users. That probably could be done with some custom tools on top of Git, but someone would need to make those tools. (There are other difficulties with moving to Git, like dealing with non-public files (data/mods/internal/ etc.) and whatever else I've complained about before, so I think more work is needed before migrating would be a net improvement.)
  6. Supporting orders while paused introduces some non-trivial implementation complexity - normally when you click a button in the GUI, the effect doesn't occur until maybe a quarter of a second later (to be compatible with multiplayer, where we need to allow time for the message to propagate to all the players), so if the game is paused then you won't see anything happen at all when you click on GUI buttons, which is unhelpful and confusing. We'd need to change it so the GUI predicts what the game state is going to become (e.g. if you click to train a unit, it needs to predict that the unit will be added to the training queue and predict the reduction in resources, and then somehow deal with unexpected situations like the building being destroyed before it's really started training the unit). It shouldn't be impossible to make it work well, but I don't think it would be easy.

     It's definitely a feature that I consider very useful as a player, though - I only ever do single-player, not multiplayer, so I don't care about being good at the latter, and I like to have time to think while playing.
  7. Yeah, should just delete it. Some changes I made recently to the renderer had the consequence of removing the "old" lighting model and forcing everything to "standard", which allows much brighter lighting; only a few maps were still using "old", and they should either be deleted or have their lighting parameters fixed. (Look for <LightingModel>old</LightingModel> in the map XML file.)
  8. Not sure what you mean by "dies out on this line", since that line looks incapable of failing by itself. Do you mean the "ENSURE(SUCCEEDED(ret))" inside GetFolderPath was failing? (If so, what value was 'ret'?)

     I don't think that should matter - it should be supported forever for compatibility (since it's a very widely used API), and nobody else has reported similar errors on Vista/Win7, as far as I'm aware.
  9. Terrain already casts shadows. (There are some known bugs with frustum culling and shadow-map bounds that make it break if you don't have some objects near the edges of the screen to stretch the frustums out, though.)
  10. Can you run it in the VS debugger (in debug mode), and select "break" when it brings up the continue/suppress/break/etc dialog box, and then check the value of the "path" argument in the call to CreateDirectories?
  11. Easiest way is to rename the .c file to .cpp; put both .cpp and .h into source/graphics/; re-run update-workspaces (which will add the new source file into the project); add "#include "precompiled.h"" at the top of the .cpp file before any other non-comment code; then use it by doing "#include "graphics/mikktspace.h"" wherever you want it.
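      Once that builds, driving it looks roughly like this - a sketch under assumptions: TriMesh and its layout are made-up stand-ins for the engine's real mesh data, while the callback signatures and genTangSpaceDefault() are mikktspace's own API:

         #include "precompiled.h"          // must come first in every engine .cpp
         #include "graphics/mikktspace.h"

         // Sketch: mikktspace pulls mesh data through callbacks, then hands
         // back one tangent+sign per face-vertex via m_setTSpaceBasic.
         struct TriMesh                    // illustrative triangle soup
         {
             int numTriangles;
             const float* positions;  // 9 floats per triangle (3 verts * xyz)
             const float* normals;    // 9 floats per triangle
             const float* uvs;        // 6 floats per triangle (3 verts * uv)
             float* tangents;         // out: 12 floats per triangle (xyz + sign)
         };

         static int GetNumFaces(const SMikkTSpaceContext* ctx)
         {
             return static_cast<TriMesh*>(ctx->m_pUserData)->numTriangles;
         }
         static int GetNumVertsOfFace(const SMikkTSpaceContext*, int)
         {
             return 3; // triangles only
         }
         static void GetPosition(const SMikkTSpaceContext* ctx, float out[], int face, int vert)
         {
             const TriMesh* m = static_cast<TriMesh*>(ctx->m_pUserData);
             const float* p = m->positions + (face*3 + vert)*3;
             out[0] = p[0]; out[1] = p[1]; out[2] = p[2];
         }
         static void GetNormal(const SMikkTSpaceContext* ctx, float out[], int face, int vert)
         {
             const TriMesh* m = static_cast<TriMesh*>(ctx->m_pUserData);
             const float* n = m->normals + (face*3 + vert)*3;
             out[0] = n[0]; out[1] = n[1]; out[2] = n[2];
         }
         static void GetTexCoord(const SMikkTSpaceContext* ctx, float out[], int face, int vert)
         {
             const TriMesh* m = static_cast<TriMesh*>(ctx->m_pUserData);
             const float* t = m->uvs + (face*3 + vert)*2;
             out[0] = t[0]; out[1] = t[1];
         }
         static void SetTangent(const SMikkTSpaceContext* ctx, const float tangent[], float sign, int face, int vert)
         {
             TriMesh* m = static_cast<TriMesh*>(ctx->m_pUserData);
             float* t = m->tangents + (face*3 + vert)*4;
             t[0] = tangent[0]; t[1] = tangent[1]; t[2] = tangent[2]; t[3] = sign;
         }

         void ComputeTangents(TriMesh* mesh)
         {
             SMikkTSpaceInterface iface = {};
             iface.m_getNumFaces = GetNumFaces;
             iface.m_getNumVerticesOfFace = GetNumVertsOfFace;
             iface.m_getPosition = GetPosition;
             iface.m_getNormal = GetNormal;
             iface.m_getTexCoord = GetTexCoord;
             iface.m_setTSpaceBasic = SetTangent;

             SMikkTSpaceContext ctx = {};
             ctx.m_pInterface = &iface;
             ctx.m_pUserData = mesh;
             genTangSpaceDefault(&ctx);
         }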
  12. That's what the current implementation does. It's only 'realistic' if you imagine you're in a (silent) helicopter high above the ground watching through binoculars, i.e. not very realistic at all. (And it's not good for playability, since off-screen objects will usually sound about the same as on-screen objects, which is confusing and unhelpful and unintuitive.)
  13. Yeah, you shouldn't include the ":20595" when trying to connect from inside the game. I don't know whether it breaks things or will just be ignored, but it definitely won't help. One possible issue to be careful about with port forwarding is that you need UDP port 20595, not TCP port 20595. (I think some routers let you select one or the other or both.)
  14. Did you get any visible error messages where you had to click "continue"/"suppress", before it finally crashed? Both cases have "Access violation writing location 0x00000004" with call stacks like:

      > pyrogenesis.exe!std::_Tree<std::_Tmap_traits<JSObject *,unsigned int,std::less<JSObject *>,ProxyAllocator<std::pair<JSObject * const,unsigned int>,Allocators::Arena<Allocators::Storage_Fixed<Allocator_Aligned<16> > > >,0> >::_Buynode() Line 1390 + 0x8 bytes C++
        pyrogenesis.exe!CBinarySerializerScriptImpl::CBinarySerializerScriptImpl(ScriptInterface & scriptInterface, ISerializer & serializer) Line 33 + 0x75 bytes C++
        pyrogenesis.exe!CBinarySerializer<CStdSerializerImpl>::CBinarySerializer<CStdSerializerImpl><std::basic_ostream<char,std::char_traits<char> > >(ScriptInterface & scriptInterface, std::basic_ostream<char,std::char_traits<char> > & a) Line 100 + 0x65 bytes C++
        pyrogenesis.exe!CStdSerializer::CStdSerializer(ScriptInterface & scriptInterface, std::basic_ostream<char,std::char_traits<char> > & stream) Line 29 + 0x14 bytes C++
        pyrogenesis.exe!CAIWorker::Serialize(std::basic_ostream<char,std::char_traits<char> > & stream, bool isDebug) Line 374 C++
        pyrogenesis.exe!CCmpAIManager::Serialize(ISerializer & serialize) Line 557 + 0x26 bytes C++
        pyrogenesis.exe!CComponentManager::ComputeStateHash(std::basic_string<char,std::char_traits<char>,std::allocator<char> > & outHash, bool quick) Line 140 C++
        pyrogenesis.exe!CNetClientTurnManager::NotifyFinishedUpdate(unsigned int turn) Line 409 C++
        pyrogenesis.exe!CNetTurnManager::Update(float frameLength, unsigned int maxTurns) Line 176 C++
        pyrogenesis.exe!CGame::Update(double deltaTime, bool doInterpolate) Line 288 + 0x15 bytes C++
        pyrogenesis.exe!Frame() Line 377 C++
        pyrogenesis.exe!RunGameOrAtlas(int argc, const char * * argv) Line 529 + 0x5 bytes C++
        pyrogenesis.exe!main(int argc, char * * argv) Line 572 + 0xf bytes C++
        pyrogenesis.exe!wmain(int argc, wchar_t * * argv) Line 380 + 0xb bytes C++
        pyrogenesis.exe!__tmainCRTStartup() Line 583 + 0x17 bytes C
        pyrogenesis.exe!CallStartupWithinTryBlock() Line 397 C++
        kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes
        ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
        ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes

      which I expect means the arena allocator failed to allocate its arena and returned a null pointer. Haven't looked into why that might happen.
  15. In particular, pretty much all users without that extension are on "GDI Generic", which is Windows' rubbish software fallback implementation when you don't have any working hardware-accelerated OpenGL drivers at all. We probably ought to detect that case specially and show users a nice error message telling them to install real drivers. (Added a ticket.)
  16. The Fedora packages had failed to build, so you got the old executable with the new 0ad-data package. They've rebuilt successfully now, so you should be able to update to the r11863 version of the 0ad package, which ought to work better.
  17. No - for any changes inside binaries/data/ (like this), you need to either rebuild and re-upload public.zip, or else just copy the single modified file into /sdcard/0ad/data/mods/public/shaders/glsl/ (so it'll override the file from the old public.zip).
  18. Top-left icon in the WYSIWYG editor ("Toggle editing mode").

      It defaults to the mobile theme now (at least in Opera Mobile), with a "full version" link at the bottom of the page, but once you switch to the full version there's no way to go back to the mobile version, as far as I can see.
  19. Thanks, that's much better. (Would anybody fancy changing the font in the non-WYSIWYG post editing box too?)
  20. To be clear/pedantic about terminology: ambient occlusion (as discussed by e.g. GPU Gems in ~2004) is the general concept of computing the ambient lighting at a point based on how much of the sky is visible from that point, which involves casting rays outwards from each point and seeing if they reach the sky or hit some geometry. You can store that data either per vertex (if your triangles are high-res enough) or in a texture map. In contrast, SSAO (screen-space ambient occlusion) is a more recent (~2007) approximation of AO that doesn't use the real geometry (it sort of reconstructs an approximation of the 3D geometry based on just the 2D depth buffer after you've rendered everything). So if we're implementing AO with textures (like what Blender does), it's not SSAO, it's just plain old AO. </pedanticness>

      To fit with our current art pipeline, any AO texture computations really ought to be done by the game engine (along with the .dae->.pmd conversion etc.), not by Blender. Redesigning the art pipeline is not impossible, but wouldn't be much fun (we probably don't want to batch-convert in Blender on every SVN user's machine, since that seems slow and fragile, and quite awkward for modders; but then we'd have to find some way to store and distribute and update the converted files without it getting too painful; and AO computation needs some integration with the actor/prop system; etc.). That GPU Gems chapter suggests how to implement the AO algorithm, and it's not particularly complex, so I don't see why the engine couldn't do it in principle. So, it should be possible.
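      For a feel of how simple the core algorithm is, here's a rough sketch of the per-vertex variant along the lines of the GPU Gems description. RandomHemisphereDir() and RayHitsGeometry() are placeholders for the engine's real math and ray-casting utilities, not existing functions:

         // Rough sketch of baking per-vertex ambient occlusion: cast rays
         // over the hemisphere above each vertex and record the fraction
         // that escape to the sky.
         #include <cstddef>
         #include <vector>

         struct Vec3 { float x, y, z; };

         Vec3 RandomHemisphereDir(const Vec3& normal);              // placeholder
         bool RayHitsGeometry(const Vec3& origin, const Vec3& dir); // placeholder

         std::vector<float> BakeVertexAO(const std::vector<Vec3>& positions,
                                         const std::vector<Vec3>& normals,
                                         int numRays = 64)
         {
             std::vector<float> ao(positions.size());
             for (size_t i = 0; i < positions.size(); ++i)
             {
                 int unoccluded = 0;
                 for (int r = 0; r < numRays; ++r)
                 {
                     // offset the origin slightly along the normal to avoid
                     // self-intersection with the surface at the vertex
                     Vec3 origin = { positions[i].x + normals[i].x * 0.01f,
                                     positions[i].y + normals[i].y * 0.01f,
                                     positions[i].z + normals[i].z * 0.01f };
                     if (!RayHitsGeometry(origin, RandomHemisphereDir(normals[i])))
                         ++unoccluded;
                 }
                 ao[i] = static_cast<float>(unoccluded) / numRays;
             }
             return ao;
         }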
  21. You need to edit binaries/data/mods/public/shaders/glsl/model_common.vs and change the line "uniform vec3 sunColor;" to "uniform mediump vec3 sunColor;". (I suppose that to be able to share shaders between desktop GLSL 1.20 and GLSL ES 1.00 properly, we should globally define lowp/mediump/highp to the empty string in desktop GLSL and then the shaders can use them whenever required by ES.)
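      That global define could be done by prepending a short preamble when compiling desktop GLSL - roughly like the sketch below, where LoadShaderSource() is a placeholder for the engine's real shader loading, not an existing function:

         // Sketch: make ES precision qualifiers harmless in desktop GLSL by
         // defining them away. (If a shader starts with a #version directive,
         // the preamble would need to be spliced in after that line instead.)
         #include <string>

         std::string LoadShaderSource(const std::string& path); // placeholder

         std::string PreprocessShader(const std::string& path, bool isGLES)
         {
             std::string preamble;
             if (!isGLES)
                 preamble = "#define lowp\n"
                            "#define mediump\n"
                            "#define highp\n";
             return preamble + LoadShaderSource(path);
         }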
  22. Which drivers? The shadow map texture is set up with GL_INTENSITY so it should have the same value in all .rgba components, so if .b works and .a doesn't then that sounds like a driver bug.
  23. Not currently, but it's easy to implement multiple UV coordinates per mesh. I don't know how hard it is to create the new set of UVs, though, either manually or with some automated unwrapping tool. (It mustn't use the same UV value for multiple points on the mesh (so it can't be the same as we currently use for the diffuse texture), and it ought to be biased to give more texels in the areas of highest-frequency lighting variation.)

      The colour of a pixel is simply the AO texture value plus the diffuse lighting factor, all multiplied by the diffuse texture (...and modified by specular and shadows etc.). The idea is that it should be equivalent to baking AO into the diffuse texture in Blender, except that we combine the AO and diffuse components in the engine's renderer instead of when exporting from Blender. The advantage would be that we can still share the high-res diffuse textures between multiple buildings (minimising memory usage and download size etc.), while having a unique low-resolution lighting texture per building (or even per combination of randomised props on a building), and without making life hard for inexperienced artists/modders or for those who don't use Blender.

      The advantage compared to SSAO would be that it's computationally cheaper (it should be usable on even the lowest-end hardware), and (as far as I'm aware, not having actually tested any of this in practice) it should give a higher-quality appearance (since SSAO is fundamentally a total hack and can suffer from ugly artifacts). I think the main disadvantage compared to SSAO is that it's much more work to implement, but probably not infeasibly so.

      The effect of buildings occluding nearby terrain could probably be approximated adequately by having a decal underneath the building, which just darkens the terrain around the building.
  24. Roughly, yes. To be more precise:

      ShaderModelVertexRenderer (in HWLightingModelRenderer.h) is used for:
      * All models when using the 'fixed' renderpath (fixed-function pipeline, no programmable shaders, CPU lighting).
      * All skinned models when using the 'shader' renderpath and not using the GPU skinning option.

      InstancingModelRenderer (in InstancingModelRenderer.h) is used for:
      * All unskinned models when using the 'shader' renderpath.
      * All skinned models when using the 'shader' renderpath and using the GPU skinning option.

      ShaderModelRenderer (in ModelRenderer.h) and ShaderRenderModifier (in RenderModifiers.h) are used for all models, to do the batching and the per-batch shader setup.

      The GPU skinning option requires GLSL, and is highly experimental and broken, and nobody should ever use it. (I just added it to do performance comparisons, and it mostly lost.) The class names and filenames are very misleading, so ignore the words like "Shader" and "HWLighting". (They really need renaming.)
  25. The important parts are:

      E/SDL ( 3152): No EGL config available
      W/pyrogenesis( 3152): ERROR: SetVideoMode failed in SDL_GL_CreateContext: 800x480:32 1 ("Couldn't create OpenGL context - see Android log for details")

      which means it's trying to load an unsupported graphics configuration. See the "int[] configSpec" in build/android/sdl-project/src/org/libsdl/app/SDLActivity.java. Probably the most likely problem is the "EGL10.EGL_SAMPLES, 4," line (which enables 4x MSAA) - comment that out and try again. (Need to do a "make clean" and "make" in sdl-project and then rebuild and reinstall the .apk, I think.) If that doesn't help, maybe try changing the RED/GREEN/BLUE_SIZE - currently they're 8,8,8, and it might be happier with 5,6,5.