Posts posted by Ykkrosh

  1. It is a bit silly that our terrain has a horizontal resolution of 4m and a vertical resolution of 1.3mm. We could probably reduce vertical resolution by some factor like 8 or 32 or something (with a corresponding increase in vertical range) without anybody noticing the loss of precision, and it should be easy to make that change without breaking compatibility with the existing maps (we can just divide their heightmaps when loading them).
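
    A minimal sketch of that load-time conversion (hypothetical names; the real change would go wherever the map reader fills in the heightmap):

     // Assume heights are stored as u16, and SHIFT = 3 drops the vertical
     // resolution by a factor of 8:
     const int SHIFT = 3;
     for (size_t i = 0; i < numVertexes; ++i)
         heightmap[i] = oldHeightmap[i] >> SHIFT;
     // The engine then multiplies stored heights by an extra (1 << SHIFT)
     // when converting to metres, so the same u16 range covers 8x the
     // vertical extent and old maps keep their shape at coarser precision.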

  2. Most of the information in the wiki is wrong and should be ignored :)

    The only thing the engine technically requires is OpenGL ES 2.0, with drivers that don't have bugs we can't work around. It worked okay on my Qualcomm-chipset Nexus 7, so at least some version of their drivers is okay. The bigger problems are performance (usually terrible), input (especially on small screens), and the unsuitability of the gameplay to that kind of device, but those aren't device-specific problems.

  3. I think that's entirely OT since it's a GLX extension, and Android and RPi use EGL instead of GLX :). But I added it anyway - it adds some fields to hwdetect like

     "GLX_RENDERER_VENDOR_ID_MESA": 32902, "GLX_RENDERER_DEVICE_ID_MESA": 10818, "GLX_RENDERER_VERSION_MESA[0]": 10, "GLX_RENDERER_VERSION_MESA[1]": 0, "GLX_RENDERER_VERSION_MESA[2]": 0, "GLX_RENDERER_ACCELERATED_MESA": 1, "GLX_RENDERER_VIDEO_MEMORY_MESA": 1536, "GLX_RENDERER_UNIFIED_MEMORY_ARCHITECTURE_MESA": 1, "GLX_RENDERER_PREFERRED_PROFILE_MESA": 1, "GLX_RENDERER_OPENGL_CORE_PROFILE_VERSION_MESA[0]": 0, "GLX_RENDERER_OPENGL_CORE_PROFILE_VERSION_MESA[1]": 0, "GLX_RENDERER_OPENGL_COMPATIBILITY_PROFILE_VERSION_MESA[0]": 2, "GLX_RENDERER_OPENGL_COMPATIBILITY_PROFILE_VERSION_MESA[1]": 1, "GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA[0]": 1, "GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA[1]": 1, "GLX_RENDERER_OPENGL_ES2_PROFILE_VERSION_MESA[0]": 2, "GLX_RENDERER_OPENGL_ES2_PROFILE_VERSION_MESA[1]": 0, "GLX_RENDERER_VENDOR_ID_MESA.string": "Intel Open Source Technology Center", "GLX_RENDERER_DEVICE_ID_MESA.string": "Mesa DRI Mobile Intel® GM45 Express Chipset ",
    (It sounds like Mesa 10.0 is going to ship with that extension in a couple of weeks, so presumably they're unlikely to change it incompatibly, and if they do then we'll still have time to fix our code before our next alpha release.)
  4. I added a bit to the F11 profiler's renderer stats to show texture memory usage, in case that helps identify out-of-VRAM problems. (It's only an approximation since it doesn't count sky textures (which don't use CTextureManager (but probably should)), and doesn't count e.g. padding added by the GPU for alignment, but it does handle mipmaps and compressed textures so it shouldn't be too wrong.)
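
    For reference, the estimate is roughly of this form (a sketch, not the actual profiler code):

     #include <algorithm>
     #include <cstddef>

     // bitsPerPixel: e.g. 4 for DXT1, 8 for DXT3/5, 32 for uncompressed RGBA
     size_t EstimateTextureBytes(size_t w, size_t h, size_t bitsPerPixel, bool mipmaps)
     {
         size_t total = 0;
         do
         {
             total += std::max<size_t>(w, 1) * std::max<size_t>(h, 1) * bitsPerPixel / 8;
             w /= 2;
             h /= 2;
         } while (mipmaps && (w || h));
         return total; // ignores any padding/alignment the GPU adds
     }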

  5. It sounds like the problem is with clients trying to connect to the server's TCP port 5222, behind a firewall or proxy that doesn't allow arbitrary outgoing TCP connections. NAT hole punching can't help with that.

    Gloox apparently supports BOSH, which is an HTTP-based protocol and is more likely to be able to pass through HTTP proxies successfully than the standard protocol - it may be best to set up the server to support BOSH on port 80, and use that as a fallback when the standard connection fails.

  6. (It would be nice to implement that in the game itself, eg. resizing the textures before loading the map, instead of resizing them manually)

    All non-GUI textures are using mipmaps, so "resizing" just involves ignoring some of the high-res mipmap levels and using the low-res ones. The game already does that for textures larger than the GL implementation supports - see get_mipmaps in source/lib/res/graphics/ogl_tex.cpp. There's also an OGL_TEX_HALF_RES flag that can halve the resolution again, though that's not a great API - it might be better to add an ogl_tex_set_max_size(Handle, int), and then CTextureManager could be smart about what max size it picks (e.g. we might want separate controls for terrain texture resolution and unit texture resolution).
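
    The level-skipping logic behind that hypothetical ogl_tex_set_max_size would be something like:

     // Sketch: how many top mip levels to skip so the largest uploaded
     // level fits within maxSize (not existing engine code).
     int LevelsToSkip(int width, int height, int maxSize)
     {
         int skip = 0;
         while ((width > maxSize || height > maxSize) && width > 1 && height > 1)
         {
             width /= 2;
             height /= 2;
             ++skip; // start uploading from this mip level instead of level 0
         }
         return skip;
     }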

    Also I think we should simplify the menus (without background animations, just a wallpaper).

    Yeah, the multiple huge overlapping alpha-blended textures on the menu screen are fairly terrible for very low-end GPUs - an option for a single lower-res non-blended background texture might be nice.

    Another problem on Android/RPi is the lack of texture compression - ETC1 is the only widely-supported format with OpenGL ES 2.0 (and it's the only one on RPi), while our game only supports S3TC, so we have to decompress everything from 4/8 bits per pixel to 24/32bpp, which obviously uses a load more VRAM. In theory we could add support for ETC1, but it doesn't support alpha channels so we'd have to split RGBA textures into compressed RGB + uncompressed(?) A, and update all the shaders to do two texture loads, which is probably a pain. GLES 3.0 requires ETC2/EAC, which I think should be much less painful, and ASTC may become widespread in the future, so those are probably the more interesting long-term targets.

  7. If I have time, I might look into improving Pi support in SDL2, so I can use arbitrary resolutions, toggle fullscreen mode, etc.

    How would non-fullscreen mode work? As far as I can see, SDL doesn't talk to any kind of window manager at all, it just uses dispmanx to set up a new EGL surface as being drawn on top of everything else (X11, console, etc). You could easily change that surface to be not fullscreen, but it would have no window decoration (since there's no window) and it'd be permanently on top of everything else (you couldn't switch to another window).

    I suppose maybe you could make a Wayland backend for SDL work on RPi, so that it does run in a proper window - do you mean something like that? Sounds non-trivial but would be nice :)

    Scaling a low-res EGL surface to desktop resolution would hopefully work with just changing src_rect/dst_rect in SDL_rpivideo.c, but I guess you'd need to do an inverse mapping of mouse coordinates, so maybe that'd get a bit messier than I thought :(
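
    The inverse mapping itself would just be something like this (a sketch, assuming the src_rect/dst_rect sizes above):

     // The surface is rendered at srcW x srcH and scaled up to dstW x dstH,
     // so mouse coordinates have to be scaled back down by the same ratio.
     void MapMouseToGame(int mouseX, int mouseY,
                         int srcW, int srcH, int dstW, int dstH,
                         int* gameX, int* gameY)
     {
         *gameX = mouseX * srcW / dstW;
         *gameY = mouseY * srcH / dstH;
     }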

  8. That's quite cool :)

    It's also possible NVTT wasn't working 100% correctly as there were a lot of GL errors being logged.

    NVTT doesn't touch OpenGL at all, so I think it's unlikely those errors are related. (I got GL errors running on Android too, I think because of invalid enums (but I didn't try to check exactly where).)

    Either there are some software bugs, it runs out of memory, or the hardware/power supply is being overstressed.

    I think you can check whether it's running out of graphics memory by doing something like "vcgencmd cache_flush; vcdbg reloc", which'll show all the GL textures and how much is free. Running out of graphics memory will probably make VideoCore go really really slow as it tries to shuffle things around in memory to free up some contiguous space. In theory it should then fail and return an error, but in practice it might just randomly corrupt memory, since I don't think the RPi GL drivers have had much testing of that case. Either way, you probably want to avoid that situation.

    It's hard to troubleshoot because whatever is going wrong kills the network connection (so I can't SSH into it) and locks up input, so I can't close the game window.

    Maybe try using a serial console? That should work as long as the ARM hasn't totally locked up.

    Sadly I don't have a smartphone with a nice CPU and GPU (that would be a much better test platform), but the nice thing is that I can develop, test and debug much of the same code on a $50 Pi, and it would be reused by a $400 smartphone or tablet.

    The RPi's GPU is roughly equivalent in power to the Galaxy S II that I started the Android port on, which had nearly bearable performance :). But I guess you may be running at 1920x1080 while I was running at 800x480 (does SDL let you change the fullscreen resolution? I think the hardware ought to be able to do arbitrary scaling for free), and the CPU is massively slower too :(
  9. $ ./test
     Running 286 tests...[...]...OK!

    That looks like more than 64 to me :P

    It's still true that's not really a lot, though, and more would be nice. We're using CxxTest which I think works fine, with some custom code to run a few mostly useless tests on simulation component scripts.
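
    For anyone unfamiliar with CxxTest, a suite looks roughly like this (an illustrative example, not one of our real tests):

     #include <cxxtest/TestSuite.h>

     class TestExample : public CxxTest::TestSuite
     {
     public:
         void test_addition() // methods starting with "test" run automatically
         {
             TS_ASSERT_EQUALS(2 + 2, 4);
         }
     };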

    I think the main reason for a lack of unit tests is that (as far as I'm aware) we don't have many bugs that unit tests would find - the usual bugs nowadays seem to be undesired gameplay behaviour, or problems with the interactions between gameplay components, and I don't think it's feasible to find them through automated testing. Unit tests aren't an end goal in themselves - the goal is to find existing bugs and future regressions - so there's not much motivation to write tests for code that already seems to work fine.

    Another problem is that a lot of the engine was written before we had any interest in unit testing at all, and it was written in a way that makes unit testing hard (e.g. modules that have global state and lots of dependencies on other modules, so it's impossible to test part of it in isolation). Rearchitecting that code to be testable would take a lot of effort and probably introduce plenty of bugs, so it doesn't really seem worthwhile. So we have some unit tests for newer things like simulation2 that were designed with that in mind, but none for older things like the GUI engine since it's all tightly-coupled code.

  10. Also, a non-graphical replay is possible that runs at full speed (without rendering, just using the CPU to calculate the simulation as fast as possible), but I don't know of an option to allow this for live games (non-replays).

    If you want to run the game non-visually for an arbitrary game length, you can start a game manually to generate a commands.txt, open it in a text editor and follow the obvious format to extend it by thousands of turns, then run the replay command with that file. You'd need to add some extra code to let you detect the result of the game, though. You could then run multiple instances of the game in parallel, to make use of all your CPU cores.
  11. Given that the megapatch is currently almost entirely unusable (I started trying to split it up here, but nearly every change it makes adds bugs and replaces working code with more flawed designs - there are some real improvements in there (our existing code is certainly flawed too), but those improvements all still need a lot of fixing and are mixed up with all the other changes, so it generally takes more effort to extract and fix them than to just rewrite them from scratch), hopefully there's also a plan to make sure any future work is focused in a more productive direction? :)

  12. That's the fancy-water GLSL fragment shader, and the error message is coming from the Nouveau drivers. Not a very informative error message, though... It's not a complicated shader so a GeForce 7 ought to be able to handle it fine, as far as I can see. It might be best to report the error to the Nouveau developers, to see if they can work out why it's failing. (But update to the latest drivers first, in case it's already fixed.)

    You can disable the fancy water effect by putting

    fancywater=false
    in ~/.config/0ad/config/local.cfg, which ought to work around that error.
  13. - The old pathfinder spends 35% of its time in the isqrt function. That function itself can be made more efficient with some lookup tables and the Goldschmidt algorithm, but the best solution is to replace the CFixedVector2D::Length() function with an approximation. The one I use for my algorithm needs only 2-3 multiplications and one division for an approximation with a maximal relative error of 0.3%. It is worth considering changing to this approximation for all calculations.

    I think the best solution is to avoid calling CompareLength at all. Euclidean distance is actually a pretty bad heuristic for grid-based A*, because the paths consist of only diagonal/horizontal/vertical movements and are significantly longer than the Euclidean distance, so A* expands more nodes than it should. My newer JPS A* thing computes the heuristic as octile distance (i.e. sqrt(2)*numDiagSteps + numHorizVertSteps) which is more accurate and cheaper.
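
    In code the heuristic is roughly this (a sketch with plain ints, not the exact JPS implementation):

     #include <algorithm>
     #include <cstdlib>

     int OctileDistance(int x0, int y0, int x1, int y1)
     {
         int dx = std::abs(x1 - x0);
         int dy = std::abs(y1 - y0);
         int diag = std::min(dx, dy);            // steps taken diagonally
         int straight = std::max(dx, dy) - diag; // remaining horiz/vert steps
         // sqrt(2) ~= 92682/65536 in 16.16 fixed point; truncating keeps the
         // heuristic admissible (it never overestimates)
         return ((92682 * diag) >> 16) + straight;
     }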

    - fixed::Pi(), fixed::Zero(), fixed::Epsilon() create new objects and perform two deep copies per call. My algorithm actually spends 4% of its time inside fixed::Pi() (that is more time than I need for all of my Euclidean distance approximations :D). All three functions should be replaced by constants. (Is there a reason the return value of those functions is not 'const CFixed &' but rather 'CFixed'??)

    As far as the compiler is concerned, fixed is just an int - there should be no more cost to constructing the object or copying it than with any other int (assuming optimisations are turned on). If you're calling Pi() a lot, you might just want to define it inline in the .h file, so the compiler can recognise it's a constant int and remove the function call cost.
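
    Roughly like this (a simplified sketch of the header; details of the real class in source/maths/Fixed.h differ):

     #include <cstdint>

     class CFixed
     {
     public:
         // pi * 65536, defined inline in the .h so the compiler can fold it
         // into a constant int with no function-call overhead:
         static CFixed Pi() { return CFixed(205887); }
     private:
         explicit CFixed(int32_t v) : m_Value(v) {}
         int32_t m_Value; // 16.16 fixed point
     };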
  14. I think the main problem is that players will probably expect to get some feedback from the GUI when issuing commands while paused. E.g. if you click to train a dozen units, you'd expect to see your resource counters go down, so you can tell when you've spent all resources. We don't currently have a good way to implement that - your commands don't get executed until the next simulation turn (so they'll never get executed while the game is paused), and the GUI just reflects the current simulation state, so we can't update the resource counters while paused. I'm not sure how we could cleanly fix that, and it's easier to just prevent the player issuing commands while paused.

    But there's no technical problem other than the GUI - you can open the console (F9), enter Engine.SetSimRate(0) to pause the game, close the console and issue some commands, then enter Engine.SetSimRate(1) to resume, to see what the behaviour is like.

  15. Traversing one polygon is not much more complex than expanding the neighbors of a cell in a grid

    Hmm, I'm not sure the constant overheads are entirely negligible. E.g. if you're doing a Starcraft 2 style triangular navmesh, each triangular cell has 3 neighbours, and you might store those as 3 pointers, which is 24 bytes per cell (on x86_64), whereas square grids require 0 bytes to represent neighbours (you just compute them with pointer arithmetic), so it'll use up some more cache space. Or e.g. you probably want to store the unexplored/open/closed status for each cell, and you need to reset them all to 'unexplored' before starting a new path computation - if you've got a square grid then you can have a separate array of those status flags and initialise it to 0 easily, whereas I think it's trickier to do it efficiently if you've got an arbitrary graph of cells. (That initialisation can be a non-trivial cost if you're doing a lot of very short paths). So I think there's some extra cost per cell and the question is how well the reduction in cell count outweighs that.
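
    For example, the cheap grid reset is basically this (hypothetical names):

     #include <cstdint>
     #include <cstring>
     #include <vector>

     enum { UNEXPLORED = 0, OPEN = 1, CLOSED = 2 };

     struct GridSearchState
     {
         std::vector<uint8_t> status; // one byte per cell, indexed y*width + x

         void Reset() // a single memset - very cache/prefetch friendly
         {
             memset(status.data(), UNEXPLORED, status.size());
         }
     };
     // With an arbitrary graph of cells you'd have to walk the structure, or
     // store a per-search "generation" counter in each cell to avoid clearing.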

    ...I'm not really convincing myself that those are insurmountable problems, though :)

  16. So I'm a bit confused now. I guess entities have an extent, but the RangeManager ignores it (which is what I meant to ask)? Does the range vary very much or is it quite uniform?

    Entities have positions, and they have obstruction shapes (units are small axis-aligned squares, buildings are typically large non-aligned rectangles; I think a few buildings are multiple rectangles) centered on those positions. Obstruction shapes should (usually) never overlap other obstruction shapes, and should never overlap an impassable terrain tile.

    RangeManager only cares about the distance between entity positions - it ignores obstruction shapes entirely. (That is arguably a bug, since it means units behave badly around very large buildings (e.g. pyramids), but I think it's an acceptable simplification in most situations and there are probably ways to minimise the problems with very large buildings (like splitting them into multiple smaller obstructions).)

    Only the pathfinders and ObstructionManager care about obstruction shapes.

    (Then units also have footprint shapes (rectangles or circles, used for rendering the selection outlines), and 3D graphical bounding boxes, and 3D selection boxes. And the new WIP pathfinder gives them clearance values too, which determine how close they can get to impassable terrain or to buildings, instead of using the obstruction shape for that. But that's all irrelevant for RangeManager.)

    Ah, yes, I didn't see the static initialization.

    (Which is not threadsafe, by the way :))

    If we take the AI again as an example, it could already crunch away while the engine is busy with other stuff.

    That's how CCmpAIManager is designed already - StartComputation() is called at the end of one turn, and takes a snapshot of the current simulation state, then PushCommands() is called at the start of the next turn to push the AI's commands onto the command queue. The implementation doesn't actually use threads yet, so it does all the computation synchronously in the PushCommands call, but that's just a matter of writing code.
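
    In other words the intended flow is something like this (a sketch using std::async for illustration; again, the real implementation is still synchronous):

     #include <future>
     #include <vector>

     struct Snapshot { /* copy of the sim state the AI needs */ };
     struct Command { /* one queued AI command */ };

     class AIManager
     {
     public:
         void StartComputation(Snapshot state) // called at the end of turn N
         {
             m_Computation = std::async(std::launch::async,
                 [s = std::move(state)] { return RunAI(s); });
         }

         std::vector<Command> PushCommands() // called at the start of turn N+1
         {
             return m_Computation.get(); // blocks only if the AI is still running
         }

     private:
         static std::vector<Command> RunAI(const Snapshot&) { return {}; }
         std::future<std::vector<Command>> m_Computation;
     };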

  17. Perhaps we could use regular grid subdivision and implement SAP in every subdivision grid? That would eliminate a lot of X-axis alignments for sure.

    I'd suggest giving up on SAP, because it seems like a completely unhelpful algorithm for this problem :). There may be some things it's good at, but it sounds like this isn't one of them, and trying to twist it to work non-terribly is unlikely to result in something that actually works well.

    The current implementation in 0AD SVN would place entity B into both grid tile 1 and grid tile 2. This means we have to sort the results and call std::unique to remove duplicate entities.

    CCmpRangeManager deals with points rather than bounding boxes, so it only has to put entities into a single cell (except for the rare cases where an entity is precisely on the edge between cells, which we could avoid by being more careful about the edge conditions). We have to sort the results so that we can do std::set_difference to see which entities have entered/left the range (there's no way to avoid putting an O(n log n) operation somewhere, when we're doing set comparison), and once we've sorted them it doesn't cost much to do a std::unique.

    (CCmpObstructionManager does deal with bounding boxes, but that's all tied in with the pathfinder design and needs to be changed anyway.)
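
    The entered/left computation is essentially this (a sketch with hypothetical names):

     #include <algorithm>
     #include <iterator>
     #include <vector>

     void ComputeRangeChanges(const std::vector<int>& oldInRange, // sorted last turn
                              std::vector<int>& newInRange,
                              std::vector<int>& entered,
                              std::vector<int>& left)
     {
         std::sort(newInRange.begin(), newInRange.end());
         newInRange.erase(std::unique(newInRange.begin(), newInRange.end()),
                          newInRange.end()); // cheap once it's sorted

         std::set_difference(newInRange.begin(), newInRange.end(),
                             oldInRange.begin(), oldInRange.end(),
                             std::back_inserter(entered)); // in new but not old
         std::set_difference(oldInRange.begin(), oldInRange.end(),
                             newInRange.begin(), newInRange.end(),
                             std::back_inserter(left));    // in old but not new
     }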

    I just had a look at the parallel for patch. Don't you think that is a quite painful way of doing multithreading in games? First, I'd assume it would be easier to use some existing implementation with thread pools etc., like even OpenMP. Secondly, I'd think the easiest (and most common, afaik) way of doing it is completely putting one component in a separate thread, like the AI. That way you don't need to make sure that every called method is threadsafe etc., which would be horrible to maintain over time, at least in my experience.

    Agreed (I said similar things on IRC yesterday) :). Moving self-contained latency-tolerant things like the AI and pathfinder into separate threads should give much greater benefit, with much less pain than will come from uncontrolled access to the entire simulation state from multiple threads.

    1 thread: 24ms/turn

    On Combat Demo Huge? That's a fairly unrealistic extreme case, and if we were running at full speed then 24ms/turn would still only be about 10% of total CPU time. I had assumed it was worse than that...

  18. Unfortunately it will be necessary to switch to arbitrary polygons (or triangles) to allow buildings and other obstructions to be placed at arbitrary angles relative to the coordinate axes.

    The not-quite-completed new pathfinder design (around #1756) takes completely the opposite approach: it converts all the buildings into square grid cells, and uses that grid for all pathfinding. (And it uses 4x4 cells per terrain tile, to ensure there's enough precision for units to fit through tight gaps without noticeable quantisation issues). I liked that approach since it means there's no difference between terrain and buildings, the lack of angles simplifies a lot of code, and it's (relatively) easy to be really fast at pathfinding over a grid. With navmeshes I assume you have to try hard to minimise the number of cells, e.g. I guess you may need to smooth the terrain-tile passability grid to avoid creating hundreds of cells around the edges of 45-degree rivers etc. It'd be interesting to find out whether it's a feasible approach in practice for the kinds of map layouts we have :)

    Things that can be included very easily are e.g.: keeping track of the connected components in the navigation graph to instantly return from an impossible navigation attempt (trying to find a path to a point that can never be reached is the worst case of A*)

    Yeah, I think that's the biggest reason for slowness in our current pathfinder. The new pathfinder patch fixes that by constructing a reachability graph from the grid representation - if a unit tries to move to an unreachable goal then it'll use the graph to pick the nearest reachable location and pathfind to there instead. (The goal can be a point or a circle or a square, which makes things a little trickier - e.g. it's fine for an archer to attack a ship in deep water, as long as it can find a tile on the shore that's within attack range.)

    check whether it is possible to place a building at a given position (don't know where and how it is done at the moment)...

    (CCmpObstruction::CheckFoundation.)

  19. In the video it looks like it let you place the buildings on really steep places, but I guess that's just for demonstration.

    Yeah, that was in Atlas so it ignored the normal restrictions.

    Incidentally, would scenario designers want a feature in Atlas to disable flattening on an individual building so they can have more manual control, rather than it applying universally? Or should flattening be disabled entirely for buildings placed in Atlas?

    Is this something that can be applied to structure templates so that some buildings (notably walls) don't flatten terrain?

    Yes - I'm expecting we'd put a <TerrainFlattener/> component in template_structure.xml, then <TerrainFlattener disable=""/> in wall templates etc. (And some templates might want to change other parameters from the default, which should be easy enough. The default will probably be to use the Footprint shape with some standard falloff curve around the edges, but some might increase/reduce the strength of the flattening effect, and sanderd17 suggested some might flatten toward the average slope rather than the average height, and I suppose docks should flatten towards the water height, etc.)

  20. Regarding terrain-flattening: I did a quick thing like here to experiment with it. It's purely visual (it doesn't affect pathfinding) and stateless (it can be recomputed from the sim heightmap plus the current list of entities with TerrainFlattener components, so there's nothing to be serialized in saved games or network-synchronised). Currently it averages the terrain heights in a circle around the building, with a falloff so the vertexes nearer the edges have less weight, and then it moves every vertex in that circle towards that average height (using the same falloff weight, so vertexes nearer the edges move much less). That means the area under the building remains mostly flat, while the area just outside the building slopes relatively smoothly towards it. (The normal build restrictions stop you placing a building too near a steep slope, so you can't distort the terrain too much anyway). That's easy to implement and probably helps with most cases of building on bumpy ground.
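
    The core of it is just a weighted average plus a weighted lerp, roughly like this (a sketch with hypothetical names, operating on a patch of the heightmap):

     #include <cmath>

     void FlattenCircle(float* height, int w, int h,
                        float cx, float cy, float radius)
     {
         // Pass 1: weighted average of heights inside the circle
         float sum = 0.f, weightSum = 0.f;
         for (int y = 0; y < h; ++y)
             for (int x = 0; x < w; ++x)
             {
                 float d = std::hypot(x - cx, y - cy);
                 if (d >= radius)
                     continue;
                 float weight = 1.f - d / radius; // falls off towards the edge
                 sum += height[y*w + x] * weight;
                 weightSum += weight;
             }
         float average = sum / weightSum;

         // Pass 2: move each vertex towards the average by the same weight,
         // so the centre ends up flat and the edges barely move
         for (int y = 0; y < h; ++y)
             for (int x = 0; x < w; ++x)
             {
                 float d = std::hypot(x - cx, y - cy);
                 if (d >= radius)
                     continue;
                 float weight = 1.f - d / radius;
                 height[y*w + x] += (average - height[y*w + x]) * weight;
             }
     }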

  21. Even worse are constructs like CStrIntern::GetString, which creates a huge map of shared_ptr<>'s to CStrInternInternals to 'save space on duplicate strings'.

    That's not what CStrIntern is for - it's just for fast string comparisons and copies, since string equality is simply pointer equality. In particular it's for cases where you might expect to use enums, but you want to let data files define new values too and don't want the hassle of maintaining some centralised definition of every possible value, so you use strings and convert them at load-time into some unique identifier (which happens to point to the character data). Memory usage is not a concern at all, since it's only used for a few short strings. (It should certainly never be used for UI text.)

    (But I think quite a bit of the code is silly in that it converts a string literal into a CStrIntern many times per frame, which isn't cheap - the CStrInterns ought to be constructed once and cached in the appropriate places, and then there should be no extra runtime cost compared to enums. So that'd be nice to fix. And with that fixed, we should never look at the contents of the strings at runtime, so they're not going to be pulled into cache.)
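
    The basic interning trick, for anyone unfamiliar with it (a sketch, not the actual CStrIntern code):

     #include <string>
     #include <unordered_set>

     const std::string* Intern(const std::string& s)
     {
         static std::unordered_set<std::string> table;
         return &*table.insert(s).first; // one canonical copy per distinct string
     }
     // Equality of two interned strings is now just pointer equality - the
     // character data never needs to be touched (or pulled into cache) again.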

    In the P2P case we send 5 packets per second (one every 200 ms) per peer per peer.

    The game's current network architecture is client-server: every player sends their list of commands to the server, and the server sends those commands back to every player, then the server tells each player when it's safe to start the next turn. (The server doesn't know anything about the simulation state - it's just blindly forwarding packets and counting turns). So if nobody is doing anything, each client will just receive one packet from the server per turn, and the server will just send one per player per turn.

  22. I wondered if it's possible to simply render as many frames as possible while the sim update is running.

    To do rendering concurrently with update, you really need some kind of double-buffering system for the simulation state - you don't want to be rendering from the same data structures you're modifying in another thread, because that leads to race conditions and madness, so you want to render from an immutable copy of the state. I suppose that's not terribly hard in principle, since we mostly just need to copy CCmpVisualActor and some bits of CCmpPosition at the start of a turn, but there's probably lots of tricky details with other things that are updated more than once per turn (building placement previews, projectiles, everything in Atlas, etc). (Also it'll add an extra turn of latency between player input and visible output, so we'd need to make our turn length shorter to compensate for that.)
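
    Structurally it would be something like this (hypothetical types; the real thing would copy CCmpVisualActor/CCmpPosition data as described above):

     struct VisualState { /* positions, actor variations, etc. */ };

     class SimRenderBuffers
     {
     public:
         const VisualState& RenderState() const { return m_Buffers[m_RenderIndex]; }
         VisualState& SimState() { return m_Buffers[1 - m_RenderIndex]; }

         // Called once per turn, with the render thread briefly stalled so it
         // never reads a half-written state:
         void Swap() { m_RenderIndex = 1 - m_RenderIndex; }

     private:
         VisualState m_Buffers[2];
         int m_RenderIndex = 0; // renderer reads this one; sim writes the other
     };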

    I think the text format of profiler1 was a bad choice, it has way too much overhead.

    Yeah, it was never meant to be used like this - it was designed just for the interactive in-game table view, then someone added a mode that saved the table to a text file because it's helpful when debugging other people's performance problems, and then I reused it in the replay mode to save every few turns and draw graphs, which is a totally inappropriate thing to do.

    What I need at the moment is something to measure the performance difference before and after doing some changes.

    For the Spidermonkey upgrade for example I need to know how much faster version 18 is compared to version 17.

    I think it's best to use the (non-visual) replay mode for that, since it's about optimising one component rather than profiling the entire system. Rendering introduces a lot of unpredictability across hardware (e.g. someone with a faster graphics card will have higher FPS, so we'll spend more CPU time per second on per-frame rendering overhead, so simulation will look relatively less expensive than on an identical CPU with a slower GPU), it probably introduces more variability on the same hardware too (e.g. a very small change might push you over a vsync threshold and double your framerate), and it takes much longer to simulate a whole match. Replay mode lets you focus on just the simulation CPU cost, and a better version of profiler2 would let you see the worst-case simulation cost per turn over the whole match (which would ideally be under maybe 30msec so we can maintain smooth 60fps rendering; the graphics drivers do a bit of buffering which can cover an occasional extra gap between frames), and that should be enough to see how well an optimisation works. (And then it can be compared to not-so-easily-reproducible whole-system profiles to see whether the thing you're optimising is a significant cost in the wider picture.)
