Ykkrosh

WFG Retired
  • Posts

    4.928
  • Joined

  • Last visited

  • Days Won

    6

Posts posted by Ykkrosh

  1. I have no idea why, but I'm more than happy to share the changes I made to the engine to compress the textures as ETC2. Considering that I did it in a few minutes, maybe I did something wrong ...

    Ah, looks like the ALPHA_PLAYER is the problem - that still needs to be 8-bit alpha. I think there are approximately no cases where we can use 1-bit alpha formats (since even if the original texture has only 1-bit alpha, its mipmaps will need higher alpha resolution). I think we currently use no-alpha DXT1 (4bpp) for any textures whose alpha channel is entirely 0xFF, and DXT5 (8bpp) for everything else.
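
    For reference, that selection logic amounts to something like the following - a minimal sketch with made-up names, not the actual engine code:

        // Sketch: pick an S3TC format as described above - no-alpha DXT1 (4bpp)
        // when every alpha value is 0xFF, DXT5 (8bpp) otherwise. The function
        // name and pixel layout are hypothetical, not the engine's real API.
        #include <cstddef>
        #include <cstdint>

        enum class S3TCFormat { DXT1, DXT5 };

        S3TCFormat ChooseS3TCFormat(const uint8_t* rgba, size_t pixelCount)
        {
            for (size_t i = 0; i < pixelCount; ++i)
                if (rgba[i * 4 + 3] != 0xFF)   // any non-opaque alpha forces DXT5
                    return S3TCFormat::DXT5;
            return S3TCFormat::DXT1;           // fully opaque: no-alpha 4bpp format
        }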
  2. But if we do generate the blended terrain textures at runtime like that, we need to compress them, to get the best quality and memory/bandwidth usage.

    Mmm... Having read the ASTC spec, I'm not sure there's much chance of a real-time encoder with decent quality - there's a huge number of variables that the encoder can pick, and if it tries to pick the optimal value for each variable then it's probably going to take a lot of time, and if it doesn't then it's just wasting bits. ETC2 might be more appropriate for that case, since it's much simpler and will presumably take much less encoding time to get close to optimal output. Might be interesting to do a test of quality vs encoding speed though, to see if that's true.
  3. use ETC2, because it is the only compressed texture format that is mandatory on all GLES 3+ devices. ASTC is neither mandatory nor widely used.

    The Android Extension Pack requires ASTC. The AEP is not mandatory, but GLES 3+ isn't mandatory either - what matters is what the GPU vendors choose to support. (I think even GLES 2.0 wasn't mandatory until KitKat, but pretty much everyone supported it long before that.)

    ASTC does seem like it will be widely supported in the future - Mali-T622, PowerVR Series6XT, Adreno 420, Tegra K1. Support is not great right now, but any decent Android port of the game is going to be a long-term project, and if we're going to spend effort on new texture compression formats I think it may be better to spend the effort on one with better long-term prospects.

    (And ASTC has better quality at the same bitrate, and much more flexibility in choosing bitrates (possibly the most useful feature - we can trade off quality against memory usage and texture bandwidth), so if compatibility was equal, it would be the better choice.)

    The (raw) S3TC (DDS) images' total size is ~800MB and the ETC2 (KTX) images' total size is ~680MB; as you can see, the ETC2 images are ~17% smaller, which is good.

    I still don't see how that's possible - S3TC and ETC2 are both 4bpp for RGB (and for RGB with 1-bit alpha), and both are 8bpp for RGBA, and the DDS/KTX headers should be negligible, so there should be no difference, as far as I understand.
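
    For instance, a 1024x1024 RGB texture at 4bpp is 512KB in either format, and a 1024x1024 RGBA texture at 8bpp is 1MB in either format, with mipmaps adding roughly a third on top in both cases - so if the totals differ by ~17%, that difference presumably comes from somewhere else (e.g. different mip counts, different input sets, or some textures being stored uncompressed in one of them).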

    I think on-the-fly compression on Android is a no-go

    Slightly tangential, but a possible use case for on-the-fly compression:

    Currently we render terrain with a lot of draw calls and a lot of alpha blending, to get the transitions between different terrain textures - I think a blended tile needs at least 7 texture fetches (base texture + shadow + LOS, blended texture + shadow + LOS + blend shape), sometimes more. That's not very efficient, especially on bandwidth-constrained GPUs (e.g. mobile ones). And it's kind of incompatible with terrain LOD (we can't lower the mesh resolution without completely destroying the blending).

    I suspect it may be better if we did all that blending once per patch (16x16 tiles), and saved the result to a texture, and then we could draw the entire patch with a single draw call using that single texture. It'll need some clever caching to make sure we only store the high-res texture data for patches that are visible and near the camera, and smaller mipmaps for more distant ones, to keep memory usage sensible, but that shouldn't be impossible.

    But if we do generate the blended terrain textures at runtime like that, we need to compress them, to get the best quality and memory/bandwidth usage. So we'll need a suitably-licensed compression library we can use at runtime (though one that focuses on performance over quality; and if I can dream, one that runs in OpenCL or a GLES 3.1 compute shader, since the GPU will have way more compute power than the CPU, and this sounds like a trivially parallelisable task that needs compute more than bandwidth).

    (Civ V appears to do something like this, though its implementation is a bit rubbish (at least on Linux) and has horribly slow texture pop-in. But surely we can do better than them...)
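
    To make the caching idea a bit more concrete, something like the sketch below is what I have in mind - all the names and the resolution policy are made up, and the actual bake/compress steps are left as stubs:

        // Hypothetical cache of baked per-patch terrain textures, keyed by patch
        // coordinates. The stored resolution depends on distance from the camera,
        // so distant patches only keep low-res versions. Baking (render the
        // blended patch to an offscreen target, then compress) is stubbed out.
        #include <cstdint>
        #include <map>
        #include <utility>

        using PatchKey = std::pair<int, int>;   // patch coordinates (i, j)

        struct BakedPatch
        {
            uint32_t textureHandle = 0;         // e.g. a GL texture name
            int resolution = 0;                 // texels per side actually stored
        };

        class PatchTextureCache
        {
        public:
            // Return a texture for this patch at (at least) the resolution implied
            // by its distance from the camera, re-baking only when needed.
            BakedPatch& Get(PatchKey key, float distanceToCamera)
            {
                int wanted = ResolutionForDistance(distanceToCamera);
                BakedPatch& entry = m_Cache[key];
                if (entry.resolution < wanted)
                {
                    entry.textureHandle = BakePatch(key, wanted);
                    entry.resolution = wanted;
                }
                return entry;
            }

        private:
            static int ResolutionForDistance(float d)
            {
                // Arbitrary example policy: halve the resolution per distance band.
                if (d < 64.f) return 1024;
                if (d < 128.f) return 512;
                if (d < 256.f) return 256;
                return 128;
            }

            uint32_t BakePatch(PatchKey, int) { return 0; }   // placeholder

            std::map<PatchKey, BakedPatch> m_Cache;
        };

    (Eviction of patches that scroll far away, and dropping back down to lower resolutions, would need to be added on top of this, but the structure would stay the same.)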

    - (at least on Android) use the KTX format instead of DDS. KTX is flexible enough to be used in combination with any texture format.

    DDS is also flexible enough to be used in combination with any texture format, so I don't think that's an argument in favour of KTX :). Using two different texture containers seems needlessly complex. We could switch the generated textures to KTX on all platforms, but we still need to support DDS input because lots of the original textures in SVN are DDS (from the days before we supported PNG input), and it sounds like more work than just updating our DDS reader to support a new compression format.

    * do we need (or will we need in the future) cubemap faces?

    We currently use cube maps for sky reflections, though the actual inputs are single-plane image files and we generate the cube maps in a really hacky and very inefficient way. That ought to be rewritten so that we load the whole cube map efficiently from a single compressed DDS/KTX file.

    In general, changing our texture code is perfectly fine - it has evolved slightly weirdly, and it supports features we don't need and doesn't support features we do need, so it would be good to clean it up to better fit how we want to use it now. (I don't know enough to comment on your specific proposals off the top of my head, though :). Probably need to sort out the high-level direction first.)

    I'll volunteer to add KTX support to NVTT :).

    NVTT hasn't had a release for about 5 years, and we want to rely on distro packages as much as possible, and they only really want to package upstream's releases, so changing NVTT is a pain :(. (The distros include a few extra patches we wrote to fix some serious bugs, but I doubt they'll be happy with a big non-upstream patch adding KTX support and changing the API). It'd probably be less effort to write some code to transcode from DDS to KTX, than to get NVTT updated.
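
    To illustrate why the transcoding route shouldn't be much work: the KTX (1.1) container is basically just a fixed header followed by the mip levels, so a sketch of it looks like the following. (This is written from my reading of the KTX spec, so double-check the details against the spec before relying on it.)

        // Sketch of a KTX 1.1 header for pre-compressed 2D texture data.
        #include <cstdint>

        #pragma pack(push, 1)
        struct KTXHeader
        {
            uint8_t  identifier[12];       // fixed magic bytes
            uint32_t endianness;           // 0x04030201, written in native order
            uint32_t glType;               // 0 for compressed formats
            uint32_t glTypeSize;           // 1 for compressed formats
            uint32_t glFormat;             // 0 for compressed formats
            uint32_t glInternalFormat;     // e.g. 0x9278 = GL_COMPRESSED_RGBA8_ETC2_EAC
            uint32_t glBaseInternalFormat; // e.g. 0x1908 = GL_RGBA
            uint32_t pixelWidth;
            uint32_t pixelHeight;
            uint32_t pixelDepth;           // 0 for 2D textures
            uint32_t numberOfArrayElements;
            uint32_t numberOfFaces;        // 1, or 6 for cube maps
            uint32_t numberOfMipmapLevels;
            uint32_t bytesOfKeyValueData;
        };
        #pragma pack(pop)

        KTXHeader MakeETC2RGBAHeader(uint32_t width, uint32_t height, uint32_t mipLevels)
        {
            KTXHeader h = {};
            const uint8_t magic[12] = { 0xAB, 'K', 'T', 'X', ' ', '1', '1',
                                        0xBB, '\r', '\n', 0x1A, '\n' };
            for (int i = 0; i < 12; ++i)
                h.identifier[i] = magic[i];
            h.endianness = 0x04030201;
            h.glTypeSize = 1;
            h.glInternalFormat = 0x9278;      // GL_COMPRESSED_RGBA8_ETC2_EAC
            h.glBaseInternalFormat = 0x1908;  // GL_RGBA
            h.pixelWidth = width;
            h.pixelHeight = height;
            h.numberOfFaces = 1;
            h.numberOfMipmapLevels = mipLevels;
            return h;
        }

        // Each mip level is then written as a uint32_t imageSize followed by the
        // compressed data, padded to a 4-byte boundary - all of which is easy to
        // fill in from what the DDS header already tells us.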
  4. - it seems there is another etcpack from Ericsson which might be open source, but I can't access any of their web sites ... I've found their source code here: https://github.com/paulvortex/RwgTex/tree/master/libs/etcpack , it's open source.

    That looks like a non-GPL-compatible (and non-OSI) licence, e.g. you're only allowed to distribute source in "unmodified form" and only in "software owned by Licensee", so it's not usable.

    - RwgTex - seems quite good, lots of input/output formats ...

    That just calls into etcpack, so it has the same licensing problems.

    - etcpak - Check this blog post. It seems it doesn't support ETC2/EAC though ...

    Licence looks okay, but it sounds like quality may be a problem (from the blog post: "As for the resulting image quality, my tool was never intended for production usage").
  5. Hmm, ETC2 sounds like it might be good - same bitrate as S3TC (4bpp for RGB, 8bpp for RGBA) and apparently better quality, and a reasonable level of device support.

    (Integrating it with the engine might be slightly non-trivial though - we use NVTT to do S3TC compression but also to do stuff like mipmap filtering and to generate the DDS files, so we'd need to find some suitably-licensed good-quality good-performance ETC2 encoder and find some other way to do the mipmaps and create DDS etc.)

  6. 1GB of RAM is huge for a device that has only 2GB for the entire OS.

    Oh, sure, but then it's just a problem of the game being fundamentally not designed to run on that kind of hardware, which is a totally different problem to a memory leak bug (though probably a harder one to fix) :)

    It seems that S3TC is available on quite a lot of Android devices. Is it enabled at runtime? Or do I need to set some compile flags?

    It's not available on any devices I've used (Adreno 320/330, Mali-400MP, VideoCore) - as far as I'm aware, only NVIDIA supports it on mobile. But I think the game will use it automatically whenever it's available. (public.zip always contains S3TC-compressed textures, and the game will detect whether the drivers support S3TC, and if not then it will just decompress the textures in software when loading them.)
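
    (For anyone who wants to check a particular device by hand, the detection boils down to an extension-string query - roughly this:)

        // Rough stand-alone check for driver-side S3TC support on a GLES 2.0
        // context; not the engine's actual detection code.
        #include <GLES2/gl2.h>
        #include <cstring>

        bool HasS3TC()
        {
            const char* exts = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
            return exts && std::strstr(exts, "GL_EXT_texture_compression_s3tc");
        }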
  7. I need someone who knows GL(ES) to help me fix these errors and also to check/debug the trace(s) to see why it eats so much memory (+1GB). The problem is that when the game starts, it eats more and more memory until it's killed by the OS :(. At first I suspected the engine had some leaks, which is why it continuously eats memory, but I've seen the same thing happen with the retracer on Android, so now I suspect that the GL(ES) commands are somehow responsible for the leaks.

    1GB isn't that much, depending on the map and game settings - if I run on desktop Linux with nos3tc=true (to mirror the lack of working texture compression on Android), on "Greek Acropolis (4)" with default settings, I see resident memory usage ~1.3GB. With S3TC enabled it's ~0.6GB. I think that indicates there's a huge volume of textures, and we don't (yet) have any options to decrease the texture quality or variety, so you just need a load of RAM.

    (Texture compression is kind of critical for Android, but ETC1 is rubbish and doesn't support alpha channels, and nothing else is universally available. It would probably be most useful to support ASTC, since that's going to become the most widely available in the future, and has good quality and features.)

    What's more, having an application that can replay the trace directly on Android is great, because I could also run benchmarks, and it seems that the CPU is not the (main) bottleneck for the slowness, but the GPU. There are frames with 10k+ GL commands, which seems a bit too much for an Android tablet :)

    The problem with having a very large number of GL calls is usually the CPU cost of those calls, so the bottleneck might still not be the GPU :)
    • Like 1
  8. We care about Ubuntu 12.04? I don't think Canonical does... The next LTS is out and has been for a while. I'm a very late adopter, and I'm already on 14.04 :-).

    12.04 is supported by Canonical for 5 years, i.e. until 2017. Over the past 6 months we had roughly the same number of players on 12.04, 13.10, and 14.04 (~3000 on each, who enabled the feedback feature). That time period overlaps the release of 14.04, so there would probably be different results if we waited another few months and checked the latest data again, but it suggests there's likely still a significant number of people on 12.04 today. So we shouldn't drop support for 12.04 without significant benefits (and I don't think there are any significant benefits here, since 12.04 / GCC 4.6 should be good enough for the SpiderMonkey upgrade).
  9. I think even on LTS distributions like RHEL it's no problem to install e.g. GCC 4.7 in case it's required.

    To integrate properly and straightforwardly with distro package managers, we should only use the default compilers in the distros we care about, which I think currently means GCC 4.6 (for Ubuntu 12.04).

    On Windows, VS2010 should be sufficient for compatibility with SpiderMonkey (since I think that's what Mozilla still uses), but many of the more interesting C++11 features are only available in VS2012 or VS2013. One problem there is that VS2012 won't run on WinXP, so any developers currently using WinXP (I've got data that suggests there's still a few) will have to upgrade to Win7 or above. (Players can still use XP, this only affects people wanting to compile the code). Is there anyone here who would be seriously affected by that?

    Also, updating VS would be a bit of a pain for e.g. me, since I use Visual Assist (and can't stand VS without it) but only have a license for a VS2010 version. And the VS2012/2013 IDEs have a terrible UI design (uppercase menus, really?). Those aren't blocker issues, but they are downsides.

  10. There might be ways to use a GPU to implement certain kinds of pathfinding efficiently, but that article is not one - it looks like about the worst possible way you could use a GPU :P. If the maximum path length is N (with maybe N=1000 on our maps), it has to execute N iterations serially, and GPUs are very bad at doing things serially; and each iteration is a separate OpenGL draw call, and draw calls are very expensive. And it's only finding paths to a single destination at once, so you'd need to repeat the whole thing for every independent unit. It would probably be much quicker to just do the computation directly on the CPU (assuming you're not running a poorly-optimised version written in JS on the CPU).
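
    (For comparison, the "directly on the CPU" version of that kind of flood fill is just a breadth-first search over the grid - something like the sketch below, which visits each passable cell once instead of redrawing the whole grid N times. This is only meant to illustrate the point, not to be our pathfinder.)

        // Minimal BFS flood fill over a grid, computing a distance-to-goal value
        // for every passable cell - the CPU equivalent of the article's approach.
        #include <cstdint>
        #include <queue>
        #include <vector>

        std::vector<int> FloodFillDistances(const std::vector<uint8_t>& passable,
                                            int w, int h, int goalX, int goalY)
        {
            std::vector<int> dist(w * h, -1);
            std::queue<int> open;
            dist[goalY * w + goalX] = 0;
            open.push(goalY * w + goalX);
            const int dx[4] = { 1, -1, 0, 0 };
            const int dy[4] = { 0, 0, 1, -1 };
            while (!open.empty())
            {
                int c = open.front(); open.pop();
                int x = c % w, y = c / w;
                for (int k = 0; k < 4; ++k)
                {
                    int nx = x + dx[k], ny = y + dy[k];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h)
                        continue;
                    int n = ny * w + nx;
                    if (!passable[n] || dist[n] != -1)
                        continue;
                    dist[n] = dist[c] + 1;
                    open.push(n);
                }
            }
            return dist;
        }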

    AMD's old Froblins demo used a similar (but more advanced) technique ("Beyond Programmable Shading Slides" on that page gives an overview). So it can work enough for a demo, but the demo has much simpler constraints than a real game (e.g. every unit in the demo has to react exactly the same way to its environment, it can't cope with dozens of different groups all wanting to move towards different destinations) and I doubt it can really scale up in complexity. (Nowadays you'd probably want to use OpenCL instead, so you have more control over memory and looping, which should allow more complex algorithms to be implemented efficiently. But OpenCL introduces a whole new set of problems of its own.)

    And performance on Intel GPUs would be terrible anyway, so it's not an option for a game that wants to reach more than just the high-end gamer market.

    • Like 3
  11. I believe I tried that when first implementing the algorithm. If I remember correctly, the problem was that sometimes a unit starts from slightly inside a shape or precisely on the edge, and the silhouette-point-detection thing will mean it doesn't see any of the points on that shape at all. That makes the unit pick a very bad path (typically walking to the corner of the next-nearest building and then back again), which isn't acceptable behaviour.

  12. The 4x4-navcell-per-terrain-tile thing and the clearance thing were part of the incomplete pathfinder redesign - #1756 has some versions of the original patch. None of it has been committed yet. (I think the redesigned long-range pathfinder mostly worked, but it needed integrating with the short-range pathfinder (which needed redesigning itself) and with various other systems, and that never got completed.)

    • Like 3
  13. (The notion of "enemy" is complicated because of diplomacy etc, and currently our C++ code is completely ignorant of that concept, so it can only do very generic filtering)

    Hmm, I misremembered the code a bit - it already does filter by player ID in the C++ (and the JS gives it a list of all the enemy player IDs). But it does that filtering *after* getting the sorted de-duplicated list of every entity owned by any player. That's certainly a bit silly.

    (I don't know whether Gaia (owner of trees and rampaging lions) counts as an enemy in these queries...)

    1) Does any game logic really *need* to know which entities entered or left range since the last turn? e.g. if I am a static defense turret, do I care specifically about new enemies that walked *into* my firing range when I can just call GetInRange again and grab all the ones that are in range *now* at much smaller cost? (depending on tile size and object count, but currently the sorts form a clear bottleneck)

    I'd be surprised if it's ever a much smaller cost - sorting a list of ints in C++ is O(n log n) but the constant factors are pretty small, while converting a list of ints into a JS array and then iterating over that list in JS and checking whether each entity is an enemy is O(n) but huge constant factors. The set_difference thing was added specifically to minimise the JS cost.

    2) Is ExecuteActiveQueries a frequent or infrequent (conditional) callee?

    If I remember correctly, it's pretty much the only significant user of GetNear. (...not counting PickEntitiesAtPoint which currently uses GetNear but really needs to be redesigned anyway)

    ... this comment makes me more inclined to think 0AD's default grid resolution is too coarse, since a simple Euclidean distance (or w/e) out-of-range check shouldn't break the bank, but sending thousands of unfiltered entities to JS per turn certainly would (especially if the overhead isn't carefully managed by sending data across in batches). Have you guys tried experimenting with this?

    Do you mean experimenting with different divisionSizes in SpatialSubdivision? I think I tested a few values when I first wrote this code (too many years ago) and the current size (8*8 terrain tiles) worked best in the scenario I ran, but it wasn't at all careful or comprehensive testing, and the game has changed a lot since then.
  14. AFAICS no calling code actually cares about the IDs being sorted, nor should it

    ExecuteActiveQueries calls set_difference on the list returned by GetNear, so that it can find the (usually very small) lists of entities that entered and exited the range since the last turn. set_difference requires the lists to be sorted - if they're not then it will give incorrect results. (And if we do have to sort them anyway, the extra make_unique is essentially free.)
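
    In other words, the pattern is roughly the following (simplified and with illustrative names, not the actual ExecuteActiveQueries code):

        // Both lists must be sorted (and de-duplicated) before std::set_difference
        // gives correct results - that's where the sort in GetNear comes in.
        #include <algorithm>
        #include <cstdint>
        #include <iterator>
        #include <vector>

        using entity_id_t = uint32_t;

        void ComputeDelta(const std::vector<entity_id_t>& previous, // sorted + unique
                          std::vector<entity_id_t> current,         // raw GetNear output
                          std::vector<entity_id_t>& entered,
                          std::vector<entity_id_t>& exited)
        {
            std::sort(current.begin(), current.end());
            current.erase(std::unique(current.begin(), current.end()), current.end());

            std::set_difference(current.begin(), current.end(),
                                previous.begin(), previous.end(),
                                std::back_inserter(entered)); // in current, not previous
            std::set_difference(previous.begin(), previous.end(),
                                current.begin(), current.end(),
                                std::back_inserter(exited));  // in previous, not current
        }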

    If I remember correctly, RedFox made a similar change once and it made ExecuteActiveQueries a lot faster, but it actually made the game slower, because it ended up sending many duplicate entity IDs to scripts which had to do a lot of work before rejecting the out-of-range entities. Am I missing something in your patch, or does it have the same problem?

    Part of the need for this set_difference thing is that we don't really do any filtering in RangeManager - if an idle unit is waiting for an enemy to come into range, RangeManager will typically find all the hundreds of trees and friendly units that are always in range, and we don't want the UnitAI scripts to be looping over those hundreds of entities every turn to detect whether one is an enemy. (The notion of "enemy" is complicated because of diplomacy etc, and currently our C++ code is completely ignorant of that concept, so it can only do very generic filtering). How does Spring handle this kind of thing?

  15. I asked myself if 0AD could be used as an OpenGL 3D engine. Of course it can, but I would like to know whether pyrogenesis is a 3D engine or an RTS game engine.

    It's an RTS game engine. Some of the graphics code is designed to be fairly generic (the low-level texture loading, higher-level compressed texture manager, shader system, etc) and could be reused in different types of game, but a lot of the graphics code is designed specifically for the requirements of an RTS (and specifically our RTS design) and would be inappropriate for most other games. (E.g. in an FPS you have a lot of static high-polygon-count geometry and complex static lighting, and probably need to spend a lot of effort on occlusion culling etc; whereas an RTS usually has a large number of low-poly-count objects with simple lighting and simple culling from the top-down view, so a lot of the design decisions are based on fundamentally different goals.)

    Reusing the generic bits would probably be a lot easier than starting totally from scratch, but you'd still have to write a load of custom OpenGL code to do pretty much all the rendering of your scene, so it'd still be a lot of effort. An existing general-purpose 3D engine would save a lot more of that effort (assuming that engine is well implemented, well documented, matches your game's requirements, etc).

    Also I would like to know if the GUI used in 0AD is released as a standalone library or if it's a part of pyrogenesis.

    It's integrated with the rest of the game engine. In principle it's reasonably modular and could be split out into a separate library (plus a few extra libraries that it depends on) without a huge amount of work. But it's not a very well designed GUI engine (the original design was too limited for what we need, so we've had to hack lots of extra features onto it and it's got a bit messy) - if you're starting from scratch and don't have to worry about compatibility with already-written GUI XMLs/scripts (like we do) and aren't fussy about what scripting language it uses (like we are), there are probably better GUI libraries available nowadays.

    but the most important thing for me now is to know what kind of scene graph(s) is used in pyrogenesis (or maybe the one(s) you planned to add)?

    There isn't an explicit scene graph. It's implicit in a variety of different modules that represent parts of the world - e.g. the simulation system has a list of units, which each own a CModel (which is a tree of meshes, decals, etc, with transform matrices being computed based on the tree structure, and the nodes get submitted to the renderer each frame as a flat list) and sometimes own extra graphics objects (like health-bar sprites); then terrain and water and particles etc are controlled by completely independent bits of code. Once everything has been submitted to the renderer's lists of objects, there are just some C++ functions that decide what order to render everything in, based on hard-coded knowledge of all the different types of renderable object.
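
    A very stripped-down illustration of what that amounts to (the real CModel is far more involved; these names are just for the sketch):

        // A tree of model nodes whose world transforms come from the parent chain,
        // flattened into a plain list for the renderer each frame.
        #include <utility>
        #include <vector>

        struct Matrix4
        {
            float m[16] = { 1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1 };   // identity
        };

        Matrix4 Multiply(const Matrix4& a, const Matrix4& b)
        {
            Matrix4 r;
            for (int row = 0; row < 4; ++row)
                for (int col = 0; col < 4; ++col)
                {
                    float sum = 0.f;
                    for (int k = 0; k < 4; ++k)
                        sum += a.m[row * 4 + k] * b.m[k * 4 + col];
                    r.m[row * 4 + col] = sum;
                }
            return r;
        }

        struct ModelNode
        {
            Matrix4 localTransform;
            std::vector<ModelNode*> children;   // child meshes, props, decals, ...
        };

        // Walk the tree, computing world transforms and appending every node to the
        // renderer's flat submission list.
        void Submit(const ModelNode& node, const Matrix4& parentWorld,
                    std::vector<std::pair<const ModelNode*, Matrix4>>& renderList)
        {
            Matrix4 world = Multiply(parentWorld, node.localTransform);
            renderList.emplace_back(&node, world);
            for (const ModelNode* child : node.children)
                Submit(*child, world, renderList);
        }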
  16. We shouldn't be using any GLES extensions, and as far as I'm aware we aren't. So either I'm mistaken (quite possible), or there's a bug in those PowerVR drivers, or there's a bug in our code.

    They seem to be caused in quite a few places - mostly in GameSetup.cpp (but that's only because Render() calls other drawing functions).

    The usual approach for debugging GL errors (when there aren't any good tools to help) is to put some ogl_WarnIfError() calls in randomly, look for the very first error reported, then look at the region of code between that ogl_WarnIfError and the previous ogl_WarnIfError (which didn't report an error). Then put lots more ogl_WarnIfError() in between those two, and if there are function calls between them, descend into those functions and put more ogl_WarnIfError there too. Repeat to narrow down the problem further. Eventually you'll end up with an ogl_WarnIfError which doesn't report an error, then a single OpenGL call, then an ogl_WarnIfError which does report an error, and that tells you where the problem is. Then print out the arguments that were passed into that GL call, and look at the GLES spec (or ask someone who's familiar with it) to see what's wrong.
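
    The end state of that process looks something like this (a hypothetical example, assuming the engine headers that provide ogl_WarnIfError and the GLES headers are included):

        // The failing call is now isolated between two checks, so you can print
        // and inspect its arguments.
        void UploadTexture(int width, int height, const void* data)
        {
            ogl_WarnIfError();   // no error reported up to here
            glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                         GL_RGBA, GL_UNSIGNED_BYTE, data);
            ogl_WarnIfError();   // error reported here -> check the arguments above
                                 // against the GLES spec
        }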
  17. If I remember correctly, AoE3 and AoE Online had very similar pathfinders but differed in whether all the units would immediately stop moving once the invisible 'formation controller' thing reached the group's target, or whether each individual would carry on to their assigned spot in the formation. But I can't remember which was which, and I can't remember why I chose the one that I did :(

    Running is not really about formations reaching the destination - it's needed for when the formation starts moving. If the invisible formation controller thing starts moving at the maximum speed of the units, then any unit that didn't start precisely in its assigned spot in the formation will never be able to catch up and reach its assigned spot while the formation is moving, no matter how long it continues moving. Similarly any unit that hits an unexpected obstacle (like another unit or a tree) will never be able to catch up again. It's necessary for either the units to move faster or the formation controller to move slower, at least until everyone has formed up.

    Units probably don't need to run quite as fast as they do now, though - they could be slower while still getting into formation quickly enough. And units separated by long distances shouldn't be considered part of a single formation anyway (so that they don't have to run across half the map to form up) - I think AoM/AoE3/etc did some kind of cluster detection when you gave a move order, so each cluster of units would move as an independent formation, until they got close enough to each other to merge into one. That can be added as an extra layer on top of the basic pathfinder and hopefully wouldn't complicate the lower layers.

    • Like 1
  18. Changing cpu_CAS64 to take 32-bit intptr_t arguments probably isn't ideal - there's a reason it has "64" in the name :). I think that's only used for some non-critical timer stuff though so it shouldn't really matter.

    The screenshot looks like it probably just failed to load any textures. Sounds like Galaxy Nexus is PowerVR - I don't know if anyone has successfully run on that before. (I've used Mali (Galaxy S II) and Adreno (Nexus 7 2013), and historic_bruno has used Videocore (Raspberry Pi).) So it might just be buggy GPU drivers and/or buggy engine code that interacts poorly with those drivers. You should probably make sure it's not trying to use S3TC textures (nos3tc=true in the config file (I assume you've set up the default config file etc already), but I don't think the driver should advertise that extension so it should be disabled automatically anyway), and look in 'adb logcat' for any suspicious error messages, and then try to narrow down the GL_INVALID_VALUE errors to see what's causing them (they may or may not be related to the texture problem, but they should be fixed anyway and they're a place to start) by building in debug mode and adding lots of ogl_WarnIfError() calls until you can see which GL call the error comes from.

  19. All that's needed for prettier fonts (short of a redesign adding far more than the above approach) is for artists to request new fonts or effects somewhere like Trac, and a programmer to generate the textures or walk them through the steps for doing so. I haven't seen that in the last 3 years, so if someone is unhappy with our fonts, they aren't documenting it well, if at all.

    I think Mythos has moaned about it a bit, though partly that was because he'd force-enabled FXAA which antialiased all the text and made it hideous (as it does in other games too) :). There are almost certainly things that would be useful and straightforward to improve, though. I did some experiments recently like this, with different fonts and with shadowing instead of stroking and with the different hinting options - it's easy to do that to see what things will look like, and to regenerate the game's fonts, but someone needs to make an informed judgement call on what would be best.

    (It's fairly easy to add support for non-bidi non-complex-shaped non-CJK scripts (Cyrillic is probably the main one) in the current system too - they get packed reasonably efficiently (so we don't really need to generate the textures at runtime), and it just needs a few improvements to some Python scripts. But bidi (Arabic, Hebrew) is non-trivial since it'll have to interact with our GUI engine's text layout code, CJK would possibly make the textures too large and I think it needs language-sensitive font selection (not a straight codepoint->glyph mapping), and complex shaping (prettier Arabic, Devanagari, maybe nicer English with sub-pixel hinting) is impossible with a texture-full-of-glyphs approach and would need a totally different design. So I think it's worthwhile doing the easy things now, and it would be nice to eventually do a totally different design that fixes all those problems, but it's not worth doing a new implementation of a glyph-texture design since it won't get us any closer to fixing those problems.)
