DDS compression process


If the API is synchronous, the main loop (renderer) is frozen and we're unresponsive to input and to network messages (so other players may think we lagged out).
(Networking ought to be handled in a separate thread anyway, so that the communication latency isn't hurt by the framerate. I think that shouldn't be too hard to add to the current network code, except for some issues with needing to deserialize JS objects.)
Let's instead consider an asynchronous API that provides placeholders and swaps them out transparently, and provides a notification that the loader is still working on stuff that's needed now (as opposed to just prefetching). With this notification, you could achieve the same perceived effect as the synchronous API (graphics are paused for a bit), but you're still running the main loop and are responsive to network/input events (boss key :) ).
How are the graphics paused, given that we're already halfway through the rendering when we tell the loader we need a new texture right now? Continue rendering as normal and just skip the SDL_GL_SwapBuffers() at the end, perhaps? (I have no idea if that would really work - sounds a bit dodgy to completely stop repainting the screen. Also it means the best case would be a 1 frame delay (perhaps 50ms), even if the texture would load in under 1ms (which it does with a hot disk cache).)
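i.e. something like this in the main loop - just a sketch, where IsStillLoadingNeededStuff() is a made-up name for the "loader is still working on stuff that's needed now" notification:

Render(); // draw into the back buffer as usual
if (!g_Renderer.GetTextureManager().IsStillLoadingNeededStuff()) // hypothetical query
    SDL_GL_SwapBuffers(); // only make the new frame visible once everything needed is loaded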
When the prefetcher can't keep up, we could reduce the wait time by leaning a little on the placeholders. I agree that black stuff flickering in would be distracting, but it needn't be that bad. A nice example is Jak & Daxter, which makes the player stumble if the loader isn't keeping up (silly idea for us: cover the terrain with clouds ;)). Instead, we could go with a more neutral color, or a rough approximation of the actual hue. That of course suggests a level-of-detail scheme, where a really low-res texture might be loaded first and replaced by the full version when available.
It shouldn't be hard to e.g. have the release process pack a 16x16 version of every texture in the whole game into a single 0.5MB texture that's loaded at startup, if we want an immediate low-detail version. Might look okay for terrain and object textures, though I guess it would be uglier for large GUI textures.
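(Quick arithmetic on that figure: an uncompressed 16x16 RGBA thumbnail is 16*16*4 = 1 KB, so 0.5 MB holds about 500 of them; stored as DXT1 it's 128 bytes each, i.e. roughly 4000.)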

Textures aren't the whole problem, though - actors and meshes and animations take time to load (much more? much less? I don't have any measurements now), and it'd be nice to handle them similarly, and there isn't any quick low-detail version we could swap in for them. So we need to handle them with some combination of prefetching and render-delaying, and fancy placeholders just for textures probably wouldn't add much benefit.

Players are very likely to see almost all the terrain and all the units in the world, by the end of a game, so we'll have to load it eventually and design it to fit in their memory.
Disagree - only one of several terrain biomes would be active at a time [...]
By "world" I meant the currently loaded map, i.e. just the data that would be required if you zoomed out to see everything at once or if you scrolled across the whole map.
It seems a shame to have this nice centralized loader, only to waste it via an API that just has the renderer call SyncLoadTexture().
I agree ;). I think my current rough API might be asynchronous enough already, so hopefully it's okay and we can relatively easily change the implementation later if needed. Currently it's like:
// Prepare a texture (but don't do any loading yet - we might not ever want to render this texture):
CTextureProperties texture(pathname);
texture.SetWrap(GL_REPEAT);
texture.SetMaxAnisotropy(2.0f);
CTexturePtr m_Texture = g_Renderer.GetTextureManager().CreateTexture(texture);

...

// Use it in the renderer:
g_Renderer.GetTextureManager().LoadTexture(m_Texture, CTextureManager::PRI_HIGH);
m_Texture->Bind();
glDrawElements(...);

...

// Maybe we want to prefetch it earlier:
g_Renderer.GetTextureManager().LoadTexture(m_Texture, CTextureManager::PRI_LOW);

There are three load priorities:

IMMEDIATE = it really must be loaded now, so the bind will succeed and we can read pixel data out of it. (Should be used very rarely.)

HIGH = if there's a cached compressed version then load that immediately before returning; otherwise add to a compression queue, and use the black placeholder for Bind() for now.

LOW = add to a loading queue.

Then the main loop asks the texture manager to process some of its queues each frame, and it'll do some loading and compressing and update the Handle stored behind the CTexturePtr. The rendering code never accesses the Handle directly, so the texture manager can easily update the textures whenever it fancies (for async loading, hotloading, etc). The caller can't make any assumptions about synchrony (except for IMMEDIATE) so it doesn't restrict the implementation.
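For concreteness, the per-frame processing would look something like this (MakeProgress and everything inside it are names I've just made up to show the shape, not actual code):

// In the main loop, once per frame:
g_Renderer.GetTextureManager().MakeProgress();

// Inside the texture manager - a sketch only:
void CTextureManager::MakeProgress()
{
    double end = timer_Time() + 0.003; // spend at most a few ms per frame on texture work

    while (!m_HighPriorityQueue.empty() && timer_Time() < end)
    {
        CTexturePtr tex = m_HighPriorityQueue.front();
        m_HighPriorityQueue.pop_front();
        tex->SetHandle(LoadAndCompress(tex)); // the next Bind() picks up the real texture instead of the placeholder
    }

    // ...and any budget left over goes to the LOW (prefetch) queue.
}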

Is this the kind of thing you are suggesting, or are there serious problems with this?


I think then it would fetch just the assets from the selected factions?

Yep, that's correct ;) Just wanting to clarify what "all the terrain and all the units in the world" meant.

(Networking ought to be handled in a separate thread anyway, so that the communication latency isn't hurt by the framerate. I think that shouldn't be too hard to add to the current network code, except for some issues with needing to deserialize JS objects.)

That'd be nice.

Continue rendering as normal and just skip the SDL_GL_SwapBuffers() at the end, perhaps? (I have no idea if that would really work - sounds a bit dodgy to completely stop repainting the screen. Also it means the best case would be a 1 frame delay (perhaps 50ms), even if the texture would load in under 1ms (which it does with a hot disk cache).)

That's perfectly legit. Each Render() does glClear and renders into the back buffer. We can do that arbitrarily often; only when the buffers are swapped does it become visible.

Also, latency is entirely acceptable. Consider that some TFT monitors add a consistent 70 ms delay, and the network latency is probably 200..300 ms anyway.

Textures aren't the whole problem, though - actors and meshes and animations take time to load (much more? much less? I don't have any measurements now), and it'd be nice to handle them similarly, and there isn't any quick low-detail version we could swap in for them. So we need to handle them with some combination of prefetching and render-delaying, and fancy placeholders just for textures probably wouldn't add much benefit.

Quick unscientific measurement: DDS files are 265 MB, PMD 11.5 MB and the animation folder is 40 MB. There would be a large benefit to placeholder textures after all, and combining them into one large file is a really good idea. I was thinking about just loading one of the higher mipmap levels, but a texture atlas would avoid the disk seek/IO.

Your API sounds reasonable :) I'd be happier if it really were in a thread, but that's not going to be possible within the near future. As long as there is a queue and a spot in the main loop where updates are triggered, that should be easy to retrofit.

One thing that strikes me is the LoadTexture name - that is only fitting for the IMMEDIATE case. Seems better to provide several routines (RequestPrefetch, StartLoading, Load) instead of overloading the single entry point. Those names make clear what is/could be going on under the hood.
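Roughly this kind of split, in other words (just a sketch; the exact signatures are guesses):

// Hypothetical replacement for the single LoadTexture(texture, priority) entry point:
void RequestPrefetch(const CTexturePtr& t); // old PRI_LOW: queue a background load, placeholder until done
void StartLoading(const CTexturePtr& t);    // old PRI_HIGH: use the cache if possible, else queue + placeholder
void Load(const CTexturePtr& t);            // old IMMEDIATE: block until fully loaded (rarely needed)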


Also, latency is entirely acceptable. Consider that some TFT monitors add a consistent 70 ms delay, and the network latency is probably 200..300 ms anyway.
Uh, I thought the main point of doing this stuff asynchronously was to avoid variation in latency and keep a consistent framerate even when loading. Optimising just the worst case (dying hard disks etc), at the expense of the best/typical case, sounds counterproductive.

Maybe we could do both if it was multithreaded, though: LoadTexture can start a background load, wait up to 0.5 sec for a reply, then fall back to the placeholder texture if it takes longer than that, so it'll never freeze for long.
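Roughly like this, I mean (sketch only - Handle, LoadAndCompressTexture, GetPlaceholderHandle and m_PendingLoad are made-up names, and std::future is just the simplest way to illustrate the idea; the real thing would use whatever thread code we end up with):

#include <chrono>
#include <future>

Handle CTextureManager::LoadWithTimeout(const std::string& pathname)
{
    std::future<Handle> pending = std::async(std::launch::async,
        [pathname]() { return LoadAndCompressTexture(pathname); });

    if (pending.wait_for(std::chrono::milliseconds(500)) == std::future_status::ready)
        return pending.get(); // finished in time - use the real texture

    m_PendingLoad = std::move(pending); // keep polling this on later frames, swap the result in when ready
    return GetPlaceholderHandle();      // meanwhile keep rendering with the placeholder
}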

Quick unscientific measurement: DDS files are 265 MB, PMD 11.5 MB and the animation folder is 40 MB.
Hmm... Another quick measurement: with cold cache, loading and uploading terrain/model (not GUI) textures for the initial view on Arcadia takes 2 secs; CObjectEntry::BuildVariation (which I think loads prop actors and meshes and idle animations) cumulatively takes 8 secs. (With hot cache it's about 0.1 secs / 0.2 secs).
One thing that strikes me is the LoadTexture name - that is only fitting for the IMMEDIATE case. Seems better to provide several routines (RequestPrefetch, StartLoading, Load) instead of overloading the single entry point. Those names make clear what is/could be going on under the hood.
Makes sense. (Also I should probably give each texture a pointer to the texture manager, so I can just call m_Texture->Load(). And then make Bind() implicitly do the StartLoading thing, so callers don't have to remember to load it first.)

Uh, I thought the main point of doing this stuff asynchronously was to avoid variation in latency and keep a consistent framerate even when loading. Optimising just the worst case (dying hard disks etc), at the expense of the best/typical case, sounds counterproductive.

Vastly improving the worst case at the expense of a slight slowdown for the average case is a good engineering principle IMO - consider Quicksort vs. Mergesort.

To elaborate: I think it's acceptable to drop a frame now and then if that is the price of avoiding potentially unbounded freezing of the main loop while converting the input files (e.g. in the case of modders).

Maybe we could do both if it was multithreaded, though: LoadTexture can start a background load, wait up to 0.5 sec for a reply, then fall back to the placeholder texture if it takes longer than that, so it'll never freeze for long.

Yep, sounds good. Win32 async IO does the same (but I would shorten the timeout to say 150 ms to ensure responsiveness).

Hmm... Another quick measurement: with cold cache, loading and uploading terrain/model (not GUI) textures for the initial view on Arcadia takes 2 secs; CObjectEntry::BuildVariation (which I think loads prop actors and meshes and idle animations) cumulatively takes 8 secs. (With hot cache it's about 0.1 secs / 0.2 secs).

Good to know! That is surprisingly long. However, rather than invalidating the case for placeholders, I would say that's an additional incentive for optimizing BuildVariation and thinking about similar 'placeholders' there (e.g. reducing the amount of variations until later in the game).

Also I should probably give each texture a pointer to the texture manager, so I can just call m_Texture->Load().

Yep, that's good, would also avoid shutdown issues when the texture manager gets wiped out before the last texture is destroyed.

And then make Bind() implicitly do the StartLoading thing, so callers don't have to remember to load it first.)

Ugh, that sounds surprising (i.e. bad) and could cause unnoticed weird behavior (e.g. if callers actually wanted sync loads, or only wanted prefetching+placeholder). Why not just assert() that callers have called one of the three APIs before Bind()?


(Hmm, compression is irritatingly slow - I'll probably see if I can move just that part into a thread for now (but not any of the IO code or anything), so the game stays playable while it's busy compressing (e.g. when you change the compression settings file and it hotloads all the textures).)

And then make Bind() implicitly do the StartLoading thing, so callers don't have to remember to load it first.)

Ugh, that sounds surprising (i.e. bad) and could cause unnoticed weird behavior (e.g. if callers actually wanted sync loads, or only wanted prefetching+placeholder). Why not just assert() that callers have called one of the three APIs before Bind()?

It doesn't seem that surprising to me - these are load-on-demand texture objects, and Bind() is the demand, so it should load them. Requiring an explicit load call would add (a small amount of) complexity to the renderer logic, and assert failures wouldn't be trivial to debug (they might occur in rarely-used codepaths, and if a texture is shared by two different systems then it might depend randomly on the order in which those systems run), so it doesn't sound like an improvement.

(Hmm, compression is irritatingly slow - I'll probably see if I can move just that part into a thread for now (but not any of the IO code or anything), so the game stays playable while it's busy compressing (e.g. when you change the compression settings file and it hotloads all the textures).)

Roger that. It'll be a while before we can reliably do threaded IO, anyway.

It doesn't seem that surprising to me - these are load-on-demand texture objects, and Bind() is the demand, so it should load them.

The thing is, which of these three methods would Bind call, if it wasn't done already?

assert failures wouldn't be trivial to debug (they might occur in rarely-used codepaths, and if a texture is shared by two different systems then it might depend randomly on the order in which those systems run), so it doesn't sound like an improvement.

At least you get a warning of incorrect use. If someone accidentally forgets their call to Load or RequestPrefetch, then they get different behavior (with different, and probably undesirable, performance characteristics), and you wouldn't even know that anything is wrong (try debugging "why is the texture loader so laggy?", or "why is it doing so much unnecessary work?").


(Threaded compression is fun - everything is nice and smooth (at least on dual core) and the textures slowly but magically pop into existence. Writing threadsafe code is hard, though - I don't think I can even use WriteBuffer since it can call io_Allocate which can use the global AllocatorChecker, which is unsafe by itself and also calls debug_assert which is unsafe. But I think in this case I can allocate the buffer in advance so it's not too bad.)
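The allocate-in-advance trick looks roughly like this - every name here is illustrative, not the actual code:

// All allocation happens on the main thread; the worker only writes into memory that already exists.
struct CompressionJob
{
    CTextureProperties props;
    std::vector<u8> output; // sized generously in advance
};

// Main thread, when queueing a texture for compression:
CompressionJob* job = new CompressionJob;
job->props = textureProps;
job->output.resize(EstimateCompressedSize(textureProps)); // the only allocation
m_CompressionQueue.Push(job);

// Worker thread: reads the input, compresses into the preallocated buffer,
// and never touches the non-threadsafe allocator / debug_assert paths:
void RunCompressionJob(CompressionJob* job)
{
    CompressWithNVTT(job->props, job->output.data(), job->output.size());
    m_CompletedQueue.Push(job); // the main thread uploads it and writes the cache file
}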

The thing is, which of these three methods would Bind call, if it wasn't done already?
Sync loads are needed about once in the whole engine, so they're not very interesting. Prefetching won't call Bind (since the whole point of prefetching is to load textures before the renderer first needs to bind them and render them). So Bind will always be associated with the load-synchronously-unless-it's-going-to-take-too-long behaviour.
At least you get a warning of incorrect use.
Not reliably - e.g. in the sync loading case of Atlas wanting to draw terrain previews, the same terrain textures may or may not have already been loaded by the normal renderer or the prefetcher, so it may or may not give the warning. In the prefetch case, the renderer can't assume the prefetcher is perfect, and it'll want to boost the priority of the texture it's trying to render anyway, so it'll always have to call the TryToLoadNowIfItsNotTooMuchBother() function before Bind(), and we wouldn't get warnings about forgotten prefetching.
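So Bind() would boil down to roughly this (sketch - the real code goes through the Handle rather than a raw GL id):

void CTexture::Bind(size_t unit)
{
    // "Load synchronously unless it's going to take too long": use the cached
    // converted file if it exists, otherwise queue the conversion and leave the
    // placeholder in place for this frame.
    m_TextureManager->TryToLoadNowIfItsNotTooMuchBother(this);

    glActiveTexture(GL_TEXTURE0 + unit);
    glBindTexture(GL_TEXTURE_2D, m_Handle); // whichever is current: real texture or placeholder
}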

(Threaded compression is fun - everything is nice and smooth (at least on dual core) and the textures slowly but magically pop into existence.

heh, sounds nice ;)

I don't think I can even use WriteBuffer since it can call io_Allocate which can use the global AllocatorChecker, which is unsafe by itself and also calls debug_assert which is unsafe.

yep, there's all sorts of stuff to check - but that's also something I can do at work later this year, since the proportion of parallel sections / threaded code is always increasing.

Sync loads are needed about once in the whole engine, so they're not very interesting. Prefetching won't call Bind (since the whole point of prefetching is to load textures before the renderer first needs to bind them and render them). So Bind will always be associated with the load-synchronously-unless-it's-going-to-take-too-long behaviour.

OK, makes sense. As long as that is mentioned in the doxygen, it should be fine :)

Looks like the design here is pretty much hashed out, which is good because I won't have any time tomorrow and will fly out on Thursday.


Thanks for discussing this - I think I'm getting closer to understanding it now ;)

I'm setting it up to convert all textures into DDS, even when the original is DDS - that's useful since the converter can regenerate mipmaps, making sure they always exist (so we don't need any runtime mipmap generation) and using sharper filtering and gamma correction. NVTT requires BGRA input so I'm just using our existing texture loading and transform code. But after some time I noticed that the converted textures were very slightly greener than the originals - it turns out that our S3TC decoder's unpack_to_8 is bogus, but unfortunately not bogus enough that anybody noticed the graphical problems before now :). I think I've fixed that (and added some basic tests), so the output seems pixel-perfect now. The nice thing is that texture-quality bugs shouldn't be a major concern - we can just change the code or compression settings and it'll automatically reconvert everything with no hassle.
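For reference, the usual way to expand DXT's 5/6-bit channels back to 8 bits is to replicate the top bits into the bottom bits, rather than just shifting (a plain shift tops out at 248 for the 5-bit red/blue channels and 252 for the 6-bit green channel, which leaves bright colours slightly unbalanced). A sketch of that expansion - not necessarily the exact code that got committed:

// Expand a 5-bit channel value (0..31) to 8 bits: 31 -> 255, not 248.
u8 unpack5_to_8(u8 v) { return (v << 3) | (v >> 2); }

// Expand the 6-bit green channel (0..63) to 8 bits: 63 -> 255, not 252.
u8 unpack6_to_8(u8 v) { return (v << 2) | (v >> 4); }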


I've committed a hopefully-working version of this now. If people want to check some of the changes to find problems or design errors, please do ;)

The first time you run the game, everything will look grey while it loads and converts the textures. They'll get cached so subsequent loads should be instant. The cache is stored in %appdata%\0ad\cache\ on Windows, and ~/.cache/0ad/ on Linux - you can delete it if you want to free some space (it shouldn't get more than a few dozen megabytes) or watch the exciting texture-conversion process again.

Only SVN users and modders should see this conversion process - players of releases will be given pre-converted textures, at least once I've got around to implementing that (before the next alpha release).

There's some documentation on the wiki for artists/modders about how to use this system.


GIMP seems to do the same for alpha channels in DDS and PNG and TGA, so the choice doesn't make a difference there. (Looks like it handles channels automatically, or you can use the Layers thing to add a Layer Mask based on the alpha channel and then delete the alpha channel, which lets you easily view and edit the colour/alpha independently.)

It's still important to support Photoshop too - if that SuperPNG plugin works then maybe that's okay, otherwise we could maybe do something like store PNGs in SVN (since they're smaller and should work for most modders) and have a tool to convert to/from TGA for use with Photoshop (assuming it handles TGA better than PNG).


Yeah, I didn't realise it'd be a problem, but people say "It's frustrating because back around version 5.0 or 5.5 Photoshop would actually save the full channel on PNGs. Adobe changed the way they handled them for subsequent versions of the software." and everything I've seen agrees with that. I've not seen any explanation of why it was changed.


It already handles both (as long as you disable RLE compression of TGAs), so there's no technical restriction as far as the engine is concerned. It's more of a logistical problem - it's good to stick consistently with one format, it's good to reduce SVN disk space usage, it's good to be compatible with all significant paint programs, etc. PNG seems to match all of those requirements, except for compatibility with Photoshop ;)

(Did you see the link I had to the SuperPNG plugin? If that works then hopefully there wouldn't be any other problems with PNG.)


I'm not sure how complicated this would be relative to the benefit - I'll let you decide if you haven't already considered it. I would recommend automatically compressing to different dxt types based on the folders the textures reside in.

dxt1 - Ground textures (except water - that is something special if I recall correctly (rgb are ordered differently, or something like that))

Could save a little bit of file space because you don't need alpha channels with ground textures

lossless dxt - UI and Skybox

Use the lossless dxt format, or don't convert them to dxt textures (use .png or .tga files). These elements look incredibly bad if you introduce artifacts.

dxt3 or dxt5 - Everything else

Intelligent ar@#$% could select the most appropriate for a particular instance, but if it's all automatic it doesn't matter, I suppose.
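Something like this is what I mean - a sketch, with the folder paths and names just as examples:

#include <string>

enum Format { FMT_DXT1, FMT_DXT5, FMT_UNCOMPRESSED };

// Pick the compression format from where the texture lives:
Format ChooseFormat(const std::string& path)
{
    if (path.find("art/textures/terrain/") == 0)
        return FMT_DXT1;         // ground textures don't need an alpha channel
    if (path.find("art/textures/ui/") == 0 || path.find("art/textures/skies/") == 0)
        return FMT_UNCOMPRESSED; // artifacts are too visible on UI and skyboxes
    return FMT_DXT5;             // everything else (keeps alpha)
}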

