So I guess this report will be pretty epic. I've been working all night on XMB file loading and optimization, mostly to greatly improve loading speeds. But I digress; here's my report for last week.

Week #3 19.08 - 25.08
19.08 1100-1700 (6) - Debugging and bugfixes on megapatch. Huge breakthrough.
20.08 1600-0200 (10) - Debugging shader issues.
21.08 1200-1900 (7) - ShaderProgram variations reduction. ModelRenderer texture bind bug solved!
22.08 2100-0500 (8) - Windows Stacktrace failure debugging.
23.08 1000-1200 (2) - Alpha sorting removed.
25.08 1400-0500 (15) - Fundraiser footage. Megapatch bugfixes. UTF conversion optimization.

Of the 48 hours in total, most went into debugging, but it finally paid off. The patch is now stable on Linux and OSX, which means it's ready to be committed after the A14 release. At the end of the week I took some extra time to improve UTF conversion performance (since we're doing a lot of it) and also grabbed some footage for the fundraiser.

1. Debugging breakthrough
What's the issue?: Well, until recently the patch crashed on Linux and OSX, while on Windows the game ran fine. It was a really frustrating issue, since I couldn't debug the crash at all; I could only hope to fix whatever bugs the changes to the shader definitions system had caused. Funnily enough, the failure was simply due to incorrect hashes of CShaderDefines.
End result: We can now deploy the patch after A14 is released and start ironing out any bugs that pop up.

2. ShaderProgram Variations
What's the issue?: For each rendering feature, such as Shadows, Specular or Normals, a combination of ShaderDefines is formed, and for each unique combination a new shader is compiled. This is very inefficient. When running 0AD in an OpenGL debugger, I noticed that the number of shader programs generated totaled around 300. Each shader compilation takes a pretty long time during loading, so generating over 300 shaders from just a few sources sounds like a high crime.
The biggest problem is the batch sorting done prior to rendering models: the more shaders there are, the less efficient rendering becomes, due to constant resource binding and unbinding. Batching is also inefficient, resulting in more texture state changes than are actually needed.

My solution was to implement a second layer of caching inside CShaderProgram itself and hash every input shader. This lets me check whether the current source code has already been compiled and, if so, retrieve a reference-counted handle to the existing shader program. This works really well and reduced the number of shader programs from around 300 to around 120. To improve this further, we could use fewer shader defines: the fewer the variations, the fewer shaders get compiled.
End result: The annoying load time at the end of the loading bar was cut in half and is hardly noticeable now.

3. Windows Stacktrace failure
What's the issue?: Several error reports on Windows fail to generate a proper stacktrace, and usually another error occurs while generating the error message. This was actually pretty hard to debug. On VS2008 the issue was somewhat improved by the /Oy- flag, which disables frame-pointer omission and so forces the use of frame pointers. On VS2012, disabling Full Program Optimization generally gave better results. Still, a lot of cases failed and no stacktrace was generated at all. Apparently, if the top-level function is inlined, WinDbg.dll is unable to resolve the function reference. In that case the only fix was to change the stacktrace behaviour to simply display all functions and skip any filtering of the callstack. This at least gives some kind of stacktrace, which is better than nothing.
End result: Error reports on Windows can now be expected to always include a stacktrace.

4. Alpha sorting
What's the issue?: A noticeable amount of rendering time is spent sorting transparent models, so improving this is essential for better rendering performance.
Even though I spent the least time on this issue, it probably had the biggest FPS impact on the renderer. The renderer distance-sorted all transparent models prior to rendering, which led to some pretty complex batching before rendering. This took almost half of the rendering time by itself, and it's pretty pointless, because OpenGL uses a Z-buffer which, in combination with proper alpha testing, gives perfect results. Since 0AD already uses this functionality, all I had to do was remove <sort_by_distance/> and any code related to distance sorting in the ModelRenderer.
End result: Visually no difference. About a 33% performance gain (depending on the number of trees): 50 fps -> 70 fps.

5. UTF Conversion
What's the issue?: There is a lot of string conversion going back and forth in the 0AD engine: UTF8, UTF16, UCS-2 and UTF32 strings are all in use and constantly converted from one type to another. My first goal was to reduce the number of conversions, but that's a really huge change. The next best thing I could do was streamline the UTF8 conversion code:
1) Added UTF8 -> UTF16 and UTF16 -> UTF8 conversion for faster SpiderMonkey interaction.
2) Added a special case for UCS-2 <-> WCHAR_T on Windows, resulting in faster conversion on Windows.
3) Improved, optimized and streamlined the code so UTF conversion is much faster than before.
Beyond the speedup, these changes are meant to enable a gradual move from WCHAR_T strings (UCS-2 on Windows and UTF32 on Linux) to simple UTF8 strings. A lot of code uses WCHAR_T strings even though there is no real need for it; the only code that needs to deal with UCS-2 strings is the Windows CreateFileW call, which is rarely used.
End result: Fewer string conversions and faster UTF8/UTF16/UTF32 string conversion.

To end Week #3: I still didn't manage to do any patch reviewing, so I'll /have/ to do it first thing tomorrow (otherwise I'll procrastinate again and work on some awesome module instead).
I think it was an excellent week nevertheless: I was able to squash the annoying runtime bugs thanks to everyone on IRC who helped me test the patch. Since I finally got my 8GB of RAM, I can dedicate a day to memory performance comparisons.
-----------------------------------------
This is my current TaskList:
-) Patch review
-) Performance Improvement Fancy Graphs
-) PSA and PMD to Collada converter
-) Migrate to C++11
-) Collada -> PMD cache optimization to improve first-time cache loading