Leaderboard

Popular Content

Showing content with the highest reputation on 2013-09-02 in all areas

  1. This is my activity status report thread for August 2013. There's a pretty large gap since the previous update, and I only really started working once all the legal issues were finally taken care of. As a result, the progress report starts from the middle of the week and isn't anything really impressive.

Week #2 12.08 - 18.08
14.08 1900-0600 (11) - Debugging on Ubuntu: OpenGL and GLSL. TextRenderer drawbatch optimization.
15.08 1600-0200 (10) - Debugging on Ubuntu: no success. TerrainRenderer optimization.
16.08 1300-0500 (16) - TerrainRenderer optimization. Removed FFP. ShaderProgram optimization.
17.08 0000-0600 (6) - Debugging, first breakthroughs.

Of the total 43 hours, most went into debugging, though I was able to squeeze in some improvements that seemed to make enough of a difference. I think the most disappointing aspect is the ModelRenderer, which is pretty much fully back to its original form - solely because of debugging the crash on Linux. Hopefully I can get around to changing it back, to recover the improvements that ModelRenderer previously had.

1. Debugging on Ubuntu
What's the issue?: The new patch causes a mysterious crash on Linux and OSX.
I decided to pick up a copy of Ubuntu and do some debugging. Even though I got some relevant data out of this, it wasn't a success - the game ran fine on my x86 Ubuntu. So the problem is most likely related to some size_t changes between x86/x64 - Josh is running x64 Ubuntu and he experiences the crash. This is actually a relevant breakthrough and has pointed me towards the possible causes. I expect to finally find the reason in the coming week.

2. TextRenderer DrawCalls
What are Text DrawCalls?: Text DrawCalls are clumps of text that contain the text color, style and font.
I didn't really intend to do anything about TextRenderer, but while debugging on Linux, I noticed that OpenGL keeps rendering text one word at a time.
This is an extremely inefficient way to do it, so I improved the DrawCall batching to clump similar lines of text together.
DrawCall batches before: { "This ", "is ", "an ", "example." }
DrawCall batches after: { "This is an example." }
The rendering speedup depends on the actual amount of text; it noticeably raised the FPS on text-heavy pages. I don't know how much justification there is for optimizing text rendering, but one thing is certain: we will have to migrate to something better, like FreeType fonts and true OpenGL VertexBuffered rendering. This would not only reduce the amount of code we have, but it would also speed the engine up. We could remove a lot of redundant/duplicate code and rely solely on graphics card based buffers - which is the modern way to program in OpenGL. For now, I'll leave it be.

3. TerrainRenderer
What is TerrainRenderer?: TerrainRenderer obviously renders the in-game terrain. It works in 3 stages: 1) base tiles 2) blends between tiles 3) decals on the terrain.
What are Patches?: Patches are 16x16 tiles of the terrain. This is the main unit used in frustum culling and rendering.
So this is actually a pretty important part of the engine. And not that surprisingly, it's also quite difficult to optimize. The previous implementation used a memory pool for its allocations and relied on a rather complex 3-dimensional mapped sorting algorithm. Since I was already hard pressed for time, I couldn't manage a complete redo of the rendering algorithm itself, but I was able to improve the pre-sorting and batching of Terrain Patches. Instead of relying on a memory pool and some complex usage of 3 dimensions of std::map, I wrote a simple and very straightforward structure that does the bare minimum and takes into account the hardcoded limits on the number of patches and the number of blends per patch. Since Terrain rendering actually takes a lot of the renderer time, this change was pretty noticeable.
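As a rough illustration of the "fixed-capacity structure instead of nested std::map" idea described above, here is a minimal sketch. It is not the code from the patch: the names (PatchBatch, PatchBatchList), the limit values and the choice to sort by base texture are all assumptions made for the example.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical limits, for illustration only; the real engine values are not stated in the post.
constexpr std::size_t kMaxPatches = 256;
constexpr std::size_t kMaxBlendsPerPatch = 8;

// One batch entry: a patch, its base texture and a bounded list of blend textures.
struct PatchBatch
{
    std::uint32_t patchId = 0;
    std::uint32_t baseTexture = 0;
    std::array<std::uint32_t, kMaxBlendsPerPatch> blends{};
    std::size_t blendCount = 0;

    // Returns false instead of allocating when the hardcoded limit is reached.
    bool AddBlend(std::uint32_t texture)
    {
        if (blendCount == kMaxBlendsPerPatch)
            return false;
        blends[blendCount++] = texture;
        return true;
    }
};

// Fixed-capacity list of batches: no memory pool, no nested std::map.
class PatchBatchList
{
public:
    // Returns nullptr when the list is full.
    // The returned pointer refers to a slot; its contents change after sorting.
    PatchBatch* Add(std::uint32_t patchId, std::uint32_t baseTexture)
    {
        if (m_Count == kMaxPatches)
            return nullptr;
        PatchBatch& batch = m_Items[m_Count++];
        batch = PatchBatch{};
        batch.patchId = patchId;
        batch.baseTexture = baseTexture;
        return &batch;
    }

    // Sort so that patches sharing a base texture end up next to each other.
    void SortByBaseTexture()
    {
        std::sort(m_Items.begin(), m_Items.begin() + m_Count,
            [](const PatchBatch& a, const PatchBatch& b) { return a.baseTexture < b.baseTexture; });
    }

    std::size_t Size() const { return m_Count; }

    const PatchBatch& At(std::size_t index) const
    {
        assert(index < m_Count);
        return m_Items[index];
    }

    // Reuse the same storage every frame.
    void Clear() { m_Count = 0; }

private:
    std::array<PatchBatch, kMaxPatches> m_Items{};
    std::size_t m_Count = 0;
};
```

The point of the sketch is only that, with hard upper bounds known in advance, all storage can live in one contiguous array that is cleared and refilled each frame, so sorting and iteration touch no heap allocations.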
In both Debug and Release builds I experienced roughly 33% performance improvement. Where the batching and sorting used to take about half of the Terrain rendering time, the sorting is now rather insignificant compared to the actual rendering. I'll use timing data from Profiler2 for comparison later.
Before: --todo--
After: --todo--

4. Removed FFP
What is FFP?: The Fixed Function Pipeline (or FFP) was a module of 0AD that emulated shader support. It was a really nasty hack to support ancient PCs with no GPUs or no Shader support.
I think it's best to say this is the biggest change of this week. It took a lot of careful editing of the code to make sure it works properly. Patience paid off and I was able to remove a lot of complexity from the whole shader pipeline. It's difficult to measure the performance improvement this gives, but it's safe to say that it's actually quite negligible. The main gain is that we have a lot less code to maintain. Previously quite a lot of optimizations were out of the question due to FFP being in the way. Now that it's removed, we will be able to slowly move on to a cleaner, more maintainable and, of course, faster Shader system.

5. ShaderProgram optimizations
What is ShaderProgram?: This is an internal class that abstracts away the GPU's programmable shaders and automatically prepares the OpenGL shader program for rendering.
This was actually a really tiny change in the way that OpenGL shader uniform and attribute bindings are stored, but it's necessary to make way for a more efficient binding system in the future. I intend to move away from the current string based binding lookups and replace them completely with constant values. There are two ways we could go about this:

#1 (less preferred): Preassign binding locations for the shader input stage. For example, attribute location 0 will always be for Vertex data. Attribute location 1 always for UV0 set, etc.
This is somewhat tedious, since you'll have to explicitly declare an attribute location in the shader:

layout (location = 0) in vec3 a_vertex; // our vertex attribute in the shader

Its name can be anything, really. All we care about is binding location 0 in the layout. Vertex data would always be sent to that location (of course, only if there is an attribute with that binding location, otherwise vertex data would not be uploaded).

#2 (preferred): Variable names in the shader program have explicit meaning behind them. For example, a_vertex would always be for vertex data. If you fail to declare a variable with the name a_vertex, your shader won't receive any vertex data. This suits shader development well - we enforce consistent variable naming this way and we can remove a lot of superfluous data in the shader XMLs (probably even negate the need for shader XMLs for non-effects). In the shader it would look nice and clean:

attribute vec3 a_vertex; // our vertex attribute in the shader (GLSL 1.20 syntax)

Having explored the two possible ways to go about it, it's pretty much obvious that #2 would be the way to go. This would allow us to seriously streamline shader compilation, the shader pipeline and attribute / uniform binding during the shader's input layout stage. The most obvious reason why I would go this way is that a very large number of shader variable names have already been hardcoded into the engine by the previous developers. Since we probably won't be redesigning all shader files, the #2 option would leave shader files as they are (with some changes to variable names).

Here is the list of currently hardcoded shader variables and definitions:

It is obvious that the current shader system is far from anything truly moddable. However, if we document all the "hardcoded" uniform names, such as u_TextureTransform, other people can program their own shaders without much hassle.
We can also finally throw away ShaderDefStr, which is a boon for performance, and resolve all the definition names at compile time. We would have something like this:

#include <string>

enum ShaderAttributes
{
    a_Vertex = 0, // vertex position attribute
    a_Color,      // vertex color attribute
};

// Index in this array must match the enum value above.
static const std::string mapping[] = { "a_Vertex", "a_Color", };

// ARRAY_SIZE is assumed to be an array-length macro.
int GetShaderVariableID(const std::string& var) // for shader compilation stage
{
    for (int i = 0; i < (int)ARRAY_SIZE(mapping); ++i)
    {
        if (var == mapping[i])
            return i;
    }
    return -1; // mapping does not exist for this variable
}

const std::string& GetShaderString(int id) // for shader compilation stage; id must be a valid index
{
    return mapping[id];
}

This naive implementation would map any incoming strings from the shader file to internal indices that match the appropriate enums, like the ShaderAttributes enum. Since the actual number of variables isn't that big, we can get away with a simple loop. With so few entries, a simple loop over a contiguous array is cache-friendly and typically faster than a std::map lookup. I'll stop my tedious explanation of "what's to change" and leave it here.

6. GLSL Compilation optimization
What is GLSL?: Currently we support 2 types of shaders: 1) Old ARB shaders that are fast; 2) New GLSL shaders that provide all sorts of fancy effects.
All modern cards (since 2004?) support GLSL - the only issue is our own unoptimized GLSL shader code, which also has some bugs on certain systems with certain drivers. However, migrating completely to GLSL is pretty much the only way to go. ARB shaders are completely unmaintained and, obviously enough, GLSL is much easier to write than ARB assembly. GLSL also supports a large variety of extra features such as C-like loops, control statements, macro definitions and functions. It pretty much looks like C without its standard library. Also, if we ever wish to support Android systems, we will need solid GLSL support.
With all this in mind, I explicitly changed how the current GLSL compilation preprocessing is done - leaving most of the work to the driver (even if that's more inefficient) by sending 3 separate sources:
1) GLSL version - defaults to #version 120 (OpenGL 2.1, minimum required to use GLSL with our shaders right now)
2) Shader defines - All #defines set for this shader compilation stage
3) Shader source - The actual shader source
The OpenGL driver will concatenate all 3 sources into a single source and will take care of the preprocessing. This greatly reduces code on our side and also allows us to reduce the complexity and overhead of the ShaderDefines structure. The changes I've made lay some groundwork for future changes to the GLSL shader system.

To end Week #2: Currently I'm mostly working on debugging the patch to get it working on Linux. As soon as A14 is released, I'd like to commit this large patch to avoid any further SVN conflicts along the road. The patch is gigantic already. You can check it out here: http://trac.wildfire...com/ticket/1995
No statistics this time around, though the numbers are obviously a lot higher than before.

-----------------------------------------
This is my current TaskList:
-) Patch review
-) Megapatch Debugging
-) ModelRenderer WIP
-) Performance Improvement Fancy Graphs
-) PSA and PMD to collada converter.
-) Migrate to C++11
-) Collada -> PMD cache opt to improve first-time cache loading.
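For readers unfamiliar with the "3 separate sources" approach from section 6, here is a minimal sketch of how the three strings could be assembled. It is an illustration under assumptions, not the patch's code: ShaderSourceParts and BuildShaderSourceParts are hypothetical names, and the define list is modelled as simple name/value pairs. The only real OpenGL call involved, glShaderSource, is shown in a comment so the sketch compiles without a GL context.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for the three sources handed to the driver.
struct ShaderSourceParts
{
    std::string version; // 1) "#version N" line, must come first
    std::string defines; // 2) one "#define NAME VALUE" line per define
    std::string body;    // 3) the unmodified shader source
};

// Builds the three strings; no preprocessing of the shader body is done on our side.
ShaderSourceParts BuildShaderSourceParts(
    int glslVersion,
    const std::vector<std::pair<std::string, std::string>>& defines,
    const std::string& body)
{
    assert(glslVersion >= 110);

    ShaderSourceParts parts;
    parts.version = "#version " + std::to_string(glslVersion) + "\n";
    for (const auto& define : defines)
        parts.defines += "#define " + define.first + " " + define.second + "\n";
    parts.body = body;
    return parts;
}

// With a live OpenGL context and a shader object `shader`, the parts would be
// passed as an array of three strings, which the driver treats as one source:
//
//   const char* sources[3] = {
//       parts.version.c_str(), parts.defines.c_str(), parts.body.c_str() };
//   glShaderSource(shader, 3, sources, nullptr);
//   glCompileShader(shader);
```

Passing nullptr as the last argument tells glShaderSource that each string is null-terminated, so no length array is needed.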
    1 point
  2. I'm pretty sure this is always the same old issue.
    1 point
  3. It could be that you, the player, representing the government of the faction you're playing as, are more correctly endorsing a particular deity, the act of which then inspires the people of that civilization to do what they do better (religious fervor, really). For example:
Player: We the Macedonians live by the sea! We intimately associate ourselves with the ocean! Therefore, we shall venerate Poseidon, god of the seas!
Sailor: He's right! Poseidon must be watching over us! Step lively, men! We have nothing to fear out on the sea with Lord Poseidon watching over us!
    1 point
  4. For the Horse, yes, I have ideas to experiment with; it's going to be rather fun.
    1 point
  5. Yes, it is fairly strong for a beginner, but I can assure you it doesn't spawn units. Check A14 again: the difficulty settings should be better (i.e. easy should be easier) and there will be a new sandbox mode where the AI won't attack.
    1 point
  6. The Tomb of Cyrus the Great would be a good Atlas building:
    1 point
  7. I definitely think the Gate of All Nations should be the Persian Wonder. We'll keep the Hanging Gardens of Babylon for either Atlas purposes or some kind of "capture the monument" gametype. http://www.persepoli...all_nations.htm Stairs: Front Gate: Back Gate:
    1 point
  8. The following profiling is done on 2 separate builds of pyrogenesis in release mode. Both have exactly the same optimizations applied and are built on VC++2012. I'll refer to these versions as:
1) A14 Release - The SVN version of A14, r13791
2) A15 Dev - The megapatch on r13791

First we'll test everything at a "visual" glance. This means I don't use any profiling tools; we only monitor the FPS and "how it feels". Both of these tests will be run with cache fully loaded on Cycladic Archipelago 6. Once that is done, we can compare memory peak usage, memory allocation histograms and loading times in general.

The testing system is: Windows 7 x64, Intel i7-720QM 1.6GHz, 8GB DDR3, Radeon HD 5650 1GB, Force GS 240GB SSD
Game settings: Most fancy options set to high, Postprocessing disabled, Windowed

1. First Glance @ 1280x720

-) A14 Release
This is the version that will be packaged and tested before release in the next couple of days. We've been working hard on optimizations, but most of these never made it to A14. This will give us a fair comparison of how big a performance gain we'll be looking at.

The menu is a good place to test the core speed of the engine. Very fast engines usually get over 1000 fps. A14 gets ~480 fps, which is not that bad at all considering we run a very complex scripting engine behind the scenes.

To further test general game speed, let's enter the Match Setup chatroom. At first it starts pretty strong at ~300 fps. But once more and more text piles up, the FPS drops to a meager ~50-60 fps! This is because text rendering is still extremely inefficient in A14.

--------------
Now let's load Cycladic Archipelago 6. It's very hard to profile loading times, because I have a 550 MB/s SSD. The loading was fast, around 6 seconds, though it stuck at 100% for half of that. That last stretch at 100% is where all the shaders get loaded. I get a fairly steady ~46 fps in the initial screen.
Zooming in, the FPS obviously increases to ~58, because there is less stuff to render. Once we zoom out with a revealed map, the fps drops to ~40.

-------------------
-) A14 Release summary: The chatroom showed how big a bottleneck the current GUI can be; it's not very efficient. With a revealed map I get 40 fps, which is a bit low, considering my system can play Mass Effect 3 in 1080p with the same fps.

-) A15 Dev
This one has about 2 months' worth of optimizations put into it. I used to think that I would achieve more with such a long period of time, but despite my previous experience, working on pyrogenesis has been different. Mostly because it's cross-platform, which restricts many of the optimization options available to the programmer. Secondly because code that worked and ran fine on Windows often didn't work at all on Linux. This meant a few weeks of coding was lost and had to be reverted.

The patch adds 7376 lines and removes 5507 lines of code. It has also gained the nickname "megapatch", due to how big the SVN .patch file is (~1 MB).

The menu in the patched version runs at ~630 fps, so at first glance at least something appears to have improved.

Now let's check how the Match Setup chatroom fares on A15 Dev. About ~300 fps, just like before. Looks like there's some other bottleneck in the code, but then again 300 fps is more than enough. What happens if we spam a few hundred lines of text at it? Only a slight drop to ~280 fps, which is a lot better than before. It means long lobby times won't hurt the game in A15.

--------------
Now let's load Cycladic Archipelago 6. The loading is slightly faster, seems like 4 seconds. Again half of that is spent at 100%. However, this time it's faster because A15 Dev optimizes shader compilation, reducing the number of shaders compiled from ~300 to ~130.

The initial look shows us ~61 fps, which is roughly 33% faster than A14. It's far less of an improvement than expected though. I'm slightly dismayed at that.
If we zoom in, we see a similar improvement of about +25% at ~73 fps. And with reveal map and zoomed out we get ~51 fps, which is about +27%.

--------------
-) A15 Dev Summary: I'm a bit disappointed. After all the optimizations, I expected much better results. However, it's nice to see that the textrenderer optimizations paid off. The loading time of 4s is already fast enough for me, so I can't complain. Also, the general improvement of about +25% fps is enough to make the game much more playable. I think the best improvement is the new input system - it's much smoother than the previous one, so it just "feels" faster, even though it's not very much so.

This is the end of the first glance, which is part 1 of the profiling session. The next part will show some memory usage data.
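For anyone who wants to check the percentages quoted above, the arithmetic can be sketched as follows. PercentGain and FrameTimeMs are hypothetical helper names, not engine functions; the FPS values fed into them are the approximate readings from the post.

```cpp
#include <cassert>

// Relative FPS gain in percent: (after - before) / before * 100.
double PercentGain(double fpsBefore, double fpsAfter)
{
    assert(fpsBefore > 0.0);
    return (fpsAfter - fpsBefore) / fpsBefore * 100.0;
}

// Time spent per frame in milliseconds for a given frame rate.
double FrameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

With the readings from the post, PercentGain(46, 61) is about 32.6, PercentGain(58, 73) about 25.9 and PercentGain(40, 51) is 27.5. In frame-time terms, going from 46 to 61 fps corresponds to roughly 21.7 ms per frame dropping to roughly 16.4 ms.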
    1 point
  9. Hi all, I've spent the day improving my Stonehenge model a lot, by making a more realistically scaled texture (still 1024px wide) and by making the profile of each stone random, as it should be for a megalithic structure. It was a lot of work to get this all looking real, but I'm, well, very happy with the way it looks now. And as always, I've had some fun with warm light atmospheres. For the next post, I'll rescale the model to the size of the wonders. For what comes after that, I'm going to need some help: how to test it in the game and how to deal with the texture skin file, shadow map, etc.
    1 point