Everything posted by matthewlai

  1. I don't really see anything in your post that supports the position of not using multithreading. Yes, obviously doing more efficient computations is better than doing less efficient computations faster, but given the same level of optimisation, a well-designed multi-threaded CPU-bound program will almost always be faster (by how much depends on Amdahl's Law). In most architectures the cores have some levels of independent caches, so increasing the number of threads also increases the effective memory bandwidth (for things that are in cache, taking cache invalidation etc. into account). Cores also have their own prefetchers. Modern memory systems are quite well optimised for concurrent access; it's definitely not a case of all cores having to go through the same narrow pipe. For example, if two cores require data from two different memory banks (a last-level cache miss, on the order of 100 cycles), the memory controller can issue read requests for both and wait for both at the same time.

Most of what you described are basic optimisation concepts that anyone who cares about performance should be familiar with. However, I think there's one big idea that took me much longer to REALLY understand and get on board with: always profile first. Humans are absolutely terrible at estimating where the performance hotspots are, and if left to our guesses, we will spend all our time optimising things that just don't matter. Now when I program, I never do any non-trivial optimisations ahead of time. Strictly only after profiling. Of course, that doesn't mean I pessimise unnecessarily; I still try not to make unnecessary copies of data, etc. Just nothing that requires spending more time.

Just a few notes about the specific things you mentioned:

* Manual prefetching helps in theory, but is extremely hard to use effectively in practice. You have to make sure you prefetch far enough ahead that it makes a difference (if you do it just 10 cycles before you need the data, that's not going to help, and you are still paying the instruction decoding cost, etc.), but not so far ahead that the line gets evicted before you need it (in which case you wasted memory bandwidth that could perhaps have been used by another thread). Also, in your example of a linked list: if you traverse the list on every frame, the hardware prefetcher is probably already doing this for you, because it will have recognised the data access pattern. I have been working on performance-critical code for about 10 years now, and have seen only one instance of manual prefetching that was actually helpful (and only marginally). Many have tried. If I'm trying to optimise a program, this is going to be one of the very last things I look into.

* Yes, using a vector over a list where possible is a good idea, mostly because contiguous access means each cacheline fetch can pick up several elements, and for a very long vector, DRAM has a burst mode that is more efficient.

* Virtual functions are fine in reality. The v-table is almost always going to be in L1i cache, so those fetches are essentially free with good instruction scheduling. Also, in many cases the actual function called can be proven at compile time, and the compiler can de-virtualise the call, in which case it's absolutely free. Virtual functions offer a lot of readability and maintainability benefits; I would need very concrete evidence showing that they are a significant bottleneck before taking them out. Again, profiling first is important. I have not encountered a single case where this made a difference.
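To illustrate the de-virtualisation point, here is a minimal sketch (the Shape/Square names are made up for illustration): marking the derived class `final` lets the compiler prove the call target, so calls through the concrete type can be turned into direct, inlinable calls with no v-table load at all.

```cpp
#include <vector>

// Hypothetical shape hierarchy, purely for illustration.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// `final` tells the compiler no further overrides can exist, so a call
// through a Square (or Square&) can be de-virtualised and inlined.
struct Square final : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

double total_area(const std::vector<Square>& squares) {
    double sum = 0.0;
    // The static type here is Square, which is final: the compiler can
    // emit a direct (or fully inlined) call instead of a v-table dispatch.
    for (const Square& s : squares) sum += s.area();
    return sum;
}
```

Even without `final`, the indirect call itself is cheap once the v-table line is hot; this is just the case where it becomes literally free.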
  2. Oh I see! I had the wrong mental model of how actors work. I thought each variant chosen in each group was supposed to be self-contained and rendered separately. I was able to render a lot of structures that way (because they have both mesh and textures in the same variant), but not props. This makes a lot more sense. Does that mean that once all the variants are resolved for an actor, there should always be one mesh, one set of textures, and zero or more props that should be recursively rendered as their own actors, attached to the parent actor's mesh at the specified node?
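In other words, if I've understood the scheme correctly, the fully-resolved result could be sketched like this (all type and field names here are hypothetical, just restating my question in code):

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shape of an actor after all variant groups are resolved:
// one mesh, one set of textures, and zero or more props, each of which
// is itself a resolved actor attached at a named node of the parent mesh.
struct ResolvedActor {
    std::string mesh;
    std::vector<std::string> textures;
    // pair of (attachment node on the parent mesh, child actor)
    std::vector<std::pair<std::string, std::shared_ptr<ResolvedActor>>> props;
};

// Total number of meshes drawn for an actor and all of its
// recursively attached props.
int count_draws(const ResolvedActor& a) {
    int n = 1;  // the actor's own mesh
    for (const auto& p : a.props) n += count_draws(*p.second);
    return n;
}
```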
  3. Hello! I am making a tile-based game based on 0ad assets, and I'm having some trouble rendering some actors. For example, structures/persians/stable.xml has a props/structures/persians/stable_horse_a.xml prop in the first group, but the first group in that file only has a mesh (skeletal/horse_persian.dae), and no textures. The mesh does have UV coordinates so I assume there's a texture that should be applied? What textures would be used here? Also, the second group in stable_horse_a.xml has a bunch of variants in separate files, and quadraped/horse/stable/brown.xml for example has a base texture without a mesh. How does that work? Thanks!