Leaderboard

Popular Content

Showing content with the highest reputation on 2013-05-13 in all areas

  1. This is a good point to start from! Maintaining a more detailed list of 'good C++ design patterns' seems like the way to go. It shouldn't be so brief, though, and it should explain these concepts to non-expert C++ programmers. Not everyone has expert knowledge of x86 assembly, of how the operating system maps physical memory to virtual memory, or of how the stack and heap actually work.

I haven't run any long-running tests on the pyrogenesis engine, but if your program doesn't have any obvious leaks yet its total paged memory keeps increasing, the answer is usually fragmentation. I'm not familiar with Linux allocation schemes, but Windows Vista/7/8 use the Low Fragmentation Heap (a cluster of memory pools). Even though it sounds promising, it only mitigates the cost of malloc trying and failing to find a properly sized memory block. If your dynamic allocations vary too much in size, it ends up creating a lot of differently sized clusters, which increases the memory paged by the process anyway. It's a bit too optimistic to think that something like that would magically solve all the problems: memory management is a design issue that is not easy to fix and will always take a lot of time. Nevertheless, it's sometimes easier to spend those two extra hours of your evening creating a quick memory pool than to let others spend hours maintaining that code a few months later. Sounds familiar?

In the ComputeShortPath example, an edge consists of 2 FixedVector2Ds, each of which holds 32-bit CFixed objects. Since each edge is 128 bits / 16 bytes in size, the programmer obviously counted on the SSE2 extensions, which move unaligned 128-bit memory blocks through the XMM registers in a single op (Edge is a POD type, so G++/VC++ will definitely use MOVUPD [move unaligned packed doubles]). That of course requires compiling with SSE2 flags, but 0AD does not compile with any enhanced instruction set flags, so perhaps that wasn't the best decision.

Overall, this seems like the perfect place to use a memory pool. If you are lazy, you can just pre-determine the maximum possible pool node count, align it to the default page size (4 KB) and allocate one huge block of memory. Note, though, that any self-respecting stack is 1 MB in size (that's the default on the Win32 platform, anyway), so if you know your dynamic data fits into a reasonable size (64 KB?), you can use alloca(size_t) to allocate it on the stack instead. Given that you stated 6 std::vector<Edge>s with 16*16 edges each were the possible maximum (I haven't looked that closely), that would give us a required reserve of sizeof(Edge) (16) * 256 * 6 => 24,576 bytes, roughly 24 KB. That is so small it's a bit evil to fragment the heap with vectors like that. It's easier to just feed 64 KB of stack memory into a generic pool and never look back (see the sketch below). No fragmentation: check. No heap alloc: check. No mutex lock on the process heap: check. Triple win, people? Yeah, alloca wins huge in small scenarios like this.

I think that in general terms (aside from some obvious nomenclature errors regarding stack/heap) his assessment was pretty much spot on. If you design an entire program around the idea that 'a single vector won't fragment memory', you'll end up with a program full of a lot of vectors and a lot of fragmented memory. Some profiler data is always good to look at, but proper analysis can take hours or days; it's not that simple, after all.
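To make the stack-fed pool idea concrete, here is a minimal sketch. The Edge layout, the StackPool name and the 64 KB / 256-edge figures are assumptions taken from the discussion above, not 0AD's actual code; a real implementation would need a heap fallback for oversized requests.

#include <cstddef>
#include <cstdint>

// Hypothetical 16-byte POD edge, standing in for the pair of FixedVector2Ds.
struct Edge { int32_t x0, y0, x1, y1; };

// Minimal bump allocator over a caller-supplied buffer (a stack array or an
// alloca'd block). No per-allocation heap calls, no fragmentation, no lock
// on the process heap; everything is released when the buffer goes away.
class StackPool
{
public:
    StackPool(void* buffer, size_t bytes)
        : mData(static_cast<char*>(buffer)), mSize(bytes), mUsed(0) {}

    void* Allocate(size_t bytes)
    {
        bytes = (bytes + 15) & ~size_t(15);   // keep 16-byte steps for SSE loads
        if (mUsed + bytes > mSize)
            return nullptr;                   // pool exhausted; caller must handle
        void* p = mData + mUsed;
        mUsed += bytes;
        return p;
    }

private:
    char*  mData;
    size_t mSize;
    size_t mUsed;
};

// Usage sketch: reserve the worst case (6 lists * 256 edges * 16 bytes = 24 KB,
// rounded up to 64 KB of stack) once per ComputeShortPath-style call.
void ComputeShortPathLike()
{
    alignas(16) char buffer[64 * 1024];       // or alloca(64 * 1024) on Win32
    StackPool pool(buffer, sizeof(buffer));

    Edge* edges = static_cast<Edge*>(pool.Allocate(256 * sizeof(Edge)));
    if (!edges)
        return;                               // fall back to the heap in real code
    edges[0] = Edge{0, 0, 1, 1};
    // ... build and consume the edge lists without ever touching the heap ...
}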
In this case std::vector is a classic example of memory fragmentation, since the initial allocation size depends on the implementation and thus varies between platforms. Looking at the VC++ STL, its std::vector implementation is one of the worst, since it always allocates exactly size*sizeof(T) and always expands by +50%. Because of that, something like this usually happens (for a 4-byte element type):

SIZE  CAPACITY  BYTES
  0       0       0
  1       1       4
  2       2       8
  3       3      12
  4       4      16
  5       6      24
  6       6      24
  7       9      36
  8       9      36
  9       9      36
 10      13      52

And you wonder where the fragmentation comes from? There are 7 reallocations required for only 10 elements added. And C++ isn't exactly known for fast reallocation, since it requires: 1) allocate a new buffer, 2) copy the data, 3) delete the old buffer. If you used a custom vector implementation that always makes its initial allocation, say, 32 bytes, with each successive allocation growing by +50% and aligned up to 32 bytes, you would get something like this:

SIZE  CAPACITY  BYTES
  0       0       0
  1       8      32
  2       8      32
  3       8      32
  4       8      32
  5       8      32
  6       8      32
  7       8      32
  8       8      32
  9      16      64
 10      16      64

This scales much better for heap memory management: if we free a 32-byte block and allocate a 64-byte block, another instance of our custom vector will definitely pick up the 32-byte block we just released, since it's a perfect fit.

Yeah, sorry for going off-topic so suddenly. I'm just used to programming very performance-critical applications, and most of the time custom implementations like this are needed to fit the current scenario. This scenario favors speed over memory, so allocating a few bytes ahead is not a problem.
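A small sketch of that growth policy, assuming a 4-byte element type; the function name NextCapacityBytes is made up for illustration and is not part of any real vector class. Running it reproduces the second table above.

#include <cstddef>
#include <cstdio>

// Growth policy from the post: first allocation is 32 bytes, each expansion
// is +50% of the current capacity, rounded up to a 32-byte multiple.
size_t NextCapacityBytes(size_t currentBytes, size_t requiredBytes)
{
    size_t cap = currentBytes ? currentBytes : 32;
    while (cap < requiredBytes)
    {
        cap += cap / 2;                     // grow by +50%
        cap = (cap + 31) & ~size_t(31);     // align up to 32 bytes
    }
    return cap;
}

int main()
{
    size_t capBytes = 0;
    for (size_t size = 1; size <= 10; ++size)
    {
        if (size * 4 > capBytes)
            capBytes = NextCapacityBytes(capBytes, size * 4);
        std::printf("size=%zu capacity=%zu bytes=%zu\n",
                    size, capBytes / 4, capBytes);
    }
}

With this policy the first eight pushes reuse the same 32-byte block, and the ninth grows straight to 64 bytes (48 rounded up), so freed blocks come in a handful of predictable sizes that later allocations can slot back into.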
    1 point