Wow! This is an amazing tool! Plus rep, man! You're now my personal hero, really, for finding this awesome tool.

First test: Pyrogenesis rev #13491, Release build, VC++ 2012, main menu only (let's start slow).

The first thing that makes leak detection really hard is the GC, because of its constant cycles of memory growth and freeing.

Next stop is to analyze the first heap, which is the Mozilla JS engine heap. A note about the fragmentation graph: red blocks represent allocated nodes, while white blocks represent free nodes AND memory node headers. I don't know why they didn't exclude those headers, since that makes spotting fragmentation very hard. The slightly larger white blocks between red blocks are the fragmented free space.

The JS heap consists of a very large number of small allocations, which are luckily rounded up to 32-byte boundaries. This greatly reduces fragmentation, though there is still quite a bit of it. Keep in mind that this is just the 'quiet' main menu.

Next is the CRT heap, the heap used by C, C++ and pretty much every other library. This is the most important heap of the program. It can be summed up as:
1) Lots of tiny allocations - this is bad for malloc/free speed, since a lot of coalescing is going on.
2) Fragmentation - this is bad for malloc speed, since these free nodes sit in the free list and malloc has to iterate through them every single time, so it gradually gets slower. Here the small nodes are the main culprits: the bigger free blocks can be split and reused, but the small ones just stay in the free list.

If we look at the distribution of allocations over the entire long run of the main menu, we can see that most allocations are aligned to neat 8-byte boundaries, which makes them easy to manage. However, the sheer volume of same-sized allocations is staggering.
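To make the free-list point more concrete, here is a minimal C++ sketch under simplifying assumptions: a toy first-fit free list (NOT the actual Windows CRT heap algorithm, which is more sophisticated) plus a `roundUp` helper illustrating the 32-byte rounding seen in the JS heap. All names here (`roundUp`, `FirstFitHeap`, `makeFragmentedFreeList`) are hypothetical and made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Round n up to a multiple of `align` (must be a power of two).
// With align = 32, every request from 1 to 32 bytes lands in the same size class.
std::size_t roundUp(std::size_t n, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

// One free range in the toy heap: [offset, offset + size).
struct FreeBlock {
    std::size_t offset;
    std::size_t size;
};

// Toy first-fit allocator: a simplified model, not how any real CRT heap works.
class FirstFitHeap {
public:
    explicit FirstFitHeap(std::vector<FreeBlock> freeList) : freeList_(std::move(freeList)) {}

    // Returns the offset of the allocated range, or SIZE_MAX if nothing fits.
    // Every free node visited is counted, so lastScanned() shows the search cost.
    std::size_t allocate(std::size_t size) {
        scanned_ = 0;
        for (std::size_t i = 0; i < freeList_.size(); ++i) {
            ++scanned_;
            FreeBlock& b = freeList_[i];
            if (b.size >= size) {
                std::size_t offset = b.offset;
                b.offset += size;  // split: the tail stays on the free list
                b.size -= size;
                if (b.size == 0) {
                    freeList_.erase(freeList_.begin() + static_cast<std::ptrdiff_t>(i));
                }
                return offset;
            }
        }
        return SIZE_MAX;
    }

    std::size_t lastScanned() const { return scanned_; }

private:
    std::vector<FreeBlock> freeList_;
    std::size_t scanned_ = 0;
};

// Builds a free list of `holes` small gaps followed by one large free block.
std::vector<FreeBlock> makeFragmentedFreeList(std::size_t holes, std::size_t holeSize) {
    std::vector<FreeBlock> list;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < holes; ++i) {
        list.push_back({offset, holeSize});
        offset += holeSize * 2;  // a live block sits between consecutive holes
    }
    list.push_back({offset, std::size_t{1} << 20});  // one big free block at the end
    return list;
}
```

With 1000 small 16-byte holes in front of the big block, a 64-byte request has to visit 1001 free nodes instead of 1, which is the gradual slowdown described in point 2: the small holes never satisfy the request, yet they are walked every time. Rounding requests up to a coarse size class helps because a freed 32-byte slot can later serve any request of 1 to 32 bytes instead of lingering as an unusable sliver.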
There are two ways to increase general performance:
1) Localized use of custom pool allocators - this is hard to apply across the entire program, but arena allocators are nothing new, and they're fast.
2) Use jemalloc - this is basically a collection of thread-local arena allocators with allocations sorted into size-class bins, and it is superior to dlmalloc (http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf). It's probably the fastest general-purpose solution you can think of, and it has seen years of debugging.

Writing a custom pool allocator is surprisingly easy, BUT jemalloc uses VirtualAlloc / mmap to expand its pools by mapping more virtual address space onto the end of the memory block, which is the most efficient approach out there. Of course, to get the speed-up of releasing a whole bunch of nodes in a single go, you'd still need to use a memory pool.

I'll post a sample of memory allocation during actual gameplay later.
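Since writing a custom pool allocator really is not much code, here is a minimal C++ sketch of a fixed-size pool, assuming one block size per pool and single-threaded use. It grabs memory in chunks with plain `std::make_unique<unsigned char[]>` (not VirtualAlloc / mmap, and unrelated to jemalloc's API), hands out blocks from an intrusive free list, and `releaseAll()` drops every chunk at once, which is the "release a whole bunch of nodes in a single go" speed-up. `FixedPool` and its methods are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical fixed-size pool: every block has the same size, carved out of large chunks.
class FixedPool {
public:
    FixedPool(std::size_t blockSize, std::size_t blocksPerChunk)
        : blockSize_(roundUp(blockSize < sizeof(Node) ? sizeof(Node) : blockSize,
                             alignof(std::max_align_t))),
          blocksPerChunk_(blocksPerChunk) {
        assert(blocksPerChunk_ > 0);
    }

    // O(1): pop the head of the free list, growing by one chunk if it is empty.
    void* allocate() {
        if (freeList_ == nullptr) grow();
        Node* n = freeList_;
        freeList_ = n->next;
        ++live_;
        return n;
    }

    // O(1): push the block back onto the free list (no searching, no coalescing).
    void deallocate(void* p) {
        freeList_ = ::new (p) Node{freeList_};
        --live_;
    }

    // Frees everything in one go by dropping the chunks.
    // All outstanding pointers become invalid and no destructors are run.
    void releaseAll() {
        chunks_.clear();
        freeList_ = nullptr;
        live_ = 0;
    }

    std::size_t liveBlocks() const { return live_; }
    std::size_t chunkCount() const { return chunks_.size(); }

private:
    struct Node {
        Node* next;
    };

    static std::size_t roundUp(std::size_t n, std::size_t a) { return (n + a - 1) / a * a; }

    // Allocates one chunk and threads all of its blocks onto the free list.
    void grow() {
        chunks_.push_back(std::make_unique<unsigned char[]>(blockSize_ * blocksPerChunk_));
        unsigned char* base = chunks_.back().get();
        for (std::size_t i = 0; i < blocksPerChunk_; ++i) {
            freeList_ = ::new (base + i * blockSize_) Node{freeList_};
        }
    }

    std::size_t blockSize_;
    std::size_t blocksPerChunk_;
    Node* freeList_ = nullptr;
    std::size_t live_ = 0;
    std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};
```

For example, `FixedPool pool(24, 4);` serves 24-byte requests from chunks of four blocks; a block passed to `deallocate()` is the next one returned by `allocate()`, and `releaseAll()` throws away every node without walking them. Because `releaseAll()` skips destructors, this sketch suits trivially destructible nodes; anything else would need its destructor called before the pool is reset.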