
RedFox

WFG Retired
Everything posted by RedFox

  1. This looks really excellent! I think it's a great idea.
  2. The emplace_back function seems a lot better than the old push_back for objects that are non-trivial / non-POD. Nevertheless, all the arguments have to be pushed onto the stack not once but twice (once for emplace_back, once for the type constructor) before being copied a third time into the actual buffer. For POD types, something like this would be preferred, since it involves one initialization and one copy:

    Data d = { 1, 2, 3 };
    vec.push_back(d); // argument is passed as 'const Data&'

And I still think the move constructor is a hack that tries to fix one of the biggest problems C++ has: temporary non-trivial objects, which is something that should be avoided at all cost. At least now they've introduced a way to escape heap allocation/deallocation triggering 3 times, which somewhat lessens the impact. Still, the implementation of std::string is non-trivial, and looks like this on VC++ 2012 (28 bytes):

    class string {
        iterator_proxy* _Proxy;
        union {
            char _Buf[16];
            char* _Ptr;
        };
        size_t _Size;
        size_t _Reserved;
    };

The idea is that small strings can be contained inside the internal buffer, while strings longer than 15 chars are allocated on the heap. Regarding std::move and move constructors/destructors in general, they are still non-trivial, containing dozens of operations for copying and nulling the string objects. C# sidesteps this by making every class a reference type (similar in spirit to Microsoft's COM, but managed by a garbage collector rather than reference counts). In C++ the closest parallel is a shared_ptr<> wrapper, which is again completely different from C# references:

    // again from VC++ 2012 STL
    template<class T> class shared_ptr {
        T* _Ptr;               // pointer to object allocated with new
        _Ref_count_base* _Rep; // pointer to a dynamically allocated object that holds the reference count
    };

So this keeps a dynamically allocated pointer to a reference-counting object, which is rather inelegant. If you use a lot of shared_ptr<>'s in your project you'll start feeling the overhead. Luckily, or rather surprisingly, Microsoft designed an (in my opinion) elegant solution to the problem:

    class IUnknown {
    public:
        int refcount;
        IUnknown() : refcount(1) {}   // initial creation has refcount 1
        void AddRef() { ++refcount; } // adds to the reference count of the object
        void Release() { if(!--refcount) delete this; } // decreases refcount; if it reaches 0, deletes the object
    };
    class MyObject : public IUnknown { /* implement your own object */ };

So now you introduce a reference-counting overhead for every object you intend to pass by reference. For actual use you need a reference wrapper. In Microsoft's COM it's a comptr<>:

    template<class T> struct comptr {
        T* ptr;
        // add copy operator overloads to AddRef and Release as needed
    };
    // comptr<MyObject> obj = new MyObject(); // compiles fine because MyObject inherits IUnknown

Yeah, when I said it's elegant I might have been a bit too generous. It's more accurate to say it's more elegant than a shared_ptr<>. What it does achieve, though, is a single allocation for the object - no matter how many copies of comptr<MyObject> you pass around. For real C++ use you sometimes need objects allocated on the stack (because it's fast and your data is sometimes trivial) and other times you need them to be dynamic. The COM solution will never work for stack-allocated objects; it will result in a crash or heap corruption. That's awful.
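For completeness, the copy overloads that the comptr comment above waves away would look roughly like this (a sketch against the simplified IUnknown shown here, not Microsoft's actual ATL CComPtr):

    template<class T> struct comptr {
        T* ptr;

        comptr() : ptr(nullptr) {}
        comptr(T* p) : ptr(p) {}      // adopts the initial refcount of 1 coming from 'new'
        comptr(const comptr& other) : ptr(other.ptr) { if(ptr) ptr->AddRef(); }
        ~comptr() { if(ptr) ptr->Release(); }

        comptr& operator=(const comptr& other) {
            if(this != &other) {
                if(other.ptr) other.ptr->AddRef(); // AddRef first so indirect self-references stay alive
                if(ptr) ptr->Release();
                ptr = other.ptr;
            }
            return *this;
        }
        T* operator->() const { return ptr; }
    };

All it really buys over shared_ptr<> is that the counter lives inside the object itself, so there is only one allocation.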
Another solution would be to create a custom reference object that proxies a struct of the object pointer and its reference count:

    template<class T> class refptr {
        struct proxy {
            T* ptr;
            int refcount;
        };
        proxy* p; // proxies should be allocated from a special dynamic memory pool
    public:
        inline refptr() : p(nullptr) {} // default to a null pointer
        inline refptr(T* ptr) {
            p = proxyAlloc();  // gets a new object from the proxy memory pool
            p->ptr = ptr;      // set the proxy ptr
            p->refcount = 1;   // initial refcount is 1
        }
        inline ~refptr() {
            release();         // try to release the object
        }
        inline void release() {
            if(p && !--p->refcount) { // decrease reference count
                delete p->ptr; // delete the actual object
                proxyFree(p);  // return proxy to its memory pool
            }
        }
        inline T* operator->() {
            return p->ptr;     // will cause a nullptr exception if the proxy is nullptr
        }
        inline operator bool() { return p ? true : false; } // so we can test the refptr in conditionals
    };

Even though it's not a completely working example (it's missing the copy constructors, the assignment operators and actually a ton more operators to make it usable), it still illustrates the idea of a very simple reference object (only 4 bytes wide on x86) that gives a properly reference-counted object:

    {
        Data d1; // allocate a Data object on the stack
        d1.a = 5;
        refptr<Data> d2 = new Data(); // allocate on the heap and store as a reference
        d2->a = 5;
        refptr<Data> d3 = d2; // share the reference
        d2 = nullptr;         // decreases the reference count
        d3->a == 5;           // still accessible through d3
        d3 = nullptr;         // refcount reaches 0 here, so the Data object is deleted
    }

So yes, C++11 provides many ways of manipulating data for us, but it doesn't provide the one true solution that we really need. I think the compiler should be extended to treat the following patterns as special cases:

    vector<int> GetData() {
        vector<int> data;
        data.push_back(10);
        return data;
    }
    {
        vector<int> list = GetData(); // case 1 - initialization
        list = GetData();             // case 2 - temporary copy replace
    }

And it should generate altered code for these special cases (regardless of optimization levels):

    // case 1 - initialization
    void _init_GetData(vector<int>& out) {
        out.vector();  // call default ctor
        out.push_back(10);
    }
    // case 2 - temporary copy replace
    void _replace_GetData(vector<int>& out) {
        out.~vector(); // call the dtor
        out.vector();  // call the default ctor
        out.push_back(10);
    }
    {
        vector<int> list;
        _init_GetData(list);
        _replace_GetData(list);
    }

Optimizations like this are only partly achieved with certain compiler flags and they don't always work. I don't see why the compiler can't generate this kind of code. It's true that with this change the assignment operator will not be called, but isn't that what was actually desired? Even if you replace the code with std::string or even refptr, the end result will still be the same. So yeah, rvalue references are a hack for a mistreated feature (operator=) of the C++ language. And I guess we'll have to live with it, unless someone comes along with a clean new language somewhere in between C++ and D.
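Just to sketch what the missing pieces would look like (these go inside the refptr class above, with proxyAlloc/proxyFree being the hypothetical pool functions it already assumes):

    // copy construction: share the proxy and bump the count
    refptr(const refptr& other) : p(other.p) {
        if(p) ++p->refcount;
    }
    // copy assignment: drop our current reference, then share the other one
    refptr& operator=(const refptr& other) {
        if(p != other.p) {
            release();
            p = other.p;
            if(p) ++p->refcount;
        }
        return *this;
    }
    // assigning nullptr just drops the reference (as in 'd2 = nullptr' above)
    refptr& operator=(std::nullptr_t) {
        release();
        p = nullptr;
        return *this;
    }

A real implementation would also null p inside release(), so a released refptr can never touch a recycled proxy.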
  3. That's an excellent point - introducing a proper move constructor/operator to the objects will make a huge difference to the current code. It would still introduce quite a bit of copying, though, and the temporary object destructors are still called - they just won't have to delete any resources, since the data has already been moved out. The following code will still be the most efficient:

    void GetSomeData(vector<Data>& outData) {
        outData.push_back(Data());
    }
    {
        vector<Data> list;
        // ...
        GetSomeData(list);
    }

Whereas the following code introduces a rather large number of operations:

    vector<Data> GetSomeData() {
        vector<Data> result;
        result.push_back(Data());
        return result;
    }
    {
        vector<Data> list;
        // ...
        list = GetSomeData();
    }

What happens behind the scenes here:

    void GetSomeData(vector<Data>& rvalue) {
        vector<Data> result;      // ctor
        result.push_back(Data());
        rvalue.operator=(result); // copy the result
        result.~vector();         // dtor of the initial vector
    }
    {
        vector<Data> list;
        vector<Data> rvalue;      // compiler generates a temporary for the return result
        GetSomeData(rvalue);      // get the result into the temporary
        list.operator=(rvalue);   // copy the temporary
        rvalue.~vector();         // dtor of the temporary rvalue
    }

Basically what C++11 introduces is a new type of copy operator/constructor called the 'move operator/constructor':

    vector(vector&& rvalue)
        : size(rvalue.size), capacity(rvalue.capacity), buffer(rvalue.buffer) {
        rvalue.size = 0;
        rvalue.capacity = 0;
        rvalue.buffer = nullptr;
    }
    vector& operator=(vector&& rvalue) {
        // we don't have to check for 'this != &rvalue', since rvalues are always just temporary objects
        if(buffer) delete[] buffer; // make sure any data held by this vector is freed
        size = rvalue.size;
        capacity = rvalue.capacity;
        buffer = rvalue.buffer;
        rvalue.size = 0;
        rvalue.capacity = 0;
        rvalue.buffer = nullptr;
        return *this;
    }

So, instead of calling the standard 'vector& operator=(const vector& other)', the compiler will use 'vector& operator=(vector&& rvalue)' whenever a temporary object is assigned. This is a lot better than copying the data 3 times! But it's still bad. It's basically a hack to fix an ugly design problem that C++ has, and something we have to live with. The only proper way to deal with this is to just use 'void GetSomeData(vector<Data>& outData)'. It's less intuitive, but at least it's not ugly behind the scenes.
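If anyone wants to see for themselves which of these gets called, a quick throwaway test type does the trick (nothing 0 A.D.-specific, just an instrumented struct):

    #include <cstdio>
    #include <vector>

    struct Tracer {
        Tracer()                         { puts("default ctor"); }
        Tracer(const Tracer&)            { puts("copy ctor"); }
        Tracer(Tracer&&)                 { puts("move ctor"); }
        Tracer& operator=(const Tracer&) { puts("copy assign"); return *this; }
        Tracer& operator=(Tracer&&)      { puts("move assign"); return *this; }
        ~Tracer()                        { puts("dtor"); }
    };

    std::vector<Tracer> GetSomeData() {
        std::vector<Tracer> result;
        result.push_back(Tracer()); // default ctor + move (or copy) into the vector's buffer
        return result;              // NRVO or the vector's move ctor, depending on the compiler
    }

    int main() {
        std::vector<Tracer> list;
        list = GetSomeData();       // move assignment of the temporary vector
    }

In practice most compilers elide the temporary entirely (return value optimization), which is pretty much the 'special case' code generation argued for in the previous post.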
  4. This is a good point to start! Maintaining a more detailed list of 'good C++ design patterns' seems like the way to go. It shouldn't be too brief, though, and should explain these concepts to non-expert C++ programmers. Not everyone has expert knowledge of x86 assembly, of how the operating system maps physical memory to virtual memory, or of how the stack and heap exactly work.

I haven't run any long-running tests on the pyrogenesis engine, but usually if your program doesn't have any obvious leaks, yet the total paged memory keeps increasing, the answer is always fragmentation. Even though I'm not familiar with Linux allocation schemes, Windows Vista/7/8 use the Low Fragmentation Heap (a cluster of memory pools). Even though it sounds promising, it only mitigates the cost of malloc searching for (and failing to find) a properly sized memory block. If your dynamic allocations vary too much, it will end up creating a lot of differently sized clusters, thus increasing the memory paged by the process anyway. It's a bit too optimistic to think that something like that would magically solve all the problems - memory management is a design issue that is not easy to fix and will always take a lot of time. Nevertheless, sometimes it's easier to spend those 2 extra hours of your evening creating a quick memory pool, rather than let others spend hours maintaining that code a few months later. Sounds familiar?

In the ComputeShortPath example, an edge consists of 2 FixedVector2D-s, which each consist of 32-bit CFixed objects. Since each edge is 128 bits / 16 bytes in size, the programmer obviously counted on the SSE2 extensions, which move unaligned 128-bit memory blocks through the XMM registers in a single op (Edge is a POD type, so G++/VC++ will definitely use MOVUPD [move unaligned packed doubles]). This of course requires compiling with SSE2 flags, but 0AD does not compile with any enhanced instruction set flags, so perhaps that wasn't the best decision.

Overall, this seems like the perfect place to use a memory pool. If you are lazy, you can just pre-determine the maximum possible pool node count, align it to the default page size (4KB) and allocate one huge block of memory. Note that any self-respecting stack is 1MB in size (that's the default on the Win32 platform anyway), so if you know your dynamic data fits into a reasonable size (64KB?), you can use alloca(size_t) to allocate on the stack instead. Given that you stated 6 std::vector<Edge>'s with 16*16 edges each as the possible maximum (I haven't looked that closely), that would give a required reserve of sizeof(Edge) [16] * 256 * 6 => ~24.5KB. This is actually so small that it's a bit evil to fragment the heap with vectors like that. It's easier to just feed 64KB of stack memory into a generic pool and never look back again. No fragmentation: check. No heap alloc: check. No mutex lock on the process heap: check. Triple win, people? Yeah, alloca wins huge in small scenarios like this.

I think in general terms (aside from some obvious nomenclature errors regarding stack/heap) his assessment was pretty much spot on. If you design an entire program with the idea that 'a single vector won't fragment memory', then you'll end up with a program full of a lot of vectors and a lot of fragmented memory. Some profiler data is always good to look at, but proper analysis can take hours or days - it's not that simple after all.
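As a rough illustration of what feeding stack memory into a generic pool could look like (a sketch only - no alignment handling, no per-node freeing, and the function/struct names are made up):

    #include <cstddef>

    // Trivial bump allocator over a caller-provided block,
    // e.g. a 64KB buffer sitting on the stack.
    struct StackPool {
        char*  buffer;
        size_t capacity;
        size_t used;

        StackPool(char* buf, size_t cap) : buffer(buf), capacity(cap), used(0) {}

        void* alloc(size_t bytes) {
            if(used + bytes > capacity)
                return nullptr;       // out of pool space; caller must handle it
            void* p = buffer + used;
            used += bytes;
            return p;
        }
    };

    void ComputeShortPathSketch() {
        char block[64 * 1024];        // one stack 'allocation' for the whole call
        StackPool pool(block, sizeof(block));

        void* edges = pool.alloc(16 * 256 * 6); // the ~24.5KB edge reserve estimated above
        (void)edges;
        // ... fill and use the edges: no heap traffic, no locks,
        // and everything vanishes when the function returns
    }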
In this case std::vector is a classic example of memory fragmentation, since the initial allocation size depends on the implementation and thus varies between platforms. Looking at the VC++ STL, its std::vector implementation is one of the worst, since it always allocates exactly by sizeof(T) and always expands by +50%. Because of that, something like this usually happens:

    SIZE  CAPACITY  BYTES
       0         0      0
       1         1      4
       2         2      8
       3         3     12
       4         4     16
       5         6     24
       6         6     24
       7         9     36
       8         9     36
       9         9     36
      10        13     52

And you wonder where the fragmentation comes from? There are 7 reallocations required for only 10 elements added. And C++ isn't exactly known for its fast reallocation, since it requires: 1) new buffer, 2) copy data, 3) delete old buffer. If you used a custom vector implementation that always makes the initial allocation 32 bytes, with each successive allocation being +50% aligned to 32 bytes, you would get something like this:

    SIZE  CAPACITY  BYTES
       0         0      0
       1         8     32
       2         8     32
       3         8     32
       4         8     32
       5         8     32
       6         8     32
       7         8     32
       8         8     32
       9        16     64
      10        16     64

This scales better for heap memory management - if we free a 32-byte block and allocate a 64-byte block, another instance of our custom vector will happily pick up the 32-byte block we just released - it's a perfect fit after all. Yeah, sorry for going off-topic so suddenly. I'm just used to programming very performance-critical applications, and most of the time custom implementations such as this are needed to fit the current scenario. This scenario favors speed over memory, so allocating a few bytes ahead is not a problem.
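For what it's worth, the two growth policies are easy to simulate; this little throwaway program reproduces both capacity tables for sizeof(T) == 4 (the +50% and 32-byte numbers are just the ones used above):

    #include <cstdio>
    #include <cstddef>

    int main() {
        const size_t elemSize = 4;
        size_t capA = 0; // policy A: grow by +50% of the element count (VC++-style)
        size_t capB = 0; // policy B: allocate in 32-byte blocks, grow by +50%

        puts("SIZE  capA bytesA   capB bytesB");
        for(size_t size = 1; size <= 10; ++size) {
            if(size > capA) {
                size_t grown = capA + capA / 2;                     // +50%
                capA = (grown > size) ? grown : size;
            }
            if(size > capB) {
                size_t bytes = capB ? capB * elemSize * 3 / 2 : 32; // +50%, or the 32-byte initial block
                bytes = (bytes + 31) / 32 * 32;                     // round up to a 32-byte boundary
                capB = bytes / elemSize;
            }
            printf("%4zu  %4zu %6zu   %4zu %6zu\n",
                   size, capA, capA * elemSize, capB, capB * elemSize);
        }
    }

Policy A ends up allocating 7 times over those 10 pushes; policy B only twice.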
  5. I think he meant 'for example, if'. How is this any different between P2P and Client-Server? In either case there exists a 'synced' version of the current game and only state changes are sent/received. In the client-server model the client sends a change in its state to the server, while 'starting' the action. The server receives the update, sends an acknowledge and also updates all other clients with the new state. In the P2P model described, everyone broadcasts their state changes to every other client, meaning N times more packets to send out in any case. Basically, with a running server you can at least control the bandwidth and sync at the same time. In peer-to-peer there's no single instance that can be considered the main sync object (or perhaps player 0?). If the network data is sent in raw binary, then bandwidth shouldn't be an issue at all.
  6. You are correct on that: if I take a look at a small 1-minute profile run, the amount of string and vector resizing is huge. Tens of millions of buffer resizes tend to add up and fragment the heap. Even if the 'real' memory use seems small, the damage has already been dealt algorithmically. The use of unordered_map<> is also huge in this project - hash tables actually require quite a lot of memory; I'll get back to this at the end of this post. Even though Ogre also has its own quirks, it would indeed be easier to use and doesn't require maintenance.

That's an excellent patch! You've gone and fixed a lot of painful code. Great job! I've tried to explain that point, but it seems the general opinion is somewhat fixed... Going through the code, it seems that someone discovered the proverbial hammer in associative maps and then proceeded to treat every problem as a proverbial nail. Even worse are constructs like CStrIntern::GetString, which creates a huge map of shared_ptr<>'s to CStrInternInternals to 'save space on duplicate strings'. Of course the key has to be 'std::string', meaning the unordered_map<> has to store a duplicate of the string anyway. Not only is this implementation flawed, but CStrIntern wastes more than double the space for every string, and searching for a matching string starts to take a large amount of time. In the profiler the comparison was triggered over 4 million times during a 1-minute run of the game.

Why do I say it wastes more than double? Because of the implementation of the C++ hash table (unordered_map<>), which has to keep a copy of the key value. Hash tables work by mapping a given hash to a bucket. A single bucket can contain several key<->value pairs, and in order to tell them apart, the boost and std implementations need a copy of the original key (this isn't a problem when your key is, for example, an integer). Since the unordered_map<> key in CStrIntern is a plain 'std::string', a full copy of the string is kept as the key. This has huge performance implications (!). The memory usage itself is very difficult to calculate for a hash map, because it's non-linear and depends on the actual hash values, but a simple whole-program memory comparison will do the trick:

    // VC++11, Release
    wchar_t buffer[80];
    const size_t memStart = ProcessMemory();
    std::unordered_map<std::wstring, MyData, MyHasher, MyComparer> tbl;
    for(int i = 0; i < 5000; ++i) {
        int len = swprintf(buffer, L"aaaaBBBBccccDDDDeeee %8d", i);
        tbl[std::wstring(buffer, len)] = MyData();
    }
    const size_t memUsed = (ProcessMemory() - memStart);

Given that we have 5,000 unique strings, 32 wchars in length (which isn't that far-fetched for 0AD), the memory usage grows to around 1.75MB, and once over 1,024 entries the hash table slows down by several orders of magnitude, especially in debug mode. Now the first reaction would be "well, a few megabytes shouldn't really matter". Oh? Well, 5,000 wchar_t strings of length 32 should roughly take 'sizeof(wchar_t)*33*5000' bytes of memory (which is 330KB on Win32). So that's around 6 times less memory. On Linux wchar_t is 4 bytes, but the ratio itself should remain similar. Now if your program only dealt with strings and your L2 cache size was 1MB, then the second option (330KB) would fit inside the L2 cache, while the larger one (1.75MB) would not. That alone would immediately make your program around 5x slower, due to cache misses.
I've written image analysis software for robotics before, and even a small difference of 100KB in a lookup table size could make the whole algorithm 5-10 times faster, depending on the actual number of memory accesses. The L2 cache isn't really that huge and other programs want a share of it too - which is why keeping your data small (330KB?) helps ensure that it stays in the cache while contending with other processes. To sum up: I think CStrIntern is a huge bug in the game logic and should be replaced with a proper translation implementation - which would solve the 'duplicate strings' problem while remaining efficient - by handing out const pointers to the strings in a large block of contiguous memory.
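Roughly what I have in mind, as a sketch (not a drop-in replacement for CStrIntern - no translation hookup, and the class name and chunk size are made up):

    #include <string>
    #include <unordered_map>
    #include <vector>

    // Interns strings into large contiguous chunks and hands out stable
    // const char* pointers into them. The index is only consulted when a
    // string is interned; afterwards callers just pass the raw pointer
    // around and compare pointers instead of re-hashing string contents.
    class StringBlock {
        std::vector<std::vector<char>> chunks;              // chunk buffers never reallocate once filled
        std::unordered_map<std::string, const char*> index; // string -> its interned copy
    public:
        const char* intern(const std::string& s) {
            auto it = index.find(s);
            if(it != index.end())
                return it->second;

            const size_t chunkSize = 64 * 1024;
            size_t needed = s.size() + 1;
            if(chunks.empty() || chunks.back().size() + needed > chunks.back().capacity()) {
                chunks.emplace_back();
                chunks.back().reserve(needed > chunkSize ? needed : chunkSize);
            }
            std::vector<char>& chunk = chunks.back();
            const char* ptr = chunk.data() + chunk.size();  // where this string will start
            chunk.insert(chunk.end(), s.begin(), s.end());
            chunk.push_back('\0');
            return index.emplace(s, ptr).first->second;
        }
    };

The index still keeps one std::string copy per unique string for insert-time lookups, but nothing downstream touches the map again - which is where CStrIntern currently burns those millions of comparisons.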
  7. This is a huge change, but also something that just has to be done. We could follow the OOC (Object-Oriented C) approach, where objects are expressed as opaque handles:

    // myscript.js
    var g_Playlist = [ "music01.ogg", "music02.ogg" ]; // our playlist
    var g_Current = 0; // invalid handle
    var g_Next = 0;    // index of the next sound to play

    // ... some event ...
    {
        // where sound_* is the global sound interface
        var newsound = sound_load(g_Playlist[g_Next++]); // load the next piece
        if (g_Next >= g_Playlist.length)
            g_Next = 0;
        sound_stop(g_Current); // stop the current music after we have actually loaded newsound
        sound_play(newsound);  // start playing the new music
        g_Current = newsound;
    }

The C++ code would keep track of the handles and their validity. For example, 0 would be an invalid handle. You won't ever get dangling pointers this way, since a handle is simply invalidated once the underlying object has been deleted. Null pointers are also impossible, since handle 0 marks an invalid handle - in which case the relevant interface method should just do nothing.
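On the C++ side the bookkeeping could be as simple as this (a sketch; SoundHandle, g_Sounds and friends are made-up names, not anything in the engine):

    #include <unordered_map>
    #include <cstdint>

    typedef uint32_t SoundHandle;   // 0 is reserved as the invalid handle

    struct Sound { /* decoder state, buffers, ... */ };

    static std::unordered_map<SoundHandle, Sound*> g_Sounds;
    static SoundHandle g_NextHandle = 1;

    SoundHandle sound_load(const char* file) {
        (void)file;                 // real code would decode the file here
        Sound* s = new Sound();
        SoundHandle h = g_NextHandle++;
        g_Sounds[h] = s;
        return h;                   // the script only ever sees this integer
    }

    void sound_stop(SoundHandle h) {
        auto it = g_Sounds.find(h);
        if(it == g_Sounds.end())
            return;                 // 0 or stale handle: silently do nothing
        delete it->second;
        g_Sounds.erase(it);         // the handle is now invalid forever
    }

    void sound_play(SoundHandle h) {
        auto it = g_Sounds.find(h);
        if(it != g_Sounds.end()) {
            /* start playback of it->second */
        }
    }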
  8. The second column is elapsed inclusive time as a percentage of total run time. That 100% is the total wall-clock time across all 5 threads, so in that single-thread context 20% is its share of the wall time. Running a new profile on Tiber River and comparing the other GUI objects, I can see that CText::Draw and CMiniMap::Draw take much more time than the rest: around 4% out of that 20%. So roughly a fifth of the rendering pipeline (4/20) is spent drawing text and the minimap. It's safe to say that pre-rendering a glyph run into a texture would save a noticeable amount of frame time. I'll continue profiling to get to the bigger bottlenecks.
  9. I agree that it's extremely inefficient. It won't take much effort to modify the current code. I'll take a look into it and submit a patch sometime, if that's okay. Before I do that, I'm going to build a small test with FreeType 2's beautiful anti-aliased fonts...
  10. I think you're right, it doesn't serve any purpose at the moment. Perhaps you're correct on that matter; since a lot of these problems are contentious, it's better to handle smaller tasks right now. Since Philip's pathfinder is still a WIP, I haven't given it much attention. Perhaps Philip could comment on this? Whether he can finish the pathfinder, or whether we should start over?

Right now simulation 'turns' are taken a few times per second. Given that we can calculate on average how many frames run between turns (call it N), we can divide the time into 'frame slots'. Specifically, you can assign slot[0] = Task0, slot[1] = Task1, slot[N] = TaskN, where of course any task with a slot greater than N just overflows. In this sense, if the tasks are asynchronous they can be assigned to any slot; if they require a previous task to run first, they have to be assigned to a higher slot. It's actually pretty simple once you put it to code (see the sketch at the end of this post). It just helps you divide all the tasks over a series of frames instead of calling all of them every 250ms. It should help with the framerate spikes and it won't introduce data races.

I thought Philip said that JavaScript has no noticeable effect on performance? You are correct, that is always a huge issue - getting the same results on different platforms. So a lot of functions would need to have profile macros? Or do you mean a kind of profiler you can run inside JS scripts? So basically, it's rather useless since the amount of data is overwhelming. Now Profiler2 sounds like something much better. I think all of this profiling and implementing a better profiler is really, really good stuff actually... BUT - there's always a but somewhere - Visual Studio 2012 has a really good profiler too... The main benefit of the VC++ integrated profiler is that it accurately measures time spent inside a function, and the timers are compiled into the assembly output if you run your program with the profiler on. Its downside is similar to Profiler1's: it generates huge datasets (~512MB per minute for pyrogenesis) that take quite some time (a few minutes) to analyze. Luckily the analyzed report is quite small.

So, for example, here's a performance sample of just running the main menu and opening the few tooltips present, filtered to show only the last minute of crazy menu action. The first thing we can see is that the 0 AD main menu doesn't use a lot of CPU power: out of 8 cores it only needs one active thread, while the others are mostly sleeping. Even though 0 AD didn't consume that much CPU, it still didn't run fast. What gives? The only noticeable peaks in CPU activity are probably from menus and dialogs opening up. Checking the hot path in the generated report, we see something interesting: around 25% of the time is spent suspending threads (which is very good - we don't need to use more power than necessary), and another 50% goes to WaitForSingleObject, the Windows API function used to wait on mutexes and other synchronization objects. Are we having concurrency issues here? Is it a worker thread waiting for a mutex wakeup? It's also interesting that std::char_traits<char>::compare is called a lot and is therefore singled out by the profiler. There were 5 threads running in total, so it makes sense that they were suspended most of the running time, otherwise CPU usage would have skyrocketed. Let's take a closer look at the statistics we really care about: the active application hot path.
Now we can start really profiling the main menu. What is going on here exactly? Aside from lots of threads sleeping, we can see that GuiManager is the one doing the real work. Around 20% of the total CPU time was spent drawing the GUI; if we do some quick math (20/25), that makes 80% of the active run time. So what, big news? Rendering takes time. We all knew that. Besides, everyone uses recursion to create easy-to-use GUI APIs. What we're missing here, however, is why that recursion is taking so much time. Image drawing doesn't take any time at all, but CTextRenderer::Render does. So most of the time is spent rendering text? Let's have a look: Hah! I get it. It's constructing the glyph runs every frame, again and again and again. It should do this once, when the text object is initialized with its text. The Direct2D API on Windows has special glyph run objects for handling exactly this, distinguishing static text from dynamic text. Clearly we've found why the GUI was struggling on the profiler run.

Summary: Even though the GUI isn't the priority, it was the easiest thing to profile in such a short amount of time (just to showcase the VS2012 profiler and how useful it can be). It also helped discover that the 10-year-old GUI code has a pretty big weakness: text rendering wastes a huge amount of memory and resources every frame, since the glyph runs are temporary objects. I've been looking at the FreeType 2 library, an open-source library used for TrueType and OpenType text rendering. It has a very slim and streamlined API. Perhaps this is something I could start with?
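For the frame-slot idea mentioned earlier in this post, a bare-bones version could look something like this (a sketch; TaskFn, the class name and the slot counts are all made up):

    #include <vector>
    #include <functional>

    typedef std::function<void()> TaskFn;

    // Spreads per-turn simulation tasks over the N frames between turns,
    // instead of running them all inside the same 250ms tick.
    class FrameSlotScheduler {
        std::vector<std::vector<TaskFn>> slots; // slots[i] runs on the i-th frame after a turn
        size_t currentSlot;
    public:
        explicit FrameSlotScheduler(size_t framesPerTurn)
            : slots(framesPerTurn), currentSlot(0) {}

        // Tasks that depend on an earlier task simply get a higher slot index.
        void Assign(size_t slot, TaskFn task) {
            slots[slot % slots.size()].push_back(std::move(task));
        }

        // Called once per rendered frame.
        void OnFrame() {
            for(TaskFn& task : slots[currentSlot])
                task();
            currentSlot = (currentSlot + 1) % slots.size();
        }
    };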
  11. Ah, now I understand what you're saying. Spending time optimizing a fixed-point library could take weeks. I've fallen into the fixed-point pitfall before (working on Atmel 8-bit ATmega chips) and in the end, when I ran out of time and just needed a working solution, I reverted to the gcclib software floats. In the end it didn't matter whether I used fixed point or floats - there was enough performance either way... yet I spent a huge amount of time on premature optimization, time I could have spent on the actual code instead. So what I'm trying to say is: we shouldn't bother growing and optimizing a fixed-point library. We should just stick with fast floats and win in both performance and development time. Sounds reasonable?
  12. In this current case, I'm running a 4-year-old laptop with a cheaper Intel i7 mobile chip. I'm not getting that much performance out of the game, and changing to floats gave at least a 20% performance increase in a regular game. With the 'big combat demo' it might give even more. The main target platforms all have an FPU integrated on the chip, and it's not reasonable to suffer a performance drop because rare embedded systems without an FPU are considered a possible release target. It just makes no sense - even ARM processors (probably the one in your phone too) have an FPU. Floats are fast thanks to hardware support; sure, not all float values can be precisely represented, but that is something that can be accounted for.
  13. Well, as the OP stated, it took a full day or two to bring in the required changes. Furthermore, testing and bugfixes (like OOS) were not included in that, so they will add a lot more time. This is a non-trivial change to the entire engine.
  14. Hmm, that is rather dire. :/ Do you mean some of the core issues that were brought up? You are right about the generated code. Yet some parts convert to fixed point and then to floating point, which is already one step too many. I'm just trying to take a pragmatic angle here: sure, floating-point numbers tend to 'float', but they're pretty good for their overall convenience. If OOS is the main issue, I'm sure a deterministic rounding method (both fast and accurate) can be brought in during hash generation. The performance improvement was measured in a single-player match, "Punjab 1". Nowadays floats are fast on x86/x64, so using the hardware-optimized FSQRT or the SSE RSQRTSS opcode (yeah - sqrt is built into the CPU on x86/SIMD, weird right?) seems like the way to go. Currently even SSE2-capable hardware struggles to run the game, so in that respect, relying on SSE2 is a viable option. SSE2 was introduced with the P4 over 12 years ago. I might have agreed with 'not everyone having SSE2' 6 years ago, but now that's something very rare. Let's face it: SSE2 is on each and every Intel/AMD x86/x64 machine currently reading this post. Sounds reasonable? Tell me what you think about making SSE2 'required'? We could even run tests to see if floats give good results in real-world cases. I think the performance improvement is worth it, even if fixed point is still all the rage on microcontrollers. That's good to know - I didn't know much about the ARM architecture.
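To illustrate the hardware sqrt point: with SSE available, a scalar square root is one instruction away through compiler intrinsics (a sketch; in practice you'd simply let the compiler emit SQRTSS for std::sqrt when SSE code generation is enabled):

    #include <xmmintrin.h>  // SSE intrinsics (SQRTSS / RSQRTSS)

    // Exact single-precision square root - compiles down to one SQRTSS instruction.
    inline float SqrtSSE(float x) {
        return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
    }

    // Fast approximate 1/sqrt(x) via RSQRTSS (roughly 12 bits of precision),
    // handy when a rough vector normalization is good enough.
    inline float RSqrtSSE(float x) {
        return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
    }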
  15. If it were some expensive bottleneck in the code, I'm sure you would have noticed it already. The weight was 'distributed' over the whole program in the hundreds of conversions like fixed::fromInt(1) and fixed::fromFloat(mVariable). I reckon those conversions added up to some significant overhead. You are absolutely right about the possible harmful effects of changing code - that's a given any time you write new code. But in this case the codebase does need some maintenance, and since I have time, it would be foolish to postpone it forever. Yes, my own tests also showed that SSE2 does some extra rounding off the edges. Now is the time to figure out which is worse. For floating-point accuracy, SSE2 would be best, because it rounds out FLT_EPSILON/2.0f. Unfortunately it won't work for embedded systems, since they don't have SSE2. All modern PC CPUs support SSE2, so it won't matter unless embedded systems are a concern. Alas, a lot of people would also want to play this on an iPhone or Android device... that's way more cross-platform than we can handle. Yes, I'm aware of the debugger attaching and of the heap debug mode. That's why I ran the game outside of the IDE. So still 20%. In either case, even if it doesn't hit the cache, a translation system will still be needed, and in that case some minor performance increase will come along with it. Ah, OK. So in that sense, the pathfinder still needs some work done on it.
  16. The problem with multiplayer OOS is a fair point, but I'm still inclined to think it can be solved with floats. Perhaps a client-server architecture should be considered? What do you think? So far changing to floats gave a +20% increase in performance, so I'd say it's noticeable. As far as I know, /fp:precise enforces consistent floating-point rounding to reduce errors in floating-point math. The VC++ workspace is configured with /fp:precise out of the box. Float sqrt does have an advantage there, since it's usually a compiler intrinsic and gets replaced with the fsqrt opcode. That might be one of the reasons float performance was faster for my build. I wouldn't ever consider debug build performance something critical. In that sense I was talking about the rendering loop performance and how it could be improved. Perhaps something like this (pseudocode):

    for(IShaderTechniqueMgr* tech : context->GetTechniques()) {
        for(IMaterialPass* pass : tech->GetMaterialPasses()) {
            renderer->SetTechnique(tech->Technique());
            renderer->SetMaterial(pass->Material());
            for(IModel* model : pass->GetModelsCulled(frustum)) {
                renderer->Render(model);
            }
        }
    }

Ah, that makes sense. I guess I was expecting a more straightforward solution, but the binary format cache should work marvellously. That was just a tiny example - the actual number of strings can get much bigger. And even if the strings don't take that much memory, the answer will always be: cache. A translation module is needed anyway, so I guess it will bring a benefit either way. Yes, I figured having strong references to core components in an entity would help with that. Such as (more pseudocode):

    class UnitEntity : ... {
        CCmpTransform mTransformation;
        CCmpActor mActor;
    };

Something like that. And a quick answer to your next question: no, I don't mean that at all. Rather, 4-5 generalized entity types to make the core of the game run faster, while scriptable components would still extend the entity functionality. It seems to combine the best of the [hardcoded] and [fully dynamic] approaches, while leaving less headache for us programmers in the end. I'm afraid the profiler won't point out a choke point in the code if the whole code is equally busy accessing a whole bunch of components. Even though each access is relatively fast, it adds up over iterations, creating an algorithmic bottleneck. Are you talking about the stalled pathfinder? It would be great if it were completed. You are right that specific implementations all require different aspects, but they all fit into the same kind of BSP scheme, which can be used as a base interface for the specific implementations. So all of the data could sit in a quadtree for a 'rough match' lookup, and after narrowing it down, a more specific algorithm can be used for the actual detection. Perhaps this could be done during the actual movement update? This would mean a list of "EntitiesInRange" is attached to every entity, and when two units come into range of each other, they share each other's pointers. This would avoid dangling pointers and would also instantly notify a unit when someone leaves its range. Using the quadtree structure, when a unit enters a new grid cell it can do a quick update against the entities already in that cell, to see if anyone is in range. After that initial entry, the squared distances are updated on each sample where the unit moves. Something like this could work, and possibly faster than comparing two large arrays.
I think right now is a bit too early to answer that question, but I'm inclined to think that moving towards a well implemented C++ design with a scriptable interface would work out best.
  17. Can you give a few screenshots of trees to show the difference?
  18. At this time though I reckon it's better to work on the current code and see where that goes. Changing to Ogre3D is something for (perhaps) part 2... Right now I'll start working on the resource files.
  19. Yes, though MyGUI for Ogre3D is a very good implementation and I doubt we should roll our own. Those folders would probably disappear entirely, replaced by Ogre3D. Ogre3D has a COLLADA plugin, so no changes there. Yes, this is the main part where the integration would occur. This is the bridge between modules: [simulation] <-> [ps engine] <-> [ps graphics]. Luckily it means the graphics can be replaced with changes to [ps engine] only. Ogre3D has a very mature maths library. I already replaced the ps maths vectors with Vector2, Vector3 and Vector4 (had to roll my own implementations), but Ogre3D has a much more mature version, with maths for interpolation and optimized ray intersection. Now that is cool. I think you are right. A lot of these arguments should be put into a list describing the benefits and the amount of code required to change. I'll get right on it.
  20. It does seem so. I have substantial time today to assess the amount of code that would require reimplementing if such a switch were made. I wouldn't jump into something this drastic without analyzing the current state - but nevertheless, the graphics engine does require a major overhaul, so it's something I definitely need to take up! Yes, for example in the source of Doom 3. And all of these games have managed with it. Having a few small places where float imprecision is taken into account is much, much better than using CFixed everywhere. The floating-point standard (IEEE 754) is now over 28 years old and holding strong. We are aware of the rounding errors, and if all else fails, just compile with /fp:precise. Yes, it also brings out the reasons why fully deterministic systems like that don't work - even a slight change in version number or entity tags will break sync. IMO a different sync model should be considered in this case, deviating towards a more network-intensive solution. The article also notes that using /fp:precise works (as stated by several compiler vendors too: MSVC++, GCC) and provides fully deterministic results. Just so you know, 0AD already uses /fp:precise; even though precise floats are slower (since they are rounded), they still perform much better than CFixed<>. Right now the main branch of the SVN will stay as it is regardless, since these changes are too big to integrate into the game immediately. And as mentioned, multiplayer sync may be broken. I'd say we should just forget about CFixed<> and use /fp:precise. It's better to use a language built-in feature than to design a library that slows the game down considerably.
  21. Even though that is a rather rare edge case, something like that can be circumvented by simply applying a special rounding function during hash generation, which converts the float to an integer:

    inline __int64 roundedFloatHash(float f) {
        return (__int64(f * 100000.0f) >> 2);
    }
    // 0.00154 -> 154 -> 38
    // 0.00523 -> 523 -> 130

In either case, I don't see why the game engine has to suffer a 20% performance loss when float hashing could be implemented on demand. It is a matter of precision that can easily be decided on with the world size in mind. Plenty of games use methods like this - it's a well-used pattern.
  22. The quickest hack to increase performance would be to put the bucket maps and vectors into ShaderModelRendererInternals. That would save at least a bit on memory allocation overhead... heh.
  23. That is a valid point that I didn't consider. Though in this case, the float can be rounded to a nearby value during serialization (which happens before the string is hashed...). That method is utter nonsense though. The component itself should have a GetHash() method which provides a reliable hash that can be used to compare two component objects. I see no reason to revert back to fixed point. Bad design choices spur an onslaught of derivative Whiskey-Tango-Foxtrot, leading to decreased maintainability and performance.
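Something along these lines is what I mean by GetHash() (a sketch; the component name, its fields and the quantization factor are purely illustrative):

    #include <cstdint>

    // Hypothetical position component - real components obviously look different.
    struct CCmpPositionExample {
        float x, z;

        // Quantize floats before mixing them into the hash, so peers whose values
        // differ only by tiny float noise still produce the same hash.
        static uint64_t Quantize(float f) {
            return (uint64_t)(int64_t)(f * 1000.0f); // e.g. millimetre-level precision
        }

        uint64_t GetHash() const {
            uint64_t h = 0xcbf29ce484222325ULL;      // FNV-1a 64-bit offset basis
            uint64_t vals[2] = { Quantize(x), Quantize(z) };
            for(uint64_t v : vals) {
                for(int i = 0; i < 8; ++i) {
                    h ^= (v >> (i * 8)) & 0xFF;      // fold in one byte at a time
                    h *= 0x100000001b3ULL;           // FNV-1a prime
                }
            }
            return h;
        }
    };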
  24. The opcodes and precision might be different depending on the CPU (FPU?) of the device or the compiler's software float library, but the overall usability of floating-point numbers remains the same. Even though errors do occur with floats occasionally, I included proper methods for testing float equality, something similar to:

    inline bool Equal(float a, float b) {
        return fabs(a - b) < M_EPSILON;
    }

where M_EPSILON is a very small float (e.g. 0.000000001f). I've talked to Philip about this; unfortunately he doesn't have time to finish it. I wouldn't benefit much from his solution either, since it would take me as long to implement a new one as to completely understand the existing one. This is all something that can be fixed with a custom quadtree implementation. An entity would have a pointer to a quadtree cell and vice versa. This allows an entity to get a list of pointers to the objects that are in the same cell. Notice how this suddenly, without any complex lookup whatsoever, shrinks a huge list (like 200) down to just a handful (usually never more than 8). If needed, the parent cell can be checked, and so on.
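A rough sketch of the entity <-> cell linkage I have in mind (a uniform grid rather than a full quadtree, and all the names are made up):

    #include <vector>

    struct GridCell;

    struct Entity {
        float x, z;
        GridCell* cell; // back-pointer to the cell we currently occupy (nullptr at start)
    };

    struct GridCell {
        std::vector<Entity*> entities; // everyone currently inside this cell
    };

    class WorldGrid {
        std::vector<GridCell> cells;
        int width, height; // grid dimensions in cells
        float cellSize;    // world units per cell
    public:
        WorldGrid(int w, int h, float size)
            : cells(w * h), width(w), height(h), cellSize(size) {}

        GridCell& CellAt(float x, float z) {
            int cx = (int)(x / cellSize), cz = (int)(z / cellSize);
            return cells[cz * width + cx]; // sketch: no bounds clamping
        }

        // Called whenever an entity moves; keeps the entity<->cell links up to date.
        void Update(Entity& e) {
            GridCell& now = CellAt(e.x, e.z);
            if(e.cell == &now)
                return; // still in the same cell, nothing to do
            if(e.cell) {
                std::vector<Entity*>& v = e.cell->entities; // unregister from the old cell
                for(size_t i = 0; i < v.size(); ++i)
                    if(v[i] == &e) { v[i] = v.back(); v.pop_back(); break; }
            }
            now.entities.push_back(&e); // register in the new cell; range checks now only
            e.cell = &now;              // need to look at now.entities (and maybe neighbours)
        }
    };

On entering a new cell the entity only has to distance-check the handful of entities already registered there, which is exactly the 'quick update' described above.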
  25. In that regard, I'll have to ease myself into Git and slowly start poking around the component logic. When I'm certain I have the full picture, I'll make the required changes. It probably means I'll have to implement some parts inefficiently in order to get the code out faster (for example the JavaScript support). I'm currently available most of the day, so any ideas / recommendations / additional input is always welcome. Right now I'll try to fork 0ad and get my source under version control.