Sunday, September 6, 2015

Hunting of memory corruption

This is the story of hunting a memory corruption bug and what I learned from it.

The story started when, after one of many big changes in the component system, I needed to re-save all maps and object templates :] Because I'm a lazy bastard, some time ago I created a process that loads each of them and re-saves it, all after pressing one "magic" button. It's a great idea and at the same time a real stress test of how the object system works.

Sadly, after all the time I hadn't used it, the whole process failed. And in the worst possible way: memory corruption. Everybody who has had occasion to deal with at least one non-trivial case knows that these are not nice bugs, especially in a multi-threaded environment :/ In my case normal corruption wasn't enough, and sometimes even my callstack was corrupted. Fun :|

So my hunt began, because I couldn't leave this issue without a fix.

1st approach: Debugger
The first approach was the most trivial one: check what is happening when the app crashes. Simple and useful, but not in this case. The debugger stops when it's already too late. I tried to find some pattern in the corrupted memory, but there was nothing there :/ I failed.

2nd approach: Enabling all my memory debugging options
Over all these years I have accumulated some memory debugging tools: full tracing of what memory is allocated, memory guards, validation that memory is not released a second time, and full allocation/release logging. Final result: another defeat :/

None of these tools helped. I only found that my guard sometimes got changed from "WREGUARD" to "WREGWARD" :/ and that it was happening in TiXML node memory (I'm using XML right now for storing the RAW version of a map, which is used in the editor. But let's return to this topic in the next approach).
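
For anyone who hasn't used memory guards, here is a minimal sketch of the idea, assuming a plain malloc-based allocator. The names guardedAlloc, guardsIntact and guardedFree are made up for illustration and this is not the engine's actual code; only the "WREGUARD" pattern comes from the story above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Guard pattern written before and after every user block.
static const char kGuard[8] = {'W', 'R', 'E', 'G', 'U', 'A', 'R', 'D'};

// Hypothetical header stored in front of the user block.
// On targets where the struct has trailing padding, a tiny underrun
// could land in the padding instead of the front guard.
struct alignas(std::max_align_t) GuardHeader {
    std::size_t size;  // size requested by the caller
    char front[8];     // front guard
};

// Allocates `size` user bytes surrounded by guards.
void* guardedAlloc(std::size_t size) {
    auto* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(GuardHeader) + size + sizeof(kGuard)));
    if (!raw) return nullptr;
    auto* header = reinterpret_cast<GuardHeader*>(raw);
    header->size = size;
    std::memcpy(header->front, kGuard, sizeof(kGuard));
    unsigned char* user = raw + sizeof(GuardHeader);
    std::memcpy(user + size, kGuard, sizeof(kGuard));  // rear guard
    return user;
}

// Returns true when both guards still hold the original pattern.
bool guardsIntact(const void* userPtr) {
    auto* user = static_cast<const unsigned char*>(userPtr);
    auto* header =
        reinterpret_cast<const GuardHeader*>(user - sizeof(GuardHeader));
    return std::memcmp(header->front, kGuard, sizeof(kGuard)) == 0 &&
           std::memcmp(user + header->size, kGuard, sizeof(kGuard)) == 0;
}

// Frees the block and reports whether the guards were intact at release.
bool guardedFree(void* userPtr) {
    if (!userPtr) return true;
    const bool ok = guardsIntact(userPtr);
    std::free(static_cast<unsigned char*>(userPtr) - sizeof(GuardHeader));
    return ok;
}
```

A check like this only tells you that somebody wrote past a block, not who did it or when, which matches what happened here: the guard showed the damage but not the culprit.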

3rd approach: Checks on memory release (desperation begins somewhere here)
Microsoft's _CrtCheckMemory() is such a nice thing. But after doing that I can say: I was stupid. XML plus a full memory check on every memory release. Heeeee heeee heee hee he (laugh of a crazy person). I started the process, started playing Uncharted 2 on PS3, and more than an hour later it was still releasing the first map.

This was as bad an idea as using XML for the project :/ My to-do task for replacing it with JSON gained higher priority after this.

4th approach: Decoding the callstack from stack memory
Yep, I was so desperate that I learned how to decode a callstack from memory for win32 (in fact it's really easy :D Right now I know how to decode PowerPC and x86 callstacks, so only x64 is left).
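
To show why it is easy, here is a rough sketch of walking an x86 frame-pointer chain over a snapshot of stack memory. It assumes 32-bit code built with frame pointers (the standard push ebp / mov ebp, esp prologue), so that [ebp] holds the caller's saved EBP and [ebp+4] holds the return address. The function name walkFrames and its parameters are hypothetical, not taken from any real tool.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// stackWords is a copy of raw stack memory; stackWords[0] lived at address
// stackBase. ebp is the frame pointer value at the moment of the snapshot.
// Returns the return addresses found, innermost frame first.
std::vector<std::uint32_t> walkFrames(
    const std::vector<std::uint32_t>& stackWords,
    std::uint32_t stackBase,
    std::uint32_t ebp,
    std::size_t maxFrames = 64) {
    std::vector<std::uint32_t> returnAddresses;
    const std::uint64_t stackEnd =
        std::uint64_t(stackBase) + stackWords.size() * 4;

    while (returnAddresses.size() < maxFrames) {
        // Both [ebp] and [ebp+4] must be aligned and inside the snapshot.
        if (ebp < stackBase || ebp % 4 != 0 ||
            std::uint64_t(ebp) + 8 > stackEnd) {
            break;
        }
        const std::size_t index = (ebp - stackBase) / 4;
        const std::uint32_t savedEbp = stackWords[index];
        returnAddresses.push_back(stackWords[index + 1]);

        // The stack grows down, so each caller frame sits at a higher
        // address. Anything else (including 0) ends the chain.
        if (savedEbp <= ebp) break;
        ebp = savedEbp;
    }
    return returnAddresses;
}
```

The return addresses still have to be mapped to function names with a map file or debug symbols. With a corrupted stack the chain simply breaks early, which is itself a hint about where the damage starts.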

After reading the corrupted stack I learned nothing more than I already knew: releasing of XML again :/

5th approach: Luck
This one was in fact funny, because I noticed that very often the memory being released was not managed by my memory tools. After a short check I found out the sad truth: allocating memory in one module and releasing it in another is not the best idea :/
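
One possible way to catch this class of bug early is to stamp every allocation with the id of the module that made it and refuse a release from anyone else. The sketch below is only an illustration of that idea under my own assumptions: ModuleAllocator, OwnerHeader and the numeric module ids are invented names, and it only works for pointers that came from one of these allocators in the first place.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical header placed in front of every user block.
struct alignas(std::max_align_t) OwnerHeader {
    std::uint32_t moduleId;  // which module allocated this block
};

// One instance per module (for example per DLL).
class ModuleAllocator {
public:
    explicit ModuleAllocator(std::uint32_t moduleId) : moduleId_(moduleId) {}

    void* allocate(std::size_t size) {
        void* raw = std::malloc(sizeof(OwnerHeader) + size);
        if (!raw) return nullptr;
        static_cast<OwnerHeader*>(raw)->moduleId = moduleId_;
        return static_cast<unsigned char*>(raw) + sizeof(OwnerHeader);
    }

    // True when the block was allocated by this module.
    bool owns(const void* userPtr) const {
        return userPtr != nullptr && headerOf(userPtr)->moduleId == moduleId_;
    }

    // Frees the block only if this module owns it; otherwise reports
    // the foreign release by returning false and leaves the block alone.
    bool release(void* userPtr) {
        if (!owns(userPtr)) return false;
        std::free(const_cast<OwnerHeader*>(headerOf(userPtr)));
        return true;
    }

private:
    static const OwnerHeader* headerOf(const void* userPtr) {
        return reinterpret_cast<const OwnerHeader*>(
            static_cast<const unsigned char*>(userPtr) - sizeof(OwnerHeader));
    }

    std::uint32_t moduleId_;
};
```

In a real build the failed release would more likely assert or log a callstack than return false. A common alternative is to make the allocating module export its own destroy function, so the pointer always goes back to the heap it came from.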

Another funny side story:
To be sure that I had removed the issue I even installed Valgrind on my Linux computer, but it was crashing in the initialization of libGlew. First thought: Valgrind is probably doing something that makes GLEW crash. Because I didn't want to spend too much time on it, I just ignored it.

This was a mistake. One simple check, running the game without Valgrind, would have shown me that the problem was in changes I had made on Windows, not in the tool itself. But well :/ A day later I figured this out, and I will soon run a full check of the editor using Valgrind (this is one of the reasons why it's worth having a Linux build :) )

Summary 
This one issue showed me so many problems in my code. After it I properly secured my code against releasing memory from another module, and guess what? I found more similar problems :/

Right now everything I found is fixed, but this is not the end. I decided that after animations there will be a good occasion to spend some time on improving my memory system: make it quicker, allow better control over it, and add more tools for checking stuff (just in case).

Among the things I plan to do, I will for sure switch to explicit use of dlmalloc and enable a memory tag stack:
 MemStack memstack(EMemTag::System);
 int32* memory = wrNew int32 [32];
Which will coexist with my existing system:
 int32* memory = wrNewEx(EMemType::System) int32 [32];
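
To make the tag stack idea concrete, here is a minimal sketch of how such a scoped tag could work, assuming a per-thread stack and RAII push/pop. This is not the engine's implementation: the enum values Default and Animation, the current() function and the thread_local storage are assumptions for illustration; only the names MemStack and EMemTag::System appear above.

```cpp
#include <cstdint>
#include <vector>

// Only System comes from the post; the other values are placeholders.
enum class EMemTag : std::uint8_t { Default, System, Animation };

// Pushes a tag for the lifetime of the object and pops it on scope exit.
class MemStack {
public:
    explicit MemStack(EMemTag tag) { stack().push_back(tag); }
    ~MemStack() { stack().pop_back(); }

    MemStack(const MemStack&) = delete;
    MemStack& operator=(const MemStack&) = delete;

    // Tag an allocator would attach to allocations made right now.
    static EMemTag current() {
        return stack().empty() ? EMemTag::Default : stack().back();
    }

private:
    // One stack per thread, so tags pushed on one thread never leak
    // into allocations made on another.
    static std::vector<EMemTag>& stack() {
        thread_local std::vector<EMemTag> tags;
        return tags;
    }
};
```

An allocation macro in the style of wrNew could then read MemStack::current() instead of taking the tag as an explicit argument, which is what lets both styles coexist.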
Do you have any recommendations for what you would want to see in a memory system? How would you trace stuff like this? Do you maybe have some nice secret tool that can help? Or maybe some nice trick up your sleeve that could help me with future issues?

Greg
