A failed attempt at identifying a developer using data in the PE file format

01 Jan 2014

Summary

This was a failed idea, but I feel it's important to document your failures to save others time. I tried to see if .exe files left any evidence of who the creator was. Specifically, I wanted to know if the GUID stored with the PDB reference identified the machine the binary was compiled on. It does not, as best as I am able to determine. This post hopefully saves others the trouble and lets them fact check me.

Introduction

Back in October, someone posted how they were able to compile a matching binary for TrueCrypt for Windows. What interested me most was the note in there that each time a binary is compiled, it is given a unique GUID that changes for every compilation, in what is known as the RSDS section, which is basically a reference to PDB debugging info.

I knew every PE file (.exe file) for Windows ends up being different due to compilation/link time stamps, but I hadn't thought about what else might be in there, and this looked promising. I was reminded of how the author of the Melissa virus was discovered due to a GUID in the .doc file for the virus matching the GUID used in other files he was known to have created. In that case the giveaway was simply the same GUID showing up in several files, but GUIDs are dangerous in general for those practicing OPSEC. Version 1 GUIDs (often called Type 1) are not really random at all: they are built from the MAC address of the system and a timestamp, so if the GUID in the PE file happened to be a version 1 GUID, then you'd be able to know the MAC address for the developer's system. Perhaps something similar happened for these GUIDs in PE files.

See RFC 4122 for more info on GUID formats.
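
To make the concern concrete, here is a minimal Python sketch of what a version 1 GUID gives away. It only uses the standard uuid module; the function name describe_guid and the example MAC address are mine, made up for illustration.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100-nanosecond intervals from this date (RFC 4122).
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def describe_guid(text):
    """Return what a GUID reveals about its origin, based on its version field."""
    g = uuid.UUID(text)
    info = {"version": g.version}
    if g.version == 1:
        # The 48-bit node field is normally the MAC address of the machine.
        info["mac"] = ":".join(
            f"{(g.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8)
        )
        # g.time is in 100 ns units, so divide by 10 to get microseconds.
        info["created"] = UUID_EPOCH + timedelta(microseconds=g.time // 10)
    return info

# Hypothetical example: a version 1 GUID generated with a made-up MAC address.
example = uuid.uuid1(node=0x001122334455)
print(describe_guid(str(example)))
```

A version 4 GUID run through the same function yields only the version number, because its remaining bits are random.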

Good OPSEC tradecraft for developers entails compiling binaries without the debug references. For example, the sloppy developer who linked together the pieces of Stuxnet forgot to do this, resulting in the full path to the project folder ("b:\myrtus\src\objfre_w2k_x86\i386\guava.pdb") being written to the binary. That naming scheme helped correlate a few pieces of that puzzle. Sometimes the developer uses an identifying username for their dev system and this ends up in that path string.

Only a small subset of the developers of the world need to practice good OPSEC. It's only those developers that wish to remain anonymous, such as malware developers, authors of tools to help citizens in naughty countries, and a few others.

So anyway, my hypothesis was that this GUID may be used to identify developers in some way. Given that a security company can't do much with a MAC address to identify a system in the world, I was more interested in whether a nation state with greater resources could do such a task.

Every once in a while you have to question your assumptions in life and think "What if what I thought was true was false, and how can I confirm that?" (that the UUID might identify a system) and "What if an adversary had far greater resources than I can imagine?" (the ability to find a system given a MAC address). Also, every once in a while you should just reverse some random thing to see if you can figure out anything interesting about it, and if nothing else, it keeps your skills sharp.

What I'm looking for

In Visual Studio 2005 and up, all builds of binaries include this reference to the PDB file and a random GUID within the RSDS section. Even Release builds include this. You can change a project setting though to disable this.
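
If you want to see what your own binaries carry, the following is a rough Python sketch that pulls the GUID, age and PDB path out of a file. It assumes the usual PDB 7.0 CodeView layout (the 4-byte "RSDS" signature, a 16-byte GUID, a 4-byte age, then a null-terminated path) and takes the shortcut of scanning the raw bytes for the signature rather than walking the PE debug directory properly. The name find_rsds is mine, and treating the path as UTF-8 is an assumption.

```python
import struct
import uuid

def find_rsds(data):
    """Scan raw PE bytes for an 'RSDS' record; return (guid, age, pdb_path) or None."""
    pos = data.find(b"RSDS")
    while pos != -1:
        # The path starts 24 bytes in: 4 (signature) + 16 (GUID) + 4 (age).
        end = data.find(b"\x00", pos + 24)
        if end != -1:
            # The GUID is stored in Windows' mixed-endian layout, hence bytes_le.
            guid = uuid.UUID(bytes_le=data[pos + 4:pos + 20])
            (age,) = struct.unpack_from("<I", data, pos + 20)
            path = data[pos + 24:end].decode("utf-8", errors="replace")
            # Crude sanity check to skip stray "RSDS" byte sequences.
            if path.lower().endswith(".pdb"):
                return guid, age, path
        pos = data.find(b"RSDS", pos + 1)
    return None

def rsds_from_file(filename):
    """Convenience wrapper: read a file from disk and look for the record."""
    with open(filename, "rb") as f:
        return find_rsds(f.read())
```

Calling rsds_from_file on a Release build like the one below should return the GUID together with the full .pdb path, and None for a binary linked without debug info.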

Reversing

I used Visual Studio 2013 and created a simple project. I turned on procmon, built the project, and recovered the build command. I then brought up a command prompt, ran vcvarsall.bat to set up a Visual Studio environment for that prompt, ran the same build command, and confirmed I could build binaries without using Visual Studio, as this just simplifies things for me. In this case, I had a project called recursive at C:\Users\user\Documents\Visual Studio 2013\Projects\recursive, and after running "C:\Program Files\Microsoft Visual Studio 12.0\VC\vcvarsall.bat", I was able to build the project using the command:

"C:\Program Files\Microsoft Visual Studio 12.0\VC\bin\link.exe" /ERRORREPORT:QUEUE /OUT:"C:\Users\user\Documents\Visual Studio 2013\Projects\recursive\Release\recursive.exe" /INCREMENTAL:NO /NOLOGO kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /MANIFEST /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /manifest:embed /DEBUG /PDB:"C:\Users\user\Documents\Visual Studio 2013\Projects\recursive\Release\recursive.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:"C:\Users\user\Documents\Visual Studio 2013\Projects\recursive\Release\recursive.lib" /MACHINE:X86 /SAFESEH recursive\Release\recursive.obj recursive\Release\stdafx.obj

Using WinDbg, I attached to cmd.exe and set a breakpoint on CreateProcessW (I also set one on CreateProcessA, but only the wide-char version ever gets used). Then I ran my link command. I stepped out of the CreateProcessW call and attached to the link process in a separate WinDbg instance.

By looking at the different functions (you can do this with IDA, or in WinDbg using x link!*), setting breakpoints, and looking in IDA at where the string "RSDS" is used, I managed to figure out the following:

  • The string "RSDS" is written by IMAGE::WriteDebugInfo but the GUID has already been generated at some earlier point.
  • IMAGE::BuildImage calls IMAGE::Pass2 which calls IMAGE::WriteDebugInfo
  • The GUID ends up at ecx+22c, where ecx maintains some sort of structure throughout the entire link process.
  • By setting a break on any writes to the address for ecx+22c, I figured out the GUID gets written by link!DBG_QuerySignature2
  • The GUID is retrieved by making an RPC call into mspdbsrv.exe in the function mspdbcore!PDB1::QuerySignature2, but the GUID already has been generated somewhere in mspdbsrv.exe by the time this is called.
  • I set a breakpoint on mspdbcore!PDB1::PDB1, expecting this to be an important constructor for my needs. I had noticed that the GUID ended up inside a memory allocation of size 0x26000, so my super sloppy idea was to watch for an allocation of that size and set a break-on-write at the offset within it where I expected the GUID to be written.
  • That idea worked, and my breakpoint hit within the function RPCRT4!rc4, which applies the RC4 encryption algorithm from within RPCRT4!GenerateRandomNumber, which is called from RPCRT4!UuidCreate, which is exported by that DLL (RPCRT4.dll). I'm kicking myself for not reading the imports/exports of the loaded modules earlier, because that would have saved me TONS of time on this.

Identifying a system with UuidCreate?

According to the MSDN, UuidCreate "generates a UUID that cannot be traced to the ethernet address of the computer on which it was generated. It also cannot be associated with other UUIDs created on the same computer."

However, some more research led to this page, which claims that UuidCreate worked differently on Windows 95/98/NT4! It's very unlikely, though, that someone is using Visual Studio 2005 or a more recent version on Windows 98 or earlier.

I also saw that ReactOS had coded this function (in reactos/dll/win32/rpcrt4/rpcrt4_main.c) to call RtlGenRandom, but this isn't what was happening in the Windows code I was tracing. UuidCreate calls an internal function called GenerateRandomNumber, which does have the ability to call RtlGenRandom (referred to in IDA and online in some places as SystemFunction036), but this call does not get exercised. Instead there is a call to rc4_safe_select and then rc4_safe.
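
For readers who haven't looked at RC4 before, here is a minimal Python sketch of the textbook algorithm being used as a byte generator. This is only the generic RC4 keystream, not a reconstruction of rc4_safe_select or rc4_safe: I did not work out how Windows seeds or selects its RC4 state, and the key below is an arbitrary placeholder.

```python
def rc4_keystream(key, n):
    """Return n bytes of textbook RC4 keystream for the given key (bytes)."""
    # Key-scheduling algorithm: permute 0..255 under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation algorithm: emit one byte per step.
    out = bytearray()
    i = j = 0
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)

# 16 bytes is the size of a GUID; the key here is a made-up placeholder.
print(rc4_keystream(b"placeholder seed", 16).hex())
```

The output is fully determined by the internal state, which is why an attack on such a generator needs a view of that state or several of its outputs rather than a single value.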

If you know Russian, or can stumble through Google's translation, you can read http://www.gotdotnet.ru/blogs/denish/1965/, which has an amazingly accurate reversing of UuidCreate and a cryptanalysis of it. The attack seems to require dumping the memory of the process and having multiple UUIDs from it. Given that we will only have a single UUID (and obviously no memory dump of the developer's mspdbsrv.exe process), I've decided that I'm not going to be able to figure out any way to identify the developer's system.

Conclusion

I don't believe the UUID in the RSDS/PDB section of a PE file can be used to identify the developer for that binary. However, due to the file path to the .pdb file existing there (which identifies the project name and often the username), a developer practicing OPSEC should set their linker settings to avoid having the RSDS section in their binary.