“A condition at an interface under which more input can be placed into a buffer or data holding area than the capacity allocated, overwriting other information.
Attackers exploit such a condition to crash a system or to insert specially crafted code that allows them to gain control of the system.”
A buffer overflow happens when a process attempts to store data beyond the limits of a fixed-size buffer. The excess data overwrites adjacent memory locations, which could hold other variables, parameters or program control flow data.
buffers can be located on the stack, heap, or data section of a process
example
vulnerable code:
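#include <stdio.h>
#include <string.h>

#define FALSE 0
#define TRUE 1

/* hypothetical helper: copies a known tag into s (the runs below show "START") */
void next_tag(char *s) { strcpy(s, "START"); }

int main(int argc, char *argv[]) {
    int valid = FALSE;
    char str1[8];
    char str2[8];

    next_tag(str1);   // populate str1 with a known value to match
    gets(str2);       // UNSAFE: reads user input into a fixed-size buffer (gets() was removed in C11)
    if (strncmp(str1, str2, 8) == 0)
        valid = TRUE;
    printf("buffer1: str1(%s), str2(%s), valid(%d)\n", str1, str2, valid);
    return 0;
}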
(assuming that str1 and str2 are adjacent in memory, with str1 located immediately after str2)
the buffers are of fixed size, and gets() reads input until it encounters a newline character or end-of-file, so it performs no bounds checking
if the user provides an input string longer than 8 bytes, the excess data overflows str2 and overwrites adjacent variables on the stack (e.g. str1 or, with a long enough input, the function's return address)
execution:
## valid input
$ cc -g -o buffer1 buffer1.c
$ ./buffer1
START
buffer1: str1(START), str2(START), valid(1)
## 14-byte string, corrupts str1
$ ./buffer1
EVILINPUTVALUE
buffer1: str1(TVALUE), str2(EVILINPUTVALUE), valid(0)
## targeted overflow: the input is 16 bytes long and its second half repeats the first half. the second half overwrites the initial value of str1, so the comparison matches and valid is forced to 1
$ ./buffer1
BADINPUTBADINPUT
buffer1: str1(BADINPUT), str2(BADINPUTBADINPUT), valid(1)
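printf() prints str1 and str2 up to the first null terminator ('\0'). Since an overlong input leaves no terminator inside str2's 8 bytes, printing str2 runs on into str1. This is why str2 appears to hold the whole input.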
(if, for example, str1 held a user's stored password and str2 the password typed at login, this overflow would let an attacker log in without knowing the password)
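The effect of the overflow can be reproduced without undefined behaviour by modelling the two adjacent buffers as the two halves of one array. This is a minimal sketch that assumes str2 sits directly below str1 in memory. The function simulate_buffer1 and the array layout are hypothetical, and an unchecked memcpy() stands in for gets().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the buffer1 layout: str2 occupies bytes 0..7 and
 * str1 bytes 8..15 of one array. Spare room after str1 absorbs the
 * terminator written by an overlong input. */
enum { BUF = 8, FRAME = 2 * BUF + 8 };

/* Copies input into str2 with no bounds check (as gets() would), then
 * compares the buffers as buffer1 does. Returns the value of 'valid' and
 * reports str1's final contents in str1_out. */
int simulate_buffer1(const char *input, char str1_out[BUF + 1]) {
    char frame[FRAME];
    memset(frame, 0, sizeof frame);
    char *str2 = frame;           /* lower address */
    char *str1 = frame + BUF;     /* immediately after str2 */

    strcpy(str1, "START");        /* stands in for next_tag(str1) */
    size_t n = strlen(input) + 1; /* input plus its terminator */
    assert(n <= FRAME);           /* keep the simulation itself in bounds */
    memcpy(str2, input, n);       /* unchecked write: may spill into str1 */

    memcpy(str1_out, str1, BUF);
    str1_out[BUF] = '\0';
    return strncmp(str1, str2, BUF) == 0;
}
```

With the three inputs from the transcript, it returns 1, 0 and 1, and leaves str1 holding START, TVALUE and BADINPUT respectively.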
attacks
To exploit a buffer overflow, an attacker needs:
to identify a (buffer overflow) vulnerability in some program that can be triggered with data under their control
to understand how the buffer is stored in memory and determine potential for corruption
Vulnerable programs can be identified by inspecting the program source code, tracing the execution as they process oversized input, or using tools such as fuzzing (a process where random data is passed to an application in the hopes that an anomaly will be detected) to automatically identify potentially vulnerable programs.
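The idea behind fuzzing can be sketched in a few lines. The sketch feeds random inputs to a target and flags an anomaly whenever bytes just past the target buffer change. This is a toy under stated assumptions. The target vulnerable_copy, the sentinel zone and fuzz_for_overflow are all hypothetical. Real fuzzers watch for crashes or sanitizer reports rather than sentinels.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { CAP = 16, SENTINEL = 0 };

/* Hypothetical target with a length-trusting bug: copies len bytes into a
 * buffer that only has room for CAP. */
static void vulnerable_copy(char *dst, const char *src, size_t len) {
    memcpy(dst, src, len);
}

/* Minimal random fuzz loop. Each iteration builds an input of random length
 * (0 .. 2*CAP-1) made of non-zero bytes, runs the target on a buffer that is
 * followed by sentinel bytes, and flags an anomaly if any sentinel changed.
 * Returns the first anomalous input length, or 0 if none was seen. */
size_t fuzz_for_overflow(unsigned seed, int iterations) {
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        char input[2 * CAP];
        size_t len = (size_t)(rand() % (2 * CAP));
        for (size_t j = 0; j < len; j++)
            input[j] = (char)(1 + rand() % 255);   /* never equal to SENTINEL */

        char region[2 * CAP];                      /* CAP-byte buffer + sentinel zone */
        memset(region, SENTINEL, sizeof region);
        vulnerable_copy(region, input, len);

        for (size_t k = CAP; k < sizeof region; k++)
            if (region[k] != SENTINEL)
                return len;
    }
    return 0;
}
```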
stack buffer overflows (stack smashing)
Happen when the target buffer is on the stack.
stack frame
section of the computer’s call stack that holds data for a single function call, including its arguments, local variables, and the return address to know where to resume execution after the function finishes.
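A stack frame can be pictured as a struct laid out from the local buffer upwards. This sketch assumes a simplified 32-bit frame with a 16-byte buffer, the saved frame pointer and the return address. The struct frame_model is hypothetical, and real compilers may add padding, saved registers or a canary. The offset of the return address is the number of bytes an overflow must write before it reaches the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit stack frame model, lowest address first. */
struct frame_model {
    char     inp[16];    /* local buffer */
    uint32_t saved_fp;   /* caller's frame pointer */
    uint32_t ret_addr;   /* where execution resumes after the function returns */
};

/* Bytes an overflow of inp must write before it reaches ret_addr. */
size_t bytes_to_return_address(void) {
    return offsetof(struct frame_model, ret_addr);
}
```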
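#include <stdio.h>

void hello(char *tag) {
    char inp[16];                          // fixed-size buffer of 16 bytes
    printf("Enter value for %s: ", tag);
    gets(inp);                             // VULNERABILITY: no bounds checking
    printf("Hello your %s is %s\n", tag, inp);
}

int main(void) {
    hello("name");   // produces the "Enter value for name:" prompt seen in the runs
    return 0;
}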
execution:
$ cc -g -o buffer2 buffer2.c
$ ./buffer2
Enter value for name: Bill and Lawrie
Hello your name is Bill and Lawrie
## the input overflows and keeps writing across the stack frame until it overwrites the return address (and potentially other data); this causes the segmentation fault
$ ./buffer2
Enter value for name: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Hello your name is XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Segmentation fault (core dumped)
## the perl script takes a string of hexadecimal bytes, converts them into binary data and pipes that binary data directly into ./buffer2
## notice that the program calls hello() twice, and the second call is completely unintended: the attacker has injected a return address that makes the program jump back to the start of hello()
$ perl -e 'print pack("H*", "41424344454647485152535455565758616263646566676808fcffbf948304080a4e4e4e4e4e0a")' | ./buffer2
Enter value for name: Hello your Re?pyyJuEa is ABCDEFGHQRSTUVWXYZabcdefguyuy
Enter value for Kyyu: Hello your Kyyu is NNNN
Segmentation fault (core dumped)
the attacker’s attempt (if any) to execute their payload ultimately failed, as the execution flow eventually hit an invalid memory address or attempted an illegal operation
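The attack string passed to perl's pack() can be assembled by hand. Its first 33 bytes appear to be 24 bytes of padding, a replacement saved frame pointer (0xbffffc08), the new return address (0x08048394) and a newline that ends the gets() call. This reading assumes a 32-bit little-endian (x86) target. The helper names build_payload and put_le32 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 32-bit value in little-endian byte order, as on x86. */
static void put_le32(unsigned char *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Builds padding + fake saved frame pointer + new return address + '\n'.
 * Returns the payload length, or 0 if out is too small. */
size_t build_payload(unsigned char *out, size_t cap, const char *padding,
                     uint32_t fake_fp, uint32_t new_ret) {
    size_t pad = strlen(padding);
    if (cap < pad + 9)
        return 0;
    memcpy(out, padding, pad);
    put_le32(out + pad, fake_fp);
    put_le32(out + pad + 4, new_ret);
    out[pad + 8] = '\n';   /* gets() stops reading at the newline */
    return pad + 9;
}
```

Calling it with the padding "ABCDEFGHQRSTUVWXabcdefgh" and the two addresses above reproduces the hex string up to its first 0a. The remaining bytes (4e... 0a) in the transcript are the input for the second, unintended call to hello().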
shellcode
Shellcode is code supplied by the attacker (often saved in the buffer they want to overflow). Traditionally, it transfers control to a shell.
Shellcode is usually machine code, specific to the processor and OS, so writing it by hand requires good assembly skills. More recently, a number of sites and tools have been developed to automate the process.
it must be:
self-contained: it cannot rely on external shared libraries or system files.
position-independent code (PIC): it must run correctly no matter where it is located in the process’s memory space.
since the stack address can shift, the shellcode needs to calculate its own location at runtime
no null bytes (\x00): string-copying functions such as strcpy() treat a null byte as the end of the string, so a null byte would truncate the shellcode when it is copied. When the input is read with gets(), which stops at a newline, the shellcode also cannot contain a newline byte (\x0a).
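Checking a candidate payload for forbidden bytes is a simple scan. This sketch assumes the two terminators named above, 0x00 for strcpy()-style copies and 0x0a for gets(). The function name is hypothetical. The test bytes are 32-bit x86 encodings: `mov eax, 11` (b8 0b 00 00 00) contains nulls, which is why shellcode typically uses `xor eax, eax; mov al, 11` (31 c0 b0 0b) instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if no byte would cut the payload short when it is injected:
 * 0x00 ends copies made by strcpy()-style functions, and
 * 0x0a (newline) ends input read by gets(). */
bool payload_survives_copy(const unsigned char *code, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00 || code[i] == 0x0a)
            return false;
    return true;
}
```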
shellcode functions can do many things:
launch a remote shell when an attacker connects to it
create a reverse shell that connects back to the hacker
use local exploits that establish a shell
flush firewall rules that currently block other attacks
break out of a chroot environment, giving full access to the system
example of UNIX shellcode
the C program below shows what a typical shellcode does before it is turned into machine code: it executes the /bin/sh shell
#include <stddef.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    // sets the path
    char *sh = "/bin/sh";
    // creates an array of arguments that terminates with a NULL pointer
    char *args[2];
    args[0] = sh;
    args[1] = NULL;
    // replaces the current running process with the new process (sh)
    execve(sh, args, NULL);
    return 1;   // only reached if execve() fails
}
buffer overflow defenses
There are two types of defenses: compile-time and run-time.
compile-time defenses
Using a modern high-level language, which is not vulnerable to buffer overflow attacks.
the compiler enforces range checks and permissible operations on variables
The main disadvantages are:
additional code must be executed at runtime to impose checks - flexibility and safety come at a cost (resource use)
since the language is very high level, access to some low-level instructions and hardware resources is lost, so these languages are less suitable for programs that need them (e.g. device drivers)
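The range checks a safe language performs can be imitated in C. The idea is to carry the length with the array and test every index before writing. This sketch only illustrates the idea: checked_array and checked_store are hypothetical, and a real language would raise an error instead of returning false. The extra comparison on every access is the kind of runtime cost listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Array that carries its own length, as safe languages do implicitly. */
typedef struct {
    size_t len;       /* number of usable elements (at most 16 here) */
    char data[16];
} checked_array;

/* Bounds-checked write: refuses any index outside [0, len). */
bool checked_store(checked_array *a, size_t index, char value) {
    if (index >= a->len)
        return false;
    a->data[index] = value;
    return true;
}
```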
Safe coding techniques.
Programmers need to inspect the code and rewrite any unsafe constructs, which are common in languages like C that prioritise efficiency over type safety.
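A typical safe-coding fix is to replace gets() with fgets(), which never writes more than the given size, terminator included. The wrapper read_line is a hypothetical name; it also strips the trailing newline that fgets() keeps.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Safer replacement for gets(): reads at most size-1 characters into buf,
 * always null-terminates, and removes a trailing newline if present.
 * Returns buf, or NULL on end-of-file or error. */
char *read_line(char *buf, size_t size, FILE *in) {
    if (fgets(buf, (int)size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

Reading "EVILINPUTVALUE" into an 8-byte str2 this way yields "EVILINP" and leaves str1 untouched. The rest of the line stays in the stream, so real code would usually discard or reject it.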
Language extensions/safe libraries.
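Handling dynamically allocated memory is complicated because its size is not known at compile time. Safe libraries can help: for example, libsafe for C is loaded as a dynamic library and replaces unsafe standard functions such as strcpy() with versions that check the copy cannot extend past the current stack frame.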
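A simpler relative of the safe-library idea is a copy function that is told the destination size and refuses oversized input. Note that this is not libsafe's mechanism, which infers the bound from the stack frame. The name bounded_strcpy is hypothetical. It rejects the copy rather than truncating, so the caller notices the problem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy: copies src (with its terminator) into dst only
 * if it fits in dst_size bytes. Returns 0 on success, -1 if refused. */
int bounded_strcpy(char *dst, size_t dst_size, const char *src) {
    size_t n = strlen(src);
    if (dst_size == 0 || n >= dst_size)
        return -1;
    memcpy(dst, src, n + 1);
    return 0;
}
```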
Stack protection.
Function entry and exit code, typically inserted by the compiler, can check the stack for signs of corruption.
For example, a canary can be used:
a small secret value (typically random) placed on the stack to detect whether an overflow has occurred. The canary is placed immediately before the crucial control data (saved frame pointer and return address), and the function checks the canary before returning control. If its value has been altered, the stack has been corrupted and the program is aborted.
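The canary check can be simulated inside one byte array that stands for the frame: a 16-byte buffer, then the canary, then the return address. This sketch is hypothetical throughout. It uses rand() only as a stand-in for the strong random value a real implementation would use. An overflow long enough to reach the return address necessarily changes the canary first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simulated frame: buffer [0,16), canary [16,20), return address [20,24). */
enum { BUF_LEN = 16, CANARY_OFF = 16, RET_OFF = 20, FRAME_LEN = 24 };

/* Simulates a function that copies len bytes into its buffer without a
 * bounds check, protected by a canary. Returns true if the check passed;
 * false means a real program would abort instead of returning. */
bool call_with_canary(const unsigned char *input, size_t len) {
    unsigned char frame[FRAME_LEN] = {0};
    uint32_t canary = (uint32_t)rand();                 /* stand-in secret */
    memcpy(frame + CANARY_OFF, &canary, sizeof canary); /* entry code */

    assert(len <= FRAME_LEN);       /* keep the simulation in bounds */
    memcpy(frame, input, len);      /* unchecked copy: may overflow */

    uint32_t check;
    memcpy(&check, frame + CANARY_OFF, sizeof check);   /* exit code */
    return check == canary;
}
```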
Another option is Stackshield or Return Address Defender (RAD), compiler extensions that store a backup copy of the return address in a safe memory region. On function exit, the return address on the stack is checked against the stored copy.
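The backup-copy idea can be simulated with a small separate array acting as the safe region. This is a hypothetical sketch: the real tools protect that region from writes, which this simulation does not. The frame is modelled as a 16-byte buffer followed by a 4-byte return address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { R_BUF = 16, R_FRAME = 20, SHADOW_MAX = 64 };

/* Stand-in for the protected region holding backup return addresses. */
static uint32_t shadow_stack[SHADOW_MAX];
static size_t shadow_top = 0;

/* Entry: save a copy of the return address. Body: unchecked copy into the
 * buffer. Exit: compare the frame's return address with the saved copy.
 * Returns true if they still match. */
bool call_with_shadow(uint32_t ret_addr, const unsigned char *input, size_t len) {
    unsigned char frame[R_FRAME];
    memcpy(frame + R_BUF, &ret_addr, sizeof ret_addr);
    assert(shadow_top < SHADOW_MAX);
    shadow_stack[shadow_top++] = ret_addr;

    assert(len <= R_FRAME);         /* keep the simulation in bounds */
    memcpy(frame, input, len);

    uint32_t in_frame;
    memcpy(&in_frame, frame + R_BUF, sizeof in_frame);
    return in_frame == shadow_stack[--shadow_top];
}
```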
run-time defenses
Executable address space protection.
Hardware + software defense that splits memory into sections that are either writeable or executable, but not both.
trying to execute code in a non-executable area (like the stack) causes a segfault, which halts the attack
A drawback is that this method doesn’t support programs that legitimately need to execute code on the stack.
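The "writeable or executable, but not both" rule can be written as a check on the POSIX protection flags passed to mmap() or mprotect(). This is only a model of the policy. The function wx_allowed is hypothetical, and enforcement in practice happens in the kernel and hardware (the NX bit), not in application code.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>

/* Models the W^X rule: a mapping may be writable or executable, never both. */
bool wx_allowed(int prot) {
    return !((prot & PROT_WRITE) && (prot & PROT_EXEC));
}
```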
Address space randomization
This solution randomizes the location of key data structures (stack, heap, global data), heap buffers and standard library functions. This prevents the attacker from reliably guessing the target address (for example, where the shellcode will be placed).
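A toy model shows why randomization defeats hard-coded addresses. The base of a region becomes a fixed base plus a random number of pages. With `bits` bits of entropy, a single guessed address is correct with probability 1/2^bits. The function randomized_base and the 4096-byte page size are assumptions. rand() is used only for illustration and may not cover more than 15 bits on every platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { PAGE = 4096 };   /* assumed page size */

/* Hypothetical ASLR model: base = fixed_base + random page offset drawn
 * from 2^bits possible slots. */
uintptr_t randomized_base(uintptr_t fixed_base, unsigned bits) {
    uintptr_t slots = (uintptr_t)1 << bits;
    return fixed_base + ((uintptr_t)rand() % slots) * PAGE;
}
```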
Guard pages
Guard pages can be placed between critical regions of memory. They are flagged in the memory management unit as illegal addresses, so any attempt to access them aborts the process.
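On POSIX systems, a guard page can be created by mapping one extra page and removing all access rights from it with mprotect(). A write that runs off the end of the usable area then faults immediately (SIGSEGV) instead of silently corrupting the next region. This sketch assumes Linux or another system that provides MAP_ANONYMOUS. The function alloc_with_guard is hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps 'pages' usable pages followed by one inaccessible guard page.
 * Returns the start of the usable area, or NULL on failure. */
void *alloc_with_guard(size_t pages) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = (pages + 1) * page;
    unsigned char *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mprotect(p + pages * page, page, PROT_NONE) != 0) {  /* guard page */
        munmap(p, total);
        return NULL;
    }
    return p;
}
```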
buffer overflow variants
Replacement Stack Frame
Attack: Overwrites the buffer and the saved frame pointer address on the stack.
The saved frame pointer is changed to refer to a dummy stack frame.
When the function returns, control is transferred to the replacement dummy frame.
This ultimately directs execution to the shellcode in the overwritten buffer.
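Off-by-one attacks: a coding error (e.g. a copy loop whose bound is one too large) lets an attacker copy one more byte than the buffer size. Even that single byte can alter the saved frame pointer, which is enough for this variant.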
Defenses:
Stack protection mechanisms (e.g., canaries) to detect modifications to the stack frame or return address before function exit.
Use non-executable stacks (e.g., DEP/NX bit).
Randomization of the stack memory and system libraries (ASLR).
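An off-by-one bug can be simulated on a byte array standing for a frame: a 16-byte buffer followed by a saved frame pointer stored little-endian. The deliberate bug writes the terminator one byte past the buffer. That zeroes the low byte of the saved frame pointer and moves it slightly downwards, where a dummy frame can be placed. The function names and layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated frame: 16-byte buffer, then a 4-byte little-endian saved FP. */
enum { TAG_LEN = 16, FP_OFF = 16, OB_FRAME = 20 };

/* BUG (deliberate): after copying up to TAG_LEN characters, the terminator
 * is written at index i, which is one past the buffer when i == TAG_LEN. */
static void copy_tag_off_by_one(unsigned char *frame, const char *src) {
    size_t i;
    for (i = 0; i < TAG_LEN && src[i] != '\0'; i++)
        frame[i] = (unsigned char)src[i];
    frame[i] = '\0';
}

/* Returns the saved frame pointer after copying src into the frame. */
uint32_t saved_fp_after_copy(uint32_t saved_fp, const char *src) {
    unsigned char frame[OB_FRAME];
    for (int b = 0; b < 4; b++)
        frame[FP_OFF + b] = (unsigned char)(saved_fp >> (8 * b));
    copy_tag_off_by_one(frame, src);
    uint32_t fp = 0;
    for (int b = 0; b < 4; b++)
        fp |= (uint32_t)frame[FP_OFF + b] << (8 * b);
    return fp;
}
```

A 16-character tag changes a saved frame pointer of 0xbffffc08 into 0xbffffc00, while shorter tags leave it intact.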
Return to System Call (Return-to-libc)
Attack: Stack overflow that replaces the return address with the address of a standard library function (like system()).
This is a response to non-executable stack defenses.
The attacker constructs suitable parameters for the library function on the stack, placed above the overwritten return address.
When the function returns, the library function executes with the attacker-supplied parameters.
It may require knowing the exact buffer address.
Can chain multiple library calls for complex payloads.
Defenses: (same as for the replacement stack frame attack, since both target the stack)
Stack protection (canaries).
Non-executable stacks.
Randomization of stack and system libraries (ASLR).
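On a 32-bit system that passes arguments on the stack, a classic return-to-libc payload has four parts. It contains padding up to the return address, the address of system(), a fake return address for system() itself, and a pointer to the string "/bin/sh". When the vulnerable function returns, the stack then looks to system() like a normal call. The fake return address can point to another library function to chain calls. All addresses below are hypothetical placeholders. The 20-byte padding assumes a 16-byte buffer plus saved frame pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 32-bit value in little-endian byte order. */
static void put_le32(unsigned char *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (8 * i));
}

/* Layout: [padding][system() address][fake return][address of "/bin/sh"].
 * Returns the payload length, or 0 if out is too small. */
size_t build_ret2libc(unsigned char *out, size_t cap, size_t pad_len,
                      uint32_t system_addr, uint32_t fake_ret, uint32_t binsh_addr) {
    if (cap < pad_len + 12)
        return 0;
    memset(out, 'A', pad_len);                /* fills buffer + saved FP */
    put_le32(out + pad_len, system_addr);     /* overwrites return address */
    put_le32(out + pad_len + 4, fake_ret);    /* where system() will return */
    put_le32(out + pad_len + 8, binsh_addr);  /* system()'s argument */
    return pad_len + 12;
}
```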
Heap Overflow
Attack: Targets a buffer located in the heap memory region, typically used for dynamic data structures (like linked lists).
Exploitation: Unlike a stack overflow, there is no return address on the heap for easy control transfer.
Exploitation often involves overwriting or manipulating function pointers stored on the heap.
Alternatively, it can exploit management data structures the heap uses to track allocations.
Defenses:
Making the heap region non-executable.
Randomizing the allocation of memory on the heap.
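A heap overflow that redirects a later call can be simulated with a heap object that holds a buffer followed by the field used to choose a handler. To keep the sketch well-defined C, a handler index stands in for the raw function pointer an attacker would overwrite. The struct, handlers and process() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap object: data buffer followed by a handler selector. */
struct chunk {
    char buf[16];
    unsigned char handler;   /* 0 = normal, 1 = privileged */
};

static int normal_handler(void)     { return 0; }
static int privileged_handler(void) { return 1; }
static int (*const handlers[2])(void) = { normal_handler, privileged_handler };

/* Allocates a chunk, copies len bytes over it without checking against
 * sizeof buf, then "calls through the pointer". Returns the handler result. */
int process(const unsigned char *data, size_t len) {
    struct chunk *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->handler = 0;
    assert(len <= sizeof *c);                 /* keep the simulation in bounds */
    memcpy((unsigned char *)c, data, len);    /* may spill into c->handler */
    int result = handlers[c->handler & 1]();
    free(c);
    return result;
}
```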
Global Data Overflow
Attack: Targets a buffer located in the global data region (or BSS).
This buffer may be located next to function pointers or process management tables.
The primary goal is to overwrite a function pointer that will be called later in the program’s execution flow.
Defenses:
Making the global data region non-executable or randomized.
Moving function pointers to a safe, protected memory location.
Using guard pages (unmapped pages) between critical data structures to detect overflows immediately
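The defense of moving function pointers to a safe location can be sketched in C. The global data next to a buffer holds only a small index, and the pointers live in a const table that typically ends up in read-only memory. The index is range-checked before each call, so a corrupted value is rejected instead of being followed. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int log_event(void)  { return 10; }
static int save_state(void) { return 20; }

/* Pointers kept in a const table rather than next to writable buffers. */
static int (*const actions[])(void) = { log_event, save_state };
enum { N_ACTIONS = sizeof actions / sizeof actions[0] };

/* Global state that an overflow of a neighbouring buffer could reach:
 * only an index is stored here, never a raw function pointer. */
static size_t selected_action;

/* Returns the action's result, or -1 if the stored index is out of range. */
int run_selected_action(void) {
    if (selected_action >= N_ACTIONS)
        return -1;
    return actions[selected_action]();
}
```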