How to haq
by adamt
Talk 1/n for UNSW SecSoc on basic into to binary exploitation
Prereqs to reading this
- Assume you know how to write and compile C code
- Assume you are somewhat familiar with
gcc
and the linux command line - Would help to have some knowledge of gdb & assembly but not required
- This will cover linux 32bit stuff
How to haq
- How does C work
- How do computers work
- How do I break computers
Fast forward to part 3
A few questions before we start
- What does the following code do
- What’s wrong with it?
- How can we, without changing the source, call the function
notcalled()
?
#include <stdio.h>
void notcalled() {
printf("What\n");
}
void echo() {
char buffer[16];
printf("Enter some text\n");
gets(buffer);
printf("You entered %s\n", buffer);
}
int main() {
echo();
return 0;
}
Back to part 2: How do computer work?
Let’s start with some basics
- What is memory? (google this)
- What is little endianness???
- What is a stack?
- How does it work?
- What is assembly?
lets start by looking at the typical memory layout of a C program, in particular the stack, it’s contents and job during fuction calls and returns.
Photoshoot time…
Memory layout of a running process
put something here about little endianness
Stack
The stack is used by the program for
- local variable storage (
buffer
in the code above) - return addresses (will talk about later)
- Function paramaters
The stack is a LIFO structure, it grows downward in memory (from high addresses to lower addresses) as new functions are called.
Every function gets its own section of the stack, to store its local variables
Heap
The heap is …
- All the dynamically allocated memory resides here
- Dynamically allocated memory => Whenever we use malloc to get memory dynamically
The heap grows upwards in memory(from lower to higher memory addresses) as more and more memory is required.
Uninitialised Data (BSS)
- Any global variables in your program are stored here
- This consists of all global and static variables which are not initialized by the programmer. The kernel initializes them to 0 by default.
Initialised data segment
- This is all global and static variables which is initialised to some default value by the developer
Text
This is the fun section where all your executable code lives (as assembly)
The CPU
Instruction Pointer
The instruction pointer register stores the address of the next instruction to be executed by the CPU
After every instruction, this is incremented
Stack Pointer
The stack pointer stores the address of the top of the current stackframe (will describe below what a stackframe is)
This is the address of the last element on the stack
Since the stack grows down from high addresses to low addresses, ESP
points to the value in the stackframe at the lowest memory address.
Base Pointer
The base pointer is usually set to ESP
at the start of a function.
This is done to keep tab of function paramaters and local variables, as ESP
is constantly changing, EBP will always point to the top of the stackframe.
Local variables are stored below EBP
and Function paramaters are stored above EBP
Stackframe
Assembly
From wikipedia:
An assembly language (or assembler language),[1] often abbreviated asm,
is any low-level programming language in which there is a very strong
correspondence between the program's statements and the architecture's
machine code instructions.
x86 assembly
From wikipedia:
x86 assembly language is a family of backward-compatible assembly languages,
which provide some level of compatibility all the way back to the Intel 8008
introduced in April 1972......
Like all assembly languages, it uses short mnemonics to represent the
fundamental instructions that the CPU in a computer can understand and follow.
Import x86 instructions
- Instructions have variable length in x86
x86 has some cool instructions that make it easy for us to manipulate the stack
- push register (Push a value onto the stack, then decrement ESP)
- pop register (Pop a value from the stack into a register, then increment ESP)
- call address (Push current EIP to stack, then jump to address)
- ret (Pop a value from stack, then jump to that value)
Some cool maths instructions
- add
- mul
- div
Jumping around instructions
- jmp addr (jump to an address and execute)
- jne/je/jz/jlt/jle/…. (jump if equal/notequal/lesthan/greatherthan/etc)
And other stuff which isn’t really important rn
Function calls / C -> assembly
Consider the following C code..
void func(int a, int b) {
int c = 10 * a; // 4. Now EIP points here
int d = 10 * b;
}
int main() {
func(1, 2); // 1. EIP points here at beginning
return 0; // 3. This is the return address of func
}
- The program starts with the
EIP
register pointing to the start of main - A function call is found, so the arguments to the function are
push
ed onto the stack in reverse order, So 2 will bepush
ed, and then 1. - Before the function
func
is called, we need to knowwhere
to return after it is finished executing, so a return address is pushed onto the stack last, this address points to the next thing to be executed after func (look above) - Now find the addres of func, and set
EIP
equal to it. Now func is executing - Right now,
EBP
points to the bottom of mains stackframe, andESP
points to the top - The new function needs to setup a stackframe to store its local variables, so it does so by saving
EBP
onto the stack (by pushing it) then updatingEBP
to point toESP
. NowEBP
points to the current stack pointer. - As we allocate local variables, they are pushed onto the stack, and as such,
ESP
is incremented. - The stack looks something like below
Stack at step 7
-----------------------------
Ram | Registers
|
0x10: local var c | <--- [EBP] - 8 <---- [ESP]
0x20: local var d | <--- [EBP] - 4
0x30: mains saved EBP | <--- EBP
0x40: return value for func |
0x50: paramater 1 | <--- [EBP] + 8
0x60: paramater 2 | <--- [EBP] + 12
0x70: ...mains stack frame |
0x80: . |
0x90: .. |
0xA0: ... |
-----------------------------
How do I break computers?
What is a buffer overflow?
- Vulnerability found in low level C/C++ style code
- Can be used to crash program, corrupt data, steal info, run own code
tl;dr: Buffer overflow means you have control to enter data into a buffer(array), past the end of the array.
From what we learnt on the stack/stackframe above, we know that all local variables are stored onto the stack, so if we overflow a buffer, our data/input will continue over into the other variables.. Let’s do a fun example
Shitty little login program
Binary located here
void login() {
int is_admin = 0;
char username[16];
gets(username);
if (is_admin) {
printf("You now have admin permissions");
} else {
printf("You aren't an admin");
}
}
- What’s wrong with it?
- How can we set ourselves to admin?
This is a clean example of how we can corrupt data structures with buffer overflows. But it gets more fun…
- Can you make the program crash? (Hint look at the stackframe, what else is on the stack?)
- What if we can like.. make the return address point to our own code?
- Whats the difference between code and data??
Part 3 finally
#include <stdio.h>
void notcalled() {
printf("What\n");
}
void echo() {
char buffer[16];
printf("Enter some text\n");
gets(buffer);
printf("You entered %s\n", buffer);
}
int main() {
echo();
return 0;
}