⚠️ This is not the current iteration of the course!

Lecture 8: Stack, Buffer Overflow, and Intro to C++

» Lecture code (Stack), Lecture code (C++)

S1: The Stack, continued

You will recall the stack segment of memory from earlier lectures: it is where all variables with automatic lifetime are stored. These include local variables declared inside functions, but importantly also function arguments.

The arguments and local variables of f() live inside f()'s stack frame. Subsequent arguments (second, third, fourth, etc.) are stored at subsequently lower addresses below %rsp (see call02.s and call03.s for examples with more arguments), followed eventually by any local variables in the caller.

How does %rsp change?

The convention is that %rsp always points to the lowest (leftmost) stack address that is currently used. This means that when a function declares a new local variable, %rsp has to move down (left) and if a function returns, %rsp has to move up (right) and back to where it was when the function was originally called.

Moving %rsp happens in two ways: explicit modification via arithmetic instructions, and implicit modification as a side effect of special instructions. The former happens when the compiler knows exactly how many bytes a function requires %rsp to move by, and involves instructions like subq $0x10, %rsp, which moves the stack pointer down by 16 bytes. The latter, side-effect modification happens when the push and pop instructions run. A push writes the contents of a register onto the stack memory immediately to the left of the current %rsp and also modifies %rsp to point to the beginning of this new data. For example, pushq %rax would write the 8 bytes from register %rax at address %rsp - 8 and set %rsp to that address; it is equivalent to movq %rax, -8(%rsp); subq $8, %rsp or subq $8, %rsp; movq %rax, (%rsp). A pop does the reverse: popq %rax loads the 8 bytes at (%rsp) into %rax and then moves %rsp up by 8, as in movq (%rsp), %rax; addq $8, %rsp.

Return Address

As a function executes, it eventually reaches a ret instruction in its assembly. The effect of ret is to return to the caller (a form of control flow, as the next instruction needs to change). But how does the processor know what instruction to execute next, and what to set %rip to?

It turns out that the stack plays a role here, too. In a nutshell, each function call stores the return address as the very first (i.e., rightmost) data in the callee's stack frame. (If the function called takes more than six arguments, the return address is to the left of the 7th argument in the caller's stack frame.)

The stored return address makes it possible for each function to know exactly where to continue execution once it returns to its caller. (However, storing the return address on the stack also has some dangerous consequences, as we will see shortly.)

We can now define the full function entry and exit sequence. Both the caller and the callee have responsibilities in this sequence.

To prepare for a function call, the caller performs the following tasks:

  1. The caller stores the first six arguments in the corresponding registers.

  2. If the callee takes more than six arguments, or if some of its arguments are large, the caller must store the surplus arguments on its stack frame (in increasing order). The 7th argument must be stored at (%rsp) (that is, the top of the stack) when the caller executes its callq instruction.

  3. The caller saves any caller-saved registers (see last lecture's list). These are registers whose values the callee might overwrite, but which the caller needs to retain for later use.

  4. The caller executes callq FUNCTION. This has an effect like pushq $NEXT_INSTRUCTION; jmp FUNCTION (or, equivalently, subq $8, %rsp; movq $NEXT_INSTRUCTION, (%rsp); jmp FUNCTION), where NEXT_INSTRUCTION is the address of the instruction immediately following callq.

To return from a function, the callee does the following:

  1. The callee places its return value in %rax.

  2. The callee restores the stack pointer to its value at entry ("entry %rsp"), if necessary.

  3. The callee executes the retq instruction. This has an effect like popq %rip, which removes the return address from the stack and jumps to that address (because the instruction writes it into the special %rip register).

  4. Finally, once control is back in the caller, the caller cleans up any space it prepared for arguments and restores caller-saved registers if necessary.


S2: Base Pointers and Buffer Overflow

Base Pointers and the %rbp Register

Keeping track of the entry %rsp can be tricky with more complex functions that allocate lots of local variables and modify the stack in complex ways. For these cases, the x86-64 Linux calling convention allows for the use of another register, %rbp, as a special-purpose register.

%rbp holds the address of the base of the current stack frame: that is, the rightmost (highest) address that points to a value still part of the current stack frame. This corresponds to the rightmost address of an object in the callee's stack, and to the first address that isn't part of an argument to the callee or one of its local variables. It is called the base pointer, since the address points at the "base" of the callee's stack frame (if %rsp points to the "top", %rbp points to the "base", i.e., the bottom). The %rbp register maintains this value for the whole execution of the function (i.e., the function may not overwrite the value in that register), even as %rsp changes.

This scheme has the advantage that when the function exits, it can restore its original entry %rsp by loading it from %rbp. In addition, it also facilitates debugging because each function stores the old value of %rbp to the stack at its point of entry. The 8 bytes holding the caller's %rbp are the very first thing the callee stores in its stack frame, and they are right below the return address. This means that the saved %rbps form a chain that allows each function to locate the base of its caller's stack frame, where it will find the %rbp of the "grand-caller's" stack frame, etc. The backtraces you see in GDB and in Address Sanitizer error messages are generated precisely using this chain!

Therefore, with a base pointer, the function entry sequence becomes:

  1. The first instruction executed by the callee on function entry is pushq %rbp. This saves the caller's value for %rbp into the callee's stack. (Since %rbp is callee-saved, the callee is responsible for saving it.)

  2. The second instruction is movq %rsp, %rbp. This saves the current stack pointer in %rbp (so %rbp = entry %rsp - 8).

    This adjusted value of %rbp is the callee's "frame pointer" or base pointer. The callee will not change this value until it returns. The frame pointer provides a stable reference point for local variables and caller arguments. (Complex functions may need a stable reference point because they reserve varying amounts of space.)

    Note, also, that the value stored at (%rbp) is the caller's %rbp, and the value stored at 8(%rbp) is the return address. This information can be used to trace backwards by debuggers (a process called "stack unwinding").

  3. The function ends with movq %rbp, %rsp; popq %rbp; retq, or, equivalently, leave; retq. This sequence is the last thing the callee does, and it restores the caller's %rbp and entry %rsp before returning.

You can find an example of this in call07.s. Lab 3 also uses the %rbp-based calling convention, so make sure you keep the extra 8 bytes for storing the caller's %rbp on the stack in mind!

Buffer overflow attacks

Now that we understand the calling convention and the stack, let's take a step back and think of some of the consequences of this well-defined memory layout. While a callee is not supposed to access its caller's stack frame (unless it's explicitly passed a pointer to an object within it), there is no principled mechanism in the x86-64 architecture that prevents such access.

In particular, if you can guess the address of a variable on the stack (either a local within the current function or a local/argument in a caller of the current function), your program can just write data to that address and overwrite whatever is there.

This can happen accidentally (due to bugs), but it becomes a much bigger problem if done deliberately by malicious actors: a user might provide input that causes a program to overwrite important data on the stack. This kind of attack is called a buffer overflow attack.

Consider the code in attackme.cc. This program computes checksums of strings provided to it as command line arguments. You don't need to understand in deep detail what it does, but observe that the checksum() function uses a 100-byte stack-allocated buffer (as part of the buf union) to hold the input string, which it copies into that buffer.

A sane execution of attackme might look like this:

$ ./attackme hey yo CS300
hey: checksum 00796568, sha1 7aea02175315cd3541b03ffe78aa1ccc40d2e98a  -
yo: checksum 00006f79, sha1 dcdc24e139db869eb059c9355c89c382de15b987  -
CS300: checksum 30335373, sha1 b93b5d831563598f15cae55ba445f0b3cd5da036  -

But what if the user provides an input string longer than 99 characters (remember that we also need the zero terminator in the buffer)? The function just keeps writing, and it will write over whatever is adjacent to buf on the stack.

From our prior pictures, we know that buf will be in checksum's stack frame, below the entry %rsp. Moreover, the 8 bytes starting at the entry %rsp hold the return address! In this case, that is an address in main(). So, if checksum writes beyond the end of buf, it will overwrite the return address on the stack; if it keeps going further, it will overwrite data in main's stack frame.

Why is overwriting the return address dangerous? It means that a clever attacker can direct the program to execute any function within the program. In the case of attackme.cc, note the run_shell() function, which runs a string as a shell command. This has a lot of nefarious potential – what if we could cause that function to execute with a user-provided string? We could print a lot of sad face emojis to the shell, or, more dangerously, run a command like rm -rf /, which deletes all data on the user's computer!

If we run ./attackme.unsafe (a variant of attackme with safety features added by modern compilers to combat these attacks disabled), it behaves as normal with sane strings:

$ ./attackme.unsafe hey yo CS300
hey: checksum 00796568, sha1 7aea02175315cd3541b03ffe78aa1ccc40d2e98a  -
yo: checksum 00006f79, sha1 dcdc24e139db869eb059c9355c89c382de15b987  -
CS300: checksum 30335373, sha1 b93b5d831563598f15cae55ba445f0b3cd5da036  -

But if we pass a very long string with more than 100 characters, things get a bit more unusual:

$ ./attackme.unsafe sghfkhgkfshgksdhrehugresizqaugerhgjkfdhgkjdhgukhsukgrzufaofuoewugurezgureszgukskgreukfzreskugzurksgzukrestgkurzesi
Segmentation fault (core dumped)

The crash happens because the return address for checksum() was overwritten by garbage from our string, which isn't a valid address. But what if we figure out a valid address and put it in exactly the right place in our string?

This is what the input in attack.txt does. Specifically, using GDB, I figured out that the address of run_shell in my compiled version of the code is 0x400870 (an address in the code/text segment of the executable). attack.txt contains a carefully crafted "payload" that puts the value 0x400870 into the right bytes on the stack. The attack payload is 115 characters long because we need 100 characters to overrun buf, 3 bytes for the malicious return address, and 12 bytes of extra payload because stack frames on x86-64 Linux are aligned to 16-byte boundaries.

Executing this attack works as follows:

$ ./attackme.unsafe "$(cat attack.txt)"
OWNED
OWNED
OWNED
OWNED
OWNED
OWNED
sh: 7: ��5��: not found
Segmentation fault (core dumped)

The cat attack.txt shell command simply pastes the contents of the attack.txt file into the string we're passing to the program. (The quotes are required to make sure our attack payload is processed as a single string even if it contains spaces.)

S3: Introduction to C++

Why are we using C++ for the rest of the course?

You will by now have started to appreciate the power of the C programming language: it gives you direct access to memory, it matches very closely with the concepts in the underlying hardware, and it is able to achieve very high performance because the language adds little to no overhead to your program.

However, writing programs in C can feel a bit like trying to build your own car from scratch: very educational, but you really have to do everything yourself. C doesn't come with any data structures in a standard library, and as you will recall from the vectors part of Project 1, writing a generic data structure is somewhat painful. C++ tries to make things easier: it still gives you access to all the low-level power of C, but it also allows you to write code at a higher level of abstraction, using classes, objects, and various other advanced features.

So far, we've used the C programming language in the course. We will now increasingly start working in C++, which is a separate programming language from C. However, C++ and C are closely related: indeed, pretty much any valid C program is also a valid C++ program.

Note that C++ is a huge programming language, especially compared to C, and comes with many advanced (and some ill-advised) features. The C++ we will write in this course mostly focuses on the C-like subset of the language, with the addition of classes, objects, and some standard library data structures. If you'd like to learn more about the advanced features of C++, check out the links on our C/C++ Primer page.

Compiling C++ programs

Despite its similarity to C, C++ is a separate programming language with its own compilers and tools.

There are many C++ compilers, but the most widely-used ones are GCC's g++ (the C++ equivalent of gcc) and LLVM's clang++ (the C++ equivalent of clang). You can use either for the course; clang++ sometimes has easier-to-understand error messages than g++.

The good news, though, is that the command line options for these compilers are practically identical to those for their C equivalents, and that all your favorite build and debugging tools (Makefiles, GDB, sanitizers, etc.) still work for C++.

Classes and Objects

C++ is an object-oriented programming language, meaning that it includes the notion of classes that you can instantiate into objects. If you know Java, these concepts will seem very familiar to you. A class defines data and functionality associated with a specific type, and each class can have many individual instances in the form of objects of that class. Classes and objects help programmers organize their code, and allow for some data to be accessible only via specific functions – an idea known as "encapsulation". Object-orientation also allows you to write programs that are crazy difficult to understand, and some believe it's overkill – we won't pass judgement on this, but rather make use of C++'s object-oriented features to make our life as systems programmers easier.

Let's look at the specific example of a program that seeks to represent pets via the Animal class type. If you were to write a C program for this purpose, you might define a struct type that tracks information about a specific animal, such as its name, age, and weight:

typedef struct animal {
  char* name;
  int age;
  int weight;
} animal_t;

Specific instances of this structure can exist on the stack or on the heap:

#include <stdlib.h>  // for malloc

int main() {
  animal_t stack_cat;
  stack_cat.name = "kitty";
  stack_cat.age = 5;
  stack_cat.weight = 10;

  animal_t* heap_dog = (animal_t*) malloc(sizeof(animal_t));
  heap_dog->name = "doggy";
  heap_dog->age = 8;

  // [...]
}

This works fine, but comes with several downsides:

  1. Any piece of code can set the values of any of the struct's members without validation; for example, nothing prevents nonsensical assignments like heap_dog->weight = 999;.
  2. Any function that operates on an animal needs to explicitly take a pointer to the specific animal in question as an argument (recall how all the vector methods in Project 1 took a vector_t* as their first argument), so that the function body knows what memory to access.

C++ extends the C struct notion with functionality to allow instances of a type to have behavior (i.e., associated methods). To do so, you can define functions as part of the struct definition:

#include <cstdio>  // for printf

typedef struct Animal {
  char* name;
  int age;
  int weight;

  // new in C++: define methods on instances of this struct
  void setWeight(int w) {
    if (w > 50) {
      printf("error: unrealistic weight!\n");
      return;
    }
    this->weight = w;
  }

  int getWeight() {
    return this->weight;
  }
} animal_t;

You'll notice the this keyword inside the setWeight and getWeight methods here. this is always a pointer to the instance of the struct that the method was called on – in other words, its type is Animal* in this example. This implicit access to a pointer to the current instance allows calling methods on an animal using the same syntax as C struct member access:

int main() {
  Animal stack_cat;
  stack_cat.name = "kitty";
  stack_cat.age = 5;
  stack_cat.setWeight(10);

  animal_t stack_dog; // can still use type alias, just like in C!
  stack_dog.name = "doggy";
  stack_dog.age = 5;
  stack_dog.setWeight(999); // will report an error
}

To be backwards-compatible with C, a C++ struct without any methods has exactly the same syntax and behaves exactly the same way as the C struct would. But you may note that even though our Animal struct defines handy methods to get and set the weight of the animal, including some validation in setWeight, there is nothing preventing code from directly modifying the weight member of the struct.

C++ provides the class keyword to help you define types whose members are protected from arbitrary access. The definition of a class looks exactly the same as that of a struct, except that its members are private by default (in a struct, they are public by default). In either case, you can mark some members as public and some as private, as follows:

#include <cstdio>  // for printf

typedef class Animal {
 public:
  char* name;
  int age;
 private:
  int weight;

 public:
  // to allow access to the private `weight` member via methods, these need to be public
  void setWeight(int w) {
    if (w > 50) {
      printf("error: unrealistic weight!\n");
      return;
    }
    this->weight = w;
  }

  int getWeight() {
    return this->weight;
  }
} animal_t;

These access modifiers split the definition into sections, and the compiler will prevent any access to private members from outside the methods associated with the class. (Both member variables, called fields, and member functions, called methods, can be private.)

Are there ways around access modifiers?

It's important to realize that access modifiers are merely a helpful aid to the programmer, not a failsafe protection mechanism. Only the compiler looks at access modifiers and checks them; once the compiler has turned the C++ code into assembly or machine code, no notion of access modifier protection remains. In particular, the access modifiers are never checked at program runtime!

But even the compiler can be fooled. Since C++ is a systems programming language, it allows for direct memory access, including pointer arithmetic. This actually provides a way for programs to circumvent the private access modifier: knowing at what byte offset in a class or struct a field is located is sufficient to form a pointer to that field, and to ultimately access the memory. No C++ compiler can prove the absence of such illegal accesses without additional hints; this is an instance of the pointer aliasing problem.

Finally, what if you want to create an instance of a class? In the above examples, we've already seen stack-allocated objects of the Animal class. To make heap-allocated objects, C++ uses the new keyword:

int main() {
  Animal* heap_cat = new Animal;
  heap_cat->name = "kitty";
  heap_cat->age = 5;
  heap_cat->setWeight(10);
}

new here works much like (Animal*) malloc(sizeof(Animal)), allocating sufficient heap memory for an Animal structure. On top of allocating memory, however, new also calls a special method on the class called the constructor. Constructors are helpful in order to initialize the fields of the object: recall that uninitialized memory may contain arbitrary garbage! (cpp1.cc in the lecture code shows an example of how a stack-allocated object can have surprising contents if the fields aren't set correctly.) To define a constructor, you add a method without a return type (think about this: what would the constructor return?) and with the same name as the class/struct name:

typedef class Animal {
 public:
  char* name;
  int age;
 private:
  int weight;

 public:
  // constructor, takes two arguments
  Animal(char* name, int age) {
    this->name = name;
    this->age = age;
    this->weight = 0;  // can access private field from constructor
  }

  // ... other methods
} animal_t;

int main() {
  // calls constructor on creating stack-allocated object
  Animal stack_cat("kitty", 5);
  stack_cat.setWeight(10);

  // calls constructor on creating heap-allocated object
  Animal* heap_dog = new Animal("doggy", 5);

  // [...]
}
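
Constructors help set up objects. If a struct or class declares no constructor of its own, C++ adds an empty zero-argument constructor by default, which is why the earlier examples without an explicit constructor call are valid. (Once you define a constructor that takes arguments, as above, that default constructor is no longer generated, so a plain Animal stack_cat; would no longer compile.)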

Initializer list syntax for constructors

Rather than writing several lines of the form this->field = ... in the constructor to initialize fields, C++ permits a shorthand syntax called an initializer list. Separated by a colon from the constructor declaration, the initializer list consists of a comma-separated list of field(argument) pairs. For example:

//                            | initializer list
//                            v
Animal(char* name, int age) : name(name), age(age), weight(0) {
 // obsoleted by initializer list
 // this->name = name;
 // this->age = age;
 // this->weight = 0;
}

Finally, how do you get rid of a heap-allocated object? For this purpose, and as a counterpoint to new, C++ provides the delete keyword. delete is to new what free() is to malloc().

You can see some more examples of C++ classes and structures in cpp1.cc.

Summary

Today, we looked in more detail at how the stack segment of memory is structured and managed, and discussed how it grows and shrinks. We learned how the compiler manages the stack pointer and how base pointers help debuggers "unwind" the stack.

We then looked into how the very well-defined memory layout of the stack can become a danger if a program is compromised through a malicious input: by carefully crafting inputs that overwrite part of the stack memory via a buffer overflow, we can change important data and cause a program to execute arbitrary code.

In Lab 3, you will craft and execute buffer overflow attacks on a program yourself!

Finally, we talked about the C++ programming language and how it adds object-oriented features to C. We saw that C++ classes are basically fancy structs with methods (member functions), and how C++ allows you to create instances (objects) of such classes.