Section 4: Assembly and the Stack

In this section, we will look at more advanced C programs compiled to assembly language, and discuss the layout of the stack segment of memory.

You can find the code for this section here.

You'll want to use your assembly cheat sheet for this section!

Question 1: Mystery Program

Let's figure out what C program may have created the assembly code below. (As usual, there are multiple possible C programs.)

Remember from lecture that it is a good idea to work backwards from the ret instruction.

func:
	movl	$42, -12(%rsp)
	movl	$0, -4(%rsp)
	movl	$0, -8(%rsp)
	jmp	.L2
.L3:
	movl	-8(%rsp), %eax
	addl	%eax, -4(%rsp)
	addl	$1, -8(%rsp)
.L2:
	movl	-8(%rsp), %eax
	cmpl	-12(%rsp), %eax
	jl	.L3
	movl	-4(%rsp), %eax
	ret
main:
	movl	$0, %eax
	call	func
	ret
	.size	main, .-main
	.ident	"GCC: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0"
	.section	.note.GNU-stack,"",@progbits

QUESTION 1A. What is the size of the return value of the func function? What type of variable might it return?

QUESTION 1B. What kind of control flow manipulation (loops, conditionals, etc.) could have been used to generate this assembly code?

QUESTION 1C. What C program could have generated this assembly code? There are multiple possible C programs that could have done this!

Question 2: The Stack

Consider the following C program. We will draw the layout of the stack at different points in the execution, using the assembly code in q2.s as a reference.

#include <stdio.h>
#include <stdlib.h>

int add(int a, int b) {
    int result = a + b;
    // *** B ***
    return result;
}

int main() {
    int result;
    long a = 7;
    short b = 12;
    // *** A ***

    result = add(a, b);
    // *** C ***
    printf("Result of adding %ld and %d is %d.\n", a, b, result);
    // *** D ***

    return 0;
}

Use the stack diagram template below to help answer the following questions.

Initial Stack Diagram

QUESTION 2A. What does the call stack look like after the line labeled A is executed (after the local variables are initialized)?

QUESTION 2B. What does the call stack look like after executing the line labeled B (where result is initialized in add)? Make sure to include labels and addresses in the stack diagram.

QUESTION 2C. What does the call stack look like after executing the line labeled C in the program (after returning from add, but before calling printf)?

QUESTION 2D. What does the call stack look like after executing the line labeled D in the program (after returning from printf, but right before we return from main)?

Question 3: Fun!

The following is additional practice material; we don't plan to cover it in section unless we have spare time. Some of these questions are quite difficult, but all of them will help you get more familiar with assembly and the call stack.

Now it's time to have some fun! Unfortunately, having fun is not quite that simple: everyone has fun in their own special way. For your computer to have some fun, you'll need to input a specific argument.

There are six different fun-having functions in this portion of the section. Each fun function returns either 0 (if fun is had) or something else (if no fun is had). The goal is to find arguments that let each fun-having function have fun!

To compile the program, run make inside the sec4 directory in the section code.

To run a fun function, use the following command format:

$ ./fun [number of func] [argument to func]

For example:

$ ./fun 1 "CS300!"

This runs the first fun function (fun_one) with the argument "CS300!".

In this part of the section, we will use the objdump tool to look at the assembly code for executables we have already compiled, and we will use GDB to investigate the behavior of fun-inducing functions. GDB helps us analyze how each fun function responds to its input argument. (Check out this GDB cheatsheet for a brief overview of useful commands and their usage.)

NOTE: If you are on an Apple Silicon or other ARM64 device, please consult the README.apple file in the section code for how to run GDB for this section.

Note that the call instruction calls a function; in x86-64, register %rdi (or one of its sub-registers, such as %edi) holds the first argument to a function at the time of the call instruction.

QUESTION 3A. What argument will allow fun_one to have fun? For reference, its assembly is below:

fun_one:
	subq	$24, %rsp
	movq	%rdi, 8(%rsp)
	movq	8(%rsp), %rax
	movl	$33, %esi
	movq	%rax, %rdi
	call	strchr@PLT
	testq	%rax, %rax
	je	.L20
	movl	$0, %eax
	jmp	.L21
.L20:
	movl	$-1, %eax
.L21:
	addq	$24, %rsp
	ret

fun_one will have fun (and return 0) when the argument is a string that contains an exclamation point (!). For example, "!", "CS300!", and "Wow!!!!" will have fun.

The C code that generated fun_one is below:
int fun_one(const char *s) {
    if (strchr(s, '!') != NULL) {
        return 0;
    } else {
        return -1;
    }
}

QUESTION 3B. What argument will allow fun_two to have fun? For reference, its assembly (in objdump output format) is below. As before, %rdi holds the first argument to a called function; %rsi holds the second argument, and %rdx holds the third.

0000000000001352 <fun_two>:
    1352:       sub    $0x18,%rsp
    1356:       mov    %rdi,0x8(%rsp)
    135b:       mov    0x8(%rsp),%rax
    1360:       mov    $0x0,%edx
    1365:       mov    $0x0,%esi
    136a:       mov    %rax,%rdi
    136d:       callq  1060 <strtol@plt>
    1372:       add    $0x1,%eax
    1375:       add    $0x18,%rsp
    1379:       retq

fun_two will have fun (and return 0) when the argument is -1 in base 10 ("-1"), base 8 ("-01"), or base 16 ("-0x1").

The C code that generated fun_two is below:
int fun_two(const char *s) {
    return strtol(s, NULL, 0) + 1;
}

QUESTION 3C. What argument will allow fun_three to have fun? Let's investigate using GDB in layout asm mode.

Note the following:

The si command to GDB makes it step by a single assembly instruction at a time.
The finish command makes GDB run to the end of the current function (this is useful for getting out of standard library functions).
The p command followed by a register name written as $reg prints the contents of the register; e.g., p $edx prints the contents of the lower 32 bits of the %rdx register.
The info registers (or, for short, i reg) command prints the contents of all the registers. You can print a specific register via i reg followed by a register name. For example, i reg $rax prints %rax, or i reg $eflags prints the contents of the %rflags register.

fun_three will have fun (and return 0) when the argument is a single character. For example, "A", "?", and "o" will allow fun_three to have fun.

The C code that generated fun_three is below:
int fun_three(const char* s) {
    if (s[0] != 0) {
        return s[1];
    } else {
        return -1;
    }
}

QUESTION 3D. What argument will allow fun_four to have fun? Use objdump or GDB to investigate. Your assembly cheat sheet may come in handy!

fun_four will have fun (and return 0) when the argument consists of exactly two of the same characters. For example, "aa", ";;", and "TT" will allow fun_four to have fun.

The C code that generated fun_four is below:
int fun_four(const char* s) {
    if (s[0] != 0 && s[0] == s[1] && s[1] != 0) {
        return s[2];
    } else {
        return -1;
    }
}

QUESTION 3E. What argument will allow fun_five to have fun? For reference, its assembly is below:

fun_five:
	movq	%rdi, -8(%rsp)
	movq	-8(%rsp), %rax
	movzbl	(%rax), %eax
	testb	%al, %al
	jne	.L31
	movl	$-1, %eax
	ret
.L31:
	movq	-8(%rsp), %rax
	movzbl	(%rax), %edx
	movq	-8(%rsp), %rax
	addq	$1, %rax
	movzbl	(%rax), %eax
	cmpb	%al, %dl
	je	.L33
	movq	-8(%rsp), %rax
	addq	$1, %rax
	movzbl	(%rax), %eax
	movsbl	%al, %eax
	ret
.L33:
	addq	$1, -8(%rsp)
	jmp	.L31

fun_five will have fun (and return 0) when the argument consists of a nonzero number of the same character. For example, "ooo", "A", and "]]]]]" will allow fun_five to have fun.

The C code that generated fun_five is below:
int fun_five(const char* s) {
    if (s[0] == 0) {
        return -1;
    }
    while (1) {
        if (s[0] != s[1]) {
            return s[1];
        }
        ++s;
    }
}

QUESTION 3F. What argument will allow fun_six to have fun? Investigate this question using GDB.

fun_six will have fun (and return 0) when the argument is a string whose length is a multiple of 4. For example, "", "aaaa", and "CS300WoW" will allow fun_six to have fun.

The C code that generated fun_six is below:
int fun_six(const char* s) {
    unsigned len;
    for (len = 0; s[len]; ++len) {
    }
    return len % 4;
}

Acknowledgements: The material for Q3 was originally developed for Harvard's CS 61 course. We are grateful to Eddie Kohler for allowing us to use the material for CS 300.