Before attempting this lab, please make sure that you have:
1. Completed Lab 0 – This will ensure that your VM and grading server account are set up properly.
2. Completed the Diversity Survey – Your grades for Lab 0 and Lab 1 will depend on whether you’ve submitted this (though all questions are optional).
The purpose of this lab is to give you some experience with the syntax and basic features of the C programming language, as well as introduce you to a C debugging tool called
gdb (GNU Debugger). Learning C will help you understand a lot of the underlying architecture of the operating system, and as a whole demystify how programs run.
If you take away anything from this course, hopefully, it’s that Computer Systems are not magic and that much of it actually makes a lot of sense. Don’t be afraid to look up questions on Stack Overflow and Linux Man Pages (which provide great documentation on C library functions), and if that doesn’t help, ask on Piazza!
Check out this article for more on why C programming is awesome! Here are some of the article’s highlights: C is a procedural programming language that was mainly developed as a systems programming language to write operating systems. The main features of the C language include low-level access to memory, a simple set of keywords, and a clean style. These features make C suitable for systems programming, such as operating system or compiler development.
If you are looking for a detailed tutorial on C, check out the links on our C primer.
Start with the
cs131-s20-labs-YOURNAME repository you used for Lab 0.
First, ensure that your repository has a
handout remote. Type:
$ git remote show handout
If this reports an error, run:
$ git remote add handout https://github.com/csci1310/cs131-s20-labs.git
Then run:
$ git pull
$ git pull handout master
This will merge our Lab 1 stencil code with your previous work. If you have any “conflicts” from Lab 0 (very unlikely!), resolve them before continuing further. Run
git push to save your work back to your personal repository.
To run a C program, you first need to compile the source code into a binary. There are several widely-used C compilers, but for this lab and CS 131, you will mostly use
gcc (the C compiler from GCC, the GNU Compiler Collection).
In the next lab, we’ll go over more information on the compilation process.
# compile your C program into an executable binary (ones and zeros)
$ gcc name_of_program.c -o name_of_executable
# run the executable
$ ./name_of_executable
# Smile at the exciting output of your program.
However, sometimes things don’t go as planned, and instead of smiling, you’re pulling up your sleeves to solve a bug!
Like with other programming languages, C programmers frequently make use of print statements to look at the state of their program (in C, you use the
printf function for this). This so-called “printf debugging” is an important approach that can get you quite far, and you’ll probably use it a lot.
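As a small illustration of printf debugging, the sketch below traces the state of a loop on every iteration. The function `sum_array` is hypothetical and not part of the lab stencil. It prints to `stderr` with `fprintf`, which is one common choice because `stderr` is unbuffered; plain `printf` works too.

```c
#include <stdio.h>

// Hypothetical helper (not part of the lab stencil): sums the first n
// elements of arr, printing the running state on every iteration.
int sum_array(const int *arr, int n) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        total += arr[i];
        // Debug print: stderr is unbuffered, so this line appears even if
        // the program crashes shortly afterwards.
        fprintf(stderr, "[debug] i=%d arr[i]=%d total=%d\n", i, arr[i], total);
    }
    return total;
}
```

Calling `sum_array` on a three-element array prints three debug lines, one per loop iteration, before returning the sum.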
Often, however, you may wish that you could stop your program in its tracks (e.g., just before you hit a bug) and interactively inspect its state. This is what debugger tools like
gdb are for.
# compile your C program using the `-g` flag to compile with debugging info
$ gcc name_of_program.c -g -o name_of_executable
# run the executable in gdb
$ gdb name_of_executable
# set a breakpoint at a function
(gdb) b name_of_a_function
# run the program optionally with arguments ARGS (if necessary)
(gdb) r ARGS
# display the source code as you debug
(gdb) layout src
# print a variable VAR
(gdb) p VAR
# Run other gdb commands
# Track down your bug
# quit out of gdb
(gdb) q
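As explained on the GNU website, GDB can do four main things (plus other things) to help you catch bugs in the act:
1. Start your program, specifying anything that might affect its behavior.
2. Make your program stop on specified conditions.
3. Examine what has happened when your program has stopped.
4. Change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.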
Here’s a cheatsheet of common gdb commands. Throughout this lab we’ll use a few.
Task: Compile and run
math_prog.c. There are two bugs in this program – don’t fix them quite yet.
Then use gdb to set a breakpoint at
add_arr, run the program, open the source code, and then print out the variable a:
# compile your C program using the `-g` flag to compile with debugging info
$ gcc math_prog.c -g -o math_prog
# run the executable
$ ./math_prog
# run the executable in gdb
$ gdb math_prog
# set a breakpoint at a function
(gdb) b add_arr
# run the program optionally with arguments (if necessary)
(gdb) r
# display the source code as you debug
(gdb) layout src
# print the variable a
(gdb) p a
# Quit gdb
(gdb) q
Note: For the remainder of this lab, try to refrain from using print statements to debug. The following gdb commands can be very helpful in debugging C programs (particularly the
bt command), and the sooner you get familiar with working with gdb, the easier your life will be.
Once you’re stopped at a breakpoint at
add_arr, run the following commands:
(gdb) c # continues the program to the next breakpoint or to termination
# ...You should notice a SEGFAULT
# this should show you exactly where the fault occurred
(gdb) layout src
# this call is accessing invalid memory
(gdb) p *(c + i)
Cannot access memory at address 0xf0b5ff
# ... Hmm where was the variable `c` initialized?
# Prints a backtrace of the program
# The 'bt' command is incredibly useful anytime you encounter a SEGFAULT.
(gdb) bt
The bt command shows you the function calls that led up to where you currently are in the program (in our case, the segfault). Each function call comes with a stack frame, which contains information specific to that call (such as arguments and local variables). We will hear more about stack frames later in the course. In
gdb, we can check out different frames (i.e. check out different function calls), like so:
# The 'f' command allows you to switch frames
# the below command switches to frame #1, which corresponds to the main function
(gdb) f 1
(gdb) p c
# ... Oh `c` was declared in `main`, but never initialized
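Hopefully you noticed that the pointer
c is never initialized, so it holds an arbitrary address that does not refer to valid memory! We can fix this in two ways: by making c an array allocated on the stack, or by allocating memory for it on the heap with malloc.
(In this case, because we’re only using the
arr for a short period of time, the stack allocation makes sense.)
The heap version looks like this:
int *c = malloc(sizeof(int) * 6);
// ... use the pointer and when you're done ...
free(c);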
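The two ways of giving the result array valid storage can be sketched side by side. This is a minimal sketch, not the lab's actual code: `add_arrays` is a hypothetical stand-in for `add_arr` (whose real signature may differ), and the length of 6 is taken from the `malloc` call in the text.

```c
#include <stdlib.h>

#define LEN 6  // matches the 6-element allocation in the text

// Hypothetical stand-in for the lab's add_arr: writes a[i] + b[i] into out[i].
void add_arrays(const int *a, const int *b, int *out, int n) {
    for (int i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

// Option 1: the result lives in a stack array for the duration of this call.
int sum_with_stack_buffer(const int *a, const int *b) {
    int c[LEN];  // automatic storage, released when the function returns
    add_arrays(a, b, c, LEN);
    int total = 0;
    for (int i = 0; i < LEN; i++) total += c[i];
    return total;
}

// Option 2: the result lives on the heap until free() is called.
// Returns -1 if the allocation fails (a simplification for this sketch).
int sum_with_heap_buffer(const int *a, const int *b) {
    int *c = malloc(sizeof(int) * LEN);
    if (c == NULL) return -1;
    add_arrays(a, b, c, LEN);
    int total = 0;
    for (int i = 0; i < LEN; i++) total += c[i];
    free(c);
    return total;
}
```

Both versions compute the same result; the difference is only where `c` lives and who is responsible for releasing it.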
Once you fix the bug and re-compile your program, you should notice that the program no longer segfaults, but it’s still not working as expected.
Task: Use
gdb to find (and then fix) the second bug.
Typically when C programmers pass arrays as arguments to functions, they also include the length of the array as another argument to the function. Think about why they might do this.
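One way to see the reason is the sketch below, where `max_element` is a hypothetical function, not part of the lab. Inside the function the array parameter is just a pointer, so the function has no way to recover the number of elements on its own.

```c
#include <stddef.h>

// When an array is passed to a function it "decays" to a pointer to its first
// element, so sizeof(arr) inside the function would give the size of a
// pointer, not of the whole array. The caller passes the length explicitly.
int max_element(const int *arr, size_t len) {
    int best = arr[0];  // assumes len >= 1
    for (size_t i = 1; i < len; i++) {
        if (arr[i] > best) best = arr[i];
    }
    return best;
}
```

At the call site, where the array is declared, the length can be computed as `sizeof(values) / sizeof(values[0])` and passed along.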
Take a look at
simple_repl.c. This program reads in input from the terminal and breaks up a single line of text by either a space or comma! Fun fact: “REPL” stands for “read-eval-print” loop, and one place where you may have encountered a REPL before is the Python interpreter: you type a line, it evaluates it, and it prints some result.
As you’re reading through the code, here are some functions and variables you might want to look into:
strtok (This is a wacky function that we’ll use later, so pay special attention to it.)
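To get a feel for how the standard `strtok` is typically called, here is a minimal sketch that splits a line on spaces and commas. The helper `count_tokens` is hypothetical and is not taken from simple_repl.c.

```c
#include <string.h>

// Hypothetical helper: counts the tokens in line, splitting on spaces and
// commas. strtok modifies its argument (it writes '\0' over delimiters),
// so line must be a writable buffer, not a string literal.
int count_tokens(char *line) {
    int count = 0;
    // First call: pass the string to tokenize.
    char *token = strtok(line, " ,");
    while (token != NULL) {
        count++;
        // Later calls: pass NULL to keep tokenizing the same string.
        token = strtok(NULL, " ,");
    }
    return count;
}
```

Note that consecutive delimiters are treated as a single separator, so `strtok` never returns an empty token.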
Compile and run
simple_repl.c. Enter a few lines of text to get a feel for how it works.
You can also redirect the program’s input with the
< symbol, so that instead of reading in commands from the terminal, it reads them from a file.
$ ./simple_repl < files/three-star.csv
or
$ ./simple_repl < files/A_Christmas_Carol_in_Prose.txt
$ echo "hello world" | ./simple_repl, piping the output from
echo into your REPL.
Ctrl-D will indicate End-Of-File (EOF) to the program, causing fgets to return
NULL and exiting the program.
Run simple_repl in
gdb, and perform the following commands:
Use the next (n) command until the call to fgets.
Use the print (p) and examine (x) commands to examine the contents of buf before and after the call to fgets.
Where is buf allocated (the stack or the heap)?
# set a break point at main
(gdb) b main
# show source code, and then run the program
(gdb) layout src
(gdb) r
# use the n command to execute the next line of code
(gdb) n
# keep using the n command until you're about to execute the `fgets`
(gdb) n
#...
# print out the buffer before executing fgets
(gdb) p buf
# execute the fgets line; the program will hang
# (it's waiting for input from stdin for the fgets function)
(gdb) n
hello there # type a line of text
# print the buffer after fgets
(gdb) p buf # you should see the text you inputted
(gdb) x/10c buf # examines (x) 10 characters (/10c) starting at buf
In this section, you will be writing your own version of
strtok. It might sound daunting, but we’ll walk you through it. Take a look at the link above if you need clarification on what exactly strtok does.
Note: You may have noticed that
strtok maintains state internally from iteration to iteration. It does this by declaring a static local variable. Essentially, the function creates the variable in a region of memory that will persist until the end of the program (almost like a global variable), but the variable is only accessible within the function. This part has been written for you.
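The behavior of a static local variable can be seen in isolation with a small sketch. The function `next_id` below is hypothetical and unrelated to the lab's stencil; it only demonstrates that the variable keeps its value between calls.

```c
// Hypothetical example: next_id remembers how many times it has been called.
int next_id(void) {
    // Initialized once, before main starts running; keeps its value between
    // calls, but is only visible inside this function.
    static int counter = 0;
    counter++;
    return counter;
}
```

Each call returns a value one larger than the previous call. Without the `static` keyword, `counter` would be re-created and reset to 0 on every call.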
Task: Take a look at
my_strtok.c. You’ll be implementing your own version of strtok.
Fill in the function in
my_strtok.c according to the TODOs in the comments.
You can test your code using
simple_repl.c and some test cases in
test_runner.c. Compiling and running
test_runner.c will run the test cases in the function
To compile with your version of
strtok, you will need to add
my_strtok.c to the source list. For instance, to compile the repl with
my_strtok() the command would be:
gcc simple_repl.c my_strtok.c -g -o simple_repl
Note: Don’t worry about the interplay between my_strtok.c and
my_strtok.h for now. If you are curious, a comment in
my_strtok.h explains what it’s about, but we will go over compilation more in Lab 2!
This REPL is really good at tokenizing based on commas and spaces now, but you may have realized that the program as a whole might struggle with parsing long sentences.
Try running $ ./simple_repl < files/A_Christmas_Carol_excerpt.txt. This file contains the first two paragraphs of the Christmas Carol text file, and places each sentence on its own line. If you look at the output, you’ll see some weird-looking lines. This is because our program can’t parse more than 99 characters at a time.
One solution to this problem is to increase our
BUFFER_SIZE to something like 1,000,000 (roughly 1 MB), but in the cases where we’re reading smaller lines, this will waste a lot of space on our stack. Plus, what if someone had a really, really long line with more than a million characters? We really need to be able to dynamically adjust the size of our buffer (hint: the heap).
Task: Modify
simple_repl.c to use getline!
If the char pointer is NULL and the size is 0 before the call,
getline will initialize it correctly.
Because
getline will modify the contents of the
char pointer itself (i.e.,
getline isn’t changing the contents of what the pointer is pointing at, it’s changing the address that the pointer points at), it needs the address of a char pointer that’s stack allocated.
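The calling convention of POSIX `getline` can be sketched as follows. The helper `longest_line` is hypothetical and is not the change the task asks for in simple_repl.c. It only shows how the pointer and size are declared, passed by address, and freed.

```c
#define _POSIX_C_SOURCE 200809L  // expose getline() in <stdio.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

// Hypothetical helper: returns the length of the longest line in stream
// (excluding the trailing newline), however long that line is.
size_t longest_line(FILE *stream) {
    char *line = NULL;  // NULL + size 0 asks getline to allocate the buffer
    size_t capacity = 0;
    size_t longest = 0;
    ssize_t len;
    // getline takes the ADDRESS of the pointer so it can point it at a
    // larger heap buffer when a line does not fit; it returns -1 at EOF.
    while ((len = getline(&line, &capacity, stream)) != -1) {
        if (len > 0 && line[len - 1] == '\n') len--;
        if ((size_t)len > longest) longest = (size_t)len;
    }
    free(line);  // the buffer getline allocated is ours to free
    return longest;
}
```

The same buffer is reused (and grown when needed) across iterations, so a single `free` after the loop is enough.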
Before you start coding, let’s use the debugger to examine how our C-program is laid out in memory.
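Variables in C never overlap; each variable occupies distinct storage. Additionally, each variable in C has a lifetime, which is called storage duration by the standard. There are three different kinds of lifetime: static (the variable lasts as long as the program runs), automatic (the variable is created and destroyed automatically, as with local variables in a function call), and dynamic (the programmer explicitly allocates and deallocates the memory).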
The compiler and operating system work together to put variables at different addresses. A program’s address space (which is the range of addresses accessible to a program) divides into regions called segments. Objects with different lifetimes are placed into different segments. The most important segments are:
|Segment|Lifetime and access|Contents|
|---|---|---|
|Code (text, read-only data)|static, unmodifiable|program instructions and constant global variables|
|Data (data, bss)|static, modifiable|initialized and uninitialized non-constant global variables|
|Stack|automatic, modifiable|temporary local variables for each function call|
|Heap|dynamic, modifiable|memory that is explicitly allocated and deallocated|
An executable is normally at least as big as the static-lifetime data (the code and data segments together). Since all that data must be in memory for the entire lifetime of the program, it’s written to disk and then when a program runs, the operating system loads the segments into memory. The stack and heap segments, by contrast, grow on demand.
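As a rough illustration of the table, the sketch below prints one address from each segment. All names in it are hypothetical (this is not hello_world.c), and the exact placement of each object is up to the compiler and operating system, so treat the comments as typical rather than guaranteed.

```c
#include <stdio.h>
#include <stdlib.h>

const int read_only_global = 7;  // usually placed with read-only data
int initialized_global = 200;    // usually placed in .data
int uninitialized_global;        // usually placed in .bss (starts at zero)

// Hypothetical demo: prints one address from each segment.
// Returns 0 on success, or -1 if the heap allocation fails.
int print_segment_addresses(void) {
    int local = 1;                    // automatic lifetime: stack
    int *heap = malloc(sizeof(int));  // dynamic lifetime: heap
    if (heap == NULL) return -1;

    // Casting a function pointer to void * is a common extension (POSIX
    // systems support it), used here only to print the address.
    printf("code:  %p\n", (void *)print_segment_addresses);
    printf("const: %p\n", (void *)&read_only_global);
    printf("data:  %p\n", (void *)&initialized_global);
    printf("bss:   %p\n", (void *)&uninitialized_global);
    printf("stack: %p\n", (void *)&local);
    printf("heap:  %p\n", (void *)heap);

    free(heap);
    return 0;
}
```

On a typical Linux machine the code and global addresses are numerically close to each other, while the stack address is far away from them; the exact values vary from run to run and from machine to machine.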
A disk (an HDD, for hard disk drive, or an SSD, for solid-state drive) is a persistent form of storage for data. The data on disk is maintained after your computer shuts down or the power fails, but data in memory is not!
Let’s take a look at this in action! We’ll be looking at
hello_world.c and the binary compiled from it.
Note: Modern compilers and operating systems employ protections that make a program’s memory layout hard to predict, because malicious users can perform some serious attacks on unprotected programs. We’re using the
-no-pie and -fno-pic flags to turn off position-independent code generation for the purposes of this exercise, so that addresses are fixed and easier to examine.
# compile your program with the following flags
$ gcc hello_world.c -no-pie -fno-pic -g -o hello_world
$ gdb hello_world
# before setting any breakpoints, do the following in gdb:
(gdb) info files
# don't quit yet ...
info files will print out the static segments that have been loaded into memory. The segments are formatted as:
[segment-start-address] - [segment-end-address] is [name-of-segment]
Among them are
.text (the C program’s instructions, i.e., its code),
.data (initialized data), and
.bss (uninitialized data). These are static segments of our program that have already been placed into memory.
The line
Entry point: 0x400590 will refer to an address in the
.text region of memory corresponding to the first instruction the program will run.
# ... back to the terminal
(gdb) p GLOBAL_VAR # print the contents of GLOBAL_VAR
200
(gdb) p &GLOBAL_VAR # print the address of GLOBAL_VAR
(int *) 0x601058 <GLOBAL_VAR> # the address may vary on your machine
# examine (x) the contents at the address of GLOBAL_VAR as an integer (/d)
(gdb) x/d &GLOBAL_VAR
0x601058 <GLOBAL_VAR>: 200
Notice that the address of the global
GLOBAL_VAR variable is in the
.data segment – the region where initialized global memory lives.
main in gdb, and identify the segment of memory each has been loaded into.
.rodata section. Notice that any static strings used in the course of the
hello_world program are stored in this section.
x/d to examine as a decimal
x/s to examine as a string
x/c to examine as a character
x/a to examine as an address
x/i to examine as an instruction
x/3i to examine the next 3 instructions that begin at an address
x/3s to examine the first 3 strings beginning at an address
Now, let’s continue our program in gdb. Set a breakpoint in main and run.
(gdb) b main
(gdb) r
# Now in main:
(gdb) info proc mappings
# Again, don't quit yet ...
Here, the command
info proc mappings shows the address ranges currently accessible to the program and their corresponding regions. Note that the mappings for this process currently include a stack (labeled
[stack]), but not a heap.
Step through hello_world.c (past the declaration of local_var) and then examine the address of local_var. What section is it in?
Print local_var. Since it is a char pointer, this should show the address local_var points to and the value (string) at that address. In what section is the address contained in local_var? (Hint: you examined this section in the previous task)
Use layout src to see where you are in the code while it is running in gdb.
To step through hello_world.c, you can use the n command in gdb so that you can step over any function calls.
Alternatively, use b 18 and then use c to continue straight to that line.
Now, let’s continue stepping through main until line 22 (past the initialization of heap_allocated). Then run
info proc mappings again. Do you notice any differences?
Examine the address of
heap_allocated, and identify the section it is contained in.
The first time you examined the addresses accessible to the process right at the start of
main, the program had not yet allocated any data in the heap. Hence, the heap was not listed as an accessible section.
You will turn in your code by pushing your git repository to
As a quick recap, you do this by first committing your changes: either use
git commit -a to commit all changes, or use
git add -p to interactively choose which changes to “stage” for commit, and then commit them using
git commit. Finally, push your changes to your git repository via
git push.
Then, head to the grading server. On the “Labs” page, use the “Lab 1 checkoff” button to check off your lab.
Note: Your lab grades are associated with the commit that you used as your lab checkoff, so when you check off your Lab 1, the grade for Lab 0 will no longer be shown. But rest assured: if you switch to the commit you used for the Lab 0 checkoff, you’ll hopefully see a 2/2 next to Lab 0.