Lab 1: C Programming and Makefiles

Due Tuesday, February 6th at 8:00 PM EST

Before attempting this lab, please make sure that you have:

1. Completed Lab 0 – This will ensure that your Docker container and grading server account are set up properly.

2. Completed the Diversity Survey – Your grades for Lab 0 and Lab 1 will depend on whether you’ve submitted this (though you don’t have to answer any of the questions).

3. Signed up for Section on CAB – Our first round of sections will start this week!

Introduction

The purpose of this lab is to give you some experience writing and understanding the syntax of C programs and applying the tools used to compile and run them. After this lab, you will also be more familiar with pointers and why C and C++ use them.

If you take away anything from this course, hopefully it’s that computer systems are not magic and that much of how they work actually makes a lot of sense. Don’t be afraid to look up answers on Stack Overflow and in the Linux man pages (which provide great documentation on C library functions), and if that doesn’t help, ask on EdStem!

Why C?

Here are some of the highlights: C is an imperative programming language that was mainly developed as a systems programming language for writing operating systems. Its main features include low-level access to memory, a small set of keywords, and a clean style; these features make C suitable for, and widely used in, systems programming. C gives you a huge amount of power over what the computer does, which helps you optimize the performance of your programs and lets you write low-level software that interacts directly with hardware. It also gives you the awesome feeling of really being in control. But with that power comes the responsibility to use it correctly: C has very few safeguards to protect your program’s data or to exit gracefully when something goes wrong, and it will happily overwrite your memory with garbage or make your program explode if you make mistakes. Don’t worry, though, we’ll help you find and avoid them!

If you are looking for a detailed tutorial on C, check out the links on our C primer.

Assignment

Assignment Installation

Start with the cs300-s24-labs-YOURNAME repository you used for Lab 0.

First, ensure that your repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci0300/cs300-s24-labs.git

Then run:

$ git pull
$ git pull handout main

This will merge our Lab 1 stencil code with your previous work.

You may get a warning from git pull about “divergent branches.”

This means you need to set your Git configuration to merge our code by default. To do so, run (within the same directory in your labs repository workspace):

$ git config pull.rebase false
$ git pull handout main

If you have any “conflicts” from Lab 0 (although this is unlikely), resolve them before continuing further. Run git push to save your work back to your personal repository.

Assignment Part I: C Programming

In this part of the lab, you will be writing a program that will reverse an array of strings (or, as they are known in C, char pointers). You will be writing two functions in the file reverse.c and you will test your implementation with the code found in test_reverse.c.

Setup

After you set up the lab, you should find within the lab1 folder a couple of files:

File	Description
`reverse.h`	Header file for `reverse.c`. Contains declarations for the function you should be implementing. (Explained Below)
`reverse.c`	You will be writing your code in this file.
`test_reverse.c`	Contains the test suite in which your implementation will be tested.

Header Files

You’ll notice that there are three files in the provided stencil code. reverse.c and test_reverse.c are similar to what we’ve seen before, containing C code. But what about reverse.h?

Files ending in the .h extension are called header files. They declare functions so that those functions can be used from multiple different .c files. Without a header file, test_reverse.c wouldn’t be able to use the functions that you create in reverse.c, which would make testing impossible!

reverse.h includes the signature of the reverse_arr function, but with no implementation:

void reverse_arr(char** arr, int num);

and then test_reverse.c has this line at the top, telling the C compiler to read the function declarations in reverse.h:

#include "reverse.h"

This way, when the reverse_arr function is used in test_reverse.c, the C compiler checks the call against the matching declaration in reverse.h, and the linker later connects the call to the implementation compiled from reverse.c.

Review of pointers and strings

A pointer is a variable that stores a memory address (i.e., it “points” at whatever is stored at that address!). For instance, int* is a pointer to an integer. On a 64-bit architecture (which most computers today use), the pointer itself occupies 8 bytes in memory, which store the address it points to. That address refers to the first byte of the (typically 4-byte) sequence of memory that stores an int.

As you will notice, there isn’t an explicit data type called “string” in C. That’s because a string in memory is just a sequence of bytes, each represented as a char (a 1-byte value). Instead of having a datatype explicitly called “string”, in C you can think of char pointers (i.e., char*) as strings.

#include <stdio.h>

int main(void) {
  char* store = "hello";

  for (int i = 0; *(store + i) != '\0'; i++) {
    // prints out each character
    printf("character: %c\n", *(store + i));
  }
  return 0;
}

Since store is defined as a char pointer, store will point to a byte of memory that stores a character. And if you increment the value of the pointer by 1 (going to the next box) and dereference that value, you would get the next character of the string. This raises the question: couldn’t you just keep incrementing this pointer? How would you know where the end of the string is?

The answer: all strings in C are terminated by a NUL byte (a char storing numeric value 0), also known as \0. This byte indicates that you have reached the end of the string.

Consider the following memory layout for the example program above:

 store         0x2000 0x2001 .... 
+---+---+      +-----+-----+-----+-----+-----+-----+
| 0x2000| ---> | 'h' | 'e' | 'l' | 'l' | 'o' | \0  |
+---+---+      +-----+-----+-----+-----+-----+-----+

Now think through what happens at each iteration of the loop:

if i == 0, then *(store + 0) dereferences the memory address stored in store, which is 0x2000, and at address 0x2000, the character “h” is stored as a single byte. This is equivalent to writing store[0].
if i == 1, then *(store + 1) dereferences the memory address stored in store + 1, which is 0x2001. At 0x2001, there is the character “e”. This is equivalent to writing store[1].

What you saw here is an example of pointer arithmetic, that is, arithmetic on memory addresses.

Let’s get coding!

Task: You will be writing two functions in the file reverse.c:

reverse_arr will take in two inputs: an array of char* (i.e., a char**) and the number of elements in the array. reverse_arr should reverse the array in place with the help of another function called swap.
- Note: you can assume that the array contains exactly as many elements as the second argument specifies, that you will not have to reverse an empty array, and that all elements will be defined (i.e., not NULL).
swap will take in two elements from the array and swap them.

Note: Remember that pointers, like all arguments in C, are passed by value: the function receives a copy of the address. If you assign a new address to a pointer parameter, you only change that local copy (not the memory object the pointer originally pointed to), and the change will not be visible outside the function. To change something the caller can see, you must write through a pointer to it.

Running and Testing

Once you have finished writing your code, you are ready to test your implementation!

To test your code, we provide a file called test_reverse.c which calls reverse_arr, which you implemented in reverse.c. In order to run it, you must compile it and link the two files together into an executable by running the following command:

$ gcc test_reverse.c reverse.c -o reverse_test

This generates an executable called reverse_test. An executable is a file containing machine instructions encoded in binary (0s and 1s), and running it causes the computer to perform those instructions.

In this case, those instructions run the program starting from main(), which first parses input from the command line, reverses the given array, and then runs the tests found in test_reverse(). (One of the tests opens a file called test.txt in the current directory, reverses each line of the file, and writes the result to an output file called testout.txt.)

You run your executable via:

$ ./reverse_test <NUM_ELEMENTS> <ELEMENT0 ELEMENT1 ELEMENT2 ...>

For instance:

$ ./reverse_test 2 hello world

will print out the results of reversing the input array and running the test suite. If you fail a test, the output provides the expected result at a given index in the array and the actual result.

To debug, you may find it helpful to print what’s happening in your swap and reverse_arr functions. For instance, if you wanted to print the variable store from the code sample above, you could write:

printf("string: %s\n", store);

to print a string.

Hint: If you want to print out the value of a pointer, use the %p syntax for printf.

Once all of your tests pass, you are ready to move on!

Assignment Part II: More on Compiling

As you saw from the previous section, you compiled your program by running:

$ gcc test_reverse.c reverse.c -o reverse_test

With the -o flag, you can direct the output of the gcc compiler into a file specified by the argument following the flag. If you didn’t use the -o flag, you could run:

$ gcc test_reverse.c reverse.c

And this will produce an executable file called a.out (this is just a default filename defined by the compiler), which you can run by typing ./a.out.

Flags

The gcc compiler supports hundreds of different flags, which we can use to customize the compilation process. Flags, typically prefixed by one or two dashes (-<flag> or --<flag>), can help us in many ways, from warning us about programming errors to optimizing our code so that it runs faster.

The general structure for compiling a program with flags is:

$ gcc <flags> <c-files> -o <executable-name>

Warning Flags:

-Wall
- One of the most common flags is the -Wall flag. It will cause the compiler to warn you about technically legal but potentially problematic syntax, including:
  - Uninitialized and unused variables
  - Incorrect return types
  - Invalid type comparisons
-Werror
- The -Werror flag forces the compiler to treat all warnings as errors, meaning that your code won’t compile until you fix them. This may seem annoying at first, but in the long run, it can save you lots of time by forcing you to take a look at potentially problematic code.
-Wextra
- This flag adds a few more warnings that are not covered by -Wall (with -Werror, they will also appear as errors). Some problems that -Wextra will warn you about include:
  - Comparisons (e.g., inside assert statements) that always evaluate to true because of the limited range of the argument’s datatype, such as checking that an unsigned value is >= 0
  - Unused function parameters (only when used in conjunction with -Wall)
  - Empty if/else statements.

Task: Add the -Wall, -Werror, and -Wextra flags when compiling test_reverse.c and fix the errors that come up.

Notice that in test_reverse.c, the main() function takes in two parameters: argc and an array containing the command-line arguments as strings.

What’s argc supposed to do?
- argc indicates the number of arguments passed into the program.
Use argc and change the body of main() so that when:
- argc == 1 then only the test suite should be executed
- argc > 1, the arguments on the command line are in the following order:
  - The number of elements to be reversed
  - The elements to be reversed
  - For example: ./reverse_test 2 csci 300
- You should check that the number of elements entered by the user actually matches the number of elements to be reversed.
  - For example: ./reverse_test 2 csci should cause an error
Make sure to return 1 from main on an error, so that the OS can detect that your program exited with errors.

Debugging with Sanitizers: The warning flags don’t catch all errors. For example, memory leaks, stack or heap corruption, and cases of undefined behavior are often not detected by the compiler. You can use sanitizers to help with identifying these bugs! Sanitizers sacrifice efficiency to add additional checks and perform analysis on your code. You will be using these flags in the next lab in greater detail.

-fsanitize=address
- This flag enables AddressSanitizer, a memory error detector developed by Google. It can detect bugs such as out-of-bounds accesses to heap, stack, and global variables, and uses of dangling pointers (using a pointer after the object it points to has been freed). In practice, this flag also enables another sanitizer, LeakSanitizer, which detects memory leaks (also available via -fsanitize=leak).
-fsanitize=undefined
- This flag enables UndefinedBehaviorSanitizer, which detects various kinds of undefined behavior during program execution, such as null pointer dereferences or signed integer overflow.
-g
- This flag asks the compiler to generate debugging information and embed it in the executable, such as which source lines and variable names correspond to each machine instruction. This gives you more specific information (e.g., file names and line numbers) when you run your executable with gdb or with sanitizers. You will see this flag being utilized in the next lab.

Optimizations

In addition to flags that warn you about problems in your code, there are also optimization flags that speed up the runtime of your code at the cost of longer compilation times. At higher optimization levels, the compiler runs more sophisticated analyses on your program to find changes that improve its speed without changing its behavior, which is why compilation takes longer. These are the capital O flags, which include -O0, -O1, -O2, -O3, and -Os (which optimizes for code size rather than speed).

-O0
- This will compile your code without optimizations — it’s equivalent to not specifying the -O option at all. Because higher optimization levels will often remove and modify portions of your original code, it’s best to use this flag when you’re debugging with gdb or address sanitizers.
-O3
- This will enable the most aggressive optimizations, which usually make your code run the fastest.

Task: Time your program before you add the -O3 flag and then after you’ve added the -O3 flag to your compilation. Because this program is so small, you probably won’t be able to detect a difference in speed, but in future assignments where there is a lot more code, the optimization flag will come in handy.

The -O3 flag asks the compiler to examine what your code is trying to do and, rather than following the provided code verbatim, replace it with machine instructions that functionally do the same thing but in a more efficient manner.

You can time your program by running the time command in your Docker container. For this exercise, pay attention to the real time, but if you’re curious about the different types of times below, check out this post.

$ time ./reverse_test

real	0m0.007s
user	0m0.002s
sys	0m0.000s

Assignment Part III: Makefiles

Now you know how to compile C programs! This is great, but actual software projects rarely require you to invoke the compiler directly like we did so far. Often (e.g., in the CS 300 projects!) you need to compile many source files and use specific sets of flags. It’s very easy to forget a flag or source file, and doing this all by hand on the command line is time-consuming. Additionally, when you have many source files (more than 2), it can be annoying to individually recompile/relink each source file when you make a change to it.

This is why the make tool was created! When you run make, it reads a file called Makefile for specifications on how to compile and link a program. A well-written Makefile automates all the complicated parts of compilation, so you don’t have to remember each step. Additionally, Makefiles can do tasks other than just program compilation: they can execute any shell command we provide.

In this part of the lab, you will be writing a Makefile to use when compiling your reverse array program.

A Makefile consists of one or more rules. The basic structure of a Makefile rule is:

<target>: <dependencies>
[ tab ]<shell_command>

The target is the name of an output file generated by this rule, or a rule label that you choose in certain special cases.
The dependencies are the files or other targets that this target depends on.
The shell command is the command you want to run when the target or dependencies are out of date.
General Rules:
- From gnu.org: A target is out of date if it does not exist or if it is older than any of the dependencies (by comparison of last-modification times). The idea is that the contents of the target file are computed based on information in the dependencies, so if any of the dependencies changes, the contents of the existing target file are no longer necessarily valid.
- If a target is out of date, running make <target> will first remake any of its target dependencies and then run the <shell_command>.
- In general, the name of the Makefile target should be the same as the name of the output file, because then running make <target> will rebuild the target when the output file is older than its dependencies.

Linking is the process of combining many object files and libraries into a single (usually executable) file. If you look at the file test_reverse.c, at the top, you can see there is an #include "reverse.h". This is so that we can use the functions that you wrote to test them, and as you can see, reverse_arr is called in the function test_reverse. You can link these two files together with the following Makefile rule:

reverse_test: test_reverse.c reverse.c reverse.h
	gcc test_reverse.c reverse.c -o reverse_test

The target is the executable named reverse_test, the dependencies are test_reverse.c, reverse.c, and reverse.h. And to compile, instead of typing the shell command, you can just type:

$ make reverse_test

This will cause make to build the reverse_test target, which will execute the command gcc test_reverse.c reverse.c -o reverse_test if a reverse_test executable doesn’t exist or if the reverse_test executable is older than any of the dependencies. Notice how this only works properly if the name of the output executable is the same as the target name.

That was a lot of reading and information, but now you are ready to create your own Makefile!

Task:

Create an empty Makefile by typing touch Makefile in your lab directory.
Modify your Makefile so that it has one target, reverse_test, that will compile reverse.c and test_reverse.c.
Run make reverse_test to make sure it compiles successfully. (You may need to delete the reverse_test binary via rm -f reverse_test to make this work.)

Variables

Makefiles support defining variables, so that you can reuse flags and names you commonly use. MY_VAR = something will define a variable that can be used as $(MY_VAR) or ${MY_VAR} in your rules (don’t wrap the value in quotes, since make would keep them as part of the value). A common way to define flags for C program compilation is to have a CFLAGS variable that you include whenever you run gcc. For example, you can then rewrite your target like this:

CFLAGS = -Wall -Werror

reverse_test: test_reverse.c reverse.c reverse.h
	gcc $(CFLAGS) test_reverse.c reverse.c -o reverse_test

Automatic variables are special variables that can have a different value for each rule in a Makefile and are designed to make writing rules simpler. They can only be used in the command portion of a rule!

Here are some common automatic variables:

$@ represents the name of the current rule’s target.
$^ represents the names of all of the current rule’s dependencies, with spaces in between.
$< represents the name of the current rule’s first dependency.

If we wanted to avoid repeating the names test_reverse.c and reverse.c, we could rewrite our target like this:

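reverse_test: test_reverse.c reverse.c reverse.h
	gcc $(CFLAGS) $(filter %.c,$^) -o $@

Here $(filter %.c,$^) keeps only the .c files from the dependency list, because the header file should stay a dependency but should not be passed to gcc.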

Task: Use regular variables (e.g., CFLAGS) and automatic variables to simplify your Makefile, and add the -O3 flag.

Note: you can write MY_VAR += <additional flags> to append more flags to an existing variable, so that you only need one variable for all of your flags.

Phony Targets

There are also targets known as ‘phony’ targets. These are targets that themselves create no files, but rather exist to provide shortcuts for doing other common operations, like making all the targets in our Makefile or getting rid of all the executables that we made.

To mark targets as phony, add a line like this to your Makefile (by convention, it usually goes near the top, before the targets):

.PHONY: target1 target2 etc.

Why do we need to declare a target as phony?

To avoid a conflict with a file of the same name: We learned earlier that targets will only execute their <shell_command> if the target file is out of date. This is problematic because phony targets generally don’t create files under the target name. If a file with the same name as a phony target somehow exists, make will consider the target up to date and never run its command. You can avoid this by explicitly declaring the target as phony, which tells make to run its command whenever the target is requested, regardless of which files exist.
To improve performance: there’s also a more advanced performance advantage that you can learn more about here.

Here are some common phony targets that we’ll be using in this course:

all target

We use the all target to build all of the executables (non-phony targets) in our project with a single command. This is what it generally looks like:

all: target1 target2 target3

As you can see, there are no shell commands associated with the all target. In fact, we don’t need to include shell commands for all, because by including each target (target1, target2, target3) as dependencies for the all target, make will automatically build those targets in order to fulfill the requirements of all.

In other words, since the all target depends on all the executables in a project, building the all target causes make to first build every other target in our Makefile.

clean target

We also have a target for getting rid of all the executables (and other files we created with make) in our project. This is the clean target.

The clean target generally looks like this:

clean:
	rm -f exec1 exec2 obj1.o obj2.o

As you can see, the clean target is fundamentally just a shell command to remove all the executables and object files that we made earlier. By convention, the clean target should remove all content automatically generated by make. It must be a phony target, because by definition, make clean doesn’t generate output files (but rather removes them)!

Note: Be careful which files you put after the rm -f command, as they will be deleted when you run make clean. Don’t put your .c or .h files because you might lose the code that you wrote!

format target

In this class, you will notice that all of the Makefiles also contain a format target, which uses a tool called clang-format to style your .c and .h files according to a specified standard. A typical format command would look like this:

format:
	clang-format -style=Google -i <file1>.h <file2>.c ...

The above command will format any listed files according to Google’s coding conventions (a set of stylistic and technical conventions that Google engineers agreed to use).

Note: When using this, keep in mind the order of your #include lines, because formatting might reorder include statements. This matters if, for example, a header file you include relies on standard library headers that the including file happens to include before it: after reordering, those headers may no longer come first. To avoid this, make sure that your header files are self-contained (i.e., include all the headers they need).

check target

You’ll also notice a check target in the Makefiles we provide in future labs and projects. If you were to create a check target for this lab, its dependency would be the reverse_test executable.

Task: Add all, clean, and format targets to your Makefile.

Running make without any targets will run the first target in your Makefile. Consequently, you should place the all target as the first target so that typing make will automatically generate all the executables.
Don’t forget to mark these targets as phony!

Simplifying Linking

It is often a good idea to break compilation of a large program into smaller sub-steps. Consider, for example, this command you used earlier:

gcc test_reverse.c reverse.c -o reverse_test

For this program, gcc creates two separate .o files, one for test_reverse.c and one for reverse.c and then links them together. But what if you had hundreds of source files?

Large vs. Small Projects: For small projects, the above works well. However, for large projects it can be much faster to generate intermediate .o files (so-called “object files”) and then separately link the .o files together into an executable. Linking is the process of combining multiple object files (which already contain machine code, but not a full program) into a full executable program.

Why does this make sense? Imagine a project that generates two shared libraries and four executables, all of which separately link a file called data.c. Let’s say data.c takes 1 second to compile. If you compile and link each of the six outputs in one command (without creating intermediate .o files), gcc will recompile data.c six times, resulting in 6 seconds of build time. If you build the data.o file separately, you’ll compile data.c only once (taking 1 second) and then link the resulting data.o into each output (which is much faster than compiling from scratch, especially with large source files). So, if linking takes 0.2 seconds per output, the total build time will be 1 + 6 × 0.2 = 2.2 seconds instead of 6 seconds.

Although this technique won’t yield a huge performance benefit in the case of our small lab, let’s try it to drive the concept of linking home! We can then use our Makefile to automate this process, so that we don’t have to regenerate every object file each time we edit one of the source files.

To do this, we first need to generate an object file for each source file, containing its machine instructions. Then we need to link these object files together into one executable.

To create the object files without linking them, we use the -c flag when running gcc. For example, to create object files for test_reverse.c and reverse.c, we would run:

$ gcc <flags> -c reverse.c -o reverse.o
$ gcc <flags> -c test_reverse.c -o test_reverse.o

This will generate reverse.o and test_reverse.o files. Then, to link the object files into an executable, we would run:

$ gcc test_reverse.o reverse.o -o <executable name>

The advantage of creating object files independently is that when a source file changes, we only need to recreate the object file for that source file. For example, if we changed reverse.c, we would just have to run gcc -c reverse.c -o reverse.o to get a new object file, and then relink with gcc reverse.o test_reverse.o -o reverse_test to get the final executable, reusing the existing test_reverse.o instead of regenerating it.

Task: In your Makefile, create targets for test_reverse.o and reverse.o that each include the corresponding source file as a dependency.

Each of these targets should compile their source file into an object file (not an executable). They also need their correct flags for optimization and debugging.
Update your reverse_test target to use the .o files.
Update your clean and format targets.
Thanks to this, make will only recompile each individual object file if that file’s source was changed. It may not make the biggest difference for this lab, but in a larger project doing this will save you lots of time.

Pattern Rules

The last Makefile technique we’ll discuss is pattern rules. These are very commonly used in Makefiles. A pattern rule uses the % character in the target to create a general rule. For example:

file_%: %.c
	gcc $< -o $@

The % will match any nonempty substring in the target, and the % in the dependencies is replaced by that same matched string. In this case, the rule specifies how to make any file_<name> executable from a file called <name>.c. If <name>.c doesn’t exist and can’t be made, make will report an error.

As you may have noticed, both the test_reverse.o and reverse.o targets are running the same command, which means that we can simplify it.

Task: Use pattern rules to simplify your Makefile targets such that you can generate reverse.o and test_reverse.o using only one rule rather than two separate rules.

If you need help, this documentation might help.

Handin instructions

Turn in your code by pushing your git repository to csci0300-s24-labs-YOURUSERNAME.git.

Then, head to the grading server. On the “Labs” page, use the “Lab 1 checkoff” button to check off your lab.

Note: Lab checkoffs are tied to Git commits. So, when you check off Lab 1 (with a new commit), your grade for Lab 0 will disappear.

This is nothing to worry about! Your grade is still associated with the older commit, and if you select that commit from the dropdown on the grading server, you will be able to see the prior grade.

At the end of the semester, we will collate all lab grades across your commit history.