« Back to the main CS 300 website
Project 2: DMalloc
Project Due Friday, March 1st at 6:00 PM EST
Introduction
One of the unique challenges systems developers face is the explicit management of computer memory allocation. In other, higher-level programming languages, you work with variables/objects without really worrying about where they are in memory, how much space they are taking up, or when to give their memory back to the operating system.
In systems we take a different perspective: dynamic memory is allocated and deallocated manually by the programmer. This gives the programmer the ability to write faster and more efficient code, or low level programs that might not have super fancy languages or runtime environments to implicitly allocate (embedded systems, etc.). But, like most powerful systems assets, dynamic memory can lead to some tricky bugs! In this project you will create a debugging memory allocator to help catch some of these bugs. You can think of this as writing your own version of the AddressSanitizer that you will be familiar with from the first project!
More Context on Memory Allocation
Recall the memory layout a program sees in x86
Remember from lab 1 that there are three places in memory where we can store program data and accordingly there are three types of memory allocation.
-
Static memory allocation stores data in the data segement. This is used for global variables and variables declared with the static
keyword. The memory location of these variables does not change throughout the program. In other words, these allocations exist in the same "static" location determined by the compiler before runtime and are always available (e.g., recall how static
variables work in functions like strtok
). If static variables are initialized to a value, the compiler puts them into the initialized data segment; otherwise it puts them into a segment (confusingly) called bss. Remember: "static" refers to the location of a variable, not whether the program can modify it (that's what the const
keyword is for).
-
Automatic memory allocation stores data on the Stack. This is the memory location for local variables and arguments within functions. The allocation size for the stack is also determined at compile time, and the compiler decides how much the stack grows on a given function call. It is not "static" because the allocations come and go as we move through stack frames (function calls), and it is "automatic" because the lifetime of these allocations is determined by the compiler.
-
Dynamic memory allocation stores data on the heap. The lifetime of this type of memory allocation cannot be determined at compile time and is managed explicitly by the programmer. Dynamic memory allocation in C uses functions like malloc
and free
; C++ uses the new
and delete
keywords. The lifetime and size of dynamic memory allocations is entirely up to you!
This project focuses on dynamic memory allocation. We rarely have memory allocation problems when the compiler is doing deterministic work as with static and automatic allocation; however with dynamic allocation we can run into some issues.
Because memory is the foundation on which a program runs, memory errors and misuse are a particularly nasty set of bugs that can be really hard to deal with. Quite often memory errors will result in Undefined Behavior or compeletly fatal errors .
The purpose of this project is to get familiar with dynamic memory allocation functions as well as understand how the heap is used and some of the dangers associated with it.
More specifically, your task is to implement a special type of dynamic memory allocator that helps to debug dynamic memory usage. Your so-called dmalloc
program will:
- Track information about dynamic memory allocation.
- Catch common memory programming errors (e.g., freeing the same block twice, trying to allocate too much memory etc).
- Detect a write outside a dynamically allocated block (e.g., writing 65 bytes into a 64-byte piece of memory).
- Other more general programming errors.
This project will allow you to…
- Get used to C++ programming.
- Learn about memory representations of information.
- Be aware of common memory allocation errors.
- Really understand how
malloc
and the heap work!
Questions to Think About
Note: You are not required to answer the following questions to complete DMalloc, and they will not affect your grade for the project. They exist to help you think about how you might want to design your implementation for DMalloc. If you're failing tests or stuck debugging, these questions can help you think about edge cases you might not be accounting for!
- What are some situations where it is much better to use dynamic memory allocation over static or automatic memory allocation?
- How does the heap grow in memory? How is this different from how the stack grows? Why is this the case?
- Consider some pointer,
ptr
, that points to memory address 0x12f99a64
. What address does (char *) ptr + 1
point to? What address does (int *) ptr + 1
point to?
- Name and describe at least two potential memory errors that can arise from using
malloc
and free
incorrectly.
- How is checking for integer overflow in addition different than checking for integer overflow in multiplication? Think about both signed and unsigned integers.
Debugging Resources/GDB
A common way students debug code is by adding print statements everywhere and hoping for the best. While this may have worked in the past, and may continue to work, it is the debugging equivalent of kicking down a revolving door. GDB can, and should, become your best friend. It might take a little practice to get used to, but we guarentee GDB will make debugging faster and easier for you.
GDB: How To
Run GDB on the executable (in this case the test) using the gdb
command.
$ gdb [executable-name]
More
2. Set a breakpoint on the function of your choice using b [ function | file:line ]
(often this will be main
).
(gdb) b main
(gdb) b my_file.c:23
3. Run the program using r
, along with any arguments you want.
(gdb) r [arg1] [arg2] ...
4. Display the source code so you can follow along the execution.
(gdb) layout src
5. Step through your code using n
to execute one line and s
to step inside a function. (If you accidentally step into a function you didn't intend to, use finish
).
6. Print variables and pointers. The most useful commands are:
p [variable]
to print values
x/[type] [variable]
to interpret the variable as a pointer, and print the value at that location as a specific type. For examples, x/s
for strings, x/d
for decimal, x/x
for hexadecimal, etc. more
Debugging a Segfault
Run GDB on your executable (don't set any breakpoints) and then run your program with r
. Once it segfaults, run bt
to print a stack trace (the list of subsequent function calls that got you where you are now). You can now use layout src
to view the code, then up
and down
to view different stack frames.
more more more
Assignment
Installation
Ensure that your project repository has a handout
remote. Type:
$ git remote show handout
If this reports an error, run:
$ git remote add handout https://github.com/csci0300/cs300-s24-projects.git
Then run:
$ git pull
$ git pull handout main
This will merge our Project 2 (DMalloc) stencil code with your repository.
You may get a warning about "divergent branches" like shown below:
This means you need to set your Git configuration to merge our code by default. To do so, run (within the same directory in your labs repository workspace):
$ git config pull.rebase false
$ git pull handout main
Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You'll be doing all your work for this project inside the dmalloc
directory in the working copy of your projects repository.
Stencil Code
Standard Memory allocation in C
To do this project, it is important to understand the functions that are called when dynamically allocating memory in C. These are the functions you see in regular C programs:
malloc
void* malloc(size_t sz)
Allocates sz
bytes of memory and returns a pointer ptr
to the "new" memory block. This memory is not initialized (it can contain anything). Upon success, the sz
bytes of storage starting at address ptr
are guaranteed not to overlap with any other active objects, including the program’s code and global variables, its stack, or with any other active dynamically allocated memory. Upon failure, malloc
returns a null pointer (NULL
in C-speak, nullptr
in C++ speak). Failure can occur if sz
is too big, memory is exhausted, or for other reasons.
free
void free(void* ptr)
Free a single block of memory previously allocated by malloc
. The pointer argument, ptr
, must either be a null pointer or a pointer to active dynamically-allocated memory. More explicitly, it is NOT OK to call free if…
ptr
points to the program’s code, or into its global variables, or into the stack.
ptr
was not returned by a previous call to malloc
.
ptr
points to heap memory whose lifetime has ended (i.e., the memory block pointed to by ptr
was already freed and has not been reallocated).
calloc
void* calloc(size_t nmemb, size_t sz)
Allocates memory for an array of objects and clears the allocated block to zero. This is a "secondary" memory allocation function that itself makes a call to malloc
. Our stencil code includes an implementation of this method with a bug that you will fix when testing!
Using these functions:
Dynamically-allocated memory remains active until it is explicitly freed with a call to free. The rules of malloc and free are simple: allocate once, then free exactly once.
Some quick notes on boundary cases:
- In C++,
malloc(0)
should return a non-null pointer. If ptr == malloc(0)
, then ptr
does not overlap with any other allocation and can be passed to free
.
free(nullptr)
is allowed. It does nothing.
malloc(sz)
returns memory whose alignment works for any object. On x86-64 computers, this means that the address value returned by malloc
must be a multiple of 16 (so its last hexadecimal digit is 0
).
base_malloc/base_free
In the stencil code, dmalloc
and dfree
make calls to base_malloc
and base_free
. Why do we avoid directly calling the C functions malloc
and free
? Because debugging allocators interact with undefined behavior!
Undefined behavior is something to be avoided because any program that invokes undefined behavior has no meaning. As far as the C++ language standard is concerned, once undefined behavior occurs, a program may do absolutely anything, such as setting your computer on fire or (our favorite example), force demons to fly out of your nose.
Undefined-behavior bugs with malloc
and free
are common, and debugging allocators must make some undefined behaviors well-defined. For example, when a debugging allocator is in force, a program with a simple double free bug will reliably print a specific error message and exit before any undefined behavior occurs.
This brings us back to base_malloc
and base_free
, which are defined in basealloc.cc
. This allocator behaves like malloc
and free
, but has the following additional properties:
base_free
does not modify freed memory (the contents of freed storage remain unchanged until the storage is reused by a future base_malloc
).
base_free
never returns freed memory to the operating system.
You don't really have to understand how the functions in basealloc.cc
work – just treat them like malloc
and free
. Yet, base_malloc
and base_free
are not standard library functions (like malloc
and free
), so we are able to change what is considered undefined behavior. However, we don't define everything! You will still need to take care of some undefined behavior (like double frees, invalid frees, and wild writes) in your debugger.
Testing
We have included a comprehensive set of tests for you to help you write your debugging memory allocator. Running make check
will run your code against all the tests in order. It’s a good idea to work on the tests in order: get the simple stuff working, then gradually add more complex functionality. In other words, you will have a better time with this assignment if you incrementally tackle tests one after another, rather than writing the full allocator in one go and then running the tests.
Open test001.cc
and look at its code. You will notice these comments at the end of the file:
//! alloc count: active 0 total 0 fail 0
//! alloc size: active 0 total 0 fail 0
These lines define the expected output for the test. The make check
command checks your actual output against the expected output and reports any discrepancies. (It does this by calling a Perl script called check.pl
; you don't have to worry about how this script works.) Don't change these lines – doing so may make tests pass for you locally, but the grading server will replace your changes with the original lines before it runs your code.
Super Helpful Testing Tips!
- To print differences for a single test, try
make check-N
, as in make check-1
. If you want to see the full output of a test, without worrying about comparing to expected output, run that test on the command line directly. For example:
$ make test001
$ ./test001
runs test001 and prints the full output. This can be very useful if you're getting errors from the sanitizers.
-
The tests are written such that your output is compared to an expected output. This means you should pay super close attention to make sure your output is exactly right. (One extra space will fail a test!) At the bottom of each test file, there are comments with the expected output.
-
If you see "???" in the expected output comment, that means the test will accept any symbols. In the example below, the test will pass regardless of the number your program prints after fail
//! alloc count: active 5 total 10 fail ???
-
We also use "??" in some tests to assign a "variable" to match in the output. Check out test20.cc for an example, you will need to print out the correct pointer.
Note: One really important thing to do in this assignment is to run your code with sanitizers enabled for tests 1 to 26. To do this, add SAN=1
to your make
invocation: for example, you might run make check SAN=1
to run all tests with the appropriate sanitizer flags set, or make check-1-26 SAN=1
to run tests 1-26 with sanitizers enabled. The grading server will run your code for tests 1-26 with the sanitizers enabled, so you must pass them (later tests use undefined behavior to test your allocator; they won't pass with sanitizers enabled). They'll also help you find many bugs, guaranteed!
make check SAN=1
takes care of running with the right sanitizers for each test, but here are the details:
- Tests 1-26 must pass the AddressSanitizer, but tests 2-7, 9, 12-13, 16-23 can have memory leaks (i.e., LeakSanitizer errors are okay).
- Tests 1, 8, 10, 11, 14, 15, 24, 28, 34, 35, 36, and 38 need to pass without memory leaks (i.e., no LeakSanitizer errors).
Assignment Roadmap
This section describes the main functionality required for the debugging allocator, ordered from simpler to more complex and from low-numbered to high-numbered tests.
Statistics
In the stencil code there is a function print_statistics()
that uses the information stored in a statistics struct to print out statistics about all the memory allocations so far.
The statistics struct is defined as follows (see dmalloc.hh
):
struct dmalloc_stats {
unsigned long long nactive;
unsigned long long active_size;
unsigned long long ntotal;
unsigned long long total_size;
unsigned long long nfail;
unsigned long long fail_size;
uintptr_t heap_min;
uintptr_t heap_max;
};
It is up to you to fill in the function get_statistics(statistics *stats)
so that accurate information is stored in an instance of dmalloc_stats
. To do this, think about using a global variable that holds a struct and which any calls to dmalloc()
and dfree(ptr)
update.
Task: Implement the get_statistics
function in dmalloc.cc
.
Hints!
Most of these statistics are easy to track, and you should tackle them first (start by passing tests 1-5 and 7-10).
The trickier test is active_size
. To track this information, your dfree(ptr)
implementation must find the number of bytes allocated for the block pointed to by ptr
. This is because after a block is freed, active_size
should decrease by the size of that block.
The easiest, and probably best, way to do this is for your dmalloc
code to request more space than the user required when it calls base_malloc
. This extra space will allow you to store metadata about the allocated block alongside it. This metadata can store a structure you define yourself, and it should keep track of the allocated block size. As you write your debugging allocator, you may realize you want to store other information as well. The metadata will be stored at the beginning of an allocated block, but dmalloc
should still return a pointer to the "payload" of the block, i.e., to the space after the metadata. You will need to use pointer arithmetic to return a pointer that points somewhere just beyond the metadata. In dfree(ptr)
, the ptr
argument then must be a pointer to the payload of a block (because that's what dmalloc
returned before), but you can use pointer arithmetic again to access the metadata stored just before ptr
to access the block's size. Consider what bugs might emerge when adding metadata near your allocated block and how you might resolve them!
Quick reminders on pointer arithmetic:
- You can use the function
sizeof()
to get the size of a datatype.
base_malloc
returns a void*
, which is a pointer without a type. Before you can do pointer arithmetic, you will need to cast the void*
pointer to have a type. However, dmalloc
should still return a void*
.
- When you incremement a pointer, the pointer's address will jump ahead by the size of the pointer's datatype.
Run make check
to run the test programs. Specifically, test001 through test011 should pass when you are done with this section.
Integer overflow protection
Your debugging malloc library also needs to be robust against integer overflow attacks. Integer overflow occurs when you try to allocate a number of bytes that is too large. For example, the size_t
type of the first argument to dmalloc
is a type alias for an unsigned long data type, which is comprised of 8 bytes. When you compute the number of bytes to request and it exceeds the maximum possible integer, the value of the size_t
wraps around to 0. Consider, for example, the calloc
function in the stencil code, think about how overflow could occur, and then fix it.
Task: Implement the overflow protection in dmalloc()
and dcalloc()
.
test012 through test015 should pass after this is done correctly. Make sure that test015 succeeds on the grading server too; if you get a test timeout on the grading server, your overflow checks are not quite right.
Invalid free and double-free detection
dfree(ptr)
should print an error message and then call C’s abort()
function when ptr
does not point to active dynamically-allocated memory. There are several cases when this might occur.
The test programs define the desired error message format. Here’s our error message for test016:
MEMORY BUG: test016.cc:8: invalid free of pointer 0xffffffffffffffe0, not in heap
Error messages should be printed to standard error (using C’s fprintf(stderr, ...)
). Different error situations require different error messages; the other test programs define the required messages.
Task: Check for invalid and double frees in dfree()
and print an appropriate error message.
Hints!
-
ptr
needs to point somewhere on the heap. How are you tracking the allocated space on the heap?
-
You need to catch double frees (when free is called on the same pointer twice)! How can you determine whether or not a block is freed?
-
ptr
must be a pointer to the beginning of an allocated memory block, not just any address on the heap. Knowing your metadata header should be stored in memory right before ptr
, how can you guarantee you are actually accessing a header?
-
Don't be afraid to store more information in your metadata!
test016 through test024 should pass after this is done correctly.
Boundary write error detection
A boundary error is when a program reads or writes memory beyond the actual dimensions of an allocated memory block.
A debugging memory allocator can’t detect boundary read errors, but it can detect many boundary write errors. Your dfree(ptr)
should print an error message and call abort()
if it detects that the memory block associated with ptr
suffered a boundary write error.
Task: Check for boundary write errors in dfree()
and print an appropriate error message.
Hints!
As a strategy for detecting boundary write errors, consider adding data with a known ("secret") value around the boundaries of your allocation, and then check that value in appropriate places.
No debugging allocator can reliably detect all boundary write errors. For example, consider this:
int* array = (int*) malloc(10 * sizeof(int));
int secret = array[10]; // save boundary value
array[10] = 1384139431; // boundary write error
array[10] = secret; // restore old value! dmalloc can’t tell there was an error!
Or this:
int* array = (int*) malloc(10 * sizeof(int));
array[200000] = 0; // a boundary write error, but very far from the boundary!
We’re just expecting your code to catch common simple cases where the user writes one or more bytes directly after the allocated block.
test025 through test027 should pass after this is done correctly. Note that test 27 and following invoke undefined behavior (and make assumptions about the compiler's choices for it) in order to test your implementation. You don't need to pass these and the higher-numbered tests (27-39) with sanitizers enabled.
Memory leak reporting
A memory leak happens when code allocates a block of memory, but then forgets to free it. Memory leaks are not as serious as other memory errors, particularly in short-running programs. They don't cause a crash, and the operating system always reclaims all of a program's memory when the program exits. But in long-running programs, such as your browser or the operating system itself, memory leaks have serious effect and are important to avoid.
Fill in the print_leak_report()
function such that, when called, it prints a report about every allocated object in the system. This report should list every object that has been allocated via dmalloc
but not freed using dfree
. Print the report to standard output (not standard error). A report should look like this:
LEAK CHECK: test033.cc:23: allocated object 0x9b811e0 with size 19
LEAK CHECK: test033.cc:21: allocated object 0x9b81170 with size 17
LEAK CHECK: test033.cc:20: allocated object 0x9b81140 with size 16
LEAK CHECK: test033.cc:19: allocated object 0x9b81110 with size 15
LEAK CHECK: test033.cc:18: allocated object 0x9b810e0 with size 14
LEAK CHECK: test033.cc:16: allocated object 0x9b81080 with size 12
LEAK CHECK: test033.cc:15: allocated object 0x9b81050 with size 11
A programmer will use this leak checker by calling print_leak_report()
before exiting the program, after freeing all allocated dynamic memory they recall. Any missing frees will show up in the leak report!
Task: Implement print_leak_report()
.
Hints!
There are multiple ways to do this. Here are some things that might be helpful:
-
std::pair
– a C++ object that contains two data elements (similar to a tuple with two members).
-
std::map
– C++ key-value map. You will need to add #include <map>
at the top of dmalloc.cc to use this.
-
map::begin
– returns an iterator referring to the first element in the map container.
-
std::unordered_map
– C++ key-value hash map. You will need to add #include <unordered_map>
at the top of dmalloc.cc
to use it. If you get strange and scary compiler errors when using this data structure, it's likely because the compiler doesn't know how to hash your key type. Check out this guide for more information.
test028 through test030 should pass after this is done correctly.
Advanced reports and checking
This is the home stretch!
To make your debugger even more robust, you should print better information and defend against more complex invalid frees. You will need to read the test code and understand what is being tested to defend against it.
Update your invalid free message. After determining that a pointer is invalid, your code should check whether the pointer is inside a different allocated block. This will use the same structures you created for the leak checker. If the invalid pointer is inside another block, print out that block, like so:
MEMORY BUG: test031.cc:10: invalid free of pointer 0x833306c, not allocated
test031.cc:9: 0x833306c is 100 bytes inside a 2001 byte region allocated here
Also make sure your invalid free detector can handle more diabolical situations. What situations? Check the test code to find out!
Task: If an invalid free occurs within an already allocated block print an appropriate error message.
test031 through test033 should pass after this is done correctly.
Finally, test programs 34 to 39 test other situations that allow you to check that your debugging allocator is efficient and compatible with C++ style memory alocation (not just C style memory allocation).
test034 calls dmalloc
500,000 times, supplying tens of thousands of different filename/line-number pairs. Ideally, your solution should run test034 in a second or less.
Task: Meet the performance requirement!
All tests should pass after this is done correctly!
Note: C++ introduces a new paradigm for dynamic memory allocation with the new
and delete
keywords. These keywords are designed to work with C++ classes, but under the hood they work simply by using malloc
and free
respectively.
We encourage you to learn more and flex your coding muscles by doing some extra credit work! Extra credit is in no sense required to get an A, except if you're a graduate student taking CSCI 1310, in which case we require you to solve two of the below ideas.
Reallocation
Easy. Implement drealloc
. This function is based on the C standard library's realloc
, and should behave like this:
void* drealloc(void* ptr, size_t sz, const char* file, long line);
Sanitizer
Moderately difficult. Modern debugging memory allocators are tightly integrated with C and C++ compilers: the compilers generate data structures that the debugging allocators use at runtime to detect mistakes. This has many advantages, including efficiency and precision.
Research a debugging memory allocator integrated with a compiler and runtime of your choice, such as GCC/Clang's AddressSanitizer, and write a brief section in your README (under "Extra Credit"), describing how it works. You can do this by reading information online (with citation) and/or by running your own experiments.
Line Number Lookup
Difficult. Most of our test programs use macros that redefine malloc
and free
to supply filename and line number information. But C++ discourages the use of fancy macros, and C++ style allocation, such as the dbg_allocator
class used in test035
and up, has no place for macros to go. A C++ debugging memory allocator will use return address information to identify call sites.
Change your debugging allocator so that dbg_allocator
's allocate and deallocate functions pass information based on __builtin_return_address
to your allocator's functions. Then, make your allocator transform this information numbers into real filename–line number pairs just before printing. For instance, you can call out to an external program, such as Linux's addr2line
. But do not perform that transformation until printing, since the transformation is expensive.
Use ./test037
to check your work. Before your fixes, the LEAK CHECK
lines will start with ?:1;
afterwards, they should start with something like .../test037.cc:NN
.
A note on inlining
A compiler optimization called "inlining" may complicate your work, since inlining can cause __builtin_return_address
to return an unexpected address. Put the magic attribute [[gnu::noinline]]
before a function to prevent inlining for that function.
For example:
int f() {
}
[[gnu::noinline]] int f2() {
}
Handing In Your Code
As before, you will hand in your code using Git. In the dmalloc/
subdirectory of your project repository, you MUST fill in the text file called README.md
.
Remind me again what the README.md
should contain?
The README.md
file will include the following:
- Any design decisions you made and comments for graders, under "Design Overview". If there's nothing interesting to say, just list "None".
- Any collaborators and citations for help that you received, under "Collaborators". CS 300 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.
- Ranking the difficulty and time consumption of the project, compared to other projects so far. We won't use this for grading, but we collect the data to calibrate for future years.
Grading breakdown:
- 100% (80 points) tested functionality (2 points per test, plus 2 for passing them all). If running
make check
reports that all tests pass, you've probably got all these points.
- Up to 10 points for extra credit.
Now head to the grading server, make sure that you have the "DMalloc" page configured correctly with your project repository, and check that all tests pass on the grading server.
Note: If the commit you want graded is not your most recent one, you should flag the one you want graded and the grading commit will be updated after the autograder finishes.
Congratulations, you've completed the second CS 300 project!
Acknowledgements: DMalloc was originally developed for Harvard's CS 61 course. We are grateful to Eddie Kohler for allowing us to use the assignment for CS 300.
« Back to the main CS 300 website
Project 2: DMalloc
Project Due Friday, March 1st at 6:00 PM EST
Introduction
One of the unique challenges systems developers face is the explicit management of computer memory allocation. In other, higher-level programming languages, you work with variables/objects without really worrying about where they are in memory, how much space they are taking up, or when to give their memory back to the operating system.
In systems we take a different perspective: dynamic memory is allocated and deallocated manually by the programmer. This gives the programmer the ability to write faster and more efficient code, or low level programs that might not have super fancy languages or runtime environments to implicitly allocate (embedded systems, etc.). But, like most powerful systems assets, dynamic memory can lead to some tricky bugs! In this project you will create a debugging memory allocator to help catch some of these bugs. You can think of this as writing your own version of the AddressSanitizer that you will be familiar with from the first project!
More Context on Memory Allocation
Recall the memory layout a program sees in x86
Remember from lab 1 that there are three places in memory where we can store program data and accordingly there are three types of memory allocation.
Static memory allocation stores data in the data segement. This is used for global variables and variables declared with the
static
keyword. The memory location of these variables does not change throughout the program. In other words, these allocations exist in the same "static" location determined by the compiler before runtime and are always available (e.g., recall howstatic
variables work in functions likestrtok
). If static variables are initialized to a value, the compiler puts them into the initialized data segment; otherwise it puts them into a segment (confusingly) called bss. Remember: "static" refers to the location of a variable, not whether the program can modify it (that's what theconst
keyword is for).Automatic memory allocation stores data on the Stack. This is the memory location for local variables and arguments within functions. The allocation size for the stack is also determined at compile time, and the compiler decides how much the stack grows on a given function call. It is not "static" because the allocations come and go as we move through stack frames (function calls), and it is "automatic" because the lifetime of these allocations is determined by the compiler.
Dynamic memory allocation stores data on the heap. The lifetime of this type of memory allocation cannot be determined at compile time and is managed explicitly by the programmer. Dynamic memory allocation in C uses functions like
malloc
andfree
; C++ uses thenew
anddelete
keywords. The lifetime and size of dynamic memory allocations is entirely up to you!This project focuses on dynamic memory allocation. We rarely have memory allocation problems when the compiler is doing deterministic work as with static and automatic allocation; however with dynamic allocation we can run into some issues.
Because memory is the foundation on which a program runs, memory errors and misuse are a particularly nasty set of bugs that can be really hard to deal with. Quite often memory errors will result in Undefined Behavior or compeletly fatal errors .
The purpose of this project is to get familiar with dynamic memory allocation functions as well as understand how the heap is used and some of the dangers associated with it.
More specifically, your task is to implement a special type of dynamic memory allocator that helps to debug dynamic memory usage. Your so-called
dmalloc
program will:This project will allow you to…
malloc
and the heap work!Questions to Think About
Note: You are not required to answer the following questions to complete DMalloc, and they will not affect your grade for the project. They exist to help you think about how you might want to design your implementation for DMalloc. If you're failing tests or stuck debugging, these questions can help you think about edge cases you might not be accounting for!
ptr
, that points to memory address0x12f99a64
. What address does(char *) ptr + 1
point to? What address does(int *) ptr + 1
point to?malloc
andfree
incorrectly.Debugging Resources/GDB
A common way students debug code is by adding print statements everywhere and hoping for the best. While this may have worked in the past, and may continue to work, it is the debugging equivalent of kicking down a revolving door. GDB can, and should, become your best friend. It might take a little practice to get used to, but we guarentee GDB will make debugging faster and easier for you.
GDB: How To
Run GDB on the executable (in this case the test) using the
gdb
command.$ gdb [executable-name]
More
2. Set a breakpoint on the function of your choice using
b [ function | file:line ]
(often this will bemain
).3. Run the program using
r
, along with any arguments you want.4. Display the source code so you can follow along the execution.
5. Step through your code using
n
to execute one line ands
to step inside a function. (If you accidentally step into a function you didn't intend to, usefinish
).6. Print variables and pointers. The most useful commands are:
p [variable]
to print valuesx/[type] [variable]
to interpret the variable as a pointer, and print the value at that location as a specific type. For examples,x/s
for strings,x/d
for decimal,x/x
for hexadecimal, etc. moreDebugging a Segfault
Run GDB on your executable (don't set any breakpoints) and then run your program with
r
. Once it segfaults, runbt
to print a stack trace (the list of subsequent function calls that got you where you are now). You can now uselayout src
to view the code, thenup
anddown
to view different stack frames.more more more
Assignment
Installation
Ensure that your project repository has a
handout
remote. Type:If this reports an error, run:
Then run:
This will merge our Project 2 (DMalloc) stencil code with your repository.
You may get a warning about "divergent branches" like shown below:
This means you need to set your Git configuration to merge our code by default. To do so, run (within the same directory in your labs repository workspace):
$ git config pull.rebase false $ git pull handout main
Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You'll be doing all your work for this project inside the
dmalloc
directory in the working copy of your projects repository.Stencil Code
Standard Memory allocation in C
To do this project, it is important to understand the functions that are called when dynamically allocating memory in C. These are the functions you see in regular C programs:
malloc
Allocates
sz
bytes of memory and returns a pointerptr
to the "new" memory block. This memory is not initialized (it can contain anything). Upon success, thesz
bytes of storage starting at addressptr
are guaranteed not to overlap with any other active objects, including the program’s code and global variables, its stack, or with any other active dynamically allocated memory. Upon failure,malloc
returns a null pointer (NULL
in C-speak,nullptr
in C++ speak). Failure can occur ifsz
is too big, memory is exhausted, or for other reasons.free
Free a single block of memory previously allocated by
malloc
. The pointer argument,ptr
, must either be a null pointer or a pointer to active dynamically-allocated memory. More explicitly, it is NOT OK to call free if…ptr
points to the program’s code, or into its global variables, or into the stack.ptr
was not returned by a previous call tomalloc
.ptr
points to heap memory whose lifetime has ended (i.e., the memory block pointed to byptr
was already freed and has not been reallocated).calloc
Allocates memory for an array of objects and clears the allocated block to zero. This is a "secondary" memory allocation function that itself makes a call to
malloc
. Our stencil code includes an implementation of this method with a bug that you will fix when testing!Using these functions:
Dynamically-allocated memory remains active until it is explicitly freed with a call to free. The rules of malloc and free are simple: allocate once, then free exactly once.
Some quick notes on boundary cases:
malloc(0)
should return a non-null pointer. Ifptr == malloc(0)
, thenptr
does not overlap with any other allocation and can be passed tofree
.free(nullptr)
is allowed. It does nothing.malloc(sz)
returns memory whose alignment works for any object. On x86-64 computers, this means that the address value returned bymalloc
must be a multiple of 16 (so its last hexadecimal digit is0
).base_malloc/base_free
In the stencil code,
dmalloc
anddfree
make calls tobase_malloc
andbase_free
. Why do we avoid directly calling the C functionsmalloc
andfree
? Because debugging allocators interact with undefined behavior!Undefined behavior is something to be avoided because any program that invokes undefined behavior has no meaning. As far as the C++ language standard is concerned, once undefined behavior occurs, a program may do absolutely anything, such as setting your computer on fire or (our favorite example), force demons to fly out of your nose.
Undefined-behavior bugs with
malloc
andfree
are common, and debugging allocators must make some undefined behaviors well-defined. For example, when a debugging allocator is in force, a program with a simple double free bug will reliably print a specific error message and exit before any undefined behavior occurs.This brings us back to
base_malloc
andbase_free
, which are defined inbasealloc.cc
. This allocator behaves likemalloc
andfree
, but has the following additional properties:base_free
does not modify freed memory (the contents of freed storage remain unchanged until the storage is reused by a futurebase_malloc
).base_free
never returns freed memory to the operating system.You don't really have to understand how the functions in
basealloc.cc
work – just treat them likemalloc
andfree
. Yet,base_malloc
andbase_free
are not standard library functions (likemalloc
andfree
), so we are able to change what is considered undefined behavior. However, we don't define everything! You will still need to take care of some undefined behavior (like double frees, invalid frees, and wild writes) in your debugger.Testing
We have included a comprehensive set of tests for you to help you write your debugging memory allocator. Running
make check
will run your code against all the tests in order. It’s a good idea to work on the tests in order: get the simple stuff working, then gradually add more complex functionality. In other words, you will have a better time with this assignment if you incrementally tackle tests one after another, rather than writing the full allocator in one go and then running the tests.Open
test001.cc
and look at its code. You will notice these comments at the end of the file:These lines define the expected output for the test. The
make check
command checks your actual output against the expected output and reports any discrepancies. (It does this by calling a Perl script calledcheck.pl
; you don't have to worry about how this script works.) Don't change these lines – doing so may make tests pass for you locally, but the grading server will replace your changes with the original lines before it runs your code.Super Helpful Testing Tips!
make check-N
, as inmake check-1
. If you want to see the full output of a test, without worrying about comparing to expected output, run that test on the command line directly. For example:$ make test001 $ ./test001
runs test001 and prints the full output. This can be very useful if you're getting errors from the sanitizers.
Note: One really important thing to do in this assignment is to run your code with sanitizers enabled for tests 1 to 26. To do this, add
SAN=1
to yourmake
invocation: for example, you might runmake check SAN=1
to run all tests with the appropriate sanitizer flags set, ormake check-1-26 SAN=1
to run tests 1-26 with sanitizers enabled. The grading server will run your code for tests 1-26 with the sanitizers enabled, so you must pass them (later tests use undefined behavior to test your allocator; they won't pass with sanitizers enabled). They'll also help you find many bugs, guaranteed!make check SAN=1
takes care of running with the right sanitizers for each test, but here are the details:Assignment Roadmap
This section describes the main functionality required for the debugging allocator, ordered from simpler to more complex and from low-numbered to high-numbered tests.
Statistics
In the stencil code there is a function
print_statistics()
that uses the information stored in a statistics struct to print out statistics about all the memory allocations so far.The statistics struct is defined as follows (see
dmalloc.hh
):It is up to you to fill in the function
get_statistics(statistics *stats)
so that accurate information is stored in an instance ofdmalloc_stats
. To do this, think about using a global variable that holds a struct and which any calls todmalloc()
anddfree(ptr)
update.Task: Implement the
get_statistics
function indmalloc.cc
.Hints!
Most of these statistics are easy to track, and you should tackle them first (start by passing tests 1-5 and 7-10).
The trickier test is
active_size
. To track this information, yourdfree(ptr)
implementation must find the number of bytes allocated for the block pointed to byptr
. This is because after a block is freed,active_size
should decrease by the size of that block.The easiest, and probably best, way to do this is for your
dmalloc
code to request more space than the user required when it callsbase_malloc
. This extra space will allow you to store metadata about the allocated block alongside it. This metadata can store a structure you define yourself, and it should keep track of the allocated block size. As you write your debugging allocator, you may realize you want to store other information as well. The metadata will be stored at the beginning of an allocated block, butdmalloc
should still return a pointer to the "payload" of the block, i.e., to the space after the metadata. You will need to use pointer arithmetic to return a pointer that points somewhere just beyond the metadata. Indfree(ptr)
, theptr
argument then must be a pointer to the payload of a block (because that's whatdmalloc
returned before), but you can use pointer arithmetic again to access the metadata stored just beforeptr
to access the block's size. Consider what bugs might emerge when adding metadata near your allocated block and how you might resolve them!Quick reminders on pointer arithmetic:
sizeof()
to get the size of a datatype.base_malloc
returns avoid*
, which is a pointer without a type. Before you can do pointer arithmetic, you will need to cast thevoid*
pointer to have a type. However,dmalloc
should still return avoid*
.Run
make check
to run the test programs. Specifically, test001 through test011 should pass when you are done with this section.Integer overflow protection
Your debugging malloc library also needs to be robust against integer overflow attacks. Integer overflow occurs when you try to allocate a number of bytes that is too large. For example, the
size_t
type of the first argument todmalloc
is a type alias for an unsigned long data type, which is comprised of 8 bytes. When you compute the number of bytes to request and it exceeds the maximum possible integer, the value of thesize_t
wraps around to 0. Consider, for example, thecalloc
function in the stencil code, think about how overflow could occur, and then fix it.Task: Implement the overflow protection in
dmalloc()
anddcalloc()
.test012 through test015 should pass after this is done correctly. Make sure that test015 succeeds on the grading server too; if you get a test timeout on the grading server, your overflow checks are not quite right.
Invalid free and double-free detection
dfree(ptr)
should print an error message and then call C’sabort()
function whenptr
does not point to active dynamically-allocated memory. There are several cases when this might occur.The test programs define the desired error message format. Here’s our error message for test016:
Error messages should be printed to standard error (using C’s
fprintf(stderr, ...)
). Different error situations require different error messages; the other test programs define the required messages.Task: Check for invalid and double frees in
dfree()
and print an appropriate error message.Hints!
ptr
needs to point somewhere on the heap. How are you tracking the allocated space on the heap?You need to catch double frees (when free is called on the same pointer twice)! How can you determine whether or not a block is freed?
ptr
must be a pointer to the beginning of an allocated memory block, not just any address on the heap. Knowing your metadata header should be stored in memory right beforeptr
, how can you guarantee you are actually accessing a header?Don't be afraid to store more information in your metadata!
test016 through test024 should pass after this is done correctly.
Boundary write error detection
A boundary error is when a program reads or writes memory beyond the actual dimensions of an allocated memory block.
A debugging memory allocator can’t detect boundary read errors, but it can detect many boundary write errors. Your
dfree(ptr)
should print an error message and callabort()
if it detects that the memory block associated withptr
suffered a boundary write error.Task: Check for boundary write errors in
dfree()
and print an appropriate error message.Hints!
As a strategy for detecting boundary write errors, consider adding data with a known ("secret") value around the boundaries of your allocation, and then check that value in appropriate places.
No debugging allocator can reliably detect all boundary write errors. For example, consider this:
Or this:
We’re just expecting your code to catch common simple cases where the user writes one or more bytes directly after the allocated block.
test025 through test027 should pass after this is done correctly. Note that test 27 and following invoke undefined behavior (and make assumptions about the compiler's choices for it) in order to test your implementation. You don't need to pass these and the higher-numbered tests (27-39) with sanitizers enabled.
Memory leak reporting
A memory leak happens when code allocates a block of memory, but then forgets to free it. Memory leaks are not as serious as other memory errors, particularly in short-running programs. They don't cause a crash, and the operating system always reclaims all of a program's memory when the program exits. But in long-running programs, such as your browser or the operating system itself, memory leaks have serious effect and are important to avoid.
Fill in the
print_leak_report()
function such that, when called, it prints a report about every allocated object in the system. This report should list every object that has been allocated viadmalloc
but not freed usingdfree
. Print the report to standard output (not standard error). A report should look like this:A programmer will use this leak checker by calling
print_leak_report()
before exiting the program, after freeing all allocated dynamic memory they recall. Any missing frees will show up in the leak report!Task: Implement
print_leak_report()
.Hints!
There are multiple ways to do this. Here are some things that might be helpful:
std::pair
– a C++ object that contains two data elements (similar to a tuple with two members).std::map
– C++ key-value map. You will need to add#include <map>
at the top of dmalloc.cc to use this.map::begin
– returns an iterator referring to the first element in the map container.std::unordered_map
– C++ key-value hash map. You will need to add#include <unordered_map>
at the top ofdmalloc.cc
to use it. If you get strange and scary compiler errors when using this data structure, it's likely because the compiler doesn't know how to hash your key type. Check out this guide for more information.test028 through test030 should pass after this is done correctly.
Advanced reports and checking
This is the home stretch!
To make your debugger even more robust, you should print better information and defend against more complex invalid frees. You will need to read the test code and understand what is being tested to defend against it.
Update your invalid free message. After determining that a pointer is invalid, your code should check whether the pointer is inside a different allocated block. This will use the same structures you created for the leak checker. If the invalid pointer is inside another block, print out that block, like so:
Also make sure your invalid free detector can handle more diabolical situations. What situations? Check the test code to find out!
Task: If an invalid free occurs within an already allocated block print an appropriate error message.
test031 through test033 should pass after this is done correctly.
Performance and C++ integration
Finally, test programs 34 to 39 test other situations that allow you to check that your debugging allocator is efficient and compatible with C++ style memory alocation (not just C style memory allocation).
test034 calls
dmalloc
500,000 times, supplying tens of thousands of different filename/line-number pairs. Ideally, your solution should run test034 in a second or less.Task: Meet the performance requirement!
All tests should pass after this is done correctly!
Note: C++ introduces a new paradigm for dynamic memory allocation with the
new
anddelete
keywords. These keywords are designed to work with C++ classes, but under the hood they work simply by usingmalloc
andfree
respectively.Extra Credit
We encourage you to learn more and flex your coding muscles by doing some extra credit work! Extra credit is in no sense required to get an A, except if you're a graduate student taking CSCI 1310, in which case we require you to solve two of the below ideas.
Reallocation
Easy. Implement
drealloc
. This function is based on the C standard library'srealloc
, and should behave like this:/// drealloc(ptr, sz, file, line) /// Reallocate the dynamic memory pointed to by `ptr` to hold at least /// `sz` bytes, returning a pointer to the new block. If `ptr` is /// `nullptr`, behaves like `dmalloc(sz, file, line)`. If `sz` is 0, /// behaves like `dfree(ptr, file, line)`. The allocation request /// was at location `file`:`line`. void* drealloc(void* ptr, size_t sz, const char* file, long line);
Sanitizer
Moderately difficult. Modern debugging memory allocators are tightly integrated with C and C++ compilers: the compilers generate data structures that the debugging allocators use at runtime to detect mistakes. This has many advantages, including efficiency and precision.
Research a debugging memory allocator integrated with a compiler and runtime of your choice, such as GCC/Clang's AddressSanitizer, and write a brief section in your README (under "Extra Credit"), describing how it works. You can do this by reading information online (with citation) and/or by running your own experiments.
Line Number Lookup
Difficult. Most of our test programs use macros that redefine
malloc
andfree
to supply filename and line number information. But C++ discourages the use of fancy macros, and C++ style allocation, such as thedbg_allocator
class used intest035
and up, has no place for macros to go. A C++ debugging memory allocator will use return address information to identify call sites.Change your debugging allocator so that
dbg_allocator
's allocate and deallocate functions pass information based on__builtin_return_address
to your allocator's functions. Then, make your allocator transform this information numbers into real filename–line number pairs just before printing. For instance, you can call out to an external program, such as Linux'saddr2line
. But do not perform that transformation until printing, since the transformation is expensive.Use
./test037
to check your work. Before your fixes, theLEAK CHECK
lines will start with?:1;
afterwards, they should start with something like.../test037.cc:NN
.A note on inlining
A compiler optimization called "inlining" may complicate your work, since inlining can cause__builtin_return_address
to return an unexpected address. Put the magic attribute[[gnu::noinline]]
before a function to prevent inlining for that function.For example:
int f() { // f might be inlined into its caller } [[gnu::noinline]] int f2() { // f2 will not be inlined into its caller }
Handing In Your Code
As before, you will hand in your code using Git. In the
dmalloc/
subdirectory of your project repository, you MUST fill in the text file calledREADME.md
.Remind me again what the
README.md
should contain?The
README.md
file will include the following:Grading breakdown:
make check
reports that all tests pass, you've probably got all these points.Now head to the grading server, make sure that you have the "DMalloc" page configured correctly with your project repository, and check that all tests pass on the grading server.
Note: If the commit you want graded is not your most recent one, you should flag the one you want graded and the grading commit will be updated after the autograder finishes.
Congratulations, you've completed the second CS 300 project!
Acknowledgements: DMalloc was originally developed for Harvard's CS 61 course. We are grateful to Eddie Kohler for allowing us to use the assignment for CS 300.