« Back to the main CS 131 website
Project 2: DMalloc
Collaborative Hours: Feb 15 – Feb 21
TA Hours: Feb 22 – March 2
Conceptual Questions Due Feb 24, 2020 at 6:00pm
Project Code Due Feb 28, 2020 at 6:00pm
Introduction
One of the unique challenges systems developers face is the explicit management of computer memory allocation. In other, higher-level programming languages, you work with variables/objects without really worrying about where they are in memory, how much space they are taking up, or when to give their memory back to the operating system.
In systems we take a different perspective: dynamic memory is allocated and deallocated manually by the programmer. This gives the programmer the ability to write faster and more efficient code, or low level programs that might not have super fancy languages or runtime environments to implicitly allocate (embedded systems etc). But, like most powerful systems assets, dynamic memory can lead to some tricky bugs! In this project you will create a debugging memory allocator to help catch some of these bugs.
  More Context on Memory Allocation  
Recall the memory layout a program sees in x86
 

 
Remember from lab 1 that there are 3 places in memory where we can store program data and accordingly there are 3 types of memory allocation.
- 
Static memory allocation stores data in the data segement. This is used for global variables and variables declared with the statickeyword. The memory location of these variables does not change throughout the program. In other words, these allocations exist in the same “static” location determined by the compiler before runtime and are always available (e.g., recall howstaticvariables work in functions likestrtok). If static variables are initialized to a value, the compiler puts them into the initialized data segment; otherwise it puts them into a segment (confusingly) called bss. Remember: “static” refers to the location of a variable, not whether the program can modify it (that’s what theconstkeyword is for).
 
- 
Automatic memory allocation stores data on the Stack. This is the memory location for local variables and arguments within functions. The allocation size for the stack is also determined at compile time, and the compiler decides how much the stack grows on a given function call. It is not “static” because the allocations come and go as we move through stack frames (function calls), and it is “automatic” because the lifetime of these allocations is determined by the compiler. 
- 
Dynamic memory allocation stores data on the heap. The lifetime of this type of memory allocation cannot be determined at compile time and is managed explicitly by the programmer. Dynamic memory allocation in C uses functions like mallocandfree; C++ uses thenewanddeletekeywords. The lifetime and size of dynamic memory allocations is entirely up to you!
 
This project focuses on dynamic memory allocation. We rarely have memory allocation problems when the compiler is doing deterministic work as with static and automatic allocation; however with dynamic allocation we can run into some issues.
Because memory is the foundation on which a program runs, memory errors and misuse are a particularly nasty set of bugs that can be really hard to deal with. Quite often memory errors will result in Undefined Behavior or compeletly fatal errors  .
.
The purpose of this project is to get familiar with dynamic memory allocation functions as well as understand how the heap is used and some of the dangers associated with it.
 
 More specifically, your task is to implement a special type of dynamic memory allocator that helps to debug dynamic memory usage. Your so-called dmalloc program will:
- Track information about dynamic memory allocation
- Catch common memory programming errors (e.g. freeing the same block twice, trying to allocate too much memory etc)
- Detect a write outside a dynamically allocated block (e.g., writing 65 bytes into a 64-byte piece of memory)
- Other more general programming errors
This project will allow you to…
- Get used to C++ programming
- Learn about memory representations of information
- Be aware of common memory allocation errors
- Really understand how mallocand the heap work!
Warning: It is important to get started early: this project is not trivial even though it does not require a lot of code.
 
Conceptual Questions
Instructions
- Write your answers to the following questions in your README.
- You should be able to answer these questions without any project-specific knowledge.
- We strongly recommend that you answer these questions as you complete the project, rather than waiting until after finishing the project code. Answering the questions will help you immensely in understanding the concepts behind the code that you’ll be writing.
Note: Answers to these conceptual questions are due Feb 24 at 6:00pm, which is 4 days prior to the final project deadline. In order to submit your answers, create a commit in your project repository (before the due date) that includes your README with your answers.
Questions
- Describe a situation when it would be much better to use dynamic memory allocation over static or automatic memory allocation.
- How does the heap grow in memory? How is this different from how the stack grows? Why is this the case?
- What is a structin C and why is it useful?
- Consider some pointer, ptr, that points to memory address0x12f99a64. What address does(char *) ptr + 1point to? What address does(int *) ptr + 1point to? Assume 32 bit integer sizes.
- What is a memory leak, and why can it create serious problems?
- What are some other potential issues that can arise from using mallocandfree?
- How is checking for integer overflow in addition different than checking for integer overflow in multiplication?
Debugging Resources/GDB
A common way students debug code is by adding print statements everywhere and hoping for the best. While this may have worked in the past, and may continue to work, it is the debugging equivalent of kicking down a revolving door. GDB can, and should, become your best friend. It might take a little practice to get used to, but we guarentee GDB will make debugging faster and easier for you.
GDB Testimonials
“I don’t know what I would do without GDB.”
 Bill Gates
“GDB keeps me company on those cold, lonely nights.”
 Elon Musk
“Without GDB man is nothing.”
 Barack Obama
“The new craze: millenials are killing the print statement industry in favor of GDB.”
 The Wall Street Journal
“GNU’s Debugger? More like… God’s Debugger!”
 God
GDB: How To
Run GDB on the executable (in this case the test) using the gdb command.
$ gdb [executable-name]
More
2. Set a breakpoint on the function of your choice using b [ function | file:line ] (often this will be main).
(gdb) b main
(gdb) b my_file.c:23
3. Run the program using r, along with any arguments you want.
(gdb) r [arg1] [arg2] ...
4. Display the source code so you can follow along the execution.
(gdb) layout src
5. Step through your code using n to execute one line and s to step inside a function. (If you accidentally step into a function you didn’t intend to, use finish).
6. Print variables and pointers. The most useful commands are:
- p [variable]to print values
- x/[type] [variable]to interpret the variable as a pointer, and print the value at that location as a specific type. For examples,- x/sfor strings,- x/dfor decimal,- x/xfor hexadecimal, etc. more
Debugging a Segfault
Run GDB on your executable (don’t set any breakpoints) and then run your program with r. Once it segfaults, run bt to print a stack trace (the list of subsequent function calls that got you where you are now). You can now use layout src to view the code, then up and down to view different stack frames.
more more more
Assignment
Installation
Ensure that your project repository has a handout remote. Type:
$ git remote show handout
If this reports an error, run:
$ git remote add handout https://github.com/csci1310/cs131-s20-projects.git
Then run:
$ git pull
$ git pull handout master
This will merge our Project 2 (DMalloc) stencil code with your repository.
Once you have a local working copy of the repository that is up to date with our stencils,  you are good to proceed. You’ll be doing all your work for this project inside the dmalloc directory in the working copy of your projects repository.
Infrastructure Help
Stencil Code
Standard Memory allocation in C
To do this project, it is important to understand the functions that are called when dynamically allocating memory in C. These are the functions you see in regular C programs:
malloc  
void* malloc(size_t sz)
Allocates sz bytes of memory and returns a pointer ptr to the “new” memory block. This memory is not initialized (it can contain anything). Upon success, the sz bytes of storage starting at address ptr are guaranteed not to overlap with any other active objects, including the program’s code and global variables, its stack, or with any other active dynamically allocated memory. Upon failure, malloc returns a null pointer (NULL in C-speak, nullptr in C++ speak). Failure can occur if sz is too big, memory is exhausted, or for other reasons.
 
 
free  
void free(void* ptr)
Free a single block of memory previously allocated by malloc. The pointer argument, ptr, must either be a null pointer or a pointer to active dynamically-allocated memory. More explicitly, it is NOT OK to call free if…
- ptrpoints to the program’s code, or into its global variables, or into the stack.
- ptrwas not returned by a previous call to- malloc.
- ptrpoints to heap memory whose lifetime has ended (i.e., the memory block pointed to by- ptrwas already freed and has not been reallocated).
 
 
calloc  
void* calloc(size_t nmemb, size_t sz)
Allocates memory for an array of objects and clears the allocated block to zero. This is a “secondary” memory allocation function that itself makes a call to malloc. Our stencil code includes an implementation of this method with a bug that you will fix when testing!
 
 Using these functions:
Dynamically-allocated memory remains active until it is explicitly freed with a call to free. The rules of malloc and free are simple: allocate once, then free exactly once.
Some quick notes on boundary cases:
- In C++, malloc(0)should return a non-null pointer. Ifptr == malloc(0), thenptrdoes not overlap with any other allocation and can be passed tofree.
- free(nullptr)is allowed. It does nothing.
- malloc(sz)returns memory whose alignment works for any object. On x86-64 computers, this means that the address value returned by- mallocmust be a multiple of 16 (so its last hexadecimal digit is- 0).
base_malloc/base_free
In the stencil code, dmalloc and dfree make calls to base_malloc and base_free. Why do we avoid directly calling the C functions malloc and free? Because debugging allocators interact with undefined behavior! 
Undefined behavior is something to be avoided because any program that invokes undefined behavior has no meaning. As far as the C++ language standard is concerned, once undefined behavior occurs, a program may do absolutely anything, such as setting your computer on fire or (our favorite example), force demons to fly out of your nose.
Undefined-behavior bugs with malloc and free are common, and debugging allocators must make some undefined behaviors well-defined. For example, when a debugging allocator is in force, a program with a simple double free bug will reliably print a specific error message and exit before any undefined behavior occurs.
This brings us back to base_malloc and base_free, which are defined in basealloc.cc. This allocator behaves like malloc and free, but has the following additional properties:
- base_freedoes not modify freed memory (the contents of freed storage remain unchanged until the storage is reused by a future- base_malloc).
- base_freenever returns freed memory to the operating system.
You don’t really have to understand how the functions in basealloc.cc work – just treat them like malloc and free. Yet, base_malloc and base_free are not standard library functions (like malloc and free), so we are able to change what is considered undefined behavior. However, we don’t define everything! You will still need to take care of some undefined behavior (like double frees, invalid frees, and wild writes) in you debugger.
Testing
We have included a comprehensive set of tests for you to help you write your debugging memory allocator. Running make check will run your code against all the tests in order. It’s a good idea to work on the tests in order: get the simple stuff working, then gradually add more complex functionality. In other words, you will have a better time with this assignment if you incrementally tackle tests one after another, rather than writing the full allocator in one go and then running the tests.
Open test001.cc and look at its code. You will notice these comments at the end of the file:
//! alloc count: active          0   total          0   fail          0
//! alloc size:  active          0   total          0   fail          0
These lines define the expected output for the test. The make check command checks your actual output against the expected output and reports any discrepancies. (It does this by calling a Perl script called check.pl; you don’t have to worry about how this script works.) Don’t change these lines – doing so may make tests pass for you locally, but the grading server will replace your changes with the original lines before it runs your code.
 Super Helpful Testing Tips! 
- To print differences for a single test, try make check-N, as inmake check-1. If you want to see the full output of a test, without worrying about comparing to expected output, run that test on the command line directly. For example:$ make test001
$ ./test001 
 runs test001 and prints the full output. This can be very useful if you’re getting errors from the sanitizers. 
- 
The tests are written such that your output is compared to an expected output. This means you should pay super close attention to make sure your output is exactly right. (One extra space will fail a test!) At the bottom of each test file, there are comments with the expected output. 
- 
If you see "???" in the expected output comment, that means the test will accept any symbols. In the example below, the test will pass regardless of the number your program prints after fail 
//! alloc count: active          5   total         10   fail        ???
 
- 
We also use "??" in some tests to assign a "variable" to match in the output. Check out test20.cc for an example, you will need to print out the correct pointer. 
 
 Note: One really important thing to do in this assignment is to run your code with sanitizers enabled for tests 1 to 26. To do this, add SAN=1 to your make invocation: for example, you might run make SAN=1 to build all code with the sanitizer flags set, or make check-1-26 SAN=1 to run the tests with sanitizers enabled. The grading server will run your code for tests 1-26 with the sanitizers enabled, so you must pass them (later tests use undefined behavior to test your allocator; they won’t pass with sanitizers enabled). They’ll also help you find many bugs, guaranteed!
Assignment Roadmap
This section describes the main functionality required for the debugging allocator, ordered from simpler to more complex and from low-numbered to high-numbered tests.
Statistics
In the stencil code there is a function print_statistics() that uses the information stored in a statistics struct to print out statistics about all the memory allocations so far.
The statistics struct is defined as follows (see dmalloc.hh):
struct dmalloc_stats {
    unsigned long long nactive;           
    unsigned long long active_size;       
    unsigned long long ntotal;            
    unsigned long long total_size;        
    unsigned long long nfail;             
    unsigned long long fail_size;         
    uintptr_t heap_min;                   
    uintptr_t heap_max;                   
}; 
It is up to you to fill in the function get_statistics(statistics *stats) so that accurate information is stored in an instance of dmalloc_stats. To do this, think about using a global variable that holds a struct and which any calls to dmalloc() and dfree(ptr) update.
Task:  Implement the get_statistics function in dmalloc.cc.
 
Hints!
Most of these statistics are easy to track, and you should tackle them first (start by passing tests 1-5 and 7-10).
The trickier test is active_size. To track this information, your dfree(ptr) implementation must find the number of bytes allocated for the block pointed to by ptr. This is because after a block is freed, active_size should decrease by the size of that block.
The easiest, and probably best, way to do this is for your dmalloc code to request more space than the user required when it calls base_malloc. This extra space will allow you to store metadata about the allocated block alongside it. This metadata can store a structure you define yourself, and it should keep track of the allocated block size. As you write your debugging allocator, you may realize you want to store other information as well. The metadata will be stored at the beginning of an allocated block, but dmalloc should still return a pointer to the “payload” of the block, i.e., to the space after the metadata. You will need to use pointer arithmetic to return a pointer that points somewhere just beyond the metadata. In dfree(ptr), the ptr argument then must be a pointer to the payload of a block (because that’s what dmalloc returned before), but you can use pointer arithmetic again to access the metadata stored just before ptr to access the block’s size.

Quick reminders on pointer arithmetic:
- You can use the function sizeof()to get the size of a datatype.
- base_mallocreturns a- void*, which is a pointer without a type. Before you can do pointer arithmetic, you will need to cast the- void*pointer to have a type. However,- dmallocshould still return a- void*.
- When you incremement a pointer, the pointer’s address will jump ahead by the size of the pointer’s datatype.
 
 Run make check to run the test programs. Specifically,  test001  through  test011  should pass when you are done with this section.
Integer overflow protection
Your debugging malloc library also needs to be robust against integer overflow attacks. Integer overflow occurs when you try to allocate a number of bytes that is too large. For example, the size_t type of the first argument to dmalloc is a type alias for an unsigned long data type, which is comprised of 8 bytes. When you compute the number of bytes to request and it exceeds the maximum possible integer, the value of the size_t wraps around to 0. Consider, for example, the calloc function in the stencil code, think about how overflow could occur, and then fix it.
Task:  Implement the overflow protection in dmalloc() and dcalloc().
  test012  through  test015  should pass after this is done correctly. Make sure that  test015  succeeds on the grading server too; if you get a test timeout on the grading server, your overflow checks are not quite right.
Invalid free and double-free detection
dfree(ptr) should print an error message and then call C’s abort() function when ptr does not point to active dynamically-allocated memory. There are several cases when this might occur.
The test programs define the desired error message format. Here’s our error message for test016:
MEMORY BUG: test016.cc:8: invalid free of pointer 0xffffffffffffffe0, not in heap
Error messages should be printed to standard error (using C’s fprintf(stderr, ...)). Different error situations require different error messages; the other test programs define the required messages.
Task:  Check for invalid and double frees in dfree() and print an appropriate error message.
 
 Hints! 
- 
ptrneeds to point somewhere on the heap. How are you tracking the allocated space on the heap?
 
- 
You need to catch double frees (when free is called on the same pointer twice)! How can you determine whether or not a block is freed? 
- 
ptrmust be a pointer to the beginning of an allocated memory block, not just any address on the heap. Knowing your metadata header should be stored in memory right beforeptr, how can you guarantee you are actually accessing a header?
 
- 
Don’t be afraid to store more information in your metadata! 
 
  test016  through  test024  should pass after this is done correctly.
Boundary write error detection
A boundary error is when a program reads or writes memory beyond the actual dimensions of an allocated memory block.
A debugging memory allocator can’t detect boundary read errors, but it can detect many boundary write errors. Your dfree(ptr) should print an error message and call abort() if it detects that the memory block associated with ptr suffered a boundary write error.
Task:  Check for boundary write errors in dfree() and print an appropriate error message.
 
Hints!
As a strategy for detecting boundary write errors, consider adding data with a known (“secret”) value around the boundaries of your allocation, and then check that value in appropriate places.
No debugging allocator can reliably detect all boundary write errors. For example, consider this:
int* array = (int*) malloc(10 * sizeof(int));
int secret = array[10];     // save boundary value
array[10] = 1384139431;     // boundary write error
array[10] = secret;         // restore old value! dmalloc can’t tell there was an error!
Or this:
int* array = (int*) malloc(10 * sizeof(int));
array[200000] = 0;          // a boundary write error, but very far from the boundary!
We’re just expecting your code to catch common simple cases where the user writes one or more bytes directly after the allocated block.
 
  test025  through  test027  should pass after this is done correctly. Note that test 27 and following invoke undefined behavior (and make assumptions about the compiler’s choices for it) in order to test your implementation. You don’t need to pass these and the higher-numbered tests with sanitizers enabled.
Memory leak reporting
A memory leak happens when code allocates a block of memory, but then forgets to free it. Memory leaks are not as serious as other memory errors, particularly in short-running programs. They don’t cause a crash, and the operating system always reclaims all of a program’s memory when the program exits. But in long-running programs, such as your browser or the operating system itself, memory leaks have serious effect and are important to avoid.
Fill in the print_leak_report() function such that, when called, it prints a report about every allocated object in the system. This report should list every object that has been allocated via dmalloc but not freed using dfree. Print the report to standard output (not standard error). A report should look like this:
LEAK CHECK: test033.cc:23: allocated object 0x9b811e0 with size 19
LEAK CHECK: test033.cc:21: allocated object 0x9b81170 with size 17
LEAK CHECK: test033.cc:20: allocated object 0x9b81140 with size 16
LEAK CHECK: test033.cc:19: allocated object 0x9b81110 with size 15
LEAK CHECK: test033.cc:18: allocated object 0x9b810e0 with size 14
LEAK CHECK: test033.cc:16: allocated object 0x9b81080 with size 12
LEAK CHECK: test033.cc:15: allocated object 0x9b81050 with size 11
A programmer will use this leak checker by calling print_leak_report() before exiting the program, after freeing all allocated dynamic memory they recall. Any missing frees will show up in the leak report!
Task:  Implement print_leak_report().
 
Hints!
There are multiple ways to do this. Here are some things that might be helpful:
- 
std::pair– a C++ object that contains two data elements (similar to a tuple with two members).
 
- 
std::map– C++ key-value map. You will need to add#include <map>at the top of dmalloc.cc to use this.
 
- 
map::begin– returns an iterator referring to the first element in the map container.
 
- 
std::unordered_map– C++ key-value hash map. You will need to add#include <unordered_map>at the top ofdmalloc.ccto use it. If you get strange and scary compiler errors when using this data structure, it’s likely because the compiler doesn’t know how to hash your key type. Check out this guide for more information.
 
 
  test028  through  test030  should pass after this is done correctly.
Advanced reports and checking
This is the home stretch!
To make your debugger even more robust, you should print better information and defend against more complex invalid frees. You will need to read the test code and understand what is being tested to defend against it.
Update your invalid free message. After determining that a pointer is invalid, your code should check whether the pointer is inside a different allocated block. This will use the same structures you created for the leak checker. If the invalid pointer is inside another block, print out that block, like so:
MEMORY BUG: test031.cc:10: invalid free of pointer 0x833306c, not allocated
 test031.cc:9: 0x833306c is 100 bytes inside a 2001 byte region allocated here
Also make sure your invalid free detector can handle more diabolical situations. What situations? Check the test code to find out!
Task: If an invalid free occurs within an already allocated block print an appropriate error message.
  test031  through  test033  should pass after this is done correctly.
Finally, test programs 34 to 39 test other situations that allow you to check that your debugging allocator is efficent and compatible with C++ style memory alocation (not just C style memory allocation).
test034 calls dmalloc 500,000 times, supplying tens of thousands of different filename/line-number pairs. Ideally, your solution should run test034 in a second or less.
Task: Meet the performance requirement!
 All tests should pass after this is done correctly! 
Note: C++ introduces a new paradigm for dynamic memory allocation with the new and delete keywords. These keywords are designed to work with C++ classes, but under the hood they work simply by using malloc and free respectively.
Handing In Your Code
As before, you will hand in your code using Git. In the dmalloc/ subdirectory of your project repository, you MUST fill in the text file called README.md.
  Remind me again what the README.md should contain? 
The README.md file will include the following:
- Any design decisions you made and comments for graders, under "Design Overview". If there's nothing interesting to say, just list "None".
- Any collaborators and citations for help that you received, under "Collaborators". CS 131 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.
- Your answers to the conceptual questions at the start of this handout under "Conceptual Questions".
- Notes on approximately how long it took you to complete the project. We won't use this for grading, but we collect the data to calibrate for future years.
 Grading breakdown:
- 80% (80 points) tested functionality (2 points per test, plus 2 for passing them all). If running make checkreports that all tests pass, you’ve probably got all these points.
- 20% (20 points) answers to conceptual questions.
Now head to the grading server, make sure that you have the “Dmalloc” page configured correctly with your project repository, and check that all tests pass on the grading server.
Congratulations, you’ve completed the second CS 131 project! 
Acknowledgements: DMalloc was originally developed for Harvard’s CS 61 course. We are grateful to Eddie Kohler for allowing us to use the assignment for CS 131.
« Back to the main CS 131 website
Project 2: DMalloc
Collaborative Hours: Feb 15 – Feb 21
TA Hours: Feb 22 – March 2
Conceptual Questions Due Feb 24, 2020 at 6:00pm
Project Code Due Feb 28, 2020 at 6:00pm
Introduction
One of the unique challenges systems developers face is the explicit management of computer memory allocation. In other, higher-level programming languages, you work with variables/objects without really worrying about where they are in memory, how much space they are taking up, or when to give their memory back to the operating system.
In systems we take a different perspective: dynamic memory is allocated and deallocated manually by the programmer. This gives the programmer the ability to write faster and more efficient code, or low level programs that might not have super fancy languages or runtime environments to implicitly allocate (embedded systems etc). But, like most powerful systems assets, dynamic memory can lead to some tricky bugs! In this project you will create a debugging memory allocator to help catch some of these bugs.
More Context on Memory Allocation
Recall the memory layout a program sees in x86
Remember from lab 1 that there are 3 places in memory where we can store program data and accordingly there are 3 types of memory allocation.
Static memory allocation stores data in the data segement. This is used for global variables and variables declared with the
statickeyword. The memory location of these variables does not change throughout the program. In other words, these allocations exist in the same “static” location determined by the compiler before runtime and are always available (e.g., recall howstaticvariables work in functions likestrtok). If static variables are initialized to a value, the compiler puts them into the initialized data segment; otherwise it puts them into a segment (confusingly) called bss. Remember: “static” refers to the location of a variable, not whether the program can modify it (that’s what theconstkeyword is for).Automatic memory allocation stores data on the Stack. This is the memory location for local variables and arguments within functions. The allocation size for the stack is also determined at compile time, and the compiler decides how much the stack grows on a given function call. It is not “static” because the allocations come and go as we move through stack frames (function calls), and it is “automatic” because the lifetime of these allocations is determined by the compiler.
Dynamic memory allocation stores data on the heap. The lifetime of this type of memory allocation cannot be determined at compile time and is managed explicitly by the programmer. Dynamic memory allocation in C uses functions like
mallocandfree; C++ uses thenewanddeletekeywords. The lifetime and size of dynamic memory allocations is entirely up to you!This project focuses on dynamic memory allocation. We rarely have memory allocation problems when the compiler is doing deterministic work as with static and automatic allocation; however with dynamic allocation we can run into some issues.
Because memory is the foundation on which a program runs, memory errors and misuse are a particularly nasty set of bugs that can be really hard to deal with. Quite often memory errors will result in Undefined Behavior or compeletly fatal errors .
.
The purpose of this project is to get familiar with dynamic memory allocation functions as well as understand how the heap is used and some of the dangers associated with it.
More specifically, your task is to implement a special type of dynamic memory allocator that helps to debug dynamic memory usage. Your so-called
dmallocprogram will:This project will allow you to…
mallocand the heap work!Warning: It is important to get started early: this project is not trivial even though it does not require a lot of code.
Conceptual Questions
Instructions
Note: Answers to these conceptual questions are due Feb 24 at 6:00pm, which is 4 days prior to the final project deadline. In order to submit your answers, create a commit in your project repository (before the due date) that includes your README with your answers.
Questions
structin C and why is it useful?ptr, that points to memory address0x12f99a64. What address does(char *) ptr + 1point to? What address does(int *) ptr + 1point to? Assume 32 bit integer sizes.mallocandfree?Debugging Resources/GDB
A common way students debug code is by adding print statements everywhere and hoping for the best. While this may have worked in the past, and may continue to work, it is the debugging equivalent of kicking down a revolving door. GDB can, and should, become your best friend. It might take a little practice to get used to, but we guarentee GDB will make debugging faster and easier for you.
GDB Testimonials
GDB: How To
Run GDB on the executable (in this case the test) using the
gdbcommand.$ gdb [executable-name]More
2. Set a breakpoint on the function of your choice using
b [ function | file:line ](often this will bemain).3. Run the program using
r, along with any arguments you want.4. Display the source code so you can follow along the execution.
5. Step through your code using
nto execute one line andsto step inside a function. (If you accidentally step into a function you didn’t intend to, usefinish).6. Print variables and pointers. The most useful commands are:
p [variable]to print valuesx/[type] [variable]to interpret the variable as a pointer, and print the value at that location as a specific type. For examples,x/sfor strings,x/dfor decimal,x/xfor hexadecimal, etc. moreDebugging a Segfault
Run GDB on your executable (don’t set any breakpoints) and then run your program with
r. Once it segfaults, runbtto print a stack trace (the list of subsequent function calls that got you where you are now). You can now uselayout srcto view the code, thenupanddownto view different stack frames.more more more
Assignment
Installation
Ensure that your project repository has a
handoutremote. Type:If this reports an error, run:
Then run:
This will merge our Project 2 (DMalloc) stencil code with your repository.
Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You’ll be doing all your work for this project inside the
dmallocdirectory in the working copy of your projects repository.Infrastructure Help
Stencil Code
Standard Memory allocation in C
To do this project, it is important to understand the functions that are called when dynamically allocating memory in C. These are the functions you see in regular C programs:
malloc
Allocates
szbytes of memory and returns a pointerptrto the “new” memory block. This memory is not initialized (it can contain anything). Upon success, theszbytes of storage starting at addressptrare guaranteed not to overlap with any other active objects, including the program’s code and global variables, its stack, or with any other active dynamically allocated memory. Upon failure,mallocreturns a null pointer (NULLin C-speak,nullptrin C++ speak). Failure can occur ifszis too big, memory is exhausted, or for other reasons.free
Free a single block of memory previously allocated by
malloc. The pointer argument,ptr, must either be a null pointer or a pointer to active dynamically-allocated memory. More explicitly, it is NOT OK to call free if…ptrpoints to the program’s code, or into its global variables, or into the stack.ptrwas not returned by a previous call tomalloc.ptrpoints to heap memory whose lifetime has ended (i.e., the memory block pointed to byptrwas already freed and has not been reallocated).calloc
Allocates memory for an array of objects and clears the allocated block to zero. This is a “secondary” memory allocation function that itself makes a call to
malloc. Our stencil code includes an implementation of this method with a bug that you will fix when testing!Using these functions:
Dynamically-allocated memory remains active until it is explicitly freed with a call to free. The rules of malloc and free are simple: allocate once, then free exactly once.
Some quick notes on boundary cases:
malloc(0)should return a non-null pointer. Ifptr == malloc(0), thenptrdoes not overlap with any other allocation and can be passed tofree.free(nullptr)is allowed. It does nothing.malloc(sz)returns memory whose alignment works for any object. On x86-64 computers, this means that the address value returned bymallocmust be a multiple of 16 (so its last hexadecimal digit is0).base_malloc/base_free
In the stencil code,
dmallocanddfreemake calls tobase_mallocandbase_free. Why do we avoid directly calling the C functionsmallocandfree? Because debugging allocators interact with undefined behavior!Undefined behavior is something to be avoided because any program that invokes undefined behavior has no meaning. As far as the C++ language standard is concerned, once undefined behavior occurs, a program may do absolutely anything, such as setting your computer on fire or (our favorite example), force demons to fly out of your nose.
Undefined-behavior bugs with
mallocandfreeare common, and debugging allocators must make some undefined behaviors well-defined. For example, when a debugging allocator is in force, a program with a simple double free bug will reliably print a specific error message and exit before any undefined behavior occurs.This brings us back to
base_mallocandbase_free, which are defined inbasealloc.cc. This allocator behaves likemallocandfree, but has the following additional properties:base_freedoes not modify freed memory (the contents of freed storage remain unchanged until the storage is reused by a futurebase_malloc).base_freenever returns freed memory to the operating system.You don’t really have to understand how the functions in
basealloc.ccwork – just treat them likemallocandfree. Yet,base_mallocandbase_freeare not standard library functions (likemallocandfree), so we are able to change what is considered undefined behavior. However, we don’t define everything! You will still need to take care of some undefined behavior (like double frees, invalid frees, and wild writes) in you debugger.Testing
We have included a comprehensive set of tests for you to help you write your debugging memory allocator. Running
make checkwill run your code against all the tests in order. It’s a good idea to work on the tests in order: get the simple stuff working, then gradually add more complex functionality. In other words, you will have a better time with this assignment if you incrementally tackle tests one after another, rather than writing the full allocator in one go and then running the tests.Open
test001.ccand look at its code. You will notice these comments at the end of the file:These lines define the expected output for the test. The
make checkcommand checks your actual output against the expected output and reports any discrepancies. (It does this by calling a Perl script calledcheck.pl; you don’t have to worry about how this script works.) Don’t change these lines – doing so may make tests pass for you locally, but the grading server will replace your changes with the original lines before it runs your code.Super Helpful Testing Tips!
make check-N, as inmake check-1. If you want to see the full output of a test, without worrying about comparing to expected output, run that test on the command line directly. For example:$ make test001 $ ./test001runs test001 and prints the full output. This can be very useful if you’re getting errors from the sanitizers.
Note: One really important thing to do in this assignment is to run your code with sanitizers enabled for tests 1 to 26. To do this, add
SAN=1to yourmakeinvocation: for example, you might runmake SAN=1to build all code with the sanitizer flags set, ormake check-1-26 SAN=1to run the tests with sanitizers enabled. The grading server will run your code for tests 1-26 with the sanitizers enabled, so you must pass them (later tests use undefined behavior to test your allocator; they won’t pass with sanitizers enabled). They’ll also help you find many bugs, guaranteed!Assignment Roadmap
This section describes the main functionality required for the debugging allocator, ordered from simpler to more complex and from low-numbered to high-numbered tests.
Statistics
In the stencil code there is a function
print_statistics()that uses the information stored in a statistics struct to print out statistics about all the memory allocations so far.The statistics struct is defined as follows (see
dmalloc.hh):It is up to you to fill in the function
get_statistics(statistics *stats)so that accurate information is stored in an instance ofdmalloc_stats. To do this, think about using a global variable that holds a struct and which any calls todmalloc()anddfree(ptr)update.Task: Implement the
get_statisticsfunction indmalloc.cc.Hints!
Most of these statistics are easy to track, and you should tackle them first (start by passing tests 1-5 and 7-10).
The trickier test is
active_size. To track this information, yourdfree(ptr)implementation must find the number of bytes allocated for the block pointed to byptr. This is because after a block is freed,active_sizeshould decrease by the size of that block.The easiest, and probably best, way to do this is for your
dmalloccode to request more space than the user required when it callsbase_malloc. This extra space will allow you to store metadata about the allocated block alongside it. This metadata can store a structure you define yourself, and it should keep track of the allocated block size. As you write your debugging allocator, you may realize you want to store other information as well. The metadata will be stored at the beginning of an allocated block, butdmallocshould still return a pointer to the “payload” of the block, i.e., to the space after the metadata. You will need to use pointer arithmetic to return a pointer that points somewhere just beyond the metadata. Indfree(ptr), theptrargument then must be a pointer to the payload of a block (because that’s whatdmallocreturned before), but you can use pointer arithmetic again to access the metadata stored just beforeptrto access the block’s size.Quick reminders on pointer arithmetic:
sizeof()to get the size of a datatype.base_mallocreturns avoid*, which is a pointer without a type. Before you can do pointer arithmetic, you will need to cast thevoid*pointer to have a type. However,dmallocshould still return avoid*.Run
make checkto run the test programs. Specifically, test001 through test011 should pass when you are done with this section.Integer overflow protection
Your debugging malloc library also needs to be robust against integer overflow attacks. Integer overflow occurs when you try to allocate a number of bytes that is too large. For example, the
size_ttype of the first argument todmallocis a type alias for an unsigned long data type, which is comprised of 8 bytes. When you compute the number of bytes to request and it exceeds the maximum possible integer, the value of thesize_twraps around to 0. Consider, for example, thecallocfunction in the stencil code, think about how overflow could occur, and then fix it.Task: Implement the overflow protection in
dmalloc()anddcalloc().test012 through test015 should pass after this is done correctly. Make sure that test015 succeeds on the grading server too; if you get a test timeout on the grading server, your overflow checks are not quite right.
Invalid free and double-free detection
dfree(ptr)should print an error message and then call C’sabort()function whenptrdoes not point to active dynamically-allocated memory. There are several cases when this might occur.The test programs define the desired error message format. Here’s our error message for test016:
Error messages should be printed to standard error (using C’s
fprintf(stderr, ...)). Different error situations require different error messages; the other test programs define the required messages.Task: Check for invalid and double frees in
dfree()and print an appropriate error message.Hints!
ptrneeds to point somewhere on the heap. How are you tracking the allocated space on the heap?You need to catch double frees (when free is called on the same pointer twice)! How can you determine whether or not a block is freed?
ptrmust be a pointer to the beginning of an allocated memory block, not just any address on the heap. Knowing your metadata header should be stored in memory right beforeptr, how can you guarantee you are actually accessing a header?Don’t be afraid to store more information in your metadata!
test016 through test024 should pass after this is done correctly.
Boundary write error detection
A boundary error is when a program reads or writes memory beyond the actual dimensions of an allocated memory block.
A debugging memory allocator can’t detect boundary read errors, but it can detect many boundary write errors. Your
dfree(ptr)should print an error message and callabort()if it detects that the memory block associated withptrsuffered a boundary write error.Task: Check for boundary write errors in
dfree()and print an appropriate error message.Hints!
As a strategy for detecting boundary write errors, consider adding data with a known (“secret”) value around the boundaries of your allocation, and then check that value in appropriate places.
No debugging allocator can reliably detect all boundary write errors. For example, consider this:
Or this:
We’re just expecting your code to catch common simple cases where the user writes one or more bytes directly after the allocated block.
test025 through test027 should pass after this is done correctly. Note that test 27 and following invoke undefined behavior (and make assumptions about the compiler’s choices for it) in order to test your implementation. You don’t need to pass these and the higher-numbered tests with sanitizers enabled.
Memory leak reporting
A memory leak happens when code allocates a block of memory, but then forgets to free it. Memory leaks are not as serious as other memory errors, particularly in short-running programs. They don’t cause a crash, and the operating system always reclaims all of a program’s memory when the program exits. But in long-running programs, such as your browser or the operating system itself, memory leaks have serious effect and are important to avoid.
Fill in the
print_leak_report()function such that, when called, it prints a report about every allocated object in the system. This report should list every object that has been allocated viadmallocbut not freed usingdfree. Print the report to standard output (not standard error). A report should look like this:A programmer will use this leak checker by calling
print_leak_report()before exiting the program, after freeing all allocated dynamic memory they recall. Any missing frees will show up in the leak report!Task: Implement
print_leak_report().Hints!
There are multiple ways to do this. Here are some things that might be helpful:
std::pair– a C++ object that contains two data elements (similar to a tuple with two members).std::map– C++ key-value map. You will need to add#include <map>at the top of dmalloc.cc to use this.map::begin– returns an iterator referring to the first element in the map container.std::unordered_map– C++ key-value hash map. You will need to add#include <unordered_map>at the top ofdmalloc.ccto use it. If you get strange and scary compiler errors when using this data structure, it’s likely because the compiler doesn’t know how to hash your key type. Check out this guide for more information.test028 through test030 should pass after this is done correctly.
Advanced reports and checking
This is the home stretch!
To make your debugger even more robust, you should print better information and defend against more complex invalid frees. You will need to read the test code and understand what is being tested to defend against it.
Update your invalid free message. After determining that a pointer is invalid, your code should check whether the pointer is inside a different allocated block. This will use the same structures you created for the leak checker. If the invalid pointer is inside another block, print out that block, like so:
Also make sure your invalid free detector can handle more diabolical situations. What situations? Check the test code to find out!
Task: If an invalid free occurs within an already allocated block print an appropriate error message.
test031 through test033 should pass after this is done correctly.
Performance and C++ integration
Finally, test programs 34 to 39 test other situations that allow you to check that your debugging allocator is efficent and compatible with C++ style memory alocation (not just C style memory allocation).
test034 calls
dmalloc500,000 times, supplying tens of thousands of different filename/line-number pairs. Ideally, your solution should run test034 in a second or less.Task: Meet the performance requirement!
All tests should pass after this is done correctly!
Note: C++ introduces a new paradigm for dynamic memory allocation with the
newanddeletekeywords. These keywords are designed to work with C++ classes, but under the hood they work simply by usingmallocandfreerespectively.Handing In Your Code
As before, you will hand in your code using Git. In the
dmalloc/subdirectory of your project repository, you MUST fill in the text file calledREADME.md.Remind me again what the
README.mdshould contain?The
README.mdfile will include the following:Grading breakdown:
make checkreports that all tests pass, you’ve probably got all these points.Now head to the grading server, make sure that you have the “Dmalloc” page configured correctly with your project repository, and check that all tests pass on the grading server.
Congratulations, you’ve completed the second CS 131 project!
Acknowledgements: DMalloc was originally developed for Harvard’s CS 61 course. We are grateful to Eddie Kohler for allowing us to use the assignment for CS 131.