Lecture 5: Structures and Alignment

» Lecture video (Brown ID required)
» Lecture code
» Post-Lecture Quiz (due 6pm Monday, February 10).

Structures, continued

Last time, we learned about structures in the C language. Structures are handy because you can define the memory layout for a complex object and use this object in your programs.

There are several ways to obtain memory for an instance of your struct in C: using a global, static-lifetime struct, stack-allocating a local struct with automatic lifetime, or heap-allocating dynamic-lifetime memory to hold the struct. The example below, based on mexplore-struct.c, shows how stack- and heap-allocated structs work.

int main() {
  // declares a new instance of x_t on the stack (automatic lifetime)
  struct x_t stack_allocated;

  stack_allocated.i1 = 1;
  stack_allocated.c1 = 'A';

  printf("stack-allocated structure at %p: i1 = %d (addr: %p), c1 = %c (addr: %p)\n",
         &stack_allocated,  // need to take address, as stack_allocated is not a pointer
         stack_allocated.i1,
         &stack_allocated.i1,
         stack_allocated.c1,
         &stack_allocated.c1);

  // makes a new instance of x_t on the heap (dynamic lifetime)
  struct x_t* heap_allocated = (struct x_t*)malloc(sizeof(struct x_t));

  heap_allocated->i1 = 3;
  heap_allocated->c1 = 'X';

  printf("heap-allocated structure at %p: i1 = %d (addr: %p), c1 = %c (addr: %p)\n",
         heap_allocated,  // already an address, so no & needed here
         heap_allocated->i1,
         &heap_allocated->i1,
         heap_allocated->c1,
         &heap_allocated->c1);
}

Observe that we access struct members in two different ways: when the struct is a value (e.g., a stack-allocated struct), we access member i1 as stack_allocated.i1, using a dot to separate variable and member name. (This is the same syntax that you'd use to access members of Java objects.) But if we're dealing with a pointer to a struct (such as the pointer returned from malloc() for our heap-allocated struct), we use -> to separate variable and member name. The arrow syntax (->) implicitly dereferences the pointer and then accesses the member. In other words, heap_allocated->i1 is identical to (*heap_allocated).i1.

Different ways to initialize a struct

Like arrays, structures also have an initializer list syntax that makes it easy for you to set the values of their members when creating a struct. For example, you could write struct x_t my_x = { 1, 2, 3, 'A', 'B', 'C'};, or even name only some of the members via struct x_t my_x2 = { .i2 = 42, .c3 = 'X' };. Members that an initializer list leaves out are set to zero. If you declare a struct without any initializer, however, the values of its members depend on where the memory comes from (static segment data is initialized to zeros; stack and heap memory is not), so it's best to treat such memory as uninitialized and set all members.

Sick of writing struct x_t all the time?

Normally, you always need to put the struct keyword in front of your new struct type whenever you use it. But this gets tedious, and the C language provides the helpful keyword typedef to save you some work. You can use typedef with a struct definition like this:

typedef struct {
  int i1;
  int i2;
  int i3;
  char c1;
  char c2;
  char c3;
} x_t;
... and henceforth you just write x_t to refer to your struct type.

Now let's build a useful data structure! We'll look at a linked list of integers here (linked-list.c). This actually consists of two structures: one to represent the list as a whole (list_t) and one to represent nodes in the list (list_node_t). The list_t structure contains a pointer to the first node of the list, and (in this simple implementation) nothing else. The list_node_t structure contains the node's value (an int) and a pointer to the next list_node_t in memory.

typedef struct list_node {
  int value;
  struct list_node* next;
} list_node_t;

typedef struct list {
  list_node_t* head;
} list_t;
Why does the next pointer in list_node_t have type struct list_node*, not list_node_t*?

C compilers do not allow recursively-defined type definitions. In particular, you cannot use the type you're defining via typedef within its own definition. You can, however, use a struct pointer within the structure's definition. Think of it this way: struct list_node is already a known type for the compiler when the pointer occurs in the definition, but list_node_t isn't yet, as its definition only ends with the semicolon.
Note that you can only nest a pointer to a struct in its own definition, not an instance of the struct itself. Try to think of why that must be the case, remembering that C types must have fixed memory sizes at compile time!

A function to append a node to this list must take two arguments: the list to append to (a list_t*) and the element to append (an int). It then needs to check if the list is empty (list->head == NULL); if it is not, append() needs to find the end of the list. It does so by following the next pointer in each node until it encounters a list_node_t whose next pointer is NULL. Once we have the end of the list, we allocate a new list_node_t using malloc(), set its value and initialize its next pointer to NULL (as this will be the new end of the list). Finally, we change the pointer of the current list end (either list->head for an empty list, or cur->next for a non-empty one) to point to the new node.

void append(list_t* list, int value) {
  list_node_t* cur = list->head;

  if (cur != NULL) {
    while (cur->next != NULL) {
      cur = cur->next;
    }
  }

  list_node_t* new_node = (list_node_t*)malloc(sizeof(list_node_t));
  new_node->next = NULL;
  new_node->value = value;

  if (cur != NULL) {
    cur->next = new_node;
  } else {
    list->head = new_node;
  }
}
Exercise: how would you write a function to obtain the ith element of a linked list of integers?

The signature of this function is int* at(list_t* list, int index), i.e., it takes the list and an index as its arguments and returns a pointer to the integer stored at that index, or NULL if the index does not exist.

This is the linked list data structure we looked at all the way back in lecture 1, when we were looking to sort a list of integers! You now understand its exact memory representation. The size of the data stored in each list_node_t is 12 bytes. But is that also what sizeof(list_node_t) returns? Spoiler: it's not. Let's see why.

How are the members of a struct like list_node_t actually laid out in memory? This is defined by the struct rule, which says that the members of a struct are laid out in the order they're declared in, without overlap, and subject only to alignment constraints. These mysterious "alignment constraints" are what make our list_node_t have a size of 16 bytes even though it only needs 12.

Alignment

Why are we covering this?

Since C requires you to work closely with memory addresses, it is important to understand how the compiler lays out data in memory, and why the layout may not always be exactly what you expect. If you understand alignment, you will get pointer arithmetic and byte offsets right when you deal with them, and you will understand why programs sometimes use more memory than you would think based on your data structure specifications.

The chips in your computer are very good at working with fixed-size numbers. This is the reason why the basic integer types in C grow in powers of two (char = 1 byte, short = 2 bytes, int = 4 bytes, long = 8 bytes). But it further turns out that the computer can only work efficiently if these fixed-size numbers are aligned at specific addresses in memory. This is especially important when dealing with structs, which could be of arbitrary size based on their definition, and could have odd memory layouts following the struct rule.

Just like each primitive type has a size, it also has an alignment. The alignment means that all objects of this type must start at an address divisible by the alignment. In other words, an integer with size 4 and alignment 4 must always start at an address divisible by 4. (This applies independently of whether the object is inside a collection, such as a struct or array, or not.) The table below shows the alignment restrictions of primitive types on an x86-64 Linux machine.

Type                                Size   Address restriction
char (signed char, unsigned char)   1      No restriction
short (unsigned short)              2      Multiple of 2
int (unsigned int)                  4      Multiple of 4
long (unsigned long)                8      Multiple of 8
float                               4      Multiple of 4
double                              8      Multiple of 8
T*                                  8      Multiple of 8

The reason for this lies in the way hardware is constructed: to end up with simpler wiring and logic, computers often move fixed amounts of data around. In particular, when the computer's processor accesses memory, it actually does not go directly to RAM (the random access memory whose chips hold our bytes). Instead, it accesses a fast piece of memory that contains a tiny subset of the contents of RAM (this is called a "cache" and we'll learn more about it in future lectures!). But building logic that can copy memory at any arbitrary byte address in RAM into this smaller memory would be hugely complicated, so the hardware designers chunk RAM into fixed-size "blocks" that can be copied efficiently. The size of these blocks differs between computers, but their existence reveals why alignment is necessary.

Let's assume there were no alignment constraints, and consider a situation like the one shown in the following:

                | 4B int  |     <-- unaligned integer stored across block boundary
                | 2B | 2B |     <-- 2 bytes in block k, 2 bytes in block k+1
     ----+-----------+-----------+-----------+--
    ...  | block k   | block k+1 | block k+2 |   ...  <- memory blocks ("cache lines")
     ----+-----------+-----------+-----------+--

An unaligned integer could end up being stored across the boundary between two memory blocks. This would require the processor to fetch two blocks of RAM into its fast cache memory, which would not only take longer, but also make the circuits much harder to build. With alignment, the circuit can assume that every integer (and indeed, every primitive type in C) is always contained entirely in one memory block.

                     | 4B int  |     <-- aligned integer stored entirely in one block
                     | 4B      |     <-- all 4 bytes in block k+1
     ----+-----------+-----------+-----------+--
    ...  | block k   | block k+1 | block k+2 |   ...  <- memory blocks ("cache lines")
     ----+-----------+-----------+-----------+--

The compiler, standard library, and operating system all work together to enforce alignment restrictions. If you want to get the alignment of a type in a C program, you can use the sizeof operator's cousin alignof. In other words, alignof(int) is replaced with 4 by the compiler, and similarly for other types.

We can now write down a precise definition of alignment: The alignment of a type T is a number a ≥ 1 such that the address of every object of type T is a multiple of a. Every object with type T has size sizeof(T), meaning that it occupies sizeof(T) contiguous bytes of memory; and each object of type T has alignment alignof(T), meaning that the address of its first byte is a multiple of alignof(T).

You might wonder what the maximum alignment is – the larger an alignment, the more memory might get wasted by being unusable! It turns out that the 64-bit architectures we use today have maximum 16-byte alignment, which is sufficient for the largest primitive type, long double.

Note that structs are not primitive types, so they aren't as such subject to alignment constraints. However, each struct has a first member, and by the first member rule for collections, the address of the struct is the address of the first member. Struct members are primitive types (even with nested structures, you'll eventually end up with primitive type members after expansion), and those members do need to be aligned. So, by the first member rule, the struct will be aligned. (It turns out that, in practice, structures on the heap are aligned on 16-byte boundaries because malloc() on x86-64 Linux returns 16-byte aligned pointers; structures on the stack are aligned by the compiler.)

The size of a struct might therefore be larger than the sum of the sizes of its components due to alignment constraints. Since the compiler must lay out struct components in order, and it must obey the components' alignment constraints, and it must ensure different components don't overlap, it must sometimes introduce extra space in structs. This space is called padding, and it's effectively wasted memory. Our linked list node is an example of a situation where padding is required: the struct will have 4 bytes of padding after int value, to ensure that the next pointer (a struct list_node*) has a correct alignment (address divisible by 8).

Alignment constraints also apply when the compiler lays out variables on the stack. mexplore-order.c illustrates this: with all int variables and char variables defined consecutively, we end up with the memory addresses we might expect (the three ints are consecutive in memory, and the three chars are in the bytes below them). But if I move c1 up to declare it just after i1, the compiler leaves a gap below the character, so that the next integer is aligned correctly on a four-byte boundary.

But: if we turn on compiler optimizations, there is no gap! The compiler has reordered the variables on the stack to avoid wasting memory: all integers are again consecutive in memory, even though we didn't declare them in that order. This is permitted, as there is no rule about the order of stack-allocated variables in memory (nor is there one about the order of heap-allocated ones, though addresses returned from malloc() do need to be aligned). If these variables were in a struct (as in x_t), however, the compiler could not perform this optimization because the struct rule forbids reordering members.

Summary

Today, we completed our understanding of structures in the C language and how they are represented in memory. We learned how a linked list of integers can be built from two C structures, and implemented a simple function that appends to our list. This is similar to what you'll do in the vector part of Project 1, except that you'll build a vector and not a linked list.

We also explored the tricky subject of alignment in memory, where the compiler sometimes wastes memory to achieve faster program execution, and learned how the bytes of types larger than a char, as well as structures, are actually laid out in memory.