CSCI 0300/1310: Fundamentals of Computer Systems

Lecture 8: Alignment, Collection Rules, Assembly Language

🎥 Lecture video (Brown ID required)
💻 Lecture code (alignment)
💻 Lecture code (assembly)
❓ Post-Lecture Quiz (due 11:59pm, Monday, February 26).

Alignment continued, and Collection Rules

In previous lectures, we built up to specifying a set of rules that govern how the C language expects data to be laid out in memory. We're now ready to write down these rules.

Here are the first two:

the first member rule says that the address of the collection (array, structure, or union [see below]) is the same as the address of its first member;
the array rule says that all members of an array are laid out consecutively in memory; and

How are the members of a struct like list_node_t actually laid out in memory? This is defined by the struct rule, which says that the members of a struct are laid out in the order they're declared in, without overlap, and subject only to alignment constraints. These mysterious "alignment constraints" are what makes our list_node_t have a size of 16 bytes even though it only needs 12.

So, by the first member rule, the struct will be aligned. (It turns out that, in practice, structures on the heap are aligned on 16-byte boundaries because malloc() on x86-64 Linux returns 16-byte aligned pointers; structures on the stack are aligned by the compiler.)

The size of a struct might therefore be larger than the sum of the sizes of its components due to alignment constraints. Since the compiler must lay out struct components in order, and it must obey the components' alignment constraints, and it must ensure different components don't overlap, it must sometimes introduce extra space in structs. This space is called padding, and it's effectively wasted memory. Our linked list node is an example of a situation where padding is required : the struct will have 4 bytes of padding after int v, to ensure that list_node_t* has a correct alignment (address divisible by 8).

So, we can now specify the third rule:

the struct rule says that members of a struct are laid out in declaration order, without overlap, and with minimum padding as necessary to satisfy the struct members' alignment constraints.

In addition to these rules, there are three more we haven't covered or made explicit yet.

Aside: Unions

⚠️ We did not cover unions in the course this year. Following material is for your education only; we won't test you on it. Feel free to skip ahead to the other rules.

For the next rule, we need to learn about unions, which are another collection type in C.

A union is a C data structure that looks a lot like a struct, but which contains only one of its members. Here's an example:

union int_or_char {
  int i;
  char c;
}

Any variable u of type union int_or_char is either an integer (so u.i is valid) or a char (so u.c is valid), but never both at the same time. Unions are rarely used in practice and you won't need them in this course. The size of a union is the maximum of the sizes of its members, and so is its alignment.

What are unions good for?

Unions are helpful when a data structure's size is of the essence (e.g., for embedded environments like the controller chip in a microwave), and in situations where the same bytes can represent one thing or another. For example, the internet is based on a protocol called IP, and there are two versions of: IPv4 (the old one) and IPv6 (the new one, which permits >4B computers on the internet). But there are situations where we need to pass an address that either follows the IPv4 format (4 bytes) or the IPv6 format (16 bytes). A union makes this possible without wasting memory or requiring two separate data structures.

Now we can get to the next rule!

The union rule says that the address of all members of a union is the same as the address of the union.

Back to other rules!

The remaining two rules are far more important:

The minimum rule says that the memory used for a collection shall be the minimum possible without violating any of the other rules.
The malloc rule says that any call to malloc that succeeds returns a pointer that is aligned for any type. This rule has some important consequences: it means that malloc() must return pointers aligned for the maximum alignment, which on x86-64 Linux is 16 bytes. In other words, any pointer returned from malloc points to an address that is a multiple of 16.

One consequence from the struct rule and the minimum rule is that reordering struct members can reduce size of structures! Look at the example in mexplore-structalign.c. The struct ints_and_chars defined in that file consists of three ints and three chars, whose declarations alternate. What will the size of this structure be?

It's 24 bytes. The reason is that each int requires 4 bytes (so, 12 bytes total), and each char requires 1 byte (3 bytes total), but alignment requires the integers to start at addresses that are multiples of four! Hence, we end up with a struct layout like the following:

0x... 00 ... 04 ... 08 ... 0c ... 10 ... 14 ...   <- addresses (hex)
     +------+--+---+------+--+---+------+--+---+
     |  i1  |c1|PAD|  i2  |c2|PAD|  i3  |c3|PAD|  <- values
     +------+--+---+------+--+---+------+--+---+

This adds 9 bytes of padding – a 37.5% overhead! The padding is needed because the characters only use one byte, but the next integer has to start on an address divisible by 4.

But if we rearrange the members of the struct, declaring them in order i1, i2, i3, c1, c2, c3, the structure's memory layout changes. We now have the three integers adjacent, and since they require an alignment of 4 and are 4 bytes in size, no padding is needed between them. Following, we can put the characters into contiguous bytes also, since their size and alignment are 1.

0x... 00 ... 04 ... 08 ... 0c 0d 0e 0f ...   <- addresses (hex)
     +------+------+------+--+--+--+--+
     |  i1  |  i2  |  i3  |c1|c2|c3|P.|      <- values
     +------+------+------+--+--+--+--+

We only need a single byte of padding (6.25% overhead), as the struct must be padded to 16 bytes (why? Consider an array of ints_and_chars and the alignment of the next element!). In addition, the structure is now 16 bytes in size rather than 24 bytes – a 33% saving.

Assembly Language

We've now arrived at the end of our introduction to C programming. The rest of the course will mostly look at higher-level concepts built atop the understanding we have developed. But before the look at higher levels, we will briefly pull back the covers and see what happens at the level below C to make your programs run.

    | Web sites, Google, Facebook, AirBnB, etc.
--- |-------------------------------------------
    | Distributed systems     <-- block 4
 C  |-------------------------
 S  | Parallel programming    <-- block 3
    |-------------------------
 3  | C++ | Operating systems <-- block 2
 0  |-------------------------
 0  | C programming language  <-- we discussed this so far
    |-------------------------
    | Assembly language       <-- we will briefly cover this
--- |-------------------------------------------
    | Hardware (chips)

Now that you understand the C language and memory representations of data, you may wonder about the "magic" hexadecimal bytes that the compiler outputs to make your computer's processor do things like adding numbers. How does the compiler choose these bytes, and what bytes are valid?

Each computer architecture (such as x86-64, which most modern computers use and we're considering in this course) has an instruction set specified by the manufacturer. The instruction set, first and foremost, defines what sequences of bytes trigger specific behavior in the processor (e.g., adding numbers, comparing them for equality, or loading data from memory). But hexadecimal bytes are hard for humans to read, so the instruction set also comes with a human-readable assembly language that consists of short, mnemonic instructions that correspond directly to a byte encoding (i.e., each of these instructions corresponds to a specific, unique set of hexadecimal bytes).

Why are we covering this?

You will almost certainly never need to write code in assembly language yourself, but it is helpful to have at least some intuition of how to read it. Being able to read assembly helps you debug weird problems with your program, and can help you understand compiler optimizations betters. Some of the examples we look at will also illustrate to you the tricks used to make computers run our code efficiently.
If you're curious to learn more details about assembly, consider taking CSCI 0330, which explains it in more detail and with more hands-on exercises than we'll have time for!

From C code to instructions

Let's take a look at how your C program get turned into hexadecimal bytes that can run on your processor.

The compiler, which we've discussed a lot already, turns your C program into assembly language. Lots of cleverness and optimizations.

The assembler turns assembly language into a set of bytes in an executable. This a very direct translation.

How to read an assembly file

Assembly files (and assembly layout in GDB, layout asm) can be confusing at first. The important tricks to reading them are the following:

Focus only on the instructions that you care about, and initially ignore anything else.
Work backwards from the return statement.

Here's an example assembly file (output from gcc -S testasm.c) to consider:

    .file "testasm.c"
    .text
    .globl  _Z1fiii
    .type   _Z1fiii, @function
_Z1fiii:
.LFB0:
    cmpl    %edx, %esi
    je  .L3
    movl    %esi, %eax
    ret
.L3:
    movl    %edi, %eax
    ret
.LFE0:
    .size   _Z1fiii, .-_Z1fiii
    .ident  "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
    .section        .note.GNU-stack,"",@progbits

There are many lines here that are effectively comments. All lines starting with a dot (e.g., .file or .ident) are of this kind: they constitute "directives", rather than instructions. Some directives tell the assembler what to do, but they're often unimportant to your understanding. All directive lines that end in a colon, however, are important: they constitute labels, which matter for control flow instructions (e.g., .L3:).

The actual instructions are on the indented lines between the labels. The processor will execute these top to bottom unless it's told to continue somewhere other than the next instruction by a control flow instruction (e.g., je). A good way to figure out the important parts of the instructions is often to work backwards from the ret instructions, which is the function's return point. Why do we work backwards? Going forward is more difficult because it requires understanding what the state of the processor's register is when we call the function (something not explicitly described in the file here). By working backwards from the return, we can often figure out what the function does without knowing its inputs.

Sometimes, we are also interested in looking at the assembly in an already-compiled object file – i.e., binary code after the translation from human-readable assembly language to bytes. This is called "disassembling" from executable instructions, and happens when we look at assembly using GDB, objdump -d, or objdump -S. This output looks different from compiler-generated assembly: in disassembled instructions, there are no intermediate labels or directives. This is because the labels and directives disappear during the process of generating executable instructions.

Registers

Assembly instructions operate on registers, small pieces of very fast memory inside the processor. To process data stored in memory, the processor first needs to load it into registers; and once it has completed working on the data in a register, it needs to store it back to memory.

Registers are the fastest kind of memory available in the machine. x86-64 has 14 general-purpose registers and several special-purpose registers. The table below lists all basic registers, with special-purpose registers highlighted in yellow. You won't understand all columns yet, but you will soon and can then use this table as a reference (we won't ask you to memorize it in detail). You'll notice different naming conventions for subsets of the same register, a side effect of the long history of the x86 architecture (the first x86 processor, the 8086 was first released in 1978).

Full register name	32-bit (bits 0–31)	16-bit (bits 0–15)	8-bit low (bits 0–7)	8-bit high (bits 8–15)	Use in calling convention	Callee-saved?
General-purpose registers:
%rax	%eax	%ax	%al	%ah	Return value (accumulator)	No
%rbx	%ebx	%bx	%bl	%bh	–	Yes
%rcx	%ecx	%cx	%cl	%ch	4th function argument	No
%rdx	%edx	%dx	%dl	%dh	3rd function argument	No
%rsi	%esi	%si	%sil	–	2nd function argument	No
%rdi	%edi	%di	%dil	–	1st function argument	No
%r8	%r8d	%r8w	%r8b	–	5th function argument	No
%r9	%r9d	%r9w	%r9b	–	6th function argument	No
%r10	%r10d	%r10w	%r10b	–	–	No
%r11	%r11d	%r11w	%r11b	–	–	No
%r12	%r12d	%r12w	%r12b	–	–	Yes
%r13	%r13d	%r13w	%r13b	–	–	Yes
%r14	%r14d	%r14w	%r14b	–	–	Yes
%r15	%r15d	%r15w	%r15b	–	–	Yes
Special-purpose registers:
%rsp	%esp	%sp	%spl	–	Stack pointer	Yes
%rbp	%ebp	%bp	%bpl	–	Base pointer (general-purpose in some compiler modes)	Yes
%rip	%eip	%ip	–	–	Instruction pointer (Program counter; called $pc in GDB)	*
%rflags	%eflags	%flags	–	–	Flags and condition codes	No

Note that unlike primary memory (RAM) – which is what we think of when we discuss memory in a C/C++ program – registers have no addresses! There is no address value that, if cast to a pointer and dereferenced, would return the contents of the %rax register. Registers live in a separate world from the memory, and we need special instructions to move data to and from registers and memory.

Whenever you see %ZZZ in assembly code, this refers to a register named ZZZ. The x86-64 registers have confusing names because they evolved over time; each register also has multiple names that refer to different subsets of its bits. For example %rax, one of the general-purpose registers that is, by convention, used to pass return values from functions, is split into the following five names:

 63                              31              15      7       0
 +-------------------------------+-------------------------------+
 |                               |               |       |       |
 +---------------------------------------------------------------+

 |---------------------%rax (64 bits/8 bytes)--------------------|
                                 |-----%eax (32 bits/4 bytes)----|
                                                 |-%ax (16b/2B)--|
                                                 |--%ah--|--%al--| <-- 8 bits/1 byte each

We'll be working with these register names a lot, but you don't need to memorize them – instead, we suggest that you consult the handy summary sheet we handed out in lecture, or this page!

Summary

Today, we learned some handy rules about collections and their memory representation and review how these rules they interact with alignment, particularly within structs. We saw that changing the order in which members are declared in a struct can significantly affect its size, meaning that alignment matters for writing efficient systems code.

Finally, we looked at how the computer operates at the level just below C code: it executes a sequence of assembly instructions, which are small operations that translate into operations of the processor's circuits. Assembly is hard to write, but it is useful to be able to read it somewhat intuitively.