Lecture 8: Alignment, Collection Rules, Assembly Language #
Alignment continued, and Collection Rules #
In previous lectures, we built up to specifying a set of rules that govern how the C language expects data to be laid out in memory. We're now ready to write down these rules.
Here are the first two:
- the first member rule says that the address of the collection (array, structure, or union \[see below\]) is the same as the address of its first member;
- the array rule says that all members of an array are laid out consecutively in memory; and
How are the members of a struct like list_node_t
actually laid out in
memory? This is defined by the struct rule, which says that the
members of a struct are laid out in the order they're declared in,
without overlap, and subject only to alignment constraints. These
mysterious "alignment constraints" are what makes our list_node_t
have
a size of 16 bytes even though it only needs 12.
So, by the first member rule, the struct will be aligned. (It turns out
that, in practice, structures on the heap are aligned on 16-byte
boundaries because malloc()
on x86-64 Linux returns 16-byte aligned
pointers; structures on the stack are aligned by the compiler.)
The size of a struct might therefore be larger than the sum of the sizes
of its components due to alignment constraints. Since the compiler must
lay out struct components in order, and it must obey the components'
alignment constraints, and it must ensure different components don't
overlap, it must sometimes introduce extra space in structs. This space
is called padding, and it's effectively wasted memory. Our linked
list node is an example of a situation where padding is required : the
struct will have 4 bytes of padding after int v
, to ensure that
list_node_t*
has a correct alignment (address divisible by 8).
So, we can now specify the third rule:
- the struct rule says that members of a struct are laid out in declaration order, without overlap, and with minimum padding as necessary to satisfy the struct members' alignment constraints.
In addition to these rules, there are three more we haven't covered or made explicit yet.
Aside: Unions #
⚠️ We did not cover unions in the course this year. Following material is for your education only; we won't test you on it. Feel free to skip ahead to the other rules.
For the next rule, we need to learn about unions, which are another collection type in C.
A union is a C data structure that looks a lot like a struct, but which contains only one of its members. Here's an example:
union int_or_char {
int i;
char c;
}
Any variable u
of type union int_or_char
is either an integer (so
u.i
is valid) or a char (so u.c
is valid), but never both at the
same time. Unions are rarely used in practice and you won't need them in
this course. The size of a union is the maximum of the sizes of its
members, and so is its alignment.
What are unions good for?
Unions are helpful when a data structure's size is of the essence (e.g., for embedded environments like the controller chip in a microwave), and in situations where the same bytes can represent one thing or another. For example, the internet is based on a protocol called IP, and there are two versions of: IPv4 (the old one) and IPv6 (the new one, which permits >4B computers on the internet). But there are situations where we need to pass an address that either follows the IPv4 format (4 bytes) or the IPv6 format (16 bytes). A union makes this possible without wasting memory or requiring two separate data structures.
Now we can get to the next rule!
- The union rule says that the address of all members of a union is the same as the address of the union.
Back to other rules! #
The remaining two rules are far more important:
- The minimum rule says that the memory used for a collection shall be the minimum possible without violating any of the other rules.
- The malloc rule says that any call to malloc that succeeds
returns a pointer that is aligned for any type. This rule has some
important consequences: it means that
malloc()
must return pointers aligned for the maximum alignment, which on x86-64 Linux is 16 bytes. In other words, any pointer returned from malloc points to an address that is a multiple of 16.
One consequence from the struct rule and the minimum rule is that
reordering struct members can reduce size of structures! Look at the
example in mexplore-structalign.c
. The struct ints_and_chars
defined
in that file consists of three int
s and three char
s, whose
declarations alternate. What will the size of this structure be?
It's 24 bytes. The reason is that each int
requires 4 bytes (so, 12
bytes total), and each char
requires 1 byte (3 bytes total), but
alignment requires the integers to start at addresses that are multiples
of four! Hence, we end up with a struct layout like the following:
0x... 00 ... 04 ... 08 ... 0c ... 10 ... 14 ... <- addresses (hex)
+------+--+---+------+--+---+------+--+---+
| i1 |c1|PAD| i2 |c2|PAD| i3 |c3|PAD| <- values
+------+--+---+------+--+---+------+--+---+
This adds 9 bytes of padding – a 37.5% overhead! The padding is needed because the characters only use one byte, but the next integer has to start on an address divisible by 4.
But if we rearrange the members of the struct, declaring them in order
i1
, i2
, i3
, c1
, c2
, c3
, the structure's memory layout
changes. We now have the three integers adjacent, and since they require
an alignment of 4 and are 4 bytes in size, no padding is needed between
them. Following, we can put the characters into contiguous bytes also,
since their size and alignment are 1.
0x... 00 ... 04 ... 08 ... 0c 0d 0e 0f ... <- addresses (hex)
+------+------+------+--+--+--+--+
| i1 | i2 | i3 |c1|c2|c3|P.| <- values
+------+------+------+--+--+--+--+
We only need a single byte of padding (6.25% overhead), as the struct
must be padded to 16 bytes (why? Consider an array of ints_and_chars
and the alignment of the next element!). In addition, the structure is
now 16 bytes in size rather than 24 bytes – a 33% saving.
Assembly Language #
We've now arrived at the end of our introduction to C programming. The rest of the course will mostly look at higher-level concepts built atop the understanding we have developed. But before the look at higher levels, we will briefly pull back the covers and see what happens at the level below C to make your programs run.
| Web sites, Google, Facebook, AirBnB, etc.
--- |-------------------------------------------
| Distributed systems <-- block 4
C |-------------------------
S | Parallel programming <-- block 3
|-------------------------
3 | C++ | Operating systems <-- block 2
0 |-------------------------
0 | C programming language <-- we discussed this so far
|-------------------------
| Assembly language <-- we will briefly cover this
--- |-------------------------------------------
| Hardware (chips)
Now that you understand the C language and memory representations of data, you may wonder about the "magic" hexadecimal bytes that the compiler outputs to make your computer's processor do things like adding numbers. How does the compiler choose these bytes, and what bytes are valid?
Each computer architecture (such as x86-64, which most modern computers use and we're considering in this course) has an instruction set specified by the manufacturer. The instruction set, first and foremost, defines what sequences of bytes trigger specific behavior in the processor (e.g., adding numbers, comparing them for equality, or loading data from memory). But hexadecimal bytes are hard for humans to read, so the instruction set also comes with a human-readable assembly language that consists of short, mnemonic instructions that correspond directly to a byte encoding (i.e., each of these instructions corresponds to a specific, unique set of hexadecimal bytes).
Why are we covering this?
You will almost certainly never need to write code in assembly language yourself, but it is helpful to have at least some intuition of how to read it. Being able to read assembly helps you debug weird problems with your program, and can help you understand compiler optimizations betters. Some of the examples we look at will also illustrate to you the tricks used to make computers run our code efficiently.
If you're curious to learn more details about assembly, consider taking CSCI 0330, which explains it in more detail and with more hands-on exercises than we'll have time for!
From C code to instructions #
Let's take a look at how your C program get turned into hexadecimal bytes that can run on your processor.
The compiler, which we've discussed a lot already, turns your C program into assembly language. Lots of cleverness and optimizations.
The assembler turns assembly language into a set of bytes in an executable. This a very direct translation.
How to read an assembly file #
Assembly files (and assembly layout in GDB, layout asm
) can be
confusing at first. The important tricks to reading them are the
following:
- Focus only on the instructions that you care about, and initially ignore anything else.
- Work backwards from the return statement.
Here's an example assembly file (output from gcc -S testasm.c
) to
consider:
.file "testasm.c"
.text
.globl _Z1fiii
.type _Z1fiii, @function
_Z1fiii:
.LFB0:
cmpl %edx, %esi
je .L3
movl %esi, %eax
ret
.L3:
movl %edi, %eax
ret
.LFE0:
.size _Z1fiii, .-_Z1fiii
.ident "GCC: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0"
.section .note.GNU-stack,"",@progbits
There are many lines here that are effectively comments. All lines
starting with a dot (e.g., .file
or .ident
) are of this kind: they
constitute "directives", rather than instructions. Some directives tell
the assembler what to do, but they're often unimportant to your
understanding. All directive lines that end in a colon, however, are
important: they constitute labels, which matter for control flow
instructions (e.g., .L3:
).
The actual instructions are on the indented lines between the labels.
The processor will execute these top to bottom unless it's told to
continue somewhere other than the next instruction by a control flow
instruction (e.g., je
). A good way to figure out the important parts
of the instructions is often to work backwards from the ret
instructions, which is the function's return point. Why do we work
backwards? Going forward is more difficult because it requires
understanding what the state of the processor's register is when we call
the function (something not explicitly described in the file here). By
working backwards from the return, we can often figure out what the
function does without knowing its inputs.
Sometimes, we are also interested in looking at the assembly in an
already-compiled object file – i.e., binary code after the translation
from human-readable assembly language to bytes. This is called
"disassembling" from executable instructions, and happens when we look
at assembly using GDB, objdump -d
, or objdump -S
. This output looks
different from compiler-generated assembly: in disassembled
instructions, there are no intermediate labels or directives. This is
because the labels and directives disappear during the process of
generating executable instructions.
Registers #
Assembly instructions operate on registers, small pieces of very fast memory inside the processor. To process data stored in memory, the processor first needs to load it into registers; and once it has completed working on the data in a register, it needs to store it back to memory.
Registers are the fastest kind of memory available in the machine. x86-64 has 14 general-purpose registers and several special-purpose registers. The table below lists all basic registers, with special-purpose registers highlighted in yellow. You won't understand all columns yet, but you will soon and can then use this table as a reference (we won't ask you to memorize it in detail). You'll notice different naming conventions for subsets of the same register, a side effect of the long history of the x86 architecture (the first x86 processor, the 8086 was first released in 1978).
Full register name | 32-bit (bits 0–31) |
16-bit (bits 0–15) |
8-bit low (bits 0–7) |
8-bit high (bits 8–15) |
Use in calling convention | Callee-saved? |
---|---|---|---|---|---|---|
General-purpose registers: | ||||||
%rax | %eax | %ax | %al | %ah | Return value (accumulator) | No |
%rbx | %ebx | %bx | %bl | %bh | – | Yes |
%rcx | %ecx | %cx | %cl | %ch | 4th function argument | No |
%rdx | %edx | %dx | %dl | %dh | 3rd function argument | No |
%rsi | %esi | %si | %sil | – | 2nd function argument | No |
%rdi | %edi | %di | %dil | – | 1st function argument | No |
%r8 | %r8d | %r8w | %r8b | – | 5th function argument | No |
%r9 | %r9d | %r9w | %r9b | – | 6th function argument | No |
%r10 | %r10d | %r10w | %r10b | – | – | No |
%r11 | %r11d | %r11w | %r11b | – | – | No |
%r12 | %r12d | %r12w | %r12b | – | – | Yes |
%r13 | %r13d | %r13w | %r13b | – | – | Yes |
%r14 | %r14d | %r14w | %r14b | – | – | Yes |
%r15 | %r15d | %r15w | %r15b | – | – | Yes |
Special-purpose registers: | ||||||
%rsp | %esp | %sp | %spl | – | Stack pointer | Yes |
%rbp | %ebp | %bp | %bpl | – | Base pointer (general-purpose in some compiler modes) |
Yes |
%rip | %eip | %ip | – | – | Instruction pointer (Program counter; called $pc in GDB) |
* |
%rflags | %eflags | %flags | – | – | Flags and condition codes | No |
Note that unlike primary memory (RAM) – which is what we think of when
we discuss memory in a C/C++ program – registers have no addresses!
There is no address value that, if cast to a pointer and dereferenced,
would return the contents of the %rax
register. Registers live in a
separate world from the memory, and we need special instructions to move
data to and from registers and memory.
Whenever you see %ZZZ
in assembly code, this refers to a register
named ZZZ
. The x86-64 registers have confusing names because they
evolved over time; each register also has multiple names that refer to
different subsets of its bits. For example %rax
, one of the
general-purpose registers that is, by convention, used to pass return
values from functions, is split into the following five names:
63 31 15 7 0
+-------------------------------+-------------------------------+
| | | | |
+---------------------------------------------------------------+
|---------------------%rax (64 bits/8 bytes)--------------------|
|-----%eax (32 bits/4 bytes)----|
|-%ax (16b/2B)--|
|--%ah--|--%al--| <-- 8 bits/1 byte each
We'll be working with these register names a lot, but you don't need to memorize them – instead, we suggest that you consult the handy summary sheet we handed out in lecture, or this page!
Summary #
Today, we learned some handy rules about collections and their memory representation and review how these rules they interact with alignment, particularly within structs. We saw that changing the order in which members are declared in a struct can significantly affect its size, meaning that alignment matters for writing efficient systems code.
Finally, we looked at how the computer operates at the level just below C code: it executes a sequence of assembly instructions, which are small operations that translate into operations of the processor's circuits. Assembly is hard to write, but it is useful to be able to read it somewhat intuitively.