Lecture 6: Alignment and Collections, Signed Number Representation and Undefined Behavior

» Lecture video (Brown ID required)
» Lecture code
» Post-Lecture Quiz (due 6pm Wednesday, February 12).

Alignment and Collection Rules, continued

In previous lectures, we built up a set of rules that govern how the C language expects data to be laid out in memory. Let's recap these rules and add a few more.

  1. the first member rule says that the address of the collection (array, structure, or union [see below]) is the same as the address of its first member;
  2. the array rule says that all members of an array are laid out consecutively in memory; and
  3. the struct rule says that members of a struct are laid out in declaration order, without overlap, and with minimum padding as necessary to satisfy the struct members' alignment constraints.

In addition to these rules, there are three more we haven't covered or made explicit yet. For the first of these, we need to learn about unions, which are another collection type in C.

A union is a C data structure that looks a lot like a struct, but which contains only one of its members. Here's an example:

union int_or_char {
  int i;
  char c;
};

Any variable u of type union int_or_char is either an integer (so u.i is valid) or a char (so u.c is valid), but never both at the same time. Unions are rarely used in practice, and you won't need them in this course. The size of a union is the maximum of its members' sizes, and its alignment is likewise the maximum of its members' alignments.

What are unions good for?

Unions are helpful when a data structure's size is of the essence (e.g., for embedded environments like the controller chip in a microwave), and in situations where the same bytes can represent one thing or another. For example, the internet is based on a protocol called IP, and there are two versions of it: IPv4 (the old one) and IPv6 (the new one, which permits more than 4 billion computers on the internet). But there are situations where we need to pass an address that either follows the IPv4 format (4 bytes) or the IPv6 format (16 bytes). A union makes this possible without wasting memory or requiring two separate data structures.

Now we can get to the next rule!

  4. The union rule says that the address of each member of a union is the same as the address of the union itself.

The remaining two rules are far more important:

  5. The minimum rule says that the memory used for a collection shall be the minimum possible without violating any of the other rules.
  6. The malloc rule says that any call to malloc that succeeds returns a pointer that is aligned for any type. This rule has some important consequences: it means that malloc() must return pointers aligned for the maximum alignment, which on x86-64 Linux is 16 bytes. In other words, any pointer returned from malloc points to an address that is a multiple of 16.

One consequence of the struct rule and the minimum rule is that reordering struct members can reduce the size of structures! Look at the example in mexplore-structalign.c. The struct ints_and_chars defined in that file consists of three ints and three chars, whose declarations alternate. What will the size of this structure be?

It's 24 bytes. The reason is that each int requires 4 bytes (so, 12 bytes total), and each char requires 1 byte (3 bytes total), but alignment requires the integers to start at addresses that are multiples of four! Hence, we end up with a struct layout like the following:

0x... 00 ... 04 ... 08 ... 0c ... 10 ... 14 ...   <- addresses (hex)
     |  i1  |c1|PAD|  i2  |c2|PAD|  i3  |c3|PAD|  <- values
This adds 9 bytes of padding – a 37.5% overhead! The padding is needed because the characters only use one byte, but the next integer has to start on an address divisible by 4.

But if we rearrange the members of the struct, declaring them in order i1, i2, i3, c1, c2, c3, the structure's memory layout changes. We now have the three integers adjacent, and since they require an alignment of 4 and are 4 bytes in size, no padding is needed between them. We can then put the characters into contiguous bytes as well, since their size and alignment are 1.

0x... 00 ... 04 ... 08 ... 0c 0d 0e 0f ...   <- addresses (hex)
     |  i1  |  i2  |  i3  |c1|c2|c3|P.|      <- values
We only need a single byte of padding (6.25% overhead), as the struct must be padded to 16 bytes (why? Consider an array of ints_and_chars and the alignment of the next element!). In addition, the structure is now 16 bytes in size rather than 24 bytes – a 33% saving.

Signed number representation

Why are we covering this?

Debugging computer systems often requires you to look at memory dumps and understand what the contents of memory mean. Signed numbers have a non-obvious representation (they appear as very large hexadecimal values), and learning how the computer interprets hexadecimal bytes as negative numbers will help you understand what is in memory and whether that data is what you expect. Moreover, arithmetic on signed numbers can trigger undefined behavior in non-intuitive ways; this is an instance of undefined behavior unrelated to memory access!

Recall from last time that our computers use a little endian number representation. This makes reading the values of pointers and integers from memory dumps (like those produced by our hexdump() function) more difficult, but it is how things work.

Using position notation on bytes allows us to represent unsigned numbers very well: the higher the byte's position in the number, the greater its value. You may have wondered how we can represent negative, signed numbers in this system, however. The answer is a representation called two's complement, which is what the x86-64 architecture (and most other architectures) use.

Two's complement strikes most people as weird when they first encounter it, but there is an intuition for it. The best way to think about it is that adding 1 to -1 should produce 0. The representation of 1 in a 4-byte integer is 0x0000'0001 (N.B.: for clarity for humans, I'm using big endian notation here; on the machine, this will be laid out as 0x0100'0000). What number, when added to this representation, yields 0?

The answer is 0xffff'ffff, the largest representable integer in 4 bytes. If we add 1 to it, we flip each bit from f to 0 and carry a one, which flips the next bit in turn. At the end, we have:

   0x0000'0001
 + 0xffff'ffff
 = 0x1'0000'0000 == 0x0000'0000 (mod 2^32)
The computer simply throws away the carried 1 at the top, since it's outside the 4-byte width of the integer, and we end up with zero, since all arithmetic on fixed-size integers is performed modulo their width (here, 2^32). You can see this in action in signed-int.c.

More generally, in two's complement arithmetic, we always have -x + x = 0, so a negative number added to its positive complement yields zero. The principle that makes this possible is that -x corresponds to positive x, with all bits flipped (written ~x) and 1 added. In other words, -x = ~x + 1.

Signed numbers split their range in half, with half representing negative numbers and the other half representing 0 and positive numbers. For example, a signed char can represent numbers -128 to 127 inclusive (the positive range is one smaller because it also includes 0). The most significant bit acts as a sign bit, so all signed numbers whose top bit is set to 1 are negative. Consequently, the largest positive value of a signed char is 0x7f (binary 0111'1111), and the largest-magnitude negative value is 0x80 (binary 1000'0000), representing -128. The number -1 corresponds to 0xff (binary 1111'1111), so that adding 1 to it yields zero (modulo 2^8).

Two's complement representation has some nice properties for building hardware: for example, the processor can use the same circuits for addition and subtraction of signed and unsigned numbers. On the downside, however, two's complement representation also has a nasty property: arithmetic overflow on signed numbers is undefined behavior.

Integer overflow

Arithmetic overflow on signed integers is undefined behavior! To demonstrate this, let's look at ubexplore.c. This program takes its first argument, converts it to an integer, and then adds 1 to it. It also calls a function called check_signed_increment, which uses an assertion to check that the result of adding 1 to x (the function's argument) is indeed greater than x. Intuitively, this should always be true from a mathematical standpoint. But in two's complement arithmetic, it's not always true: consider what happens if I pass 0x7fff'ffff (the largest positive signed int) to the program. Adding 1 to this value turns it into 0x8000'0000, which is the smallest negative number representable in a signed integer! So the assertion should fail in that case.

With compiler optimizations turned off, this is indeed what happens. But since undefined behavior allows the compiler to do whatever it wants, the optimizer decides to just remove the assertion in the optimized version of the code! This is perfectly legal, because C compilers assume that programmers never write code that triggers undefined behavior, and certainly that programmers never rely on a specific behavior of code that is undefined behavior (it's undefined, after all).

And just to mess with you, and to demonstrate that arithmetic overflow on signed integers produces confusing results even without compiler optimizations, let's look at ubexplore2.c. This program runs a for loop to print the numbers between its first and second argument. ./ubexplore2.opt 0 10 prints the numbers from 0 to 10 inclusive, and ./ubexplore2.opt 0x7ffffff0 0x7fffffff prints 16 numbers from 2,147,483,632 to 2,147,483,647 (the largest positive signed 4-byte integer we can represent). But ./ubexplore2.noopt 0x7ffffff0 0x7fffffff prints a lot more and appears to loop infinitely! It turns out that although the optimized behavior is correct for mathematical addition (which doesn't have overflow), the unoptimized code is actually correct for computer arithmetic. When we look at the code carefully, we understand why: the loop increments i after the body executes, and 0x7fff'ffff overflows into 0x8000'0000 (= -2,147,483,648, the smallest representable signed int), so the next time the loop condition is checked, -2,147,483,648 is indeed less than or equal to n2. But with optimizations enabled, the compiler increments i early and compares i + 1 < n2 rather than i <= n2 (a legal optimization if it assumes that i + 1 > i always holds).

Perhaps confusingly, arithmetic overflow on unsigned numbers does not constitute undefined behavior: unsigned arithmetic is defined to wrap around modulo the integer's width. It is still best avoided, of course :)

The good news is that there is a handy sanitizer tool that helps you detect undefined behavior such as arithmetic overflow on signed numbers. The tool is called UBSan, and you can add it to your program by passing the -fsanitize=undefined flag when you compile. You'll see this tool again in Lab 2!


Today, we reviewed the memory layout rules for collections in the C language, and how they interact with alignment. We saw that changing the order in which members are declared in a struct can significantly affect its size, meaning that alignment matters for writing efficient systems code.

We also learned more about how computers represent integers, and in particular about how they represent negative numbers in a binary encoding. We observed that certain arithmetic operations on numbers can invoke the dreaded undefined behavior, and saw the confusing effects this can have.

Next time, we'll start looking at assembly and at how function calls work. Then we'll move on to C++ and higher-level concepts.