Lecture 7: Signed Number Representation and Alignment #
- datarep/signed-int.c (Signed integers)
- datarep/ubexplore.c (Undefined behavior)
- datarep/mexplore-align.c (Alignment)
Signed number representation #
Why are we covering this?
Debugging computer systems often requires you to look at memory dumps and understand what the contents of memory mean. Signed numbers have a non-obvious representation (they will appear as very large hexadecimal values), and learning how the computer interprets hexadecimal bytes as negative numbers will help you better understand what is in memory and whether that data is what you expect. Moreover, arithmetic on signed numbers can trigger undefined behavior in non-intuitive ways; this demonstrates an instance of undefined behavior unrelated to memory access!
Recall from prior lectures that our computers use a little endian
number representation. This makes reading the values of pointers and
integers from memory dumps (like those produced by our hexdump()
function) more difficult, but it is how things work.
Using position notation on bytes allows us to represent unsigned numbers very well: the higher the byte's position in the number, the greater its value. You may have wondered how we can represent negative, signed numbers in this system, however. The answer is a representation called two's complement, which is what the x86-64 architecture (and most other architectures) use.
Two's complement strikes most people as weird when they first encounter
it, but there is an intuition for it. The best way to think about it is
that adding 1 to -1 should produce 0. The representation of 1 in a
4-byte integer is 0x0000'0001
(N.B.: for clarity for humans, I'm using
big endian notation here; on the machine, this will be laid out as
0x0100'0000
). What number, when added to this representation, yields
0?
The answer is 0xffff'ffff, the largest unsigned integer representable in 4
bytes. If we add 1 to it, the carry ripples through the whole number: each
hex digit flips from f to 0 as the carry propagates to the next position.
At the end, we have:
0x0000'0001
+ 0xffff'ffff
--------------
0x1'0000'0000 == 0x0000'0000 (mod 2^32)
The computer simply throws away the carried 1 at the top, since it's
outside the 4-byte width of the integer, and we end up with zero, since
all arithmetic on fixed-size integers is modulo their size (here, 2^32,
as 4 bytes contain 32 bits). You can see this in action in
signed-int.c.
More generally, in two's complement arithmetic, we always have -x + x
= 0, so a negative number added to its positive complement yields
zero. The principle that makes this possible is that -x
corresponds to
positive x
, with all bits flipped (written ~x
) and 1 added. In
other words, -x = ~x + 1.
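To convince yourself of this rule, you can run a small standalone snippet like the one below (just an illustration, not part of signed-int.c):

```c
#include <stdio.h>

int main(void) {
    int x = 12;
    // Negation in two's complement: flip all bits, then add 1.
    printf("-x     = %d\n", -x);       // prints -12
    printf("~x + 1 = %d\n", ~x + 1);   // also prints -12
    // -1 has all 32 bits set, so reinterpreting it as unsigned gives 0xffffffff.
    printf("-1 as hex: 0x%08x\n", (unsigned) -1);
    return 0;
}
```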
Signed numbers split their range in half, with half representing
negative numbers and the other half representing 0 and positive numbers.
For example, a signed char
can represent numbers -128 to 127 inclusive
(the positive range is one smaller because it also includes 0). The most
significant bit acts as a sign bit, so all signed numbers whose top
bit is set to 1 are negative. Consequently, the largest positive value
of a signed char
is 0x7f
(binary 0111'1111), and the
largest-magnitude negative value is 0x80
(binary 1000'0000),
representing -128. The number -1 corresponds to 0xff (binary
1111'1111), so that adding 1 to it yields zero (modulo 2^8).
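A short snippet (again just an illustration, not from the course code) makes these byte patterns visible by masking each signed char with 0xff:

```c
#include <stdio.h>

int main(void) {
    signed char max = 127;     // largest positive value, byte 0x7f
    signed char min = -128;    // largest-magnitude negative value, byte 0x80
    signed char neg_one = -1;  // byte 0xff
    // Masking with 0xff after integer promotion reveals the raw byte value.
    printf("%4d is stored as 0x%02x\n", max, max & 0xff);
    printf("%4d is stored as 0x%02x\n", min, min & 0xff);
    printf("%4d is stored as 0x%02x\n", neg_one, neg_one & 0xff);
    return 0;
}
```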
Two's complement representation has some nice properties for building hardware: for example, the processor can use the same circuits for addition and subtraction of signed and unsigned numbers. On the downside, however, two's complement representation also has a nasty property: arithmetic overflow on signed numbers is undefined behavior.
Integer overflow #
Arithmetic overflow on signed integers is undefined behavior! To
demonstrate this, let's look at ubexplore.c
. This program takes its
first argument, converts it to an integer, and then adds 1 to it. It
also calls a function called check_signed_increment
, which uses an
assertion to check that the result of adding 1 to x
(the function's
argument) is indeed greater than x
. Intuitively, this should always be
true from a mathematical standpoint. But in two's complement arithmetic,
it's not always true: consider what happens if I pass 0x7fff'ffff
(the
largest positive signed int
) to the program. Adding 1 to this value
turns it into 0x8000'0000
, which is the smallest negative number
representable in a signed integer! So the assertion should fail in that
case.
With compiler optimizations turned off, this is indeed what happens. But since undefined behavior allows the compiler to do whatever it wants, the optimizer decides to just remove the assertion in the optimized version of the code! This is perfectly legal, because C compilers assume that programmers never write code that triggers undefined behavior, and certainly that programmers never rely on a specific behavior of code that is undefined behavior (it's undefined, after all).
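For reference, here is a minimal sketch of what such a check might look like; the actual ubexplore.c handout code may differ in its details:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

void check_signed_increment(int x) {
    // Signed overflow is undefined behavior, so an optimizing compiler may
    // assume that x + 1 > x always holds and delete this assertion entirely.
    assert(x + 1 > x);
}

int main(int argc, char** argv) {
    if (argc < 2) {
        return 1;
    }
    int x = strtol(argv[1], NULL, 0);  // base 0 also accepts "0x..." input
    check_signed_increment(x);
    printf("%d + 1 = %d\n", x, x + 1);
    return 0;
}
```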
Perhaps confusingly, arithmetic overflow on unsigned numbers does not constitute undefined behavior: unsigned arithmetic simply wraps around (modulo 2^N for an N-bit integer). It is still best avoided, of course :)
The good news is that there is a handy sanitizer tool that helps you
detect undefined behavior such as arithmetic overflow on signed numbers.
The tool is called
UBSan,
and you can add it to your program by passing the -fsanitize=undefined
flag when you compile.
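For example, compiling and running the program with the sanitizer enabled might look like this (the exact invocation in the course Makefile may differ):

```
cc -fsanitize=undefined -O2 -o ubexplore ubexplore.c
./ubexplore 2147483647    # UBSan reports the signed overflow at runtime
```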
⚠️ We did not cover ubexplore2.c this year. The following material is for
your education only; we won't test you on it. Feel free to skip ahead.
And just to mess with you and demonstrate that arithmetic overflow on
signed integers produces confusing results not only with compiler
optimizations enabled, let's look at ubexplore2.c
. This program runs a
for
loop to print the numbers between its first and second argument.
./ubexplore2.opt 0 10
prints numbers from 0 to 10 inclusive, and
./ubexplore2.opt 0x7ffffff0 0x7fffffff
prints 16 numbers from
2,147,483,632 to 2,147,483,647 (the largest positive signed 4-byte
integer we can represent). But
./ubexplore2.noopt 0x7ffffff0 0x7fffffff
prints a lot more and appears
to loop infinitely! It turns out that although the optimized behavior is
correct for mathematical addition (which doesn't have overflow), the
unoptimized code is actually correct for computer arithmetic. When we
look at the code carefully, we understand why: the loop increments i
after the body executes, and 0x7fff'ffff overflows into 0x8000'0000
(the smallest negative int, -2,147,483,648), so the next time the loop
condition is checked, this negative value is indeed less than or equal
to n2. But with optimizations enabled, the compiler
increments i
early and compares i + 1 < n2
rather than i <= n2
(a
legal optimization if assuming that i + 1 > i
always).
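A minimal sketch of the loop in question (the real ubexplore2.c may differ in its details) shows why the unoptimized version never terminates when n2 is the largest int:

```c
#include <stdio.h>

// Prints every integer from n1 to n2 inclusive. If n2 == 0x7fff'ffff, the
// final ++i overflows (undefined behavior); without optimizations, i wraps
// around to the smallest negative int, which is still <= n2, so the loop
// never terminates.
void print_range(int n1, int n2) {
    for (int i = n1; i <= n2; ++i) {
        printf("%d\n", i);
    }
}
```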
Alignment #
Why are we covering this?
Since C requires you to work closely with memory addresses, it is important to understand how the compiler lays out data in memory, and why the layout may not always be exactly what you expect. If you understand alignment, you will get pointer arithmetic and byte offsets right when you deal with them, and you will understand why programs sometimes use more memory than you would think based on your data structure specifications.
The chips in your computer are very good at working with fixed-size
numbers. This is the reason why the basic integer types in C grow in
powers of two (char
= 1 byte, short
= 2 bytes, int
= 4 bytes,
long
= 8 bytes). But it further turns out that the computer can only
work efficiently if these fixed-size numbers are aligned at specific
addresses in memory. This is especially important when dealing with
structs, which could be of arbitrary size based on their definition, and
could have odd memory layouts following the struct rule.
Just like each primitive type has a size, it also has an alignment. The alignment means that all objects of this type must start at an address divisible by the alignment. In other words, an integer with size 4 and alignment 4 must always start at an address divisible by 4. (This applies independently of whether the object is inside a collection, such as a struct or array, or not.) The table below shows the alignment restrictions of primitive types on an x86-64 Linux machine.
Type | Size | Address restriction |
---|---|---|
char (signed char, unsigned char) | 1 | No restriction |
short (unsigned short) | 2 | Multiple of 2 |
int (unsigned int) | 4 | Multiple of 4 |
long (unsigned long) | 8 | Multiple of 8 |
float | 4 | Multiple of 4 |
double | 8 | Multiple of 8 |
T* | 8 | Multiple of 8 |
The reason for this lies in the way hardware is constructed: to end up with simpler wiring and logic, computers often move fixed amounts of data around. In particular, when the computer's processor accesses memory, it actually does not go directly to RAM (the random access memory whose chips hold our bytes). Instead, it accesses a fast piece of memory that contains a tiny subset of the contents of RAM (this is called a "cache", and we'll learn more about it in future lectures!). But building logic that can copy memory at any arbitrary byte address in RAM into this smaller memory would be hugely complicated, so the hardware designers chunk RAM into fixed-size "blocks" that can be copied efficiently. The size of these blocks differs between computers, but their existence explains why alignment is necessary.
Let's assume there were no alignment constraints, and consider a situation like the one shown in the following:
| 4B int | <-- unaligned integer stored across block boundary
| 2B | 2B | <-- 2 bytes in block k, 2 bytes in block k+1
----+-----------+-----------+-----------+--
... | block k | block k+1 | block k+2 | ... <- memory blocks ("cache lines")
----+-----------+-----------+-----------+--
An unaligned integer could end up being stored across the boundary between two memory blocks. This would require the processor to fetch two blocks of RAM into its fast cache memory, which would not only take longer, but also make the circuits much harder to build. With alignment, the circuit can assume that every integer (and indeed, every primitive type in C) is always contained entirely in one memory block.
| 4B int | <-- aligned integer stored entirely in one block
| 4B | <-- all 4 bytes in block k+1
----+-----------+-----------+-----------+--
... | block k | block k+1 | block k+2 | ... <- memory blocks ("cache lines")
----+-----------+-----------+-----------+--
The compiler, standard library, and operating system all work together
to enforce alignment restrictions. If you want to get the alignment of a
type in a C program, you can use the sizeof
operator's cousin
alignof
. For example, alignof(int) is replaced with 4 by the
compiler, and similarly for other types.
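For instance, a small program can print sizes and alignments side by side (on an x86-64 Linux machine, each pair matches the table above):

```c
#include <stdalign.h>  // provides alignof in C11
#include <stdio.h>

int main(void) {
    printf("char: size %zu, alignment %zu\n", sizeof(char), alignof(char));
    printf("int:  size %zu, alignment %zu\n", sizeof(int), alignof(int));
    printf("long: size %zu, alignment %zu\n", sizeof(long), alignof(long));
    printf("int*: size %zu, alignment %zu\n", sizeof(int*), alignof(int*));
    return 0;
}
```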
We can now write down a precise definition of alignment: The alignment
of a type T
is a number a
≥ 1 such that the address of every object
of type T
is a multiple of a
. Every object with type T
has size
sizeof(T)
, meaning that it occupies sizeof(T)
contiguous bytes of
memory; and each object of type T
has alignment alignof(T)
, meaning
that the address of its first byte is a multiple of alignof(T)
.
You might wonder what the maximum alignment is – the larger an
alignment, the more memory might get wasted by being unusable! It turns
out that the 64-bit architectures we use today have maximum 16-byte
alignment, which is sufficient for the largest primitive type,
long double
.
Note that structs are not primitive types, so they are not, as such, subject to alignment constraints. However, each struct has a first member, and by the first member rule for collections, the address of the struct is the address of its first member. Struct members are ultimately primitive types (even with nested structures, you eventually end up with primitive members after expansion), and those members do need to be aligned. We will talk more about this next time!
Alignment constraints also apply when the compiler lays out variables on
the stack. mexplore-order.c
illustrates this: with all int
variables
and char
variables defined consecutively, we end up with the memory
addresses we might expect (the three int
s are consecutive in memory,
and the three char
s are in the bytes below them). But if I move c1
up to declare it just after i1
, the compiler leaves a gap below the
character, so that the next integer is aligned correctly on a four-byte
boundary.
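The experiment looks roughly like this (the variable names are my own; the actual mexplore-order.c may differ):

```c
#include <stdio.h>

int main(void) {
    int i1 = 1;
    char c1 = 'a';  // declared right after i1: without optimizations, the
                    // compiler may leave a 3-byte gap below c1 so that the
                    // next int starts on a 4-byte boundary
    int i2 = 2;
    int i3 = 3;
    char c2 = 'b';
    char c3 = 'c';
    printf("i1 at %p, c1 at %p, i2 at %p\n", (void*) &i1, (void*) &c1, (void*) &i2);
    printf("i3 at %p, c2 at %p, c3 at %p\n", (void*) &i3, (void*) &c2, (void*) &c3);
    return 0;
}
```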
But: if we turn on compiler optimizations, there is no gap! The compiler
has reordered the variables on the stack to avoid wasting memory: all
integers are again consecutive in memory, even though we didn't declare
them in that order. This is permitted, as there is no rule about the
order of stack-allocated variables in memory (nor is there one about the
order of heap-allocated ones, though addresses returned from malloc()
do need to be aligned). If these variables were in a struct
(as in
x_t
), however, the compiler could not perform this optimization
because the struct rule forbids reordering members.
Summary #
Today, we learned that certain arithmetic operations on numbers can
invoke the dreaded undefined behavior, and the confusing effects this
can have. We also dove into the tricky subject of alignment in memory,
where the compiler sometimes wastes memory to achieve faster program
execution, and learned how the bytes of types larger than a char are
actually laid out in memory.
Next time, we'll learn some handy rules about collections and their memory representation, and review how these rules interact with alignment, particularly within structs.