CS 300/1310 Spring 2021 Midterm Quiz
Welcome to our midterm quiz! We’re hoping this quiz provides you with an opportunity to review your knowledge and what you've learned in CS 300 so far. Our questions are designed to check your conceptual understanding, rather than asking you to memorize details.
Please make sure you read the rules below before you get started.
Rules
The quiz is open book, open notes, open computer. This means you can use the following resources:
- You may access a browser and a PDF reader.
- You may access your own notes and your assignment code electronically.
- You may access an internet site on which your own notes and assignment code are stored.
- You may access this year’s course website and notes on it.
- You may access pages directly linked from the course site, including our lecture materials and links therein.
- You may run a C/C++ compiler, including an assembler and linker, or a calculator.
- You may access manual pages.
- You may access Piazza.
- You may access Google or Wikipedia or other online resources.
- You absolutely may not contact other humans via chat, phone calls, or anything like it.
- You cannot ask quiz questions on online platforms like Stackoverflow, etc.
- You may not access solutions from any previous exam, by paper or computer, including exams from other courses.
Completing Your Quiz
Create a cs300-s21-quizzes repository. Clone it
to your computer. Before beginning
the quiz, merge your local cs300-s21-quizzes
repository with our handout (in case we release
any fixes):
$ git pull git://github.com/csci0300/cs300-s21-quizzes.git masterYou will enter your answers in the
midterm/midterm.md
file. Do not place your name in this
file. (We will grade your quiz anonymously.)
When you have completed the quiz, edit the file midterm/policy.txt
to sign your name. This
is your promise that you have obeyed the quiz rules in letter and spirit. (We won’t look at this file in
grading.)
Then commit your changes and push them to your repository on GitHub:
$ git commit -a -m "Midterm Quiz Completed" $ git pushIf you get an error message that you do not have access to push to
github.com/csci0300/cs300-s21-quizzes.git
, this means that you are trying to push to our
handout repository. Instead, push explicitly to your own repository:
$ git push https://github.com/csci0300/cs300-s21-quizzes-YOURGITHUBUSERNAME masterMake sure that you have entered your quiz repository URL on the grading server for the midterm.
Notes
Assume an x86-64 architecture and a Linux operating system unless noted otherwise in the question.
Memory Organization (10 pts)
QUESTION 1A (5 pts). Consider the following C++ program:
#include <stdio.h>
#include <vector>
#include <stdlib.h>
const char* hello = "hello";
int main(int argc, char** argv) {
if (argc == 1) {
printf("usage: ./say WHAT [MORE WORDS]\n");
exit(1);
}
std::vector<char*>* v = new std::vector<char*>;
for (int i = 1; i < argc; i++) {
v->push_back(argv[i]);
}
printf("%s ", hello);
for (auto iterator = v->begin(); iterator != v->end(); ++iterator) {
printf("%s ", *iterator);
}
printf("\n");
delete v;
}
What segment of memory (code, data, heap, or stack) is the memory for the following objects in?
hello
.- The code of the
push_back()
function onv
. v
.iterator
.argc
.
- Data segment / .rodata / static segment (all correct)
- Code segment / .text
- Heap
- Stack
- Stack
QUESTION 1B (2 pts). Which of the following statements is/are true? Select all that apply.
- In the stack segment of memory, C only allows memory objects with sizes known at compile time.
- The heap is, at all times, a contiguous segment of allocated memory.
- The code segment (also referred to as
.text
) contains the C code of the program. - Releasing dynamic lifetime memory requires the programmer to explicitly call
free()
(or for the process to exit).
- True
- False (the heap can have unallocated “holes”)
- False (it contains compiled machine code)
- True
QUESTION 1C (1 pts). You find a file called horrible.c
in a C codebase and try to
decipher it. Unfortunately, someone left TYPE
instead of the true type of x
. What should
TYPE
be to make the file compile without warnings or errors?
#include <stdlib.h>
int main() {
long l = 9999999999;
void* b = &(((char*)&l)[2]);
int* a[2];
a[0] = NULL;
a[1] = b;
TYPE x = *((&a[0]) + 1);
}
int*
. (void*
also works, though it's not what we intended.)
QUESTION 1D (2 pts). You find some hexdump output in a file. What type of data do the following snippets likely represent?
6869 2c20 6373 3330 3021 0a00
0100 0000
Other explanations possible, but these are the most likely contents.
- (ASCII) string, “hello cs300!\n”
- Unsigned int, value 1 / signed int, value 1 / array of bytes, first with value 1, others 0
Signed Numbers (5 pts)
QUESTION 2A (2 pts). What numbers do the following bytes represent when interpreted as a signed integer type of the appropriate length?
0xFC
0x8000'0001
- -4
- -2,147,483,647 / INT32_MIN + 1
QUESTION 2B (3 pts). Consider the following code:
#include <stdlib.h>
int main() {
unsigned long l1 = 10;
unsigned long l2 = UINT64_MAX;
long l3 = -1000;
int i1 = 42;
int i2 = 0x7FFF'FFF0;
int i3 = 0x7FFF'FFC0;
}
l1 + l2
l1 + (unsigned long)l3
i2 + i1
i1 + i3
(int)l1 * i3
(long)i1 - l3
- Overflow
- No overflow (casting
l3
tounsigned long
results in a large number, but that number + 10 is still within range)- Overflow (undefined behavior)
- No overflow: 0x7FFF'FFC0 = 0x7FFF’FFFF - 0x3F (63), so adding 42 will not overflow
- Overflow (multiplicative and undefined behavior)
- No overflow (valid subtraction of signed longs)
Collections and Alignment (10 pts)
QUESTION 3A (4 pts). You have been hired by Hooli, a large tech company, to develop a data structure that represents web pages in their search index. Hooli expects to use this data structure for billions of webpages, and must store its search index in memory for speed.
Which of the following data structures (A or B) is more efficient for Hooli's use case, and why?
struct webpageA { char* url; char preview[201]; // 200 characters + NUL terminator char crawled_today; // 0 or 1 int rank; };
struct webpageB {
char* url;
char preview[201]; // 200 characters + NUL terminator
int rank;
char crawled_today; // 0 or 1
};
webpageA is more efficient.
webpageA needs 216 bytes of memory:
struct webpageA { char* url; // 8 bytes // no padding needed char preview[201]; // 200 characters + NUL terminator // no padding needed char crawled_today; // 0 or 1 // 2 bytes of padding to get to 212 from start (divisible by 4) int rank; // 4 bytes // no padding needed }; // total: 8 + 201 + 1 + 2 + 4 = 216webpageB needs 224 bytes of memory:
struct webpageB { char* url; // 8 bytes // no padding needed char preview[201]; // 200 characters + NUL terminator // 3 bytes of padding to get to 212 from start (divisible by 4) int rank; // 4 bytes // no padding needed char crawled_today; // 1 byte // 7 bytes of padding to get to a size divisible by 8 }; // total: 8 + 201 + 3 + 4 + 1 + 7 = 224
QUESTION 3B (3 pts). Create a structure that contains exactly 27 bytes of data and 5 bytes of padding.
Many possible answers, but the simplest is something like:
struct s { long l; // forces 8-byte alignment, 8 data bytes char c[19]; // 19 data bytes // 5 bytes of padding to get to 32 for alignment }
QUESTION 3C (3 pts). Imagine a fictional hardware architecture called x86-64-yuge, which adds a new primitive
type ultralong
that has a size of 32 bytes and an alignment of 32. Assume all the usual rules of sizes and alignments in C
and x86-64 still apply otherwise. Answer the following questions.
- What is the minimum size of processor registers on x86-64-yuge?
- What is the alignment of addresses returned from
malloc()
on x86-64-yuge? - What are the size and alignment of a struct containing an
ultralong
and anint
on x86-64-yuge?
- 32 bytes / 256 bits, as an ultralong primitive must fit into a register
- 32 byte alignment, since memory from
malloc()
must be aligned for the largest primitive type (malloc rule)- Size: 64 bytes (must end on a 32-byte boundary by array rule)
Alignment: 32 bytes (must align for largest member by struct rule)
DMalloc (10 pts)
QUESTION 4A (4 pts). Consider the following code:
char* f() {
char* buf = (char*)malloc(100);
LINE // see below
}
LINE
? Mention all that apply, and specify what kind of UB
it is (or what triggers it).
return buf;
printf("%c\n", buf[4])
.buf[102] = '\0'
.memset(buf, '\0', 100);
*((int*)buf + 25) = 999;
- Not UB (accessing in called would be UB, as memory is uninitialized)
- UB, accessing uninitialized memory
- UB, boundary write error / heap buffer overflow
- Not UB
- UB, will write at buf[100] (via 25 * sizeof(int)), which is the first byte that is out of bounds
QUESTION 4B (2 pts). In the DMalloc project, you built a debugging allocator that could detect many kinds of error. Some students
used exclusively internal metadata, where the debugging allocator's data was stored within the same block of memory as the payload. Some students
used external metadata (e.g., an std::map
) to map pointers to metadata, and many used a mix of both.
- What type of error can be detected only with internal metadata?
- From the perspective of caching, which approach would likely yield better performance? Assume external metadata is stored in random heap locations.
- Boundary write errors (as they require secret values next to the allocation)
- Internal metadata, since the metadata is likely in the same cache block as the data
QUESTION 4C (2 pts). Ben Hackerbro is dismissive of the idea of debugging allocators. He argues that the heap is large, and he can
simply modify the OS kernel to put heap allocations at addresses far away from each other and surround them with a few thousand NUL bytes
(\0
). Explain one serious problem with Ben's approach.
Any of the following (or other reasonable answer) gets full credit:
- Ben’s plan wastes a lot of memory, particularly with small allocations.
- Ben’s plan cannot detect buffer overflows where the overflowing data is `\0` bytes (a common case would be a string allocation that didn’t account for the terminator).
- Ben’s plan leaves undefined behavior undetected, with potentially disastrous consequences.
- Ben’s plan allows overruns by a lot (>1000 bytes) to corrupt data on the heap, and he cannot detect it since he’s not using a debugging allocator.
QUESTION 4D (2 pts). After acceeding to the need for a debugging allocator, Ben goes to work on DMalloc. He ends up with the following code for the struct representing his metadata:
struct metadata {
size_t alloc_size;
char magic[13];
};
This violates themalloc()
rule, which says thatmalloc()
must return an address aligned to a multiple of the largest primitive (on Linux/x86-64, 16 bytes). The struct’s largest member is 8 bytes in size, so the end of the struct will be aligned with a multiple of 8. The struct has 21 bytes of data, so there will be 3 bytes of padding, for a final size of 24 (divisible by 8). But the payload will then start 24 bytes after the allocation’s start, which is not divisible by 16.
Assembly (8 pts)
QUESTION 5A (4 pts). Elliott Engineer opens a C source code file, but unfortunately they find that a few crucial
lines implementing function f()
were deleted by someone else:
long f(long x) {
// DELETED
}
int main() {
int x = 42;
f(x);
}
objdump
to extract the assembly code, and find the following:
f:
.LFB0:
movq %rdi, -8(%rsp)
cmpq $100, -8(%rsp)
jle .L2
movl $7, %eax
ret
.L2:
movq -8(%rsp), %rax
addq $1, %rax
ret
f
to the best of your ability. You can make plausible assumptions about the C types
removed on compilation.
long f(long x) { if (x <= 100) { return x + 1; } else { return 7; } }Other types (e.g., unsigned long) are also possible.
QUESTION 5B (4 pts). Consider the following disassembly:
.LFB0:
movl %edi, -20(%rsp)
movq $0, -8(%rsp)
movl $0, -12(%rsp)
jmp .L2
.L3:
movl -12(%rsp), %eax
cltq
addq %rax, -8(%rsp)
addl $1, -12(%rsp)
.L2:
movl -12(%rsp), %eax
cmpl -20(%rsp), %eax
jl .L3
movq -8(%rsp), %rax
ret
- The function took a
long
as its first argument. - The function contains a loop.
- The function returns the minimum of two numbers determined by an
if
statement. - The function declares two local variables (not including arguments).
- False (the function took an
int
, indicated by use of%edi
, a 32-bit register, and bymovl
)- True (jump to a label above the jump instruction indicates this)
- False
- True (
mov
into two locations off%rsp
,-8(%rsp)
and-12(%rsp)
)
Stack (5 pts)
QUESTION 6A (5 pts). Consider the following C program:
int f(int x) {
long y = 999;
long z = x + y;
return z;
}
int main() {
int x = 42;
f(x);
}
callq
assembly instruction inside main
in the code segment is
0x40'0010
, and the entry stack pointer for main()
(i.e., the address in the
%rsp
register when main()
gets called) is 0x7fff'0000'0000
.
Assuming a normal x86-64 calling convention, that compiler optimizations are disabled, and with
the base pointer enabled, sketch the state
of the stack just before return z
completes. You can assume that the callq
instruction is 9 bytes in size (1 byte opcode + 8 bytes address).
Provide your answer in the following format:
0x7fff'0000'0000: top of main()'s stack frame. 0x7ffe'ffff'fff8: stored value of %rbp on call to main() [pointer to base of main()'s caller's stack frame] ...
0x7fff'0000'0000: top of main()'s stack frame. 0x7ffe'ffff'fff8: stored value of %rbp on call to main() [pointer to base of main()'s caller's stack frame] 0x7ffe'ffff'fff4: local variable x on main()’s stack 0x7ffe'ffff'ffe8: return address 0x40'0019 (first byte after callq) [note 8-byte alignment] 0x7ffe'ffff'ffe0: stored value of %rbp on call to f() [pointer to base of main()’s stack frame] 0x7ffe'ffff'ffdc: argument x on f()’s stack 0x7ffe'ffff'ffd0: local variable y on f()’s stack 0x7ffe'ffff'ffc8: local variable z on f()’s stack
File I/O and Caching (10 pts)
QUESTION 7A (2 pts). Imagine you run a program that uses io300_read()
to read some
data from disk. At which layers does the data get cached before the processor can work on it in registers? Mention at
least two.
Possible answers:
- Memory/DRAM (in I/O cache)
- L3, L2, L1 processor caches
- SSD/disk’s internal cache
QUESTION 7B (4 pts). What cache hit rates does each of the following access patterns results in with a single-slot, 4096-byte cache in the FileIO project? Assume that there are no special optimizations for reverse or strided access. Specify hit rates as percentages; if you want, you can include brief reasoning.
byte_cat
on a 4 MB (4,194,304 bytes) file.block_cat 512
on a 4 MB file.reverse_byte_cat
on a 4 MB file.stride_cat 1024 512
on a 1 MB file.
- 99.98% hit rate: every 4096th access misses. Explanation: The first read of 1 byte causes a fetch of 4096 bytes. The next 4095 reads will be serviced from the cache. Hit rate = 4095/4096 = 99.9%.
- 87.5% hit rate: every 8th access misses. Explanation: The first read of 512 bytes will cause a fetch of 4096 bytes. The next 7 reads (4096/512 = 8) will all be serviced by the cache. Hit rate = 7/8 = 87.5%.
- Possible answers: 0% hit rate: every access seeks and must refresh the cache / ~96% hit rate: with seek offset at cache midpoint, 1 miss every 2048 bytes
- 86% hit rate: each 4096 byte block has 1 miss, 6 hits and seven strides in it, so 6/7 of accesses hit. Explanation: stride_cat 1024 512 means a block size of 1024, and a stride size of 512 . 86% hit rate: each 4096 byte block serves 7 reads (starting bytes 0, 512, 1024, 1536, 2048, 2560, 3072) where all but the first are hits. 6/7=85.7%.
QUESTION 7C (3 pts). Which of the following statements about UNIX file descriptors is incorrect?
- File descriptors are used as arguments to I/O system calls.
- A file descriptor in userspace is an integer value.
stdout
is typically file descriptor1
.- A file descriptor's user-space representation contains ("describes") the name of the file.
- The kernel keeps additional metadata associated with each file descriptor.
- True (the first argument to read, write, lseek, and close are all file descriptors (ints). Proof: see man pages)
- True (the first argument to read, write, lseek, and close are all file descriptors (ints). Proof: see man pages)
- True (stdin = 0, stdout = 1, stderr = 2. Proof: see unistd.h)
- False (FDs in user-space are just integers. Proof: see man 2 open)
- True (additional metadata is present. Proof: see fdtable.h in linux kernel)
Operating Systems (7 pts)
QUESTION 8A (2 pts). The folks at Banana Computer, Inc. are developing a new hardware architecture and OS kernel. They plan to remove the timer interrupt from the hardware, and instead plan for the kernel to put the maximum number of milliseconds a user-space process can run into a register when the kernel schedules the process. What is the problem with Banana Computer's plan, and how would a malicious attacker exploit it? Assume that Banana's computer is a single-processor device.
Problem:Attack:
- Nothing forces the user-space process to actually stop after the given number of milliseconds, and since there is no timer interrupt, it’s impossible to force it to stop.
- The timeout register value could be corrupted (though there are ways to design around this, such as allowing no changes to the register while in user-space).
- Overwrite timeout register value to grant yourself more time.
- Ignore timeout register value and never stop.
- Trick the kernel into writing a large value into the timeout register.
QUESTION 8B (4 pts). Banana's CEO has developed a plan to get a performance
advantage over competing devices. Their plan is to abolish system calls, and have user-space processes call into the
kernel with normal function calls (i.e., a call
assembly instruction). Lead kernel developer Donna
Bytetwiddle points out that this plan could cause serious security issues. What issues is she thinking about?
Describe two (about two sentences each).
There is no privilege separation on the proposed OS. All user-space processes must therefore run with full privilege and can access any memory, including kernel memory and the memory of other processes. User-space processes can turn off interrupts, write into kernel memory, grant themselves privileges to read secrets from other processes’ memory, etc.
QUESTION 8C (1 pts). Describe one responsibility of an Operating System other than isolation between processes, and explain what mechanism or technology the OS uses to fulfill this responsibility (2 sentences).
- Standard APIs for applications (syscall API)
- Mediating access to hardware (syscall API)
- Fair resource sharing (kernel scheduling, timer interrupt)
- Avoid starvation (timer interrupt, kernel memory protection)
CS 300 Feedback (6 pts)
QUESTION 9A (3 pts). If you could make one change to CS 300/1310 to improve it (other than firing the instructor and dropping the exams), what would you change? Any answer but no answer will receive full credit.
- More collaborative (group debugging) and conceptual hours (6x)
- More TA hours (5x)
- No ALRs, fewer (but longer) ALRs (5x)
- Improve FileIO handout and stencil comments (3x)
- Group projects rather than individual ones (3x)
- Better specification of functions' behavior in projects (2x)
- Separate conception questions from projects (2x)
- Less time for early projects, more for later ones (2x)
- More ALRs per student
- Cover real OSes
- FileIO was too much work
- Remove PLQs
- Replace PLQs with mini assignments
- More alignment between labs and projects
- Shorter/better calibrated labs
- Add TA-led sections
- Remove socially-responsible computing elements
- Add a textbook
- More time for FileIO
- More nitty-gritty technical details
- References to advanced material beyond the course content
- Group lecture watching times
- Synchronous labs
- Have students work with lecture code during lectures
- Time estimates for projects/labs
QUESTION 9B (3 pts). What topics do you wish we had spent more time on? Any answer but no answer will receive full credit.
- Stack (11x)
- Assembly (11x)
- Caching (7x)
- C++ data structures (4x)
- Pointers (3x)
- C vs. C++ (3x)
- OS security (2x)
- Hardware, how it fits together with software (2x)
- Midterm review (2x)
- More focus on performance optimization (2x)
- Memory protection
- Memory organization
- GPUs
- Advanced C++, comparison to Go/Rust
- Signed/unsigned numbers
- C++, industry applications
- Alternative approaches to OS construction
- Security, buffer overflows
Handing in!
You're almost done!
When you have completed the quiz, edit the file midterm/policy.txt
to sign your name. This
is your promise that you have obeyed the quiz rules in letter and spirit. (We won’t look at this file in grading.)
To hand in your exam, please push to your cs300-s21-quizzes-YOURGITHUBUSERNAME
repository.
Important: You're welcome to push intermediate commits as you work on the quiz.
However, when you've completed the quiz, make sure to push a commit to your repository titled "Completed Midterm Quiz". This will indicate to us that your handin is ready to be graded. Once you push this commit, you can no longer edit your answers.
Finally, head to the grading server and add your quiz repository under in the "Midterm Quiz" section. Make sure to check that your answers show up in the diff displayed on the grading server.
Congrats on finishing your CS 300 Midterm! 🥳