CSCI 0300/1310: Fundamentals of Computer Systems

⚠️ This is not the current iteration of the course! Head here for the current offering.

CS 300/1310 Spring 2021 Midterm Quiz

Completion due: 6pm (AoE) on March 11. (No late hours!)

Welcome to our midterm quiz! We’re hoping this quiz provides you with an opportunity to review your knowledge and what you've learned in CS 300 so far. Our questions are designed to check your conceptual understanding, rather than asking you to memorize details.

Please make sure you read the rules below before you get started.

Rules

The quiz is open book, open notes, open computer. This means you can use the following resources:

You may access a browser and a PDF reader.
You may access your own notes and your assignment code electronically.
You may access an internet site on which your own notes and assignment code are stored.
You may access this year’s course website and notes on it.
You may access pages directly linked from the course site, including our lecture materials and links therein.
You may run a C/C++ compiler, including an assembler and linker, or a calculator.
You may access manual pages.
You may access Piazza.
You may access Google or Wikipedia or other online resources.

But:

You absolutely may not contact other humans via chat, phone calls, or anything like it.
You cannot ask quiz questions on online platforms like Stackoverflow, etc.
You may not access solutions from any previous exam, by paper or computer, including exams from other courses.

Any violations of this policy, or the spirit of this policy, are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.

Completing Your Quiz

Create a cs300-s21-quizzes repository. Clone it to your computer. Before beginning the quiz, merge your local cs300-s21-quizzes repository with our handout (in case we release any fixes):

$ git pull git://github.com/csci0300/cs300-s21-quizzes.git master

You will enter your answers in the midterm/midterm.md file. Do not place your name in this file. (We will grade your quiz anonymously.)

When you have completed the quiz, edit the file midterm/policy.txt to sign your name. This is your promise that you have obeyed the quiz rules in letter and spirit. (We won’t look at this file in grading.)

Then commit your changes and push them to your repository on GitHub:

$ git commit -a -m "Midterm Quiz Completed"
$ git push

If you get an error message that you do not have access to push to github.com/csci0300/cs300-s21-quizzes.git, this means that you are trying to push to our handout repository. Instead, push explicitly to your own repository:

$ git push https://github.com/csci0300/cs300-s21-quizzes-YOURGITHUBUSERNAME master

Make sure that you have entered your quiz repository URL on the grading server for the midterm.

Notes

Assume an x86-64 architecture and a Linux operating system unless noted otherwise in the question.

Memory Organization (10 pts)

QUESTION 1A (5 pts). Consider the following C++ program:

#include <stdio.h>
#include <vector>
#include <stdlib.h>

const char* hello = "hello";

int main(int argc, char** argv) {
  if (argc == 1) {
    printf("usage: ./say WHAT [MORE WORDS]\n");
    exit(1);
  }

  std::vector<char*>* v = new std::vector<char*>;
  for (int i = 1; i < argc; i++) {
    v->push_back(argv[i]);
  }

  printf("%s ", hello);
  for (auto iterator = v->begin(); iterator != v->end(); ++iterator) {
    printf("%s ", *iterator);
  }
  printf("\n");
  delete v;
}

What segment of memory (code, data, heap, or stack) is the memory for the following objects in?

hello.
The code of the push_back() function on v.
v.
iterator.
argc.

QUESTION 1B (2 pts). Which of the following statements is/are true? Select all that apply.

In the stack segment of memory, C only allows memory objects with sizes known at compile time.
The heap is, at all times, a contiguous segment of allocated memory.
The code segment (also referred to as .text) contains the C code of the program.
Releasing dynamic lifetime memory requires the programmer to explicitly call free() (or for the process to exit).

QUESTION 1C (1 pts). You find a file called horrible.c in a C codebase and try to decipher it. Unfortunately, someone left TYPE instead of the true type of x. What should TYPE be to make the file compile without warnings or errors?

#include <stdlib.h>

int main() {
    long l = 9999999999;
    void* b = &(((char*)&l)[2]);
    int* a[2];
    a[0] = NULL;
    a[1] = b;
    TYPE x = *((&a[0]) + 1);
}

int*. (void* also works, though it's not what we intended.)

QUESTION 1D (2 pts). You find some hexdump output in a file. What type of data do the following snippets likely represent?

6869 2c20 6373 3330 3021 0a00
0100 0000

Signed Numbers (5 pts)

QUESTION 2A (2 pts). What numbers do the following bytes represent when interpreted as a signed integer type of the appropriate length?

0xFC
0x8000'0001

QUESTION 2B (3 pts). Consider the following code:

#include <stdlib.h>

int main() {
    unsigned long l1 = 10;
    unsigned long l2 = UINT64_MAX;
    long l3 = -1000;

    int i1 = 42;
    int i2 = 0x7FFF'FFF0;
    int i3 = 0x7FFF'FFC0;
}

For each of the following expressions, will it overflow?

l1 + l2
l1 + (unsigned long)l3
i2 + i1
i1 + i3
(int)l1 * i3
(long)i1 - l3

Collections and Alignment (10 pts)

QUESTION 3A (4 pts). You have been hired by Hooli, a large tech company, to develop a data structure that represents web pages in their search index. Hooli expects to use this data structure for billions of webpages, and must store its search index in memory for speed.

Which of the following data structures (A or B) is more efficient for Hooli's use case, and why?

struct webpageA {
    char* url;
    char preview[201]; // 200 characters + NUL terminator
    char crawled_today; // 0 or 1
    int rank;
};

struct webpageB {
    char* url;
    char preview[201]; // 200 characters + NUL terminator
    int rank;
    char crawled_today; // 0 or 1
};

webpageA is more efficient.

webpageA needs 216 bytes of memory:

struct webpageA {
    char* url; // 8 bytes
    // no padding needed
    char preview[201]; // 200 characters + NUL terminator
    // no padding needed
    char crawled_today; // 0 or 1
    // 2 bytes of padding to get to 212 from start (divisible by 4)
    int rank; // 4 bytes
    // no padding needed
};  // total: 8 + 201 + 1 + 2 + 4 = 216

webpageB needs 224 bytes of memory:

struct webpageB {
    char* url; // 8 bytes
    // no padding needed
    char preview[201]; // 200 characters + NUL terminator
    // 3 bytes of padding to get to 212 from start (divisible by 4)
    int rank; // 4 bytes
    // no padding needed
    char crawled_today; // 1 byte
    // 7 bytes of padding to get to a size divisible by 8
}; // total: 8 + 201 + 3 + 4 + 1 + 7 = 224

QUESTION 3B (3 pts). Create a structure that contains exactly 27 bytes of data and 5 bytes of padding.

Many possible answers, but the simplest is something like:

struct s {
  long l;  // forces 8-byte alignment, 8 data bytes
  char c[19];  // 19 data bytes
  // 5 bytes of padding to get to 32 for alignment
}

QUESTION 3C (3 pts). Imagine a fictional hardware architecture called x86-64-yuge, which adds a new primitive type ultralong that has a size of 32 bytes and an alignment of 32. Assume all the usual rules of sizes and alignments in C and x86-64 still apply otherwise. Answer the following questions.

What is the minimum size of processor registers on x86-64-yuge?
What is the alignment of addresses returned from malloc() on x86-64-yuge?
What are the size and alignment of a struct containing an ultralong and an int on x86-64-yuge?

DMalloc (10 pts)

QUESTION 4A (4 pts). Consider the following code:

char* f() {
   char* buf = (char*)malloc(100);
   LINE // see below
}

Which of the following lines constitute undefined behavior (UB) when inserted for LINE? Mention all that apply, and specify what kind of UB it is (or what triggers it).

return buf;
printf("%c\n", buf[4]).
buf[102] = '\0'.
memset(buf, '\0', 100);
*((int*)buf + 25) = 999;

QUESTION 4B (2 pts). In the DMalloc project, you built a debugging allocator that could detect many kinds of error. Some students used exclusively internal metadata, where the debugging allocator's data was stored within the same block of memory as the payload. Some students used external metadata (e.g., an std::map) to map pointers to metadata, and many used a mix of both.

What type of error can be detected only with internal metadata?
From the perspective of caching, which approach would likely yield better performance? Assume external metadata is stored in random heap locations.

QUESTION 4C (2 pts). Ben Hackerbro is dismissive of the idea of debugging allocators. He argues that the heap is large, and he can simply modify the OS kernel to put heap allocations at addresses far away from each other and surround them with a few thousand NUL bytes (\0). Explain one serious problem with Ben's approach.

QUESTION 4D (2 pts). After acceeding to the need for a debugging allocator, Ben goes to work on DMalloc. He ends up with the following code for the struct representing his metadata:

struct metadata {
  size_t alloc_size;
  char magic[13];
};

If Ben's payload starts immediately after the struct, what rule does his implementation violate, and how could he fix it?

This violates the malloc() rule, which says that malloc() must return an address aligned to a multiple of the largest primitive (on Linux/x86-64, 16 bytes). The struct’s largest member is 8 bytes in size, so the end of the struct will be aligned with a multiple of 8. The struct has 21 bytes of data, so there will be 3 bytes of padding, for a final size of 24 (divisible by 8). But the payload will then start 24 bytes after the allocation’s start, which is not divisible by 16.

Assembly (8 pts)

QUESTION 5A (4 pts). Elliott Engineer opens a C source code file, but unfortunately they find that a few crucial lines implementing function f() were deleted by someone else:

long f(long x) {
  // DELETED
}

int main() {
  int x = 42;
  f(x);
}

Fortunately, Elliott still has the executable. They use objdump to extract the assembly code, and find the following:

f:
.LFB0:
	movq	%rdi, -8(%rsp)
	cmpq	$100, -8(%rsp)
	jle	.L2
	movl	$7, %eax
	ret
.L2:
	movq	-8(%rsp), %rax
	addq	$1, %rax
	ret

Please reconstruct the deleted source code of f to the best of your ability. You can make plausible assumptions about the C types removed on compilation.

long f(long x) {
  if (x <= 100) {
    return x + 1;
  } else {
    return 7;
  }
}
Other types (e.g., unsigned long) are also possible.

QUESTION 5B (4 pts). Consider the following disassembly:

.LFB0:
        movl    %edi, -20(%rsp)
        movq    $0, -8(%rsp)
        movl    $0, -12(%rsp)
        jmp     .L2
.L3:
        movl    -12(%rsp), %eax
        cltq
        addq    %rax, -8(%rsp)
        addl    $1, -12(%rsp)
.L2:
        movl    -12(%rsp), %eax
        cmpl    -20(%rsp), %eax
        jl      .L3
        movq    -8(%rsp), %rax
        ret

Which of the following statements about the C function that compiled to this assembly code are true?

The function took a long as its first argument.
The function contains a loop.
The function returns the minimum of two numbers determined by an if statement.
The function declares two local variables (not including arguments).

Stack (5 pts)

QUESTION 6A (5 pts). Consider the following C program:

int f(int x) {
   long y = 999;
   long z = x + y;
   return z;
}

int main() {
  int x = 42;
  f(x);
}

The address of the callq assembly instruction inside main in the code segment is 0x40'0010, and the entry stack pointer for main() (i.e., the address in the %rsp register when main() gets called) is 0x7fff'0000'0000.

Assuming a normal x86-64 calling convention, that compiler optimizations are disabled, and with the base pointer enabled, sketch the state of the stack just before return z completes. You can assume that the callq instruction is 9 bytes in size (1 byte opcode + 8 bytes address). Provide your answer in the following format:

0x7fff'0000'0000: top of main()'s stack frame.
0x7ffe'ffff'fff8: stored value of %rbp on call to main() [pointer to base of main()'s caller's stack frame]
...

0x7fff'0000'0000: top of main()'s stack frame.
0x7ffe'ffff'fff8: stored value of %rbp on call to main() [pointer to base of main()'s caller's stack frame]
0x7ffe'ffff'fff4: local variable x on main()’s stack
0x7ffe'ffff'ffe8: return address 0x40'0019 (first byte after callq) [note 8-byte alignment]
0x7ffe'ffff'ffe0: stored value of %rbp on call to f() [pointer to base of main()’s stack frame]
0x7ffe'ffff'ffdc: argument x on f()’s stack
0x7ffe'ffff'ffd0: local variable y on f()’s stack
0x7ffe'ffff'ffc8: local variable z on f()’s stack

File I/O and Caching (10 pts)

QUESTION 7A (2 pts). Imagine you run a program that uses io300_read() to read some data from disk. At which layers does the data get cached before the processor can work on it in registers? Mention at least two.

QUESTION 7B (4 pts). What cache hit rates does each of the following access patterns results in with a single-slot, 4096-byte cache in the FileIO project? Assume that there are no special optimizations for reverse or strided access. Specify hit rates as percentages; if you want, you can include brief reasoning.

byte_cat on a 4 MB (4,194,304 bytes) file.
block_cat 512 on a 4 MB file.
reverse_byte_cat on a 4 MB file.
stride_cat 1024 512 on a 1 MB file.

QUESTION 7C (3 pts). Which of the following statements about UNIX file descriptors is incorrect?

File descriptors are used as arguments to I/O system calls.
A file descriptor in userspace is an integer value.
stdout is typically file descriptor 1.
A file descriptor's user-space representation contains ("describes") the name of the file.
The kernel keeps additional metadata associated with each file descriptor.

Operating Systems (7 pts)

QUESTION 8A (2 pts). The folks at Banana Computer, Inc. are developing a new hardware architecture and OS kernel. They plan to remove the timer interrupt from the hardware, and instead plan for the kernel to put the maximum number of milliseconds a user-space process can run into a register when the kernel schedules the process. What is the problem with Banana Computer's plan, and how would a malicious attacker exploit it? Assume that Banana's computer is a single-processor device.

QUESTION 8B (4 pts). Banana's CEO has developed a plan to get a performance advantage over competing devices. Their plan is to abolish system calls, and have user-space processes call into the kernel with normal function calls (i.e., a call assembly instruction). Lead kernel developer Donna Bytetwiddle points out that this plan could cause serious security issues. What issues is she thinking about? Describe two (about two sentences each).

QUESTION 8C (1 pts). Describe one responsibility of an Operating System other than isolation between processes, and explain what mechanism or technology the OS uses to fulfill this responsibility (2 sentences).

CS 300 Feedback (6 pts)

QUESTION 9A (3 pts). If you could make one change to CS 300/1310 to improve it (other than firing the instructor and dropping the exams), what would you change? Any answer but no answer will receive full credit.

QUESTION 9B (3 pts). What topics do you wish we had spent more time on? Any answer but no answer will receive full credit.

Handing in!

You're almost done!

To hand in your exam, please push to your cs300-s21-quizzes-YOURGITHUBUSERNAME repository.

Important: You're welcome to push intermediate commits as you work on the quiz.

However, when you've completed the quiz, make sure to push a commit to your repository titled "Completed Midterm Quiz". This will indicate to us that your handin is ready to be graded. Once you push this commit, you can no longer edit your answers.

Finally, head to the grading server and add your quiz repository under in the "Midterm Quiz" section. Make sure to check that your answers show up in the diff displayed on the grading server.

Congrats on finishing your CS 300 Midterm! 🥳