CSCI 0300/1310: Fundamentals of Computer Systems

⚠️ This is not the current iteration of the course! Head here for the current offering.

CS 300/1310 Spring 2021 Final Quiz

Completion due: 6pm (AoE) on April 21. (No late hours!)

Welcome to our final quiz! We’re hoping this quiz provides you with an opportunity to review your knowledge and what you've learned in CS 300 so far. Our questions are designed to check your conceptual understanding, rather than asking you to memorize details.

Please make sure you read the rules below before you get started.

Rules

The quiz is open book, open notes, open computer. This means you can use the following resources:

You may access a browser and a PDF reader.
You may access your own notes and your assignment code electronically.
You may access an internet site on which your own notes and assignment code are stored.
You may access this year’s course website and notes on it.
You may access pages directly linked from the course site, including our lecture materials and links therein.
You may run a C/C++ compiler, including an assembler and linker, or a calculator.
You may access manual pages.
You may access Piazza.
You may access Google or Wikipedia or other online resources.

But:

You absolutely may not contact other humans via chat, phone calls, or anything like it.
You cannot ask quiz questions on online platforms like Stackoverflow, etc.
You may not access solutions from any previous exam, by paper or computer, including exams from other courses.

Any violations of this policy, or the spirit of this policy, are breaches of academic honesty and will be treated accordingly. Please appreciate our flexibility and behave honestly and honorably.

Completing Your Quiz

You will complete the final quiz using the same Git repository you already used for the midterm quiz. Before beginning the quiz, merge your local cs300-s21-quizzes repository with our handout to get the template for the final:

$ git pull git://github.com/csci0300/cs300-s21-quizzes.git master

You will enter your answers in the final/final.md file. Do not place your name in this file. (We will grade your quiz anonymously.)

When you have completed the quiz, edit the file final/policy.txt to sign your name. This is your promise that you have obeyed the quiz rules in letter and spirit. (We won’t look at this file in grading.)

Then commit your changes and push them to your repository on GitHub:

$ git commit -a -m "Final Quiz Completed"
$ git push

If you get an error message that you do not have access to push to github.com/csci0300/cs300-s21-quizzes.git, this means that you are trying to push to our handout repository. Instead, push explicitly to your own repository:

$ git push https://github.com/csci0300/cs300-s21-quizzes-YOURGITHUBUSERNAME master

Make sure that you have entered your quiz repository URL on the grading server for the final.

Notes

Assume an x86-64 architecture and a Linux operating system unless noted otherwise in the question.

Virtual Memory (14 pts)

QUESTION 1A (5 pts). CS300 student Bruno is working on WeensyOS, and observes that the project handout asks him to write code in process_setup to copy the kernel pagetable mappings into the pagetable of the newly created process. Bruno believes that this is unnecessary, arguing that WeensyOS will be more secure if user-space processes' page tables don't contain mappings for kernel memory. However, when he changes his code accordingly, Bruno finds that his WeensyOS kernel crashes and QEMU keeps rebooting. You are a CS300 TA seeing Bruno at hours! Explain why the kernel mappings must be copied.

When system calls are invoked (or traps, interrupts, or faults occur), the process must transfer control to the kernel. The syscall instruction changes protection level and jumps into the syscall handler, but it does not change the active page table. Hence, the kernel code that runs next must be mapped in the (still-active) process page table.

QUESTION 1B (4 pts). Identify whether each of the following statements about the kernel and virtual memory are true or false.

Virtual memory protects the kernel from uncontrolled access by user-space processes.
Virtual memory is utilized to protect user-space processes from each other.
Shells, browsers, and metadata about open files (e.g., the current file offset) all exist in userspace memory.
3 in the %cs register means that processor is running in privileged mode.

QUESTION 1C (5 pts). Some older computers supported segmentation, an alternative to page tables. In this scheme, contiguous segments of physical memory (of any size) are mapped into same-size, contiguous virtual memory address regions. For example, virtual addresses 0x20'0000 to 0x20'ffff might map to physical addresses 0x1'0000 to 0x1'ffff. For address translation, the processor has special hardware that understands the known segments and translates virtual addresses within a segment into the corresponding physical addresses via arithmetic (e.g., subtracting 0x19'0000 in the example).

Why are page tables preferable to this segmentation scheme? For each reason, state if it is true or false and give a brief explanation.

Page tables use less memory.
Page tables are more flexible because they can map use any physical page for any (page-aligned) virtual address.
The segmentation scheme can lead to memory fragmentation, which page tables do not suffer from.
Page tables allow for faster address translation.

Processes (15 pts)

QUESTION 2A (4 pts). Consider the common pattern of using the fork() and execv() system calls together to answer the following questions:

What would happen if execv() was called without fork() being called first?
Why would other interfaces, such as posix_spawn() or Windows's CreateProcess() be more efficient than fork()+exec()?

QUESTION 2B (11 pts). For each of the following statements, determine whether each is true or false. If the statement is false, explain why.

When a pipe is created using the pipe() system call, the kernel allocates memory for the array storing the file descriptor numbers for each end of the pipe.
When writing to a pipe, the data stays in the user-address space and is passed to other processes on demand.
Calling pipe() after calling fork() enables two processes to communicate.
The execv() family of system calls never returns.
The waitpid() system call can only wait for child processes of the calling process.
Zombie processes are a result of the kernel retaining the heap memory of an exited child process.

Threads (8 pts)

QUESTION 3A (4 pts). Consider the segment of code below, which is run by multiple threads.

int sum = 0;
// A
for (int i = 0; i < 10000000; i++) {
 // B
 sum += i;
 // C
}
// D

Which of the two locking strategies below is more efficient (i.e., scalable)? Explain your answer.

A = mtx.lock(), D = mtx.unlock()
B = mtx.lock(), C = mtx.unlock()

QUESTION 3B (4 pts). The code below creates 4 threads that each run threadfunc on a shared variable. Why does this code sometimes fail to produce the expected result of 20?

std::mutex mutex;

void threadfunc(unsigned* x) {
   for (int i = 0; i != 5; ++i) {
       mutex.lock();
       *x += 1;
       mutex.unlock();
   }
}

int main() {
   std::thread th[4];
   unsigned n = 0;
   for (int i = 0; i != 4; ++i) {
       th[i] = std::thread(threadfunc, &n);
   }
   printf("%u\n", n);

   // process exits, all threads terminate
   exit(0);
}

The main thread does not call join(), so it’s possible that the main thread finishes before the other threads do.

Synchronization (10 pts)

QUESTION 4A (2 pts). In which of the following situations could we get a potential deadlock? (Select one.)

A thread locks a mutex, and while holding the lock, tries to acquire the same lock again.
Thread A tries to acquire the mutex associated with thread B first, and then its own mutex (B's mutex -> A's mutex). At the same time, Thread B tries to acquire the mutex associated with Thread A first, and then its own mutex (A's mutex -> B's mutex).
A thread locks a mutex, encounters an error while running, and the thread exits without unlocking the mutex. Other threads are still running and waiting to lock the mutex.
All of the above.

QUESTION 4B (8 pts). Alice, a CS300 TA, is looking at a student's implementation of a synchronized bounded buffer. The implementation of the student's write() method looks like this:

ssize_t bbuffer::write(const char* buf, size_t sz) {
    std::unique_lock guard(this->mutex_);
    assert(!this->write_closed_);
    while (this->blen_ == bcapacity) {
      // wait for space to free up
    }
    size_t pos = 0;
    while (pos < sz && this->blen_ < bcapacity) {
        this->bbuf_[this->blen_] = buf[pos];
        ++this->blen_;
        ++pos;
    }
    if (pos == 0 && sz > 0) {
        return -1;
    } else {
        return pos;
    }
}

However, this doesn't seem quite right. Help Alice find the two bugs in this code, explain to the student why these are problematic, and show how to fix them.

Networking (10 pts)

QUESTION 5A (6 pts). Consider TCP (the network protocol we discussed in the course) compared to another networking protocol, the Unified Datagram Protocol (UDP). UDP has no notion of a "connection" (and thus, no handshake at connection setup) and sends packets through network with only a "best-effort" guarantee that they will arrive at the destination (i.e., the protocol does not check that the packet was actually delivered).

Which of the following statements about the benefits of using TCP compared to UDP to send data across a network are true and which are false? Give a brief justification for each answer.

TCP achieves better performance (higher throughput and lower connection setup latency) than UDP.
TCP is more secure than UDP.
TCP is more reliable than UDP.

QUESTION 5B (4 pts). The folks at Banana Computer are expanding their OS with networking capabilities. New CEO Brad Boardpleaser suggests that pipes and network connections are very similar, and that the kernel dev team can save some engineering time by supporting both with similar APIs. In particular, he requests that the team add a net_pipe() syscall with signature int net_pipe(int pipefds[2]), which will allow processes to communicate via read and write ends of a network connection.

Describe two reasons why Brad's suggested API will not work and/or is a bad idea.

Distributed Systems (20 pts)

QUESTION 6A (5 pts). Which of the following statements about vertical and horizontal scaling are true and which are false?

Vertical scaling is limited by the number of processors that a single computer can have.
Horizontal scaling is simpler (in terms of code/implementation) than vertial scaling.
Vertical scaling may not add linear performance benefits because of contention on mutex locks.
Requests in a horizontally-scaled system achieve lower latency than in a vertically-scaled system (assuming neither setup is overloaded).
Horizontal scaling increases flexiblity because it allows adding and removing machines at a finer granularity.

QUESTION 6B (2 pts). Consider this statement: "Sharding a key-value dataset across several machines reduces the likelihood of data loss."

Is the statement true or false? Explain your answer.

QUESTION 6C (2 pts). Consider this statement: "Storing a dataset and its replica on the same machine, but in different databases, is recommended because it allows for fast restoration from backup in case of a failure."

Is the statement true or false? Explain your answer.

QUESTION 6D (8 pts). Suppose you have a distributed key-value store with N servers. 2/3 of these servers are new and functional. However, 1/3 of the servers are old and their disks are worn. As such, they may sometimes read and return the wrong value for any given request! How might you design a key-value store that utilizes all servers (functional and faulty), while still maintaining confidence that you are returning the correct values to your users?

[Hint: The solution will involve both explaining how you store data and how your distributed system handles requests.]

QUESTION 6E (3 pts). Consider the following transactions on a key value store, executed in order 1 -> 2 -> 3 with the initial values a = 250, b = 100, c = 50. (Assume that all locks are released at the end of the transaction.)

BEGIN TX {
	LOCK(a);
	LOCK(c);
	v = GET(a);
	if (v >= 200) {
		x = GET(c);
		SET(a = v - 200);
		SET(c = x + 200);
	} else {
		abort;
	}
} END TX;

BEGIN TX {
	LOCK(a);
	LOCK(b);
	v = GET(a);
	if (v > 100) {
		x = GET(b);
		SET(a = v - 100);
		SET(b = x + 100);
	} else {
		ABORT;
	}
} END TX;

BEGIN TX {
	LOCK(b);
	v = GET(b);
	if (v < 150) {
		DELETE(b);
	} else {
		ABORT;
	}
} END TX;

For each of the following, specify if the statement is true or false:

Transaction 1 aborts.
Transaction 2 aborts.
Transaction 3 aborts.

CS 300 Feedback (8 pts)

QUESTION 7A (3 pts). What is one aspect of CS300 that we should definitely retain for future years? Any answer but no answer will receive full credit.

QUESTION 7B (3 pts). What is one aspect of CS300 (other than firing the instructor and dropping the exams) that we can improve for future years? Any answer but no answer will receive full credit.

QUESTION 7C (2 pts). Which CS300 projects were your most and least favorite ones? Any answer but no answer will receive full credit.

Handing in!

You're almost done!

To hand in your exam, please push to your cs300-s21-quizzes-YOURGITHUBUSERNAME repository.

Important: You're welcome to push intermediate commits as you work on the quiz.

However, when you've completed the quiz, make sure to push a commit to your repository titled "Completed Final Quiz". This will indicate to us that your handin is ready to be graded. Once you push this commit, you can no longer edit your answers.

Finally, head to the grading server and add your quiz repository under in the "Final Quiz" section. Make sure to check that your answers show up in the diff displayed on the grading server.

Congrats on finishing your CS 300 Final, and thanks for taking CS300! 🥳