Project 3: Caching I/O

Assigned: February 26, 2021
Project Due: March 5, 2021

Introduction

Caching, or the act of putting a small/fast/expensive data store in front of a large/slow/cheap one, is at the heart of many optimizations in computing. Examples include:

The storage hierarchy (Hard Drive -> SSD -> DRAM -> L3-L1 Cache -> Registers), as seen in lectures.
Memoization (i.e., saving results of expensive computations for future reuse).
Your web browser’s cache, which prevents redundant network requests when you access pages you’ve recently visited.
Content Delivery Networks, a technology that prevents long-distance Internet traversals by instead keeping copies of web pages close to users (this is the idea behind Akamai and Cloudflare, two billion-dollar companies).

In this assignment, you will be speeding up a performance-critical aspect of systems programming: the reading and writing of data to and from a filesystem.

Motivation

Due to the design of Hard Disk Drives (HDD, “disks”), interacting with data on disk involves waiting on several slow physical mechanisms. For example, a request to a magnetic hard disk must wait for the platters to spin and for the metal arm to move to the place where it can read the data off the disk. If our programs were required to read and write their data solely from disk, they would be unacceptably slow. (We saw an example of this in the form of the disk-slow program in lectures.)

Fortunately, we can do better using caching! If we are willing to sacrifice a small amount of data integrity (i.e., in the event that your computer suddenly loses power, some data is lost), we can gain 100-1000x in performance. To do this, the computer temporarily holds the data you are using inside of its main memory instead of working directly with the disk. In other words, the main memory (DRAM) acts as a cache for the disk.

Project Description (10,000ft)

You will be implementing an Input/Output (I/O) library that supports operations like read() (reading data from files), write() (writing data to files), and seek() (moving to a different offset within the file). Your I/O library uses a cache to prefetch data and reduce the number of disk operations required.

Your approach to implementing this project should be:

Read the entire handout as some critical information is located in later sections.
Fill out the conceptual questions (four technical ones, two RCS questions).
Fill out the functions in impl/student.c to implement buffered file IO that conforms to the API specified in io300.h. In addition, you must ensure that make check reports that your implementation is correct, and make perf reports that your implementation is at most 5-10x slower than stdio (the standard library’s caching, “buffered” I/O functions) in all cases (see grading rubric).

Conceptual Questions

Write your answers to the following questions in your README.

You may want to read the handout in its entirety before tackling the general questions, as many of them relate to topics detailed in the handout.

General Questions:

Come up with a real world analogy to explain caching to someone who does not know what a computer is. For example:

Imagine you work at the front desk of a library and whenever someone wants a book, you go back to the stacks to fetch it for them. Normally, when they are done with the book, you promptly return it to its place in the stacks. However, instead of immediately returning the book, consider holding on to the most commonly requested books when they are returned to the front desk, thus saving yourself from making a trip to the stacks. This is caching.

Now, give an instance of caching in computer systems. What data is being cached, where is it being cached, why is it being cached, and for how long is the cache valid? (Note: you cannot answer “file systems” or describe the project as an answer.)
What are the benefits of having a standard File IO API provided by the operating system? How does this help programmers who might not know what hardware their programs will be running on?
Give an example of a situation where a Hard Disk Drive! (wait, isn’t that slow?) could be used as a cache. (Hint: I don’t think tape means what you think it means)
Read about the catchphrase “Everything is a file”. Why might this complicate caching I/O?

Socially Responsible Computing Questions:

Read the linked article on cache poisoning, and describe what cache poisoning is in your own words (3 sentences or less).

Please consult this article about a situation where a misconfigured Domain Name System (DNS) server, which is responsible for mapping Internet domains (like cs.brown.edu) to the servers holding the content for them, caused users across the world to be directed to incorrect servers for addresses like facebook.com.

How could caching have amplified the problem of one misconfigured DNS server? (1 sentence).
Identify and explain two social, political, or ethical issues raised by this event (4-6 sentences).
Give an example of a hypothetical situation in which such an issue negatively affects people’s lives (100-150 words).
Please fill out this form to provide feedback on our Socially Responsible Computing content so far, and to get a free point on the project (the form will ask for your grading server anonymous ID so that we can match forms to submissions)!

Project Details

Installation

Ensure that your project repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci0300/cs300-s21-projects.git

Then run:

$ git pull
$ git pull handout master

This will merge our Project 3 (fileio) stencil code with your repository.

Once you have a local working copy of the repository that is up to date with our stencils, you are good to proceed. You’ll be doing all your work for this project inside the fileio directory in the working copy of your projects repository.

[Infrastructure Help]

Layout

This project, unlike DMalloc, is in C. This means you cannot use C++ language features and data structures in your solution.

In this project, we provide you with a number of existing test programs that do basic file manipulation (see test_programs/*.c). These programs are written in such a way that all interaction with the filesystem is done through the io300_file interface (a series of functions whose signatures are declared in io300.h). This means that if two libraries separately implement the functions in io300.h (potentially quite differently), the test programs will work with either implementation.

We provide you with two implementations, and you will develop a third.

The naive implementation (impl/naive.c) reads from and writes directly to the disk without any caching. The initial project stencil is identical to this implementation.
The standard I/O implementation (impl/stdio.c) leverages the existing C Standard Library’s buffered I/O, which does some clever caching.

The speed difference between these two solutions, as measured by running the test programs with each implementation, is astounding! Try it out for yourself:

$ make testdata    # generates 10MB test file
$ make IMPL=naive  # compiles naive implementation
$ time ./byte_cat /tmp/testdata /tmp/testout

real    0m18.515s
user    0m6.819s
sys     0m11.694s

$ make clean && make IMPL=stdio
$ time ./byte_cat /tmp/testdata /tmp/testout

real    0m0.140s
user    0m0.112s
sys     0m0.028s

The numbers will differ on your computer, but the general relationship (a ~100x performance difference) will hold. Note that real is the actual elapsed time (this is called “wall clock time”), user is the CPU time the program spent executing in userspace, and sys is the CPU time spent in the kernel on the program’s behalf.

In the project, your task is to fill in the third implementation (impl/student.c) and make it perform as close to the stdio implementation as possible.

Note: This project deliberately leaves you a lot of design freedom. You should feel free to come up with clever caching designs and benchmark them.

Both simple and involved schemes will work, but cleverer schemes will get closer to the performance of stdio, and may sometimes even beat it! We will award extra credit for impressive performance (see the grading rubric).

You’re welcome (and encouraged) to work in groups to come up with your design. As usual the code you hand in must be your own.

Stencil

We provide you with the following files:

| File | Purpose |
| --- | --- |
| `io300.h` | A list of all the methods your implementation must supply to be used with the test programs. |
| `impl/` | Contains the C source for this project. `impl/student.c` is the only file you have to edit. |
| `test_scripts/` | Test scripts provided by us that working implementations will pass. Run `make check`. |
| `test_programs/` | Contains the test programs that use your implementation to do IO. |
| `test_files/` | Some files to run your implementation on while you are developing. |
| `example/` | Some C programs whose behavior offers insight into some of the technical aspects of this project. |

Getting Started

The first thing you should do is look at io300.h to understand the public API that each implementation offers. Next, you should run make -B IMPL=stdio check to run the tests on the implementation we provide (you should also do make -B IMPL=naive check to see how ridiculously slow it is).

Next, take a look at the test programs (test_programs/*.c) to see how your IO library is going to be used.

Task:
Complete the functions in impl/student.c marked with TODO comments such that they implement buffered file IO.

It will make sense to proceed step by step, and we provide some guidance on getting started below.

Your final implementation should:

Conform to the API specified in io300.h.
Pass all tests run by make check.
Perform no more than 10x slower than stdio in all cases tested in make perf.

We highly recommend that you thoroughly read and engage with the following sections when working on developing your caching I/O strategy.

Note: you cannot use any C Standard Library IO functions while implementing your library (this would defeat the purpose of the assignment!). These include functions like fread, fwrite, fgetc, fputc, and fseek.

Getting Started (cont.)

The first thing you’ll want to do is to work out why the stencil code (and the naive implementation) is so slow. To do so, you can use a handy tool called strace (for syscall trace), which prints the system calls (calls from your program into the OS kernel) that happen when your program runs.

System calls are expensive: they require changing the processor’s privilege mode, lots of safety checks, and they invoke kernel code. Moreover, the read or write syscalls often go directly to disk.

Start out by investigating how the naive implementation performs on a 755 byte test file:

$ make -B IMPL=naive
$ strace -e trace=read ./byte_cat test_files/words.rot13.txt /tmp/out
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\v\2\0\0\0\0\0"..., 832) = 832
[...]
read(3, "1", 1)                         = 1
read(3, ".", 1)                         = 1
read(3, "1", 1)                         = 1
read(3, " ", 1)                         = 1
read(3, "V", 1)                         = 1
read(3, "a", 1)                         = 1
read(3, " ", 1)                         = 1
read(3, "g", 1)                         = 1
[...]

The initial lines relate to read() system calls that happen as the OS starts up your executable (which is encoded in a format called “ELF”). You can ignore those lines. At some point, you’ll see a lot of lines like read(3, "V", 1) = 1. A line like this means that the program (or, in this case, the I/O library used) invoked the system call read() with arguments 3 (a file descriptor, i.e., a number that identifies the open file), "V" (the contents of the buffer after the call), and 1 (the number of bytes requested), and that the return value was 1 (as indicated by = 1).

Think about what this output means. Can you see a reason why this is inefficient?

Give me more hints!

Let’s consider a thought experiment, this time with the write() system call.

Imagine that I am writing a program that will sequentially write 1000 bytes (all of value 'c') to a file, one byte at a time.

One way to do it would be to call write() 1000 times, once per byte.



char c = 'c';
for (int i = 0; i < 1000; i++) {
  write(fd, &c, 1);  // write() takes a pointer to the data, not the byte itself
}

Another option would be to create a local variable char buff[40] and put bytes into that. Once we have written 40 bytes to buff, we issue one single call to write(fd, buff, 40).






char buff[40];
for (int i = 0; i < 1000; i++) {
  buff[i % 40] = 'c';
  // once the buffer is full, hand all 40 bytes to the OS in one syscall
  if (i % 40 == 39) {
    write(fd, buff, 40);
  }
}

The critical thing to remember is that a write() call that moves a single byte and a call to write(fd, buff, 40) will take nearly the same amount of time, because the cost of executing a system call is much larger than the difference between moving 1 byte and moving 40 bytes.

So with the second solution, we can reduce the number of syscalls by 40x.

In this project, you will be generalizing the second solution to deal with things like writing bytes backwards (decreasing indices) and writing chunks of bytes all at once.

Things to Think About

The core idea behind this project is the following:

When a program asks the I/O library to write a byte to a file, instead of immediately going to the disk, put that byte in a cache, then write the entire cache to disk at a later point.

With this in mind, here are some questions to ask yourself before/while implementing:

When should I call read() and fill my buffer?
What happens to the data in the cache when a program reads or writes the last byte in the cache? What about one byte past this?
How do I know or keep track of the fact that the cache has been modified?
What happens when flush is called, but nothing in the cache has been changed?
What happens if a syscall fails?
What happens when I read a byte from a file that is all 1s (0xFF, 0b11111111)? How is this different than the integer -1? How does this mess with return values from things like read() and write()? What is the deal with unsigned vs signed char? (some answers to these can be found in example/io_return_example.c).
What happens if we seek to a location that is within the cache? How should this differ from seeking to a location outside of the cache?
Only create one cache. Do not call malloc anywhere, except in the place that was provided to you.
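The question about 0xFF and -1 can be illustrated with a small sketch. The two helper functions below are hypothetical (they are not taken from example/io_return_example.c); they show why a function that returns "a byte or -1" should read the byte through an unsigned char.

```c
#include <stddef.h>

/* Returns the byte at position pos as a value in 0..255,
 * or -1 if pos is past the end of the data. Because the byte is read
 * as an unsigned char, 0xFF becomes 255 and cannot collide with -1. */
int byte_or_eof(const unsigned char *data, size_t len, size_t pos) {
    if (pos >= len) return -1;
    return data[pos];
}

/* Same logic, but reading through a signed char. A byte with all bits
 * set is sign-extended to the int -1 on two's-complement machines,
 * so the caller cannot tell it apart from the "no more data" result. */
int byte_or_eof_signed(const signed char *data, size_t len, size_t pos) {
    if (pos >= len) return -1;
    return data[pos];
}
```

With the unsigned version, a file consisting of the single byte 0xFF yields 255 followed by -1; with the signed version, both calls return -1.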

Making Progress

Start by building a simple, single-slot cache. One of your early goals should be to get the byte_cat program working for a small text file.

$ touch /tmp/out.txt   # creates empty file
$ make -B
$ ./byte_cat test_files/tiny.txt /tmp/out.txt
$ cat /tmp/out.txt 
this is a test

The byte_cat program performs no seeks and is the simplest test case you can create. Here is a list of the functions that byte_cat uses: open, close, filesize, readc, writec. We provide all of filesize, and open/close are simple, so you just have to get your reading and writing logic down!

If you want to break things down further, you can implement the read side, keeping the naive write logic in place, test if your implementation works, and then continue with the write side.

Finishing Up

Once you have a working implementation, test it with make check (correctness) and make perf (performance). You probably won’t match stdio on all performance benchmarks yet, but you will meet the performance bar for this project if you achieve the following:

Your implementation is within 10x of stdio on the byte_cat benchmarks (byte_cat and reverse_byte_cat).
Your implementation is within 5x of stdio on the block_cat tests (block_cat, reverse_block_cat, and random_block_cat).

If you get closer to stdio (or even beat it on some tests), we will award extra credit.

Testing

make check tests your implementation for correctness. It does this by running all of the test programs with a variety of inputs on random files.

make perf tests your implementation for speed and compares it to stdio.

Debugging

You may find the following tools (in addition to GDB and sanitizers) helpful in debugging this project.

Hexdump

xxd <file> allows you to view the contents of binary files. This will be helpful when you have a buggy implementation and you want to see what is going wrong.

$ echo 'this is ascii' > out.bytes
$ xxd out.bytes
00000000: 7468 6973 2069 7320 6173 6369 690a       this is ascii.
$ dd if=/dev/urandom of=out.bytes bs=32 count=1
1+0 records in
1+0 records out
32 bytes copied, 0.000336055 s, 95.2 kB/s
$ xxd out.bytes 
00000000: 65b7 6c53 69f3 f1ed e6d2 09eb ec66 9403  e.lSi........f..
00000010: f33c e929 d703 314f e7dd 5e6b 56a0 2d28  .<.)..1O..^kV.-(

Diff

diff <file1> <file2> will tell you if the input files differ. This may again be helpful when you have a buggy implementation and you want to figure out where in the output file you’re differing from the expected content.

strace

strace, as mentioned above, is a tool that provides diagnostic information about the system calls a program is making. This may be especially helpful when you are trying to improve the performance of your implementation. A complete description of strace and its usage is in Appendix V.

dbg()

Check out the function:


static void dbg(struct io300_file *f, char *fmt, ...)

in impl/student.c. Use it to debug while you are working, and then you can silence its output with one keystroke when you want to hand in. It acts just like printf(), but it also logs your file’s metadata so you can see what is happening as your program executes. When you add fields to your file structure, be sure to include them in the format string after the TODO in dbg(). Here is an example of using it.



int io300_writec(struct io300_file *f, int ch) {
    dbg(f, "writing char: %c\n", ch);
    ...
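If you are curious how a printf()-like helper such as dbg() can accept a variable number of arguments, the following sketch shows the general mechanism using stdarg.h. The function format_debug and its offset parameter are hypothetical; the real dbg() in the stencil may be written differently. Like dbg(), this is for debugging only, since the stdio formatting functions must not be used to implement your I/O logic.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Formats "[offset=N] " followed by a printf-style message into out,
 * which has room for cap bytes. Returns the number of characters the
 * full message needs (like snprintf), or a negative value on error. */
int format_debug(char *out, size_t cap, long offset, const char *fmt, ...) {
    /* First write the metadata prefix. */
    int n = snprintf(out, cap, "[offset=%ld] ", offset);
    if (n < 0 || (size_t)n >= cap) return n;

    /* Then forward the caller's format string and variadic arguments. */
    va_list args;
    va_start(args, fmt);
    int m = vsnprintf(out + n, cap - (size_t)n, fmt, args);
    va_end(args);

    return m < 0 ? m : n + m;
}
```

For example, format_debug(buf, sizeof buf, 3, "writing char: %c", 'x') fills buf with "[offset=3] writing char: x".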

Extra Credit

To optimize your implementation and achieve more impressive performance, you can consider some of the following extensions:

Detecting the direction of access in a file (i.e., forward or reverse) and adapting your caching strategy.
Detecting strides, that is, access patterns that jump by fixed amounts within a file: e.g., reading byte 1, then byte 1001, then byte 2001, and prefetching appropriately.
If you feel ambitious, research “memory-mapped I/O” and how it works, and use it when possible.
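As one possible way to approach the stride idea, the sketch below checks whether a short history of access offsets is evenly spaced. The function detect_stride is hypothetical; how you record the history and what you prefetch once a stride is found are design decisions left to you.

```c
#include <stddef.h>

/* Given the n most recent access offsets (oldest first), returns the
 * common difference if all consecutive offsets are evenly spaced.
 * Returns 0 if there are fewer than 3 offsets or no fixed stride.
 * A negative result indicates a backwards (reverse) access pattern. */
long detect_stride(const long *offsets, size_t n) {
    if (n < 3) return 0;
    long stride = offsets[1] - offsets[0];
    for (size_t i = 2; i < n; i++) {
        if (offsets[i] - offsets[i - 1] != stride) return 0;
    }
    return stride;
}
```

For the accesses 1, 1001, 2001 from the example above, this returns 1000; for a reverse byte-by-byte scan such as 10, 9, 8, it returns -1, so the same check also covers direction detection.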

For graduate students taking CSCI 1310, please investigate one performance optimization over your basic implementation, and measure how well it works. Describe the optimization you implemented in your README file, and explain what your experiments showed.

Handing In & Grading

Handin instructions

As before, you will hand in your code using Git. In the fileio/ subdirectory of your project repository, you MUST fill in the text file called README.md.

Remind me again what the README.md should contain?

The README.md file will include the following:

Any design decisions you made and comments for graders, under "Design Overview". If there's nothing interesting to say, just list "None".
Any collaborators and citations for help that you received, under "Collaborators". CS 300 encourages collaboration, but we expect you to acknowledge help you received, as is standard academic practice.
Your answers to the conceptual questions at the start of this handout under "Conceptual Questions".
Notes on approximately how long it took you to complete the project. We won't use this for grading, but we collect the data to calibrate for future years.

Grading breakdown

30% (30 points) for passing the correctness tests. If your tests pass on the grading server with and without sanitizers, you’ve probably got all these points. Each test is worth 2 points, and you get 6 points for passing them all.
45% (45 points) for meeting the performance goals: your byte_cat runs must be within 10x of the stdio runtime, and your other runs (block_cat, etc.) must be within 5x of the stdio runtime. Each benchmark is worth 7 points.
12% (12 points) for answers to conceptual questions.
8% (8 points) for answers to the Socially Responsible Computing questions.
Up to 10 points of extra credit for impressive performance optimizations (e.g., matching or beating stdio).

Graduate students taking CSCI 1310 should implement and measure one performance optimization, and describe it in their README. Make sure to cover, in a few sentences, what the optimization is, why you expected it to improve performance, and whether it did or did not improve performance in your experiments.

Now head to the grading server, make sure that you have the “FileIO” page configured correctly with your project repository, and check that your FileIO tests pass on the grading server as expected.

Congratulations, you’ve completed the third CS 300 project!

Appendix I: Definitions of Terms

Here is a list of some words with non-obvious definitions.

API (Application Programming Interface) – This is a formal contract that a piece of software uses to declare to the outside world how it can be used. An example of this would be the List<E> interface in Java, which states that anything claiming to be a List can be asked to add(), remove(), get(), …, and do other “list-like” things.
System Call (syscall) – A function provided by your operating system to accomplish important things like reading/writing to files. These are considered “slow” (relatively) because the kernel has to temporarily “stop” running your code, “start” running its own code (the syscall), and then “resume” your code.
Seek – To move the read/write head of a file to a new position. For example, you can “seek” from the beginning of a file to the end of a file to start adding new data to the end. See Appendix II.
Buffer – Often you will see variables with names like buf, buff, or buffer (additionally, buflen, BUFFER_SIZE, ...). These are common names for chunks of memory that you are using as an intermediate storage location before doing something else with that data.

Appendix II: Unix Files

Here are some statements about files that should be true for most Unices.

A file is an ordered sequence of bytes - not text, not ASCII characters, not UTF-8 characters, but just bytes. The way we interpret the bytes present in a file leads to a file’s colloquial “type” (like a text file, or a CSV, or a compiled binary). To this end, file extensions (.txt, .csv, .html, .c, .exe) are largely irrelevant. They merely serve as hints as to how their bytes should be interpreted. In fact, there is nothing special about the . character in a filename. So rather than thinking about a file called brandon.jpeg as an object called brandon of type .jpeg, it should be considered to be a sequence of bytes whose name is brandon.jpeg (additionally, you may take the .jpeg suffix to informally mean “this probably contains data in the JPEG format and is suitable to be opened by photo viewers”).
```
$ gcc helloworld.c -o helloworld
$ ./helloworld
hello world
$ gcc helloworld.c -o helloworld.cool.program.300
$ ./helloworld.cool.program.300 
hello world
$ head -c 10 helloworld.cool.program.300 | xxd
00000000: cffa edfe 0700 0001 0300     ..........
```
It’s just bytes!
The way we interact with files is through file descriptors. A file descriptor is an integer that the operating system uses to identify a file. Any time you want to do something to that file, you pass the operating system the file descriptor as a reference. The way we get file descriptors is with the open syscall.
```
int fd = open("myfile.txt", O_RDONLY);
```
From now on, if we want to do something with myfile.txt, we will pass the operating system fd so that it knows which file to manipulate on our behalf.
Once a file has been opened, we can start manipulating it. When a file is open, the operating system keeps track of an index into that file that represents the “read/write head” of that file. This is like using your finger to scan across the lines while you read. Wherever the “head” points is the next byte to be read, and if we want to write a byte, the location where the next byte will be written.
Once we have a file descriptor (which identifies a file), there are three basic operations we can use to access and modify the bytes within that file.
4.1 read(fd, buffer, size) will read size bytes from the file identified by fd into the location in memory pointed to by buffer. read returns the number of bytes that were successfully read from the file, and then increments the file’s read/write head by that number of bytes.
4.2 write(fd, buffer, size) will write size bytes starting at memory location buffer into the file identified by fd. write returns the number of bytes that were successfully written to the file, and then increments the file’s read/write head by that number of bytes.
4.3 lseek(fd, index, SEEK_SET) will change the value of the read/write head of a file. This allows us to “skip” around in a file and read/write from arbitrary locations (we don’t always want to read/write sequentially from the start of a file).
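The following sketch combines open, lseek, read, and write to access a single byte at an arbitrary position in a file. The helper names read_byte_at and write_byte_at are hypothetical and exist only for illustration; they open and close the file on every call, which is simple but slow.

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns the byte at offset index in the file (0..255),
 * or -1 on error or if index is at or past the end of the file. */
int read_byte_at(const char *path, off_t index) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    unsigned char byte;
    int result = -1;
    /* Move the read/write head, then read one byte from that position. */
    if (lseek(fd, index, SEEK_SET) != (off_t)-1 && read(fd, &byte, 1) == 1) {
        result = byte;
    }
    close(fd);
    return result;
}

/* Overwrites the byte at offset index in an existing file.
 * Returns 0 on success, or -1 on error. */
int write_byte_at(const char *path, off_t index, unsigned char byte) {
    int fd = open(path, O_WRONLY);
    if (fd < 0) return -1;
    int result = -1;
    /* Move the read/write head, then write one byte at that position. */
    if (lseek(fd, index, SEEK_SET) != (off_t)-1 && write(fd, &byte, 1) == 1) {
        result = 0;
    }
    close(fd);
    return result;
}
```

Note that read() returning 0 signals end of file, which is why reading at an offset equal to the file size yields -1 here.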

Appendix III: Helpful Commands

man displays the manual pages for system calls and library functions, which are the authoritative reference for their arguments and return values.

man 2 open
man 2 close
man 2 read
man 2 write
man 2 lseek

dd if=/dev/urandom ... is used in our test scripts to transfer random binary data into a file.
wc -c <file> displays the number of bytes in a file.
du -h <file> also displays the size of a file and is better for large files (it gives human readable byte numbers).
make -B is used in the handout and test scripts. The -B flag says “force a remake, even if everything is up to date”.
head -c 40 <file> | xxd or tail -c 40 <file> | xxd will give you a peek at the beginning or end of a file. This is useful for inspecting large files where xxd would normally flood your screen.

Appendix IV: Implementation Swapping

This section explains how our test programs can use different implementations that match the io300.h API.

Consider the following to be our test program:

#include <stdio.h>

// extern means:
//  "I'm not defining this function here, but someone
//   else (another source file) will have to define it
//   before the program is executed"
// 
extern int getRandomNumber();

int main() {
    printf("%d\n", getRandomNumber());
}

Now we want an implementation of the getRandomNumber function that we can link with the test program to produce a fully functional number printer.




// real-rand.c -- delegate to C standard library (man 3 rand)
#include <stdlib.h>

int getRandomNumber() { return rand(); }



// my-rand.c -- our own implementation (very fast)

int getRandomNumber() { return 4; }

Now when we compile the main test program, we have a choice. Since both implementations conform to the same public API (namely, they provide one function getRandomNumber :: void -> int), either can be swapped in and the main program won’t know the difference.




$ gcc test-prog.c real-rand.c && ./a.out
16807
$ gcc test-prog.c my-rand.c && ./a.out
4

This is the principle behind the different implementations for this project, but instead of the public API being a single function, it is a family of IO-related functions defined in io300.h.

At a high level, to test your implementation, we will be doing something like this:



$ gcc our-test-program.c impl/stdio.c && time ./a.out
$ gcc our-test-program.c impl/naive.c && time ./a.out
$ gcc our-test-program.c impl/student.c && time ./a.out

to compare the speed of your implementation against the provided ones.

Appendix V: `strace`

strace (on Linux) allows you to view the system calls that a program makes. Here is an example of running strace on a plain hello world C program (fed in through stdin). Pay attention to the final lines of the output, namely the calls to write() and exit_group(). Everything else you see is just the program being loaded and started, and can be ignored.

$ printf 'int main(){printf("hello world\\n");}' \
>           | gcc -w -x c - && strace ./a.out
execve("./a.out", ["./a.out"], 0x7ffe85175530 /* 26 vars */) = 0
brk(NULL)                               = 0x563c71960000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=26696, ...}) = 0
mmap(NULL, 26696, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8ee6db4000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\35\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030928, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8ee6db2000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f8ee67a1000
mprotect(0x7f8ee6988000, 2097152, PROT_NONE) = 0
mmap(0x7f8ee6b88000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f8ee6b88000
mmap(0x7f8ee6b8e000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f8ee6b8e000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f8ee6db34c0) = 0
mprotect(0x7f8ee6b88000, 16384, PROT_READ) = 0
mprotect(0x563c6fd55000, 4096, PROT_READ) = 0
mprotect(0x7f8ee6dbb000, 4096, PROT_READ) = 0
munmap(0x7f8ee6db4000, 26696)           = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
brk(NULL)                               = 0x563c71960000
brk(0x563c71981000)                     = 0x563c71981000
write(1, "hello world\n", 12hello world
)           = 12
exit_group(0)                           = ?
+++ exited with 0 +++

We can use strace to see how frequently our programs are making calls to read() and write() as an indirect way to measure performance.

The following is a demonstration of the number of these syscalls that take place when running byte_cat on a 755 byte long file.

Notice how the naive implementation makes around 750 reads and writes (this makes sense because the naive implementation calls read() and write() once per character), while the stdio implementation makes fewer than 10% as many read and write calls.

$ wc -c test_files/words.rot13.txt 
755 test_files/words.rot13.txt
$ make -B IMPL=naive byte_cat
gcc -ggdb3 -Wall -Wextra -Wshadow -std=gnu11 -fsanitize=address -fsanitize=undefined -fsanitize=leak test_programs/byte_cat.c impl/naive.c -o byte_cat
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep read | wc -l
804
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep write | wc -l
758
$ make -B IMPL=stdio byte_cat
gcc -ggdb3 -Wall -Wextra -Wshadow -std=gnu11 -fsanitize=address -fsanitize=undefined -fsanitize=leak test_programs/byte_cat.c impl/stdio.c -o byte_cat
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep read | wc -l
50
$ strace ./byte_cat test_files/words.rot13.txt out.txt 2>&1 | grep write | wc -l
4

Acknowledgements: The FileIO project was developed for CS 300.