« Back to the main CS 300 website

Lab 7: Threads

Due March 30, 2021, at 8:00PM


Introduction

In Lab 6, you implemented a concurrent program using multiple processes. With multiple processes, each process gets its own distinct address space. By default, no data is shared, and in Lab 6, you dealt with this by using pipes to send data between processes. (Another approach is to designate a shared region of memory with the mmap system call.) However, sometimes you want concurrency within a process, with the different parallel executions truly sharing memory.

In this lab, you will gain experience creating concurrent programs using multiple threads. Using multiple threads rather than processes is beneficial as threads of the same process share an address space. The only private region of memory for threads is their stack, so any data contained in the heap or global sections are shared across all threads of the same processes, and their changes are visible to each other.

It is important to note the difference between a process and a thread. A process is the general container responsible for the execution of a program. Threads are the individual entities within a process which actually do the execution. A single process can contain multiple threads all working in its address space. Every process contains at least one thread, the “main” thread, whose execution starts in main!


Assignment

Assignment installation

First, ensure that your repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout 
https://github.com/csci0300/cs300-s21-labs.git

Then run:

$ git pull
$ git pull handout master

This will merge our Lab 7 stencil code with your previous work. If you have any “conflicts” from Lab 6, resolve them before continuing further. Run git push to save your work back to your personal repository.

Assignment Specifications

In this lab, you will again implement a parallel reverse index search; however, this time you will do so by taking advantage of multiple threads rather than multiple processes. You will work with the same basic reverse index program as in Lab 6. The program takes in a search term and produces a list of all occurrences of that term in a corpus of documents.

We provide you with the following files:

Note: this lab does not require you to use synchronization objects or atomics.

Why?

If you implement the lab correctly, each thread works on a different file and shares no state with other threads.

We also give you some input files that are helpful for testing.

Task:
Implement the run_workers function in revindex_threads.cpp to spawn multiple threads to execute find_word. More details of the implementation of run_workers can be found in the stencil comments.

Hints
  • It is important to not start the joining threads until after every thread has been created. If you were to create one thread and then wait to join with it before creating another thread, you will only ever be running one thread at a time. This is not actually concurrent programming!
  • Additionally, it is important that the wordindex pointer used as an argument to find_word is a pointer to a wordindex contained in the files vector. In c++, elements contained in a vector are allocated in the heap. It is necessary that your wordindex objects are contained in the heap so that they are visible across threads.

You may find it helpful to view the Relevant Library Functions portion of this handout when implementing your run_workers function.

Running and Testing:

Compile:

Run:

The books directory is provided for you, but feel free to create your own directories containing fewer or shorter files for preliminary testing purposes.

Test:

We have provided you with the expected output for some search terms over the books directory (all files titled WORD.txt), so that you can compare your results. You can also compare your implementation with the sequential implementation via make check. The results should be the same.

$ make check
./revindex_threads books < input.txt > output_threads.txt
./revindex_sequential books < input.txt > output_sequential.txt
diff output_parallel.txt output_sequential.txt
... You should receive no output/differences ...
$ time ./revindex_parallel books < input.txt
$ time ./revindex_sequential books < input.txt
...  the sequential implementation should be slower than the parallel ...

You can also compare the performance of your multithreaded reverse index search with the multi-process implementation from Lab 6. Which one do you expect to be faster?

Relevant Library Functions

Below you will find a list of functions you may find helpful in completing this lab. Feel free to also reference the man pages and C++ references for more details.

std::thread
  • std::thread(func, args...)
    • The std::thread constructor spawns a new thread that beings running the function, func, with the specificed arguments
    • Arguments:
      • A function to be executed by the new thread.
      • Any number of arguments as needed by the specified function
    • Return value: a thread handle that you can use to join the thread (see join()).
    • Below is an example of standard std::thread usage:
      void some_func(int x, int y){}
      
      void create_threads(){
          // create and run the thread
          std::thread t = thread(some_func, 1, 2);
          
          // wait for it to complete
          t.join();
      }
      
std::thread::join
  • join()
    • The std::thread join function waits for the execution of a thread to complete. Calling t.join() will halt the calling thread until thread t, has completed. (join will return immediately if the thread has already completed.)

Debugging Resources/ GDB

Switching between threads:

When your program is stopped in GDB, you can gain information on the current state of your threads running the command info threads (you can stop your program in GDB either through a breakpoint or by typing Ctrl-Z). The info threads command will provide you with a numbered list of every currently running thread. From there, you can use the command thread [number] (where [number] is the number listed by the thread) to inspect the current state of that thread and follow its execution.

Executing commands across multiple threads:

In GDB, you can use the command thread apply all <command> to apply a given command to all threads. For example, you can use thread apply all bt to get a the stack trace of every running thread.


Handin instructions

Turn in your code by pushing your git repository to github.com/csci300/cs300-s20-labs-YOURUSERNAME.git.

Then, head to the grading server. On the “Labs” page, use the “Lab 7 checkoff” button to check off your lab.