CSCI-2390: Privacy-Conscious Computer Systems

Assignment 4: Cryptography Crash Course

In this assignment, you will be exposed to three cryptographic techniques that are related to the papers we will read in the cryptographic systems course unit.

The purpose of this assignment is to get you a little familiar with what these techniques and protocols look like at the API level, what they could be used for, and some of the practical challenges developers face when trying to use them.

Overall Instructions

Overview of what you will do:

Implement a secure multiparty computation (MPC) simple voting application using the JIFF MPC framework.
Implement a private set intersection (PSI) protocol based on DDH and elliptic curve oblivious pseudorandom functions (OPRF).
Use private information retrieval (PIR) to privately query a database using the Checklist protocol.

We provide a lot of stub code to make your tasks easier. Find and clone the stencil code in this Github repo. Each directory has a README with some low level instructions about the code and how to run it, and some documentation references. Please read the instructions carefuly, as they are designed to make the assignment easier for you.

Submission instructions: Your submission will consist of a written report that has answers for the specific questions below, and a zip file containing all the implementation and code that you filled into the stencil. In your report, please write a few sentences or a short paragraph for every question, highlighting your approach and your findings. Do not just provide your answer in one word! Please show us your rational/steps. Submit your report and zip file to cs2390tas@lists.brown.edu by 11pm (Eastern time) on Friday, November 22, 2024

MPC Voting Application

Secure multiparty computation (MPC) allows multiple parties to perform a computation on their private sensitive inputs. The inputs remain unknown to all other parties, and only the final output of the computation is revealed at the end.

In the real-world, each party has its own device that it uses to participate in the computation, e.g. a laptop or phone. In this assignment, each party will be a separate process that you run in a separate terminal. These processes communicate together over sockets, as if they are separate devices. Additionally, our implementation uses a server to orchestrate communication between these parties, e.g. so that parties do not need to manage the identity and IPs of all other parties, or have to deal with NAT, or other networking challenges.

We will look into shamir secret sharing based MPC. In this variant, each party splits each of its secret inputs into a collection of secret shares, one per party. For example, if there are a total of 3 parties, each input will be split into 3 shares, each sent to a different party. When combined together these shares reconstruct the original input. In JIFF, the MPC framework we will use, this operation is called open(...). JIFF provides APIs to also operate over these shares. For example, it allows adding the shares together to compute a new sharing of the sum of the secret inputs.

Here's a simple example of summing one secret input per party, and opening the result, in JIFF.

Your task is to implement a secret ballot voting MPC application. The scenario is as follows: you and two other friends are trying to choose a restaurant to go to for dinner. You do not want anyone to know what you voted for, only the total votes each option received. At a high level, if a party voted for one restaurant, say "DuneBrothers", it should secret share a value of 1 for that option, and 0 for all other options. The MPC code should then sum the vote values per options, which would correspond to the count of 1s that option received from every participant. More implementation details and steps are available in the stencil.

After implementing and testing your code. Please answer the following questions in your written report.

Question 1: How many inputs does it each party provide into the MPC computation, i.e. how many 0s and 1s? How many inputs are there total over all three parties? How many input secret shares does your implementation create over all the parties?

Question 2: Choose one vote option, say "DuneBrothers", print all the share of that vote that each party received from party 1. Run the computation multiple times, look at the printed shares. What can you tell us about them? Is it obvious to you whether you can use less than three of them to reconstruct the secret vote?

To print the value of a secret share, you can use this code:

// share contains a secret share.
console.log('value of share', await share.value);

Question 3: Your implementation is secret ballot: no one knows what anyone else voted for. Are there other properties that such a voting system might need to have? Think about a user cheating, e.g. by voting many times to skew the results. Does your implementation satisfy the properties you identify? Why/Why not? Suggest some ideas for how the implementation can be modified to satisfy them.

Bonus Question: Modify your implmenetation to perform sanity checks over the vote's secret shares under MPC. Specifically, check that each secret vote value is either 0 or 1. Is your program noticably slower with these checks? Experiment modifying the modulos of the computation (the Zp parameter to JIFF). Does it have any effect on the computation time?

Private Set Intersection (PSI)

In this problem you will look at the following scenario. A server has a list of passwords it knows have been leaked or are otherwise vulnerable. This list is private to the server, and might be large. A client has one potential password it's considering using for some sensitive account. The client would like to know if the password is one of the vulnerable ones, without telling the server what the password is, and without the server revealing all leaked passwords to the client.

You will implement a private set intersection that allows the client and server to perform this check privately. A description of this protocol and instructions for how and where to implement its steps are in the stencil repo. Please follow them and then test your code. It should correctly identify the bad passwords in the provided list, but not any other passwords.

Question 4: How much computation work does the client and server perform? Specifically, how many elliptic curve operations, i.e. hashing and scalarMult, does each perform? How many points are exchanged over the network? Can any of this happen ahead of time, e.g. before the client connects to the server or issues a query?

Question 5: Hypothetically, if the client wanted to check several passwords. It can execute the protocol several times, once per password. This seems inefficient. Can you come up with some small modification to allow the clients to check all these passwords in one batch? How much computation and communication is needed for your modified protocol vs running the original protocol multiple times?
Hint: Start by assuming the client has exactly two passwords.
Hint 2: Start with modifying the protocol to reveal whether any of the two clients passwords are bad.
Hint 3: Now, try to make the protocol reveal *which* of the passwords are bad, if any.
Hint 4: Generalize to more than two passwords!

Private Information Retrieval (PIR)

PIR allows a client to query a database held at one or more servers. The query is private and unknown to the server, but the client still receives the output.

The most simple PIR variants assume that (1) the database is public / it's ok to reveal more information about the database than just the exact query answer, (2) the database is contigious vector, with indices between [0,n), and the client queries are secret indices in that vector. Furthermore, the protocol we will be using, Checklist, has a couple of extra setup details: (1) there are two non-colluding servers that both have a copy of the database, (2) the client and servers can perform a preprocessing step before the client issues queries, in which the client receives hints from the servers it can then use to make its querying more efficient. Checklist's offline preprocessing steps requires the server to do O(n) computation, and send O(sqrt(n)) communication to the client, where n is the size of the database. Then, each query requires O(sqrt(n)) computation on the server, and O(log(n)) communication between server and client.

Question 6: Run the existing stencil code and familiarize yourself with it. Does the client receive the correct output for its query? Look at the server file, the database is a hardcoded map from string keys to string values! Remember that I told you that basic PIR assumes the database have integer contigious indices in [0,n). What does the provided code do to square this circle? How does it transform the string keys to contigious numbers in [0,4)?

Question 7: What is one down side of the provided solution? Think of how much storage is required on the client side to allow it to transform the keys to integers. Come up with a better approach that uses the information you know about the keys! Describe your idea in writing, and then implement your idea and test if it works.
Hint: Would hashing work? How large is the domain if using hashing?
Hint 2: Could you do something better than hashing? the keys are made up from 2 character lower case english letters. Can you use some alphabetical ordering or ascii encoding instead?

Question 8: There are many variants of PIR that allow for more sophisticated functionality. In fact, one of these variants supports our use case out of the box as it allows querying databases with string (sparse) keywords instead of contigious (dense) integer indices. Search the web to find the name of this variant! Tell us what you find.

This is the end of the assignment! 🥳 Now, package up your answers and code and submit your work.