
Project 5B: Privacy-Compliant KVStore

Due Thursday, May 11th at 6:00pm (EST)

You can use at most 56 late hours on this project!


Introduction

When implementing a real-world storage system, it is essential to understand the legal frameworks that govern data privacy. In this assignment, you will implement features pertaining to different privacy regulations that exist in Europe and the US, with a particular focus on the European Union’s General Data Protection Regulation (GDPR) and the United States’ California Consumer Privacy Act (CCPA). Both these regulations are designed to ensure that users have agency over their data, including the right to both access and delete their information online.

Task: Please read the specific requirements of the GDPR and CCPA at the following links.

In a nutshell… 🥜

Right to Access

CCPA Right to Know – Users can request access to personal data from companies, as well as: the categories of their data, the categories of sources, the purpose of collecting that data, and information about what categories of data are being transferred to what types of third parties.

GDPR Right of Access (Article 15) – Users can receive a copy of their own data upon request, as well as the source of any data that was not collected from the subject. When relevant, they can also request: the reason their data has been processed, the categories of data being processed, who can access this data, and the duration for which their data will be stored. They must also be informed when their data is transferred to a third party.

What’s the difference?

Not much – for our application they share broadly similar data access rights!

Right to Delete

CCPA Right to Delete – Users can request the deletion of all personal data at any time. Companies are not required to comply if the data is essential for: fulfilling the purpose for which it was collected, ensuring security and integrity, identifying and repairing bugs in functionality, avoiding infringing upon the rights of another consumer (e.g. to free speech), engaging in scientific, historical, or statistical research, “enabling solely internal uses that are reasonably aligned with the expectations of the consumer,” or complying with legal requirements.

GDPR Right to be Forgotten (Article 17) – Users can request the deletion of the personal data under six specific circumstances, including when it is no longer needed for the purpose it was collected, or the user has revoked consent. Companies are not required to comply if the data is essential for: avoiding infringing upon freedom of expression, complying with legal requirements, engaging in scientific or historical research, or reasons in the interest of public health.

What’s the difference?

The primary difference is that “the GDPR right only applies if the request meets one of six specific conditions while the CCPA right is broad. However, the CCPA also allows business to refuse the request on much broader grounds than the GDPR” (Practical Law).

Learning Objectives

This assignment will help you:

Assignment Installation

Ensure that your project repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci0300/cs300-s23-projects.git

Then run:

$ git pull
$ git pull handout main

This will merge our Project 5B stencil code with your repository.

Infrastructure Help

Tweeter

Context

In this part of the assignment, you’ll be working with a specialized and more complicated key-value store for a new social media platform, Tweeter. On Tweeter, users can choose their usernames, write posts that appear on their profiles, and respond to other users’ posts (which appear on both users’ profiles). Tweeter has users who are in the EU, so Tweeter must still comply with the GDPR’s right to access and the right to be forgotten.

In Tweeter’s database, there are five kinds of key-value pairs.

| Key-Value Pair Structure | Example Return Value |
|---|---|
| user_id → username | “user_14” → “malte” |
| post_id → post content | “post_59” → “Hello, Tweeter!” |
| all_users → comma-separated list of user_ids | “all_users” → “user_13,user_14,user_160,” |
| user_id_posts → comma-separated list of post_ids that user has posted | “user_14_posts” → “post_59,post_1,” |
| post_id_replies → comma-separated list of post_ids that respond to post | “post_59_replies” → “post_60, post_61” |

Even though there can be multiple users with the same username, every user_id is unique. The same goes for posts: even though there can be multiple posts with the same text, every post has a unique post_id.
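Several of these values are comma-separated lists, so you will likely need to take them apart and put them back together. Here is a minimal sketch of how that could look. The helper names `SplitIds` and `JoinIds` are hypothetical (they are not part of the stencil), and the sketch assumes that lists may or may not end in a trailing comma and may contain spaces after commas, as in the examples above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a comma-separated value such as
// "post_59,post_1," into its individual IDs. Empty entries (e.g., from a
// trailing comma) are dropped, and spaces around an ID are trimmed.
std::vector<std::string> SplitIds(const std::string& value) {
  std::vector<std::string> ids;
  std::stringstream stream(value);
  std::string item;
  while (std::getline(stream, item, ',')) {
    // Trim spaces so "post_60, post_61" parses like "post_60,post_61".
    const auto first = item.find_first_not_of(' ');
    if (first == std::string::npos) continue;  // empty or all-space entry
    const auto last = item.find_last_not_of(' ');
    ids.push_back(item.substr(first, last - first + 1));
  }
  return ids;
}

// Hypothetical helper: joins IDs back into a list with a trailing comma,
// matching the "user_13,user_14,user_160," style shown above.
std::string JoinIds(const std::vector<std::string>& ids) {
  std::string value;
  for (const std::string& id : ids) {
    value += id;
    value += ',';
  }
  return value;
}
```

Splitting on the comma and comparing whole IDs (rather than searching for a substring) matters here: a substring search for “user_1” would also match “user_14”.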

Stakeholders

You’re in charge of making a decision on how to handle a particular user’s request to exercise their right to access and their right to be forgotten.

Task: Check your email! You should have received an email containing information about which stakeholder you have been assigned to complete this portion of the assignment.

You will write a function, GDPRDelete(), that performs the delete request for your assigned stakeholder. There are several ways one might implement privacy-conscious deletion (think back to last week’s section for ideas…). While many different kinds of delete can be acceptable for your stakeholder pair, you cannot opt out of implementing some form of delete and reject their request, although your implementation may not meet all of their hopes and expectations. (In the real world, there are cases where deletion requests have been rejected completely, but for the scope of this assignment, we’ve selected the dataset and the stakeholders such that you should be able to implement some kind of delete, even if it is an unsatisfactory one for your stakeholder.)

The caveat with this particular assignment is that you are also accountable to an ethical auditor who is charged with scrutinizing your company’s handling of GDPR compliance. Therefore, when satisfying your stakeholder’s request for deletion in this assignment, you must also consider other stakeholders that could be affected by this deletion and whose legitimate claims and interests might be infringed upon.

The following introduces each stakeholder pair and explains the context in which the deletion request occurs:

PAIR #1:

PAIR #2:

PAIR #3:

PAIR #4:

PAIR #5:

The GDPRDelete() Function

The signature of GDPRDelete is as follows:

bool GDPRDelete(std::string& user_id);

In other words, the function receives a single argument, which is the user ID of the data subject who is invoking the right to erasure (i.e., your data subject stakeholder). The function can use this argument to look up data related to the data subject in the KVStore, or to find the user’s identifier (e.g., “user_1”) in other data.

Note that the function does not receive information about the context in which the deletion happens (e.g., what other users’ data the data subject might want to delete, or what the other users’ views on this are). Your design could extend the KVStore with auxiliary metadata that captures relevant context (e.g., special key-value pairs that indicate users who are of special interest, such as public figures), and GDPRDelete() may draw on this data. Using such metadata is not a requirement, however.
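To make the metadata idea concrete, here is a small sketch of a lookup against an auxiliary key. Everything in it is an assumption for illustration: the `Store` alias is an in-memory stand-in for the KVStore (a real client would send requests to the server instead), and a key such as “public_figures” holding a comma-separated list of user IDs is a hypothetical design, not something in the provided dataset.

```cpp
#include <map>
#include <sstream>
#include <string>

// In-memory stand-in for the KVStore contents (assumption for this sketch).
using Store = std::map<std::string, std::string>;

// Returns true if `user_id` appears in the comma-separated list stored under
// `list_key`. A missing key is treated as an empty list.
bool IsListedUnder(const Store& store, const std::string& list_key,
                   const std::string& user_id) {
  const auto it = store.find(list_key);
  if (it == store.end()) return false;
  std::stringstream stream(it->second);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (item == user_id) return true;  // exact match: "user_1" != "user_14"
  }
  return false;
}
```

A delete routine could call something like `IsListedUnder(store, "public_figures", other_user)` before deciding how to treat content that involves another user. Whether that distinction is justified is exactly the kind of choice your written answers should defend.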

We will not grade you on the level of sophistication your GDPRDelete function achieves, but rather on whether it works. As long as your written answers justify your choices, you will get full credit, even if the function itself is simple.

Loading your KVStore with data

We provide you with the data stored on Tweeter’s instance of KVStore here. The same data is also in gdpr/database.txt, and you can load it into your KVStore as follows.

In one terminal, run this command from the build directory to start a KVStore server listening on port 1234 (you can pick any number for the port; it just sets up a rendezvous point for your client to connect with the server):

build$ ./server 1234 8

In another terminal, run the client and feed in the data:

build$ ./simple_client 127.0.0.1:1234 < ../gdpr/database.txt

After running this command, typing print store into the first terminal (which runs the server) should show that your KVStore contains our dataset. Note that since the KVStore is an in-memory store, you will need to re-load the dataset every time you restart the server, as it loses its contents on shutdown.

Now, a client can connect to the server and make API requests; for instance, to fetch user_1’s data from the server, start up a new client and make a Get request:

build$ ./simple_client 127.0.0.1:1234
get user_1

Before you write any code…

You might feel overwhelmed with the situation you’ve been presented with — we’ve engineered each situation such that the conflict is intentionally difficult to deal with. Because of this, we want you to consider the following questions before you touch any of the code. You don’t have to submit your answers for this, but we highly recommend writing down some notes for yourself.

  1. Consider why the data subject would want their data (and potentially the others’ data) to be deleted. What kind of morally significant claims would support or reject their request?
  2. Consider why the opposing stakeholder wouldn’t want data to be deleted. What kind of morally significant claims would support or reject their request?
  3. What strategies or mechanisms could you propose to mitigate potential harm to the opposing stakeholder while still respecting the data subject’s right to be forgotten?

Important note: we don’t expect you to find a solution that satisfies everyone — rather, the point is to make the most reasonable tradeoffs between the opposing parties’ claims.

Implementation Tasks

Task 1: Implement GDPRDelete() in client/simple_client.cpp.
Task 2: Leave detailed (header and/or inline) comments in your code that explain what kind of delete you are implementing.

GDPRDelete() should operate on any keys (such as user_id, post_id, etc.) you deem necessary to implement your chosen strategy of deletion.
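As an illustration of what “operating on keys” can involve, here is a sketch of one possible strategy: erasing a user’s profile and the posts they wrote. It is not the expected answer, and it may not suit your stakeholder pair. It runs against an in-memory `Store` alias rather than the real server, and the function names `RemoveIdFromList` and `EraseUserAndPosts` are hypothetical. It assumes lists without spaces after commas.

```cpp
#include <map>
#include <sstream>
#include <string>

// In-memory stand-in for the KVStore so the sketch runs without a server.
using Store = std::map<std::string, std::string>;

// Rebuilds a comma-separated list without `id`, keeping the trailing comma.
std::string RemoveIdFromList(const std::string& list, const std::string& id) {
  std::string result;
  std::stringstream stream(list);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (item.empty() || item == id) continue;
    result += item + ",";
  }
  return result;
}

// One possible (hypothetical) strategy: erase the user's profile and their
// own posts. Returns false if the user does not exist.
bool EraseUserAndPosts(Store& store, const std::string& user_id) {
  if (store.erase(user_id) == 0) return false;

  // Delete every post the user authored, then the index of those posts.
  const std::string posts_key = user_id + "_posts";
  const auto posts = store.find(posts_key);
  if (posts != store.end()) {
    std::stringstream stream(posts->second);
    std::string post_id;
    while (std::getline(stream, post_id, ',')) {
      if (!post_id.empty()) store.erase(post_id);
    }
    store.erase(posts);
  }

  // Drop the user from the global user list.
  const auto all_users = store.find("all_users");
  if (all_users != store.end()) {
    all_users->second = RemoveIdFromList(all_users->second, user_id);
  }
  return true;
}
```

Note what this sketch deliberately leaves alone: the post_id_replies entries of deleted posts, the replies themselves, and any mention of the user inside other people’s posts. Those are the cases where stakeholders’ claims conflict, so how you handle them is the decision you will need to justify.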

To test your GDPRDelete() function, you can use the gdprdelete <user> command in the client as follows:

$ ./simple_client 127.0.0.1:1234
gdprdelete user_10

You can then use print store in the server to see how your store contents have changed.

After you’ve finished implementing delete…

The right to know and the right to delete affect all stakeholders of a database: its users, its operators, and those who use the data for technology, studies, and historical records. Now that you have implemented your version of privacy-conscious access and deletion, you should consider the potential pitfalls of your design choices in practice.

Task: Answer the following questions in your README file:

  1. Who was your stakeholder pair? (<1 sentence)
  2. What kind of delete did you implement and why? Explain your decisions as if you were reporting to an ethical auditor who will determine whether your design is justified. Highlight and explain what you think is the most compelling reason that supports the specific kind of deletion you’ve implemented (1-2 short paragraphs)
  3. What are the shortcomings of your implementation? Who are some short term and/or long term stakeholders (beyond the ones we’ve asked you to consider) who could be adversely affected by your decision? (1-2 short paragraphs)
  4. How might your approach to this assignment change if you were asked to consider the interests of other stakeholders not mentioned in the scenario? (1-2 short paragraphs)

How will you be graded?

  1. Your implementation needs to work and do what you indicate it is doing in your comments.
  2. You’re not being graded on whether you picked the “right” answer — there is no one correct version of delete for each stakeholder pair. Rather, the bulk of your grade is determined by how well you explain why you chose your specific implementation: the justifications you offer in support of your implementation should reflect a comprehensive assessment of the context and competing claims and weigh them against each other in a nuanced way.

A good response identifies the legitimate claims that each stakeholder may have, explains why those claims are important, and compares the importance of both claims. It provides concrete reasons for (fully or partially) prioritizing or rejecting individual stakeholders’ claims and how those trade-offs are reflected in the chosen implementation of the right to be forgotten. A good response, importantly, also touches upon the limitations of those choices.

Grading Breakdown

Now head to the grading server and make sure that you have the “KVStore” page configured correctly with your project repository.

Congratulations, you’ve completed your final CS 300 project! 🎉 👏

Handing In

Please hand in the files and answers for this assignment via Git in your cs300-s23-projects-YOURNAME repository. Put your answers into the README.md file in the kvstore/ subdirectory of your project repository, and also put all other files from this assignment into that directory.

By 6:00 PM on May 11th, you must have filled in the file README.md in the kvstore directory in your projects repo, and pushed the code for your GDPRDelete() functionality.


Acknowledgements: This project was developed for CS 300 by Eva Schiller, Eva Lau, Colton Rusch, and Malte Schwarzkopf.