Project 5B: Privacy-Compliant KVStore

Due Thursday, May 11th at 6:00pm (EST)

You can use at most 56 late hours on this project!

Introduction

When implementing a real-world storage system, it is essential to understand the legal frameworks that govern data privacy. In this assignment, you will implement features pertaining to different privacy regulations that exist in Europe and the US, with a particular focus on the European Union’s General Data Protection Regulation (GDPR) and the United States’ California Consumer Privacy Act (CCPA). Both these regulations are designed to ensure that users have agency over their data, including the right to both access and delete their information online.

Task: Please read the specific requirements of the GDPR and CCPA at the following links.

In a nutshell… 🥜

Right to Access

CCPA Right to Know – Users can request access to personal data from companies, as well as: the categories of their data, the categories of sources, the purpose of collecting that data, and information about what categories of data are being transferred to what types of third parties.

GDPR Right of Access (Article 15) – Users can receive a copy of their own data upon request, as well as the source of any data that was not collected from the subject. When relevant, they can also request: the reason their data has been processed, the categories of data being processed, who can access this data, and the duration for which their data will be stored. They must also be informed when their data is transferred to a third party.

What’s the difference?

Not much – for our application they share broadly similar data access rights!

Right to Delete:

CCPA Right to Delete – Users can request the deletion of all personal data at any time. Companies are not required to comply if the data is essential for: fulfilling the purpose for which it was collected, ensuring security and integrity, identifying and repairing bugs in functionality, avoiding infringing upon the rights of another consumer (e.g. to free speech), engaging in scientific, historical, or statistical research, “enabling solely internal uses that are reasonably aligned with the expectations of the consumer,” or complying with legal requirements.

GDPR Right to be Forgotten (Article 17) – Users can request the deletion of the personal data under six specific circumstances, including when it is no longer needed for the purpose it was collected, or the user has revoked consent. Companies are not required to comply if the data is essential for: avoiding infringing upon freedom of expression, complying with legal requirements, engaging in scientific or historical research, or reasons in the interest of public health.

What’s the difference?

The primary difference is that “the GDPR right only applies if the request meets one of six specific conditions while the CCPA right is broad. However, the CCPA also allows business to refuse the request on much broader grounds than the GDPR” (Practical Law).

Learning Objectives

This assignment will help you:

Understand the implications of privacy laws for storage systems, and how developers need to implement technical functionality to give users rights over their data.
Reason about trade-offs between stakeholder interests and public goods relating to privacy and data protection.

Assignment Installation

Ensure that your project repository has a handout remote. Type:

$ git remote show handout

If this reports an error, run:

$ git remote add handout https://github.com/csci0300/cs300-s23-projects.git

Then run:

$ git pull
$ git pull handout main

This will merge our Project 5B stencil code with your repository.

Infrastructure Help

Tweeter

Context

In this part of the assignment, you’ll be working with a specialized and more complicated key-value store for a new social media platform, Tweeter. On Tweeter, users can choose their usernames, write posts that appear on their profiles, and respond to other users’ posts (which appear on both users’ profiles). Tweeter has users who are in the EU, so it must comply with the GDPR’s right to access and right to be forgotten.

In Tweeter’s database, there are five kinds of key-value pairs.

Key Value Pair Structure	Example Return Value
`user_id` → username	“user_14” → “malte”
`post_id` → post content	“post_59” → “Hello, Tweeter!”
`all_users` → comma-separated list of `user_ids`	“all_users” → “user_13,user_14,user_160,”
`user_id_posts` → comma-separated list of `post_ids` that user has posted	“user_14_posts” → “post_59,post_1,”
`post_id_replies` → comma-separated list of `post_ids` that respond to post	“post_59_replies” → “post_60, post_61”

Even though there can be multiple users with the same usernames, every user_id is unique. The same goes for posts: even though there can be multiple posts with the same text, every post has a unique post_id.
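Since three of the five value types are comma-separated ID lists, most deletion strategies will need to parse and rebuild them. The following is a minimal sketch of how that could look; `SplitIds` and `JoinIds` are hypothetical helper names that are not part of the stencil. It assumes that a trailing comma and spaces after commas (both of which appear in the examples above) should be tolerated.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not in the stencil): splits a comma-separated ID list
// such as "post_59,post_1," into its IDs. Empty entries (e.g., from a trailing
// comma) are skipped, and spaces are dropped so that "post_60, post_61" parses
// the same way as "post_60,post_61".
std::vector<std::string> SplitIds(const std::string& list) {
  std::vector<std::string> ids;
  std::string current;
  for (char c : list) {
    if (c == ',') {
      if (!current.empty()) ids.push_back(current);
      current.clear();
    } else if (c != ' ') {
      current.push_back(c);
    }
  }
  if (!current.empty()) ids.push_back(current);  // list without trailing comma
  return ids;
}

// Hypothetical helper: joins IDs back into the trailing-comma form used in
// most of the examples above.
std::string JoinIds(const std::vector<std::string>& ids) {
  std::string out;
  for (const std::string& id : ids) {
    out += id;
    out += ',';
  }
  return out;
}
```

For example, `SplitIds("post_59,post_1,")` yields the two IDs `post_59` and `post_1`, and passing that result to `JoinIds` reproduces the original string.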

Stakeholders

You’re in charge of making a decision on how to handle a particular user’s request to exercise their right to access and their right to be forgotten.

Task: Check your email! You should have received an email containing information about which stakeholder you have been assigned to complete this portion of the assignment.

You will write a function, GDPRDelete(), that performs the delete request for your assigned stakeholder. There are several ways one might implement privacy-conscious deletion (think back to last week’s section for ideas…). While many different kinds of delete can be acceptable for your stakeholder pair, you cannot opt out of implementing some form of delete and reject their request, although your implementation may not meet all of their hopes and expectations. (In the real world, there are cases where deletion requests have been rejected completely, but for the scope of this assignment, we’ve selected the dataset and the stakeholders such that you should be able to implement some kind of delete, even if it is an unsatisfactory one for your stakeholder.)

The caveat with this particular assignment is that you are also accountable to an ethical auditor who is charged with scrutinizing your company’s handling of GDPR compliance. Therefore, when satisfying your stakeholder’s request for deletion in this assignment, you must also consider other stakeholders that could be affected by this deletion and whose legitimate claims and interests might be infringed upon.

The following introduces each stakeholder pair and explains the context in which the deletion request occurs:

PAIR #1:

Data Subject: Congressperson Kirby — Congressperson Kirby is currently the elected congressperson for Rhode Island’s first district and is running for reelection in the current election cycle. Recently, some of their tweets from when they were in college have resurfaced. These tweets were written 10 years ago concerning a pandemic that occurred at the time. While the congressperson has issued apologies, this has not been enough to stop the ongoing discourse from users all across the political spectrum. Because of this, the congressperson has made a request to exercise their right to be forgotten. They wish to delete their account, which they hope will delete their original tweets and related public discourse (other users quoting or rewording the original posts) about the controversy.
Opposing stakeholder: Freedom House Advocacy Group — This advocacy group is concerned with government accountability & transparency. They oppose Kirby’s attempt to remove evidence of their controversy from the past.

PAIR #2:

Data Subject: Sarsra Breisand — Sarsra Breisand is an American singer, actress and director. With a career spanning over six decades, she has achieved success in multiple fields of entertainment, and is among the few performers awarded an Emmy, Grammy, Oscar, and Tony (EGOT). Recently, a paparazzi reporter leaked information about where Sarsra Breisand lives. Her attempts to hide this information have only drawn more attention to it. As a desperate final attempt to protect her privacy, Sarsra has made a request to exercise her right to be forgotten and delete her account. Sarsra understands that this will erase her social media profile, but hopes that this will put an end to the interest in her whereabouts.
Opposing stakeholder: Beth Abraham — Beth Abraham is a historian that has begun studying what she has named the Breisand Effect. She has released several research papers on this psychological and sociological phenomenon and is in the process of writing another.

PAIR #3:

Data Subject: Angel Yoline — Angel Yoline is an up-and-coming actress who is excitedly promoting her new film on Tweeter. However, a while ago she posted views on Tweeter that characterize working class people in a negative light. Her former partner, Brad Schmidt, has recently highlighted these problematic views as a means of getting revenge on Angel. She now wishes for her own post, Brad’s post, and the public discourse about the controversy to be deleted.
Opposing stakeholder: Film Enthusiasts — Film enthusiasts have been excited about the new film that Angel Yoline is starring in. When an account exposes something that Angel Yoline said about working class people, there is a surge of new posts discussing her controversial opinions.

PAIR #4:

Data Subject: Frank Blimp — Frank Blimp is a recent divorcee who doesn’t have custody of his child from the marriage. Falling on hard times, Frank missed a couple of child support payments, for which his ex-wife, Marge, has called him out on Tweeter. Even though both Frank and Marge have private accounts, these posts are visible to Frank’s friends and family. Since these tweets, Frank has been able to repay the missed payments and is financially stable enough to continue making future payments. He’s asked Marge to take down her tweets shaming him, but she has refused. Frank now requests his data to be erased in the hope that this will also make posts mentioning him disappear.
Opposing stakeholder: Marge Blimp — Marge is Frank’s ex-wife who has main custody of their child. After he had missed multiple child support payments, she resorted to using Tweeter to call him out. She wants others to be aware of Frank’s past behavior.

PAIR #5:

Data Subject: Matt Bleat — Matt Bleat is a senior in high school who has just committed to Brown University after being recruited to their track and field team. Recently, there have been posts from Matt’s high school classmates calling attention to former allegations against Matt for sexual harassment. The school determined that there was insufficient evidence to support those allegations and declined to take further action. Matt invokes the right to erasure, expecting that it will remove his posts, but also remove posts that mention hashtags related to the controversy.
Opposing stakeholder: The Center for Changing Our Campus Culture — They believe that posts highlighting the allegations against Matt should remain on Tweeter, and Matt should be prepared to deal with the possibility that Brown could discover his tweet and rescind their offer of admission.

The `GDPRDelete()` Function

The signature of GDPRDelete is as follows:

bool GDPRDelete(std::string& user_id);

In other words, the function receives a single argument, which is the user ID of the data subject who is invoking the right to erasure (i.e., your data subject stakeholder). The function can use this argument to look up data related to the data subject in the KVStore, or to find the user’s identifier (e.g., “user_1”) in other data.

Note that the function does not receive information about the context in which the deletion happens (e.g., what other users’ data the data subject might want to delete, or what the other users’ views on this are). Your design could extend the KVStore with auxiliary metadata that captures relevant context (e.g., special key-value pairs that indicate users who are of special interest, such as public figures), and GDPRDelete() may draw on this data. Using such metadata is not a requirement, however.

We will not grade you on the level of sophistication your GDPRDelete function achieves, but rather on whether it works. As long as your written answers justify your choices, you will get full credit, even if the function itself is simple.
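To make the shape of such a function concrete, here is a rough sketch of one very simple deletion strategy. It is an illustration under stated assumptions, not the stencil’s API and not a recommended answer: the KVStore is replaced by a plain in-memory map (`Store`), and `GDPRDeleteSketch` and `SplitIds` are hypothetical names. A real implementation in the client would issue requests to the server rather than touch a local map.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// ASSUMPTION: a plain in-memory map stands in for the KVStore in this sketch.
using Store = std::unordered_map<std::string, std::string>;

// Hypothetical helper: splits "a,b,c," into {"a","b","c"}, skipping empty
// entries and spaces.
std::vector<std::string> SplitIds(const std::string& list) {
  std::vector<std::string> ids;
  std::string current;
  for (char c : list) {
    if (c == ',') {
      if (!current.empty()) ids.push_back(current);
      current.clear();
    } else if (c != ' ') {
      current.push_back(c);
    }
  }
  if (!current.empty()) ids.push_back(current);
  return ids;
}

// Hypothetical variant of GDPRDelete that takes the store explicitly.
// Strategy: remove the user's own posts, their post list, their username
// entry, and their ID from all_users. Other users' posts and all "_replies"
// lists are left untouched. Returns false if the user does not exist.
bool GDPRDeleteSketch(Store& store, const std::string& user_id) {
  if (store.find(user_id) == store.end()) return false;

  // Delete every post listed under "<user_id>_posts", then the list itself.
  const std::string posts_key = user_id + "_posts";
  auto posts_it = store.find(posts_key);
  if (posts_it != store.end()) {
    // Copy the IDs first so erasing entries cannot affect the iteration.
    const std::vector<std::string> post_ids = SplitIds(posts_it->second);
    for (const std::string& post_id : post_ids) {
      store.erase(post_id);
    }
    store.erase(posts_key);
  }

  // Rebuild all_users without the deleted user.
  auto all_it = store.find("all_users");
  if (all_it != store.end()) {
    std::string remaining;
    for (const std::string& id : SplitIds(all_it->second)) {
      if (id != user_id) remaining += id + ",";
    }
    all_it->second = remaining;
  }

  // Finally, remove the user_id -> username entry.
  store.erase(user_id);
  return true;
}
```

Note what this sketch does not do: it leaves other users’ replies in place, and it leaves dangling references to the deleted posts in other posts’ `_replies` lists. Whether that is acceptable depends on your stakeholder pair and is exactly the kind of choice your written answers should justify.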

Loading your KVStore with data

We provide you with the data stored on Tweeter’s instance of KVStore here. The same data is also in gdpr/database.txt, and you can load it into your KVStore as follows.

In one terminal, run this command from the build directory to start a KVStore server listening on port 1234 (you can pick any number for the port; it just sets up a rendezvous point for your client to connect with the server):

build$ ./server 1234 8

In another terminal, run the client and feed in the data:

build$ ./simple_client 127.0.0.1:1234 < ../gdpr/database.txt

After running this command, typing print store into the first terminal (which runs the server) should show that your KVStore contains our dataset. Note that since the KVStore is an in-memory store, you will need to re-load the dataset every time you restart the server, as it loses its contents on shutdown.

Now, a client can connect to the server and make API requests; for instance, to fetch user_1’s data from the server, start up a new client and make a Get request:

build$ ./simple_client 127.0.0.1:1234
get user_1

Before you write any code…

You might feel overwhelmed with the situation you’ve been presented with — we’ve engineered each situation such that the conflict is intentionally difficult to deal with. Because of this, we want you to consider the following questions before you touch any of the code. You don’t have to submit your answers for this, but we highly recommend writing down some notes for yourself.

Consider why the data subject would want their data (and potentially the others’ data) to be deleted. What kind of morally significant claims would support or reject their request?
Consider why the opposing stakeholder wouldn’t want data to be deleted. What kind of morally significant claims would support or reject their request?
What strategies or mechanisms could you propose to mitigate potential harm to the opposing stakeholder while still respecting the data subject’s right to be forgotten?

Important note: we don’t expect you to find a solution that satisfies everyone — rather, the point is to make the most reasonable tradeoffs between the opposing parties’ claims.

Implementation Tasks

Task 1: Implement GDPRDelete() in client/simple_client.cpp.
Task 2: Leave detailed (header and/or inline) comments in your code that explain what kind of delete you are implementing.

GDPRDelete() should operate on any keys (such as user_id, post_id, etc.) you deem necessary to implement your chosen strategy of deletion.
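As one illustration of operating on keys beyond the user’s own, the sketch below removes a given post ID from every `_replies` list, so that no list points at a post that no longer exists. It is only a sketch under assumptions: `Store` is an in-memory map standing in for the KVStore, and `EndsWith` and `RemoveFromReplyLists` are hypothetical names that do not appear in the stencil.

```cpp
#include <string>
#include <unordered_map>

// ASSUMPTION: a plain in-memory map stands in for the KVStore in this sketch.
using Store = std::unordered_map<std::string, std::string>;

// Hypothetical helper: returns true if `key` ends with `suffix`.
bool EndsWith(const std::string& key, const std::string& suffix) {
  return key.size() >= suffix.size() &&
         key.compare(key.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: removes `post_id` from every "<post>_replies" list in
// the store and returns how many lists were changed. Lists that are changed
// are rewritten in trailing-comma form; other lists are left as they were.
int RemoveFromReplyLists(Store& store, const std::string& post_id) {
  int changed = 0;
  for (auto& entry : store) {
    if (!EndsWith(entry.first, "_replies")) continue;
    std::string rebuilt;
    std::string current;
    bool removed = false;
    // The appended ',' flushes a final entry that has no trailing comma.
    for (char c : entry.second + ",") {
      if (c == ' ') continue;  // tolerate "post_60, post_61"
      if (c != ',') {
        current.push_back(c);
        continue;
      }
      if (current == post_id) {
        removed = true;
      } else if (!current.empty()) {
        rebuilt += current + ",";
      }
      current.clear();
    }
    if (removed) {
      entry.second = rebuilt;
      ++changed;
    }
  }
  return changed;
}
```

Scanning every key is simple but linear in the size of the store; whether you need this kind of cleanup at all depends on the deletion strategy you chose.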

To test your GDPRDelete() function, you can use the gdprdelete <user> command in the client as follows:

$ ./simple_client 127.0.0.1:1234
gdprdelete user_10

You can then use print store in the server to see how your store contents have changed.

After you’ve finished implementing delete…

The right to know and the right to delete affect all stakeholders of a database: its users, its operators, and those who use the data for technology, studies, and historical records. Now that you have implemented your version of privacy-conscious access and deletion, you should consider the potential pitfalls of your design choices in practice.

Task: Answer the following questions in your README file:

Who was your stakeholder pair? (<1 sentence)
What kind of delete did you implement and why? Explain your decisions as if you were reporting to an ethical auditor who will determine whether your design is justified. Highlight and explain what you think is the most compelling reason that supports the specific kind of deletion you’ve implemented (1-2 short paragraphs)
What are the shortcomings of your implementation? Who are some short term and/or long term stakeholders (beyond the ones we’ve asked you to consider) who could be adversely affected by your decision? (1-2 short paragraphs)
How might your approach to this assignment change if you were asked to consider the interests of other stakeholders not mentioned in the scenario? (1-2 short paragraphs)

How will you be graded?

Your implementation needs to work and do what you indicate it is doing in your comments.
You’re not being graded on whether you picked the “right” answer — there is no one correct version of delete for each stakeholder pair. Rather, the bulk of your grade is determined by how well you explain why you chose your specific implementation: the justifications you offer in support of your implementation should reflect a comprehensive assessment of the context and competing claims and weigh them against each other in a nuanced way.

A good response identifies the legitimate claims that each stakeholder may have, explains why those claims are important, and compares the importance of both claims. It provides concrete reasons for (fully or partially) prioritizing or rejecting individual stakeholders’ claims and how those trade-offs are reflected in the chosen implementation of the right to be forgotten. A good response, importantly, also touches upon the limitations of those choices.

Grading Breakdown

25% (10 points) for your implementation of GDPRDelete.
25% (10 points) for comments and explanation of how your deletion function works.
50% (20 points) for answers to the four README questions.

Now head to the grading server and make sure that you have the “KVStore” page configured correctly with your project repository.

Congratulations, you’ve completed your last and final CS 300 project!

Handing In

Please hand in the files and answers for this assignment via Git in your cs300-s23-projects-YOURNAME repository. Put your answers into the README.md file in the kvstore/ subdirectory of your project repository, and also put all other files from this assignment into that directory.

By 6:00 PM on May 11th, you must have filled in the file README.md in the kvstore directory in your projects repo, and pushed the code for your GDPRDelete() functionality.

Acknowledgements: This project was developed for CS 300 by Eva Schiller, Eva Lau, Colton Rusch, and Malte Schwarzkopf.
