Projects
Final projects should seek to answer a research question through implementation of a new idea in a real system. This could take one of several forms:
- Prototype a new, privacy-centered system design.
- Apply a privacy-enhancing or privacy-preserving technique in an existing system, and measure its impact.
- Conduct a study of privacy risks and deficiencies in existing software, and analyze what it would take to address them.
You may work on projects individually, or in groups of two to three students. Your project deliverables include a proposal, a progress report, a final paper describing design and implementation, your code, and a presentation. I will post the final presentation and writeup to the course website (unless you explicitly ask that they be kept confidential for a good reason).
Important dates
- October 2, 2023: submit your project proposal (by 11pm).
- October 24, 2023: project conference, at which you present your progress.
- November 3, 2023: submit a progress report on your project (by 11pm).
- November 30 and December 5, 2023: presentation and demo.
- December 15, 2023: submit your code and final report.
Project proposal
Please use the OSDI 2023 submission template. Your proposal should be a one-page summary of what your idea is, how you plan to go about investigating it, and what techniques you will apply (or need to learn about beyond the course material).
Project ideas
Here's a list of some starter ideas to get you thinking. Please feel free to pursue your own ideas! Click on the project idea to get some more information.
Reproducing results is an important part of research, and an awesome way to really learn how something works! Pick any of the systems we learn about in the course and implement your own version of it. Then try to reproduce its results! You may wish to simplify some aspects of the system to make the reproduction practical within the time available.
K9db offers privacy compliance by construction for database-backed web applications. However, the system has some limitations: it stores duplicate copies of jointly-owned records, it only supports relational databases, and it can only delete all of a user's data on request (rather than selected subsets of it). You can make it better!
- Investigate an alternative design to K9db that does not create duplicate copies for records that are jointly owned, e.g., using wrapped encryption keys.
- Define an extended version of K9db’s DOG that specifies ownership for data stored in multiple data stores (e.g., relational databases, key value stores, distributed filesystems and blob stores), and implement mechanisms for account deletion and retrieval to comply with GDPR.
- Extend K9db to support account and data recycling by setting a time-to-live for pieces of data and automatically deleting unused data that exceeds that threshold.
This project entails extending K9db's open-source implementation. Familiarity with C++ is very helpful, and some familiarity with the internals of a database is advantageous.
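The wrapped-key idea in the first bullet above can be sketched in a few lines. The toy below uses a hash-based XOR stream in place of a real authenticated cipher (e.g., AES-GCM) purely for illustration, and all names are invented; it is not K9db's actual design:

```python
import hashlib
import secrets

def _stream_xor(key: bytes, data: bytes) -> bytes:
    # Toy XOR keystream derived from the key; NOT secure, illustration only.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

class SharedRecord:
    """One ciphertext copy; per-owner wrapped copies of the data key."""

    def __init__(self, plaintext: bytes, owner_keys: dict[str, bytes]):
        data_key = secrets.token_bytes(32)
        self.ciphertext = _stream_xor(data_key, plaintext)
        # Wrap (encrypt) the data key once per owner -- no duplicate records.
        self.wrapped = {uid: _stream_xor(k, data_key)
                        for uid, k in owner_keys.items()}

    def read(self, uid: str, owner_key: bytes) -> bytes:
        data_key = _stream_xor(owner_key, self.wrapped[uid])
        return _stream_xor(data_key, self.ciphertext)

    def forget(self, uid: str) -> None:
        # Per-owner deletion: drop only this owner's wrapped key.
        del self.wrapped[uid]
```

Each jointly-owned record is stored exactly once; deleting one owner removes only their wrapped key, and once every wrap is gone the ciphertext becomes unrecoverable (crypto-shredding).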
Alohomora is a current research project in the ETOS group. It provides a Rust framework for writing web applications that helps developers write privacy-compliant code. Developers today lack practical tooling to check whether their code abides by applicable privacy policies, which leads to undetected violations. Remembering which policy applies to which data, and performing the appropriate checks, adds significant cognitive overhead to the development process. Developers therefore need small and clear regions of privacy-critical code to focus their attention on, and automatic guarantees for the remaining codebase.
In Alohomora, privacy-sensitive data gets wrapped into policy-enhanced container types (à la Resin), and is protected by Rust's type system guarantees unless the application externalizes the data (e.g., sending it over an HTTP RPC), combines it with other data (e.g., aggregating across users), or modifies it. Alohomora uses a combination of static analysis, sandboxing, and human review to increase the likelihood that operations over sensitive data are correct.
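Alohomora's containers are Rust types that lean on the type system; the following Python toy only illustrates the general pattern of a policy-carrying container, with all names invented (it is not Alohomora's API):

```python
class PolicyViolation(Exception):
    pass

class Policied:
    """Toy policy-carrying container: the raw value is only released
    through externalize(), which first checks the attached policy."""

    def __init__(self, value, policy):
        self._value = value
        self._policy = policy  # callable: context -> bool

    def map(self, fn):
        # Pure transformations keep the result inside the container.
        return Policied(fn(self._value), self._policy)

    def externalize(self, context):
        # Data leaves the container (e.g., in an HTTP response) only
        # if the policy permits it in this context.
        if not self._policy(context):
            raise PolicyViolation(f"release forbidden in context {context!r}")
        return self._value

# Example policy: an email address may only leave via the owner's own session.
email = Policied("ada@example.com", lambda ctx: ctx.get("user") == "ada")
```

In the real system the analogous checks are backed by static analysis rather than runtime dispatch, so most application code never needs manual review.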
As an early-stage project, Alohomora has plenty of potential for extension. Here are some ideas:
- Create privacy policies for an existing web application and enforce them using Alohomora.
- Create a small domain-specific language (DSL) for formally expressing privacy policies, and a mechanism for verifying that a small-to-medium Rust function respects it using automated theorem provers (e.g., Z3) or proof-assistants.
- Extend K9db to track user policies, and retrieve and aggregate these policies when corresponding data is queried from an Alohomora application.
- Create lightweight runtime protections for Python applications to ensure that applications meet their privacy policies, and identify any privacy critical code sections where the lightweight protections are insufficient and need to be augmented with manual or more expensive approaches.
This project will require you to learn Rust, but allows you to engage more closely with some of the key concepts we cover in the course. Familiarity with database-backed web applications is helpful.
A privacy linter is a new kind of program analysis tool that helps developers check that their software conforms to privacy policies. A privacy linter relates a flexible set of high-level privacy properties to concrete, frequently changing code, and must have good ergonomics, such as low program annotation overhead and useful diagnostics. In the ETOS group, we're currently developing Paralegal, a privacy linter tool for Rust programs.
In this project, you will apply Paralegal to an application to find potential privacy bugs. Your task is to come up with privacy policies for an existing web application and enforce them using Paralegal. This involves formalizing the privacy policies as properties written over Paralegal's semantic program dependency graph (SPDG), and adding markers to the source code that allow Paralegal to understand the semantic meaning of program constructs.
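Paralegal's real policies are written over its SPDG in its own format; as a rough intuition only, many such policies boil down to reachability over a dependency graph. A toy sketch with invented node names:

```python
from collections import deque

def violates(graph, sources, sinks, scrubbers):
    """Toy dataflow policy check: data from any `source` node must not
    reach a `sink` node without first passing through a `scrubber`.
    `graph` maps each node to a list of successor nodes."""
    for src in sources:
        queue, seen = deque([src]), {src}
        while queue:
            node = queue.popleft()
            if node in sinks:
                return True   # sensitive data reached a sink unscrubbed
            if node in scrubbers:
                continue      # scrubbed paths are fine; stop exploring
            for nxt in graph.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```

Markers in the source code play the role of the `sources`, `sinks`, and `scrubbers` sets: they tell the tool which program constructs carry which semantic meaning.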
This is a good project if you are interested in compilers and programming languages, want to learn more about Rust, and enjoy finding bugs. Some familiarity with, or willingness to learn, basic compiler concepts like data-flow and control-flow analysis is expected.
Try to make HotCRP, or other open-source web applications, GDPR compliant! You can approach this in different ways. You can implement the rights to access and deletion manually, by analyzing the schema of the application and determining what tables, joins, and conditions are required in order to select or delete the entirety of a user's data. Alternatively, you can annotate the schema of the web application with information about data ownership and its relation to different users, and automatically generate the required functionality by analyzing these annotations, or by using an automatic system for GDPR compliance developed by Malte's research group.
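The annotation-based approach might look like the following toy sketch, where each table is annotated with the column that links rows to their owning user. All table and column names are invented, and real code would use parameterized queries rather than string formatting:

```python
# Toy schema annotations: table -> column identifying the owning user.
# A value of None means the table holds no user-owned data.
OWNERSHIP = {
    "users":    "id",
    "papers":   "author_id",
    "reviews":  "reviewer_id",
    "settings": None,
}

def access_queries(user_id: int) -> list[str]:
    """Generate SELECTs that together fetch all of one user's data
    (right to access). Illustration only: use parameterized queries
    in real code to avoid SQL injection."""
    return [f"SELECT * FROM {t} WHERE {col} = {user_id}"
            for t, col in OWNERSHIP.items() if col is not None]

def deletion_queries(user_id: int) -> list[str]:
    """Generate DELETEs implementing the right to erasure."""
    return [f"DELETE FROM {t} WHERE {col} = {user_id}"
            for t, col in OWNERSHIP.items() if col is not None]
```

A real schema also needs joins for indirectly-owned rows (e.g., comments on a user's paper) and explicit rules for jointly-owned data, which is where most of the interesting design work lies.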
Wei et al. built and deployed a survey to study user perceptions of privacy and intrusiveness in online services, especially in the context of targeted advertising. This survey relies on participants exercising their GDPR rights to download their data, and then uploading this data to the survey software.
This project focuses on analyzing data acquired using the same GDPR rights, in order to provide additional analysis of data practices of online services. You are welcome to come up with your own questions that you would like to answer via this analysis, provided that the course staff think they are reasonably scoped and appropriate to the class, or you can use one of the suggestions below.
- Compare data across services: you can download your own data from multiple services and use data-science techniques to compare them. For example, you might find differences in how these services profile you, or differences in the ads they show you based on that profiling.
- Compare data across users: you can design a survey that compares the data of different users to find out how services treat users with similar or different interests or profiles. For example, you might find out whether they are shown similar or different ads, or whether advertisers target different or conflicting advertising at different groups of users.
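A simple starting point for either comparison is a set-similarity measure over the interest categories each service has inferred; the profiles below are made up for illustration:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two interest profiles (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical inferred-interest profiles from two services' data exports.
profiles = {
    "service_a": {"cooking", "cycling", "privacy", "travel"},
    "service_b": {"cooking", "cars", "privacy"},
}

similarity = jaccard(profiles["service_a"], profiles["service_b"])
```

From there, you could move to richer comparisons, e.g., clustering users by profile overlap or comparing ad categories rather than interests.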
The GDPR right to portability is one of the most technically challenging rights to implement, and one of the least stringently enforced in practice. This project studies how the spirit of this right may be realized in practice, so that users have an easy way of switching between competing similar services.
You will need to implement a tool that consumes the data a user extracts from their existing account with the source service and transforms it into a format that can be imported into a similar target service, or provides that target service with the data directly via its API (if it exposes one). Additionally, this entails finding meaningful data-translation or correspondence notions between these services (e.g., what is the Twitter equivalent of a Facebook event?).
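The core of such a tool is a mapping between source and target schemas. A minimal sketch, with field names invented (these are not the real export formats of either service):

```python
def facebook_event_to_tweet(event: dict) -> dict:
    """Translate a (hypothetical) Facebook event export record into a
    (hypothetical) tweet-like announcement for the target service."""
    text = f"{event['name']} at {event['place']} on {event['start_time']}"
    return {
        "text": text[:280],            # respect the target's length limit
        "created_at": event["start_time"],
        "source": "portability-import",
    }
```

The hard part in practice is that many correspondences are lossy or ambiguous, so your design should decide what to drop, what to approximate, and what to surface to the user for confirmation.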
There exist third-party tools that allow users to port their data between certain applications (e.g., Spotify to YouTube). You can use these tools as a high-level reference to guide your design.
Researchers have in recent years developed various encrypted and federated data stores, in which the party operating the store (e.g., a cloud provider or web service) does not have access to the data stored, or only to parts of it.
In this project, you will investigate how to add GDPR-compliant subject access requests (under the right to access and the right to deletion) to such a data store. As an example, consider the data collected for salary equity by the City of Boston, and what GDPR rights over that data might look like.
This project will teach you about secure multi-party computation (MPC); some familiarity with cryptography (or willingness to learn) is essential.
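To give a flavor of the MPC machinery involved, here is additive secret sharing over a prime field, a common building block of such federated designs (a toy sketch, not any particular system's protocol):

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; all arithmetic is modulo P

def share(value: int, n: int) -> list[int]:
    """Split `value` into n additive shares; any n-1 of them reveal
    nothing about the value on their own."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % P

def add_shared(x_shares: list[int], y_shares: list[int]) -> list[int]:
    # Parties can add shared values locally, without ever seeing
    # either plaintext -- the basis for aggregate queries like the
    # salary-equity computation.
    return [(x + y) % P for x, y in zip(x_shares, y_shares)]
```

A subject access request in such a store is interesting precisely because no single party can reconstruct a user's data, and deletion must invalidate shares held by multiple parties.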