Projects

Final projects should seek to answer a research question through implementation of a new idea in a real system. This could take one of several forms:

You may work on projects individually, or in groups of two to three students. Your project deliverables include a proposal, a progress report, a final paper describing design and implementation, your code, and a presentation. I will post the final presentation and writeup to the course website (unless you explicitly want it kept confidential for a good reason).

Important dates

Project proposal

Please use the OSDI 2018 submission template. Your proposal should be a one-page summary of what your idea is, how you plan to go about investigating it, and what techniques you will apply (or need to learn about beyond the course material).

Project ideas

Here's a list of some starter ideas to get you thinking. Please feel free to pursue your own ideas! Click on the project idea to get some more information.

Reproducing results is an important part of research, and an awesome way to really learn how something works! Pick any of the systems we learn about in the course and implement your own version of it. Then try to reproduce their results! You may wish to simplify some aspects of the system to make the reproduction practical within the time available.

Try to make HotCRP, or other open source web applications, GDPR compliant! You can approach this in different ways. You can implement the rights to access and deletion manually, by analyzing the schema of the application and determining what tables, joins, and conditions are required in order to select or delete all of a user's data. Alternatively, you can annotate the schema of the web application with information about data ownership and its relation to different users, and automatically generate the required functionality by analyzing these annotations, or by using an automatic system for GDPR compliance developed by Malte's research group.
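As a rough illustration of the annotation-based approach, the sketch below generates access and deletion queries from a small ownership map. It uses Python's built-in sqlite3 module; the table names, column names, and the OWNERSHIP map are hypothetical and do not reflect HotCRP's real schema, which would also need joins for indirectly owned data.

```python
import sqlite3

# Hypothetical ownership annotations: table -> column naming the owning user.
# These names are illustrative only, not HotCRP's actual schema.
OWNERSHIP = {
    "papers": "author_id",
    "reviews": "reviewer_id",
}

def access_user_data(conn, user_id):
    """Return every annotated row owned by user_id, grouped by table."""
    result = {}
    for table, owner_col in OWNERSHIP.items():
        query = f"SELECT * FROM {table} WHERE {owner_col} = ?"
        result[table] = conn.execute(query, (user_id,)).fetchall()
    return result

def delete_user_data(conn, user_id):
    """Delete every annotated row owned by user_id; return the row count."""
    deleted = 0
    for table, owner_col in OWNERSHIP.items():
        cur = conn.execute(f"DELETE FROM {table} WHERE {owner_col} = ?", (user_id,))
        deleted += cur.rowcount
    conn.commit()
    return deleted

def demo_database():
    """Build a tiny in-memory database matching the annotations above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT)")
    conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, reviewer_id INTEGER, body TEXT)")
    conn.executemany("INSERT INTO papers (author_id, title) VALUES (?, ?)",
                     [(1, "A"), (2, "B")])
    conn.executemany("INSERT INTO reviews (reviewer_id, body) VALUES (?, ?)",
                     [(1, "ok"), (1, "good"), (2, "meh")])
    return conn
```

Calling access_user_data(demo_database(), 1) returns the rows for user 1 from both tables. A real project would have to handle data reachable only through joins and data shared between users, which this direct-column sketch ignores.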

GDPR gives data subjects the right to control how their data is processed and for what purposes. This project focuses on designing a set of policies governing when data is allowed to be processed for certain tasks. These policies might be coarse-grained (for example, this data cannot be used for any advertising purposes) or finer-grained (for example, this data can only be used when combined with data from at least 100 other users, or this data can only be used in aggregates that are differentially private with some minimum privacy parameters).

In addition to designing this policy language, the project must devise a way to enforce these policies after they are set by the user. This would likely benefit from integrating with Pelton, a system for GDPR compliance in web applications developed by Malte's research group. Pelton organizes data physically by the owning user, and represents complex computations over it as dataflows. A possible implementation of this project idea may store policies attached to each user's physically separated data, analyze the respective dataflows, and check that a flow satisfies the set policies prior to feeding it any user data.
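To make the check-before-feeding step concrete, here is a minimal sketch of how per-user policies might gate a dataflow. The Policy and Flow classes and their fields are hypothetical and unrelated to Pelton's actual interfaces; the sketch covers only purpose denial and the "at least N other users" threshold, not differential privacy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    """Hypothetical per-user policy; field names are illustrative."""
    denied_purposes: frozenset = frozenset()
    min_other_users: int = 0  # data usable only alongside this many other users

@dataclass(frozen=True)
class Flow:
    """Hypothetical description of a dataflow."""
    purpose: str
    aggregates: bool  # True if the flow only emits aggregates over its inputs

def _threshold_ok(policy, flow, n_users):
    """Check the aggregation threshold given n_users total contributors."""
    if policy.min_other_users == 0:
        return True
    return flow.aggregates and (n_users - 1) >= policy.min_other_users

def users_allowed(flow, policies):
    """Return the set of users whose data may be fed into the flow.

    Users who deny the purpose are dropped first. Dropping a user can
    break another user's threshold, so repeat until the set is stable.
    """
    allowed = {u for u, p in policies.items()
               if flow.purpose not in p.denied_purposes}
    while True:
        kept = {u for u in allowed
                if _threshold_ok(policies[u], flow, len(allowed))}
        if kept == allowed:
            return kept
        allowed = kept
```

The fixpoint loop matters: if one user opts out of advertising, another user's "at least N others" condition may no longer hold, so the check cannot be done per user in isolation.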

Web frameworks and ORMs, such as Django, allow developers to express their schemas using flexible and customizable hierarchies of classes, rather than using SQL. This provides an opportunity to extend this schema language with privacy and ownership features, which can guide how the underlying system may automatically implement selected aspects of GDPR compliance, such as subject access, right to deletion, and purpose limitation.

This project entails extending ORM APIs with ways to encode developer-provided specifications of compliance, enforcing these specifications within the ORM or underlying systems, and demonstrating the effectiveness of the implementation by applying it to some simple ORM-based web application(s).
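One possible shape for such schema annotations is sketched below using plain Python dataclasses rather than a real ORM such as Django. The helpers owner() and personal(), the metadata keys, and the Order model are all hypothetical; a real project would attach equivalent information to the ORM's own field classes.

```python
from dataclasses import dataclass, field, fields, asdict

def owner():
    """Hypothetical annotation: this field identifies the owning data subject."""
    return field(metadata={"gdpr_owner": True})

def personal(purposes):
    """Hypothetical annotation: this field may only be read for these purposes."""
    return field(metadata={"gdpr_purposes": frozenset(purposes)})

@dataclass
class Order:
    id: int
    customer_id: int = owner()
    address: str = personal({"shipping"})
    total_cents: int = 0

def owner_field(model):
    """Find the single field annotated as the owner of rows of this model."""
    names = [f.name for f in fields(model) if f.metadata.get("gdpr_owner")]
    if len(names) != 1:
        raise ValueError(f"{model.__name__} needs exactly one owner field")
    return names[0]

def subject_access(rows, user_id):
    """Derive a subject-access export from the ownership annotation."""
    return [asdict(r) for r in rows
            if getattr(r, owner_field(type(r))) == user_id]

def readable_fields(model, purpose):
    """List the fields a computation with the given purpose may read."""
    out = []
    for f in fields(model):
        allowed = f.metadata.get("gdpr_purposes")
        if allowed is None or purpose in allowed:
            out.append(f.name)
    return out
```

Here readable_fields(Order, "advertising") omits the address field, which is one simple way an ORM layer could use the annotations to support purpose limitation.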

Rethink how Resin-like policies may be used to enforce purpose limitation while maintaining Resin's simple runtime-enforced taint propagation and enforcement. This may require redesigning the policy language of Resin to be more suitable for expressing restrictions on purposes of computation, and allowing end-users to set or configure their policies independently.
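The sketch below shows, under simplifying assumptions, what a purpose-aware policy object with taint propagation could look like. It is not Resin's implementation: Resin tracks policies inside the language runtime, whereas this sketch only propagates them through string concatenation, and the names PurposePolicy, Tainted, and send are hypothetical.

```python
class PurposePolicy:
    """Hypothetical policy object: the purposes a data owner permits."""
    def __init__(self, owner, allowed_purposes):
        self.owner = owner
        self.allowed = frozenset(allowed_purposes)

    def export_check(self, purpose):
        """Raise if the data is about to leave the system for a denied purpose."""
        if purpose not in self.allowed:
            raise PermissionError(f"{self.owner} does not allow purpose {purpose!r}")

class Tainted(str):
    """A string that carries the policies of the data it was derived from."""
    def __new__(cls, value, policies=()):
        obj = super().__new__(cls, value)
        obj.policies = frozenset(policies)
        return obj

    def __add__(self, other):
        # Concatenation merges the policies of both operands.
        merged = self.policies | getattr(other, "policies", frozenset())
        return Tainted(str.__add__(self, other), merged)

    def __radd__(self, other):
        # Handles plain_string + tainted_string.
        merged = self.policies | getattr(other, "policies", frozenset())
        return Tainted(str.__add__(other, self), merged)

def send(data, purpose):
    """Hypothetical output boundary: check every attached policy, then release."""
    for policy in getattr(data, "policies", ()):
        policy.export_check(purpose)
    return str(data)
```

For example, a message built as "To: " + email, where email is tainted with a policy allowing only "billing", passes send(msg, "billing") but raises PermissionError for send(msg, "advertising"). Other string operations (slicing, formatting) drop the taint in this sketch, which is exactly the gap that runtime-level tracking is meant to close.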

GDPRBench is a standard benchmark for measuring the overhead of GDPR compliance in database systems. The GDPRBench paper describes experiments on GDPR-compliant versions of Redis and Postgres. This project focuses on running the GDPRBench benchmark for a different database system and measuring the overhead in that setting.

Adding databases whose designs differ from the aforementioned ones might be the most interesting option, since their different designs might have sizable consequences for the overhead of achieving GDPR compliance. Examples include key-value stores such as RocksDB, or document stores such as MongoDB.

GDPR requests for data deletion rarely end up cascading to delete all data associated with that user completely. GDPR allows data to be preserved if the application requires it to comply with existing laws or contractual agreements. Data may be shared between multiple users, and deleting such data may be controlled by these users or a subset of them together. Finally, data essential to the application functionality may be blinded or anonymized, rather than deleted completely.

These scenarios differ from application to application, and must be encoded by developers as a specification for how compliance is implemented. This project investigates developing a set of deletion primitives that can express these diverse scenarios, drawing inspiration from DELF's deletion policies. You may choose to integrate these policies into Pelton, a system for GDPR compliance in web applications with simple deletion policies, or implement and evaluate them standalone.
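As a starting point for thinking about such primitives, the sketch below encodes the scenarios described above (full deletion, legally required retention, shared ownership, and anonymization) as four per-record policies. The OnDelete names and the record layout are hypothetical; they are neither DELF's annotations nor Pelton's interface.

```python
from enum import Enum

class OnDelete(Enum):
    """Hypothetical deletion primitives, one per scenario described above."""
    CASCADE = "cascade"        # remove the record outright
    RETAIN = "retain"          # keep unchanged (e.g. legal or contractual duty)
    ANONYMIZE = "anonymize"    # keep the record but sever the link to the user
    LAST_OWNER = "last_owner"  # shared data: remove once no owners remain

def apply_deletion(records, user):
    """Return the records that survive a deletion request from user.

    Each record is a dict with an "owners" set and a "policy" (OnDelete).
    Input records are not mutated; modified records are copied.
    """
    survivors = []
    for rec in records:
        if user not in rec["owners"]:
            survivors.append(rec)
            continue
        policy = rec["policy"]
        remaining = rec["owners"] - {user}
        if policy is OnDelete.CASCADE:
            continue
        if policy is OnDelete.RETAIN:
            survivors.append(rec)
        elif policy is OnDelete.ANONYMIZE:
            survivors.append({**rec, "owners": remaining})
        elif policy is OnDelete.LAST_OWNER and remaining:
            survivors.append({**rec, "owners": remaining})
    return survivors
```

A fuller design would likely attach these policies to relationships in the schema rather than to individual records, and would need a story for what "anonymize" does to the record's contents, which this sketch leaves untouched.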

Dory and Coeus allow users of a web application with many stored documents to search for and retrieve documents by keywords privately (i.e., without revealing their search query or retrieved document to the service). However, these systems rely on expensive cryptographic primitives (ORAMs, homomorphic crypto schemes) to maintain privacy.

An alternative design can make this functionality more efficient. Instead of relying on homomorphic encryption, the web application may be federated between multiple entities (e.g., Wikipedia and the EFF). Clients can submit their queries in secret-shared form to both parties, such that each party has no information about the query separately but can perform the desired computation together.
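To give a feel for secret-shared queries, here is a minimal sketch of the classic two-server XOR-based private retrieval scheme, chosen here only as an illustration. It retrieves a document by index rather than by keyword, assumes all documents are padded to the same length, and assumes the two servers do not collude; it is not the protocol used by Dory or Coeus, and the function names are hypothetical.

```python
import secrets

def share_query(index, n_docs):
    """Client: split a one-hot selection vector for index into two XOR shares.

    Each share on its own is a uniformly random bit vector, so a single
    server learns nothing about which document is requested.
    """
    share_a = [secrets.randbelow(2) for _ in range(n_docs)]
    share_b = list(share_a)
    share_b[index] ^= 1  # the shares differ only at the requested index
    return share_a, share_b

def answer(docs, share):
    """Server: XOR together the (equal-length) documents selected by the share."""
    acc = bytes(len(docs[0]))
    for doc, bit in zip(docs, share):
        if bit:
            acc = bytes(x ^ y for x, y in zip(acc, doc))
    return acc

def reconstruct(answer_a, answer_b):
    """Client: XOR the two answers; every document except the target cancels."""
    return bytes(x ^ y for x, y in zip(answer_a, answer_b))
```

Each server does work linear in the size of the collection, with no expensive cryptographic operations. Extending this idea to keyword search and reducing the linear cost are among the questions a project in this area would need to address.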

This project focuses on implementing and measuring various iterations of multi-party algorithms and protocols for document search and retrieval. In addition to learning about novel cryptographic techniques and using state-of-the-art cryptographic systems, this project will help you learn how cryptographic protocols are designed, optimized, and evaluated.

Wei et al. built and deployed a survey to study user perceptions of privacy and intrusiveness in online services, especially in the context of targeted advertising. This survey relies on participants exercising their GDPR rights to download their data, and then uploading this data to the survey software.

This project focuses on analyzing data acquired using the same GDPR rights, in order to provide additional analysis of data practices of online services. You are welcome to come up with your own questions that you would like to answer via this analysis, provided that the course staff think they are reasonably scoped and appropriate to the class, or you can use one of the suggestions below.

The GDPR right to portability is one of the technically most challenging rights to implement, and one of the least stringently enforced rights in practice. This project studies how the spirit of this right may be realized in practice, so that users have an easy way of switching between competing similar services.

You will need to implement a tool that consumes data that a user extracts from their existing account with the source service, and either transforms it into a format that can be imported into a similar target service or provides that target service with this data directly via its API (if it exposes one). Additionally, this entails finding meaningful data translation or correspondence notions between these services (e.g. what is the Twitter equivalent of a Facebook event?).

There exist some third-party tools that allow users to port their data between certain applications (e.g. Spotify to YouTube). You can use these tools as a high-level reference to guide your design.

A variety of laws governing data privacy and protection have been enacted in different countries over recent years. Some of these laws are general-purpose and have broad applicability (e.g. GDPR, CCPA), while others are domain-specific (e.g. HIPAA, FERPA).

In this project, you will write a comparative analysis of 2-3 such laws. Your analysis may focus on differences in the scope of the laws, the rights or expectations of privacy that they set, as well as the technical consequences they may have on computer systems and common applications. In particular, you may focus on how some of the techniques or papers that we see throughout the course can be used to comply with these different laws.