Can software offer better data privacy by construction?
Web services that store and process sensitive personal data are critical to the digital economy today, but are often built without sufficient attention to users' rights over their data and its privacy. Doing a good job at data privacy is difficult, however, and requires substantial manual effort that costs billions of dollars every year.
The goal of this research project is to develop new software systems that fundamentally "democratize" good privacy practices, make it easy for users and web service operators to handle data in compliance with privacy laws, and retain or improve the performance of today's software.
Privacy-Compliant Storage Systems.
Easier compliance with privacy laws (GDPR, CCPA) using off-the-shelf software.
Privacy laws like the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) give users new rights to control their data, with non-compliance carrying the risk of steep fines. But with today's systems, compliance with these rights requires onerous manual labor, particularly from small and medium-sized organizations.
We are designing new storage and data processing systems that automate compliance with privacy legislation. Realizing this "compliance by construction" requires innovation in system design: for example, we are developing a new database architecture that replaces relational tables (which mix different users' data) with per-user micro-databases (µDBs) as a primary abstraction. Making such a federation of µDBs efficient requires new techniques to track the impact of changes to users' µDBs on derived data, and our system relies on dataflow computing, a well-understood technique from scalable big data processing, to make compliant-by-construction web services efficient.
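To make the µDB idea concrete, here is a minimal Python sketch. It is not the project's implementation: the class `MicroDBStore`, its methods, and the `posts_per_topic` view are hypothetical names chosen for illustration. It assumes each user's rows live in their own micro-database and that one derived view (posts per topic) is maintained incrementally, in the spirit of dataflow, so that removing a user's µDB also retracts that user's contributions from the derived data.

```python
from collections import defaultdict


class MicroDBStore:
    """Toy federation of per-user micro-databases plus one derived view.

    Hypothetical illustration only; a real system would persist data and
    maintain many derived views through a dataflow engine.
    """

    def __init__(self):
        # user -> list of (post_id, topic) rows owned by that user
        self.udbs = defaultdict(list)
        # derived data: topic -> number of posts across all users
        self.posts_per_topic = defaultdict(int)

    def insert_post(self, user, post_id, topic):
        # Write into the owner's micro-database...
        self.udbs[user].append((post_id, topic))
        # ...and push a +1 delta into the derived view.
        self.posts_per_topic[topic] += 1

    def export_user(self, user):
        # Data access request: everything the user owns is in one place.
        return list(self.udbs.get(user, []))

    def delete_user(self, user):
        # Erasure request: drop the micro-database and retract each row's
        # contribution to the derived view with a -1 delta.
        for _post_id, topic in self.udbs.pop(user, []):
            self.posts_per_topic[topic] -= 1
            if self.posts_per_topic[topic] == 0:
                del self.posts_per_topic[topic]


store = MicroDBStore()
store.insert_post("alice", 1, "privacy")
store.insert_post("bob", 2, "privacy")
store.insert_post("alice", 3, "databases")
store.delete_user("alice")
print(dict(store.posts_per_topic))  # {'privacy': 1}
```

In this sketch, both export and erasure are operations on a single µDB. The derived view stays consistent because every change is propagated as a delta rather than recomputed from scratch.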
Flexible User Data Control with Edna.
New user data control choices via systematic data sealing in web services.
Privacy in complex, data-rich applications is hard. Consider a user who wants to remove their account from a service: even once all their data is found, only some of it should be removed; other data should be anonymized or decorrelated (for legal reasons, or to maintain application utility for other users). Or, a user might wish to disavow and anonymize some of their contributions, but retain others. Some of these transformations should also be reversible in case the user wants to return or reassociate with their data.
Edna is a library that helps web applications implement secure data sealing and revealing without breaking application functionality for other users. Edna helps developers generate privacy transformations for database-backed web applications from a high-level specification and preexisting data relationships. It simplifies privacy transformations that applications use today (such as account deletion), and also makes it easier to support fine-grained and nuanced policies that would be cumbersome to implement manually today (e.g., structural decorrelation of data, or "decay" of identifying information over time).
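As a rough illustration of reversible sealing, the Python sketch below decorrelates a user's posts by giving each one a fresh pseudonymous author, and records what is needed to undo the change. This is not Edna's API: `SealingSketch`, `seal_user`, `reveal`, and the in-memory `vault` are hypothetical names. The sketch also assumes a simplification: the reveal record is kept in plain form, whereas a secure design would protect it so that only the user can trigger a reveal.

```python
import secrets


class SealingSketch:
    """Toy sealing/revealing over an in-memory posts table (hypothetical)."""

    def __init__(self):
        self.posts = {}  # post_id -> {"author": ..., "text": ...}
        self.vault = {}  # reveal token -> list of (post_id, original author)

    def add_post(self, post_id, author, text):
        self.posts[post_id] = {"author": author, "text": text}

    def seal_user(self, user):
        """Decorrelate the user's posts; return the token needed to reveal."""
        token = secrets.token_hex(16)
        record = []
        for post_id, post in self.posts.items():
            if post["author"] == user:
                record.append((post_id, user))
                # A fresh pseudonym per post, so the sealed posts can no
                # longer be linked to the user through a shared author.
                post["author"] = "anon-" + secrets.token_hex(4)
        self.vault[token] = record
        return token

    def reveal(self, token):
        """Reassociate sealed posts with their original author."""
        for post_id, author in self.vault.pop(token):
            if post_id in self.posts:  # skip posts removed in the meantime
                self.posts[post_id]["author"] = author


app = SealingSketch()
app.add_post(1, "alice", "first post")
app.add_post(2, "bob", "a reply")
app.add_post(3, "alice", "second post")

token = app.seal_user("alice")
sealed_authors = [app.posts[1]["author"], app.posts[3]["author"]]

app.reveal(token)
restored_authors = [app.posts[1]["author"], app.posts[3]["author"]]
```

The posts themselves stay in place while sealed, so threads and replies from other users keep working. Only the link between the content and the user's identity is removed, and later restored on reveal.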
Retrofitting GDPR Compliance onto Legacy Databases
Archita Agarwal, Marilyn George, Aaron Jeyaraj, Malte Schwarzkopf
Privacy Heroes Need Data Disguises
Lilian Tsai, Malte Schwarzkopf, Eddie Kohler
GDPR Compliance by Construction
Malte Schwarzkopf, Eddie Kohler, M. Frans Kaashoek, Robert Morris
Poly 2019 workshop at VLDB 2019