Can software offer better data privacy by construction?
Web services that store and process sensitive personal data are critical to today's digital economy, but they are often built without sufficient attention to users' rights over their data and its privacy. Doing a good job at data privacy is difficult, however, and requires substantial manual effort that costs billions of dollars every year.
The goal of this research project is to develop new software systems that fundamentally "democratize" good privacy practices, make it easy for users and web service operators to handle data in compliance with privacy laws, and retain or improve the performance of today's software.
Privacy-Compliant Storage Systems.
Easier compliance with privacy laws (GDPR, CCPA) using off-the-shelf software.
Privacy laws like the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) give users new rights to control their data, with non-compliance carrying the risk of steep fines. But with today's systems, honoring these rights requires onerous manual labor, which is a particular burden for small and medium-sized organizations.
We are designing new storage and data processing systems that automate compliance with privacy legislation. Realizing this "compliance by construction" requires innovation in system design: for example, we are developing a new database architecture that replaces relational tables (which mix different users' data) with per-user micro-databases (µDBs) as a primary abstraction. Making such a federation of µDBs efficient requires new techniques to track the impact of changes to users' µDBs on derived data. To make compliant-by-construction web services efficient, our system relies on dataflow computing, a well-understood technique from scalable big data processing.
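To make the µDB idea concrete, here is a minimal Python sketch. It is a toy illustration, not the project's actual system: the class `MicroDBStore`, its methods, and the `posts_per_topic` view are hypothetical names chosen for this example. Each user's records live in their own µDB, and a derived per-topic count is updated incrementally on every change, in the spirit of dataflow-style view maintenance.

```python
from collections import defaultdict


class MicroDBStore:
    """Toy federation of per-user micro-databases (hypothetical API)."""

    def __init__(self):
        # One list of records per user: the user's µDB.
        self.udbs = defaultdict(list)
        # Derived data (posts per topic), maintained incrementally
        # rather than recomputed from all µDBs on each read.
        self.posts_per_topic = defaultdict(int)

    def insert(self, user, record):
        """Add a record to the user's µDB and propagate the change."""
        self.udbs[user].append(record)
        self.posts_per_topic[record["topic"]] += 1

    def export_user(self, user):
        """Return a copy of everything stored for one user (access request)."""
        return list(self.udbs.get(user, []))

    def delete_user(self, user):
        """Drop the user's µDB and retract its effect on derived data."""
        for record in self.udbs.pop(user, []):
            topic = record["topic"]
            self.posts_per_topic[topic] -= 1
            if self.posts_per_topic[topic] == 0:
                del self.posts_per_topic[topic]


store = MicroDBStore()
store.insert("alice", {"topic": "privacy", "text": "first post"})
store.insert("alice", {"topic": "databases", "text": "second post"})
store.insert("bob", {"topic": "privacy", "text": "hello"})

alice_export = store.export_user("alice")  # all of alice's data in one place
store.delete_user("alice")                 # derived counts shrink accordingly
remaining_counts = dict(store.posts_per_topic)
```

Because all of a user's data sits in one µDB, export and deletion touch a single, well-defined unit, and the derived view stays consistent without a full recomputation. A real system would also need persistence, sharing between users, and far more general derived computations, none of which this sketch attempts.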
Flexible Privacy with Data Disguises.
New privacy choices via systematic data transformation with privacy-preserving "disguises".
Privacy in complex, data-rich applications is hard. Consider a user who wants to remove their account from a service: even once all their data is found, only some of it should be removed; other data should be anonymized or decorrelated (for legal reasons, or to maintain application utility for other users), and some of these transformations should be reversible in case the user wants to return.
Data disguising is a systematic approach that helps developers generate privacy transformations for database-backed web applications from a high-level specification and preexisting data relationships. Data disguising simplifies privacy transformations that applications use today (such as account deletion), supports fine-grained and nuanced policies that would be cumbersome to implement manually today (e.g., structural decorrelation of data, or "decay" of identifying information over time), and enables reversible transformations for users who change their mind.
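The following Python sketch illustrates one way such a disguise could look. It is an assumption-laden toy, not the authors' tool: the specification format `ACCOUNT_DELETION`, the functions `apply_disguise` and `reveal`, and the in-memory "database" of lists of dictionaries are all hypothetical. The spec says, per table, whether a user's rows are removed or decorrelated (re-attributed to a fresh pseudonym), and the returned record allows the disguise to be reversed.

```python
import copy
import itertools

_pseudonyms = itertools.count(1)  # source of fresh pseudonym numbers

# Hypothetical high-level spec: what happens to a departing user's rows.
ACCOUNT_DELETION = {
    "users":    {"action": "remove",      "owner_col": "id"},
    "messages": {"action": "remove",      "owner_col": "sender"},
    "posts":    {"action": "decorrelate", "owner_col": "author"},
}


def apply_disguise(db, spec, user_id):
    """Disguise user_id's rows in place; return a record for reversal."""
    for rule in spec.values():
        if rule["action"] not in ("remove", "decorrelate"):
            raise ValueError(f"unknown action: {rule['action']}")
    vault = []
    for table, rule in spec.items():
        col = rule["owner_col"]
        kept = []
        for row in db[table]:
            if row[col] != user_id:
                kept.append(row)  # other users' data is untouched
            elif rule["action"] == "remove":
                vault.append(("removed", table, copy.deepcopy(row)))
            else:
                # Decorrelate: keep the row, but give it a fresh pseudonym
                # so it no longer links to the user or to their other rows.
                vault.append(("decorrelated", table, row["id"], col, row[col]))
                row[col] = f"anon-{next(_pseudonyms)}"
                kept.append(row)
        db[table] = kept
    return vault


def reveal(db, vault):
    """Undo a disguise using the record produced by apply_disguise."""
    for entry in reversed(vault):
        if entry[0] == "removed":
            _, table, row = entry
            db[table].append(row)
        else:
            _, table, row_id, col, original = entry
            for row in db[table]:
                if row["id"] == row_id:
                    row[col] = original


db = {
    "users": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
    "posts": [{"id": 10, "author": 1, "text": "hi"},
              {"id": 11, "author": 2, "text": "yo"}],
    "messages": [{"id": 20, "sender": 1, "text": "secret"}],
}
vault = apply_disguise(db, ACCOUNT_DELETION, user_id=1)
disguised = copy.deepcopy(db)  # snapshot of the disguised state
reveal(db, vault)              # the user changed their mind
```

In this sketch the reversal record is simply returned to the caller. Where such a record should live, and who may access it, is a design question the sketch leaves open.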
Retrofitting GDPR Compliance onto Legacy Databases
Archita Agarwal, Marilyn George, Aaron Jeyaraj, Malte Schwarzkopf
To appear at VLDB 2022
Privacy Heroes Need Data Disguises
Lilian Tsai, Malte Schwarzkopf, Eddie Kohler
GDPR Compliance by Construction
Malte Schwarzkopf, Eddie Kohler, M. Frans Kaashoek, Robert Morris
Poly 2019 workshop at VLDB 2019