Thesis Proposal

 

"Cryptography and Similarity for Multidimensional Data"

Evgenios M. Kornaropoulos

Wednesday, March 22, 2017, at 3:00 P.M.

Room 506 (CIT 5th Floor)

Designing systems that discover, organize, and utilize information among collections of multidimensional data while maintaining the privacy of the participants is of paramount importance in a variety of applications. To name a few examples, e-discovery is an area where similarity queries are issued for a collection of private legal files that may contain corporate secrets; personalized medical treatments harvests the benefits of securely discovering other patients with similar health records, and storage services offer similarity-based deduplication backup and archival solutions.

In this thesis, we address the challenges of combining cryptographic techniques with similarity-related functionalities over multidimensional data. Specifically, we study the notion of similarity between multidimensional data objects from the perspective of secure multiparty computation and structured encryption and as a cryptographic primitive for secure storage applications. Using the lessons learned from uncovering vulnerabilities of state-of-the-art privacy-preserving techniques, we propose new cryptographic constructions that avoid the above attacks.

Our preliminary results introduce the notion of perturbation attacks on secure approximation protocols. In a perturbation attack, a party can minimally perturb her data (1) offline and (2) before the execution of any secure protocol so as to steer the approximation result to a maliciously chosen output. We introduce two attacks, with rigorous success guarantees, on the well-studied sketching techniques of minhashing and simhashing. In order to mitigate perturbation attacks, we introduce a server-aided architecture. In this new design, there is an additional party, the server, that assists in the secure similarity approximation by handling the common randomness as private data. We introduce and implement the necessary secure protocols so as to apply the minhashing and simhashing sketching techniques in the new server-aided architecture.

In our proposed work we will study reconstruction attacks on databases that securely respond to k-nearest neighbor (k-NN) queries. Our leakage-abuse attack is going to be solely based on the access pattern leakage. Such a result would demonstrate that regardless of the specifics of the cryptographic construction, a structured encryption scheme for k-NN queries can be recovered under the weakest form of adversary.

The second objective of our proposed work is to study the properties of secure similarity-based deduplication. We are planning to formulate a cryptographic primitive that captures the security properties of a similarity-based deduplication approach. We expect that this primitive is going to be the building block for future deduplication mechanisms that operate on encrypted data.

Host: Roberto Tamassia