CSCI 2390 2021 Meeting 1: Introduction

CSCI 2390: Privacy-Conscious Computer Systems

What does the course title mean?
  Realistic systems whose design takes user privacy into account.
  "Conscious" because this is about adding privacy-awareness to systems that have some other primary purpose.

Introductions

What will we do?
  Read, present, and discuss research papers.
  Try out practical systems.
  Complete a research project of your own.

Problem setting: a typical web application
  [user browsers, internet, web servers w/ app logic, DB]
  users' data in DB, held on servers not under user control
    social media posts
    pictures
    emails
    e-commerce orders
    telemetry data about how user interacts with web site
    ...
  very successful model!
    always available
    no need to run servers
    no need to back up data
    application can combine multiple users' data
  running servers + DB costs money
    users pay?
      turns out people like "free" services better
    often: company monetizes user data
  problem: lack of control
    web service sets rules for who may access data
    web service controls how data is stored and secured
    web service may snoop, sell information
    web service employees may snoop for personal reasons
    web service may snoop on behalf of the government
    any control over data that exists is through web UI or API defined by company
      can be more restrictive than users desire
        ex: FB privacy settings
      can have bugs that expose private data!
        ex: FB graph search
  reality is even more complex than this picture
    not just DB, but multiple storage systems
    machine learning models trained over data
    logging, backups, A/B testing, internal data analysis

What can we do about this?
  option 1: give up
    accept that we're entering a Faustian bargain for free services
    could have drastic consequences in the future
      makes us vulnerable to identity theft, embarrassment, repression
      gives companies enormous, unprecedented power
  option 2: stop using web services
    unlikely to be realistic
    use often externally mandated (ex: by employer, university, etc.)
  option 3: go rogue, organize the grassroots
    replace all services with decentralized alternatives
      cryptocurrencies, blockchains, P2P
    idea: trust no one, all data resides on users' devices
    practicality unclear, but we'll look at this
  option 4: lobby for change
    companies want to avoid embarrassing scandals
      but little incentive to change their business model
    legislators increasingly realize there's a problem
      but laws are slow to change, and intentionally vague
      technology moves much more rapidly
  our role as engineers and researchers: provide solutions that can help
    privacy not a chief concern in system design today
    traditionally we didn't care about how systems we built are used
      ex: general purpose relational database

What are our goals?
  improve user privacy
    vague, and can mean many things
    candidate 1: make users anonymous, strong protections, tinfoil hats
      certainly reasonable in some settings, ex: dissidents communicating
    candidate 2: increase control over what companies do with our data
      less ambitious, but more compatible with current user experience
      aim to avoid preventable data exposure and unwanted data use
  concerns
    maintain current successful web site architecture?
      easy to program for developers
      good performance, reliability
      users generally happy
      can provide useful services by combining users' data
        ex: crowdsourced recommendations
      successful revenue model exists (advertising)
    overheads imposed on web site operators
      extra development cost
      extra hosting cost (backend performance, space)
      extra complexity
      restrictions in what backend can do
      costs of monitoring compliance
    overheads imposed on users
      need to understand implications of privacy choices
        already unwilling to read privacy policies...
      for some solutions, need to be much more active participants in service
  make technology recommendations
    industry and government need them
    but need to be realistic about chances of adoption
  invent new technology
    may come in the form of different system designs
    may apply existing research ideas in new ways
    may help better inform users about choices and their implications

Course structure
  https://cs.brown.edu/courses/csci2390/2021
  today: overview and discussion about privacy
  further meetings will be paper discussions (not lectures)
    either about a research paper, or a piece of software/project
  from meeting 4 onwards, you will present papers
    presentation (~30 min)
      highlight key ideas and lessons
      track down hard details (may want to read further related work)
      raise questions for discussion
      if possible, try out the program or system
        don't worry, we'll help
    each discussion (50 min) has leaders
      sign up via form linked from the schedule page for a paper to lead
  everyone should read and think about every paper
    NOT just turn up and wait for the discussion leads to explain!
    prepare to ask and answer questions!
    critique/support the work
      on a technical level (correctness, methods, performance?)
      on a pragmatic level (deployable, practical, flexible?)
  assignment: paper questions
    set questions about the paper
    space to ask *your* questions
    submit answers by 11pm on the day before class
  three larger assignments
    1. request your personal data from web services and analyze it
    2. a GDPR case study & presentation
    3. differential privacy implementation exercise
  project
    pick a research idea, inspired by the course or related
    design, build and evaluate solution, check if the idea is good
    proposal (due Oct 1)
    project conferences
    report, final short presentation
    groups encouraged, but can work on your own too

Constitutional podcast episode
  https://www.washingtonpost.com/podcasts/constitutional/episode--privacy/
  what is privacy?
    "right to be left alone"
    protection from state, other individuals, corporations
  notion has evolved over time
    ex: American revolution
      in part about protection of property from search and seizure without warrant
      4th amendment to U.S. constitution explicit about this
      => protection from state
    ex: tabloid press around 1900
      protection of individual from distress due to gossip
      Brandeis tort laws
      recent: Hulk Hogan vs. Gawker media
      => protection from media/other individuals
    ex: Olmstead vs. U.S.
      wiretapping, new technology, how does 4th amendment apply?
      => protection from state abusing new tech
    ex: Carpenter vs. U.S.
      new tech vs. third party doctrine
        giving information to 3rd party => loss of expectation of privacy
        very problematic with today's technology!
      => protection from government, but also 3rd parties
  Q: what does privacy mean on the internet today?

For next Tuesday:
  read paper on targeted advertising on Twitter
  submit & answer questions online