CSCI-2390: Privacy-Conscious Computer Systems

⚠️ This is not the current iteration of the course! Head here for the current offering.

Assignment 1: Take Control of Your Data!

In this assignment, you will investigate what degree of control you have over your personal data stored in different web services, learn something about what these web services know about you, and reflect on the challenges for privacy that can result. The goal is to understand how services collect and retain data, and how they make it available to their users.

Overview of what you will do:

Request and download your data from a few web services that you use.
Answer a few questions and write a short reflection on what you found.

Please read the detailed instructions below, as they are designed to help make the assignment easier for you. As you work through the tasks, you will encounter six questions; please take notes as you go along, as you'll write up your answers at the end.

Step 1: Investigate ways to get your data

We all use many web services today. Some of these services give users access to their own data in bulk form: in other words, you can download the information that you have given to the service, as well as information that the service stores about you.

In many cases, the companies behind these web services use your data to make money, and they may share your data with other organizations as part of this business model. Sometimes, you can find out what information has been shared with whom.

Your task is to pick a few web services that you use on a regular basis, and to investigate three things:

whether the service makes your data available for access and download;
what data is available; and
how easy it is to get access.

For some large services, you can find instructions online (e.g., Google, Facebook, Twitter, Instagram, Spotify), while others may require you to make a request (e.g., Pinterest, Netflix). For other services, a web search for "<service name> download data" often helps find instructions. Since web services have to provide this information under the GDPR, it can also help to look for GDPR-related instructions (e.g., search for "<service name> GDPR data request").

In addition, there are sometimes other places where you can view data about yourself. Twitter, for example, gives you the option to request a list of advertisers who have "uploaded" your contact details as part of an "audience" (go to "Settings", "Account", "Your Twitter Data", "Interests and ads data", "Tailored audiences").

It may also be very interesting to look into smaller sites and services, which are often less well-placed to provide this information. Some may require you to make a manual request (e.g., by email, or postal letter); you're welcome to do so if you'd like, but there is no expectation that you jump through huge hoops for this assignment. If you rely on the GDPR for your request, remember that strictly speaking only users in the EU are able to exercise rights under the GDPR; if you happen to be in an EU country, feel free to do so, but if you are in the US, services do not have to comply with your request under the GDPR.

Make sure to also keep notes on which services do not provide a way to access your data!

Step 2: Get the information

Now download or request your data. It may take a little while to get it; some services make you wait for anything from minutes to hours or days until they supply you with a file to download.

Question 1: Think about why services take a while to return your data. What technical reasons could there be? What non-technical, policy reasons could there be?

Look over the results, and choose one or two services as examples to answer the following questions.

Note: some services (e.g., Twitter, Facebook) supply a ZIP archive that contains HTML files to visualize data for easy viewing. However, the visualized data may not be everything that's in the archive in raw form! Do make sure to check the contents of files (e.g., JavaScript files, JSON files) for additional data. Twitter, for example, includes highly detailed information about ad targeting, but does not visualize it.
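If you'd like a quick way to see what an archive contains beyond its HTML pages, a short script can help. The following is only a minimal sketch in Python: the function name `inventory_archive`, the set of extensions treated as "raw data", and the example file name in the comment are illustrative assumptions, not something any particular service prescribes.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Extensions we assume are likely to hold raw data (adjust for your archive).
RAW_EXTENSIONS = {".js", ".json", ".csv", ".txt"}


def inventory_archive(archive):
    """Count files by extension and list likely raw-data files in a ZIP archive.

    `archive` may be a file path or a binary file-like object.
    Returns (Counter of extensions, sorted list of (name, size in bytes)).
    """
    counts = Counter()
    raw_files = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            ext = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[ext] += 1
            if ext in RAW_EXTENSIONS:
                raw_files.append((info.filename, info.file_size))
    return counts, sorted(raw_files)


# Example use (hypothetical file name):
#   counts, raw_files = inventory_archive("my-data-export.zip")
#   print(counts.most_common())
#   for name, size in raw_files:
#       print(f"{size:>10}  {name}")
```

Large raw files that have no corresponding HTML page are a good place to start looking for data that the service does not visualize.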

Step 3: Analyze what you got

Take a good look at the data you received. It may come in a variety of formats, such as a ZIP archive containing JSON files, or a big text dump.

Question 2: How is the data organized? Is it easy to understand what it contains, and would it be possible to process this information in an automated way (e.g., to visualize to non-technical users what information the service keeps about them)?
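One way to get a feel for how amenable the data is to automated processing is to summarize the structure of a JSON file without reading every record. The sketch below is one possible approach in Python, assuming the file parses as standard JSON; the function name `summarize` and the path notation (`$` for the root, `[]` for list elements) are illustrative choices.

```python
import json
from collections import Counter


def summarize(value, path="$", out=None):
    """Count how often each key path holds a leaf value of each type.

    Returns a Counter keyed by (path, type name), e.g. ("$[].name", "str").
    """
    if out is None:
        out = Counter()
    if isinstance(value, dict):
        for key, child in value.items():
            summarize(child, f"{path}.{key}", out)
    elif isinstance(value, list):
        for child in value:
            summarize(child, f"{path}[]", out)
    else:
        out[(path, type(value).__name__)] += 1
    return out


# Example use (hypothetical file name):
#   with open("ads.json", encoding="utf-8") as f:
#       for (path, kind), n in sorted(summarize(json.load(f)).items()):
#           print(f"{n:>8}  {kind:<6}  {path}")
```

If the output shows a small, regular set of paths, the data is probably easy to process automatically; many irregular or opaque paths suggest it is not.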

Now, look into the content itself. You'll probably find many things you expect, such as posts you made, pictures you uploaded, etc.; but there may also be other information that you did not directly provide (e.g., information related to advertising). This data tells you something about how the service is using your data, and what information they track about you. (The service's privacy policy is also a good source for this information, but may provide it only in rather vague terms. You can use Polisis to get a summary of services' privacy policies.)

Question 3: What were you surprised by? Was there data you did not expect to see? Did you find information that you didn't even remember this service having? Did the data include any information about other users? Were you concerned about anything you found?

Finally, reflect on how you use the service, what it does, and what types of information the service needs to work. Often, this includes information that you did not directly provide: for example, the business model of many web services is targeted advertising, but in order to allow advertisers to target you, the service must classify you into a set of categories available to advertisers (e.g., demographics or interests).

Question 4: To your best guess, is this data complete? In other words, are there actions you took or information you provided to the site that you expect to have yielded personal data stored or processed that the service did not return to you?

There are certain categories of data – such as fully anonymous data that cannot be linked to an individual even with the aid of additional datasets, or information covered by any of the exemption cases in laws like the GDPR – that services are allowed to keep private. If there is something missing, reflect on whether you think that information would be covered by these.

Step 4: Reflect, and write up your experience

Finally, reflect on your experience retrieving and analyzing your data.

Question 5: Was it easy to get your data, and to understand the contents? Could you imagine less technical users (e.g., older family members) making use of this facility, and would they benefit? Did your opinion of the web service change at all in response to what you found?

While it is good to have convenient web services, the accumulation of sensitive data on computers that you do not control can also expose you to risks. Let's think about what could happen.

Question 6: What dangers are inherent in this data? What could an adversary (e.g., a hacker who compromises the web service, or someone who impersonates you to request the data) do with the information included? (Feel free to answer this question in general or hypothetical terms if you prefer to avoid revealing information about yourself or your data.)

Now write up your answers to the questions in a short document (1-2 pages; a paragraph for each question suffices). You're welcome to use whatever text format and editor suits you, but please submit a PDF, Markdown file, or plain text document.

Task: Submit your answers by email to Malte by 11pm (Eastern time) on Friday, September 18, 2020.