Some have professed themselves shocked that companies like Facebook and Google share user data with advertisers. Did any of these individuals ever stop to ask themselves why Facebook and Google don't charge for access?
Nothing in life is free. Everything involves trade-offs.
SEN. ORRIN G. HATCH, M. ZUCKERBERG SENATE HEARING - 10 APRIL 2018
We use multiple services every day: a search engine to look for information, cloud storage to keep our data, social networks to connect with people or to scroll through news. These services let us navigate the net easily, most of the time for free. It is often said that if we are not paying for a product, we are the product. Services such as Google and Facebook sustain themselves mainly through data collection: our digital traces are carefully stored and used to profile our behaviours, interests and identities.
After the Cambridge Analytica leak and the introduction of the GDPR in the EU, we became much more aware of our "data pouring": how much data do we give to tech companies for free?
Is it possible to have control over those data?
We decided to investigate this issue from top to bottom by digging into our own profiles and data. Our work aims to test how companies behave when it comes to data collection and transparency.
This project investigates how companies make personal data accessible.
We tried to split into categories the personal data that we give to companies by using their products. We analyze both the quantity and the quality of those data. This is a starting point for observing how companies behave in relation to the new regulations. Our work is not meant to be complete: it is a starting point for everyone who wants to take back control of their own data and investigate further the role of tech companies in our lives.
We analyzed the personal data that we gave out to companies. Our goal is to understand what information is being collected and how readable the downloaded data are for users. We selected seven different platforms.
We chose the most famous social networks and services, based on their popularity among internet users.
We analyzed Facebook, Instagram, Google, Twitter, WhatsApp, Apple and Spotify.
First step: downloading our personal data.
Every platform should allow its users to access and consult their data. Once we had our folders, we started by mapping the path to every single file, then we assigned each file a label according to its content. The labeling was done manually, for a qualitative result. First, we read all the files to decide on the main categories. Then, we assigned a category to every folder. We created two layers of categories. The first is more specific and describes the folder content. The second is wider and includes more general categories that describe the nature of the files (e.g. interaction, interest, list of contents, etc.). This work is fundamental: it gave us standard labels that allow a comparison between platforms.
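The mapping step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure we actually followed (our labeling was manual): the function name map_export, the LABELS table and its folder names are all hypothetical, and the sketch labels each file by its top-level folder only.

```python
from pathlib import Path

# Hypothetical two-layer labels: top-level folder -> (specific, general).
# These names are illustrative, not the labels used in the project.
LABELS = {
    "messages": ("Private messages", "Private conversations"),
    "likes": ("Likes and reactions", "Interaction"),
    "ads": ("Advertising topics", "Interest"),
}

def map_export(root):
    """Return one record per file: relative path, format and both labels."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Files sitting directly in the root have no folder to label them by.
        top_folder = rel.parts[0] if len(rel.parts) > 1 else ""
        specific, general = LABELS.get(top_folder, ("Unlabeled", "Unlabeled"))
        records.append({
            "path": rel.as_posix(),
            "format": path.suffix.lower().lstrip(".") or "none",
            "specific": specific,
            "general": general,
        })
    return records
```

Calling map_export on an unzipped export folder returns a list that can be reviewed by hand, so the automatic pass only prepares the ground for the manual labeling.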
The following visualization shows how wide the variety of data collected by companies is.
Each flake represents a platform. Every category of data has its own distinctive shape. The color is assigned according to the number of files inside a category: the darker the shape, the larger the number of files in that category. Platforms are ordered by founding year and number of users, updated to 2018. The variety of data collected is influenced, of course, by the variety of services offered and by the platform's size.
Facebook is the major data collector (including Instagram and WhatsApp), followed by Google, which is able to diversify its "harvest" thanks to its wide range of services.
The first two things we noticed after downloading our data were the structure and the format of each dataset.
Format is an important feature when it comes to reading data: for the vast majority of users, certain file formats can be very tricky to open and read.
Another important characteristic of the data is the structure of the dataset: how the folders are organized and nested, for example.
As the GDPR (Chapter III, Articles 12 and 20) requires data to be provided to the user in an easily accessible and readable form, we decided to visualize how the data are structured for each platform.
The visualization below represents each platform with a single circle: the bigger the circle, the "deeper" a user has to dig in order to find their files. Blue bubbles represent sub-folders, while green bubbles represent files. Different strokes stand for different file formats. It becomes clear how complicated and tricky it can be for a user to go through all the different sub-folders in search of information. A naming issue often adds to this: sub-folders and files have misleading names that do not clearly describe their content. This can confuse and discourage the user.
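The quantities behind this kind of view (number of sub-folders, number of files, how deep the files sit, and which formats appear) could be measured with a short Python sketch like the one below. The function name describe_structure is hypothetical, and "depth" is assumed here to mean the number of folders a user has to open below the export's root to reach a file.

```python
import os
from collections import Counter

def describe_structure(root):
    """Summarize an export folder: sub-folders, files, max depth, formats."""
    root = os.path.abspath(root)
    subfolders = 0
    files = 0
    max_depth = 0
    formats = Counter()
    for current, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(current, root)
        # Depth 0 is the root itself; each nested folder adds one level.
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        subfolders += len(dirnames)
        for name in filenames:
            files += 1
            max_depth = max(max_depth, depth)
            ext = os.path.splitext(name)[1].lower().lstrip(".") or "none"
            formats[ext] += 1
    return {
        "subfolders": subfolders,
        "files": files,
        "max_depth": max_depth,
        "formats": dict(formats),
    }
```

Running it once per platform gives comparable numbers, although it says nothing about the naming issue, which still has to be judged by reading the folder and file names.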
Personal data is a broad term for information that relates to an identified or identifiable living individual.
But, specifically, what kind of data are collected for commercial reasons? What information is most likely used to target us with the tailored advertising and special offers that suddenly pop up in our daily feed?
Every company has its own advertising and profiling rules. We nevertheless selected a portion of our data and filtered it by importance, narrowing it down to 6 distinct categories: interactions, interests, geolocalized information, contents, personal information and private conversations.
By carefully reading every single file, we inferred that those 6 types of data are probably the ones used to target us and to build specific audiences for online advertising. We call them "commercial data".
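Once every file carries a general label, narrowing the data down to the six categories above is a simple filter-and-count step. The Python sketch below shows one way it might look. The function name commercial_counts and the sample records are made up for illustration, and the (platform, category) input format is an assumption, not the structure of our actual dataset.

```python
from collections import Counter, defaultdict

# The six "commercial" categories named in the text, lowercased for matching.
COMMERCIAL_CATEGORIES = {
    "interactions", "interests", "geolocalized information",
    "contents", "personal information", "private conversations",
}

def commercial_counts(labeled_files):
    """Count commercial files per platform and category.

    labeled_files: iterable of (platform, category) pairs, one per file.
    """
    counts = defaultdict(Counter)
    for platform, category in labeled_files:
        key = category.strip().lower()
        if key in COMMERCIAL_CATEGORIES:
            counts[platform][key] += 1
    return {platform: dict(c) for platform, c in counts.items()}

# Made-up sample records, for illustration only.
sample = [
    ("Facebook", "Interactions"),
    ("Facebook", "Interactions"),
    ("Facebook", "Settings"),
    ("Spotify", "Interests"),
]
result = commercial_counts(sample)
```

Files labeled with any other category (such as the hypothetical "Settings" above) are simply left out of the tally.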
This visualization displays our commercial data, showing the quantity released on every platform. Our aim was to compare our activities and online habits, in order to understand who gives away the largest amount of information and also who is confusing it the most*. We wanted to demonstrate how different online behaviors can lead to more or less data dispersion.
*Confusing data is an effective practice that involves regularly changing personal information, performing fake interactions or disabling services in order to prevent companies from building a realistic profile.
The General Data Protection Regulation governs the protection of natural persons with regard to the processing of personal data, and the free movement of such data. It aims primarily to give individuals control over their personal data, and it applies to all companies processing and holding the personal data of people living in the European Union, regardless of the company's location.
The EU General Data Protection Regulation (GDPR) replaces the Data Protection Directive 95/46/EC. The new regulation was approved by the EU Parliament on April 14, 2016 and became enforceable on May 25, 2018; since then, organizations that do not comply may face heavy fines.
This regulation introduces six key provisions, including new rights for data subjects:
I. Breach Notification;
II. Right to Access;
III. Right to be Forgotten;
IV. Data Portability;
V. Privacy by Design;
VI. Data Protection Officers.
This is the timeline of the main events.