THE INVISIBLE
CAPITAL

Some have professed themselves shocked that companies like Facebook and Google share user data with advertisers. Did any of these individuals ever stop to ask themselves why Facebook and Google don't charge for access?
Nothing in life is free. Everything involves trade-offs.

SEN. ORRIN G. HATCH, M. ZUCKERBERG SENATE HEARING - 10 APRIL 2018

WHAT

We use multiple services every day: a search engine to look for information, cloud space to store our data, social networks to connect with people or to scroll through the news. These services let us navigate the net easily, most of the time for free. But as the saying goes, if we are not paying for a product, we are the product. Services such as Google and Facebook sustain themselves mainly through their data collection activities: our digital traces are carefully stored and used to profile our behaviours, interests and identities.

WHY

After the Cambridge Analytica leak and the introduction of the GDPR in the EU, we became much more aware of our "data pouring". How much data do we give to tech companies for free?
Is it possible to have control over those data?

We decided to investigate this issue from top to bottom by digging into our own profiles and data. Our work aims to test how companies behave when it comes to data collection and transparency.

HOW

This project investigates how companies make personal data accessible.
We tried to sort into categories the personal data that we give to companies by using their products, and we analyze both the quantity and the quality of those data. This is a way to observe how companies behave in relation to new regulations. Our work is not meant to be complete: it is a starting point for everyone who wants to take back control of their own data and investigate further the role of tech companies in our lives.

We analyzed the personal data that we gave out to companies. Our goal is to understand what information is being collected and how readable the downloaded data are for users. We selected seven platforms.
We chose the best-known social networks and services, based on their popularity among internet users.
We analyzed Facebook, Instagram, Google, Twitter, Whatsapp, Apple and Spotify.

A brief introduction to each platform follows.

Facebook

Facebook can be accessed from a large range of devices with Internet connectivity, such as desktop computers, laptops, tablet computers, and smartphones.
Registration is required. The application allows users to create a customized profile with their information and preferences. Users can build their own network by adding friends to it. People can exchange messages, post status updates, and share photos, videos and links. All of these actions trigger notifications that inform users about their friends' activity.

Founded: February 4, 2004
Type: Social networking service
Headquarters: Menlo Park, California (United States)
Founder(s): Mark Zuckerberg, Eduardo Saverin, Andrew McCollum, Dustin Moskovitz, Chris Hughes
Owner: Facebook Inc.

Sources: Wikipedia, Statista

Instagram

Instagram (a portmanteau of "instant camera" and "telegram") is a photo and video-sharing social networking service owned by Facebook. The application allows users to upload photos and videos to the platform.
Content is organized, based on preferences, on a personal page, but it is also visible on a shared board that collects all the media shared by contacts. Content can be edited with various filters and organized with tags and location information. An account's posts can be shared publicly or with pre-approved followers. Users can browse other users' content by tags and locations, and view trending content. Users can "like" photos, and follow other users to add their content to a feed.

Founded: October 6, 2010
Type: Photo and video networking
Headquarters:
Founder(s): Kevin Systrom, Mike Krieger
Owner: Facebook Inc.

Sources: Wikipedia, Statista

Whatsapp

WhatsApp Messenger is a freeware, cross-platform messaging and voice over IP (VoIP) service owned by Facebook.
The application allows users to send text messages, make voice and video calls, and share images, videos and their location. The application runs on a mobile device; the service requires users to provide a standard cellular mobile number and an Internet connection. The service is also accessible from desktop computers: the desktop application synchronizes with the mobile device using a QR code.

Founded: February 24, 2009
Type: Instant messaging and social media
Headquarters: Mountain View, California (United States)
Founder(s): Jan Koum, Brian Acton
Owner: Facebook Inc.

Sources: Wikipedia, Statista

Google

Google is a multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, search engine, cloud computing, software, and hardware.
It offers services designed for work and productivity (Google Docs, Sheets, and Slides), email (Gmail/Inbox), scheduling and time management app (Google Calendar), cloud storage (Google Drive), social networking (Google+), instant messaging and video chat (Google Allo/Duo/Hangouts), language translation (Google Translate), mapping and turn-by-turn navigation (Google Maps/Waze/Earth/Street View), video sharing (YouTube), note-taking (Google Keep), and photo organizing and editing app (Google Photos).

Founded: September 4, 1998
Type: Software and hardware company and Search Engine
Headquarters: Mountain View, California (United States)
Founder(s): Larry Page, Sergey Brin
Owner: Alphabet Inc.

Source: Wikipedia

Twitter

Twitter is an online news and social networking service on which users post and interact with messages known as "tweets".
Registration is required to post tweets, while unregistered users can only read them. A tweet can contain at most 280 characters, and media such as photos or videos can be attached.
Twitter is accessible through its website, through Short Message Service (SMS), or through its mobile application.

Founded: March 21, 2006
Type: News and social networking service
Headquarters: San Francisco, California (United States)
Founder(s): Jack Dorsey, Noah Glass, Biz Stone, Evan Williams
Owner: Twitter Inc.

Sources: Wikipedia, Statista

Spotify

Spotify is a music streaming service that provides content from record labels and media companies, protected by Digital Rights Management (DRM). Music can be browsed or searched for by parameters such as artist, album, genre, playlist, or record label.
Users can create, edit, and share playlists and tracks on social media. Spotify provides access to more than 35 million songs.
The service is freemium: basic features are free with advertisements or limitations, while additional features, such as improved streaming quality and music downloads, are offered via paid subscriptions.

Founded: April 23, 2006
Type: Music service
Headquarters: Stockholm, Sweden
Founder(s): Daniel Ek, Martin Lorentzon
Owner: Spotify Ltd

Sources: Wikipedia, Statista

Apple

Apple is a multinational technology company that designs, develops, and sells consumer electronics, computer software, and online services.
The company's hardware products include the iPhone smartphone, the iPad tablet computer, the Mac personal computer, the iPod portable media player, the Apple Watch smartwatch, the Apple TV digital media player, and the HomePod smart speaker. Apple is the world's largest information technology company by revenue and the world's third-largest mobile phone manufacturer.

Founded: April 1, 1976
Type: Software and hardware company and digital distribution
Headquarters: Cupertino, California (United States)
Founder(s): Steve Jobs, Steve Wozniak, Ronald Wayne
Owner: Apple Inc.

Source: Wikipedia

Which categories of data are the most interesting?

I. tasty data

First step: downloading our personal data.
Every platform should allow its users to access and consult their data. Once we had our folders, we started by mapping the path to every single file, then we assigned each one a label according to its content. The labeling was done manually to obtain a qualitative result. First, we read all the files to decide the main categories. Then, we assigned a category to every folder. We created two layers of categories. The first is more specific and describes the folder content. The second is wider and includes more general categories that describe the nature of the files (e.g. interaction, interest, list of contents, etc.). This work is fundamental: we created standard labels, allowing a comparison between platforms.
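The mapping and labeling steps described above were done by hand, but a similar bookkeeping could be sketched in Python. This is only an illustration under stated assumptions: the `LABELS` table, the label names assigned to each folder and the function names are hypothetical, not the table actually used in the project.

```python
from collections import Counter
from pathlib import Path, PurePosixPath

# Hypothetical two-layer label table: folder name -> (specific label, general label).
# Folder names are examples of export folders; the labels are illustrative only.
LABELS = {
    "your_search_history": ("searches", "interests"),
    "messages": ("messages", "interactions"),
    "photos": ("images", "contents"),
}


def list_files(export_root):
    """Map the relative path of every file under a downloaded export folder."""
    root = Path(export_root)
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]


def label_file(relative_path, labels=LABELS):
    """Return (specific, general) labels from the first labeled folder in the path."""
    # parts[:-1] skips the file name itself and looks only at its parent folders.
    for folder in PurePosixPath(relative_path).parts[:-1]:
        if folder in labels:
            return labels[folder]
    return ("generic", "undefined")


def count_by_category(relative_paths, labels=LABELS):
    """Count files per specific category (the quantity behind each shape's color)."""
    return Counter(label_file(p, labels)[0] for p in relative_paths)
```

With a table like this, `count_by_category(list_files("my_export"))` would give the number of files per category for one platform, which is the figure the color scale encodes. The folder name `my_export` is a placeholder.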
The following visualization shows how wide the variety of data collected by companies is.
Each flake represents a platform. Every category of data has its own distinctive shape. The color is assigned according to the number of files inside a category: the darker the shape, the larger the number of files in that category. Platforms are ordered by foundation year and by number of users, updated to 2018. The variety of data collected is influenced, of course, by the variety of services offered and by the platform's size.
Facebook (including Instagram and Whatsapp) is the major data collector, followed by Google, which is able to diversify its "harvest" thanks to its wide range of services.

[Visualization legend]
Type of data collected: user profile, health, videos, images, text, generic, cloud, activity, device infos, contacts, saved items, payments, geolocalization, chronology, lists of content, searches, apps, advertising, calls, messages, social boundaries, reactions, network, events, mail, posts.
Quantity of collected data for every category: 1-3, 4-10, 11-20, 21-30.
General categories:
personal: geolocalization, payments, saved items, contacts, device infos, user profile, activities, health
interests: chronology, lists of content, searches, apps, advertising
interactions: messages, social boundaries, reactions, network, events, mail, posts
contents: cloud, generic, text, images, videos
Each flake also shows the platform name, the platform foundation year and the subscribed users in 2018.

WHATSAPP (launched in: 2009, users: 1,200,000,000 monthly users): user profile, activity, device infos, contacts, apps, network
INSTAGRAM (launched in: 2010, users: 1,000,000,000 monthly users): images, user profile, activity, device infos, contacts, lists of content, searches, reactions, posts
SPOTIFY (launched in: 2006, users: > 170,000,000 monthly users): activity, saved items, lists of content, social boundaries
TWITTER (launched in: 2006, users: > 974,000,000 monthly users): generic, user profile, activity, device infos, contacts, lists of content, searches, apps, advertising, messages, social boundaries, reactions, network, posts
FACEBOOK (launched in: 2004, users: > 2,300,000,000 monthly users): images, health, activity, contacts, payments, geolocalization, chronology, lists of content, apps, advertising, calls, messages, social boundaries, reactions, network, events, posts, videos
APPLE (launched in: 1976, users: 588,000,000): images, generic, cloud, user profile, health, activity, device infos, contacts, payments, geolocalization, chronology, lists of content, apps, calls, social boundaries, posts
GOOGLE (launched in: 1998, users: 1,000,000,000): videos, images, generic, user profile, activity, device infos, contacts, payments, geolocalization, chronology, lists of content, searches, apps, advertising, calls, messages, social boundaries, reactions, network, events, mail, posts

How are data structured?
[Visualization legend] File typology: html, json, mp3/mp4, jpg, pdf, txt, csv, others. Depth level: from superficial layer to deepest layer; undefined.

II. dig your data

The first two things we noticed after downloading our data were the structure and the format of each dataset. Format is an important feature when it comes to reading data: for the vast majority of users, certain file formats can be very tricky to open and read. Another important characteristic is the structure of the dataset: how the folders and files are organized and nested, for example.
As the GDPR (chapter III) requires data to be provided to the user in an easily accessible and readable form, we decided to visualize how the data are structured for each platform.
The visualization below represents each platform with a single circle: the bigger the circle, the "deeper" a user has to dig in order to find their files. Blue bubbles represent sub-folders, while green bubbles represent files. Different strokes stand for different file formats. It becomes clear how complicated and tricky it can be for a user to go through all the different sub-folders in search of information. Often there is also a naming issue: sub-folders and files have misleading names that do not clearly represent their content. This can confuse and discourage the user.
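The two properties discussed above, nesting depth and file format, can be measured from a list of file paths. The following Python sketch shows one possible way to do it; the function names and the choice of counting depth from the export's top-level folder are assumptions made for illustration, not the method used to build the visualization.

```python
from collections import Counter
from pathlib import PurePosixPath


def depth(relative_path):
    """Number of folders a user must open to reach the file (0 = top level)."""
    return len(PurePosixPath(relative_path).parts) - 1


def structure_summary(relative_paths):
    """Return the deepest nesting level and a count of files per format."""
    paths = list(relative_paths)
    max_depth = max((depth(p) for p in paths), default=0)
    # Files without an extension are counted as "undefined".
    formats = Counter(
        PurePosixPath(p).suffix.lower().lstrip(".") or "undefined" for p in paths
    )
    return max_depth, formats
```

A dataset with a high maximum depth and many json or undefined files would, in the terms used here, be harder for an ordinary user to read than a shallow one made of html pages.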

[Visualization: one circle per platform: Apple Inc., Twitter, Instagram, Google, Whatsapp, Facebook, Spotify]

III. A ton of data!

Personal data is a broad term for information that relates to an identified or identifiable living individual. But what kind of data, specifically, are collected for commercial reasons? What information is most likely used to target us with the tailored advertising and special offers that suddenly pop up in our daily feed? Every company has its own advertising and profiling rules; however, we selected a portion of our data and filtered it by importance. We narrowed it down to 6 distinct categories: interactions, interests, geolocalized information, contents, personal information and private conversations. By carefully reading every single file, we inferred that these 6 types of data are probably the ones used to target us and to build specific audiences for online advertising. We called them "commercial data".
This visualization displays our commercial data, showing the quantity released on every platform. Our aim was to compare our activities and online habits in order to understand who gives away the largest amount of information and also who confuses it the most*. We wanted to demonstrate how different online behaviors can lead to more or less data dispersion.

*Confusing data is an effective practice that involves regularly changing personal information, performing fake interactions or disabling services in order to prevent companies from building a realistic profile.
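
The filtering described in this section, keeping only the six "commercial" categories and comparing how much each person released on each platform, could be sketched as follows. The record layout of (person, platform, category) tuples and the function name are assumptions for illustration; the category names are the six listed above.

```python
from collections import Counter

# The six "commercial data" categories named in the text.
COMMERCIAL = {
    "interactions",
    "interests",
    "geolocalized",
    "contents",
    "personal",
    "private conversations",
}


def commercial_counts(records):
    """Count commercial-data files per (person, platform).

    `records` is a hypothetical iterable of (person, platform, category)
    tuples, one per labeled file.
    """
    return Counter(
        (person, platform)
        for person, platform, category in records
        if category in COMMERCIAL
    )
```

Sorting the resulting counter by value would show who gives away the most data on which platform.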

[Interactive visualization: commercial data released by Francesca, Moreno and Ginevra on Facebook, Apple, Spotify, Twitter, Google and Instagram. Filter per type of data: interactions, interests, geolocalized, contents, personal, private conversations. A "Data Noise" toggle can be switched on or off, and a marker flags platforms that do not allow downloading all the data.]
How does GDPR work?

IV. take control

The General Data Protection Regulation regulates the protection of natural persons with regard to the processing of personal data and on the free movement of such data. It aims primarily to give control to individuals over their personal data and applies to all companies processing and holding personal data of people living in the European Union, regardless of the company's location.
The EU General Data Protection Regulation (GDPR) replaces the Data Protection Directive 95/46/EC. The new regulation was approved by the EU Parliament on April 14, 2016 and became enforceable on May 25, 2018; since that date, organizations in non-compliance may face heavy fines.
This regulation includes six main Data Subject Rights:
I. Breach Notification;
II. Right to Access;
III. Right to be Forgotten;
IV. Data Portability;
V. Privacy by Design;
VI. Data Protection Officers.
This is the timeline of the main events.

The Right to Access and the Right to be Forgotten are two important points of the GDPR. These two rights refer specifically to users' freedom: every platform should provide its users with an easy and clear way to access, download and, if they wish, permanently delete their data. After the Regulation was approved, the vast majority of platforms (both the ones involved in this project and the ones we left out) sent their users a notification to certify their compliance. However, for the sake of fact-checking, we decided to try it ourselves.
The visualization above shows the path to retrieve data on every platform involved: the longer the line, the harder it is to retrieve the data.
The most time-consuming and complicated procedures, as far as we can tell, are Apple's and Google's, with up to seven phases needed.
[Visualization: steps required for a Data Request and for the Right to be forgotten on Apple, Facebook, Google, Instagram, Spotify, Twitter and Whatsapp]

V. The Invisible Capital
Every time we use our devices we leave traces and personal data behind without even knowing it. That information may seem worthless to you, but someone else may be interested in it. Companies collect data, create knowledge from them and then sell this knowledge to anyone who is willing to pay. We decided to call our project "Invisible Capital" to underline how important and meaningful the way we choose to behave online can be: data should be considered a precious commodity to protect.
This project is a preliminary analysis that will hopefully become richer as people ask themselves new questions. We discovered different paths, some of which could be explored further. After Cambridge Analytica, we would like to know how many people are actively interested in knowing more about their data. How many searches are made? Who is more interested? Which platforms do people judge negatively? Which kinds of data would people rather not share? Which data are already meaningful, and which are generated from other information?
Further Readings
Deciding to focus on a small part of a wider controversy opens further questions that will need further knowledge and material. Here you can find a list of articles, tools and books that will hopefully help you explore some aspects in more depth. We selected carefully written articles, data detox programs and hacktivism techniques, along with other interesting projects.
Color meaning:
lilac - official policies, violet - tools, navy - articles