The Messy World of Personal Data Ownership

Data is Valuable

With each passing day, our world becomes more interconnected. Around the globe, more and more people are gaining access to mobile technology, fueling the ongoing worldwide data explosion. In fact, a 2018 infographic by Domo estimated that the average person generates 1.7 MB of data every second of every day. Since that infographic was published, annual data creation has doubled, and it is projected to double once again by 2025. A study from McKinsey projects that, when properly implemented, big data has the potential to generate upwards of $3 trillion in new revenues while driving major advances in technologies such as self-driving cars, personalized medicine and healthcare, and highly traceable food supply chains with reduced waste.

The sheer volume of data generated around the world on a daily basis is absolutely staggering. Consumers know this information is valuable. Why else would companies fight so hard to access and sell it? But beyond knowing that our data is prized, many people are simply left in the dark when it comes to personal data ownership. Who actually owns the data? What is it used for? Who is allowed to access it? Can they access it in its entirety or only pieces of it? Where is the data stored? Can it be sold?

It is a bit perturbing to consider how much information large businesses hold on each individual user: spending habits, hobbies, interests, GPS locations, internet search history, social media connections, digital media consumption habits, and more. These companies have nearly every data point on their consumers that you could imagine.

Big Data Drives Machine Learning and AI

This personal data is then aggregated and used to train various machine-learning models. With enough data, those models can be used to make predictions about entire demographic segments. Artificial intelligence (AI) and machine learning also utilize this data extensively throughout many industries for driving innovation in everything from product development to recruiting and manufacturing. And it appears it’s just getting started, with the International Data Corporation (IDC) projecting the global AI market will grow from $328 billion in 2021 to a potential $554 billion by 2025.
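To put the IDC projection in perspective, the two figures quoted above imply a compound annual growth rate of roughly 14% over the four years. The short Python sketch below shows the arithmetic; it assumes steady year-over-year growth, which is a simplification rather than anything IDC states.

```python
# Figures quoted from the IDC projection above, in billions of USD.
start_value = 328.0  # global AI market in 2021
end_value = 554.0    # projected global AI market in 2025
years = 2025 - 2021

# Compound annual growth rate, assuming the same growth rate every year.
cagr = (end_value / start_value) ** (1 / years) - 1

print(f"Implied annual growth: {cagr:.1%}")
```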

Personal Data Ownership and the Data Brokers

In sufficient volume, these datasets on specific demographics are coveted commercial assets. Data brokers sell these sets of data to all kinds of interested parties, ranging from corporations to marketers, investors, and political campaigns. In 2019, the market for personal data was worth more than $200 billion in the U.S. alone. This may leave you wondering why you haven’t received a check for the share of data you own. That’s because you don’t actually own the personal data you generate. As things stand today, U.S. law is ambiguous on the question of personal data ownership, so there is no clearly established personal right to it.

A Disconnect Between Producers and Consumers

Companies are constantly finding new ways to leverage consumers’ data. As the industry rapidly evolves, the producers, consumers, and stewards of personal data have found themselves without any kind of guidelines or playbook to speak of. Data producers want a trustworthy way to connect to their data, enabling them to make well-informed decisions. Data consumers (the companies) and stewards want safe and secure tools to share that data with, and sell it to, whoever needs it. As it stands, most technology platforms fall short in most or all of these areas, creating a disconnect and a sense of mistrust between the people who produce the data and the companies that consume it.

Ideally, data should be owned solely by the person who created it. It could be treated like a raw material, sold at the owner's discretion in its entirety or as parts. This would give the individual owner absolute authority over who has access and which specific datasets they can see. It would make tracking your data easier while allowing you to directly benefit financially instead of the data brokers.
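To make the idea concrete, here is a minimal Python sketch of what owner-controlled access could look like. Everything in it is hypothetical: the PersonalDataVault class, its method names, and the sample datasets are illustrative assumptions, not an existing product, standard, or law.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PersonalDataVault:
    """Hypothetical store in which the owner decides who may read which datasets."""

    owner: str
    datasets: Dict[str, List[str]] = field(default_factory=dict)
    # Maps a buyer's name to the set of dataset names the owner has shared.
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def grant(self, buyer: str, dataset: str) -> None:
        """Owner allows one buyer to read one named dataset."""
        if dataset not in self.datasets:
            raise KeyError(f"unknown dataset: {dataset}")
        self.grants.setdefault(buyer, set()).add(dataset)

    def revoke(self, buyer: str, dataset: str) -> None:
        """Owner withdraws a previously granted permission."""
        self.grants.get(buyer, set()).discard(dataset)

    def read(self, buyer: str, dataset: str) -> List[str]:
        """Return a copy of the dataset only if the owner has granted access."""
        if dataset not in self.grants.get(buyer, set()):
            raise PermissionError(f"{buyer} may not read {dataset}")
        return list(self.datasets[dataset])


# Example: the owner shares spending habits but keeps location data private.
vault = PersonalDataVault(
    owner="alice",
    datasets={
        "spending_habits": ["groceries", "books"],
        "gps_locations": ["home", "office"],
    },
)
vault.grant("retailer", "spending_habits")
shared = vault.read("retailer", "spending_habits")
```

In this sketch, a request for "gps_locations" by the same retailer raises a PermissionError, because the owner never granted it. A real system would also need authentication, auditing, and payment, all of which are left out here.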

Decentralized Data and Lack of Transparency

Sadly, today’s world does not work like this. The data explosion and the rapidly evolving nature of technology have created a long list of issues that continue to plague the industry. One of the core issues is a lack of transparency, trust, and privacy between the people who generate the data and the businesses that gather it. Most people simply don’t know what information is being gathered from them, who is gathering it, or what it’s being used for.

There is also the issue of data decentralization. Bits and pieces of data are constantly being harvested by different businesses with each online interaction. They are then stored at different locations, likely on completely different technology stacks. That means data gathered from a series of interactions in the U.S. could end up stored in server farms as far away as Shanghai or Auckland. This makes accessing the data very tricky, and it would be extraordinarily difficult and inefficient to try to migrate the data to a centralized location.

Europe is Setting the Example

Despite the scale of the problem, data-sharing projects are under way that aim to create secure data exchanges. One project, called Gaia-X, is currently building a marketplace where data can be exchanged under the oversight of strict European data privacy laws. Backed by the European Union, Gaia-X not only bills itself as a way for data to be safely shared across industries, but also envisions itself as a repository for large catalogues of data that can benefit research into AI, data analytics, and the internet of things (IoT). The platform is built on open, cloud-native standards and democratizes access to data for industry experts and common users alike.

By adopting the Gaia-X framework (or something similar to it), personal data could be pulled from the various IT landscapes and tech stacks and then converted, standardized, and democratized for data transparency at scale. From data scientists to common users, everyone would know exactly what data they have, how to access it, and how to put it to use.
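As a rough illustration of what "converted and standardized" might mean in practice, the Python sketch below maps records from two differently shaped sources onto one shared layout. The source formats, field names, and the common schema are all invented for this example; they are not taken from Gaia-X or any real platform.

```python
from datetime import datetime, timezone

# Hypothetical records as two unrelated platforms might store them.
retail_record = {"user": "alice", "ts": "2021-03-01T12:00:00+00:00", "item": "books"}
fitness_record = {"member_id": "alice", "epoch_seconds": 1614600000, "activity": "run"}


def from_retail(record: dict) -> dict:
    """Convert the assumed retail format to the shared schema."""
    return {
        "owner": record["user"],
        "timestamp": datetime.fromisoformat(record["ts"]),
        "category": "purchase",
        "value": record["item"],
    }


def from_fitness(record: dict) -> dict:
    """Convert the assumed fitness format to the shared schema."""
    return {
        "owner": record["member_id"],
        # Epoch seconds are interpreted as UTC so timestamps are comparable.
        "timestamp": datetime.fromtimestamp(record["epoch_seconds"], tz=timezone.utc),
        "category": "activity",
        "value": record["activity"],
    }


# Once converted, records from both sources share the same keys and types.
catalogue = [from_retail(retail_record), from_fitness(fitness_record)]
```

With every record in the same shape, a person (or a data scientist acting with permission) could list, filter, and compare their data without knowing how each original platform stored it.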

Understanding the Data Better

Beyond consolidating and standardizing personal data, industry leaders should also re-examine the vetting process used when selecting data for training AI and machine learning models. In order for AI and machine learning to realize their full potential, businesses and data scientists themselves need a better understanding of the data that drives their continued advancement. How does the data shape the way the models make their decisions? Could they develop a bias? Is the data correct and untampered with? If not, how will that affect their decisions? Adopting a more standardized and centralized approach to collecting, storing, and sharing personal data would help data scientists answer some of those questions.

Laying the Groundwork for the Future

This is obviously not an issue that can be solved overnight, but businesses and governments alike share the responsibility for laying the groundwork for a long-term initiative. Many industries are highly protective of their data and may resist parting with it. Likewise, many people may be unnerved by having their data consolidated in a single location where it could become the target of a breach. But projects like Gaia-X offer heightened security, complete transparency, and improved data sharing for research purposes. All of this is backed by strict regulations and government oversight, protecting all involved parties. Creating a common ground where data can safely be exchanged is the crucial first step towards making the data revolution work for everyone.