What is metadata & what can it reveal about you?

Understanding the raw material of digital surveillance


Maybe you don’t know much about metadata, but it knows a lot about you.

In the age of artificial intelligence (AI), metadata is the raw material of mass surveillance. It is collected to discover and track everything we do online: with what or whom we connect, when, from where, and how often. Over time, metadata reveals long-term patterns in our digital lives, and anyone with the technical means to collect and analyze enough of it can discover and use those patterns.

Here’s everything you need to know about metadata so you can take steps to better protect your privacy online.

What is metadata?

Metadata simply means “data about data,” or information about information. Any digital asset has metadata: an image file on your computer no less than an encrypted message sent to a friend.

Think of a personal image file that is password-protected so that only you or trusted parties can view it. While the content of the image might not be accessible, information about the file itself is still visible: its size in MB, whether it’s a .jpg or a .gif, its location on your hard drive, and even the date on which it was created or last modified.
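To make this concrete, here is a minimal Python sketch that reads only what the filesystem reports about a file. The file name and the random bytes standing in for encrypted content are illustrative assumptions; the point is that `Path.stat()` returns size and timestamps without ever reading the file’s contents.

```python
import secrets
import tempfile
import time
from pathlib import Path


def visible_file_metadata(path: Path) -> dict:
    """Collect what the filesystem reports about a file, without reading its contents."""
    st = path.stat()  # a stat call returns attributes only; the file's bytes are never read
    return {
        "name": path.name,
        "extension": path.suffix,  # e.g. ".jpg"
        "size_bytes": st.st_size,
        "modified": time.ctime(st.st_mtime),
        "folder": str(path.parent),
    }


# Stand-in for an encrypted image: random bytes, so the content itself is unreadable.
with tempfile.TemporaryDirectory() as tmp:
    photo = Path(tmp) / "holiday_photo.jpg"
    photo.write_bytes(secrets.token_bytes(2048))
    meta = visible_file_metadata(photo)

print(meta)
```

Even though the "photo" is pure noise, its name, type, size, and modification time all come back in plain view.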

These details might sound unimportant, but when it comes to communicating privately online, the consequences of metadata are more serious. Even when our communications are encrypted, information about them is still visible to anyone who can observe the traffic. The question is how this metadata can be accumulated to learn who we are and what we do.

Types of metadata

There are many types of metadata, depending on the data or file type, as well as the software, system, and network being used. Let’s focus on data in transit: what happens when we send digital information across a network, such as sending an email or requesting a website. Metadata for network traffic includes:

  • IP addresses: reveal several pieces of metadata, including the locations, devices, and ISP information of who is sending what to whom
  • Sizes of data packets: e.g., how many MB a sent file is, or the total number of data packets sent to a given recipient
  • Timing signatures: when the data was sent and received, as well as how long a connection lasts (e.g., time spent on a website or video call)
  • File types: if the traffic is unencrypted, it may be visible whether what you’re sending is, for example, a .txt file or a .jpg image, not to mention the content of the file itself
  • Encryption type: the cryptographic protocol protecting your data can be identified through unique signatures, which censors can use to detect VPN connections and block access to information
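Here is a minimal Python sketch of the kind of record an observer on the network path could keep for one connection. The field names, the documentation-range IP addresses (RFC 5737), and the packet values are illustrative assumptions; what matters is that every number in the summary is computed without touching the encrypted payloads.

```python
import secrets
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Packet:
    timestamp: float  # seconds since the observer started watching
    size: int         # bytes on the wire
    payload: bytes    # encrypted: the observer cannot read it, and doesn't need to


@dataclass
class FlowRecord:
    """Metadata an on-path observer (e.g. an ISP) can log for one connection."""
    src_ip: str
    dst_ip: str
    protocol: str
    packets: list = field(default_factory=list)

    def summary(self) -> dict:
        # Only timestamps and sizes are used; payloads are never inspected.
        times = [p.timestamp for p in self.packets]
        sizes = [p.size for p in self.packets]
        return {
            "who_to_whom": (self.src_ip, self.dst_ip),
            "protocol": self.protocol,
            "packet_count": len(sizes),
            "total_bytes": sum(sizes),
            "mean_packet_size": mean(sizes),
            "duration_s": max(times) - min(times),
        }


flow = FlowRecord("203.0.113.7", "198.51.100.20", "UDP", [
    Packet(0.00, 1420, secrets.token_bytes(1420)),
    Packet(0.05, 1420, secrets.token_bytes(1420)),
    Packet(2.50, 310, secrets.token_bytes(310)),
])
print(flow.summary())
```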

Can metadata reveal my identity?

Metadata like your IP address does not directly reveal your name or home address. However, it is the primary information used to track people online. In many cases, it can link your long-term online activity directly to you when combined with additional information, for instance from your Internet Service Provider (ISP). But this is the least worrying part.
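The linking step is essentially a join on IP address and time. The sketch below assumes a hypothetical ISP log recording which subscriber held which dynamic IP address during which period; real ISP record formats differ, and the names `Lease` and `who_had_ip` are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Lease:
    """One row of a hypothetical ISP address-assignment log."""
    ip: str
    subscriber: str
    start: datetime
    end: datetime


def who_had_ip(leases: list[Lease], ip: str, seen_at: datetime) -> Optional[str]:
    """Return the subscriber assigned `ip` at time `seen_at`, if any."""
    for lease in leases:
        if lease.ip == ip and lease.start <= seen_at < lease.end:
            return lease.subscriber
    return None


leases = [
    Lease("203.0.113.7", "subscriber-1042", datetime(2024, 5, 1), datetime(2024, 5, 3)),
    Lease("203.0.113.7", "subscriber-2277", datetime(2024, 5, 3), datetime(2024, 5, 9)),
]

# A website's log line records only an IP address and a timestamp...
hit_ip, hit_time = "203.0.113.7", datetime(2024, 5, 4, 14, 30)
# ...but joined with the ISP's records, it points to one customer account.
print(who_had_ip(leases, hit_ip, hit_time))  # subscriber-2277
```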

The scary thing about metadata is that once it is aggregated en masse by AI surveillance systems, it can reveal a lot more than our names. Combined with information from our ISPs, it can link our names to everything we have done online, and even to what algorithms predict we will do.

What does metadata reveal about what I do online?

Take it from former NSA General Counsel Stewart Baker:

“Metadata absolutely tells you everything about somebody’s life. If you have enough metadata, you don’t really need content.”

When metadata is accumulated over time and at scale, it can reveal even more than decrypted content would.

  • Location histories. Your IP address* shows your approximate location when you make a web connection. Over time, this data can trace your movements around the world. Metadata from apps like Google Maps gives Big Tech companies the ability to keep records of everywhere you’ve ever been via geolocation tracking.
  • Connection histories. Tracking your IP address makes it possible to build detailed profiles of your browsing history, including the web services you’ve visited and the transactions you’ve made.
  • Patterns of communication. Detailed analysis of connection histories can reveal patterns about your life, thoughts, habits, and desires. Records showing that you regularly visit an online health clinic might be used to deduce that you have a serious illness. Metadata of public transactions through a crypto exchange can expose information about your financial assets and network. And algorithms can readily infer your political beliefs and opinions.
  • Known correspondents. While the content of our communication should be protected by encryption, metadata can give third parties a virtual phone book of our known contacts and associates, as well as the ability to discern who our close friends are and the histories of when and where we’ve talked.
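Inferences like these need no sophisticated tooling. This toy Python sketch uses an invented log of who contacted whom and when (no content at all) and counts it two ways: which contacts are most frequent, and which ones recur at the same hour of day. The log, the `profile` function, and the threshold of three repeats are arbitrary illustrative choices, far simpler than the systems described in this article.

```python
from collections import Counter
from datetime import datetime

# Hypothetical metadata log: (who, contacted whom or what, when). No message content.
log = [
    ("alice", "bob", "2024-05-01T08:05"),
    ("alice", "bob", "2024-05-02T08:02"),
    ("alice", "clinic.example", "2024-05-02T21:40"),
    ("alice", "bob", "2024-05-03T08:07"),
    ("alice", "clinic.example", "2024-05-09T21:35"),
    ("alice", "carol", "2024-05-04T13:00"),
    ("alice", "clinic.example", "2024-05-16T21:50"),
]


def profile(log, user, min_repeats=3):
    """Summarize a user's contacts and routines from metadata alone."""
    mine = [(peer, datetime.fromisoformat(ts)) for who, peer, ts in log if who == user]
    # "Virtual phone book": how often each correspondent appears.
    contacts = Counter(peer for peer, _ in mine)
    # Routines: the same correspondent at the same hour of day, again and again.
    routines = Counter((peer, when.hour) for peer, when in mine)
    return contacts.most_common(), [r for r, n in routines.items() if n >= min_repeats]


contacts, routines = profile(log, "alice")
print(contacts)  # [('bob', 3), ('clinic.example', 3), ('carol', 1)]
print(routines)  # [('bob', 8), ('clinic.example', 21)]
```

Three entries are enough to surface a regular weekly evening visit to a clinic, without reading a single message.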

Hear Nym’s Chief Scientist, Claudia Diaz, describe the importance of metadata.

*Note that we are likely to have many IP addresses over time, depending on the devices we use or the dynamic addresses a network assigns. And while connected to a VPN, our traffic carries the VPN’s public IP instead of our own.

Who is tracking my metadata?

Whenever you do something online, it’s best to assume you are being tracked and surveilled in some way. Here’s who is almost certainly doing it, in rough order:

  • Your ISP. An Internet Service Provider (ISP) is what enables people to access the public internet, so it is the first point of contact for our traffic whenever we connect online. ISPs have access to the metadata of all our activities unless it is protected with a VPN or proxy. ISPs keep logs of user traffic and are responsible for enforcing censorship restrictions at the behest of governments.
  • Big Tech companies. Big Tech companies like Google, Meta, and Apple are by far the biggest collectors of people’s metadata, given the number of people using their devices and apps for everyday activities.
  • Governments. As the Snowden documents revealed, governments, law enforcement agencies, and intelligence agencies have powerful, global surveillance systems tracking almost everything we do, not just online but also through the metadata of phone calls and messages. Historically, this information has been used to target individuals both with and without legal warrants. Metadata continues to be the primary means by which people worldwide are blocked from accessing censored information online.
  • Your VPN. Virtual Private Networks (VPNs) are tools that hide data and metadata from ISPs, for instance to get around censorship. However, a centralized VPN handles all of your traffic, so it is fully capable of keeping records of user metadata across its network and linking you directly to what you do online. Some shady free VPN services make money by selling this information to third parties, or by installing third-party cookies to track your metadata for commercial purposes.
  • The websites you visit. Almost all websites track users based on their metadata. Sometimes this is to optimize site performance for visitors, like remembering login credentials. But more often than not, people’s on-site activity is recorded for marketing or commercial purposes.
  • Data brokers. Data brokers make up a clandestine market of commercial entities that buy and sell vast amounts of user metadata from websites and ISPs. This data is aggregated to analyze people’s behavior patterns and build profiles, which are then sold to third parties like advertisers, or even political parties and governments.
  • Advertising agencies. Consumer capitalism is fueled by metadata. Advertising agencies and other marketing enterprises regularly buy bulk metadata records, especially those compiled and analyzed by AI systems deployed by data brokers. Metadata gives companies detailed insight into people’s online desires and habits, as well as their geolocations.
  • Hackers and cybercriminals. Metadata tracking is an important tool for cybercriminals committing major fraud and theft. By accumulating details of people’s personal and work lives, for example, attackers can tailor phishing messages that convince people to disclose personal information. And metadata of financial transactions, including crypto, can help them pick out wallets to target.

How does AI affect metadata surveillance?

AI has many potential functions. But at their core, AI programs are surveillance systems. They compile vast amounts of human-generated information of all kinds to learn from us, including personal data we might not realize was publicly available in the first place. AI excels at processing metadata at a scale where human-run analytics would struggle.

Metadata is very lightweight compared to payload data (the encrypted contents), which makes it well suited to bulk analysis by machine-learning systems. AI is making possible a form of surveillance that was previously too time-consuming and expensive. The job of AI surveillance is to find patterns in noise, such as a network busy with traffic, and metadata outlines all of our patterns in precise detail.
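As a toy illustration of why such lightweight data suits automated analysis, the sketch below guesses which site an encrypted trace came from using only packet sizes. It is a simple nearest-neighbour match on three invented features over made-up traces; real traffic-classification and website-fingerprinting systems use far richer features and trained models, so treat this as an assumption-laden sketch rather than how any particular surveillance system works.

```python
import math
from statistics import mean


def features(sizes: list[int]) -> tuple[float, float, float]:
    """Reduce an encrypted trace to metadata: packet count, total KB, mean packet KB."""
    return (float(len(sizes)), sum(sizes) / 1000, mean(sizes) / 1000)


# Hypothetical fingerprints, recorded by an observer loading known pages itself.
known_traces = {
    "news-site": [1500] * 40 + [600] * 10,
    "clinic-site": [1500] * 8 + [400] * 4,
    "video-site": [1500] * 300,
}
fingerprints = {site: features(t) for site, t in known_traces.items()}


def guess_site(observed: list[int]) -> str:
    """Return the known site whose fingerprint is closest (Euclidean distance)."""
    f = features(observed)
    return min(fingerprints, key=lambda site: math.dist(f, fingerprints[site]))


# Payloads are unreadable, but the shape of the trace still gives it away.
print(guess_site([1500] * 9 + [420] * 3))  # clinic-site
```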

Can a VPN protect my metadata?

Most VPNs do not provide significant protection for metadata because they rely on centralized, single-server infrastructure. They are designed to hide just one piece of metadata: your IP address.

Centralized VPNs compromise your privacy

Centralized VPN services have a major vulnerability: while they might hide your IP address from the website you’re visiting, the VPN service itself can see both (1) your true IP address and (2) the address of the site you connect to. This means that despite encryption, the VPN company can link you to what you do online via your metadata.

People must essentially trust that the VPN service will not mishandle their data by keeping centralized logs of their traffic, leaking it through poor security, or handing records to governments, law enforcement, and censoring authorities on request.

You can still be tracked using a VPN

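Traditional VPNs like these can do little to defend users against AI surveillance of a network.

An adversary observing the VPN network can still link users to their activity using advanced traffic analysis techniques, such as end-to-end correlation: matching the timing and volume of traffic entering the VPN with the traffic leaving it.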

NymVPN

NymVPN was designed by scientists, activists, and specialists in metadata surveillance to do what other VPNs don’t: actually protect people’s patterns of communication online against all forms of surveillance. Accomplishing this requires network technology capable of scrambling metadata in transit to the point where it becomes unreadable to AI surveillance systems, and thus unlinkable to us.

Decentralized routing

Whether you choose to use NymVPN’s Fast Mode with AmneziaWG or the mixnet Anonymous Mode, your traffic will be routed through a decentralized network.
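The benefit of decentralized routing comes down to who can see what. The sketch below models an idealized route of independently operated relays with layered encryption; it is an assumption-laden model rather than a description of NymVPN’s actual protocol, and the hop counts are illustrative. It shows that on a single-server route one operator sees both your IP address and your destination, while on a multi-hop route no single relay does. It does not model an observer watching several relays at once, which is what traffic analysis targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    client_ip: str
    destination: str


def relay_views(conn: Connection, hops: int) -> list[set[str]]:
    """What each relay on an idealized route of `hops` independent servers learns.

    Assumes layered encryption, so each relay sees only its neighbours: the first
    relay sees the client's IP, the last sees the destination, and middle relays
    see neither.
    """
    views = []
    for i in range(hops):
        seen = set()
        if i == 0:
            seen.add(conn.client_ip)
        if i == hops - 1:
            seen.add(conn.destination)
        views.append(seen)
    return views


conn = Connection("203.0.113.7", "clinic.example")
# A single operator can link you to your destination only if one relay sees both.
linkable = {hops: any(len(v) == 2 for v in relay_views(conn, hops)) for hops in (1, 2, 5)}
print(linkable)  # {1: True, 2: False, 5: False}
```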

Noise against metadata surveillance

All surveillance seeks to find patterns in the noise of a network. Like the game Where’s Waldo?, it’s a question of filtering out the irrelevant information to find what matters about a target.

To fight surveillance, NymVPN takes a page from the same playbook: add enough noise to the network that patterns become too difficult to discern. With NymVPN’s Anonymous Mode, this includes three types of network noise:

  1. Cover traffic. Empty “dummy” packets are regularly sent through the network with your real data packets to increase the anonymity set of the whole network. The bigger the crowd (of indistinguishable data packets) passing through the network, the more anonymous everyone is.
  2. Data mixing. As your data passes through mix nodes, your packets are randomly mixed with the data packets of other users. This ensures that as packets leave a server, they cannot easily be matched to you through traffic analysis techniques.
  3. Timing obfuscation. Because of this mixing, packets do not leave a mix node on a first-in, first-out basis, so their course through the network cannot be tracked by timing analysis. Data packets leave in a random order.
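A toy simulation can show why mixing defeats first-in, first-out timing analysis. The sketch below assumes exponentially distributed per-packet delays (the approach used in Loopix-style mixnet designs from the research literature) and a fixed rate of dummy packets; the function, its parameters, and the rates are illustrative assumptions, not Nym’s implementation.

```python
import random


def mix_node(packets, mean_delay=0.5, cover_rate=2.0, duration=3.0, seed=None):
    """Toy mix node: random per-packet delays plus dummy cover packets.

    packets: list of (arrival_time, label). Returns (departure_time, label) pairs
    sorted by departure time, so output order no longer follows arrival order.
    """
    rng = random.Random(seed)
    # Each real packet is held for an exponentially distributed delay.
    out = [(t + rng.expovariate(1 / mean_delay), label) for t, label in packets]
    # Cover traffic: dummy packets emitted at random (Poisson-process) times.
    t = rng.expovariate(cover_rate)
    while t < duration:
        out.append((t, "dummy"))
        t += rng.expovariate(cover_rate)
    return sorted(out)


arrivals = [(0.0, "alice-1"), (0.1, "bob-1"), (0.2, "alice-2"), (0.3, "carol-1")]
for departure, label in mix_node(arrivals, seed=7):
    print(f"{departure:5.2f}  {label}")
```

Run it with different seeds and the real packets leave in different orders, interleaved with dummies, so an observer can no longer assume that the first packet in is the first packet out.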

NymVPN Fast Mode

Don’t need this level of anonymity for everything? No worries: simply choose Fast Mode in the app for decentralized protection that other VPNs fail to provide, without the added overhead of noise. You’ll still benefit from stronger IP address protection than other VPNs offer, free of centralization and linkability, and you’ll be able to bypass censorship surveillance and restrictions.
