Data Persistence

What Happens Online, Stays Online

What happens to information that you delete online? Does it simply vanish never to be seen again? Probably not. What happens online, stays online. Computer scientists refer to this as the persistence of digital data.

As discussed in the Representation module, digital information is easy to copy. See a digital image you like on a website? Right-click on it, choose “Save image as,” and you now possess a digital copy of the image that was posted online. The same approach works for nearly everything posted on the web, from images to tweets and even entire webpages.

Common misconception: Information posted on a webpage can be truly “deleted.”
  • Not really. Once a page is publicly available, it is potentially indexed by search engines, archived or cached by a service like the Wayback Machine, and downloaded/copied by an individual. Although you can then remove your original page, these other representations may exist independently.
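
The idea that copies outlive the original can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the `ToyWeb` class, its method names, and the example URL are hypothetical and do not describe how any real search engine or archive works.

```python
class ToyWeb:
    """Hypothetical, highly simplified model of a site and third-party copies."""

    def __init__(self):
        self.live_pages = {}  # url -> content on the original site
        self.archives = {}    # url -> content saved by an archive or cache

    def publish(self, url, content):
        # The author makes a page publicly available.
        self.live_pages[url] = content

    def crawl(self):
        # An archive or cache copies whatever is public at this moment.
        for url, content in self.live_pages.items():
            self.archives[url] = content

    def delete(self, url):
        # The author removes the original page; existing copies are untouched.
        self.live_pages.pop(url, None)


web = ToyWeb()
web.publish("example.com/party", "embarrassing photo")
web.crawl()                       # a third party saves a copy
web.delete("example.com/party")   # the author "deletes" the page

print("example.com/party" in web.live_pages)  # False
print(web.archives["example.com/party"])      # embarrassing photo
```

In this model, deleting the page only affects `live_pages`. The author has no way to reach into `archives`, which is the point of the misconception above.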

For example, McDonald’s Corporation has changed its website many times in the past 16 years. Yet here is a copy of its 1996 homepage:

Data persistence can be very useful, and very annoying. Some information we would like to persist forever. For instance, if Wikipedia suddenly went down, we would hope that data persistence keeps its content, and the years of work people put into creating it, from being lost. But what about the embarrassing baby photo of you that your mom posted to Facebook? In that case, data persistence might be annoying.

Who Owns Your Data?

Herein lies a concern. When users sign up for certain online services like Facebook, Twitter, YouTube, etc., they are also turning over much of their digital data to the company controlling their account. Actually, this is, in essence, how and why these companies exist. If nobody posted information to Facebook, what would be the point of belonging? The service only has value because users give it data. However, once given, the service is free to use that data however it sees fit, as long as it is in accordance with its posted policies and user agreements.

Read the following text from Facebook’s Help Pages about deactivating or deleting personal pages and think about the following two questions:

  • Which option is more likely to retain your personal data?
  • Does either option guarantee permanent deletion of everything you’ve created/posted?

The language of “deactivation” states that (all) information is saved “just in case you want to come back to Facebook at some point.” This indicates that Facebook retains all of your information and only makes some of it publicly inaccessible. Data persistence at work, and for good! Maybe you’re just taking a vacation from Facebook and plan on coming back. Awesome job, Facebook!

However, “permanent deletion” states that “most personally identifiable information” is removed and that material posted “may [be] retain[ed] in [their] servers for technical reasons.” Even though you have deleted your account, Facebook still retains some portion of your data, such as your photos, friends list, and more. By posting information to Facebook, you are turning over some semblance of ownership of that data to the company. So, if users give up some ownership of their data whenever they hand it to a service, why do they continue to do so?

Benefits of Data Persistence

Like so many other things, data persistence has pluses and minuses, including trade-offs in privacy and utility that all responsible Internet users must consider.

  • Digital data is immune to generation loss: every digital copy of a photo is identical to the original, so a copy of a copy does not degrade. These copies can be copied further and maintained by many entities.
  • A service like Flickr typically maintains redundant backups of all its data. The data themselves (photos, video, etc.) can survive through these means even after the camera or storage device that originally captured them is destroyed.
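
The claim that digital copies do not degrade can be checked with a short sketch. It assumes the data is copied byte for byte, without re-encoding (re-saving a photo in a lossy format such as JPEG is a different operation and can lose quality). The function names and the stand-in "photo" bytes are made up for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest; identical data gives an identical digest."""
    return hashlib.sha256(data).hexdigest()


def copy_n_times(data: bytes, generations: int) -> bytes:
    """Make a copy of a copy of a copy..., `generations` times."""
    current = data
    for _ in range(generations):
        current = bytes(bytearray(current))  # a fresh byte-for-byte copy
    return current


# Stand-in for the bytes of a photo (hypothetical data, not a real image).
original = bytes(range(256)) * 4

final_copy = copy_n_times(original, 1000)

# After 1000 generations the copy still matches the original exactly.
print(fingerprint(original) == fingerprint(final_copy))  # True
```

An analog process, such as photocopying a photocopy, would lose a little detail at each step. Here the thousandth copy is indistinguishable from the first.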

This means that we gain utility from posting online. By posting to Facebook or Flickr, we are not only able to share with our friends and online communities; we are also protecting this digital data from accidental deletion.

These benefits don’t even begin to approach the social, scientific, and economic benefits that come from users providing data to entities and from data persistence. You will discuss these types of benefits more during the Privacy vs. Utility debate.