Our online history is disappearing at an astonishing rate, creating a black hole for future historians.
On January 28 2011, three days into the fierce protests that would eventually oust the Egyptian president Hosni Mubarak, a Twitter user called Farrah posted a link to a picture that supposedly showed an armed man as he ran on a “rooftop during clashes between police and protesters in Suez”. I say supposedly, because both the tweet and the picture it linked to no longer exist. Instead they have been replaced with error messages that claim the message – and its contents – “doesn’t exist”.
Few things are more explicitly ephemeral than a tweet. Yet it’s precisely this kind of ephemeral communication – a comment, a status update, sharing or disseminating a piece of media – that lies at the heart of much of modern history as it unfolds. It’s also a vital contemporary historical record that, unless we’re careful, we risk losing almost before we’ve been able to gauge its importance.
Consider a study published this September by Hany SalahEldeen and Michael L Nelson, two computer scientists at Old Dominion University. Snappily titled “Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?”, the paper took six seminal news events from the last few years – the H1N1 virus outbreak, Michael Jackson’s death, the Iranian elections and protests, Barack Obama’s Nobel Peace Prize, the Egyptian revolution, and the Syrian uprising – and established a representative sample of tweets from Twitter’s entire corpus discussing each event specifically.
It then analysed the resources being linked to by these tweets, and whether these resources were still accessible, had been preserved in a digital archive, or had ceased to exist. The findings were striking: one year after an event, on average, about 11% of the online content referenced by social media had been lost and just 20% archived. What’s equally striking, moreover, is the steady continuation of this trend over time. After two and a half years, 27% had been lost and 41% archived.
This is just one investigation, and a preliminary one at that. The figures, though, suggest a clear linear trend: the loss of just over 10% of the resources shared via social media each year, even when archiving is taken into account, or around 0.02% of this content lost every day.
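As a rough check on that arithmetic, the two figures quoted above (11% lost after one year, 27% after two and a half) can be joined by a straight line. The short Python sketch below does only that; the function and variable names are illustrative, and a two-point slope is a deliberate simplification, not the study's own method. It gives a little under 11% a year, or closer to 0.03% a day, so the 0.02% daily figure presumably reflects a fit across the study's fuller data rather than these two points alone.

```python
# Two data points quoted in the article: (years after event, % of linked resources lost).
YEAR_ONE = (1.0, 11.0)
YEAR_TWO_AND_A_HALF = (2.5, 27.0)


def loss_rate_per_year(p1, p2):
    """Slope of the straight line through two (years, percent_lost) points."""
    (t1, lost1), (t2, lost2) = p1, p2
    return (lost2 - lost1) / (t2 - t1)


per_year = loss_rate_per_year(YEAR_ONE, YEAR_TWO_AND_A_HALF)
per_day = per_year / 365  # spread the yearly rate evenly across days

print(f"{per_year:.1f}% per year, {per_day:.3f}% per day")
# prints: 10.7% per year, 0.029% per day
```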
This isn’t the same thing as tweets themselves vanishing. For those wishing to analyse trends within social media utterances exhaustively, services like Gnip – which, for a fee, promises “complete and comprehensive access to every publicly available Tweet dating back to the very first Tweet from March 21, 2006” – offer an unprecedented “fire hose” of data, from which marketing and research firms are already gratefully guzzling.