We need a Data Journalism Archive. Before it becomes just another 404 error

404 page used by Bedmap.com, which doesn’t appear to work anymore

Are we about to enter a dark age of data journalism?

The internet has made it possible to see the world’s information without moving a muscle, no matter how old that information is. You can read the first front page of The Guardian, from May 1821, which had data journalism at its heart even then.

The web has revolutionised journalism, so that the way we consume the news changes daily, and modern data journalism is grounded in the ability to visualise data in ever more sophisticated ways.

It has also made the archiving of news content easier. In the past, archivists in each organisation would preside over rooms full of old clippings and background information. The web made that process straightforward: everything would be archived online and those collections of the past would even become sources of present-day content, such as the New York Times’ archive, which is regularly sourced and raided by both academics and journalists.

But data journalism is not part of these archives. Much of it has fallen victim to code rot: left to degrade until, as software libraries are updated and improved, it is left far behind. Try to find examples of this work now and, as likely as not, you will end up on a 404 page.

Philip Meyer’s work on the Detroit Riots

Data journalism itself also has a long history, certainly predating 2009. You can see it in the first fall of Abraham Lincoln or in Philip Meyer’s investigation of the causes of the Detroit Riots. The point is that you can still see that work. Created in print before the word ‘interactive’ had even been coined, it has been kept so you can use it as inspiration. Without Precision Journalism, would Reading the Riots even have existed? The past (by which I mean less than five years ago) has a lot to teach us about the way we work today.

Paul Bradshaw has collated examples of modern data journalism, asking “is there a canon of data journalism?”. And while Minard’s famous chart and Florence Nightingale’s “butterflies” still exist, it is striking how much of the more recent work has vanished forever. This is just a sample:

ChicagoCrime.org

The progenitor of interactive news databases of the kind you now see at places like ProPublica, it was started by the godfather of modern data journalism, Adrian Holovaty (this May marked its 10th anniversary). For years it produced just a 404 page. Now it links to a tiny section of EveryBlock.

The US Congress Votes Database

Frozen on a February 2014 vote, with a tiny ‘this page has been archived’ note at the top: an inadequate replacement for a project that has no equivalent now.

MPs expenses

It would be possible to fill this article with examples of things I worked on at The Guardian that no longer work, such as the World Data Explorer or the Libya bombing interactive. But I’ve chosen this one because it was the first large-scale newsroom crowdsourcing exercise, switched off because it couldn’t be maintained. Now it is no longer even viewable.

Fixing DC’s Schools

Another pioneering piece of work from Holovaty, and a forerunner of apps that are now commonplace among local media. The front page leads nowhere and the interactive design no longer works.

Represent, from the New York Times

This page has been ‘about to relaunch’ for some time now (some say years). The site originally let people in New York find their congressmen and track all sorts of things about them.

It’s an issue across journalism. In The Atlantic last week, Adrienne LaFrance wrote about how key pieces of journalism are disappearing from the web:

“If you want to save something online, you have to decide to save it. Ephemerality is built into the very architecture of the web, which was intended to be a messaging system, not a library.”

If even Pulitzer Prize-winning articles are at risk, where does that leave everyday data journalism?

At the same time, many publishers save every word, no matter how facile or pointless, as if it were a work of studied genius. This is the fantastic thing about archives: they give you a picture of a world from the past, one that can shape how you produce the future. But it’s only the words that are saved. Meanwhile a map, an interactive guide or even just a set of interactive charts will vanish as if they never existed.

Data journalism, at its best, bridges the gap between those who have the data and those who want to understand it. It raises data from the prerogative of the few into the consciousness of the many. It can change the world, illuminating that which others would rather keep secret and misunderstood.

But if we’re not careful, this golden age of data journalism will only be remembered in a few animated gifs, texty analysis pieces and CSV downloads. Data will have returned to those who always owned it in the past; the rest of us will have to keep reinventing the wheel.

Nobody says archiving is easy, but what will be left otherwise? This article will. The irony is plain: an article about the disappearing web is what gets left to survive. So will long, dry academic pieces of data analysis. But the apps, charts and visuals that bring them to life? They will vanish as if they never existed.

It’s time for a Data Journalism Archive. Before we forget everything we know.

5 responses to “We need a Data Journalism Archive. Before it becomes just another 404 error”

How to Plan a Responsible Digital Death | The Engine Room

August 3, 2016 at 5:53 pm

[…] piece outlines, it’s not only tech that’s going out of date. Simon Rogers also writes about how databases powering data-driven journalism stories are getting taken down, leaving a spate of 404 error pages in their […]

Chi abbandona la scuola? (I dati, cinque anni dopo…) – datajournalism.it

June 14, 2016 at 11:47 am

[…] of sites and tools, which seems in tune with what Simon Rogers predicted in the post “We need a data journalism archive. Before it becomes just another 404 error” about the lifespan of our data work on the web, published in October […]

We need a Data Journalism Archive. Before it becomes just another 404 error. – Vox | Stylish gadget shop

November 9, 2015 at 8:06 pm

[…] Simon Rogers is a data journalist and has worked at the Guardian, Twitter, and now at Google. This piece was originally published on his blog. […]

Mark Graham

November 2, 2015 at 11:23 pm

Thank you for highlighting this important issue Simon. The Internet Archive has been capturing and storing millions of websites for the past 19 years. You can see it in action via the Wayback Machine here: http://archive.org/web/

digidickinson

October 20, 2015 at 10:17 pm

This is so true. A lot of the innovative storytelling easily falls by the roadside. I remember when the Rocky Mountain News closed and their amazing Final Salute audio slideshows disappeared.
