Data is everywhere: governments publish billions of bytes of the stuff, visual artists create new ways of seeing the world with it, and companies build businesses on the back of it.
And everyone wants to be a data journalist too – the barriers to entry have never been lower, as free tools change the rules on who can analyse, visualise and present data. Truly, anyone can do it.
At the same time, journalism has undergone a transformation. It’s not that long ago that the only way to get a story published by a major news organisation involved years of training and interning and generally slaving away until you got noticed and published. Now the power has shifted, and the days when journalists could shut themselves away from the world in order to hand out gems of beautiful writing have well and truly vanished.
These are the days of open journalism: reporters who use the power of the web can produce stronger, better stories. Open journalism involves the person reading and commenting on the story as much as the original reporter, and gives them the power to shape and influence the news they see in front of them.
But how does that connect to data journalism? The two are segments of the same pie chart – and for data journalism to develop beyond being the latest fad, it has to engage and involve the people reading the news as well as those creating it.
Data journalism is not (just) about being clever and showing the world how clever you are. It has to be about more than that.
It’s important because data journalism has its roots in publicly available data. As soon as data.gov launched in 2009, it didn’t matter how good or bad it was – the principle had been set: all government data must be public, and available in a form you can use. Since then, the world has been deluged with open data, with cities, states and regions around the globe publishing everything from detailed crime statistics to the locations of public toilets.
But having an open data portal doesn’t automatically make you a haven of freedom – even Bahrain and Saudi Arabia now have open data portals, as a prescient article by David Eaves in Slate points out. He writes:
For many of us who have campaigned for the right to access and reuse government information, it would be easy to pause and relish the sweet victory. We have the ammunition, so now, believe the most techno-utopian advocates, open data will fundamentally change politics—depoliticizing debates and eliminating irresponsible positions. But that would be a mistake.
This is where data journalists come in – by exposing and interrogating the data, we can test how accurate it is, mash it up with other datasets to produce results that tell you something new about the news.
Because, traditionally, journalists have treated data with a kind of breathless trust they would never accord a human source. Numbers are trusted because investigating them is too scary. Former BBC reporter Michael Blastland examined the norovirus – or winter vomiting bug – outbreak of 2008, showing exactly how easy it is to get the numbers wrong. The story was that three million people had gone down with the disease the previous year.
He looked at the confidence intervals – the guide to how reliable those numbers were – and realised that the figure could just as easily be 280,000. Or even 34 million. The truth was that nobody knew, but the story had been written up anyway.
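Blastland’s point is easy to demonstrate for yourself. The sketch below is a hypothetical illustration, not his actual calculation: it uses invented surveillance figures (30 confirmed cases in a monitored sample of 10,000 people, scaled to a population of 60 million) and a standard normal-approximation confidence interval to show how a tidy headline number hides a wide range of plausible truths.

```python
import math

# Invented surveillance figures -- illustrative only, NOT the real
# norovirus data: 30 confirmed cases in a sample of 10,000 people,
# scaled up to a population of 60 million.
cases, sample, population = 30, 10_000, 60_000_000

p = cases / sample                    # observed infection rate
se = math.sqrt(p * (1 - p) / sample)  # standard error of that rate

# 95% confidence interval for the rate, scaled to the whole population
estimate = p * population
low = max(0.0, p - 1.96 * se) * population
high = (p + 1.96 * se) * population

print(f"headline estimate: {estimate:,.0f} people")
print(f"95% interval:      {low:,.0f} to {high:,.0f}")
```

Even with these tame made-up numbers, the honest answer is a range tens of thousands of people wide, not a single figure – and the smaller or messier the sample, the wider that range gets.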
There are historic examples of great journalists adopting data too, of course. Philip Meyer, the US journalist behind the Detroit project which inspired our Reading the Riots project, was a reporter who used data to create classic new forms of journalism.
But this is not a niche form of reporting anymore.
The people who seem to care about the numbers are out there in the world: our readers. Most of the million or so people who read the Datablog each month are general-interest readers, not developers, designers or even other journalists. They want the numbers because they want to trust the report, to test its veracity and sometimes to see what else they can find in the data.
Because everyone is an expert in something, that engagement can mean fascinating results – including us improving our map colours on the blog with the help of our readers.
However, plenty of data is still closed, and we in the media are part of the problem. Often the data will merely be alluded to in a story or interactive, while you are kept away from it.
It’s as if to say: “look – we made this fantastic interactive guide, you couldn’t possibly have anything to offer. So long, and thanks for looking!”
The raw data of key national and international events is often withheld in favour of snazzy presentations of the information, without access to the underlying data for you to download yourself. Immediate and detailed election data is one example, supplied via a paid-for feed and certainly not available for immediate download. Or there are live Olympic results – you might see an athlete winning gold before your eyes, but is that raw data available for you to visualise and experiment with?
News organisations may be campaigners for open information, but by withholding that data they become complicit in a system which essentially keeps data private until it’s no longer commercially valuable. It’s all very well calling for governments to throw open the doors of their data vaults, but if you are not willing to be open too, what is that worth?
Open data assumes the readers are an integral part of the story – open data journalism does the same. By publishing the data in an accessible form, available to whoever wants it, our journalism suddenly becomes stronger and better. So, this is our ten-point guide to how it can work:
1. Expose the data behind the story
The best journalism reveals something new about the world – and feels timely and of the moment. Data journalism works when it forces itself into the agenda and makes the news. Timely means people will care about it enough to try and become part of the story.
2. Provide the key data people need
It’s a mess out there. As a reader, how can you find the key dataset you need at that moment – data that’s not too old or too unreliable? That’s where data journalism’s role in curating the key numbers comes in. The skills of research are a key part of being a reporter, and the editorial role of selecting the key data means you can help readers find what they’re really looking for. Journalists can be the bridge between the providers of data and its consumers – interpreting the data to bring it alive, but also testing and checking it.
3. Make it personal
All data is personal at some level – and the best interactives and visualisations allow users to see how the numbers reflect their lives and where they live. As more and more granular data is released about ever-smaller geographies, making it personal brings it to life.
4. Anyone can do it
They really can – often it’s the skill of knowing whether something is a story or not that matters most. There are enough free tools out there to make visualising and analysing the data simple and easy.
5. Make our data open
What is open data? It is data published in a machine-readable format that anyone can use. That excludes PDFs, which are – as Stephen Messer says – where data goes to die. By making our data open we mean producing it in a form people can use, whether that’s CSV, Excel or even RDF.
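The difference "machine-readable" makes is concrete: a CSV can be loaded, recalculated and ranked in a few lines, where a PDF of the same table cannot. A minimal sketch – the areas and figures below are invented for illustration, not real crime data:

```python
import csv
import io

# A tiny open-data extract in CSV form -- invented figures for illustration.
raw = """area,crimes,population
Hackney,1200,246300
Camden,950,220300
Islington,870,206100
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Machine-readable means a reader can immediately recompute and rank:
for row in rows:
    row["rate_per_1000"] = 1000 * int(row["crimes"]) / int(row["population"])

worst = max(rows, key=lambda r: r["rate_per_1000"])
print(worst["area"], round(worst["rate_per_1000"], 1))
```

Ten lines, standard library only – which is exactly why publishing the spreadsheet matters more than publishing the press release.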
6. “Do what you do best, and link to the rest”
Jeff Jarvis said that and the principle is simple: there’s bound to be someone out there doing something amazing – why not be open enough to embrace that?
7. Free data – now
It’s not enough just to aggregate data anymore – what about the raw, real-time, live data behind everything from public transport to election results?
8. We’re not the experts
We can’t be experts in every aspect of life – so why not engage those who are, and make them part of our process?
9. Make big data accessible
As the datasets that we can explore get bigger, it’s our job to make them smaller and simpler to understand.
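In practice, making big data smaller usually means aggregating: collapsing thousands or millions of records into a handful of lines a reader can absorb. A minimal sketch, using invented incident records (in reality the input might be millions of rows):

```python
from collections import Counter

# Invented raw records -- illustrative only; real inputs would be far larger.
incidents = [
    {"borough": "Hackney", "type": "burglary"},
    {"borough": "Hackney", "type": "theft"},
    {"borough": "Camden", "type": "theft"},
    {"borough": "Camden", "type": "theft"},
    {"borough": "Hackney", "type": "burglary"},
]

# Collapse to one readable line per borough: the journalist's summary.
by_borough = Counter(row["borough"] for row in incidents)
for borough, count in by_borough.most_common():
    print(f"{borough}: {count} incidents")
```

The editorial judgment is in choosing what to count by – borough, month, offence type – and that choice is the story.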
10. At the end of the day, it’s all about the stories
This is an edited version of a talk given today at Scoopcamp, Hamburg
This piece was first published on the Guardian Datablog