Introduction to data journalism

This has been the first week of the free data journalism MOOC, with more of the course still to come over the next few weeks. This is the text of the first part of my module. It’s not too late to sign up for the rest of the course, when the real detail of learning data journalism will be taught, and to become part of an amazing community of more than 21,000 people.

So, what is data journalism? Ask a hundred data journalists and you will get a hundred different responses. This is my three-part take:


1) Data journalism is about using numbers to tell the best story possible. It is not about maths, or drawing charts or even writing code. It is about telling stories first and foremost – the maths and the charts and the code are all in service to that.

2) You’re no longer thinking solely about words. Instead, this is about finding the best possible way to tell that story.

3) The techniques of data journalism change all the time, but they are marked out by an abundance of increasingly accessible tools that allow sophisticated manipulation and analysis of data.


But most importantly, data journalists aren’t born that way; they are created.

Back to the future


Imagine working as a reporter in, say, the 1980s. What would the tools of your trade be? A notebook and shorthand, for sure. A cassette recorder for backup – probably a reel-to-reel if you worked in broadcast. You would rely for research on a clippings library, probably staffed by someone who knew the contents from back to front. You have a research query? They would be your search engine. The chances are that you would write up your notes on a typewriter, the results of which would be laid out with the aid of a scalpel, while the pictures that went with it would be cropped with ruler and pencil.


And your output? It was printed on paper.


So, if you were confronted with numbers or statistics, what would you do? The tools of the statistician were not the tools of a reporter. You would be reliant on the statisticians’ analysis, their research and their results.


But somehow – and despite all the barriers – data has always been part of the way that news organizations work. Financial reporting has always been based on an understanding of the numbers. The Wall Street Journal essentially came out of a data product: the Dow Jones “Customers’ Afternoon Letter”, first published in 1883, which was based in turn on brief news bulletins hand-delivered throughout the day to traders at the stock exchange. Those bulletins, or “flimsies” as they were called, were aggregated into a printed daily summary, and that summary became the Wall Street Journal.


Sports reporting too has been based on data for over 100 years – the game of baseball is impossible to report without understanding the numbers that surround the game.


And for those worried about the resources involved in data journalism, take the example of John Snow. He may have been a Victorian doctor rather than a reporter, but his work on cholera changed the way the world worked and told a story in a way that everyone could understand.


In the world of the 1850s, cholera was believed to be spread by miasma in the air. Germs were not yet understood, and the sudden and serious outbreak of cholera in London’s Soho was a mystery.

So Snow did something data journalists often do now: he mapped the cases. The map essentially represented each death as a bar, and you can see them in the smaller image above. Maybe Snow’s map had such a huge impact on its own because it was simply a great data visualisation.


Snow wasn’t part of a giant team of interactive developers. He was a physician working on a hunch. He didn’t just produce a map; it was one part of a detailed statistical analysis. But it changed how we see data visualisations, and how we see microbes.


Since then we have had the rise of what used to be called ‘Computer Assisted Reporting’ and with it incredible investigative reporting, a precursor of where data journalism is now. Just look at work such as Philip Meyer’s investigation into the Detroit riots, or the investigation by Clarence Jones of the Miami Herald into the criminal justice system, to see how journalism was changed by data.


Is it journalism?


Is it journalism? This is reporting facts in a way that people can understand about issues that matter. Back in 2009 one of the founding fathers of data journalism, Adrian Holovaty, posed this question:


It’s a hot topic among journalists right now: Is data journalism? Is it journalism to publish a raw database? Here, at last, is the definitive, two-part answer:


1. Who cares?


2. I hope my competitors waste their time arguing about this as long as possible.


You’re here because you think data can be journalism. But as a data journalist your role is about bringing that data to life.


Why has this happened?


1) Tools

In the past, data was the preserve of statisticians because they were the only ones with the ability to analyze it, even at a basic level. Now almost every computer has Excel – and for those which don’t, there is a plethora of free alternatives: Numbers, Google spreadsheets and OpenOffice.


2) Open data: governments around the world have published thousands of data points, throwing open the doors of data. Some of these governments are not noted for being otherwise particularly focused on democracy. But they figure, probably correctly, that so much data is being published that most people find it difficult to comprehend. Data journalism is a way to open up that data and bring it back to the people who are paying for it to be collected.


3) Trust: at the same time, there is remarkably little trust in journalists among the public. Providing raw and open data with your stories – essentially being transparent in what you do – encourages trust because you literally have nothing to hide.

4) Tools (again) – once you’ve analyzed the data, there are now many free tools which cater for every skill level and allow anyone to produce interactive maps and visualizations: tools such as Google Fusion Tables and Datawrapper, or even publicly available libraries such as D3. Messy data can be cleaned with a free tool like OpenRefine. The tools have changed the landscape.
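
To give a flavour of what cleaning messy data means in practice, here is a minimal Python sketch. It is not how OpenRefine itself is implemented; it is loosely modelled on the general idea of grouping values that collapse to the same normalised key, and the function names and the sample council names are invented purely for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalised key so that near-identical spellings collide."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then turn punctuation into spaces.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", ascii_text.lower())
    # Sort and de-duplicate the words so that word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group raw values by fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


if __name__ == "__main__":
    # Hypothetical sample values, invented for this example.
    names = [
        "Kent County Council",
        "  kent county council",
        "Council, Kent County",
        "Cardiff Council",
        "Cardiff  Council.",
        "Swansea Council",
    ]
    for group in cluster(names):
        print(group)
```

Each printed group is a set of spellings that probably refer to the same thing. A reporter would still check every group by hand before merging anything, because a key like this can also pull together names that only look alike.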


The data leaders

So, how do the world’s data journalism leaders do this? Many of these stories were winners in the most recent Data Journalism Awards, and each represents a very different type of data journalism.


• Just the facts

Le Pariteur by WeDoData uses quirky graphics, a quick questionnaire and a lot of information to compare and contrast the salaries of men and women. See more at: http://appli-parite.nouvelles-ecritures.francetv.fr/

This visual app is simple at one level. It takes publicly available data and brings it together into one visualization which examines an important issue in an accessible way.


Data can be used to help people find the key facts about where they live, which is what Find my school, a site produced by Twaweza in Kenya, does. It allows users to get the key public information about their local schools in an accessible way. It came out of the Code for Africa initiative, which embeds technologists into newsrooms and CSOs, with support from external teams of developers, tech incubators and kickstarter funds, to help rewire the way that civic engagement happens.


News organisations have a long tradition of being ‘papers of record’ with many traditionally opting to reveal full election results or school league tables. This is the modern equivalent.


• Data-based news stories


Data journalism can often bring stories that are in the public eye to life, by revealing the numbers behind the news. Every story will have some data that goes with it.


This interactive guide by ProPublica (http://projects.propublica.org/sopa/) is a good example of data around the news collected by reporters and displayed in an accessible way. This list of legislators’ positions on SOPA is compiled from a combination of legislative data and research to fill out the biographical information and position of each member of Congress. You can read more about how they did it here: http://www.propublica.org/nerds/item/sopa-opera-which-legislators-support-sopa-and-pipa


Election results are a staple of data journalism, providing a rich source of analysis for reporters, from political staffers to the likes of Nate Silver. This app from the Associated Press (http://hosted.ap.org/interactives/2012/election-trends/) is just one example of a news-based interactive visualization which has to compress a lot of data into a simple guide to the US 2012 results.


The Guardian Datablog – which I used to edit – also primarily reports the facts around the news, making publicly available data more accessible. It uses free tools to visualise the data – tools that anyone can use, such as Google Fusion Tables or Datawrapper.


• Local data telling stories

Data journalism provides a great way for local news organizations with small resources to tell stories that work for their communities. Data is being published by governments.


Wales Online reporter Claire Miller’s work on children in care won a Data Journalism Award last year. Following reports from a Kent newspaper that children placed in care in the county had been sent to Wales for placements, this project was an attempt to see which councils across Britain were sending children in their care to Wales, and where councils in Wales were sending the children in their care. The information was presented to readers as a front page splash and inside spread for Wales on Sunday, containing expert comment on the issue, as well as an online version with an interactive that allowed readers to explore where different councils were sending children and where children sent to Wales were coming from. Claire is a classic lone ranger – a data journalist working on her own to create data-driven stories.


Another example of local data journalism is this interactive guide to neighborhoods in Washington DC, which illustrates the gap in incomes across the city. It was actually a project by campaigning groups that tells a story. Led by DC Action for Children, in partnership with DataKind and a group of dedicated pro bono data scientists, the project used both U.S. Census Bureau and local administrative data about the population and resources in District of Columbia neighborhoods. Collaborators obtained data on population counts and social characteristics from the Decennial Census and American Community Survey.


• Analysis and background

Data journalism has a mission to explain the facts behind the news – to be analytical as well as to expose great exclusives.


This story by the Texas Tribune looked at the voting records and personal interests of legislators in the state.


The data application includes extensive research into all 181 members of the Texas Legislature, plus key statewide elected officials. It details everything from a lawmaker’s employment history and financial records to stock holdings, property listings, campaign finance data and ethics investigations. It also contains reporter analysis — compiled over the course of nine months — into legislation filed and votes taken that could conflict with a lawmaker’s personal or financial interests. In other words, a combination of public data and reporter analysis. And all based on Google spreadsheets of data.


This interactive from Brazil’s Editora Abril showed the connections between the country’s scandal-hit ruling politicians and personalities. The work is actually incredibly visual: an interface that shows the connections between characters in every major political scandal in Brazil since 1986. It helps the user by providing comprehensive background and clearly visualizing the complexity of corruption involving politicians, companies and the governments in Brazil. The data was extracted from stories published by Veja magazine.


• Deep-dive investigations

Data lends weight to deep investigations which in turn can create complex news stories.


A great example of this is the work by La Nacion in Argentina around Senate expenses.


After finding out that the Senate had published expenses since 2004 in raw PDFs, some of them as images and completely unstructured, the LN data team scraped the data and created a database. This data was the gift that kept giving for LN. The interrogation process provided several front page stories, forced replies from current and former Senate presidents, and caused the reopening of a judicial investigation.


Another example is the work at the Guardian around the WikiLeaks data dump and the more recent stories around the NSA and its involvement in monitoring the internet. This database of documents has provided more than one story by being something that can be interrogated.

All of these stories show how flexible a field data journalism is. While the projects that I presented are obviously not the kinds of projects that you’re likely to work on as your first story, this course will give you basic skills to practice and start out with smaller things.

About Simon Rogers

Data journalist, writer, speaker. Author of 'Facts are Sacred', from Faber & Faber, and a range of infographics books for children from Candlewick. Edited and launched the Guardian Datablog. Now works for Google in California as Data Editor and is Director of the Sigma awards for data journalism.

Free to share

Creative commons

Please share me around. Everything here is free to use under a Creative Commons Attribution-NonCommercial 3.0 Unported License
