Where do journalists post their data? It’s a pretty core tenet of open journalism that you share your sources: if you write a story about data, you make the numbers available to download.
It matters because:
- Your audience is more likely to trust your story if they can test the sources
- Someone out there probably knows more about your story than you do — and can help make it better
- Your story can be improved upon and replicated
- Your data can be tested by the community for errors
- It encourages others to create data visualizations of your work that you may not have the resources to produce yourself
So you’d assume that, with so much data journalism going on out there, we have record amounts of curated data ready to be downloaded and used. I’ve written before about the importance of making the data behind these stories available, and this piece started as a set of links for me to use, with these points being the most important:
- What’s the link? Is it a special interface to the datasets, or a Github repo up-front? Github has become de rigueur for reporters as a storage centre. But there’s a difference between just linking directly to a Github page, which is less than user-friendly to the amateur, and creating a special interface that’s easy to use. It would be interesting to know if the enthusiasm for opening up data is directly proportional to how easy it is to access.
- How up-to-date is it? When was the datastore last updated and how complete is it?
- How many datasets are there? Without knowing exactly how many articles or pieces of work are covered, it’s not possible to know what proportion of each site’s data journalism has its underlying data published too.
You can read a bit more about open data news sources on The Source too.
This is a work in progress, but here’s the list so far (in alphabetical order):
538
Data link
Github page? Yes
Last update: September 25, 2014
Number of datasets: 26
Description: “This repository contains a selection of the data — and the data-processing scripts — behind the articles, graphics and interactives at FiveThirtyEight. We hope you’ll use it to check our work and to create stories and visualizations of your own.”
BuzzFeed
Data link
Github page? Yes
Last update: September 05, 2014
Number of datasets: 7
Description: “An index of all our open-source data, analysis, libraries, tools, and guides.”
Guardian Data
Data link
Github page? No (the front page is based on a ScraperWiki scrape of Google Spreadsheets; either it’s not working or no new datasets have been published since 2013)
Last update: June 5, 2013
Number of datasets: 800+
Description: “Lost track of the hundreds of datasets published by the Guardian Datablog since it began in 2009? Thanks to ScraperWiki, this is the ultimate list and resource. The table below is live and updated every day – if you’re still looking for that ultimate dataset, the chance is we’ve already done it.”
Full disclosure: Up until April 2013 I edited the Guardian Datablog. This data front page was created by the great @ChrisCross_UK, who has also left the Guardian.
Huffpost Data
Data link
Github page? Yes
Last update: July 08, 2014
Number of datasets: 3 (plus lots of code)
Description: None
La Nacion Data
Data link
Github page? No
Last update:
Number of datasets: hundreds
Description:
ProPublica
Data link
Github page? No (mixture of free FOIA datasets, links to original data or premium datasets behind investigations)
Last update: June 2014
Number of datasets: 12
Description: “ProPublica is making available the datasets that power our data journalism. The raw data we received as the result of a FOIA request is available for free, and datasets that reflect substantial cleaning and processing by our staff are available for a one-time fee.”
The Upshot
Data link
Github page? Yes
Last update: September 09, 2014
Number of datasets: 9
Description: “A New York Times website with analysis and data visualizations about politics, policy and everyday life.”