
AI for data storytelling

Image from Funes, OjoPúblico

Artificial Intelligence is already being used in data journalism. For a field obsessed with automating tedious tasks, AI is custom-made.

Data storytelling and journalism have always been at the forefront of technology, quick to adopt the newest gadgets and techniques. When VR devices launched, data journalists at the WSJ designed a VR data rollercoaster; when drones became widespread, journalists such as Matt Waite started using them to tell visual stories; when augmented reality went mainstream, the New York Times showed you pollution levels in your living room (and here’s another VR project we worked on with Accurat too).

AI projects in data journalism often showcase some of the most imaginative uses of new tools, making possible analyses that just weren’t feasible before.

Often AI is used to categorise images and text, such as social media posts or thousands of news reports. No human could possibly find the time to read through all of those, and anyone who tried would make mistakes. AI can be dumb and make mistakes too, but the point is that it is getting better all the time.
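To make “categorising text” concrete, here is a minimal sketch in Python: a tiny naive Bayes classifier written from scratch. It is not taken from any of the projects below; the class name, the categories and the example sentences are hypothetical, and real projects train on thousands of labelled examples, usually with a library such as scikit-learn.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.label_counts = Counter(labels)      # category -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = tokenize(text)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior: how common this category is in the training data.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Smoothed log likelihood, so an unseen word never zeroes a category out.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled examples; a real project would need far more.
train_texts = [
    "gunmen killed two men outside a bar",
    "bodies found in shallow grave after shooting",
    "city council approves new budget for schools",
    "mayor opens new library and park",
]
train_labels = ["violence", "violence", "politics", "politics"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("two bodies found after shooting"))  # violence
```

The design is deliberately simple: it counts words per category and adds one to every count, so a word never seen in training can’t rule a category out entirely.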

My favourite quote, from the British journalist James Cameron, really sums up the time we’re in:

Once upon a time the world was a realm of unanswered questions and there was room in it for poetry. Man stood beneath the sky and he asked “why?”. And his question was beautiful.
The new world will be a place of answers and no questions, because the only questions left will be answered by computers, because only computers will know what to ask.
Perhaps that is the way it has to be.

James Cameron, 1969

What Cameron didn’t know was that data journalists would be the ones answering those questions now; they just couldn’t find the answers before AI was there to help them. That human factor leads to some really powerful work. Here is a global selection, many of them Sigma Data Journalism Award winners. Which ones would you highlight?

Worlds Apart

NRK, Norway

Worlds Apart from NRK

What happens when journalists use AI to investigate TikTok’s algorithms and how they affect videos of the war in Ukraine? The result is this project by NRK, which sent robots to investigate. The team used AI to identify specific keywords in images they had collected, then programmed a bot to search videos for the images the AI recognised. You can read more about the project here.

What the 1921 Tulsa Race Massacre Destroyed

New York Times, US

This project explained the horror of the 1921 Tulsa Race Massacre, when an entire Black community was burnt to the ground by white rioters. The team used AI to reconstruct how this thriving community looked before the massacre, drawing on vintage maps and building height data to create a powerful experience. You can read more about how it was done here.

Zones of Silence

El Universal, Mexico

How do you search for something that isn’t there? That is the problem El Universal wrestled with in this project, which used natural language processing to analyse thousands of news stories and work out where the gaps were in Mexico’s coverage of drug cartel murders. Full disclosure: this is a project I worked on.
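One simple way to think about a coverage gap is as a ratio: for each region, compare the murders recorded in official statistics with the number of news stories about them, and flag regions that fall well below the national rate. The Python sketch below assumes both counts already exist (producing the story counts is where the natural language processing comes in). The function name, the 0.5 threshold and all the figures are hypothetical, and this is an illustration of the idea, not El Universal’s actual method.

```python
def coverage_gaps(homicides, stories, threshold=0.5):
    """Flag regions whose stories-per-murder ratio is below threshold x the national ratio.

    homicides: dict of region -> murders in official statistics
    stories:   dict of region -> news stories about murders there
    Returns region -> (regional ratio / national ratio), most under-covered first.
    """
    national_ratio = sum(stories.values()) / sum(homicides.values())
    gaps = {}
    for region, murders in homicides.items():
        if murders == 0:
            continue  # no recorded murders, so no expected coverage
        ratio = stories.get(region, 0) / murders
        if ratio < threshold * national_ratio:
            gaps[region] = round(ratio / national_ratio, 2)
    return dict(sorted(gaps.items(), key=lambda item: item[1]))


# Made-up figures for three hypothetical regions.
homicides = {"Region A": 1000, "Region B": 400, "Region C": 200}
stories = {"Region A": 900, "Region B": 60, "Region C": 240}

gaps = coverage_gaps(homicides, stories)
print(gaps)  # {'Region B': 0.2}
```

Comparing each region with the national rate, rather than using a fixed number of stories, keeps high-violence regions from looking well covered simply because they generate more news overall.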


Funes

OjoPúblico, Peru

OjoPúblico in Peru built its own algorithm to flag potential corruption in the country’s public contracts. The system was built by statisticians, developers and programmers to comb through thousands of public records looking for risk factors. There’s a detailed methodology here.
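To illustrate how a risk-factor approach can work in principle, here is a minimal Python sketch that scores contracts by adding up weighted red flags and then ranks them. The flags (a single bidder, a supplier registered shortly before the award, a large cost increase), the weights and the thresholds are all hypothetical examples, not OjoPúblico’s actual indicators, which are set out in its methodology.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Contract:
    contract_id: str
    bidders: int
    supplier_registered: date
    awarded: date
    initial_value: float
    final_value: float


# Hypothetical red flags and weights, for illustration only.
WEIGHTS = {"single bidder": 2, "new supplier": 1, "large cost increase": 2}


def risk_flags(c):
    """Return the names of the hypothetical red flags a contract triggers."""
    flags = []
    if c.bidders == 1:
        flags.append("single bidder")
    if (c.awarded - c.supplier_registered).days < 180:
        flags.append("new supplier")
    if c.final_value > 1.5 * c.initial_value:
        flags.append("large cost increase")
    return flags


def risk_score(c):
    """Sum the weights of every flag the contract triggers."""
    return sum(WEIGHTS[flag] for flag in risk_flags(c))


def rank_contracts(contracts):
    """Order contracts from highest to lowest risk score."""
    return sorted(contracts, key=risk_score, reverse=True)


contracts = [
    Contract("A-001", 5, date(2010, 3, 1), date(2018, 6, 1), 100_000, 105_000),
    Contract("B-002", 1, date(2018, 2, 1), date(2018, 5, 1), 250_000, 400_000),
]
print([c.contract_id for c in rank_contracts(contracts)])  # ['B-002', 'A-001']
```

A score like this only says where to look first; each flagged contract still needs a journalist to check it.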

Hot Disinfo from Russia

Texty, Ukraine

Texty have become innovators in using AI for journalism, and you can hear journalist Anatoliy Bondarenko interviewed here by us for the Data Journalism Podcast, live from the front lines of the war there. The team used natural language processing to analyse the propaganda war being fought in Europe, examining more than 3,000 pages or pieces of content a week. They built the tool themselves and made it publicly available as an open source download.


PODER, Mexico

This massive open data project takes more than 4m Mexican government contracts and puts them through heavy algorithmic analysis, using PODER’s very own ‘Groucho’ analysis engine, to create a searchable dataset. You can read more about the project here.

The Troika Laundromat

Organized Crime and Corruption Reporting Project (OCCRP)

You have 1.3 million leaked transactions from 238,000 companies: a hefty, almost impossible dataset. So what do you do? If you’re the OCCRP, the answer is to build your own AI data management system to look for patterns across thousands of PDFs, CSVs and Excel files. The result revealed more than €26 billion in transfers out of Russia over a seven-year period and exposed a complex financial system. The project brought together the OCCRP with The Guardian (UK), Süddeutsche Zeitung (Germany), Newstapa (South Korea), El Periodico (Spain), Global Witness and 17 other partners, who can be viewed here. You can find out more about the project itself here, and you can try the OCCRP’s Aleph system yourself here.

About Simon Rogers

Data journalist, writer, speaker. Author of 'Facts are Sacred', published by Faber & Faber, and a range of infographic books for children from Candlewick. Edited and launched the Guardian Datablog. Now works for Google in California as Data Editor and is Director of the Sigma Awards for data journalism.



Free to share

Creative commons

Please share me around. Everything here is free to use under a Creative Commons Attribution-NonCommercial 3.0 Unported License
