A conversation with Stephen Few about data visualisation. Kind of

Tableau’s new word cloud tool

UPDATED WITH LATEST DISCUSSION – scroll down to see it

Stephen Few is a bit of a hero to those interested in data visualisation and his blog, Perceptual Edge, is required reading for anyone interested in the field. Yesterday he published a cogent and fascinating piece about the latest version of Tableau, Tableau Veers from the Path. It’s great reading (although not so much for the Tableau guys) and it says a lot about the present state of infographics.

But one paragraph leapt out at me:

“When did Tableau, which was originally developed for visual analysis, become a tool for creating impoverished infographics? Did they add this feature to satisfy one of their prominent UK customers, the Guardian? Whatever the reason, with the addition of word clouds, how many of Tableau’s customers will waste their time trying to analyze data using this ineffective form of display?”

This is interesting because as editor of the Guardian Datablog I can honestly say we don’t pay for Tableau services – we use Tableau Public sometimes as it’s a great tool. And we haven’t done a word cloud in the past two years.

It led to a conversation in the comments that I thought was interesting – and I’d love to know what you think too. This is how it stood today…

Are we right to showcase so many different types of graphics?

By Simon Rogers. March 14th, 2013 at 5:17 am

Really interesting piece marred only by a strange inaccuracy:

“Did they add this feature to satisfy one of their prominent UK customers, the Guardian?”

We are not customers of Tableau – we’re more likely to use Datawrapper to produce bar charts, ironically. On the occasions where we do use Tableau, we use the public version.

Simon Rogers, editor, Guardian Datablog

By Stephen Few. March 14th, 2013 at 10:02 am

Simon,

The Guardian provides Tableau with high-profile, public exposure. In so doing, you provide a powerful promotional platform for the software, which gives you influence. I cited the Guardian as an example of an organization that might tempt Tableau to add eye-catching but ineffective forms of visualization to the product because I’ve seen some of the worst examples of this in your publication, especially those created by David McCandless, who is particularly fond of bubbles.

By Simon Rogers. March 14th, 2013 at 10:20 am

Thanks for replying Stephen. As I said, I really like the piece and agree with 98% of it.

And, yes, we do showcase visualisations by lots of people, including – very occasionally – David McCandless. It’s what our Show and Tell section is for. And yes, sometimes they have bubbles and sometimes they don’t. I want the blog to show lots of visualisations by lots of people – some of which I love, some of which I don’t but are interesting and will spark a debate.

But if you did a survey of data visualisation types we use in posts that we put together here, you would find simple bar charts well in the lead.

And we are not customers of Tableau.

Best wishes,
Simon

By Stephen Few. March 14th, 2013 at 10:45 am

Simon,

The Guardian is in a position to promote effective data visualization by only showcasing infographics that are well designed. As an editor, do you not see this as an opportunity to lead the way?

If you use Tableau’s software, which is the case, then by my definition you are a Tableau customer. The fact that you use it for free doesn’t change this fact. I am also a Tableau customer in that I use the software, even though I’ve never paid for it. Whether you are a customer or not really doesn’t matter, does it? What matters is that the Guardian uses Tableau’s software, showcases its functionality, and as such exercises influence. Who knows to what degree the Guardian’s occasional use of bubbles, improperly designed treemaps, and other ineffective forms of display contributed to Tableau’s motivation for introducing some of the flashy stuff? What I invite you to recognize, however, is that by showcasing ineffective visualizations at times, you influence people in ways that potentially undermine their uses of data.

By Simon Rogers. March 14th, 2013 at 11:08 am

Stephen,
I absolutely agree about showcasing things that are well-designed, which we do. But I also want to democratise the process and encourage more people to feel that data is something for them, rather than an abstract property belonging to a few – as well as show new ways of visualising data when we can. Show & Tell is about separating off, to some extent, these visualisations and asking our readers what they think too. It is not a place for long strips of marketing ‘infographics’ – that visual disease of the web – but for a variety of ways of seeing the world.

And, if you have read the comments, you will see that our community are a discerning bunch who will vigorously debate the merits of infographics and the data itself. I am happy to provide a platform for as many interesting visualisations as possible. Some of them you will like and some you won’t but I think that’s OK. Comment sections of newspapers have allowed various views to permeate their pages for decades: it’s about allowing as many opinions as possible.

I hardly think you can then use us as a reason for Tableau moving into word clouds.

Re: the whole ‘customer’ thing: the Oxford Dictionary defines ‘customer’ as “a person who buys goods or services from a shop or business”. We do not buy services from Tableau, therefore we aren’t customers. We are sometimes users of the software, when it fits, yes. But not customers.

We demand precision in data visualisations; we should do the same with language too.

Best wishes,
Simon

By Stephen Few. March 14th, 2013 at 11:33 am

Simon,

I use the term “customer” more loosely than you do. As you know, if you looked up the term in the Oxford English Dictionary, which I just now did as well, my use of the term fits within the OED’s various definitions of the term. Regardless, my point was that you have influence, which is true.

I too work to democratize the use of data. In fact, this is my life’s work. How we differ, however, is that I think we should democratize data only in useful ways. Showcasing ineffective forms of display undermines this effort. There is enough confusion on the Web today regarding data visualization. The Guardian could join me in trying to promote best practices by vetting its graphical content more thoughtfully.

As a journalist, would you ever showcase ineffective uses of the English language in the Guardian? I doubt that you would. Just like words, graphics are a form of communication. As such, syntax of a sort applies–rules for using graphics to communicate clearly and accurately. As a graphics editor for a large news publication, you have an opportunity to help people improve their graphical skills through example. This is a wonderful opportunity. You could extend your use of it in more helpful ways.

By Simon Rogers. March 14th, 2013 at 5:37 pm

Stephen
Thanks for your reply and for taking the time to engage. I don’t agree with the point about customers, but there you go.

I promise I’ll stop soon. But…

Just a few definitions and differentiations. Firstly, I am not graphics editor at the Guardian newspaper and I have never been. That falls presently to the brilliant Paul Scruton and previously to Michael Robinson. Both of whom know an awful lot more about graphical design than I do and are often fierce adherents of the kind of simple, clear graphics which you champion. There is nothing in the piece you have written with which they would disagree and I think you would be hard-pressed to find anything in the Guardian Graphics department’s work which does not fulfill those rules.

My job? I edit the Guardian Datablog, which is the home of a lot of the Guardian’s data journalism online. Our output on the blog comprises news stories, opinion pieces and data analyses. Often these are accompanied by charts and data visualisations which also follow your rules. At the moment we use Datawrapper a lot as it is a great way to produce simple clear charts at speed, as we are often working to tight deadlines. Most of the result of our work is in written articles, not data visualisations.

In terms of graphics, I would say that 80% of the output of the Datablog is either our own simple charts or more ambitious graphics produced by the Guardian Graphics department working directly with our reporting team.

A small part of the output of the site is a ‘blog within a blog’, the Show and Tell section. This is where we publish interesting things from around the web. Are they all perfect? Probably not. But are they interesting or do they tell interesting stories? Yes, I think so.

Of course, that is my opinion. Yours or any of our readers’ may be different, and this is where the point about written articles comes in. I would not expect every article written by every writer for the Guardian to be in the same style or approach. I would expect them to be able to follow an argument, be literate and at least be accurate, however. And as long as data visualisations that are pitched to me follow those rules, will engender debate and discussion, then great. They may not all be to my taste, but that is what it is: my taste and opinion. I’m not sure there is an objective school of knowledge that says this is ‘right’ and that is ‘wrong’ – particularly as tastes and fashion move all the time.

Having said all that, if you would like to write or visualise data for us to show on the Datablog, I would be honoured. There is a real hunger out there for a glimpse of the knowledge you possess. Let’s share that.

By Stephen Few. March 15th, 2013 at 9:21 am

Simon,

As the editor of the Datablog, you have an opportunity to vet the content. I believe that it is in the Datablog that I’ve found the Guardian’s examples of ineffective data visualization that I referred to previously, such as McCandless’ work. I’ve used examples from your Datablog in lectures to illustrate how not to visualize data. These examples do not meet your journalistic requirement that the content “follow an argument, be literate and at least be accurate.” McCandless’ Billion Pound-O-Gram, which appeared in the Guardian, is an example of this.

There actually is “an objective school of knowledge that says this is ‘right’ and that is ‘wrong.’” It is based on many years of research into the way graphics are perceived and how they can assist or impede cognition. Many books, such as mine (also Tufte, Robbins, Ware, Cairo, Cleveland, etc.) explain the findings of this research in the form of data visualization best practices. Although style and fashion certainly exist in data visualization, this isn’t what I’m talking about. I’m referring to perception and cognition, and how graphics can be designed to work effectively for humans. Are you familiar with this body of research? If not, I invite you to become familiar with it and to let this body of knowledge help you vet the content of the Datablog in ways that will better showcase effective data visualization practices. In so doing, you will join in the effort to usher in a true Information age, rather than the dysfunctional data age in which we currently live.

By Simon Rogers. March 15th, 2013 at 9:54 am

Hi Stephen

For what it’s worth there is not much in what you just said that I would disagree with – in fact we are well aware of these principles of graphic design and I would say that both our own output and that of the Guardian graphics team adheres to those rules pretty firmly. And if you see something that either I or our graphics team produce which does not then please let me know.

But I will always want the Datablog’s show and tell section to be a place where graphics of all different types are published to engender debate and encourage new ways of visualisation. Obviously, the more people who visualise data in ways that are “effective” the better. But the nice thing about that section is that it is a place where everyone is free to submit something with a good chance it will be published.

By Stephen Few. March 15th, 2013 at 10:11 am

Simon,

By allowing a visualization to be featured in the Guardian, you are granting it a degree of credibility. It appeared in the Guardian, after all, so it must be good, or so readers will reason. Readers will emulate these examples, even when they’re completely ineffective as journalism, especially if they’re eye-catching, unless someone moderates the show and tell section to either deny bad visualizations a forum or critique their effectiveness so that readers can learn from them. In cases when novel visualizations are posted that are not obviously either good or bad, you can frame them as experiments and invite critique. I’m encouraging you to get more involved as a moderator, using this forum that you edit to nudge readers toward better data journalism. Your opportunity is golden. Use it more proactively.

By Simon Rogers. March 15th, 2013 at 11:12 am

Stephen
I wish that you would spend a bit more time on the Datablog – and we could have this conversation afterwards. “Frame them as experiments and invite critique” is exactly what we do. You never know – you may be pleasantly surprised by what we produce.

If you’re looking for me to say that I won’t use David McCandless’ work occasionally, however, then you will be disappointed. Whether or not you like his work, he has probably turned more people onto data visualisation than any other data journalist I can think of living today. Those people will then, I believe, investigate the field more and hopefully also discover your work and that of others in the field. And when we use his work we will, as we always do, ask our readers what they think too.

By Stephen Few. March 15th, 2013 at 11:50 am

Simon,

I’m not telling you to deny McCandless a forum for his work. That isn’t my place. I’m merely pointing out that by providing him and others who promote ineffective practices a forum for their work, you are supporting bad practices. In so doing, you are aiding and abetting a crime against data journalism. We’ll never progress in our use of data as long as people like McCandless are encouraging others to make the mistakes that we learned to avoid more than a generation ago. Must we continue to repeat the mistakes of the past?

McCandless does not understand data visualization because he never bothered to study the field. Instead, he began generating flashy infographics and, because publications like the Guardian endorsed his work by showcasing it, he gained notoriety in a field of practice without ever actually developing the essential skills that are needed to do it effectively. With this recognition and notoriety affirming his work, he hasn’t felt the need to step back and develop the skills that he’s lacking. Why would he? People like you are saying that his work as a data journalist is already good enough for major publications like the Guardian. You endorsed his work, not because it was good, but because it was popular. Is that what an editor should do? Perhaps for a publication like the Sun, but for the Guardian?

You can’t sit back and claim neutrality. You’re an editor. It’s your job, as I understand it, to promote good journalism. When publications like the Guardian begin to treat graphical journalism as conscientiously as they treat written journalism, then we’ll begin to see progress.

By Simon Rogers. March 15th, 2013 at 3:47 pm

Hi Stephen – maybe we will never agree (or even draw this conversation to a close) but let’s introduce some facts to the conversation as I fear you haven’t spent that much time with the section you are so critical of.

In 2012, the Datablog ran 725 pieces of content, which includes written articles, charts, videos and maps. We ran several awards, including one for statistical excellence in journalism from the Royal Statistical Society.

Of that number, we ran 67 pieces in our Show & Tell section – 9.24% of the total. These are pieces produced by people outside the Guardian in a separate section of the Datastore, as opposed to being part of the main part of the blog. This section is clearly labelled as being a place to showcase experimental and interesting visualisations and where we actively elicit comments from users as to what they think.

Last year we ran three pieces by David McCandless – which is 0.4% of all Datablog posts.

Oh and no word clouds.

This also means that 90.76% of our content is produced within the Guardian, often with simple charts and visualisations using tools like Datawrapper (which, by the way, was produced with advice from Tufte) or with graphics produced by the Guardian graphics team. These are hardly shabby and several have won prizes at Malofiej. In short, we adhere to pretty strict design rules and principles.

So we allow under 10% of our editorial space to be taken up with visualisations from outside the Guardian, often experimental; always interesting and worth discussing.

I appreciate what you say about ‘objective truth’ in data visualisation – although this is the only field I can think of where this view applies. In science, journalism and literature, there is a recognition of the subjectivity of those taking part. And clearly if we are to deny all alternative voices and methods then nothing new will ever develop.

We actively ask our readers what they can do with the data. To then turn around and say, actually we’re going to ignore all that work and not even allow a fraction of our site to showcase things which are interesting around the web would be odd, to say the least. Then it starts to sound less like “good” and “bad” and more like something much less appealing: like a dogma.
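
The percentages quoted in the comment above can be checked with a few lines of Python. This is only a minimal sketch: the three counts (725, 67 and 3) come from the comment, while the variable names and the `share` helper are illustrative, and it assumes, as the comment does, that everything outside Show & Tell counts as in-house content.

```python
# Counts quoted in the comment above; nothing else is taken from the Datablog.
total_pieces = 725   # all Datablog content in 2012
show_and_tell = 67   # pieces run in the Show & Tell section
mccandless = 3       # pieces by David McCandless


def share(part, whole):
    """Return part as a percentage of whole, rounded to two decimal places."""
    return round(100 * part / whole, 2)


# Share of content that came from outside the Guardian (Show & Tell).
show_and_tell_pct = share(show_and_tell, total_pieces)

# Assumption: every piece not in Show & Tell is counted as in-house.
in_house_pct = share(total_pieces - show_and_tell, total_pieces)

# Share of all posts that were McCandless pieces.
mccandless_pct = share(mccandless, total_pieces)

print(show_and_tell_pct, in_house_pct, mccandless_pct)
```

On these counts the Show & Tell and in-house shares match the 9.24% and 90.76% quoted, and the McCandless share comes out at roughly 0.4%.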

By Stephen Few. March 15th, 2013 at 5:27 pm

Simon,

I appreciate your thoughtful and passionate defense of the work that you’re doing at the Guardian. We clearly disagree, however, about your responsibility as an editor and the effects of the poor data visualizations that you often showcase in the Show and Tell section of your Datablog. This is the logo that appears at the top of the Datablog’s main page:

Guardian DataBlog Logo

Facts are indeed sacred, if by facts you’re referring to those that are true. We don’t treat things that are sacred in the way that facts are sometimes treated in your blog. In the introduction for Show and Tell, you say: “Welcome to Show and Tell, our new site highlighting the best of the world of infographics and data visualisations on the web.” Let’s take a look.

In my most recent visit to your site, this is the first visualization that I found featured there:

This is a linear arrangement of bubbles (circles) that are meant to represent the percentages of Catholics in various countries. Only the table that appears above the graphic, however, is useful. The graphic suggests that circles of various sizes, as shown here, are an effective way to graphically represent the values and the differences between them. As you know, this graphic is an ineffective quantitative display. Where have you pointed out or even suggested that this graphic isn’t effective or shown a more effective way to display these values?

Here’s the next visualization that I found:

The caption for this says: “Twitter users grouped into tribes, annotated with words typically used by each group.” Can you think of more effective graphical ways to tell this story? I can. This is eye-candy, pure and simple. When people see examples like this and believe what you’ve said, that these are examples of the best infographics and data visualizations on the web, they will emulate them. What a waste.

I know that I’m being hard on you. That’s because you’re an editor for an influential news publication. You can do better than this. It’s your job to do better than this. I’m not saying that you’re a bad guy. I’m saying that you’re missing a wonderful opportunity to make the world a better place by actually doing what you claim: to use your Show and Tell section of your Datablog to show a curated set of great infographics and data visualizations. If you do that, I’ll praise what you’re doing.

You cannot dismiss my perspective as “dogma.” I am not dogmatic. I know all about dogma. I was raised in a fundamentalist religion. I was told, “This is the way it is because we say so, and if you question what we say, you risk going to hell.” I clawed my way out of dogma and now believe in a scientific view of the world. We discover what’s true through research, based on evidence. Science has informed the use of graphics for presenting data. If we want to progress in our ability to find, understand, and present the stories that live in data, we should take advantage of these scientific findings, using them as the foundation on which to then build ever better methods. Many of the examples in your blog are encouraging people to ignore science and emulate practices that we know don’t work. My attempts to discourage this are not examples of dogma, they are the compassionate advice of a teacher who wants to help people do better.

By Simon Rogers. March 16th, 2013 at 1:21 am

Stephen
Thanks for taking the time to look at part of the blog. Obviously those are two examples of external graphics on the site. In the last week, however, you will find a piece about John Snow’s cholera map, another on the composition of Popes featuring, yes, bar charts; another on the use of antibiotics (more bar charts); a piece on wellbeing (bar charts); a large piece on a decade of war in Iraq (mostly bar charts); a historical list of UK voting intentions (line chart and pie chart); another on Syrian refugees (bar charts).

Having written that, I’m worried we are using too many bar charts but they are probably the most effective way of visualising that data.

But we also publish the raw data itself because we believe many of our readers, you included, may be in a position to do better things with the data; our role being as much to democratise data as visualise it.

As I said, external graphics are a tiny part of our output and are there to engender discussion. And next week we will publish one of David McCandless’ graphics which I’m sure you and others of our readers won’t like. However some will, and others will like part of it but may disagree on elements of it. But I think we are all mature enough to discuss those tastes online. That is what the blog is for – and I would love it if you became part of that process too.

About Simon Rogers

Data journalist, writer, speaker. Author of 'Facts are Sacred', from Faber & Faber, and a new range of infographic books for children from Candlewick. Edited and launched the Guardian Datablog. Now works for Twitter in San Francisco as Data Editor.

Discussion

7 thoughts on “A conversation with Stephen Few about data visualisation. Kind of”

  1. Hi Simon,

    before I join the discussion I’d like to say that I’m very impressed with the Datablog’s use of data journalism to enlighten the public on important issues, I’m a regular reader.

    That being said, I find that the data visualisations the Guardian shows often fall short of the high standard of the journalism behind it. Like Stephen, I think that bubble graphs are a mediocre way to display accurate data (though they can be useful in showing trends – Alberto Cairo features a good discussion of this in his book The Functional Art). But there are problems beyond the choice of visualisation. More than once have I found actual mistakes in in-house graphics, like a bar chart of 34 being higher than one saying 38 (I sent this one to the corrections editor, but it hasn’t been changed since).

    I’d also like to comment on your article on John Snow’s cholera map. You may want to read “Map-making and myth-making in Broad Street: the London cholera epidemic, 1854″ by Brody et al. It’s an update of Tufte’s research and shows that a) Dr. Snow most likely did not draw a map until after his identification of the infected pump, b) he’d formed his theory on the spread of cholera before the event and was in fact doing a bigger experiment to prove it while the Broad Street epidemic happened and c) the implications of the map could only be understood through the right theory, and not just by anyone who saw it.

    This doesn’t diminish the value of the data journalist approach you describe in your article, but it shows that visual arguments are not even obvious to everyone when they’re correct – so how can we assume they do any good when they’re bad representations of the data? What will people understand better through them?

    Showcasing ineffective data visualisations in the Guardian sets a bad example. Worse for incorrect ones. I don’t know how the design process is organized exactly, but you write in Facts are Sacred that “Designing a graphic and analysing data are two different jobs.” But if the designers don’t understand the data, even bar charts can go wrong. (“Show and tell” is another story, though I think the separation from the datablog may be too subtle for most people to notice.) Do the responsible data journalists even look at the graphics in their article before it goes online? Who makes sure it’s quality work?

    Mathematics have rules, and so do statistics. If infographics are the visual representations of those, they need to have standards as well – at least if you want them to be a reliable source of information.

    Posted by Shaky (@enola_srouj) | April 5, 2013, 11:25 am
  2. Hey! I just wanted to ask if you ever have any problems with hackers? My last blog (wordpress) was hacked and I ended up losing months of hard work due to no back up. Do you have any solutions to stop hackers?

    Posted by google | April 2, 2013, 3:25 am
  3. Simon, I found your restraint in this discussion admirable. I too posted on his blog, but he would not allow it to stay up. I essentially told him that I could hardly see how the addition of a few chart styles–especially since Tableau did not take away anything–warranted a 6,000 word screed on the topic.

    I told him there was a fine line between being an informed expert, on the one hand, and an arrogant, know-it-all, gotta-be-right, I’m-the-only-one-who-knows-best guy on the other, and that he had crossed it.

    He replied to me in email and asked me to cite how I could have come to that conclusion. I told him insisting you were a customer was perhaps the best example.

    Anyway, nice work.

    Posted by Jon Boeckenstedt | March 23, 2013, 6:11 pm
  4. I’m a long-term connoisseur of internet debate, and it’s always fascinating to watch a discussion such as this unfold. It was probably helped along in my own mind by picturing the two protagonists in their respective book-lined studies, sharpening fresh quills before furiously scratching out each instalment and ringing for the butler to send it on its way.

    On a capricious whim I wondered how each side of the debate might look when visualised: and what more appropriate way to do this than to employ a computer-generated scaled frequency representation? To that end, I’d like to share the following links with the authors and readers: Stephen (http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/stephens-comments) and Simon (http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/simons-comments).

    Interested readers may like to compare the shared and unique terms used in each cloud or even go on to generate their own visualisations from the datasets currently shared on the Many Eyes site.

    Posted by Stephanie Lay (@StephanieLay) | March 20, 2013, 2:23 pm
  5. I think Stephen Few needs to put his money where his mouth is. Submit one of his graphics to Show and Tell and let us all judge it, as he has judged the work of others. Then we will see if his graphic is engaging or memorable to anyone outside of the data/stats community. The Guardian can, I’m sure, also measure dwell times, bounce rates and use other analytic tools to see how his work compares – in terms of engagement – with some of the amateurs he maligns. Comments on his chart will also allow us to gauge whether the data has been clearly understood and, if it hasn’t, how it has been misunderstood. He mentions rules and research above which make no sense to me. We should never use bubble charts? Is he suggesting that, for example, the Guardian’s famous public spending bubble chart would have been somehow better as a bar chart? Or that the millions of people who engaged with it didn’t properly understand the numbers? I think they saw and understood some of these numbers for the first time. Finally, thank you Simon Rogers for publishing this exchange. Enlightening in so many ways.

    Posted by Alan D | March 16, 2013, 8:22 pm
  6. Just an update. Stephen posted this response recently. For what it’s worth there is not much there I would disagree with – but I will always want the Datablog’s show and tell section to be a place where graphics of all different types are published to engender debate and encourage new ways of visualisation. Obviously, the more people who visualise data in ways that Stephen would call “effective” the better:

    By Stephen Few. March 15th, 2013 at 9:21 am
    Simon,

    As the editor of the Datablog, you have an opportunity to vet the content. I believe that it is in the Datablog that I’ve found the Guardian’s examples of ineffective data visualization that I referred to previously, such as McCandless’ work. I’ve used examples from your Datablog in lectures to illustrate how not to visualize data. These examples do not meet your journalistic requirement that the content “follow an argument, be literate and at least be accurate.” McCandless’ Billion Pound-O-Gram, which appeared in the Guardian, is an example of this.

    There actually is “an objective school of knowledge that says this is ‘right’ and that is ‘wrong.’” It is based on many years of research into the way graphics are perceived and how they can assist or impede cognition. Many books, such as mine (also Tufte, Robbins, Ware, Cairo, Cleveland, etc.) explain the findings of this research in the form of data visualization best practices. Although style and fashion certainly exist in data visualization, this isn’t what I’m talking about. I’m referring to perception and cognition, and how graphics can be designed to work effectively for humans. Are you familiar with this body of research? If not, I invite you to become familiar with it and to let this body of knowledge help you vet the content of the Datablog in ways that will better showcase effective data visualization practices. In so doing, you will join in the effort to usher in a true Information age, rather than the dysfunctional data age in which we currently live.

    Posted by Simon Rogers | March 15, 2013, 4:50 pm

Free to share

Creative commons

Please share me around. Everything here is free to use under a Creative Commons Attribution-NonCommercial 3.0 Unported License
