Author Archives: johannstan

Data Journalism, or how open data can transform journalism


Today the job of a journalist is to be the first to report on a new event or a given topic. However, being first often comes at the expense of quality. Even being first is more and more of a challenge, since nowadays any citizen can become a journalist in five seconds. Just open a Twitter account, select hashtags relevant to your topic (i.e. keywords starting with a #) and start tweeting about what is happening around you (e.g. earthquakes, epidemics, political problems etc.).

In this post, we will show how journalists could benefit from open data, allowing journalism to shift its main focus from being the first to report on a development to being the first to tell us what it might actually mean. Using open data, journalists can help everyone see possible solutions to complex problems. What I’m saying here is that journalism would involve less guessing and less hunting for quotes. Instead, a journalist could build a strong position supported by data, and this could greatly change the role of journalism.

A first interesting and inspiring example of data journalism is the Las Vegas Sun’s “Do No Harm” series on hospital care (the next post will be about medical open data, stay tuned).

The Sun analyzed more than 2.9 million hospital billing records, which revealed more than 3,600 preventable injuries, infections and surgical mistakes. The paper obtained the data through a public records request and identified more than 300 cases in which patients died because of mistakes that could have been prevented. The series contains several elements, including: an interactive graphic that lets the reader see, hospital by hospital, where surgical injuries happened more often than would be expected; a map with a timeline that shows infections spreading from hospital to hospital; and an interactive graphic that lets users sort the data by preventable injury or by hospital to see where people are getting hurt.
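To make the idea of injuries happening "more often than would be expected" concrete, here is a minimal sketch of one way such a comparison could be computed. It is not the Sun's actual method, which this post does not describe. The hospital names and counts are made up for illustration, and the expected count simply applies the overall injury rate to each hospital's number of records, with no adjustment for patient mix.

```python
# Hypothetical data: hospital -> (billing records, preventable injuries).
# These numbers are invented for illustration, not taken from the series.
COUNTS = {
    "Hospital A": (1000, 10),
    "Hospital B": (2000, 50),
    "Hospital C": (1000, 12),
}


def flag_hospitals(counts):
    """Return {hospital: observed/expected ratio} for hospitals above expectation.

    Expected injuries = overall injury rate * number of records at the hospital.
    This is a naive baseline and an assumption of this sketch.
    """
    total_records = sum(records for records, _ in counts.values())
    total_injuries = sum(injuries for _, injuries in counts.values())
    overall_rate = total_injuries / total_records

    flagged = {}
    for hospital, (records, injuries) in counts.items():
        expected = overall_rate * records
        ratio = injuries / expected
        if ratio > 1.0:  # more injuries than the baseline predicts
            flagged[hospital] = ratio
    return flagged


if __name__ == "__main__":
    for hospital, ratio in flag_hospitals(COUNTS).items():
        print(f"{hospital}: {ratio:.2f}x the expected number of injuries")
```

A real analysis would need to account for differences between hospitals (size, specialties, how sick their patients are) before drawing conclusions from such a ratio.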

Another data journalism project is “Murder Mysteries” by Tom Hargrove of the Scripps Howard News Service. Using government data and public records requests, he built a demographically detailed database of more than 185,000 unsolved murders, and then designed an algorithm to search it for patterns suggesting the possible presence of serial killers. An interesting input if you are considering moving to a particular geographical area, right?

Open Data journalism is the future of journalism. Concretely, it is journalism that leverages open data in order to unravel the meaning of a story. More specifically, it can be broken down into the following goals:

  • Enable a reader to discover information that is personally relevant
  • Reveal a story that is remarkable and previously unknown
  • Help the reader to better understand a complex issue

Linked Open Data is key to the success of data journalism. Powerful data visualization techniques are also needed so that journalists can:

  • Find open data relevant to the subject of the article they are planning to write
  • Manipulate the available data (compute statistics, connect datasets etc.)

Data journalism may help predict the next financial crisis and fight poverty and corruption.


Big Bank(g)


Researchers say the so-called Big Bang was at the origin of everything: time, space and, eventually, life. Replacing one letter in the word Bang yields Bank, an institution whose role in our society is as essential as that of carbon for life. Interestingly, the prefix Big makes sense in both cases. In the first, it’s the very name of that famous explosion or whatever it was. As for the second, I’m hoping this post will make it clear. So let’s start defining the Big Bank.

Banks have more data about their consumers than companies such as Google, Facebook or Amazon. Yet it is these companies that have revolutionized consumer intelligence, i.e. the techniques for obtaining a 360-degree view of the consumer. This is why we love them: we get a very intimate experience while using their services (think about the personalized product offers). In order to create those offers and show them at the right place and the right time, these companies need a 360-degree view of every consumer. Now let’s consider the bank. It has our credit card purchase history and knows our buying preferences. The question is, if banks have this data, why do we still receive completely irrelevant offers, such as a coupon for a shop we rarely visit and that is at least 100 miles away? The answer is simple: banks have a fragmented view of their consumers. What does this mean?

  • Several internal data sources. Even worse, several technologies used to store and manage this data. Part of it could be plain text. Part of it XML. Clearly, it is not easy to aggregate such silos and extract meaningful knowledge about the consumer.

  • External unstructured data. The consumer’s Facebook and Twitter feed and web clickstream data. This is key for understanding their preferences and opinions. If a consumer shares information from an iPad, why recommend him/her the newest Samsung Galaxy? A coupon for an Apple device would make more sense, right? Also, if he/she is a fan of Lady Gaga, a coupon for Rigoletto at the MET would probably not make any sense.

  • Contextual data. More and more payments will be made with mobile devices acting as so-called mobile wallets. Thus, the location of the consumer can be retrieved. Remember those coupons we used to get for stores 100 miles away? History!
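The points above can be illustrated with a small sketch: two silos in different formats (plain text and XML) are merged into one profile per consumer, and location is then used to drop offers for stores more than 100 miles away. Everything here is hypothetical and only meant to show the idea: the record layouts, the customer IDs, the coordinates, the store names and the 100-mile threshold are all assumptions, not a description of any real bank's system.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical silo 1: plain-text purchase log, one "customer_id,merchant" per line.
TEXT_SILO = """c1,Apple Store
c1,Corner Bookshop
c2,Opera House"""

# Hypothetical silo 2: XML with the last known location from a mobile wallet.
XML_SILO = """<customers>
  <customer id="c1" lat="40.7580" lon="-73.9855"/>
  <customer id="c2" lat="34.0522" lon="-118.2437"/>
</customers>"""

# Hypothetical offers: store name -> (latitude, longitude).
OFFERS = {
    "Newark Electronics": (40.7357, -74.1724),
    "LA Music Shop": (34.0522, -118.2437),
}


def load_purchases(text):
    """Parse the plain-text silo into {customer_id: [merchant, ...]}."""
    purchases = {}
    for line in text.splitlines():
        customer_id, merchant = line.split(",", 1)
        purchases.setdefault(customer_id, []).append(merchant)
    return purchases


def load_locations(xml_text):
    """Parse the XML silo into {customer_id: (lat, lon)}."""
    root = ET.fromstring(xml_text)
    return {
        node.get("id"): (float(node.get("lat")), float(node.get("lon")))
        for node in root.findall("customer")
    }


def unify(purchases, locations):
    """Merge both silos into a single profile per customer."""
    ids = set(purchases) | set(locations)
    return {
        cid: {"merchants": purchases.get(cid, []), "location": locations.get(cid)}
        for cid in ids
    }


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))  # 3958.8 = Earth radius in miles


def nearby_offers(profile, offers, max_miles=100):
    """Keep only offers whose store is within max_miles of the consumer."""
    if profile["location"] is None:
        return []
    return [name for name, position in offers.items()
            if haversine_miles(profile["location"], position) <= max_miles]


if __name__ == "__main__":
    profiles = unify(load_purchases(TEXT_SILO), load_locations(XML_SILO))
    for cid in sorted(profiles):
        print(cid, profiles[cid]["merchants"], nearby_offers(profiles[cid], OFFERS))
```

In practice the hard part is not the merge itself but agreeing on a common consumer identifier across silos, which this sketch simply assumes exists.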

We have listed only a few of the challenges banks face in getting a 360-degree view of their consumers. Big Data technologies are key to building the infrastructure that allows data silos to be unified. Our world is increasingly interconnected, and banks should definitely think about leveraging Big Data technologies to gain business insight from the massive amount of data at their disposal. And with that, we have just defined the Big Bank.
