Intro to Data Journalism

City, University of London
https://ddj.nicu.md/city/

👋 Welcome to Data Journalism

Who am I?

My name is Nicu Calcea.

I’m a data investigative journalist and City University alumnus.

I work at Global Witness, and I was previously doing data journalism at BBC News and the New Statesman.

Some stories

Get in touch

Who are you?

  • What’s your name?
  • What course are you in?
  • Do you have any experience in data journalism?
  • Why did you choose this module?

The plan

  • Week 1: Introduction
  • Weeks 2-3: Analysis
  • Week 4: Cleaning
  • Week 5: Stories
  • Week 6: Visualisation
  • Week 7: Maps
  • Week 8: Projects
  • Week 9: Scraping
  • Week 10: Conclusions

Week 1

Introduction to Data Journalism

What is Data Journalism?

In its most simple definition, data journalism is the practice of using numbers and trends to tell a story. — Betsy Ladyzhets

Data journalism [is] finding – in data – stories that are of interest to the public and presenting them in the most appropriate manner for public use and reuse. — Bahareh Heravi

History

Why do we need data journalism?

Tell richer stories

An increasing amount of human activity is recorded with data. This means there is a data angle for almost any subject.

Be more efficient

We tell some stories every year, month or day. We can greatly simplify or automate these recurring stories, giving us more time to focus on in-depth reporting.

Be more accurate

Though not without data quality issues and ethical considerations, accuracy is central to data journalism.

Unique angles

There are now stories where a data angle is the only or main angle. By using data, journalists can create news instead of just covering it.

Personalise news

Make readers invested in a story by personalising it to their postcode, age or socio-economic status.

New audiences

Data journalism is exciting (I hope). The pandemic has shown that readers will reward publishers with their clicks.

The process

Question

As with all journalism, data journalism starts with a question that the reporter wants to answer.

Source data

Data can come from government sources, third parties, or be collected by the reporter themselves.

Clean data

In most cases, you will need to filter and sort your dataset and fix any errors or missing information in it.

Analyse data

How do you find the answer to your question in the data?

Review

While data doesn’t lie, data publishers do. Do your findings make sense? Can you verify them?

Present results

Communicate data in the most suitable way. Usually, but not always, you will visualise your findings.

Baby names

  1. Make a copy of this spreadsheet and pick one tab to work in. Data from the ONS.
  2. The yellow cells indicate where you need to fill in formulas.
  3. What are some other potential stories that you can think of? Are there more babies named after the royals? What about Game of Thrones characters? What are the most popular gender-neutral names? Long-term trends?

[=A1+A2]

Returns one number added to (+) or subtracted from (-) another.

[=A1/B$1]

Returns one number divided (/) or multiplied (*) by another.

[=SUM()]

Returns the sum of a series of numbers and/or cells.

=AVERAGE()

Returns the numerical average value in a dataset, ignoring text.

=MEDIAN()

Returns the median value in a numeric dataset.

[=(NEW-OLD)/OLD]

Shows percentage change.
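
As a worked example of the percentage change formula, here is a minimal Python sketch. Python is an assumption here (the module itself uses spreadsheets), and the function name and baby counts are hypothetical, not real ONS figures. Note that the spreadsheet formula returns a fraction (0.25) that you then format as a percentage, while this sketch multiplies by 100 directly.

```python
def percentage_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        # Dividing by zero is undefined, same as a #DIV/0! error in a spreadsheet
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / old * 100


# Hypothetical counts for one name in two consecutive years
babies_2018 = 200
babies_2019 = 250

change = percentage_change(babies_2018, babies_2019)
print(f"{change:.1f}%")  # prints 25.0%
```

A fall from 100 to 50 gives -50%, so the sign tells you the direction of the change.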

Averages

=MODE()

Finds the most common value in a range.

=MEDIAN()

Finds the value that’s right in the middle of a dataset.

=AVERAGE()

Sum all the values and divide by the number of records.
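
To see how the three averages can tell different stories, here is a short sketch using Python's standard `statistics` module. The salary figures are made up for illustration, and Python is an assumption rather than a tool used in this module.

```python
from statistics import mean, median, mode

# Hypothetical salaries in pounds; one very high value skews the mean
salaries = [20000, 22000, 22000, 25000, 31000, 150000]

most_common = mode(salaries)   # like =MODE(): the value that appears most often
middle = median(salaries)      # like =MEDIAN(): the midpoint of the sorted values
average = mean(salaries)       # like =AVERAGE(): sum divided by number of records

print("Mode:", most_common)
print("Median:", middle)
print("Mean:", average)
```

Here the mode is 22,000, the median is 23,500 and the mean is 45,000. A single outlier nearly doubles the mean, which is why the median is often the fairer "typical" figure for skewed data such as pay.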

How did the Mail do it?

2018

2019

Assignments

Critique a data journalism project
  • A 20-25 minute long narrated group PowerPoint presentation critiquing a data project that won or was shortlisted for the Sigma Awards.
  • 500-word group reflection, with appropriate references.
  • A 200-word reflection on your own learning.

Deadline: Friday, 13 December, 16:00
Marking: 40% of your final mark

Data journalism portfolio
  • One news story (400 words).
  • EITHER one feature story OR one news investigation (800 words), substantially based on data techniques and published digitally with appropriate visualisations.
  • A 200-word reflective log, in the style of a blog post, on your own learning journey.

Deadline: Friday, 24 January, 16:00
Marking: 60% of your final mark

Contact

Week 2

Introduction to Data Journalism

Sourcing data

Open Data

Plan ahead

Closed data

FOIs

Scraping

Census exercise

  1. Make a copy of this spreadsheet (here’s the original data).
  2. Before you do anything else, what are some questions you would like to answer?
  3. Freeze the first row, then filter the table to the region you’re from (or London).
  4. Fill in the columns at the end
  5. What are some other potential stories that you can think of?

[=A1+A2]

Returns one number added to (+) or subtracted from (-) another.

[=A1/B$1]

Returns one number divided (/) or multiplied (*) by another.

[=SUM()]

Returns the sum of a series of numbers and/or cells.

=AVERAGE()

Returns the numerical average value in a dataset, ignoring text.

=MEDIAN()

Returns the median value in a numeric dataset.

[=(NEW-OLD)/OLD]

Shows percentage change.

[=IF()]

Returns one value if the result is true, another if it’s false.

[=COUNTIF()]

Count all the cells that match a condition.

[=SUMIF()]

Sum all the cells that match a condition.
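
The logic behind =COUNTIF() and =SUMIF() can be sketched in Python as follows. This is an illustration only: the rows, the region names and the `count_if` / `sum_if` helpers are hypothetical, not census data or spreadsheet functions.

```python
# Hypothetical rows: (region, population)
rows = [
    ("London", 300),
    ("Wales", 120),
    ("London", 250),
    ("Scotland", 180),
]


def count_if(rows, region):
    """Count the rows whose region matches, like =COUNTIF()."""
    return sum(1 for r, _ in rows if r == region)


def sum_if(rows, region):
    """Add up the population of the rows whose region matches, like =SUMIF()."""
    return sum(population for r, population in rows if r == region)


print(count_if(rows, "London"))  # prints 2
print(sum_if(rows, "London"))    # prints 550
```

Both functions walk through every row and keep only those that meet the condition; the difference is whether they count the matches or add up a value from each one.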

[=CONCATENATE()]

Combine multiple bits of text together. Use =SPLIT() for the opposite.

[=VLOOKUP()]

Match the values in a cell with the corresponding row in another dataset.

[=XLOOKUP()]

Same as =VLOOKUP() but more flexible and easier to grasp.

Contact

Week 3

Introduction to Data Journalism

Toolbox

XLOOKUP

=XLOOKUP(search_term, col_to_search, col_to_return)
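
To show what the three arguments do, here is a simplified Python stand-in for an exact-match lookup. It is a sketch, not the spreadsheet implementation: the `xlookup` function, the `if_not_found` parameter, and the area codes and names are all hypothetical.

```python
def xlookup(search_term, col_to_search, col_to_return, if_not_found=None):
    """Return the item in col_to_return that sits on the same row as the
    first exact match of search_term in col_to_search."""
    for key, value in zip(col_to_search, col_to_return):
        if key == search_term:
            return value
    # No match: a spreadsheet would show an error unless a fallback is supplied
    return if_not_found


# Hypothetical lookup table: area codes in one column, names in another
codes = ["A1", "B2", "C3"]
names = ["Northtown", "Southville", "Eastham"]

print(xlookup("B2", codes, names))             # prints Southville
print(xlookup("Z9", codes, names, "Unknown"))  # prints Unknown
```

The search column and the return column are separate ranges, so the value you return can sit to the left or right of the one you search.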

XLOOKUP exercise

  1. Make a copy of this spreadsheet.
  2. Fill in the empty columns with formulas we learned last time.

Pivot Tables

Pivot tables are extra tables in your spreadsheet, in which you can summarise data from your original table.

You can calculate averages, counts, max/min values or sums for numbers in a group.
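
Under the hood, a pivot table groups rows by a category and then summarises each group. A minimal Python sketch of that idea follows; the crime categories, the numbers and the `pivot` helper are made up for illustration and are not taken from police.uk.

```python
from collections import defaultdict

# Hypothetical rows: (crime type, number of incidents in one area)
rows = [
    ("Burglary", 3),
    ("Robbery", 1),
    ("Burglary", 5),
    ("Vehicle crime", 4),
    ("Robbery", 2),
]


def pivot(rows):
    """Group values by category, then summarise each group."""
    groups = defaultdict(list)
    for category, value in rows:
        groups[category].append(value)
    return {
        category: {
            "count": len(values),
            "sum": sum(values),
            "average": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for category, values in groups.items()
    }


summary = pivot(rows)
for category, stats in summary.items():
    print(category, stats)
```

Each key in `summary` corresponds to one row of the pivot table, and each statistic to one of the summary columns you could choose.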

Pivot table exercise

  1. Make a copy of this spreadsheet.

Bonus points: Grab a CSV from police.uk and do it yourself.

Averages

=MODE()

Finds the most common value in a range.

=MEDIAN()

Finds the value that’s right in the middle of a dataset.

=AVERAGE()

Sum all the values and divide by the number of records.

Contact