City, University of London
https://ddj.nicu.md/city/
My name is Nicu Calcea.
I’m a data journalist and City University alumnus, originally from Moldova, currently based in London.
I work as a data journalist at BBC News and was previously Data Projects Editor at the New Statesman.
My personal website: nicu.md
Introduction to Data Journalism
In its most simple definition, data journalism is the practice of using numbers and trends to tell a story. — Betsy Ladyzhets
Data journalism [is] finding – in data – stories that are of interest to the public and presenting them in the most appropriate manner for public use and reuse. — Bahareh Heravi
Since the pandemic, nearly every newsroom has prioritised data journalism and has been hiring heavily for data journalism positions.
New(-ish) platforms like Datawrapper and Flourish allow journalists to create and visualise data stories more easily and without much technical expertise.
However, the increased supply of data journalists from courses like this means there are higher entry requirements (R, Python, SQL).
An increasing amount of human activity is recorded with data. This means there is a data angle for almost any subject.
We tell some stories every year, month or day. We can greatly simplify or even automate those stories, giving us more time to focus on in-depth reporting.
Data journalism is not without data quality issues and ethical considerations, but accuracy is central to it.
There are now stories where a data angle is the only or main angle. By using data, journalists can create news instead of just covering it.
Make readers invested in a story by personalising it to their postcode, age or socio-economic status.
Data journalism is exciting (I hope). The pandemic has shown that readers like clear, beautiful data stories and will reward publishers with their clicks.
As is the case with all journalism, data journalism starts with a question that the reporter wants to answer.
Data can come from government sources, third parties, or be collected by the reporter themselves.
In most cases, you will need to filter and sort your dataset, and clean up any errors or missing information in it.
How do you find the answer to your question in the data?
While data doesn’t lie, data publishers do. Do your findings make sense? Can you verify using other sources? Have you made any mistakes?
Communicate data in the most suitable way. Usually, you will visualise your findings, but that is not always necessary.
Source: Paul Bradshaw
Returns one number added to (+) or subtracted from (-) another.
Returns one number divided (/) or multiplied (*) by another.
Returns the sum of a series of numbers and/or cells.
Returns the numerical average value in a dataset, ignoring text.
Returns the median value in a numeric dataset.
Shows percentage change.
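As a minimal sketch of the percentage change calculation, the Python function below subtracts the old value from the new one and divides by the old value. The function name and the sample numbers are made up for illustration; in a spreadsheet the same idea is usually written as something like =(B2-A2)/A2, assuming the old value is in A2 and the new one in B2.

```python
def percentage_change(old, new):
    """Return the change from old to new as a percentage of old."""
    if old == 0:
        # Dividing by zero is undefined, so there is no percentage change to report.
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) * 100 / old


# Hypothetical figures: a rise from 200 to 250, then a fall from 250 back to 200.
rise = percentage_change(200, 250)
fall = percentage_change(250, 200)
print(rise, fall)
```

Note that the rise and the fall are not symmetrical: going up from 200 to 250 is a 25% increase, but going back down is only a 20% decrease, because the base is different.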
Deadline: Friday, December 10, 4pm
Marking: 40% of your final mark
Deadline: Friday 7 January 2021, 4pm
Marking: 60% of your final mark
Introduction to [Data Journalism]
Source: Reuters
Source: FT
Source: Die Zeit
Source: The New York Times
Source: Wyborcza
Source: The Pudding
Source: ddj.nicu.md
Returns one value if the result is true, another if it’s false.
Count all the cells that match a condition.
Sum all the cells that match a condition.
Combine multiple bits of text together. Use =SPLIT() for the opposite.
Match the values in a cell with the corresponding row in another dataset.
Same as =XLOOKUP() but more flexible and easier to grasp.
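The conditional, counting, text and lookup descriptions above can be sketched in plain Python. This is only an illustration of the logic, not how a spreadsheet implements it; the rows, the `populations` lookup table and the threshold of 100 are all invented.

```python
# Hypothetical dataset, invented for illustration only.
rows = [
    {"region": "North", "party": "Red", "votes": 120},
    {"region": "South", "party": "Blue", "votes": 95},
    {"region": "East", "party": "Red", "votes": 140},
]
# Hypothetical second dataset that we want to match against.
populations = {"North": 1000, "South": 800, "East": 1200}

# One value if the test is true, another if it is false.
labels = ["high" if row["votes"] > 100 else "low" for row in rows]

# Count the rows that match a condition.
red_count = sum(1 for row in rows if row["party"] == "Red")

# Sum a column for the rows that match a condition.
red_votes = sum(row["votes"] for row in rows if row["party"] == "Red")

# Combine bits of text, then split them apart again.
joined = ", ".join(row["region"] for row in rows)
parts = joined.split(", ")

# Look up each region in the other dataset; None marks a failed match.
for row in rows:
    row["population"] = populations.get(row["region"])

print(labels, red_count, red_votes, joined, rows[0]["population"])
```

Using `.get()` for the lookup means a region missing from the second dataset ends up as `None` instead of stopping the script, which makes failed matches easy to spot.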
Introduction to [Data Journalism]
Source: The New York Times
Source: Urban Complexity Lab
Source: The Register
Source: Aman Bhargava
Bloomberg is expanding its data journalism and visualization teams globally by hiring approximately 40 new data journalists, data visualization reporters, editors and engineers. — Bloomberg Graphics (@BBGVisualData), October 25, 2023
Source: ddj.nicu.md
Pivot tables are extra tables in your spreadsheet, in which you can summarise data from your original table.
You can calculate averages, counts, max/min values or sums for numbers in a group.
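What a pivot table does can be sketched in a few lines of Python: group the rows by one column, then summarise another column within each group. The records below are invented, and the column names `region` and `value` are placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical original table, invented for illustration only.
records = [
    {"region": "North", "value": 10},
    {"region": "South", "value": 5},
    {"region": "North", "value": 30},
    {"region": "South", "value": 15},
    {"region": "South", "value": 25},
]

# Step 1: collect the values that belong to each group.
groups = defaultdict(list)
for record in records:
    groups[record["region"]].append(record["value"])

# Step 2: summarise each group, as a pivot table would.
pivot = {
    region: {
        "count": len(values),
        "sum": sum(values),
        "average": mean(values),
        "min": min(values),
        "max": max(values),
    }
    for region, values in groups.items()
}

for region, summary in pivot.items():
    print(region, summary)
```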
Finds the most common value in a range.
Finds the value that’s right in the middle of a dataset.
Sum all the values and divide by the number of records.
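To see how the three measures can disagree, here is a small sketch using Python's built-in `statistics` module. The numbers are invented, with one deliberately extreme value at the end.

```python
from statistics import mean, median, mode

# Hypothetical values with one large outlier (200), invented for illustration.
values = [20, 22, 22, 25, 30, 31, 200]

most_common = mode(values)    # the value that appears most often
middle = median(values)       # the value in the middle once sorted
average = mean(values)        # the sum divided by the number of records

print(most_common, middle, average)
```

The single outlier pulls the mean well above every other value in the list, while the median and mode stay put. That is often a reason to prefer the median when reporting on things like pay or house prices.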
Data [Cleaning]
Source: BBC
Source: The Washington Post
Source: The New York Times
Source: Bloomberg
Sometimes, records disappear or were never collected. It may not always be obvious when that is the case.
Records can be repeated, either due to technical mishaps or due to repeated input.
Humans make mistakes. Assume any dataset manually created by humans to have missspelings.
Some spreadsheets are designed to be read by humans, not computers. We then need to teach software to read it correctly.
Wrong Excel formula? Üṅṛëċöġṅïṡëḋ characters? Old Excel version? These can all mess with your data.
A column can mix different units, spell the same category in different ways or combine figures collected with different methodologies.
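Several of the problems above (missing values, duplicates, inconsistent spelling and mixed units) can be handled in one cleaning pass. The sketch below is one possible approach in plain Python, not a general-purpose cleaner; the rows, the column names and the Celsius/Fahrenheit mix are all invented.

```python
# Hypothetical messy rows, invented for illustration only.
raw_rows = [
    {"city": "London ", "temp": "20", "unit": "C"},
    {"city": "london", "temp": "59", "unit": "F"},
    {"city": "Leeds", "temp": "", "unit": "C"},
    {"city": "Leeds", "temp": "", "unit": "C"},   # exact repeat of the row above
    {"city": "LEEDS", "temp": "15", "unit": "C"},
]


def clean(rows):
    seen = set()
    cleaned = []
    for row in rows:
        # Inconsistent categories: trim stray spaces and standardise the case.
        city = row["city"].strip().title()
        # Missing data: keep the row but mark the value as None, not 0.
        temp = float(row["temp"]) if row["temp"].strip() else None
        # Inconsistent units: convert Fahrenheit to Celsius.
        if temp is not None and row["unit"].strip().upper() == "F":
            temp = round((temp - 32) * 5 / 9, 1)
        # Duplicates: skip any record we have already seen after cleaning.
        key = (city, temp)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"city": city, "temp_c": temp})
    return cleaned


cleaned = clean(raw_rows)
missing = [row for row in cleaned if row["temp_c"] is None]
for row in cleaned:
    print(row)
```

Marking missing values as `None` keeps them out of later sums and averages, where a silent 0 would distort the result.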
Source: Datawrapper
Source: ONS
Also check last year’s tables: here.
Source: The Guardian
Source: OpenRefine
Source: Food Standards Agency
You will be split into teams of four to create a narrated PowerPoint presentation critiquing a data project featured in the Sigma Awards. You can choose a winner or a short-listed project (at the bottom of the page).
Deadline: Friday, December 8, 4pm
Deliverables
Source: Joe Murphy
Data [Stories]
Source: Erwan Rivault
Source: Jana Tauschinski
Source: FT
Summarising data, like we did in previous lessons, is not always enough to reveal patterns or trends.
Visualising it can provide insight we’d otherwise lose out on.
Position
Size
Width
Height
Area
Colour
Fill
Colour
Opacity
Pattern
Shape
Location
Standard scatter plot
Change scale to log
Size by population
Colour by continent
Animate over time
Source: FT
Data [Visualisation]
Source: Reuters
Source: Bloomberg
Source: Washington Post
Source: ddj.nicu.md
Horizontal or vertical rectangles with lengths proportional to the values that they represent.
Good for comparing across different values or showing a trend over time.
Shows values on a continuous scale. Similar to a scatter plot, except all dots are connected.
Good for showing trends over time.
Similar to a line chart but the area underneath the line is coloured in. When stacked, it can show multiple data series as well as their cumulative trend.
Good for showing trends over time.
Plots a dataset across two continuous dimensions, each on a different axis (X and Y).
Good for showing correlation between different data series.
Only works with geographical data (duh!).
Even with geographical data, other charts can often be a better choice.
Source: ddj.nicu.md
Source: Datawrapper
Source: Daily Mail
Source: The Sun
Source: Reuters
Source: CBS News
Source: WTF Visualizations
Source: Reddit
[Maps]
Source: Datawrapper
Source: BBC
Source: ONS
Source: Jan Pánek
Source: Bloomberg
Source: The New York Times
Source: Washington Post
Source: openDemocracy
Pre-defined areas such as countries, regions or districts are coloured, using a sequential, diverging or categorical palette, according to values in a dataset.
Circles are drawn on top of a map, with their size or colour proportional to values in a dataset.
Cartograms resize regions in proportion to a variable in your dataset, such as population.
Hex maps standardise administrative units into same-sized hexagons, squares or triangles.
Try to only use maps when there’s a geographical pattern to your data.
Don’t make a map if it’s going to basically be a population map.
Source: xkcd
This can be polygons (areas), lines or points. Some tools have a few options by default, or you can get additional ones from the ONS, Natural Earth or ArcGIS.
This is the data that will be placed in the shapes on your map. Normally contains region IDs or coordinates (latitude and longitude).
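Matching data to shapes by region ID, and avoiding a map that just mirrors population, can be sketched as follows. Everything here is a stand-in: the region IDs, names and figures are invented, and the `shapes` dictionary takes the place of a real boundaries file.

```python
# Hypothetical shapes, keyed by region ID (a real map would use a boundaries file).
shapes = {"R1": "Region One", "R2": "Region Two", "R3": "Region Three"}

# Hypothetical dataset with region IDs, invented for illustration only.
data = [
    {"id": "R1", "cases": 50, "population": 10_000},
    {"id": "R2", "cases": 50, "population": 100_000},
]

# Turn raw counts into a rate per 100,000 people so that large regions
# do not dominate the map simply because more people live there.
rates = {
    row["id"]: row["cases"] * 100_000 / row["population"]
    for row in data
}

# Check which shapes have no matching data before drawing anything.
unmatched = [region_id for region_id in shapes if region_id not in rates]

for region_id, name in shapes.items():
    print(name, rates.get(region_id, "no data"))
```

Both regions with data have the same number of cases, but the rate in the smaller one is ten times higher, which is the pattern a map of raw counts would hide.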
Data [Projects]
Source: Friends of the Earth
Source: The New York Times
Source: Financial Times
Source: AP
Source: Wall Street Journal
Source: Bloomberg
Source: Texty
Often, the best narratives warrant going further than simple graphics.
Visualisations designed specifically for a story will almost always be the best way to tell it.
Interactivity can sometimes help portray the intricacies of a story better.
Source: Gurman Bhatia
Source: Masters of Media
Trackers are data visualisations connected to a data source that is periodically updated.
Examples include FiveThirtyEight’s Biden approval rating tracker, Bloomberg’s Pret Index and the New Statesman’s Covid-19 tracker.
Source: New Statesman
Calculators allow readers to input their own data and receive a result.
Examples include the FT personal data worth calculator, the New Statesman election calculator and the BBC’s energy calculator.
Source: BBC News
Scrollytelling is the use of a browser’s scrolling functionality to interactively tell a data story.
Examples include The Impatient List, the New York Times delta variant story and the New Statesman million years lost investigation.
Source: BBC News
Information visualisation is meant to clarify data, but too much interactivity hinders understanding by transferring responsibility from the designer to the reader to work out the important points. — Martin Stabe (FT)
Readers just want to scroll, […] if you make the reader click or do anything other than scroll, something spectacular has to happen. — Archie Tse (NYT)
Interactive graphics are not just a fun addition but can actually increase the transparency of our work, open us for criticism, and thereby, hopefully, help re-build some trust in journalism. — Gregor Aisch (Datawrapper)
Source: USAFacts
Source: Bloomberg
Source: CNN
Source: Brendan Bycroft
Source: McKinsey
Source: Le Monde
Data scraping, in its most general form, refers to a technique in which a computer program extracts data from output generated from another program. — Cloudflare
Many organisations still publish data in PDFs, a format that is difficult to extract data from. Sometimes, they even do it on purpose.
If you can’t find the data behind a chart, ask the author. If you can’t do that either, read it from the image.
If you’ve got structured information on your page, you’ll most likely be able to download it in a format that you can analyse.
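As a rough sketch of what "structured information" means in practice, the Python below reads the rows of an HTML table using only the standard library. The `TableParser` class and the sample page are invented for illustration; a real page will be messier and may need a dedicated scraping tool.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content, invented for illustration only.
page = """<table>
<tr><th>Area</th><th>Cases</th></tr>
<tr><td>North</td><td>12</td></tr>
<tr><td>South</td><td>7</td></tr>
</table>"""

parser = TableParser()
parser.feed(page)

# Treat the first row as the header and pair it with each following row.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

The values come out as text, so numbers such as the case counts still need converting before any analysis.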
Source: Tabula
Source: WebPlotDigitizer
Source: Map Digitizer
Source: Online Journalism Blog
Source: Inspect Element
Use the =IMPORTHTML() formula to import one of the tables.
{"_id":"missingpersons","startUrl":["https://missingpersons.police.uk/en-gb/case-search/?page=[1-10]&orderBy=dateDesc"],"selectors":[{"id":"gender-age","parentSelectors":["case"],"type":"SelectorText","selector":"a:nth-of-type(7) div:nth-of-type(1) div","multiple":false,"regex":""},{"id":"reference","parentSelectors":["case"],"type":"SelectorText","selector":"div.Detail:nth-of-type(2) div","multiple":false,"regex":""},{"id":"location","parentSelectors":["case"],"type":"SelectorText","selector":"div.Detail:nth-of-type(3) div","multiple":false,"regex":""},{"id":"case","parentSelectors":["_root"],"type":"SelectorElement","selector":"a.CaseThumbnail","multiple":true},{"id":"link","parentSelectors":["case"],"type":"SelectorLink","selector":"_parent_","multiple":false},{"id":"ethnicity","parentSelectors":["link"],"type":"SelectorText","selector":"div.Entry:nth-of-type(3) div.Value","multiple":false,"regex":""}]}
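The start URL in the sitemap above contains `[1-10]`, which looks like a page range. Assuming it means "pages 1 to 10", the sketch below expands it into the ten individual URLs a scraper would visit. The `expand_range` helper is hypothetical, only a trimmed copy of the sitemap is used, and nothing is actually downloaded.

```python
import json
import re

# A trimmed copy of the sitemap, keeping only the fields used here.
sitemap = json.loads(
    '{"_id": "missingpersons", "startUrl": '
    '["https://missingpersons.police.uk/en-gb/case-search/'
    '?page=[1-10]&orderBy=dateDesc"]}'
)


def expand_range(url):
    """Expand the first [start-end] range in a URL into one URL per number.

    Assumption: the bracket notation means an inclusive range of page numbers.
    """
    match = re.search(r"\[(\d+)-(\d+)\]", url)
    if match is None:
        return [url]  # nothing to expand
    start, end = int(match.group(1)), int(match.group(2))
    return [
        url[:match.start()] + str(number) + url[match.end():]
        for number in range(start, end + 1)
    ]


urls = [url for pattern in sitemap["startUrl"] for url in expand_range(pattern)]
for url in urls:
    print(url)
```

Listing the URLs first is a cheap way to check that the range covers the pages you expect before running a scraper against a live website.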
Take part in the module evaluation survey.