Chapter 11. Exploring Data with pandas

In the previous chapter, we cleaned the Nobel Prize dataset that we scraped from Wikipedia in Chapter 6. Now it's time to start exploring our shiny new dataset, looking for interesting patterns, stories to tell, and anything else that could form the basis for an interesting visualization.

First off, let's try to clear our minds and take a long, hard look at the data at hand to get a broad idea of the visualizations it suggests. Example 11-1 shows the form of the Nobel dataset, with categorical, temporal, and geographical data.

Example 11-1. Our cleaned Nobel Prize dataset

[{
 'category': 'Physiology or Medicine',
 'date_of_birth': '8 October 1927',
 'date_of_death': '24 March 2002',
 'gender': 'male',
 'link': 'http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein',
 'name': 'César Milstein',
 'country': 'Argentina',
 'place_of_birth': 'Bahía Blanca,  Argentina',
 'place_of_death': 'Cambridge , England',
 'year': 1984,
 'born_in': NaN
 },
 ...
 ]
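
As a minimal sketch of how records shaped like Example 11-1 might be loaded for exploration, the snippet below builds a pandas DataFrame from a list of dicts and parses the two date fields. The variable names (`records`, `df`) and the date format string are assumptions made for illustration, not names taken from the cleaned dataset's own loading code.

```python
import numpy as np
import pandas as pd

# One record copied from Example 11-1; NaN is spelled np.nan in Python.
records = [
    {
        'category': 'Physiology or Medicine',
        'date_of_birth': '8 October 1927',
        'date_of_death': '24 March 2002',
        'gender': 'male',
        'link': 'http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein',
        'name': 'César Milstein',
        'country': 'Argentina',
        'place_of_birth': 'Bahía Blanca,  Argentina',
        'place_of_death': 'Cambridge , England',
        'year': 1984,
        'born_in': np.nan,
    },
]

df = pd.DataFrame(records)

# Assumed date layout: day, full month name, four-digit year.
# Unparseable values become NaT rather than raising an error.
for col in ('date_of_birth', 'date_of_death'):
    df[col] = pd.to_datetime(df[col], format='%d %B %Y', errors='coerce')

print(df.dtypes)
```

With the dates held as datetime values, the temporal fields can be compared and subtracted directly, which is useful for the age-related questions raised below.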

The data in Example 11-1 suggests a number of stories we might want to investigate, among them:

Gender disparities among the prize winners
National trends (e.g., which country has the most prizes in Economics)
Details about individual winners, such as their average age on receiving the prize or life expectancy
Geographical journey from place of birth to adopted country using the born_in and country fields

These investigative lines form the basis for the coming sections, which will probe the dataset by asking ...
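
To make these lines of inquiry a little more concrete, here is a rough sketch of how each one might translate into a pandas operation. The rows are toy values invented purely for illustration (they are not real Nobel data), and the derived names (`gender_counts`, `by_country`, `award_age`, `movers`) are hypothetical; only the column names come from Example 11-1.

```python
import pandas as pd

# Toy rows, invented for illustration only (not real Nobel data).
df = pd.DataFrame({
    'name': ['Winner A', 'Winner B', 'Winner C', 'Winner D'],
    'gender': ['male', 'female', 'male', 'male'],
    'category': ['Economics', 'Physics', 'Economics', 'Economics'],
    'country': ['Country X', 'Country Y', 'Country X', 'Country Y'],
    'born_in': [None, None, None, 'Country X'],
    'date_of_birth': pd.to_datetime(
        ['1927-10-08', '1950-01-15', '1940-06-30', '1935-03-02']),
    'year': [1984, 2001, 1990, 1995],
})

# Gender disparities: number of winners per gender.
gender_counts = df['gender'].value_counts()

# National trends: prizes per country within each category.
by_country = df.groupby(['category', 'country']).size()
top_economics = by_country['Economics'].idxmax()

# Individual details: approximate age in the award year
# (year difference only, ignoring month and day).
df['award_age'] = df['year'] - df['date_of_birth'].dt.year
mean_award_age = df['award_age'].mean()

# Geographical journey: winners with a recorded born_in country
# that differs from their adopted country.
movers = df[df['born_in'].notna()][['name', 'born_in', 'country']]

print(gender_counts, by_country, mean_award_age, movers, sep='\n\n')
```

Each result here is a small Series or DataFrame, so it can be inspected directly or passed to a plotting call once a pattern looks worth visualizing.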


Data Visualization with Python and JavaScript, 2nd Edition by Kyran Dale
