Chapter 11. Exploring Data with pandas
In the previous chapter, we cleaned the Nobel Prize dataset that we scraped from Wikipedia in Chapter 6. Now it’s time to start exploring our shiny new dataset, looking for interesting patterns, stories to tell, and anything else that could form the basis for an interesting visualization.
First off, let’s try to clear our minds and take a long, hard look at the data to hand to get a broad idea of the visualizations suggested. Example 11-1 shows the form of the Nobel dataset, with categorical, temporal, and geographical data.
Example 11-1. Our cleaned Nobel Prize dataset
[{
  'category': 'Physiology or Medicine',
  'date_of_birth': '8 October 1927',
  'date_of_death': '24 March 2002',
  'gender': 'male',
  'link': 'http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein',
  'name': 'César Milstein',
  'country': 'Argentina',
  'place_of_birth': 'Bahía Blanca, Argentina',
  'place_of_death': 'Cambridge , England',
  'year': 1984,
  'born_in': NaN
 },
 ...
]
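Records in this shape map directly onto a pandas DataFrame. The following is a minimal sketch rather than the chapter's own loading code: it assumes the dates are still stored as strings, as Example 11-1 shows them, and the names `records` and `df` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# One record in the shape of Example 11-1 (values copied from the example).
records = [{
    'category': 'Physiology or Medicine',
    'date_of_birth': '8 October 1927',
    'date_of_death': '24 March 2002',
    'gender': 'male',
    'link': 'http://en.wikipedia.org/wiki/C%C3%A9sar_Milstein',
    'name': 'César Milstein',
    'country': 'Argentina',
    'place_of_birth': 'Bahía Blanca, Argentina',
    'place_of_death': 'Cambridge , England',
    'year': 1984,
    'born_in': np.nan,
}]

# Each dict becomes a row; each key becomes a column.
df = pd.DataFrame(records)

# Assumption: dates arrive as 'day month-name year' strings. Parsing them
# gives datetime columns that support temporal questions later on.
for col in ('date_of_birth', 'date_of_death'):
    df[col] = pd.to_datetime(df[col], format='%d %B %Y')

# Inspect which columns are categorical (object), temporal, or numeric.
print(df.dtypes)
```

Checking `df.dtypes` at this point is a quick way to confirm which columns hold the categorical, temporal, and numeric data mentioned above.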
The data in Example 11-1 suggests a number of stories we might want to investigate, among them:
- Gender disparities among the prize winners
- National trends (e.g., which country has most prizes in Economics)
- Details about individual winners, such as their average age on receiving the prize or life expectancy
- Geographical journey from place of birth to adopted country using the born_in and country fields
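As a rough sketch of how each of these lines of inquiry might translate into a first pandas query, consider the snippet below. The rows are hypothetical placeholders, not real Nobel data; only the column names follow Example 11-1. It also assumes that born_in is filled in only when a winner's birth country differs from country, and that date_of_birth has already been parsed to a datetime.

```python
import pandas as pd

# Hypothetical placeholder rows, NOT real Nobel data; only the column
# names follow Example 11-1.
df = pd.DataFrame({
    'name': ['Winner A', 'Winner B', 'Winner C', 'Winner D'],
    'category': ['Physics', 'Economics', 'Economics', 'Chemistry'],
    'gender': ['female', 'male', 'male', 'female'],
    'country': ['Country X', 'Country Y', 'Country Y', 'Country X'],
    'born_in': [None, 'Country X', None, None],
    'date_of_birth': pd.to_datetime(
        ['1900-01-01', '1910-06-15', '1920-03-10', '1930-12-31']),
    'year': [1950, 1970, 1985, 1990],
})

# Gender disparities: number of winners by gender.
by_gender = df.groupby('gender').size()

# National trends: prize counts per country and category.
by_country_category = df.groupby(['country', 'category']).size()

# Individual details: approximate age in the year the prize was awarded.
df['award_age'] = df['year'] - df['date_of_birth'].dt.year
mean_award_age = df['award_age'].mean()

# Geographical journeys: rows where a birth country is recorded
# (assumed to mean it differs from the adopted `country`).
moved = df[df['born_in'].notna()][['name', 'born_in', 'country']]

print(by_gender, by_country_category, mean_award_age, moved, sep='\n\n')
```

Each query is a simple group-and-count or column subtraction, which is usually enough to tell whether a story is worth a full visualization.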
These investigative lines form the basis for the coming sections, which will probe the dataset by asking ...