Chapter 4. Working with Excel Files
Unlike the previous chapterâs data, not all the data in this and the following chapter will easily import into Python without a little work. This is because some data formats were made to be machine readable, while others, such as the ones weâll look at next, were meant to be interacted with through desktop tools. In this chapter and the next, we will look at two example file typesâExcel files and PDFsâand provide some generic instructions to follow if you encounter a different file type.
So far in this book, the solutions youâve learned about for importing data have been pretty standard. In this chapter, we will begin to learn about processes which will vary greatly each time you perform them. Although the processes are more difficult, the end goal is the same: to extract the useful information and put it into a usable format in Python.
The examples we use in this chapter and the next contain data from UNICEFâs 2014 report on The State of the Worldâs Children. The data is available in PDF and Excel format.
When you have to extract data from files in these more difficult formats, you might think there is someone out there who hates you, because it can be painful. We assure you in most cases, the person who generated the file with the data inside simply did not identify the importance of also releasing it in a machine-readable format.
Installing Python Packages
Before we can continue, we need to learn how to install external Python packages (or libraries). Up until this point, we were using Python libraries that came standard with Python when you installed it. Do you remember importing the csv
and json
packages in Chapter 3? Those were packages in the standard libraryâthey came with your Python installation.
Python comes with a set of frequently used libraries. Because many libraries serve a niche purpose, you have to explicitly install them. This is so your computer doesnât get bloated with every Python library available.
Python packages are collected in an online directory called PyPI, which stores the packages along with their metadata and any documentation.
In this chapter, we are looking at Excel files. If you visit PyPI in your browser, you can search for libraries relating to Excel and see lists of matching package results you can download. This is one way to explore which package you should use.
We will be using pip from this point forward to install packages. There are multiple ways to install pip, and you should have already done so in Chapter 1 of this book.
First, we will be evaluating Excel data. Letâs install the package to do thatâ xlrd
. To install the package, we use pip in the following way:
pip install xlrd
To remove the package, we would run the uninstall
command:
pip uninstall xlrd
Try installing, uninstalling, and then reinstalling xlrd
. Itâs good to get a handle on the pip
commands, as youâll be using them throughout this book and your data wrangling career.
Why did we choose xlrd
when there are many possible packages? Choosing a Python library is an imperfect process. There are different ways to go about your selection. Donât worry about trying to figure out what is the right library. When you are perfecting your skills and you find a couple of options, use the library that makes sense to you.
The first thing we recommend is searching the Web to see which libraries other people recommend. If you search for âparse excel using pythonâ, you will find the xlrd
library surfaces at the top of the search results.
However, the answer is not always obvious. In Chapter 13, we will learn more about the selection process when looking into Twitter libraries.
Parsing Excel Files
Sometimes the easiest way to extract data from an Excel sheet is finding a better way to get the data. There are times when parsing is not the answer. Before you start parsing, ask yourself the following questions:
-
Have you tried to find the data in another form? Sometimes other forms might be available from the same source.
-
Have you tried to use a phone to figure out if the data is available in another form? Check out Chapter 6 for more tips.
-
Have you tried to export the tab or tabs into CSV form from Excel (or your document reader)? This is a good solution if you only have a couple of tabs of data or isolated data in one tab on the Excel sheet.
If you have exhausted these options and you still donât have the data you need, youâll need to use Python to parse your Excel file.
Getting Started with Parsing
The library we identified for parsing Excel files is xlrd
. This library is part of a series of libraries for working with Excel files in Python.
There are three main libraries for handling Excel files:
xlrd
-
Reads Excel files
xlwt
xlutils
-
A set of tools for more advanced operations in Excel (requires
xlrd
andxlwt
)
Youâll need to install each separately if you want to use them; however, in this chapter we will only use xlrd
. Because we want to read Excel files into Python, youâll need to make sure you have xlrd
installed before continuing:
pip install xlrd
Warning
If you get the following error, that means you donât have pip installed:
-bash: pip: command not found
For installation instructions, refer to âInstall pipâ or https://pip.pypa.io/en/latest/installing/.
Set up your work environment for this Excel file by doing the following (or something like it, depending on your organizational system):
-
Create a folder for your Excel work.
-
Create a new Python file called parse_excel.py and put it in the folder you created.
-
Place the Excel file from the bookâs repository called SOWC 2014 Stat Tables_Table 9.xlsx in the same folder.
From this folder, type the following command in your terminal to run the script from the command line:
python parse_excel.py
By the end of this chapter, we will write a script to parse child labor and marriage data stored in this Excel file.
To start our script, we need to import xlrd
and open our Excel workbook in Python. We store the opened file in the book
variable:
import
xlrd
book
=
xlrd
.
open_workbook
(
'SOWC 2014 Stat Tables_Table 9.xlsx'
)
Unlike CSVs, Excel books can have multiple tabs or sheets. To get at our data, we are going to pull out only the sheet with the data we want.
If you have a couple of sheets, you could just guess at the index, but that wonât work if you have lots of sheets. So, you should know about the command book.sheet_by_name(somename
), where somename
is the name of the sheet you want to access.
Letâs check out the names of the sheets we have:
import
xlrd
book
=
xlrd
.
open_workbook
(
'SOWC 2014 Stat Tables_Table 9.xlsx'
)
for
sheet
in
book
.
sheets
():
sheet
.
name
The sheet that we are looking for is Table 9
. So, letâs put that into our script:
import
xlrd
book
=
xlrd
.
open_workbook
(
'SOWC 2014 Stat Tables_Table 9.xlsx'
)
sheet
=
book
.
sheet_by_name
(
'Table 9'
)
sheet
If you run that code, it exits with an error that provides you with the following information:
xlrd
.
biffh
.
XLRDError
:
No
sheet
named
<
'Table 9'
>
At this point, you might be really confused. The problem lies in the difference between what we see and what actually exists.
If you open up your Excel workbook and select the name of the sheet by double-clicking it, you will find that there is an extra space at the end. This space is not visible to users in the browser. In Chapter 7, we will learn how to troubleshoot this in Python. For now, update your code to reflect the space.
Change this line:
sheet
=
book
.
sheet_by_name
(
'Table 9'
)
to this:
sheet
=
book
.
sheet_by_name
(
'Table 9 '
)
Now, if we run our script it should work. You will see output similar to this:
<xlrd.sheet.Sheet object at 0x102a575d0>
Letâs explore what we can do with a sheet. Add the following after you assign the sheet
variable and rerun your script:
dir
(
sheet
)
In the returned list, youâll see a method called nrows
. We will use this method to iterate over all rows. If we write print sheet.nrows
, the total number of rows will be returned.
Try this now:
sheet
.
nrows
You should have gotten back 303
. We need to iterate over each row, which means we need a for
loop. As we learned in âHow to Import CSV Dataâ, for
loops iterate over items in a list, so we need to turn 303
into a list we can iterate over 303 times. To do this, we will use the range
function.
With the addition of the range
function we can transform 303
into a list our for
loop can iterate over, our script should look like the following:
import
xlrd
book
=
xlrd
.
open_workbook
(
'
SOWC 2014 Stat Tables_Table 9.xlsx
'
)
sheet
=
book
.
sheet_by_name
(
'
Table 9
'
)
for
i
in
range
(
sheet
.
nrows
)
:
i
Loops over the index
i
inrange(303)
, which will be a list of 303 integers incrementing by one.Outputs
i
, which will be the numbers from 0 to 302.
From here, we need to do a lookup on each of the rows to pull out the contents of each row instead of just printing the number. To do a lookup, we will use i
as an index reference to take the nth row.
To get each rowâs values we will use row_values
, which was another method returned by dir(sheet)
earlier. We can see from the row_values
documentation that the method expects an index number and returns the corresponding rowâs values. Update your for
loop to reflect this and rerun your script:
for
i
in
range
(
sheet
.
nrows
)
:
sheet
.
row_values
(
i
)
Uses
i
as the index to look up the rowâs values. Because it is in afor
loop that spans the length of the sheet, we call this method for each row in our sheet.
When you run this code, you will see a list for each row. The following is a subset of the data youâll see:
[
''
,
u
'TABLE 9. CHILD PROTECTION'
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
]
[
''
,
''
,
u
'TABLEAU 9. PROTECTION DE L
\u2019
ENFANT'
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
]
[
''
,
''
,
''
,
u
'TABLA 9. PROTECCI
\xd3
N INFANTIL'
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
]
[
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
]
[
''
,
u
'Countries and areas'
,
''
,
''
,
u
'Child labour (%)+
\n
2005
\u2013
2012*'
,
''
,
''
,
''
,
''
,
''
,
u
'Child marriage (%)
\n
2005
\u2013
2012*'
,
''
,
''
,
''
,
u
'Birth registration (%)+
\n
2005
\u2013
2012*'
,
''
,
u
'Female genital mutilation/
cutting
(
%
)
+
\n2002
\u20132012
*
', '', '', '', '', '', u'
Justification
of
wife
beating
(
%
)
\n
2005
\u20132012
*
', '', '', '', u'
Violent
discipline
(
%
)
+
\n2005
\u20132012
*
', '', '', '', '', '', '', '', '', '', '', '', '', '',
''
,
''
,
''
,
''
]
Now that we can see each row, we need to pull out the information we want. To help us determine what information we need and how to get it, itâs much easier to open up the file in a program for displaying Excel files, such as Microsoft Excel on Windows or Numbers on Mac. If you visit the second tab on the spreadsheet, you will notice quite a few header rows.
Note
In our code, we will aim to grab the English text. However, if you want an extra challenge, try to pull out the French or Spanish headings and countries.
On the second tab, look at the information you can extract and think about how to best organize it. We provide one possible way to do this here, but there are many different ways using different data structures.
For this exercise, we will pull out child labor and child marriage statistics. The following is one way to organize the dataâweâll use this as an example to work toward:
{
u
'
Afghanistan
'
:
{
'
child_labor
'
:
{
'
female
'
:
[
9.6
,
'
'
]
,
'
male
'
:
[
11.0
,
'
'
]
,
'
total
'
:
[
10.3
,
'
'
]
}
,
'
child_marriage
'
:
{
'
married_by_15
'
:
[
15.0
,
'
'
]
,
'
married_by_18
'
:
[
40.4
,
'
'
]
}
}
,
u
'
Albania
'
:
{
'
child_labor
'
:
{
'
female
'
:
[
9.4
,
u
'
'
]
,
'
male
'
:
[
14.4
,
u
'
'
]
,
'
total
'
:
[
12.0
,
u
'
'
]
}
,
'
child_marriage
'
:
{
'
married_by_15
'
:
[
0.2
,
'
'
]
,
'
married_by_18
'
:
[
9.6
,
'
'
]
}
}
,
.
.
.
}
If you are viewing the data in Excel, some of these numbers might appear off. This is because Excel will often round numbers. We are showing the numbers you will find when you use Python to parse the cells.
Tip
Planning what you want the outcome to look like and writing an example of your data will save you time as you begin coding. Once you have identified how youâd like to format your data, you can ask yourself, âWhat do I need to do next to get there?â This is especially helpful when you feel blocked on your next step.
There are two Python constructs we are going to use to pull the data out. The first method we will use is a nested for
loop, which is a for
loop inside another for
loop. This is often used when you have x rows that contain y objects. To access each object you need a for
loop for each row, then another for
loop for each object. We also used a nested for
loop in an example in Chapter 3.
We are going to use a nested for
loop to output each cell from each row. This will output the items we saw earlier, where each row was listed.
for
i
in
xrange
(
sheet
.
nrows
)
:
row
=
sheet
.
row_values
(
i
)
for
cell
in
row
:
cell
Takes the list that is each row and saves it to the
row
variable. This makes our code more readable.Loops over each item in the list, which represents each cell for the current row.
Outputs the cell value.
If you run your complete code with the nested for
loop, you will notice your output is not so helpful anymore. That brings us to the second mechanism to explore our Excel fileâa counter.
Letâs add a counter to our code so we can step through the cells and rows to find what we want to pull out. Be careful where you place the counterâyou will have very different results if you place it on the cell level versus the row level.
Reset your for
loop to look like the code shown here:
count
=
0
for
i
in
xrange
(
sheet
.
nrows
)
:
if
count
<
10
:
row
=
sheet
.
row_values
(
i
)
i
,
row
count
+
=
1
Now, if we go back to what we want our final output to look like, what we really need to figure out is where the country names begin. Remember, the country names are the first keys of our output dictionary:
{
u
'Afghanistan'
:
{
...
},
u
'Albania'
:
{
...
},
...
}
If you run your script with the counter in it where count < 10
, you will see from the output that we have not yet reached the row where the country names start.
Because we are skipping a few lines to get to the data we are interested in, we are looking to identify which row number we will need to start our data collection. From our previous attempt, we know the country names start past row 10. But how can we tell where to start?
The answer is in the next code example, but before you look, try updating the counter to start at the row where the country names start. (There are multiple ways to do this, so if your answer is slightly different than what we have in the following code example, thatâs OK.)
After you identify the proper row number, you will need to add an if
statement to begin pulling out values after that row. This is so we only work with the data below that line.
If you were able to get that working, your code should like something like this:
count
=
0
for
i
in
xrange
(
sheet
.
nrows
)
:
if
count
<
20
:
if
i
>
=
14
:
row
=
sheet
.
row_values
(
i
)
i
,
row
count
+
=
1
This line will iterate through the first 20 rows to identify which row the country names begin on.
This
if
statement starts the output at the point where the country rows appear.
At this point, you should have output that looks like this:
14
[
''
,
u
'Afghanistan'
,
u
'Afghanistan'
,
u
'Afganist
\xe1
n'
,
10.3
,
''
,
11.0
,
''
,
9.6
,
''
,
15.0
,
''
,
40.4
,
''
,
37.4
,
''
,
u
'
\u2013
'
,
''
,
u
'
\u2013
'
,
''
,
u
'
\u2013
'
,
''
,
u
'
\u2013
'
,
''
,
90.2
,
''
,
74.4
,
''
,
74.8
,
''
,
74.1
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
]
15
[
''
,
u
'Albania'
,
u
'Albanie'
,
u
'Albania'
,
12.0
,
u
' '
,
14.4
,
u
' '
,
9.4
,
u
' '
,
0.2
,
''
,
9.6
,
''
,
98.6
,
''
,
u
'
\u2013
'
,
''
,
u
'
\u2013
'
,
''
,
u
'
\u2013
'
,
''
,
36.4
,
''
,
29.8
,
''
,
75.1
,
''
,
78.3
,
''
,
71.4
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
]
16
[
''
,
u
'Algeria'
,
u
'Alg
\xe9
rie'
,
u
'Argelia'
,
4.7
,
u
'y'
,
5.5
,
u
'y'
,
3.9
,
u
'y'
,
0.1
,
''
,
1.8
,
''
,
99.3
,
''
,
u
'
\u2013
'
,
''
,
u
'
\u2013
'
,
''
,
u
'
\u2013
'
,
''
,
u
'
\u2013
'
,
''
,
67.9
,
''
,
87.7
,
''
,
88.8
,
''
,
86.5
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
,
''
]
....
more
Now, we need to turn each row into our dictionary format. This will make the data more meaningful to us when we try to do other things with it in future chapters.
Looking back at our earlier example of how we want our output to be organized, we are going to need a dictionary and we are going to use countries as keys. To pull out the country names, we will need to do some indexing.
Letâs add a dictionary to our code, and then pull out the country name from each row and add it as a key to our dictionary.
Update your for
loop to reflect this:
count
=
0
data
=
{
}
for
i
in
xrange
(
sheet
.
nrows
)
:
if
count
<
10
:
if
i
>
=
14
:
row
=
sheet
.
row_values
(
i
)
country
=
row
[
1
]
data
[
country
]
=
{
}
count
+
=
1
data
This creates an empty dictionary to store our data.
row[1]
pulls out the country from each row we iterate over.data[country]
adds the country as a key to thedata
dictionary. We set the value to another dictionary, because that is where we are going to store our data in the following steps.This outputs the data, so we can see what it looks like.
At this point, your output should look something like this:
{
u
'Afghanistan'
:
{},
u
'Albania'
:
{},
u
'Angola'
:
{},
u
'Algeria'
:
{},
u
'Andorra'
:
{},
u
'Austria'
:
{},
u
'Australia'
:
{},
u
'Antigua and Barbuda'
:
{},
u
'Armenia'
:
{},
u
'Argentina'
:
{}}
Now, we need to match up each of the values in the rest of the row with the proper values in the spreadsheet, then store them in our dictionary.
Note
As you try to pull out all the values and check them against your Excel sheet, you will make lots of errors. That is fine and expected. This process should be embracedâit means youâre working your way through the problem.
First, letâs create an empty version of our data structure where we can store our data. Letâs also remove our counter, as we know that the rows of data start at line 14. Because we know xrange
can accept a start and end point, we can begin our counting at 14 and end at the end of the file. Letâs take a look at our updated code:
data
=
{
}
for
i
in
xrange
(
14
,
sheet
.
nrows
)
:
row
=
sheet
.
row_values
(
i
)
country
=
row
[
1
]
data
[
country
]
=
{
'
child_labor
'
:
{
'
total
'
:
[
]
,
'
male
'
:
[
]
,
'
female
'
:
[
]
,
}
,
'
child_marriage
'
:
{
'
married_by_15
'
:
[
]
,
'
married_by_18
'
:
[
]
,
}
}
data
[
'
Afghanistan
'
]
We can remove all of our references to the counter and just begin our
for
loop starting at the 14th row of our sheet. This line begins the loop with a value of 14, so we automatically skip the lines we donât need for our dataset.This line expands the dictionary to multiple lines to fill out the other data points.
This creates the key
child_labor
and sets it equal to another dictionary.The dictionary has strings to explain each part of the data it holds. For each of these keys, the values are lists.
This outputs the values associated with the key
Afghanistan
.
Our output data for Afghanistan
looks like this:
{
'child_labor'
:
{
'total'
:
[],
'male'
:
[],
'female'
:
[]},
'child_marriage'
:
{
'married_by_18'
:
[],
'married_by_15'
:
[]}
}
Letâs now populate the data. Because we have access to each column of each row using the index, we can populate these lists with the values from the sheet. By looking at our sheet and lining up which columns relate to which parts of the data, we can update the data dictionary to reflect the following:
data
[
country
]
=
{
'
child_labor
'
:
{
'
total
'
:
[
row
[
4
]
,
row
[
5
]
]
,
'
male
'
:
[
row
[
6
]
,
row
[
7
]
]
,
'
female
'
:
[
row
[
8
]
,
row
[
9
]
]
,
}
,
'
child_marriage
'
:
{
'
married_by_15
'
:
[
row
[
10
]
,
row
[
11
]
]
,
'
married_by_18
'
:
[
row
[
12
]
,
row
[
13
]
]
,
}
}
Because there are two cells for each of the columns, our code stores both values. Because in this line our child labor totals are the fifth and sixth columns and we know Python is zero-indexed, we want the fourth and fifth indexes.
When we run our code again, we get output like this:
{
'child_labor'
:
{
'female'
:
[
9.6
,
''
],
'male'
:
[
11.0
,
''
],
'total'
:
[
10.3
,
''
]},
'child_marriage'
:
{
'married_by_15'
:
[
15.0
,
''
],
'married_by_18'
:
[
40.4
,
''
]}}
}
Warning
Before you continue, output a couple of records and check the number in the dictionary. It is easy to end up one index off and ruin the rest of your data.
Finally, to preview our data, we can use pprint
instead of a print
statement. In complicated data structures (like dictionaries), this makes it a lot easier to review the output. Add the following to the end of your file to preview the data in a formatted fashion:
import
pprint
pprint
.
pprint
(
data
)
Imports the
pprint
library. Normally,import
statements come at the beginning of the file, but we are putting it here for simplicity. After you are done, you will want to delete these lines, because they are not critical to your script.Passes data to the
pprint.pprint()
function.
If you scroll through your output, you will notice the majority of it looks good. But there are a couple of records that seem out of place.
If you look at the spreadsheet, you should note the last row for countries is Zimbabwe. So, we want to look for when the country is equal to 'Zimbabwe'
, and exit there. To exit, we add a break
to our code, which is how we prematurely break the for
loop to continue with the rest of script. Letâs add that as our stopping point. At the end of the for
loop, add the following and rerun your code:
if
country
==
'
Zimbabwe
'
:
break
Warning
After adding the break
, did you end up with a NameError: name 'country' is not defined
error? If so, check your indentation. The if
statement should be indented four spaces to be in the for
loop.
Stepping through code can be helpful in identifying an issue. If you need to troubleshoot to figure out what a variable, such as country
, is equal to in a for
loop, try adding print
statements inside the for
loop and watching the values before your script exits with an error. They will likely give you a hint as to what is happening.
At this point, our scriptâs output matches our end goal. The last thing we want to do to our script is to make sure we document it with some comments.
Your script should now look something like this:
""" This is a script to parse child labor and child marriage data.
The Excel file used in this script can be found here: http://www.unicef.org/sowc2014/numbers/ """
import
xlrd
book
=
xlrd
.
open_workbook
(
'
SOWC 2014 Stat Tables_Table 9.xlsx
'
)
sheet
=
book
.
sheet_by_name
(
'
Table 9
'
)
data
=
{
}
for
i
in
xrange
(
14
,
sheet
.
nrows
)
:
# Start at 14th row, because that is where the country data begins
row
=
sheet
.
row_values
(
i
)
country
=
row
[
1
]
data
[
country
]
=
{
'
child_labor
'
:
{
'
total
'
:
[
row
[
4
]
,
row
[
5
]
]
,
'
male
'
:
[
row
[
6
]
,
row
[
7
]
]
,
'
female
'
:
[
row
[
8
]
,
row
[
9
]
]
,
}
,
'
child_marriage
'
:
{
'
married_by_15
'
:
[
row
[
10
]
,
row
[
11
]
]
,
'
married_by_18
'
:
[
row
[
12
]
,
row
[
13
]
]
,
}
}
if
country
==
'
Zimbabwe
'
:
break
import
pprint
pprint
.
pprint
(
data
)
This is a multiline comment used to generally describe what is going on in this script.
This is a single-line comment to document why we start on line 14 and not earlier.
We can and should remove these lines as we move beyond simple parsing of data into data analysis.
At this point, we have a similar output to the previous chapterâs data. In the next chapter, we will take this a step further and parse the same data from a PDF.
Summary
The Excel format is an odd in-between category that is kind of machine readable. Excel files were not meant to be read by programs, but they are parsable.
To handle this nonstandard format, we had to install external libraries. There are two ways to find libraries: by looking on PyPI, the Python package index, or by searching for tutorials and how-tos to see what other people have done.
Once you have identified the library you want to install, use the pip install
command to do it; to uninstall a library, use pip uninstall
.
Besides learning how to parse Excel using the xlrd
library, we also learned a few new Python programming concepts, which are summarized in Table 4-1.
Concept | Purpose |
---|---|
This turns a number into a consecutive list of numbers. Example: |
|
Counting starts at 0, not 1 |
This is a computer construct to be aware of; it occurs throughout programming. It is important to note when using |
Use this to pull out a specific subset of a string or list. |
|
Counters |
Use this as a tool to control |
Nested |
Use when iterating over a data structure within a data structure, such as a list of lists, a list of dictionaries, or a dictionary of dictionaries. |
|
|
|
You can exit from a |
Commenting |
It is important to keep all your code commented, so you know what is happening for future reference. |
As you read on and dig into PDFs, you will learn the importance of exploring alternatives to the data you have or finding alternate ways to locate and find data you need to answer your research questions.
Get Data Wrangling with Python now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.