Chapter 6. Scatterplot Maps

In this chapter, we cover the seven steps as laid out in Chapter 1 and apply them to the question, “How do zip codes relate to geography?” (The background for this project was introduced in Chapter 1.)

Preprocessing

Data is always dirty, and once you’ve found your data set, you’ll need to clean it up. As in the previous chapter, we’ll go through the steps of acquiring and parsing in detail. None of this is rocket science, but the walkthrough is meant to familiarize you with the various formats in which you’ll find data and to alert you to some of the common issues you’ll encounter along the way. If you just want to start playing with locations and maps, you can download the finished zips.tsv file from the book web site (http://benfry.com/writing/zipdecode/zips.tsv) and jump ahead to the next section.

Data from the U.S. Census Bureau (Acquire)

The acronym ZIP stands for Zone Improvement Plan, a 1963 initiative to simplify the delivery of mail in the United States. Personal correspondence, once the majority of all mail, was rapidly being overtaken by business mail, which by the 1960s accounted for 80% of the post. Faced with an ever-increasing amount of mail to process, the U.S. Post Office Department (the predecessor of today’s U.S. Postal Service) initiated the zip system to specify more accurately the geographic area of the mail’s destination. The U.S. Postal Service’s web site features a lengthier history of the system at http://www.usps.com/history.

Versions of the zip code database are available from a variety of sources. ...
