Chapter 7. A Case Study in Bilingual Data Science

Rick J. Scavetta

Boyan Angelov

In this final chapter, our goal is to present a case study that demonstrates a sample of the concepts and tools we’ve shown throughout this book. Although data science offers an almost overwhelming diversity of methods and applications, we typically rely on a core tool kit in our daily work. Thus, it’s unlikely that you’ll make use of all the tools presented in this book (or this case study, for that matter). But that’s alright! We hope that you’ll focus on those parts of the case study that are most relevant to your work and that you’ll be inspired to be a modern, bilingual data scientist.

24 Years and 1.88 Million Wildfires

Our case study will focus on the US Wildfires dataset.¹ This dataset, released by the US Department of Agriculture (USDA), contains 1.88 million geo-referenced wildfire records. Collectively, these fires have resulted in the loss of 140 million acres of forest over 24 years. If you want to execute the code in this chapter, download the SQLite dataset directly from the USDA website or from Kaggle, and place it inside the ch07/data directory.
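Before going further, it can help to confirm that the download landed in the right place and that the file opens. The following is a minimal sketch using only Python's standard library, not the code used later in the chapter. The helper names `find_sqlite_file` and `table_row_counts` are hypothetical, and the `.sqlite` file extension is an assumption; adjust the pattern if your download is named differently.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

DATA_DIR = Path("ch07/data")  # directory named in the text


def find_sqlite_file(data_dir=DATA_DIR):
    """Return the first *.sqlite file in data_dir (the extension is an assumption)."""
    matches = sorted(Path(data_dir).glob("*.sqlite"))
    if not matches:
        raise FileNotFoundError(f"No .sqlite file found in {data_dir}")
    return matches[0]


def table_row_counts(db_path):
    """Map each table name in the database to its row count.

    Tables that cannot be queried with plain SQLite (for example, because
    they depend on an extension that is not loaded) are mapped to None.
    """
    counts = {}
    with closing(sqlite3.connect(db_path)) as con:
        names = [
            name
            for (name,) in con.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        for name in names:
            try:
                # Table names cannot be bound as parameters, so quote them
                # as identifiers instead.
                counts[name] = con.execute(
                    f'SELECT COUNT(*) FROM "{name}"'
                ).fetchone()[0]
            except sqlite3.OperationalError:
                counts[name] = None
    return counts


# Example usage, once the dataset is in ch07/data:
#
#     db_path = find_sqlite_file()
#     for name, n_rows in table_row_counts(db_path).items():
#         print(f"{name}: {n_rows}")
```

If everything is in place, one of the listed tables should hold roughly 1.88 million rows, matching the record count quoted above. Counting rows for every table may take a moment on a file of this size.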

There are 39 features, plus another shape variable in raw format. Many of these are unique identifiers or redundant categorical and continuous representations. Thus, to simplify our case study, we’ll focus on a few features listed in Table 7-1.

Python and R for the Modern Data Scientist by Rick J. Scavetta, Boyan Angelov
