Book description
As an aspiring data scientist, you appreciate why organizations rely on data for important decisions—whether it's for companies designing websites, cities deciding how to improve services, or scientists discovering how to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data.
Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It's aimed at those who wish to become data scientists or who already work with data scientists, and at data analysts who wish to cross the "technical/nontechnical" divide. If you have a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools like pandas.
- Refine a question of interest to one that can be studied with data
- Pursue data collection that may involve text processing, web scraping, etc.
- Glean valuable insights about data through data cleaning, exploration, and visualization
- Learn how to use modeling to describe the data
- Generalize findings beyond the data
Table of contents
- Preface
- I. The Data Science Lifecycle
- 1. The Data Science Lifecycle
- 2. Questions and Data Scope
- 3. Simulation and Data Design
- 4. Modeling with Summary Statistics
- 5. Case Study: Why Is My Bus Always Late?
- II. Rectangular Data
- 6. Working with Dataframes Using pandas
- 7. Working with Relations Using SQL
- III. Understanding the Data
- 8. Wrangling Files
- 9. Wrangling Dataframes
- 10. Exploratory Data Analysis
- 11. Data Visualization
- 12. Case Study: How Accurate Are Air Quality Measurements?
- IV. Other Data Sources
- 13. Working with Text
- 14. Data Exchange
- V. Linear Modeling
- 15. Linear Models
- 16. Model Selection
- 17. Theory for Inference and Prediction
- 18. Case Study: How to Weigh a Donkey
- VI. Classification
- 19. Classification
- 20. Numerical Optimization
- 21. Case Study: Detecting Fake News
- Additional Material
- Data Sources
- Index
- About the Authors
Product information
- Title: Learning Data Science
- Author(s):
- Release date: September 2023
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098113001