Python for Data Science

Book description

Python is an ideal choice for accessing, manipulating, and gaining insights from data of all kinds. Python for Data Science introduces you to the Pythonic world of data analysis with a learn-by-doing approach rooted in practical examples and hands-on activities. You’ll learn how to write Python code to obtain, transform, and analyze data, practicing state-of-the-art data processing techniques for use cases in business management, marketing, and decision support.

You will discover Python’s rich set of built-in data structures for basic operations, as well as its robust ecosystem of open-source libraries for data science, including NumPy, pandas, scikit-learn, matplotlib, and more. Examples show how to load data in various formats, how to streamline, group, and aggregate datasets, and how to create charts, maps, and other visualizations. Later chapters go in depth with demonstrations of real-world data applications, including using location data to power a taxi service, market basket analysis to identify items commonly purchased together, and machine learning to predict stock prices.

Table of contents

  1. Title Page
  2. Copyright
  3. About the Author
  4. Introduction
    1. Using Python for Data Science
    2. Who Should Read This Book?
    3. What’s in the Book?
  5. Chapter 1: The Basics of Data
    1. Categories of Data
      1. Unstructured Data
      2. Structured Data
      3. Semistructured Data
      4. Time Series Data
    2. Sources of Data
      1. APIs
      2. Web Pages
      3. Databases
      4. Files
    3. The Data Processing Pipeline
      1. Acquisition
      2. Cleansing
      3. Transformation
      4. Analysis
      5. Storage
    4. The Pythonic Way
    5. Summary
  6. Chapter 2: Python Data Structures
    1. Lists
      1. Creating a List
      2. Using Common List Object Methods
      3. Using Slice Notation
      4. Using a List as a Queue
      5. Using a List as a Stack
      6. Using Lists and Stacks for Natural Language Processing
      7. Making Improvements with List Comprehensions
    2. Tuples
      1. A List of Tuples
      2. Immutability
    3. Dictionaries
      1. A List of Dictionaries
      2. Adding to a Dictionary with setdefault()
      3. Loading JSON into a Dictionary
    4. Sets
      1. Removing Duplicates from Sequences
      2. Performing Common Set Operations
      3. Exercise #1: Improved Photo Tag Analysis
    5. Summary
  7. Chapter 3: Python Data Science Libraries
    1. NumPy
      1. Installing NumPy
      2. Creating a NumPy Array
      3. Performing Element-Wise Operations
      4. Using NumPy Statistical Functions
      5. Exercise #2: Using NumPy Statistical Functions
    2. pandas
      1. pandas Installation
      2. pandas Series
      3. Exercise #3: Combining Three Series
      4. pandas DataFrames
      5. Exercise #4: Using Different Joins
    3. scikit-learn
      1. Installing scikit-learn
      2. Obtaining a Sample Dataset
      3. Loading the Sample Dataset into a pandas DataFrame
      4. Splitting the Sample Dataset into a Training Set and a Test Set
      5. Transforming Text into Numerical Feature Vectors
      6. Training and Evaluating the Model
      7. Making Predictions on New Data
    4. Summary
  8. Chapter 4: Accessing Data from Files and APIs
    1. Importing Data Using Python’s open() Function
      1. Text Files
      2. Tabular Data Files
      3. Exercise #5: Opening JSON Files
      4. Binary Files
    2. Exporting Data to Files
    3. Accessing Remote Files and APIs
      1. How HTTP Requests Work
      2. The urllib3 Library
      3. The Requests Library
      4. Exercise #6: Accessing an API with Requests
    4. Moving Data to and from a DataFrame
      1. Importing Nested JSON Structures
      2. Converting a DataFrame to JSON
      3. Exercise #7: Manipulating Complex JSON Structures
      4. Loading Online Data into a DataFrame with pandas-datareader
    5. Summary
  9. Chapter 5: Working with Databases
    1. Relational Databases
      1. Understanding SQL Statements
      2. Getting Started with MySQL
      3. Defining the Database Structure
      4. Inserting Data into the Database
      5. Querying Database Data
      6. Exercise #8: Performing a One-to-Many Join
      7. Using Database Analytics Tools
    2. NoSQL Databases
      1. Key-Value Stores
      2. Document-Oriented Databases
      3. Exercise #9: Inserting and Querying Multiple Documents
    3. Summary
  10. Chapter 6: Aggregating Data
    1. Data to Aggregate
    2. Combining DataFrames
    3. Grouping and Aggregating the Data
      1. Viewing Specific Aggregations by MultiIndex
      2. Slicing a Range of Aggregated Values
      3. Slicing Within Aggregation Levels
      4. Adding a Grand Total
      5. Adding Subtotals
      6. Exercise #10: Excluding Total Rows from the DataFrame
    4. Selecting All Rows in a Group
    5. Summary
  11. Chapter 7: Combining Datasets
    1. Combining Built-in Data Structures
      1. Combining Lists and Tuples with +
      2. Combining Dictionaries with **
      3. Combining Corresponding Rows from Two Structures
      4. Implementing Different Types of Joins for Lists
    2. Concatenating NumPy Arrays
      1. Exercise #11: Adding New Rows/Columns to a NumPy Array
    3. Combining pandas Data Structures
      1. Concatenating DataFrames
      2. Joining Two DataFrames
    4. Summary
  12. Chapter 8: Creating Visualizations
    1. Common Visualizations
      1. Line Graphs
      2. Bar Graphs
      3. Pie Charts
      4. Histograms
    2. Plotting with Matplotlib
      1. Installing Matplotlib
      2. Using matplotlib.pyplot
      3. Working with Figure and Axes Objects
      4. Exercise #12: Combining Bins into an “Other” Slice
    3. Using Other Libraries with Matplotlib
      1. Plotting pandas Data
      2. Plotting Geospatial Data with Cartopy
      3. Exercise #13: Drawing a Map with Cartopy and Matplotlib
    4. Summary
  13. Chapter 9: Analyzing Location Data
    1. Obtaining Location Data
      1. Turning a Human-Readable Address into Geo Coordinates
      2. Getting the Geo Coordinates of a Moving Object
    2. Spatial Data Analysis with geopy and Shapely
      1. Finding the Closest Object
      2. Finding Objects in a Certain Area
      3. Exercise #14: Defining Two or More Polygons
      4. Combining Both Approaches
      5. Exercise #15: Further Improving the Pick-Up Algorithm
    3. Combining Spatial and Nonspatial Data
      1. Deriving Nonspatial Attributes
      2. Exercise #16: Filtering Data with a List Comprehension
      3. Joining Spatial and Nonspatial Datasets
    4. Summary
  14. Chapter 10: Analyzing Time Series Data
    1. Regular vs. Irregular Time Series
    2. Common Time Series Analysis Techniques
      1. Calculating Percentage Changes
      2. Rolling Window Calculations
      3. Calculating the Percentage Change of a Rolling Average
    3. Multivariate Time Series
      1. Processing Multivariate Time Series
      2. Analyzing Dependencies Between Variables
      3. Exercise #17: Adding More Metrics to Analyze Dependencies
    4. Summary
  15. Chapter 11: Gaining Insights from Data
    1. Association Rules
      1. Support
      2. Confidence
      3. Lift
    2. The Apriori Algorithm
      1. Creating a Transaction Dataset
      2. Identifying Frequent Itemsets
      3. Generating Association Rules
    3. Visualizing Association Rules
    4. Gaining Actionable Insights from Association Rules
      1. Generating Recommendations
      2. Planning Discounts Based on Association Rules
      3. Exercise #18: Mining Real Transaction Data
    5. Summary
  16. Chapter 12: Machine Learning for Data Analysis
    1. Why Machine Learning?
    2. Types of Machine Learning
      1. Supervised Learning
      2. Unsupervised Learning
    3. How Machine Learning Works
      1. Data to Learn From
      2. A Statistical Model
      3. Previously Unseen Data
    4. A Sentiment Analysis Example: Classifying Product Reviews
      1. Obtaining Product Reviews
      2. Cleansing the Data
      3. Splitting and Transforming the Data
      4. Training the Model
      5. Evaluating the Model
      6. Exercise #19: Expanding the Example Set
    5. Predicting Stock Trends
      1. Getting Data
      2. Deriving Features from Continuous Data
      3. Generating the Output Variable
      4. Training and Evaluating the Model
      5. Exercise #20: Experimenting with Different Stocks and New Metrics
    6. Summary
  17. Index

Product information

  • Title: Python for Data Science
  • Author(s): Yuli Vasiliev
  • Release date: August 2022
  • Publisher(s): No Starch Press
  • ISBN: 9781718502208