Python Real-World Projects

Book description

Develop Python applications using an enterprise-based approach with unit and acceptance tests, following agile methods to create a minimum viable product (MVP) and then iteratively add features.

Key Features

  • Master Python and related technologies by working on 12 hands-on projects
  • Accelerate your career by building a personal project portfolio
  • Explore data acquisition, preparation, and analysis applications
  • Purchase of the print or Kindle book includes a free PDF eBook

Book Description

In today's competitive job market, a project portfolio often outshines a traditional resume. Python Real-World Projects empowers you to get to grips with crucial Python concepts while building complete modules and applications. With two dozen meticulously designed projects to explore, this book will help you showcase your Python mastery and refine your skills. Tailored for beginners with a foundational understanding of class definitions, module creation, and Python's inherent data structures, this book is your gateway to programming excellence. You’ll learn how to harness the potential of the standard library and key external projects like JupyterLab, Pydantic, pytest, and requests. You’ll also gain experience with enterprise-oriented methodologies, including unit and acceptance testing, and an agile development approach. Additionally, you’ll dive into the software development lifecycle, starting with a minimum viable product and seamlessly expanding it to add innovative features. By the end of this book, you’ll be armed with a myriad of practical Python projects and all set to accelerate your career as a Python programmer.

What you will learn

  • Explore core deliverables for an application including documentation and test cases
  • Discover approaches to data acquisition such as file processing, RESTful APIs, and SQL queries
  • Create a data inspection notebook to establish properties of source data
  • Write applications to validate, clean, convert, and normalize source data
  • Use foundational graphical analysis techniques to visualize data
  • Build basic univariate and multivariate statistical analysis tools
  • Create reports from raw data using JupyterLab publication tools

Who this book is for

This book is for beginner-to-intermediate level Python programmers looking to enhance their resume by adding a portfolio of 12 practical projects. A basic understanding of the Python language and its related technologies is a must. The book helps you polish your Python skills and project-building prowess without revisiting Python fundamentals.

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. A note on skills required
    4. To get the most out of this book
      1. Complete the extras
      2. Download the example code files
    5. Conventions used
    6. Get in touch
    7. Share your thoughts
    8. Download a free PDF copy of this book
  2. Chapter 1: Project Zero: A Template for Other Projects
    1. 1.1 On quality
      1. 1.1.1 More Reading on Quality
    2. 1.2 Suggested project sprints
      1. 1.2.1 Inception
      2. 1.2.2 Elaboration, part 1: define done
      3. 1.2.3 Elaboration, part 2: define components and tests
      4. 1.2.4 Construction
      5. 1.2.5 Transition
    3. 1.3 List of deliverables
    4. 1.4 Development tool installation
    5. 1.5 Project 0 – Hello World with test cases
      1. 1.5.1 Description
      2. 1.5.2 Approach
      3. 1.5.3 Deliverables
      4. 1.5.4 Definition of done
    6. 1.6 Summary
    7. 1.7 Extras
      1. 1.7.1 Static analysis - mypy, flake8
      2. 1.7.2 CLI features
      3. 1.7.3 Logging
      4. 1.7.4 Cookiecutter
  3. Chapter 2: Overview of the Projects
    1. 2.1 General data acquisition
    2. 2.2 Acquisition via Extract
    3. 2.3 Inspection
    4. 2.4 Clean, validate, standardize, and persist
    5. 2.5 Summarize and analyze
    6. 2.6 Statistical modeling
    7. 2.7 Data contracts
    8. 2.8 Summary
  4. Chapter 3: Project 1.1: Data Acquisition Base Application
    1. 3.1 Description
      1. 3.1.1 User experience
      2. 3.1.2 About the source data
      3. 3.1.3 About the output data
    2. 3.2 Architectural approach
      1. 3.2.1 Class design
      2. 3.2.2 Design principles
      3. 3.2.3 Functional design
    3. 3.3 Deliverables
      1. 3.3.1 Acceptance tests
      2. 3.3.2 Additional acceptance scenarios
      3. 3.3.3 Unit tests
    4. 3.4 Summary
    5. 3.5 Extras
      1. 3.5.1 Logging enhancements
      2. 3.5.2 Configuration extensions
      3. 3.5.3 Data subsets
      4. 3.5.4 Another example data source
  5. Chapter 4: Data Acquisition Features: Web APIs and Scraping
    1. 4.1 Project 1.2: Acquire data from a web service
      1. 4.1.1 Description
      2. 4.1.2 Approach
      3. 4.1.3 Deliverables
    2. 4.2 Project 1.3: Scrape data from a web page
      1. 4.2.1 Description
      2. 4.2.2 About the source data
      3. 4.2.3 Approach
      4. 4.2.4 Deliverables
    3. 4.3 Summary
    4. 4.4 Extras
      1. 4.4.1 Locate more JSON-format data
      2. 4.4.2 Other data sets to extract
      3. 4.4.3 Handling schema variations
      4. 4.4.4 CLI enhancements
      5. 4.4.5 Logging
  6. Chapter 5: Data Acquisition Features: SQL Database
    1. 5.1 Project 1.4: A local SQL database
      1. 5.1.1 Description
      2. 5.1.2 Approach
      3. 5.1.3 Deliverables
    2. 5.2 Project 1.5: Acquire data from a SQL extract
      1. 5.2.1 Description
      2. 5.2.2 The Object-Relational Mapping (ORM) problem
      3. 5.2.3 About the source data
      4. 5.2.4 Approach
      5. 5.2.5 Deliverables
    3. 5.3 Summary
    4. 5.4 Extras
      1. 5.4.1 Consider using another database
      2. 5.4.2 Consider using a NoSQL database
      3. 5.4.3 Consider using SQLAlchemy to define an ORM layer
  7. Chapter 6: Project 2.1: Data Inspection Notebook
    1. 6.1 Description
      1. 6.1.1 About the source data
    2. 6.2 Approach
      1. 6.2.1 Notebook test cases for the functions
      2. 6.2.2 Common code in a separate module
    3. 6.3 Deliverables
      1. 6.3.1 Notebook .ipynb file
      2. 6.3.2 Executing a notebook’s test suite
    4. 6.4 Summary
    5. 6.5 Extras
      1. 6.5.1 Use pandas to examine data
  8. Chapter 7: Data Inspection Features
    1. 7.1 Project 2.2: Validating cardinal domains — measures, counts, and durations
      1. 7.1.1 Description
      2. 7.1.2 Approach
      3. 7.1.3 Deliverables
    2. 7.2 Project 2.3: Validating text and codes — nominal data and ordinal numbers
      1. 7.2.1 Description
      2. 7.2.2 Approach
      3. 7.2.3 Deliverables
    3. 7.3 Project 2.4: Finding reference domains
      1. 7.3.1 Description
      2. 7.3.2 Approach
      3. 7.3.3 Deliverables
    4. 7.4 Summary
    5. 7.5 Extras
      1. 7.5.1 Markdown cells with dates and data source information
      2. 7.5.2 Presentation materials
      3. 7.5.3 JupyterBook or Quarto for even more sophisticated output
  9. Chapter 8: Project 2.5: Schema and Metadata
    1. 8.1 Description
    2. 8.2 Approach
      1. 8.2.1 Define Pydantic classes and emit the JSON Schema
      2. 8.2.2 Define expected data domains in JSON Schema notation
      3. 8.2.3 Use JSON Schema to validate intermediate files
    3. 8.3 Deliverables
      1. 8.3.1 Schema acceptance tests
      2. 8.3.2 Extended acceptance testing
    4. 8.4 Summary
    5. 8.5 Extras
      1. 8.5.1 Revise all previous chapter models to use Pydantic
      2. 8.5.2 Use the ORM layer
  10. Chapter 9: Project 3.1: Data Cleaning Base Application
    1. 9.1 Description
      1. 9.1.1 User experience
      2. 9.1.2 Source data
      3. 9.1.3 Result data
      4. 9.1.4 Conversions and processing
      5. 9.1.5 Error reports
    2. 9.2 Approach
      1. 9.2.1 Model module refactoring
      2. 9.2.2 Pydantic V2 validation
      3. 9.2.3 Validation function design
      4. 9.2.4 Incremental design
      5. 9.2.5 CLI application
    3. 9.3 Deliverables
      1. 9.3.1 Acceptance tests
      2. 9.3.2 Unit tests for the model features
      3. 9.3.3 Application to clean data and create an NDJSON interim file
    4. 9.4 Summary
    5. 9.5 Extras
      1. 9.5.1 Create an output file with rejected samples
  11. Chapter 10: Data Cleaning Features
    1. 10.1 Project 3.2: Validate and convert source fields
      1. 10.1.1 Description
      2. 10.1.2 Approach
      3. 10.1.3 Deliverables
    2. 10.2 Project 3.3: Validate text fields (and numeric coded fields)
      1. 10.2.1 Description
      2. 10.2.2 Approach
      3. 10.2.3 Deliverables
    3. 10.3 Project 3.4: Validate references among separate data sources
      1. 10.3.1 Description
      2. 10.3.2 Approach
      3. 10.3.3 Deliverables
    4. 10.4 Project 3.5: Standardize data to common codes and ranges
      1. 10.4.1 Description
      2. 10.4.2 Approach
      3. 10.4.3 Deliverables
    5. 10.5 Project 3.6: Integration to create an acquisition pipeline
      1. 10.5.1 Description
      2. 10.5.2 Approach
      3. 10.5.3 Deliverables
    6. 10.6 Summary
    7. 10.7 Extras
      1. 10.7.1 Hypothesis testing
      2. 10.7.2 Rejecting bad data via filtering (instead of logging)
      3. 10.7.3 Disjoint subentities
      4. 10.7.4 Create a fan-out cleaning pipeline
  12. Chapter 11: Project 3.7: Interim Data Persistence
    1. 11.1 Description
    2. 11.2 Overall approach
      1. 11.2.1 Designing idempotent operations
    3. 11.3 Deliverables
      1. 11.3.1 Unit test
      2. 11.3.2 Acceptance test
      3. 11.3.3 Cleaned up re-runnable application design
    4. 11.4 Summary
    5. 11.5 Extras
      1. 11.5.1 Using a SQL database
      2. 11.5.2 Persistence with NoSQL databases
  13. Chapter 12: Project 3.8: Integrated Data Acquisition Web Service
    1. 12.1 Description
      1. 12.1.1 The data series resources
      2. 12.1.2 Creating data for download
    2. 12.2 Overall approach
      1. 12.2.1 OpenAPI 3 specification
      2. 12.2.2 RESTful API to be queried from a notebook
      3. 12.2.3 A POST request starts processing
      4. 12.2.4 The GET request for processing status
      5. 12.2.5 The GET request for the results
      6. 12.2.6 Security considerations
    3. 12.3 Deliverables
      1. 12.3.1 Acceptance test cases
      2. 12.3.2 RESTful API app
      3. 12.3.3 Unit test cases
    4. 12.4 Summary
    5. 12.5 Extras
      1. 12.5.1 Add filtering criteria to the POST request
      2. 12.5.3 Use Celery instead of concurrent.futures
      3. 12.5.4 Call external processing directly instead of running a subprocess
  14. Chapter 13: Project 4.1: Visual Analysis Techniques
    1. 13.1 Description
    2. 13.2 Overall approach
      1. 13.2.1 General notebook organization
      2. 13.2.2 Python modules for summarizing
      3. 13.2.3 PyPlot graphics
      4. 13.2.4 Iteration and evolution
    3. 13.3 Deliverables
      1. 13.3.1 Unit test
      2. 13.3.2 Acceptance test
    4. 13.4 Summary
    5. 13.5 Extras
      1. 13.5.1 Use Seaborn for plotting
      2. 13.5.2 Adjust color palettes to emphasize key points about the data
  15. Chapter 14: Project 4.2: Creating Reports
    1. 14.1 Description
      1. 14.1.1 Slide decks and presentations
      2. 14.1.2 Reports
    2. 14.2 Overall approach
      1. 14.2.1 Preparing slides
      2. 14.2.2 Preparing a report
      3. 14.2.3 Creating technical diagrams
    3. 14.3 Deliverables
    4. 14.4 Summary
    5. 14.5 Extras
      1. 14.5.1 Written reports with UML diagrams
  16. Chapter 15: Project 5.1: Modeling Base Application
    1. 15.1 Description
    2. 15.2 Approach
      1. 15.2.1 Designing a summary app
      2. 15.2.2 Describing the distribution
      3. 15.2.3 Use cleaned data model
      4. 15.2.4 Rethink the data inspection functions
      5. 15.2.5 Create new results model
    3. 15.3 Deliverables
      1. 15.3.1 Acceptance testing
      2. 15.3.2 Unit testing
      3. 15.3.3 Application secondary feature
    4. 15.4 Summary
    5. 15.5 Extras
      1. 15.5.1 Measures of shape
      2. 15.5.2 Creating PDF reports
      3. 15.5.3 Serving the HTML report from the data API
  17. Chapter 16: Project 5.2: Simple Multivariate Statistics
    1. 16.1 Description
      1. 16.1.1 Correlation coefficient
      2. 16.1.2 Linear regression
      3. 16.1.3 Diagrams
    2. 16.2 Approach
      1. 16.2.1 Statistical computations
      2. 16.2.2 Analysis diagrams
      3. 16.2.3 Including diagrams in the final document
    3. 16.3 Deliverables
      1. 16.3.1 Acceptance tests
      2. 16.3.2 Unit tests
    4. 16.4 Summary
    5. 16.5 Extras
      1. 16.5.1 Use pandas to compute basic statistics
      2. 16.5.2 Use the dask version of pandas
      3. 16.5.3 Use numpy for statistics
      4. 16.5.4 Use scikit-learn for modeling
      5. 16.5.5 Compute the correlation and regression using functional programming
  18. Chapter 17: Next Steps
    1. 17.1 Overall data wrangling
    2. 17.2 The concept of “decision support”
    3. 17.3 Concept of metadata and provenance
    4. 17.4 Next steps toward machine learning
    5. Why subscribe?
  19. Other Books You Might Enjoy
    1. Packt is searching for authors like you
    2. Share your thoughts
    3. Download a free PDF copy of this book
  20. Index

Product information

  • Title: Python Real-World Projects
  • Author(s): Steven F. Lott
  • Release date: September 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781803246765