Developing Kaggle Notebooks

Book description

  • Printed in color
  • Develop an array of effective strategies and blueprints to approach any new data analysis on the Kaggle platform and create Notebooks with substance, style, and impact
  • Leverage the power of generative AI with Kaggle Models
  • Purchase of the print or Kindle book includes a free PDF eBook

Key Features

  • Master the basics of data ingestion, cleaning, exploration, and prepare to build baseline models
  • Work robustly with any type, modality, and size of data, be it tabular, text, image, video, or sound
  • Improve the style and readability of your Notebooks, making them more impactful and compelling

Book Description

Developing Kaggle Notebooks introduces you to data analysis, with a focus on using Kaggle Notebooks to simultaneously achieve mastery in this field and rise to the top of the Kaggle Notebooks tier. The book is structured as a seven-step data analysis journey, exploring the features available in Kaggle Notebooks alongside various data analysis techniques.

For each topic, we provide one or more notebooks and develop reusable analysis components through Kaggle's Utility Scripts feature. These components are introduced progressively: initially as part of a notebook, and later extracted for reuse across future notebooks. This approach aims to make the notebooks' code more structured, readable, and easy to maintain.

Although the focus of this book is on data analytics, some examples will guide you in preparing a complete machine learning pipeline using Kaggle Notebooks. Starting from initial data ingestion and data quality assessment, you'll move on to preliminary data analysis, advanced data exploration, feature qualification to build a model baseline, and feature engineering. You'll also delve into hyperparameter tuning to iteratively refine your model and prepare for submission in Kaggle competitions. Additionally, the book touches on developing notebooks that leverage the power of generative AI using Kaggle Models.

What you will learn

  • Approach a dataset or competition to perform data analysis via a notebook
  • Learn data ingestion and address issues arising with the ingested data
  • Structure your code using reusable components
  • Analyze in depth both small and large datasets of various types
  • Distinguish yourself from the crowd with the content of your analysis
  • Enhance your notebook style with a color scheme and other visual effects
  • Captivate your audience with data and compelling storytelling techniques

Who this book is for

This book is suitable for a wide audience with a keen interest in data science and machine learning, looking to use Kaggle Notebooks to improve their skills and rise in the Kaggle Notebooks ranks. This book caters to:

  • Beginners on Kaggle from any background
  • Seasoned contributors who want to build various skills like ingestion, preparation, exploration, and visualization
  • Expert contributors who want to learn from the Grandmasters to rise into the upper Kaggle rankings
  • Professionals who already use Kaggle for learning and competing

Table of contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Get in touch
  2. Introducing Kaggle and Its Basic Functions
    1. The Kaggle platform
    2. Kaggle Competitions
    3. Kaggle Datasets
    4. Kaggle Code
    5. Kaggle Discussions
    6. Kaggle Learn
    7. Kaggle Models
    8. Summary
  3. Getting Ready for Your Kaggle Environment
    1. What is a Kaggle Notebook?
    2. How to create notebooks
    3. Exploring notebook capabilities
      1. Basic capabilities
      2. Advanced capabilities
        1. Setting a notebook as a utility script or adding utility scripts
        2. Adding and using secrets
        3. Using Google Cloud services in Kaggle Notebooks
        4. Upgrading your Kaggle Notebook to Google Cloud AI Notebooks
        5. Using a Notebook to automatically update a Dataset
    4. Using the Kaggle API to create, update, download, and monitor your notebooks
    5. Summary
  4. Starting Our Travel – Surviving the Titanic Disaster
    1. A closer look at the Titanic
    2. Conducting data inspection
      1. Understanding the data
      2. Analyzing the data
    3. Performing univariate analysis
    4. Performing multivariate analysis
    5. Extracting meaningful information from passenger names
    6. Creating a dashboard showing multiple plots
    7. Building a baseline model
    8. Summary
    9. References
  5. Take a Break and Have a Beer or Coffee in London
    1. Pubs in England
      1. Data quality check
      2. Data exploration
    2. Starbucks around the world
      1. Preliminary data analysis
      2. Univariate and bivariate data analysis
      3. Geospatial analysis
    3. Pubs and Starbucks in London
      1. Data preparation
      2. Geospatial analysis
    4. Summary
    5. References
  6. Get Back to Work and Optimize Microloans for Developing Countries
    1. Introducing the Kiva analytics competition
    2. More data, more insights – analyzing the Kiva data competition
      1. Understanding the borrower demographic
      2. Exploring MPI correlation with other factors
      3. Radar visualization of poverty dimensions
      4. Final remarks
    3. Telling a different story from a different dataset
      1. The plot
      2. The actual history
      3. Conclusion
    4. Summary
    5. References
  7. Can You Predict Bee Subspecies?
    1. Data exploration
      1. Data quality checks
      2. Exploring image data
      3. Locations
      4. Date and time
      5. Subspecies
      6. Health
      7. Others
      8. Conclusion
    2. Subspecies classification
      1. Splitting the data
      2. Data augmentation
      3. Building a baseline model
      4. Iteratively refining the model
    3. Summary
    4. References
  8. Text Analysis Is All You Need
    1. What is in the data?
      1. Target feature
      2. Sensitive features
    2. Analyzing the comments text
      1. Topic modeling
      2. Named entity recognition
      3. POS tagging
    3. Preparing the model
      1. Building the vocabulary
      2. Embedding index and embedding matrix
      3. Checking vocabulary coverage
      4. Iteratively improving vocabulary coverage
        1. Transforming to lowercase
        2. Removing contractions
        3. Removing punctuation and special characters
    4. Building a baseline model
    5. Transformer-based solution
    6. Summary
    7. References
  9. Analyzing Acoustic Signals to Predict the Next Simulated Earthquake
    1. Introducing the LANL Earthquake Prediction competition
    2. Formats for signal data
    3. Exploring our competition data
      1. Solution approach
    4. Feature engineering
      1. Trend feature and classic STA/LTA
      2. FFT-derived features
      3. Features derived from aggregate functions
      4. Features derived using the Hilbert transform and Hann window
      5. Features based on moving averages
    5. Building a baseline model
    6. Summary
    7. References
  10. Can You Find Out Which Movie Is a Deepfake?
    1. Introducing the competition
    2. Introducing competition utility scripts
      1. Video data utils
      2. Face and body detection utils
    3. Metadata exploration
    4. Video data exploration
      1. Visualizing sample files
      2. Performing object detection
    5. Summary
    6. References
  11. Unleash the Power of Generative AI with Kaggle Models
    1. Introducing Kaggle Models
    2. Prompting a foundation model
      1. Model evaluation and testing
      2. Model quantization
    3. Building a multi-task application with Langchain
    4. Code generation with Kaggle Models
    5. Creating a RAG system
    6. Summary
    7. References
  12. Closing Our Journey: How to Stay Relevant and on Top
    1. Learn from the best: observe successful Grandmasters
    2. Revisit and refine your work periodically
    3. Recognize others’ contributions, and add your personal touch
    4. Be quick: don’t wait for perfection
    5. Be generous: share your knowledge
    6. Step outside your comfort zone
    7. Be grateful
    8. Summary
    9. References
  13. Other Books You May Enjoy
  14. Index

Product information

  • Title: Developing Kaggle Notebooks
  • Author(s): Gabriel Preda
  • Release date: December 2023
  • Publisher(s): Packt Publishing
  • ISBN: 9781805128519