Data Science, Analytics, and AI for Business and the Real World™

Video description

Right now, despite the Covid-19 economic contraction, traditional businesses are hiring data scientists in droves, and data scientist has been the top job in the U.S. for the last four years running.

However, data science has a steep learning curve, and many learners are left with gaps in their knowledge. This course aims to fill those gaps with a comprehensive syllabus that covers all the major components of data science.

Throughout this course, you will use data science to solve common business problems. You will start with the basics of Python, Pandas, Scikit-learn, NumPy, Keras, Prophet, statsmodels, SciPy, and more. You will then learn statistics and probability for data science in detail, followed by visualization theory for data science and analytics using Seaborn, Matplotlib, and Plotly.

You will also cover dashboard design using Google Data Studio, along with machine learning and deep learning theory and tools.

Next, you will solve problems using predictive modeling, classification, and deep learning. After this, you will shift your focus to data analysis and statistical case studies, and to data science in marketing and retail.

Finally, you will build a machine learning API and deploy it to the cloud using Heroku.

By the end of this course, you will have learned all the major components of data science and gained the confidence to enter the field.

What You Will Learn

  • Look at machine learning algorithms with Scikit-learn
  • Create beautiful charts, graphs, and visualizations that tell a story with data
  • Understand common business problems and how to apply data science
  • Create data dashboards with Google Data Studio
  • Learn to apply data science in marketing and retail
  • Integrate big data analysis and machine learning with PySpark

Audience

This course is designed for beginners in data science, business analysts who wish to do more with their data, college graduates who lack real-world experience, business-minded people who would like to use data to enhance their business, and software developers or engineers who would like to start learning data science. Anyone who wants to become more employable as a data scientist and is interested in using data to solve real-world problems will enjoy this course.

You don't need to be a programming or math whiz; basic high school math is sufficient.

About The Author

Rajeev Ratan: Rajeev Ratan is a data scientist with an MSc in artificial intelligence from the University of Edinburgh and a BSc in electrical and computer engineering from the University of the West Indies. He has worked in several London tech start-ups as a data scientist, mostly in computer vision. He was a member of Entrepreneur First, a London-based start-up incubator, where he co-founded an EdTech start-up.

Later on, he worked in AI tech start-ups in the real estate and gambling sectors. Before venturing into data science, Rajeev worked as a radio frequency engineer for eight years. His research interests lie in deep learning and computer vision. He has created several online courses, which are hosted on a number of online learning platforms worldwide.

Table of contents

  1. Chapter 1 : Introduction to the Course
    1. The Data Science Hype
    2. About Our Case Studies
    3. Why Data is the New Oil
    4. Defining Business Problems for Analytic Thinking and Data-Driven Decision Making
    5. 10 Data Science Projects Every Business Should Do!
    6. How Deep Learning is Changing Everything
    7. The Career Paths of a Data Scientist
    8. The Data Science Approach to Problems
  2. Chapter 2 : Set Up (Google Colab) and Download Code Files
    1. Downloading and Running Your Code
  3. Chapter 3 : Introduction to Python
    1. Why Use Python for Data Science?
    2. Python Introduction - Part 1 - Variables
    3. Python - Variables (Lists and Dictionaries)
    4. Python - Conditional Statements
    5. Python - Loops
    6. Python - Functions
    7. Python - Classes
  4. Chapter 4 : Pandas
    1. Introduction to Pandas
    2. Pandas 1 - Data Series
    3. Pandas 2A - DataFrames - Index, Slice, Stats, Finding Empty Cells
    4. Pandas 2B - DataFrames - Index, Slice, Stats, Finding Empty Cells, and Filtering
    5. Pandas 3A - Data Cleaning - Alter Columns/Rows, Missing Data, and String Operations
    6. Pandas 3B - Data Cleaning - Alter Columns/Rows, Missing Data, and String Operations
    7. Pandas 4 - Data Aggregation - GroupBy, Map, Pivot, Aggregate Functions
    8. Feature Engineering, Lambda, and Apply
    9. Concatenating, Merging, and Joining
    10. Time Series Data
    11. Advanced Operations - Iterrows, Vectorization, and NumPy
    12. Advanced Operations - Map, Filter, Apply
    13. Advanced Operations - Parallel Processing
    14. Map Visualizations with Plotly - Choropleths from Scratch - USA and World
    15. Map Visualizations with Plotly - Heatmaps, Scatter Plots, and Lines
  5. Chapter 5 : Statistics and Visualizations
    1. Introduction to Statistics
    2. Descriptive Statistics - Why Statistical Knowledge is So Important
    3. Descriptive Statistics 1 - Exploratory Data Analysis (EDA) and Visualizations
    4. Descriptive Statistics 2 - Exploratory Data Analysis (EDA) and Visualizations
    5. Sampling, Averages, and Variance, and How to Lie and Mislead with Statistics
    6. Sampling - Sample Sizes and Confidence Intervals - What Can You Trust?
    7. Types of Variables - Quantitative and Qualitative
    8. Frequency Distributions
    9. Frequency Distribution Shapes
    10. Analyzing Frequency Distributions - What is the Best Type of Wine? Red or White?
    11. Mean, Mode, and Median - Not as Simple as You Think
    12. Variance, Standard Deviation, and Bessel's Correction
    13. Covariance and Correlation - Do Amazon and Google Know You Better Than Anyone Else?
    14. Lying with Correlations - Divorce Rates in Maine Caused by Margarine Consumption
    15. The Normal Distribution and the Central Limit Theorem
    16. Z-Scores
  6. Chapter 6 : Probability Theory
    1. Introduction to Probability
    2. Estimating Probability
    3. Probability - Addition Rule
    4. Probability - Permutations and Combinations
    5. Bayes' Theorem
  7. Chapter 7 : Hypothesis Testing
    1. Introduction to Hypothesis Testing
    2. Statistical Significance
    3. Hypothesis Testing - P-Value
    4. Hypothesis Testing - Pearson Correlation
  8. Chapter 8 : A/B Testing - A Worked Example
    1. Understanding the Problem + Exploratory Data Analysis and Visualizations
    2. A/B Test Result Analysis
    3. A/B Testing a Worked Real-Life Example - Designing an A/B Test
    4. Statistical Power and Significance
    5. Analysis of A/B Test Results
  9. Chapter 9 : Data Dashboards - Google Data Studio
    1. Intro to Google Data Studio
    2. Opening Google Data Studio and Uploading Data
    3. Your First Dashboard Part 1
    4. Your First Dashboard Part 2
    5. Adding New Fields to Our Data
    6. Pivot Tables - Total Profit
    7. Adding Filters to Tables
    8. Scorecard KPI Visualizations
    9. Scorecards with Time Comparison
    10. Bar Charts (Horizontal, Vertical, and Stacked)
    11. Line Charts
    12. Pie Charts, Donut Charts, and Tree Maps
    13. Time Series and Comparative Time Series Plots
    14. Scatter Plots
    15. Geographic Plots
    16. Bullet and Line Area Plots
    17. Sharing and Final Conclusions
    18. Our Executive Sales Dashboard
  10. Chapter 10 : Machine Learning
    1. Introduction to Machine Learning
    2. How Machine Learning Enables Computers to Learn
    3. What is a Machine Learning Model?
    4. Types of Machine Learning
    5. Linear Regression - Introduction to Cost Functions and Gradient Descent
    6. Linear Regressions in Python from Scratch and Using Sklearn
    7. Polynomial and Multivariate Linear Regression
    8. Logistic Regression
    9. Support Vector Machines (SVMs)
    10. Decision Trees and Random Forests, and the Gini Index
    11. K-Nearest Neighbors (KNN)
    12. Assessing Performance - Confusion Matrix, Precision, and Recall
    13. Understanding the ROC Curve and AUC
    14. What Makes a Good Model? Regularization, Overfitting, Generalization, and Outliers
    15. Introduction to Neural Networks
    16. Types of Deep Learning Algorithms - CNNs, RNNs, and LSTMs
  11. Chapter 11 : Deep Learning
    1. Neural Networks Chapter Overview
    2. Machine Learning Overview
    3. Neural Networks Explained
    4. Forward Propagation
    5. Activation Functions
    6. Training Part 1 - Loss Functions
    7. Training Part 2 - Backpropagation and Gradient Descent
    8. Backpropagation and Learning Rates - A Worked Example
    9. Regularization, Overfitting, Generalization, and Test Datasets
    10. Epochs, Iterations, and Batch Sizes
    11. Measuring Performance and the Confusion Matrix
    12. Review and Best Practices
  12. Chapter 12 : Unsupervised Learning - Clustering
    1. Introduction to Unsupervised Learning
    2. K-Means Clustering
    3. Choosing K
    4. K-Means - Elbow and Silhouette Method
    5. Agglomerative Hierarchical Clustering
    6. Mean Shift Clustering
    7. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
    8. DBSCAN in Python
    9. Expectation-Maximization (EM) Clustering Using Gaussian Mixture Models (GMM)
  13. Chapter 13 : Dimensionality Reduction
    1. Principal Component Analysis
    2. t-Distributed Stochastic Neighbor Embedding (t-SNE)
    3. PCA and t-SNE in Python with Visualization Comparisons
  14. Chapter 14 : Recommendation Systems
    1. Introduction to Recommendation Engines
    2. Before Recommending, How Do We Rate or Review Items?
    3. User Collaborative Filtering and Item/Content-Based Filtering
    4. The Netflix Prize and Matrix Factorization and Deep Learning as Latent-Factor Methods
  15. Chapter 15 : Natural Language Processing
    1. Introduction to Natural Language Processing
    2. Modeling Language - The Bag of Words Model
    3. Normalization, Stop Word Removal, Lemmatizing/Stemming
    4. TF-IDF Vectorizer (Term Frequency - Inverse Document Frequency)
    5. Word2Vec - Efficient Estimation of Word Representations in Vector Space
  16. Chapter 16 : Big Data
    1. Introduction to Big Data
    2. Challenges in Big Data
    3. Hadoop, MapReduce, and Spark
    4. Introduction to PySpark
    5. RDDs, Transformations, Actions, Lineage Graphs, and Jobs
  17. Chapter 17 : Predicting the US 2020 Election
    1. Understanding Polling Data
    2. Cleaning and Exploring Our Dataset
    3. Data Wrangling Our Dataset
    4. Understanding the US Electoral System
    5. Visualizing Our Polling Data
    6. Statistical Analysis of Polling Data
    7. Polling Simulations
    8. Polling Simulation Result Analysis
    9. Visualizing Our Results on a US Map
  18. Chapter 18 : Predicting Diabetes Cases
    1. Understanding and Preparing Our Healthcare Data
    2. First Attempt - Trying a Naive Model
    3. Trying Different Models and Comparing the Results
  19. Chapter 19 : Market Basket Analysis
    1. Understanding Our Dataset
    2. Data Preparation
    3. Visualizing Our Frequent Sets
  20. Chapter 20 : Predicting the World Cup Winner (Soccer/Football)
    1. Understanding and Preparing Our Soccer Datasets - Part 1
    2. Understanding and Preparing Our Soccer Datasets - Part 2
    3. Predicting Game Outcomes with Our Model
    4. Simulating the World Cup Outcome with Our Model
  21. Chapter 21 : Covid-19 Data Analysis and Flourish Bar Chart Race Visualization
    1. Understanding Our Covid-19 Data
    2. Analysis of the Most Recent Data
    3. World Visualizations
    4. Analyzing Confirmed Cases in Each Country
    5. Mapping Covid-19 Cases
    6. Animating Our Maps
    7. Comparing Countries and Continents
    8. Flourish Bar Chart Race - 1
    9. Flourish Bar Chart Race - 2
  22. Chapter 22 : Analyzing Olympic Winners
    1. Understanding Our Olympic Dataset
    2. Getting the Medals Per Country
    3. Analyzing the Winter Olympic Data and Viewing Medals Won Over Time
  23. Chapter 23 : Is Home Advantage Real in Soccer and Basketball?
    1. Understanding Our Dataset and EDA
    2. Goal Difference Ratios Home Versus Away
    3. How Home Advantage Has Evolved Over Time
  24. Chapter 24 : IPL Cricket Data Analysis
    1. Loading and Understanding Our Cricket Dataset
    2. Man of the Match and Stadium Analysis
    3. Do Toss Winners Win More? And Team Versus Team Comparisons
  25. Chapter 25 : Streaming Services (Netflix, Hulu, Disney Plus, and Amazon Prime)
    1. Understanding Our Dataset
    2. EDA and Visualizations
    3. Best Movies Per Genre - Platform Comparisons
  26. Chapter 26 : Micro Brewery and Pub Data Analysis
    1. EDA, Visualizations, and Map
  27. Chapter 27 : Pizza Restaurant Data Analysis
    1. EDA and Visualizations
    2. Analysis Per State
    3. Pizza Maps
  28. Chapter 28 : Supply Chain Data Analysis
    1. Understanding Our Dataset
    2. Visualizations and EDA
    3. More Visualizations
  29. Chapter 29 : Indian Election Result Analysis
    1. Introduction
    2. Visualizations of Election Results
    3. Visualizing Gender Turnout
  30. Chapter 30 : Africa Economic Crisis Data Analysis
    1. Economic Dataset Understanding
    2. Visualizations and Correlations
  31. Chapter 31 : Predicting Which Employees May Quit
    1. Figuring Out Which Employees May Quit - Understanding the Problem and EDA
    2. Data Cleaning and Preparation
    3. Machine Learning Modeling + Deep Learning
  32. Chapter 32 : Figuring Out Which Customers May Leave
    1. Understanding the Problem
    2. Exploratory Data Analysis and Visualizations
    3. Data Pre-Processing
    4. Machine Learning Modeling + Deep Learning
  33. Chapter 33 : Who to Target for Donations?
    1. Understanding the Problem
    2. Exploratory Data Analysis and Visualizations
    3. Preparing Our Dataset for Machine Learning
    4. Modeling Using Grid Search to Find the Best Parameters
  34. Chapter 34 : Predicting Insurance Premiums
    1. Understanding the Problem + Exploratory Data Analysis and Visualizations
    2. Data Preparation and Machine Learning Modeling
  35. Chapter 35 : Predicting Airbnb Prices
    1. Understanding the Problem + Exploratory Data Analysis and Visualizations
    2. Machine Learning Modeling
    3. Using Our Model for Value Estimation for New Clients
  36. Chapter 36 : Detecting Credit Card Fraud
    1. Understanding Our Dataset
    2. Exploratory Analysis
    3. Feature Extraction
    4. Creating and Validating Our Model
  37. Chapter 37 : Analyzing Conversion Rates in Marketing Campaigns
    1. Exploratory Analysis for Understanding Marketing Conversion Rates
  38. Chapter 38 : Predicting Advertising Engagement
    1. Understanding the Problem + Exploratory Data Analysis and Visualizations
    2. Data Preparation and Machine Learning Modeling
  39. Chapter 39 : Product Sales Analysis
    1. Problem and Plan of Attack
    2. Sales and Revenue Analysis
    3. Analysis Per Country, Repeat Customers, and Items
  40. Chapter 40 : Determining Your Most Valuable Customers
    1. Understanding the Problem + Exploratory Data Analysis and Visualizations
    2. Customer Lifetime Value Modeling
  41. Chapter 41 : Customer Clustering (K-Means, Hierarchical) - Train Passenger
    1. Data Exploration and Description
    2. Simple Exploratory Data Analysis and Visualizations
    3. Feature Engineering
    4. K-Means Clustering of Customer Data
    5. Cluster Analysis
  42. Chapter 42 : Build a Product Recommendation System
    1. Dataset Description and Data Cleaning
    2. Making a Customer-Item Matrix
    3. User-User Matrix - Getting Recommended Items
    4. Item-Item Collaborative Filtering - Finding the Most Similar Items
  43. Chapter 43 : Deep Learning Recommendation System
    1. Understanding Our Wikipedia Movie Dataset
    2. Creating Our Dataset
    3. Deep Learning Embeddings and Training
    4. Getting Recommendations Based on Movie Similarity
  44. Chapter 44 : Predicting Brent Oil Prices
    1. Understanding Our Dataset and Its Time Series Nature
    2. Creating Our Prediction Model
    3. Making Future Predictions
  45. Chapter 45 : Detecting Sentiment in Tweets
    1. Understanding Our Dataset and Word Clouds
    2. Visualizations and Feature Extraction
    3. Training Our Model
  46. Chapter 46 : Spam or Ham Detection
    1. Loading and Understanding Our Spam/Ham Dataset
    2. Training Our Spam Detector
  47. Chapter 47 : Explore Data with PySpark and Titanic Survival Prediction
    1. Exploratory Analysis of Our Titanic Dataset
    2. Transformation Operations
    3. Machine Learning with PySpark
  48. Chapter 48 : Newspaper Headline Classification Using PySpark
    1. Loading and Understanding Our Dataset
    2. Building Our Model with PySpark
  49. Chapter 49 : Deployment into Production
    1. Introduction to Production Deployment Systems
    2. Creating the Model
    3. Introduction to Flask
    4. About Our WebApp
    5. Deploying Our WebApp on Heroku

Product information

  • Title: Data Science, Analytics, and AI for Business and the Real World™
  • Author(s): Rajeev Ratan
  • Release date: March 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803240848