Machine Learning Algorithms - Second Edition

Book description

An easy-to-follow, step-by-step guide for getting to grips with the real-world application of machine learning algorithms

Key Features

  • Explore statistics and complex mathematics for data-intensive applications
  • Discover new developments in the EM algorithm, PCA, and Bayesian regression
  • Study patterns and make predictions across various datasets

Book Description

Machine learning has gained tremendous popularity for its powerful and fast predictions on large datasets. However, the true force behind its output is a set of complex algorithms, grounded in statistical analysis, that churn through large datasets and generate substantial insight.

This second edition of Machine Learning Algorithms walks you through prominent developments in machine learning algorithms that constitute major contributions to the field, and helps you strengthen and master statistical interpretation across supervised, semi-supervised, and reinforcement learning. Once the core concepts of an algorithm have been covered, you'll explore real-world examples based on the most widely used libraries, such as scikit-learn, NLTK, TensorFlow, and Keras. You will discover new topics such as principal component analysis (PCA), independent component analysis (ICA), Bayesian regression, discriminant analysis, advanced clustering, and Gaussian mixture models.

By the end of this book, you will have studied machine learning algorithms and be able to put them into production to make your machine learning applications more innovative.

What you will learn

  • Study feature selection and the feature engineering process
  • Assess performance and error trade-offs for linear regression
  • Build a data model and understand how it works by using different types of algorithms
  • Learn to tune the parameters of Support Vector Machines (SVMs) (a brief sketch follows this list)
  • Explore the concept of natural language processing (NLP) and recommendation systems
  • Create a machine learning architecture from scratch
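To give a flavour of the hands-on style used throughout the book, here is a minimal sketch (not taken from the book itself) of the kind of workflow covered in the SVM and grid-search chapters: tuning an SVM classifier's hyperparameters with scikit-learn's GridSearchCV on the Iris toy dataset. It assumes a recent scikit-learn release; the parameter values shown are illustrative only.

    # Minimal, illustrative sketch: SVM hyperparameter tuning with a grid search.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.svm import SVC

    # Load a scikit-learn toy dataset and hold out a test set
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=1000)

    # Candidate hyperparameters: kernel type, regularization strength C, RBF gamma
    param_grid = {
        'kernel': ['linear', 'rbf'],
        'C': [0.1, 1.0, 10.0],
        'gamma': ['scale', 0.1, 1.0],
    }

    # Exhaustive grid search with 5-fold cross-validation
    gs = GridSearchCV(SVC(), param_grid, cv=5)
    gs.fit(X_train, y_train)

    print('Best parameters:', gs.best_params_)
    print('Test accuracy: {:.3f}'.format(gs.score(X_test, y_test)))

The book applies the same pattern (fit, cross-validate, inspect metrics) to the other estimators listed in the table of contents below.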

Who this book is for

Machine Learning Algorithms is for you if you are a machine learning engineer, data engineer, or junior data scientist who wants to advance in the field of predictive analytics and machine learning. Familiarity with R and Python will be an added advantage for getting the best from this book.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Machine Learning Algorithms Second Edition
  3. Dedication
  4. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  5. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. A Gentle Introduction to Machine Learning
    1. Introduction – classic and adaptive machines
      1. Descriptive analysis
      2. Predictive analysis
    2. Only learning matters
      1. Supervised learning
      2. Unsupervised learning
      3. Semi-supervised learning
      4. Reinforcement learning
      5. Computational neuroscience
    3. Beyond machine learning – deep learning and bio-inspired adaptive systems
    4. Machine learning and big data
    5. Summary
  8. Important Elements in Machine Learning
    1. Data formats
      1. Multiclass strategies
        1. One-vs-all
        2. One-vs-one
    2. Learnability
      1. Underfitting and overfitting
      2. Error measures and cost functions
      3. PAC learning
    3. Introduction to statistical learning concepts
      1. MAP learning
      2. Maximum likelihood learning
    4. Class balancing
      1. Resampling with replacement
      2. SMOTE resampling
    5. Elements of information theory
      1. Entropy
      2. Cross-entropy and mutual information
      3. Divergence measures between two probability distributions
    6. Summary
  9. Feature Selection and Feature Engineering
    1. scikit-learn toy datasets
    2. Creating training and test sets
    3. Managing categorical data
    4. Managing missing features
    5. Data scaling and normalization
      1. Whitening
    6. Feature selection and filtering
    7. Principal Component Analysis
      1. Non-Negative Matrix Factorization
      2. Sparse PCA
      3. Kernel PCA
    8. Independent Component Analysis
    9. Atom extraction and dictionary learning
    10. Visualizing high-dimensional datasets using t-SNE
    11. Summary
  10. Regression Algorithms
    1. Linear models for regression
    2. A bidimensional example
    3. Linear regression with scikit-learn and higher dimensionality
      1. R2 score
      2. Explained variance
      3. Regressor analytic expression
    4. Ridge, Lasso, and ElasticNet
      1. Ridge
      2. Lasso
      3. ElasticNet
    5. Robust regression
      1. RANSAC
      2. Huber regression
    6. Bayesian regression
    7. Polynomial regression
    8. Isotonic regression
    9. Summary
  11. Linear Classification Algorithms
    1. Linear classification
    2. Logistic regression
    3. Implementation and optimizations
    4. Stochastic gradient descent algorithms
    5. Passive-aggressive algorithms
      1. Passive-aggressive regression
    6. Finding the optimal hyperparameters through a grid search
    7. Classification metrics
      1. Confusion matrix
      2. Precision
      3. Recall
      4. F-Beta
      5. Cohen's Kappa
      6. Global classification report
      7. Learning curve
    8. ROC curve
    9. Summary
  12. Naive Bayes and Discriminant Analysis
    1. Bayes' theorem
    2. Naive Bayes classifiers
    3. Naive Bayes in scikit-learn
      1. Bernoulli Naive Bayes
      2. Multinomial Naive Bayes
        1. An example of Multinomial Naive Bayes for text classification
      3. Gaussian Naive Bayes
    4. Discriminant analysis
    5. Summary
  13. Support Vector Machines
    1. Linear SVM
    2. SVMs with scikit-learn
      1. Linear classification
    3. Kernel-based classification
      1. Radial Basis Function
      2. Polynomial kernel
      3. Sigmoid kernel
      4. Custom kernels
      5. Non-linear examples
    4. ν-Support Vector Machines
    5. Support Vector Regression
      1. An example of SVR with the Airfoil Self-Noise dataset
    6. Introducing semi-supervised Support Vector Machines (S3VM)
    7. Summary
  14. Decision Trees and Ensemble Learning
    1. Binary Decision Trees
      1. Binary decisions
      2. Impurity measures
        1. Gini impurity index
        2. Cross-entropy impurity index
        3. Misclassification impurity index
      3. Feature importance
    2. Decision Tree classification with scikit-learn
    3. Decision Tree regression
      1. Example of Decision Tree regression with the Concrete Compressive Strength dataset
    4. Introduction to Ensemble Learning
      1. Random Forests
        1. Feature importance in Random Forests
      2. AdaBoost
      3. Gradient Tree Boosting
      4. Voting classifier
    5. Summary
  15. Clustering Fundamentals
    1. Clustering basics
    2. k-NN
    3. Gaussian mixture
      1. Finding the optimal number of components
    4. K-means
      1. Finding the optimal number of clusters
        1. Optimizing the inertia
        2. Silhouette score
        3. Calinski-Harabasz index
        4. Cluster instability
    5. Evaluation methods based on the ground truth
      1. Homogeneity
      2. Completeness
      3. Adjusted Rand Index
    6. Summary
  16. Advanced Clustering
    1. DBSCAN
    2. Spectral Clustering
    3. Online Clustering
      1. Mini-batch K-means
      2. BIRCH
    4. Biclustering
    5. Summary
  17. Hierarchical Clustering
    1. Hierarchical strategies
    2. Agglomerative Clustering
      1. Dendrograms
      2. Agglomerative Clustering in scikit-learn
      3. Connectivity constraints
    3. Summary
  18. Introducing Recommendation Systems
    1. Naive user-based systems
      1. Implementing a user-based system with scikit-learn
    2. Content-based systems
    3. Model-free (or memory-based) collaborative filtering
    4. Model-based collaborative filtering
      1. Singular value decomposition strategy
      2. Alternating least squares strategy
      3. ALS with Apache Spark MLlib
    5. Summary
  19. Introducing Natural Language Processing
    1. NLTK and built-in corpora
      1. Corpora examples
    2. The Bag-of-Words strategy
      1. Tokenizing
        1. Sentence tokenizing
        2. Word tokenizing
      2. Stopword removal
        1. Language detection
      3. Stemming
      4. Vectorizing
        1. Count vectorizing
          1. N-grams
        2. TF-IDF vectorizing
    3. Part-of-Speech
      1. Named Entity Recognition
    4. A sample text classifier based on the Reuters corpus
    5. Summary
  20. Topic Modeling and Sentiment Analysis in NLP
    1. Topic modeling
      1. Latent Semantic Analysis
      2. Probabilistic Latent Semantic Analysis
      3. Latent Dirichlet Allocation
    2. Introducing Word2vec with Gensim
    3. Sentiment analysis
      1. VADER sentiment analysis with NLTK
    4. Summary
  21. Introducing Neural Networks
    1. Deep learning at a glance
      1. Artificial neural networks
    2. MLPs with Keras
      1. Interfacing Keras to scikit-learn
    3. Summary
  22. Advanced Deep Learning Models
    1. Deep model layers
      1. Fully connected layers
        1. Convolutional layers
        2. Dropout layers
        3. Batch normalization layers
        4. Recurrent Neural Networks
    2. An example of a deep convolutional network with Keras
    3. An example of an LSTM network with Keras
    4. A brief introduction to TensorFlow
      1. Computing gradients
      2. Logistic regression
      3. Classification with a multilayer perceptron
      4. Image convolution
    5. Summary
  23. Creating a Machine Learning Architecture
    1. Machine learning architectures
      1. Data collection
      2. Normalization and regularization
      3. Dimensionality reduction
      4. Data augmentation
      5. Data conversion
      6. Modeling/grid search/cross-validation
      7. Visualization
      8. GPU support
      9. A brief introduction to distributed architectures
    2. Scikit-learn tools for machine learning architectures
      1. Pipelines
      2. Feature unions
    3. Summary
  24. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Machine Learning Algorithms - Second Edition
  • Author(s): Giuseppe Bonaccorso
  • Release date: August 2018
  • Publisher(s): Packt Publishing
  • ISBN: 9781789347999