Book description
Companies are scrambling to integrate AI into their systems and operations. But to build truly successful solutions, you need a firm grasp of the underlying mathematics. This accessible guide walks you through the math necessary to thrive in the AI field, focusing on real-world applications rather than dense academic theory.
Engineers, data scientists, and students alike will examine mathematical topics critical for AI (including regression, neural networks, optimization, backpropagation, convolution, Markov chains, and more) through popular applications such as computer vision, natural language processing, and automated systems. Supplementary Jupyter notebooks shed light on examples with Python code and visualizations. Whether you're just beginning your career or have years of experience, this book gives you the foundation necessary to dive deeper into the field.
- Understand the underlying mathematics powering AI systems, including generative adversarial networks, random graphs, large random matrices, mathematical logic, optimal control, and more
- Learn how to adapt mathematical methods to different applications from completely different fields
- Gain the mathematical fluency to interpret and explain how AI systems arrive at their decisions
Table of contents
- Preface
- Why I Wrote This Book
- Who Is This Book For?
- Who Is This Book Not For?
- How Will the Math Be Presented in This Book?
- Infographic
- What Math Background Is Expected from You to Be Able to Read This Book?
- Overview of the Chapters
- My Favorite Books on AI
- Conventions Used in This Book
- Using Code Examples
- O’Reilly Online Learning
- How to Contact Us
- Acknowledgments
- 1. Why Learn the Mathematics of AI?
- 2. Data, Data, Data
- Data for AI
- Real Data Versus Simulated Data
- Mathematical Models: Linear Versus Nonlinear
- An Example of Real Data
- An Example of Simulated Data
- Mathematical Models: Simulations and AI
- Where Do We Get Our Data From?
- The Vocabulary of Data Distributions, Probability, and Statistics
- Random Variables
- Probability Distributions
- Marginal Probabilities
- The Uniform and the Normal Distributions
- Conditional Probabilities and Bayes’ Theorem
- Conditional Probabilities and Joint Distributions
- Prior Distribution, Posterior Distribution, and Likelihood Function
- Mixtures of Distributions
- Sums and Products of Random Variables
- Using Graphs to Represent Joint Probability Distributions
- Expectation, Mean, Variance, and Uncertainty
- Covariance and Correlation
- Markov Process
- Normalizing, Scaling, and/or Standardizing a Random Variable or Data Set
- Common Examples
- Continuous Distributions Versus Discrete Distributions (Density Versus Mass)
- The Power of the Joint Probability Density Function
- Distribution of Data: The Uniform Distribution
- Distribution of Data: The Bell-Shaped Normal (Gaussian) Distribution
- Distribution of Data: Other Important and Commonly Used Distributions
- The Various Uses of the Word “Distribution”
- A/B Testing
- Summary and Looking Ahead
- 3. Fitting Functions to Data
- Traditional and Very Useful Machine Learning Models
- Numerical Solutions Versus Analytical Solutions
- Regression: Predict a Numerical Value
- Logistic Regression: Classify into Two Classes
- Softmax Regression: Classify into Multiple Classes
- Incorporating These Models into the Last Layer of a Neural Network
- Other Popular Machine Learning Techniques and Ensembles of Techniques
- Performance Measures for Classification Models
- Summary and Looking Ahead
- 4. Optimization for Neural Networks
- The Brain Cortex and Artificial Neural Networks
- Training Function: Fully Connected, or Dense, Feed Forward Neural Networks
- Loss Functions
- Optimization
- Regularization Techniques
- Hyperparameter Examples That Appear in Machine Learning
- Chain Rule and Backpropagation: Calculating $\nabla L(\vec{\omega}^{i})$
- Assessing the Significance of the Input Data Features
- Summary and Looking Ahead
- 5. Convolutional Neural Networks and Computer Vision
- 6. Singular Value Decomposition: Image Processing, Natural Language Processing, and Social Media
- Matrix Factorization
- Diagonal Matrices
- Matrices as Linear Transformations Acting on Space
- Action of A on the Right Singular Vectors
- Action of A on the Standard Unit Vectors and the Unit Square Determined by Them
- Action of A on the Unit Circle
- Breaking Down the Circle-to-Ellipse Transformation According to the Singular Value Decomposition
- Rotation and Reflection Matrices
- Action of A on a General Vector $\vec{x}$
- Three Ways to Multiply Matrices
- The Big Picture
- The Ingredients of the Singular Value Decomposition
- Singular Value Decomposition Versus the Eigenvalue Decomposition
- Computation of the Singular Value Decomposition
- The Pseudoinverse
- Applying the Singular Value Decomposition to Images
- Principal Component Analysis and Dimension Reduction
- Principal Component Analysis and Clustering
- A Social Media Application
- Latent Semantic Analysis
- Randomized Singular Value Decomposition
- Summary and Looking Ahead
- 7. Natural Language and Finance AI: Vectorization and Time Series
- Natural Language AI
- Preparing Natural Language Data for Machine Processing
- Statistical Models and the log Function
- Zipf’s Law for Term Counts
- Various Vector Representations for Natural Language Documents
- Term Frequency Vector Representation of a Document or Bag of Words
- Term Frequency-Inverse Document Frequency Vector Representation of a Document
- Topic Vector Representation of a Document Determined by Latent Semantic Analysis
- Topic Vector Representation of a Document Determined by Latent Dirichlet Allocation
- Topic Vector Representation of a Document Determined by Latent Discriminant Analysis
- Meaning Vector Representations of Words and of Documents Determined by Neural Network Embeddings
- Cosine Similarity
- Natural Language Processing Applications
- Transformers and Attention Models
- Convolutional Neural Networks for Time Series Data
- Recurrent Neural Networks for Time Series Data
- An Example of Natural Language Data
- Finance AI
- Summary and Looking Ahead
- 8. Probabilistic Generative Models
- What Are Generative Models Useful For?
- The Typical Mathematics of Generative Models
- Shifting Our Brain from Deterministic Thinking to Probabilistic Thinking
- Maximum Likelihood Estimation
- Explicit and Implicit Density Models
- Explicit Density-Tractable: Fully Visible Belief Networks
- Explicit Density-Tractable: Change of Variables Nonlinear Independent Component Analysis
- Explicit Density-Intractable: Variational Autoencoders Approximation via Variational Methods
- Explicit Density-Intractable: Boltzmann Machine Approximation via Markov Chain
- Implicit Density-Markov Chain: Generative Stochastic Network
- Implicit Density-Direct: Generative Adversarial Networks
- Example: Machine Learning and Generative Networks for High Energy Physics
- Other Generative Models
- The Evolution of Generative Models
- Probabilistic Language Modeling
- Summary and Looking Ahead
- 9. Graph Models
- Graphs: Nodes, Edges, and Features for Each
- Example: PageRank Algorithm
- Inverting Matrices Using Graphs
- Cayley Graphs of Groups: Pure Algebra and Parallel Computing
- Message Passing Within a Graph
- The Limitless Applications of Graphs
- Brain Networks
- Spread of Disease
- Spread of Information
- Detecting and Tracking Fake News Propagation
- Web-Scale Recommendation Systems
- Fighting Cancer
- Biochemical Graphs
- Molecular Graph Generation for Drug and Protein Structure Discovery
- Citation Networks
- Social Media Networks and Social Influence Prediction
- Sociological Structures
- Bayesian Networks
- Traffic Forecasting
- Logistics and Operations Research
- Language Models
- Graph Structure of the Web
- Automatically Analyzing Computer Programs
- Data Structures in Computer Science
- Load Balancing in Distributed Networks
- Artificial Neural Networks
- Random Walks on Graphs
- Node Representation Learning
- Tasks for Graph Neural Networks
- Dynamic Graph Models
- Bayesian Networks
- A Bayesian Network Represents a Compactified Conditional Probability Table
- Making Predictions Using a Bayesian Network
- Bayesian Networks Are Belief Networks, Not Causal Networks
- Keep This in Mind About Bayesian Networks
- Chains, Forks, and Colliders
- Given a Data Set, How Do We Set Up a Bayesian Network for the Involved Variables?
- Graph Diagrams for Probabilistic Causal Modeling
- A Brief History of Graph Theory
- Main Considerations in Graph Theory
- Algorithms and Computational Aspects of Graphs
- Summary and Looking Ahead
- 10. Operations Research
- No Free Lunch
- Complexity Analysis and O() Notation
- Optimization: The Heart of Operations Research
- Thinking About Optimization
- Optimization on Networks
- The n-Queens Problem
- Linear Optimization
- Game Theory and Multiagents
- Queuing
- Inventory
- Machine Learning for Operations Research
- Hamilton-Jacobi-Bellman Equation
- Operations Research for AI
- Summary and Looking Ahead
- 11. Probability
- Where Did Probability Appear in This Book?
- What More Do We Need to Know That Is Essential for AI?
- Causal Modeling and the Do Calculus
- Paradoxes and Diagram Interpretations
- Large Random Matrices
- Stochastic Processes
- Markov Decision Processes and Reinforcement Learning
- Theoretical and Rigorous Grounds
- Which Events Have a Probability?
- Can We Talk About a Wider Range of Random Variables?
- A Probability Triple (Sample Space, Sigma Algebra, Probability Measure)
- Where Is the Difficulty?
- Random Variable, Expectation, and Integration
- Distribution of a Random Variable and the Change of Variable Theorem
- Next Steps in Rigorous Probability Theory
- The Universality Theorem for Neural Networks
- Summary and Looking Ahead
- 12. Mathematical Logic
- 13. Artificial Intelligence and Partial Differential Equations
- What Is a Partial Differential Equation?
- Modeling with Differential Equations
- Numerical Solutions Are Very Valuable
- Some Statistical Mechanics: The Wonderful Master Equation
- Solutions as Expectations of Underlying Random Processes
- Transforming the PDE
- Solution Operators
- AI for PDEs
- Hamilton-Jacobi-Bellman PDE for Dynamic Programming
- PDEs for AI?
- Other Considerations in Partial Differential Equations
- Summary and Looking Ahead
- 14. Artificial Intelligence, Ethics, Mathematics, Law, and Policy
- Index
- About the Author
Product information
- Title: Essential Math for AI
- Author(s):
- Release date: January 2023
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098107635