Reinforcement Learning Algorithms with Python

Develop self-learning algorithms and agents using TensorFlow and other Python tools, frameworks, and libraries

Key Features

  • Learn, develop, and deploy advanced reinforcement learning algorithms to solve a variety of tasks
  • Understand and develop model-free and model-based algorithms for building self-learning agents
  • Work with advanced reinforcement learning concepts and algorithms, such as imitation learning and evolution strategies

Book Description

Reinforcement Learning (RL) is a popular and promising branch of AI that involves building smarter models and agents that can automatically determine ideal behavior through interaction with a changing environment. This book will help you master RL algorithms and understand their implementation as you build self-learning agents.

Starting with an introduction to the tools, libraries, and setup needed to work in an RL environment, this book covers the building blocks of RL and delves into value-based methods, such as the application of Q-learning and SARSA algorithms. You'll learn how to use a combination of Q-learning and neural networks to solve complex problems. Furthermore, you'll study policy gradient methods, such as TRPO and PPO, to improve performance and stability, before moving on to the DDPG and TD3 deterministic algorithms. This book also covers how imitation learning techniques work and how DAgger can teach an agent to drive. You'll discover evolution strategies and black-box optimization techniques, and see how they can improve RL algorithms. Finally, you'll get to grips with exploration approaches, such as UCB and UCB1, and develop a meta-algorithm called ESBAS.
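
As a taste of the value-based methods mentioned above, here is a minimal sketch of the tabular Q-learning update rule; the environment size and hyperparameters are illustrative assumptions, not values taken from the book.

    # Tabular Q-learning update:
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    # Table size and hyperparameters below are illustrative assumptions.
    import numpy as np

    n_states, n_actions = 16, 4    # e.g. a FrozenLake-sized grid
    alpha, gamma = 0.1, 0.99       # learning rate and discount factor
    Q = np.zeros((n_states, n_actions))

    def q_learning_update(s, a, r, s_next, done):
        # Bootstrap from the greedy value of the next state unless terminal
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])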

By the end of the book, you'll have worked with key RL algorithms to overcome challenges in real-world applications, and you'll be ready to take part in the RL research community.

What you will learn

  • Develop an agent to play CartPole using the OpenAI Gym interface (see the sketch after this list)
  • Discover the model-based reinforcement learning paradigm
  • Solve the Frozen Lake problem with dynamic programming
  • Explore Q-learning and SARSA with a view to playing a taxi game
  • Apply Deep Q-Networks (DQNs) to Atari games using Gym
  • Study policy gradient algorithms, including Actor-Critic and REINFORCE
  • Understand and apply PPO and TRPO in continuous locomotion environments
  • Get to grips with evolution strategies for solving the lunar lander problem
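
As a flavor of the first of these tasks, the snippet below runs one episode of CartPole with a random placeholder policy; it assumes the classic pre-v0.26 OpenAI Gym API that was current when this book was released.

    # One episode of CartPole using the classic OpenAI Gym API
    # (reset() returns an observation; step() returns a 4-tuple).
    # The random action is a stand-in for a learned policy.
    import gym

    env = gym.make('CartPole-v1')
    obs = env.reset()
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()   # placeholder for the agent's policy
        obs, reward, done, info = env.step(action)
        total_reward += reward
    print('Episode return:', total_reward)
    env.close()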

Who this book is for

If you are an AI researcher, deep learning practitioner, or someone who wants to learn reinforcement learning from scratch, this book is for you. You'll also find this reinforcement learning book useful if you want to learn about the advancements in the field. Working knowledge of Python is necessary.

Table of contents

  1. Title Page
  2. Copyright and Credits
    1. Reinforcement Learning Algorithms with Python
  3. Dedication
  4. About Packt
    1. Why subscribe?
  5. Contributors
    1. About the author
    2. About the reviewer
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. Section 1: Algorithms and Environments
  8. The Landscape of Reinforcement Learning
    1. An introduction to RL
      1. Comparing RL and supervised learning
      2. History of RL
      3. Deep RL
    2. Elements of RL
      1. Policy
      2. The value function
      3. Reward
      4. Model
    3. Applications of RL
      1. Games
      2. Robotics and Industry 4.0
      3. Machine learning
      4. Economics and finance
      5. Healthcare
      6. Intelligent transportation systems
      7. Energy optimization and smart grid
    4. Summary
    5. Questions
    6. Further reading
  9. Implementing RL Cycle and OpenAI Gym
    1. Setting up the environment
      1. Installing OpenAI Gym
      2. Installing Roboschool
    2. OpenAI Gym and RL cycles
      1. Developing an RL cycle
      2. Getting used to spaces
    3. Development of ML models using TensorFlow
      1. Tensor
        1. Constant
        2. Placeholder
        3. Variable
      2. Creating a graph
      3. Simple linear regression example
    4. Introducing TensorBoard
    5. Types of RL environments
      1. Why different environments?
      2. Open source environments
    6. Summary
    7. Questions
    8. Further reading
  10. Solving Problems with Dynamic Programming
    1. MDP
      1. Policy
      2. Return
      3. Value functions
      4. Bellman equation
    2. Categorizing RL algorithms
      1. Model-free algorithms
        1. Value-based algorithms
        2. Policy gradient algorithms
          1. Actor-Critic algorithms
        3. Hybrid algorithms
      2. Model-based RL
      3. Algorithm diversity
    3. Dynamic programming
      1. Policy evaluation and policy improvement
      2. Policy iteration
        1. Policy iteration applied to FrozenLake
      3. Value iteration
        1. Value iteration applied to FrozenLake
    4. Summary
    5. Questions
    6. Further reading
  11. Section 2: Model-Free RL Algorithms
  12. Q-Learning and SARSA Applications
    1. Learning without a model
      1. User experience
      2. Policy evaluation
      3. The exploration problem
        1. Why explore?
        2. How to explore
    2. TD learning
      1. TD update
      2. Policy improvement
      3. Comparing Monte Carlo and TD
    3. SARSA
      1. The algorithm
    4. Applying SARSA to Taxi-v2
    5. Q-learning
      1. Theory
      2. The algorithm
    6. Applying Q-learning to Taxi-v2
      1. Comparing SARSA and Q-learning
    7. Summary
    8. Questions
  13. Deep Q-Network
    1. Deep neural networks and Q-learning
      1. Function approximation
      2. Q-learning with neural networks
      3. Deep Q-learning instabilities
    2. DQN
      1. The solution
        1. Replay memory
        2. The target network
      2. The DQN algorithm
        1. The loss function
        2. Pseudocode
      3. Model architecture
    3. DQN applied to Pong
      1. Atari games
      2. Preprocessing
      3. DQN implementation
        1. DNNs
        2. The experience buffer
        3. The computational graph and training loop
      4. Results
    4. DQN variations
      1. Double DQN
        1. DDQN implementation
        2. Results
      2. Dueling DQN
        1. Dueling DQN implementation
        2. Results
      3. N-step DQN
        1. Implementation
        2. Results
    5. Summary
    6. Questions
    7. Further reading
  14. Learning Stochastic and PG Optimization
    1. Policy gradient methods
      1. The gradient of the policy
      2. Policy gradient theorem
      3. Computing the gradient
      4. The policy
      5. On-policy PG
    2. Understanding the REINFORCE algorithm
      1. Implementing REINFORCE
      2. Landing a spacecraft using REINFORCE
        1. Analyzing the results
    3. REINFORCE with baseline
      1. Implementing REINFORCE with baseline
    4. Learning the AC algorithm
      1. Using a critic to help an actor to learn
      2. The n-step AC model
      3. The AC implementation
      4. Landing a spacecraft using AC
      5. Advanced AC, and tips and tricks
    5. Summary
    6. Questions
    7. Further reading
  15. TRPO and PPO Implementation
    1. Roboschool
      1. Control a continuous system
    2. Natural policy gradient
      1. Intuition behind NPG
      2. A bit of math
        1. FIM and KL divergence
      3. Natural gradient complications
    3. Trust region policy optimization
      1. The TRPO algorithm
      2. Implementation of the TRPO algorithm
      3. Application of TRPO
    4. Proximal Policy Optimization
      1. A quick overview
      2. The PPO algorithm
      3. Implementation of PPO
      4. PPO application
    5. Summary
    6. Questions
    7. Further reading
  16. DDPG and TD3 Applications
    1. Combining policy gradient optimization with Q-learning
      1. Deterministic policy gradient
    2. Deep deterministic policy gradient
      1. The DDPG algorithm
      2. DDPG implementation
      3. Applying DDPG to BipedalWalker-v2
    3. Twin delayed deep deterministic policy gradient (TD3)
      1. Addressing overestimation bias
        1. Implementation of TD3
      2. Addressing variance reduction
        1. Delayed policy updates
        2. Target regularization
      3. Applying TD3 to BipedalWalker
    4. Summary
    5. Questions
    6. Further reading
  17. Section 3: Beyond Model-Free Algorithms and Improvements
  18. Model-Based RL
    1. Model-based methods
      1. A broad perspective on model-based learning
        1. A known model
        2. Unknown model
      2. Advantages and disadvantages
    2. Combining model-based with model-free learning
      1. A useful combination
      2. Building a model from images
    3. ME-TRPO applied to an inverted pendulum
      1. Understanding ME-TRPO
      2. Implementing ME-TRPO
      3. Experimenting with Roboschool
        1. Results on RoboschoolInvertedPendulum
    4. Summary
    5. Questions
    6. Further reading
  19. Imitation Learning with the DAgger Algorithm
    1. Technical requirements
      1. Installation of Flappy Bird
    2. The imitation approach
      1. The driving assistant example
      2. Comparing IL and RL
      3. The role of the expert in imitation learning
      4. The IL structure
        1. Comparing active with passive imitation
    3. Playing Flappy Bird
      1. How to use the environment
    4. Understanding the dataset aggregation algorithm
      1. The DAgger algorithm
      2. Implementation of DAgger
        1. Loading the expert inference model
        2. Creating the learner's computational graph
        3. Creating a DAgger loop
      3. Analyzing the results on Flappy Bird
    5. IRL
    6. Summary
    7. Questions
    8. Further reading
  20. Understanding Black-Box Optimization Algorithms
    1. Beyond RL
      1. A brief recap of RL
      2. The alternative
        1. EAs
    2. The core of EAs
      1. Genetic algorithms
      2. Evolution strategies
        1. CMA-ES
        2. ES versus RL
    3. Scalable evolution strategies
      1. The core
        1. Parallelizing ES
        2. Other tricks
        3. Pseudocode
      2. Scalable implementation
        1. The main function
        2. Workers
    4. Applying scalable ES to LunarLander
    5. Summary
    6. Questions
    7. Further reading
  21. Developing the ESBAS Algorithm
    1. Exploration versus exploitation
      1. Multi-armed bandit
    2. Approaches to exploration
      1. The ε-greedy strategy
      2. The UCB algorithm
        1. UCB1
      3. Exploration complexity
    3. Epochal stochastic bandit algorithm selection
      1. Unboxing algorithm selection
      2. Under the hood of ESBAS
      3. Implementation
      4. Solving Acrobot
        1. Results
    4. Summary
    5. Questions
    6. Further reading
  22. Practical Implementation for Resolving RL Challenges
    1. Best practices of deep RL
      1. Choosing the appropriate algorithm
      2. From zero to one
    2. Challenges in deep RL
      1. Stability and reproducibility
      2. Efficiency
      3. Generalization
    3. Advanced techniques
      1. Unsupervised RL
        1. Intrinsic reward
      2. Transfer learning
        1. Types of transfer learning
          1. 1-task learning
          2. Multi-task learning
    4. RL in the real world
      1. Facing real-world challenges
      2. Bridging the gap between simulation and the real world
      3. Creating your own environment
    5. Future of RL and its impact on society
    6. Summary
    7. Questions
    8. Further reading
  23. Assessments
  24. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Reinforcement Learning Algorithms with Python
  • Author(s): Andrea Lonza
  • Release date: October 2019
  • Publisher(s): Packt Publishing
  • ISBN: 9781789131116