Understanding Software Dynamics

Book description

An Expert Guide to Software Performance Optimization

From mobile and cloud apps to video games to driverless vehicle control, more and more software is time-constrained: It must deliver reliable results seamlessly, consistently, and virtually instantaneously. If it doesn't, customers are unhappy--and sometimes lives are put at risk. When complex software underperforms or fails, software engineers need to identify and address the root causes. This is difficult and, historically, few tools have been available to help.

In Understanding Software Dynamics, performance expert Richard L. Sites tackles the problem head on, offering expert methods and advanced tools for understanding complex, time-constrained software dynamics, improving reliability and troubleshooting challenging performance problems.

Sites draws on several decades of experience pioneering software performance optimization, as well as extensive experience teaching graduate-level developers. He introduces principles and techniques for use in any environment, from embedded devices to datacenters, illuminating them with examples based on x86 or ARM processors running Linux and linked by Ethernet. He also guides readers through building and applying a powerful, new, extremely low-overhead open-source software tool, KUtrace, to precisely trace executions on every CPU core. Using insights gleaned from this tool, readers can apply nuanced solutions--not merely brute-force techniques such as turning off caches or cores.

  • Measure and address issues associated with CPUs, memory, disk/SSD, networks, and their interactions

  • Fix programs that are always too slow, and those that sometimes lag for no apparent reason

  • Design useful observability, logging, and time-stamping capabilities into your code

  • Reason more effectively about performance data to see why reality differs from expectations

  • Identify problems such as excess execution, slow instruction execution, waiting for resources, and software locks

Understanding Software Dynamics will be valuable to experienced software professionals, including application and OS developers, hardware and system architects, real-time system designers, and game developers, as well as advanced students.

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

Table of contents

  1. Cover Page
  2. About This eBook
  3. Halftitle Page
  4. Title Page
  5. Copyright Page
  6. Dedication
  7. Contents at a Glance
  8. Contents
  9. Foreword
  10. Preface
  11. Acknowledgments
  12. About the Author
  13. Part I: Measurement
    1. Chapter 1. My Program Is Too Slow
      1. 1.1 Datacenter Context
      2. 1.2 Datacenter Hardware
      3. 1.3 Datacenter Software
      4. 1.4 Long-Tail Latency
      5. 1.5 Thought Framework
      6. 1.6 Order-of-Magnitude Estimates
      7. 1.7 Why Are Transactions Slow?
      8. 1.8 The Five Fundamental Resources
      9. 1.9 Summary
    2. Chapter 2. Measuring CPUs
      1. 2.1 How We Got Here
      2. 2.2 Where Are We Now?
      3. 2.3 Measuring the Latency of an add Instruction
      4. 2.4 Straight-Line Code Fail
      5. 2.5 Simple Loop, Loop Overhead Fail, Optimizing Compiler Fail
      6. 2.6 Dead Variable Fail
      7. 2.7 Better Loop
      8. 2.8 Dependent Variables
      9. 2.9 Actual Execution Latency
      10. 2.10 More Nuance
      11. 2.11 Summary
      12. Exercises
    3. Chapter 3. Measuring Memory
      1. 3.1 Memory Timing
      2. 3.2 About Memory
      3. 3.3 Cache Organization
      4. 3.4 Data Alignment
      5. 3.5 Translation Lookaside Buffer Organization
      6. 3.6 The Measurements
      7. 3.7 Measuring Cache Line Size
      8. 3.8 Problem: N+1 Prefetching
      9. 3.9 Dependent Loads
      10. 3.10 Non-random Dynamic Random-Access Memory
      11. 3.11 Measuring Total Size of Each Cache Level
      12. 3.12 Measuring Cache Associativity of Each Level
      13. 3.13 Translation Buffer Time
      14. 3.14 Cache Underutilization
      15. 3.15 Summary
      16. Exercises
    4. Chapter 4. CPU and Memory Interaction
      1. 4.1 Cache Interaction
      2. 4.2 Simple Matrix Multiply Dynamics
      3. 4.3 Estimates
      4. 4.4 Initialization, Cross-Checking, and Observing
      5. 4.5 Initial Results
      6. 4.6 Faster Matrix Multiply, Transpose Method
      7. 4.7 Faster Matrix Multiply, Subblock Method
      8. 4.8 Cache-Aware Computation
      9. 4.9 Summary
      10. Exercises
    5. Chapter 5. Measuring Disk/SSD
      1. 5.1 About Hard Disks
      2. 5.2 About SSDs
      3. 5.3 Software Disk Access and On-Disk Buffering
      4. 5.4 How Fast Is a Disk Read?
      5. 5.5 A Little Back-of-the-Envelope Calculation
      6. 5.6 How Fast Is a Disk Write?
      7. 5.7 Results
      8. 5.8 Reading from Disk
      9. 5.9 Writing to Disk
      10. 5.10 Reading from SSD
      11. 5.11 Writing to SSD
      12. 5.12 Multiple Transfers
      13. 5.13 Summary
      14. Exercises
    6. Chapter 6. Measuring Networks
      1. 6.1 About Ethernet
      2. 6.2 About Hubs, Switches, and Routers
      3. 6.3 About TCP/IP
      4. 6.4 About Packets
      5. 6.5 About Remote Procedure Calls (RPCs)
      6. 6.6 Slop
      7. 6.7 Observing Network Traffic
      8. 6.8 Sample RPC Message Definition
      9. 6.9 Sample Logging Design
      10. 6.10 Sample Client-Server System Using RPCs
      11. 6.11 Sample Server Program
      12. 6.12 Spinlocks
      13. 6.13 Sample Client Program
      14. 6.14 Measuring One Sample Client-Server RPC
      15. 6.15 Postprocessing RPC Logs
      16. 6.16 Observations
      17. 6.17 Summary
      18. Exercises
    7. Chapter 7. Disk and Network Database Interaction
      1. 7.1 Time Alignment
      2. 7.2 Multiple Clients
      3. 7.3 Spinlocks
      4. 7.4 Experiment 1
      5. 7.5 On-Disk Database
      6. 7.6 Experiment 2
      7. 7.7 Experiment 3
      8. 7.8 Logging
      9. 7.9 Understanding Transaction Latency Variation
      10. 7.10 Summary
      11. Exercises
  14. Part II: Observation
    1. Chapter 8. Logging
      1. 8.1 Observation Tools
      2. 8.2 Logging
      3. 8.3 Basic Logging
      4. 8.4 Extended Logging
      5. 8.5 Timestamps
      6. 8.6 RPC IDs
      7. 8.7 Log File Formats
      8. 8.8 Managing Log Files
      9. 8.9 Summary
    2. Chapter 9. Aggregate Measures
      1. 9.1 Uniform vs. Bursty Event Rates
      2. 9.2 Measurement Intervals
      3. 9.3 Timelines
      4. 9.4 Further Summarizing of Timelines
      5. 9.5 Histogram Time Scales
      6. 9.6 Aggregating Per-Event Measurements
      7. 9.7 Patterns of Values Over Time
      8. 9.8 Update Intervals
      9. 9.9 Example Transactions
      10. 9.10 Conclusion
    3. Chapter 10. Dashboards
      1. 10.1 Sample Service
      2. 10.2 Sample Dashboards
      3. 10.3 Master Dashboard
      4. 10.4 Per-Instance Dashboards
      5. 10.5 Per-Server Dashboards
      6. 10.6 Sanity Checks
      7. 10.7 Summary
      8. Exercises
    4. Chapter 11. Other Existing Tools
      1. 11.1 Kinds of Observation Tools
      2. 11.2 Data to Observe
      3. 11.3 top Command
      4. 11.4 /proc and /sys Pseudofiles
      5. 11.5 time Command
      6. 11.6 perf Command
      7. 11.7 oprofile, CPU Profiler
      8. 11.8 strace, System Calls
      9. 11.9 ltrace, CPU C Library Calls
      10. 11.10 ftrace, CPU Trace
      11. 11.11 mtrace, Memory Malloc/Free
      12. 11.12 blktrace, Disk Trace
      13. 11.13 tcpdump and Wireshark, Network Trace
      14. 11.14 locktrace, Critical Section Locks
      15. 11.15 Offered Load, Outbound Calls, and Transaction Latency
      16. 11.16 Summary
      17. Exercises
    5. Chapter 12. Traces
      1. 12.1 Tracing Advantages
      2. 12.2 Tracing Disadvantages
      3. 12.3 The Three Starting Questions
      4. 12.4 Example: Early Program Counter Trace
      5. 12.5 Example: Per-Function Counts and Time
      6. 12.6 Case Study: Per-Function Trace of Gmail
      7. 12.7 Summary
    6. Chapter 13. Observation Tool Design Principles
      1. 13.1 What to Observe
      2. 13.2 How Frequently and For How Long?
      3. 13.3 How Much Overhead?
      4. 13.4 Design Consequences
      5. 13.5 Case Study: Histogram Buckets
      6. 13.6 Designing Data Display
      7. 13.7 Summary
  15. Part III: Kernel-User Trace
    1. Chapter 14. KUtrace: Goals, Design, Implementation
      1. 14.1 Overview
      2. 14.2 Goals
      3. 14.3 Design
      4. 14.4 Implementation
      5. 14.5 Kernel Patches and Module
      6. 14.6 Control Program
      7. 14.7 Postprocessing
      8. 14.8 A Note on Security
      9. 14.9 Summary
    2. Chapter 15. KUtrace: Linux Kernel Patches
      1. 15.1 Trace Buffer Data Structures
      2. 15.2 Raw Traceblock Format
      3. 15.3 Trace Entries
      4. 15.4 IPC Trace Entries
      5. 15.5 Timestamps
      6. 15.6 Event Numbers
      7. 15.7 Nested Trace Entries
      8. 15.8 Code
      9. 15.9 Packet Tracing
      10. 15.10 AMD/Intel x86-64 Patches
      11. 15.11 Summary
      12. Exercises
    3. Chapter 16. KUtrace: Linux Loadable Module
      1. 16.1 Kernel Interface Data Structures
      2. 16.2 Module Load/Unload
      3. 16.3 Initializing and Controlling Tracing
      4. 16.4 Implementing Trace Calls
      5. 16.5 Insert1
      6. 16.6 InsertN
      7. 16.7 Switching to a New Traceblock
      8. 16.8 Summary
    4. Chapter 17. KUtrace: User-Mode Runtime Control
      1. 17.1 Controlling Tracing
      2. 17.2 Standalone kutrace_control Program
      3. 17.3 The Underlying kutrace_lib Library
      4. 17.4 The Control Interface to the Loadable Module
      5. 17.5 Summary
    5. Chapter 18. KUtrace: Postprocessing
      1. 18.1 Postprocessing Details
      2. 18.2 The rawtoevent Program
      3. 18.3 The eventtospan Program
      4. 18.4 The spantotrim Program
      5. 18.5 The spantospan Program
      6. 18.6 The samptoname_k and samptoname_u Programs
      7. 18.7 The makeself Program
      8. 18.8 KUtrace JSON Format
      9. 18.9 Summary
    6. Chapter 19. KUtrace: Display of Software Dynamics
      1. 19.1 Overview
      2. 19.2 Region 1, Controls
      3. 19.3 Region 2, Y-axis
      4. 19.4 Region 3, Timelines
      5. 19.5 Region 4, IPC Legend
      6. 19.6 Region 5, X-axis
      7. 19.7 Region 6, Save/Restore
      8. 19.8 Secondary Controls
      9. 19.9 Summary
  16. Part IV: Reasoning
    1. Chapter 20. What to Look For
      1. 20.1 Overview
    2. Chapter 21. Executing Too Much
      1. 21.1 Overview
      2. 21.2 The Program
      3. 21.3 The Mystery
      4. 21.4 Exploring and Reasoning
      5. 21.5 Mystery Understood
      6. 21.6 Summary
    3. Chapter 22. Executing Slowly
      1. 22.1 Overview
      2. 22.2 The Program
      3. 22.3 The Mystery
      4. 22.4 Floating-Point Antagonist
      5. 22.5 Memory Antagonist
      6. 22.6 Mystery Understood
      7. 22.7 Summary
    4. Chapter 23. Waiting for CPU
      1. 23.1 The Program
      2. 23.2 The Mystery
      3. 23.3 Exploring and Reasoning
      4. 23.4 Mystery 2
      5. 23.5 Mystery 2 Understood
      6. 23.6 Bonus Mystery
      7. 23.7 Summary
      8. Exercises
    5. Chapter 24. Waiting for Memory
      1. 24.1 The Program
      2. 24.2 The Mystery
      3. 24.3 Exploring and Reasoning
      4. 24.4 Mystery 2: Access to a Page Table
      5. 24.5 Mystery 2 Understood
      6. 24.6 Summary
      7. Exercises
    6. Chapter 25. Waiting for Disk
      1. 25.1 The Program
      2. 25.2 The Mystery
      3. 25.3 Exploring and Reasoning
      4. 25.4 Reading 40MB
      5. 25.5 Reading Sequential 4KB Blocks
      6. 25.6 Reading Random 4KB Blocks
      7. 25.7 Writing and Sync of 40MB on SSD
      8. 25.8 Reading 40MB on SSD
      9. 25.9 Two Programs Accessing Two Files at Once
      10. 25.10 Mysteries Understood
      11. 25.11 Summary
      12. Exercises
    7. Chapter 26. Waiting for Network
      1. 26.1 Overview
      2. 26.2 The Programs
      3. 26.3 Experiment 1
      4. 26.4 Experiment 1 Mystery
      5. 26.5 Experiment 1 Exploring and Reasoning
      6. 26.6 Experiment 1 What About the Time Between RPCs?
      7. 26.7 Experiment 2
      8. 26.8 Experiment 3
      9. 26.9 Experiment 4
      10. 26.10 Mysteries Understood
      11. 26.11 Bonus Anomaly
      12. 26.12 Summary
    8. Chapter 27. Waiting for Locks
      1. 27.1 Overview
      2. 27.2 The Program
      3. 27.3 Experiment 1: Long Lock Hold Times
      4. 27.4 Mysteries in Experiment 1
      5. 27.5 Exploring and Reasoning in Experiment 1
      6. 27.6 Experiment 2: Fixing Lock Capture
      7. 27.7 Experiment 3: Fixing Lock Contention via Multiple Locks
      8. 27.8 Experiment 4: Fixing Lock Contention via Less Locked Work
      9. 27.9 Experiment 5: Fixing Lock Contention via RCU for Dashboard
      10. 27.10 Summary
    9. Chapter 28. Waiting for Time
      1. 28.1 Periodic Work
      2. 28.2 Timeouts
      3. 28.3 Timeslicing
      4. 28.4 Inline Execution Delays
      5. 28.5 Summary
    10. Chapter 29. Waiting for Queues
      1. 29.1 Overview
      2. 29.2 Request Distribution
      3. 29.3 Queue Structure
      4. 29.4 Worker Tasks
      5. 29.5 Primary Task
      6. 29.6 Dequeue
      7. 29.7 Enqueue
      8. 29.8 Spinlock
      9. 29.9 The “Work” Routine
      10. 29.10 Simple Examples
      11. 29.11 What Could Possibly Go Wrong?
      12. 29.12 CPU Frequency
      13. 29.13 Complex Examples
      14. 29.14 Waiting for CPUs: RPC Log
      15. 29.15 Waiting for CPUs: KUtrace
      16. 29.16 PlainSpinLock Flaw
      17. 29.17 Root Cause
      18. 29.18 PlainSpinLock Fixed: Observability
      19. 29.19 Load Balancing
      20. 29.20 Queue Depth: Observability
      21. 29.21 Spin at the End
      22. 29.22 One More Flaw
      23. 29.23 Cross-Checking
      24. 29.24 Summary
      25. Exercises
    11. Chapter 30. Recap
      1. 30.1 What You Learned
      2. 30.2 What We Haven’t Covered
      3. 30.3 Next Steps
      4. 30.4 Summary (for the Entire Book)
  17. Appendix A. Sample Servers
    1. A.1 Sample Server Hardware
    2. A.2 Connecting the Servers
  18. Appendix B. Trace Entries
    1. B.1 Fixed-Length Trace Entries
    2. B.2 Variable-Length Trace Entries
    3. B.3 Event Numbers
  19. Glossary
  20. References
  21. Index

Product information

  • Title: Understanding Software Dynamics
  • Author(s): Richard L. Sites
  • Release date: December 2021
  • Publisher(s): Addison-Wesley Professional
  • ISBN: 9780137589692