Book description
An Expert Guide to Software Performance Optimization
From mobile and cloud apps to video games to driverless vehicle control, more and more software is time-constrained: It must deliver reliable results seamlessly, consistently, and virtually instantaneously. If it doesn't, customers are unhappy, and sometimes lives are put at risk. When complex software underperforms or fails, software engineers need to identify and address the root causes. This is difficult and, historically, few tools have been available to help.
In Understanding Software Dynamics, performance expert Richard L. Sites tackles the problem head-on, offering expert methods and advanced tools for understanding complex, time-constrained software dynamics, improving reliability, and troubleshooting challenging performance problems.
Sites draws on several decades of experience pioneering software performance optimization, as well as extensive experience teaching graduate-level developers. He introduces principles and techniques for use in any environment, from embedded devices to datacenters, illuminating them with examples based on x86 or ARM processors running Linux and linked by Ethernet. He also guides readers through building and applying a powerful, new, extremely low-overhead open-source software tool, KUtrace, to precisely trace executions on every CPU core. Using insights gleaned from this tool, readers can apply nuanced solutions rather than merely brute-force techniques such as turning off caches or cores.
- Measure and address issues associated with CPUs, memory, disk/SSD, networks, and their interactions
- Fix programs that are always too slow, and those that sometimes lag for no apparent reason
- Design useful observability, logging, and time-stamping capabilities into your code
- Reason more effectively about performance data to see why reality differs from expectations
- Identify problems such as excess execution, slow instruction execution, waiting for resources, and software locks
Understanding Software Dynamics will be valuable to experienced software professionals, including application and OS developers, hardware and system architects, real-time system designers, and game developers, as well as advanced students.
Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.
Table of contents
- Cover Page
- About This eBook
- Halftitle Page
- Title Page
- Copyright Page
- Dedication
- Contents at a Glance
- Contents
- Foreword
- Preface
- Acknowledgments
- About the Author
- Part I: Measurement
- Chapter 1. My Program Is Too Slow
- Chapter 2. Measuring CPUs
- 2.1 How We Got Here
- 2.2 Where Are We Now?
- 2.3 Measuring the Latency of an add Instruction
- 2.4 Straight-Line Code Fail
- 2.5 Simple Loop, Loop Overhead Fail, Optimizing Compiler Fail
- 2.6 Dead Variable Fail
- 2.7 Better Loop
- 2.8 Dependent Variables
- 2.9 Actual Execution Latency
- 2.10 More Nuance
- 2.11 Summary
- Exercises
- Chapter 3. Measuring Memory
- 3.1 Memory Timing
- 3.2 About Memory
- 3.3 Cache Organization
- 3.4 Data Alignment
- 3.5 Translation Lookaside Buffer Organization
- 3.6 The Measurements
- 3.7 Measuring Cache Line Size
- 3.8 Problem: N+1 Prefetching
- 3.9 Dependent Loads
- 3.10 Non-random Dynamic Random-Access Memory
- 3.11 Measuring Total Size of Each Cache Level
- 3.12 Measuring Cache Associativity of Each Level
- 3.13 Translation Buffer Time
- 3.14 Cache Underutilization
- 3.15 Summary
- Exercises
- Chapter 4. CPU and Memory Interaction
- Chapter 5. Measuring Disk/SSD
- 5.1 About Hard Disks
- 5.2 About SSDs
- 5.3 Software Disk Access and On-Disk Buffering
- 5.4 How Fast Is a Disk Read?
- 5.5 A Little Back-of-the-Envelope Calculation
- 5.6 How Fast Is a Disk Write?
- 5.7 Results
- 5.8 Reading from Disk
- 5.9 Writing to Disk
- 5.10 Reading from SSD
- 5.11 Writing to SSD
- 5.12 Multiple Transfers
- 5.13 Summary
- Exercises
- Chapter 6. Measuring Networks
- 6.1 About Ethernet
- 6.2 About Hubs, Switches, and Routers
- 6.3 About TCP/IP
- 6.4 About Packets
- 6.5 About Remote Procedure Calls (RPCs)
- 6.6 Slop
- 6.7 Observing Network Traffic
- 6.8 Sample RPC Message Definition
- 6.9 Sample Logging Design
- 6.10 Sample Client-Server System Using RPCs
- 6.11 Sample Server Program
- 6.12 Spinlocks
- 6.13 Sample Client Program
- 6.14 Measuring One Sample Client-Server RPC
- 6.15 Postprocessing RPC Logs
- 6.16 Observations
- 6.17 Summary
- Exercises
- Chapter 7. Disk and Network Database Interaction
- Part II: Observation
- Chapter 8. Logging
- Chapter 9. Aggregate Measures
- Chapter 10. Dashboards
- Chapter 11. Other Existing Tools
- 11.1 Kinds of Observation Tools
- 11.2 Data to Observe
- 11.3 top Command
- 11.4 /proc and /sys Pseudofiles
- 11.5 time Command
- 11.6 perf Command
- 11.7 oprofile, CPU Profiler
- 11.8 strace, System Calls
- 11.9 ltrace, CPU C Library Calls
- 11.10 ftrace, CPU Trace
- 11.11 mtrace, Memory Malloc/Free
- 11.12 blktrace, Disk Trace
- 11.13 tcpdump and Wireshark, Network Trace
- 11.14 locktrace, Critical Section Locks
- 11.15 Offered Load, Outbound Calls, and Transaction Latency
- 11.16 Summary
- Exercises
- Chapter 12. Traces
- Chapter 13. Observation Tool Design Principles
- Part III: Kernel-User Trace
- Part IV: Reasoning
- Chapter 20. What to Look For
- Chapter 21. Executing Too Much
- Chapter 22. Executing Slowly
- Chapter 23. Waiting for CPU
- Chapter 24. Waiting for Memory
- Chapter 25. Waiting for Disk
- 25.1 The Program
- 25.2 The Mystery
- 25.3 Exploring and Reasoning
- 25.4 Reading 40MB
- 25.5 Reading Sequential 4KB Blocks
- 25.6 Reading Random 4KB Blocks
- 25.7 Writing and Sync of 40MB on SSD
- 25.8 Reading 40MB on SSD
- 25.9 Two Programs Accessing Two Files at Once
- 25.10 Mysteries Understood
- 25.11 Summary
- Exercises
- Chapter 26. Waiting for Network
- Chapter 27. Waiting for Locks
- 27.1 Overview
- 27.2 The Program
- 27.3 Experiment 1: Long Lock Hold Times
- 27.4 Mysteries in Experiment 1
- 27.5 Exploring and Reasoning in Experiment 1
- 27.6 Experiment 2: Fixing Lock Capture
- 27.7 Experiment 3: Fixing Lock Contention via Multiple Locks
- 27.8 Experiment 4: Fixing Lock Contention via Less Locked Work
- 27.9 Experiment 5: Fixing Lock Contention via RCU for Dashboard
- 27.10 Summary
- Chapter 28. Waiting for Time
- Chapter 29. Waiting for Queues
- 29.1 Overview
- 29.2 Request Distribution
- 29.3 Queue Structure
- 29.4 Worker Tasks
- 29.5 Primary Task
- 29.6 Dequeue
- 29.7 Enqueue
- 29.8 Spinlock
- 29.9 The “Work” Routine
- 29.10 Simple Examples
- 29.11 What Could Possibly Go Wrong?
- 29.12 CPU Frequency
- 29.13 Complex Examples
- 29.14 Waiting for CPUs: RPC Log
- 29.15 Waiting for CPUs: KUtrace
- 29.16 PlainSpinLock Flaw
- 29.17 Root Cause
- 29.18 PlainSpinLock Fixed: Observability
- 29.19 Load Balancing
- 29.20 Queue Depth: Observability
- 29.21 Spin at the End
- 29.22 One More Flaw
- 29.23 Cross-Checking
- 29.24 Summary
- Exercises
- Chapter 30. Recap
- Appendix A. Sample Servers
- Appendix B. Trace Entries
- Glossary
- References
- Index
Product information
- Title: Understanding Software Dynamics
- Author(s): Richard L. Sites
- Release date: December 2021
- Publisher(s): Addison-Wesley Professional
- ISBN: 9780137589692