Book description
If you've ever wondered how Linux carries out the complicated tasks assigned to it by the IP protocols -- or if you just want to learn about modern networking through real-life examples -- Understanding Linux Network Internals is for you.
Like the popular O'Reilly book Understanding the Linux Kernel, this book clearly explains the underlying concepts and teaches you how to follow the actual C code that implements them. Although some background in the TCP/IP protocols is helpful, you can learn a great deal from this text about the protocols themselves and their uses. And if you already have a base knowledge of C, you can use the book's code walkthroughs to figure out exactly what this sophisticated part of the Linux kernel is doing.
Part of the difficulty in understanding networks -- and implementing them -- is that the tasks are broken up and performed at many different times by different pieces of code. One of this book's strengths is that it integrates the pieces and reveals the relationships between far-flung functions and data structures. Understanding Linux Network Internals is both a big-picture discussion and a no-nonsense guide to the details of Linux networking. Topics include:
- Key problems with networking
- Network interface card (NIC) device drivers
- System initialization
- Layer 2 (link-layer) tasks and implementation
- Layer 3 (IPv4) tasks and implementation
- Neighbor infrastructure and protocols (ARP)
- Bridging
- Routing
- ICMP
Author Christian Benvenuti, an operating system designer specializing in networking, explains much more than how Linux code works. He shows the purposes of major networking features and the trade-offs involved in choosing one solution over another. Numerous flowcharts and other diagrams make the material easier to follow.
Table of contents
- Preface
- I. General Background
- 1. Introduction
- 1.1. Basic Terminology
- 1.2. Common Coding Patterns
- 1.2.1. Memory Caches
- 1.2.2. Caching and Hash Tables
- 1.2.3. Reference Counts
- 1.2.4. Garbage Collection
- 1.2.5. Function Pointers and Virtual Function Tables (VFTs)
- 1.2.6. goto Statements
- 1.2.7. Vector Definitions
- 1.2.8. Conditional Directives (#ifdef and family)
- 1.2.9. Compile-Time Optimization for Condition Checks
- 1.2.10. Mutual Exclusion
- 1.2.11. Conversions Between Host and Network Order
- 1.2.12. Catching Bugs
- 1.2.13. Statistics
- 1.2.14. Measuring Time
- 1.3. User-Space Tools
- 1.4. Browsing the Source Code
- 1.5. When a Feature Is Offered as a Patch
- 2. Critical Data Structures
- 2.1. The Socket Buffer: sk_buff Structure
- 2.1.1. Networking Options and Kernel Structures
- 2.1.2. Layout Fields
- 2.1.3. General Fields
- 2.1.4. Feature-Specific Fields
- 2.1.5. Management Functions
- 2.1.5.1. Allocating memory: alloc_skb and dev_alloc_skb
- 2.1.5.2. Freeing memory: kfree_skb and dev_kfree_skb
- 2.1.5.3. Data reservation and alignment: skb_reserve, skb_put, skb_push, and skb_pull
- 2.1.5.4. The skb_shared_info structure and the skb_shinfo function
- 2.1.5.5. Cloning and copying buffers
- 2.1.5.6. List management functions
- 2.2. net_device Structure
- 2.3. Files Mentioned in This Chapter
- 3. User-Space-to-Kernel Interface
- II. System Initialization
- 4. Notification Chains
- 4.1. Reasons for Notification Chains
- 4.2. Overview
- 4.3. Defining a Chain
- 4.4. Registering with a Chain
- 4.5. Notifying Events on a Chain
- 4.6. Notification Chains for the Networking Subsystems
- 4.7. Tuning via /proc Filesystem
- 4.8. Functions and Variables Featured in This Chapter
- 4.9. Files and Directories Featured in This Chapter
- 5. Network Device Initialization
- 5.1. System Initialization Overview
- 5.2. Device Registration and Initialization
- 5.3. Basic Goals of NIC Initialization
- 5.4. Interaction Between Devices and Kernel
- 5.5. Initialization Options
- 5.6. Module Options
- 5.7. Initializing the Device Handling Layer: net_dev_init
- 5.8. User-Space Helpers
- 5.9. Virtual Devices
- 5.10. Tuning via /proc Filesystem
- 5.11. Functions and Variables Featured in This Chapter
- 5.12. Files and Directories Featured in This Chapter
- 6. The PCI Layer and Network Interface Cards
- 6.1. Data Structures Featured in This Chapter
- 6.2. Registering a PCI NIC Device Driver
- 6.3. Power Management and Wake-on-LAN
- 6.4. Example of PCI NIC Driver Registration
- 6.5. The Big Picture
- 6.6. Tuning via /proc Filesystem
- 6.7. Functions and Variables Featured in This Chapter
- 6.8. Files and Directories Featured in This Chapter
- 7. Kernel Infrastructure for Component Initialization
- 7.1. Boot-Time Kernel Options
- 7.2. Module Initialization Code
- 7.3. Optimized Macro-Based Tagging
- 7.4. Boot-Time Initialization Routines
- 7.5. Memory Optimizations
- 7.6. Tuning via /proc Filesystem
- 7.7. Functions and Variables Featured in This Chapter
- 7.8. Files and Directories Featured in This Chapter
- 8. Device Registration and Initialization
- 8.1. When a Device Is Registered
- 8.2. When a Device Is Unregistered
- 8.3. Allocating net_device Structures
- 8.4. Skeleton of NIC Registration and Unregistration
- 8.5. Device Initialization
- 8.6. Organization of net_device Structures
- 8.7. Device State
- 8.8. Registering and Unregistering Devices
- 8.9. Device Registration
- 8.10. Device Unregistration
- 8.11. Enabling and Disabling a Network Device
- 8.12. Updating the Device Queuing Discipline State
- 8.13. Configuring Device-Related Information from User Space
- 8.14. Virtual Devices
- 8.15. Locking
- 8.16. Tuning via /proc Filesystem
- 8.17. Functions and Variables Featured in This Chapter
- 8.18. Files and Directories Featured in This Chapter
- III. Transmission and Reception
- 9. Interrupts and Network Drivers
- 9.1. Decisions and Traffic Direction
- 9.2. Notifying Drivers When Frames Are Received
- 9.3. Interrupt Handlers
- 9.3.1. Reasons for Bottom Half Handlers
- 9.3.2. Bottom Halves Solutions
- 9.3.3. Concurrency and Locking
- 9.3.4. Preemption
- 9.3.5. Bottom-Half Handlers
- 9.3.6. Tasklets
- 9.3.7. Softirq Initialization
- 9.3.8. Pending softirq Handling
- 9.3.9. Per-Architecture Processing of softirq
- 9.3.10. ksoftirqd Kernel Threads
- 9.3.11. Tasklet Processing
- 9.3.12. How the Networking Code Uses softirqs
- 9.4. softnet_data Structure
- 10. Frame Reception
- 10.1. Interactions with Other Features
- 10.2. Enabling and Disabling a Device
- 10.3. Queues
- 10.4. Notifying the Kernel of Frame Reception: NAPI and netif_rx
- 10.5. Old Interface Between Device Drivers and Kernel: First Part of netif_rx
- 10.6. Congestion Management
- 10.7. Processing the NET_RX_SOFTIRQ: net_rx_action
- 11. Frame Transmission
- 12. General and Reference Material About Interrupts
- 13. Protocol Handlers
- 13.1. Overview of Network Stack
- 13.2. Executing the Right Protocol Handler
- 13.3. Protocol Handler Organization
- 13.4. Protocol Handler Registration
- 13.5. Ethernet Versus IEEE 802.3 Frames
- 13.6. Tuning via /proc Filesystem
- 13.7. Functions and Variables Featured in This Chapter
- 13.8. Files and Directories Featured in This Chapter
- IV. Bridging
- 14. Bridging: Concepts
- 15. Bridging: The Spanning Tree Protocol
- 15.1. Basic Terminology
- 15.2. Example of Hierarchical Switched L2 Topology
- 15.3. Basic Elements of the Spanning Tree Protocol
- 15.4. Bridge and Port IDs
- 15.5. Bridge Protocol Data Units (BPDUs)
- 15.6. Defining the Active Topology
- 15.7. Timers
- 15.8. Topology Changes
- 15.9. BPDU Encapsulation
- 15.10. Transmitting Configuration BPDUs
- 15.11. Processing Ingress Frames
- 15.12. Convergence Time
- 15.13. Overview of Newer Spanning Tree Protocols
- 16. Bridging: Linux Implementation
- 16.1. Bridge Device Abstraction
- 16.2. Important Data Structures
- 16.3. Initialization of Bridging Code
- 16.4. Creating Bridge Devices and Bridge Ports
- 16.5. Creating a New Bridge Device
- 16.6. Bridge Device Setup Routine
- 16.7. Deleting a Bridge
- 16.8. Adding Ports to a Bridge
- 16.9. Enabling and Disabling a Bridge Device
- 16.10. Enabling and Disabling a Bridge Port
- 16.11. Changing State on a Bridge Port
- 16.12. The Big Picture
- 16.13. Forwarding Database
- 16.14. Handling Ingress Traffic
- 16.15. Transmitting on a Bridge Device
- 16.16. Spanning Tree Protocol (STP)
- 16.16.1. Key Spanning Tree Routines
- 16.16.2. Bridge IDs and Port IDs
- 16.16.3. Enabling the Spanning Tree Protocol on a Bridge Device
- 16.16.4. Processing Ingress BPDUs
- 16.16.5. Transmitting BPDUs
- 16.16.6. Configuration Updates
- 16.16.7. Root Bridge Selection
- 16.16.8. Timers
- 16.16.9. Handling Topology Changes
- 16.17. netdevice Notification Chain
- 17. Bridging: Miscellaneous Topics
- 17.1. User-Space Configuration Tools
- 17.2. Tuning via /proc Filesystem
- 17.3. Tuning via /sys Filesystem
- 17.4. Statistics
- 17.5. Data Structures Featured in This Part of the Book
- 17.6. Functions and Variables Featured in This Part of the Book
- 17.7. Files and Directories Featured in This Part of the Book
- V. Internet Protocol Version 4 (IPv4)
- 18. Internet Protocol Version 4 (IPv4): Concepts
- 18.1. IP Protocol: The Big Picture
- 18.2. IP Header
- 18.3. IP Options
- 18.4. Packet Fragmentation/Defragmentation
- 18.5. Checksums
- 19. Internet Protocol Version 4 (IPv4): Linux Foundations and Features
- 20. Internet Protocol Version 4 (IPv4): Forwarding and Local Delivery
- 21. Internet Protocol Version 4 (IPv4): Transmission
- 21.1. Key Functions That Perform Transmission
- 21.1.1. Multicast Traffic
- 21.1.2. Relevant Socket Data Structures for Local Traffic
- 21.1.3. The ip_queue_xmit Function
- 21.1.4. The ip_append_data Function
- 21.1.4.1. Basic memory allocation and buffer organization for ip_append_data
- 21.1.4.2. Memory allocation and buffer organization for ip_append_data with Scatter Gather I/O
- 21.1.4.3. Key routines for handling fragmented buffers
- 21.1.4.4. Further handling of the buffers
- 21.1.4.5. Setting the context
- 21.1.4.6. Getting ready for fragment generation
- 21.1.4.7. Copying data into the fragments: getfrag
- 21.1.4.8. Buffer allocation
- 21.1.4.9. Main loop
- 21.1.4.10. L4 checksum
- 21.1.5. The ip_append_page Function
- 21.1.6. The ip_push_pending_frames Function
- 21.1.7. Putting Together the Transmission Functions
- 21.1.8. Raw Sockets
- 21.2. Interface to the Neighboring Subsystem
- 22. Internet Protocol Version 4 (IPv4): Handling Fragmentation
- 22.1. IP Fragmentation
- 22.2. IP Defragmentation
- 22.2.1. Organization of the IP Fragments Hash Table
- 22.2.2. Key Issues in Defragmentation
- 22.2.3. Functions Involved with Defragmentation
- 22.2.4. New ipq Instance Initialization
- 22.2.5. The ip_defrag Function
- 22.2.6. The ip_frag_queue Function
- 22.2.7. Garbage Collection
- 22.2.8. Hash Table Reorganization
- 23. Internet Protocol Version 4 (IPv4): Miscellaneous Topics
- 23.1. Long-Living IP Peer Information
- 23.2. Selecting the IP Header’s ID Field
- 23.3. IP Statistics
- 23.4. IP Configuration
- 23.5. IP-over-IP
- 23.6. IPv4: What’s Wrong with It?
- 23.7. Tuning via /proc Filesystem
- 23.8. Data Structures Featured in This Part of the Book
- 23.8.1. iphdr Structure
- 23.8.2. ip_options Structure
- 23.8.3. ipcm_cookie Structure
- 23.8.4. ipq Structure
- 23.8.5. inet_peer Structure
- 23.8.6. ipstats_mib Structure
- 23.8.7. in_device Structure
- 23.8.8. in_ifaddr Structure
- 23.8.9. ipv4_devconf Structure
- 23.8.10. ipv4_config Structure
- 23.8.11. cork Structure
- 23.8.12. skb_frag_t Structure
- 23.9. Functions and Variables Featured in This Part of the Book
- 23.10. Files and Directories Featured in This Part of the Book
- 24. Layer Four Protocol and Raw IP Handling
- 25. Internet Control Message Protocol (ICMPv4)
- 25.1. ICMP Header
- 25.2. ICMP Payload
- 25.3. ICMP Types
- 25.3.1. ICMP_ECHO and ICMP_ECHOREPLY
- 25.3.2. ICMP_DEST_UNREACH
- 25.3.3. ICMP_SOURCE_QUENCH
- 25.3.4. ICMP_REDIRECT
- 25.3.5. ICMP_TIME_EXCEEDED
- 25.3.6. ICMP_PARAMETERPROB
- 25.3.7. ICMP_TIMESTAMP and ICMP_TIMESTAMPREPLY
- 25.3.8. ICMP_INFO_REQUEST and ICMP_INFO_REPLY
- 25.3.9. ICMP_ADDRESS and ICMP_ADDRESSREPLY
- 25.4. Applications of the ICMP Protocol
- 25.5. The Big Picture
- 25.6. Protocol Initialization
- 25.7. Data Structures Featured in This Chapter
- 25.8. Transmitting ICMP Messages
- 25.8.1. Transmitting ICMP Error Messages
- 25.8.2. Replying to Ingress ICMP Messages
- 25.8.3. Rate Limiting
- 25.8.4. Implementation of Rate Limiting
- 25.8.5. Receiving ICMP Messages
- 25.8.6. Processing ICMP_ECHO and ICMP_ECHOREPLY Messages
- 25.8.7. Processing the Common ICMP Messages
- 25.8.8. Processing ICMP_REDIRECT Messages
- 25.8.9. Processing ICMP_TIMESTAMP and ICMP_TIMESTAMPREPLY Messages
- 25.8.10. Processing ICMP_ADDRESS and ICMP_ADDRESSREPLY Messages
- 25.9. ICMP Statistics
- 25.10. Passing Error Notifications to the Transport Layer
- 25.11. Tuning via /proc Filesystem
- 25.12. Functions and Variables Featured in This Chapter
- 25.13. Files and Directories Featured in This Chapter
- VI. Neighboring Subsystem
- 26. Neighboring Subsystem: Concepts
- 27. Neighboring Subsystem: Infrastructure
- 27.1. Main Data Structures
- 27.2. Common Interface Between L3 Protocols and Neighboring Protocols
- 27.3. General Tasks of the Neighboring Infrastructure
- 27.4. Reference Counts on neighbour Structures
- 27.5. Creating a neighbour Entry
- 27.6. Neighbor Deletion
- 27.7. Acting As a Proxy
- 27.8. L2 Header Caching
- 27.9. Protocol Initialization and Cleanup
- 27.10. Interaction with Other Subsystems
- 27.11. Interaction Between Neighboring Protocols and L3 Transmission Functions
- 27.12. Queuing
- 28. Neighboring Subsystem: Address Resolution Protocol (ARP)
- 28.1. ARP Packet Format
- 28.2. Example of an ARP Transaction
- 28.3. Gratuitous ARP
- 28.4. Responding from Multiple Interfaces
- 28.5. Tunable ARP Options
- 28.6. ARP Protocol Initialization
- 28.7. Initialization of a neighbour Structure
- 28.8. Transmitting and Receiving ARP Packets
- 28.9. Processing Ingress ARP Packets
- 28.10. Proxy ARP
- 28.11. Examples
- 28.12. External Events
- 28.13. ARPD
- 28.14. Reverse Address Resolution Protocol (RARP)
- 28.15. Improvements in ND (IPv6) over ARP (IPv4)
- 29. Neighboring Subsystem: Miscellaneous Topics
- VII. Routing
- 30. Routing: Concepts
- 31. Routing: Advanced
- 32. Routing: Linux Implementation
- 33. Routing: The Routing Cache
- 34. Routing: Routing Tables
- 34.1. Organization of Routing Hash Tables
- 34.2. Routing Table Initialization
- 34.3. Adding and Removing Routes
- 34.4. Policy Routing and Its Effects on Routing Table Definitions
- 35. Routing: Lookups
- 35.1. High-Level View of Lookup Functions
- 35.2. Helper Routines
- 35.3. The Table Lookup: fn_hash_lookup
- 35.4. fib_lookup Function
- 35.5. Setting Functions for Reception and Transmission
- 35.6. General Structure of the Input and Output Routing Routines
- 35.7. Input Routing
- 35.8. Output Routing
- 35.9. Effects of Multipath on Next Hop Selection
- 35.10. Policy Routing
- 35.11. Source Routing
- 35.12. Policy Routing and Routing Table Based Classifier
- 36. Routing: Miscellaneous Topics
- 36.1. User-Space Configuration Tools
- 36.2. Statistics
- 36.3. Tuning via /proc Filesystem
- 36.4. Enabling and Disabling Forwarding
- 36.5. Data Structures Featured in This Part of the Book
- 36.5.1. fib_table Structure
- 36.5.2. fn_zone Structure
- 36.5.3. fib_node Structure
- 36.5.4. fib_alias Structure
- 36.5.5. fib_info Structure
- 36.5.6. fib_nh Structure
- 36.5.7. fib_rule Structure
- 36.5.8. fib_result Structure
- 36.5.9. rtable Structure
- 36.5.10. dst_entry Structure
- 36.5.11. dst_ops Structure
- 36.5.12. flowi Structure
- 36.5.13. rt_cache_stat Structure
- 36.5.14. ip_mp_alg_ops Structure
- 36.6. Functions and Variables Featured in This Part of the Book
- 36.7. Files and Directories Featured in This Part of the Book
- About the Author
- Colophon
- Copyright
Product information
- Title: Understanding Linux Network Internals
- Author(s): Christian Benvenuti
- Release date: December 2005
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9780596002558