| Abbreviation | Meaning |
|---|---|
| L2 | Link layer (e.g., Ethernet) |

The kernel provides the kmalloc and kfree functions to allocate and free a memory block, respectively. The syntax of those two functions is similar to that of the two sister calls, malloc and free, from the libc user-space library. For more details on kmalloc and kfree, please refer to Linux Device Drivers (O'Reilly).

Besides ping, iputils includes arping (used to generate ARP requests), the Network Router Discovery daemon rdisc, and others.

IPROUTE2 can be downloaded from http://linux-net.osdl.org/index.php/Iproute2, and the other packages can be downloaded from the download server of most Linux distributions.

http://lartc.org
http://www.policyrouting.org
http://www.netfilter.org

cscope is available at http://cscope.sourceforge.net. It is a simple yet powerful tool for searching, for example, where a function or variable is defined, where it is called, etc. Installing the tool is straightforward, and you can find all the necessary instructions on the web site.

struct sk_buff
struct net_device

net_device data structures are allocated.

struct sock, which stores the networking information for sockets. Because this book does not cover sockets, I have not included sock in this chapter.

skb_reserve function
(described later in this chapter) to carry it out. Thus, one of the first things done by
each protocol, as the buffer passes down through layers, is to call skb_reserve to reserve space for the protocol's
header. In the later section "Data
reservation and alignment: skb_reserve, skb_put, skb_push, and skb_pull," we will
see an example of how the kernel makes sure enough space is reserved at the head of the
buffer to allow each layer to add its own header while the buffer traverses the layers.

The net_device data structure stores all
information specifically regarding a network device. There is one such structure for each
device, both real ones (such as Ethernet NICs) and virtual ones (such as bonding or VLAN). In this section, I will use the words interface and
device interchangeably, even though the difference between them is
important in other contexts.

The net_device structures for all devices are put
into a global list to which the global variable dev_base points. The data structure is defined in include/linux/netdevice.h. The registration of network devices is described
in Chapter 8. In that chapter, you can find
details on how and when most of the net_device fields
are initialized.

Like sk_buff, this structure is quite big and
includes many feature-specific parameters, along with parameters from many different
layers. For this reason, the overall organization of the structure will probably see some
changes soon for optimization reasons.

While some fields of the net_device structure are set to the same value for all devices of the same
type, some fields must be set differently by each model of device. Thus, for almost every
type, Linux provides a general function that initializes the parameters whose values stay
the same across all models. Each device driver invokes this function in addition to
setting those fields that have unique values for its model. Drivers can also overwrite
fields that were already initialized by the kernel (for instance, to improve performance).
You can find more details in Chapter 8.

The fields of the net_device structure can be classified into the following categories:

The net_device structure includes three identifiers, not to be confused:

ioctl command, and what functions are provided by Netlink, currently the preferred interface for user-space network configuration.

sysctl system call (see
man sysctl) and the other one is procfs. When the kernel has support for procfs, it adds a special directory (/proc/sys) to /proc that includes a file for each kernel variable exported by
sysctl.

proc_mkdir. Files in /proc/net can be registered and unregistered with proc_net_fops_create and proc_net_remove, defined in include/linux/proc_fs.h. These two routines are wrappers around the generic APIs create_proc_entry and remove_proc_entry. In particular, proc_net_fops_create takes care of creating the file (with proc_net_create) and initializing its file operation handlers. Let's look at an example.

    /* File operation handlers for /proc/net/arp */
    static struct file_operations arp_seq_fops = {
        .owner   = THIS_MODULE,
        .open    = arp_seq_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = seq_release_private,
    };

    static int __init arp_proc_init(void)
    {
        /* Create the read-only file "arp" in /proc/net */
        if (!proc_net_fops_create("arp", S_IRUGO, &arp_seq_fops))
            return -ENOMEM;
        return 0;
    }

The arguments to proc_net_fops_create tell you that the filename is arp, it must be assigned read permission only, and the set of file operation handlers is arp_seq_fops.

ioctl call is issued. Let's see an example involving ifconfig.

ioctl to communicate with the kernel. For example,
when the system administrator types a command like ifconfig eth0
mtu 1250 to change the MTU of the interface eth0, ifconfig opens a socket,
initializes a local data structure with the information received from the system
administrator (data in the example), and passes it to
the kernel with an ioctl call. SIOCSIFMTU is the command identifier.

    struct ifreq data;
    fd = socket(PF_INET, SOCK_DGRAM, 0);
    < ... initialize "data" ...>
    err = ioctl(fd, SIOCSIFMTU, &data);

ioctl commands are processed by the kernel in
different places. Figure 3-4 shows how
the most common ioctl commands used by the networking
code are dispatched by sock_ioctl and routed to the
right function handler. We will not see how sock_ioctl
is invoked or how transport protocols like UDP and TCP register their handlers. If you
desire to dig into this part of the code, you can use the figure as a starting point. For
the routines that we cover in this book, the figure provides a reference to the right
chapter.
The name of the ioctl commands in the figure is
parsed (split into components) for your convenience. For example, the command used to add
a route to a routing table, SIOCADDRT, is shown as SIOC
ADD RT to emphasize the two interesting components: ADD, which says you are adding
something, and RT, which says a route is what you are adding. Most commands follow this
syntax. Often, when a given object type can be both read and written, you have one more
component in the command name: G for get or S for set. The two commands that add and
remove an IP address from an interface,

socket system call:

    int socket(int domain, int type, int protocol)

PF_INET), you can use the man socket command.

domain, type, and
protocol arguments. Netlink uses the new PF_NETLINK protocol family (domain), supports only the
SOCK_DGRAM type, and defines several protocols, each
one used for a different component (or a set of components) of the networking stack. For
example, the NETLINK_ROUTE protocol is used for most
networking features, such as routing and neighboring protocols, and NETLINK_FIREWALL is used for the firewall (Netfilter). The
Netlink protocols are listed in the NETLINK_XXX enumeration list in include/linux/netlink.h.

RTMGRP_XXX in include/linux/rtnetlink.h. Among them are the RTMGRP_IPV4_ROUTE

rtnl_sem) that ensures exclusive access to the data structures that store the networking configuration. This is true regardless of whether the configuration is applied via ioctl or Netlink.

ifconfig eth3 down) or
a hardware failure. Networks D, E, and F would become unreachable by RT (and by systems in
A, B, and C relying on RT for their connections) and should be removed from the routing
table. Who is going to tell the routing subsystem about that interface failure? A
notification chain.

    if (subsystem_X_enabled) {
        do_something_1
    }
    if (subsystem_Y_enabled) {
        do_something_2
    }
    if (subsystem_Z_enabled) {
        do_something_3
    }
    ... ... ...

notifier_block, whose definition is the following:

    struct notifier_block
    {
        int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
        struct notifier_block *next;
        int priority;
    };

notifier_call is the function to execute, next is used to link together the elements of the list, and
priority represents the priority of the function.
Functions with higher priority are executed first. But in practice, almost all
registrations leave the priority out of the notifier_block definition, which means it gets the default
value of 0 and execution order ends up depending only on the registration order (i.e., it
is a semirandom order). The return values of notifier_call are listed in the upcoming section, "Notifying Events on a Chain."

notifier_block instances are xxx_chain, xxx_notifier_chain, and xxx_notifier_list.

notifier_chain_register. The kernel also provides a set of wrappers around notifier_chain_register, some of which are shown in Table 4-1.

inetaddr_chain, inet6addr_chain, and netdev_chain.

| Operation | Function prototype |
|---|---|
| Registration | int notifier_chain_register(struct notifier_block **list, struct notifier_block *n) |
| Wrappers: | |
| inetaddr_chain | register_inetaddr_notifier |
| inet6addr_chain | register_inet6addr_notifier |
| netdev_chain | register_netdevice_notifier |
| Unregistration | int notifier_chain_unregister(struct notifier_block **nl, struct notifier_block *n) |
| Wrappers: | |
| inetaddr_chain | unregister_inetaddr_notifier |

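
To make the role of the priority field more concrete, here is a minimal user-space sketch of how registration and unregistration on a chain could work. It assumes the chain is a singly linked list kept in descending priority order, with equal priorities kept in registration order. The demo_ function names are hypothetical, and the locking that kernel code would need is left out.

```c
#include <errno.h>
#include <stddef.h>

/* Same layout as the notifier_block structure shown in the text. */
struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

/* Hypothetical model of registration: insert n before the first element
 * with a strictly lower priority, so that ties keep registration order. */
int demo_chain_register(struct notifier_block **list, struct notifier_block *n)
{
    while (*list) {
        if (n->priority > (*list)->priority)
            break;
        list = &(*list)->next;
    }
    n->next = *list;
    *list = n;
    return 0;
}

/* Hypothetical model of unregistration: unlink n if it is on the chain. */
int demo_chain_unregister(struct notifier_block **list, struct notifier_block *n)
{
    while (*list) {
        if (*list == n) {
            *list = n->next;
            return 0;
        }
        list = &(*list)->next;
    }
    return -ENOENT; /* n was never registered on this chain */
}
```

Walking a pointer to the link (struct notifier_block **) rather than a pointer to the element lets both routines treat the head of the list and the middle of the list the same way.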
notifier_call_chain, defined in kernel/sys.c. This function simply invokes, in order of priority, all the callback routines registered against the chain. Note that callback routines are executed in the context of the process that calls notifier_call_chain. A callback routine could, however, be implemented so that it queues the notification somewhere and wakes up a process that will look at it.

    int notifier_call_chain(struct notifier_block **n, unsigned long val, void *v)
    {
        int ret = NOTIFY_DONE;
        struct notifier_block *nb = *n;

        while (nb)
        {
            ret = nb->notifier_call(nb, val, v);
            if (ret & NOTIFY_STOP_MASK)
            {
                return ret;
            }
            nb = nb->next;
        }
        return ret;
    }

n
val
val unequivocally identifies an event type (e.g., NETDEV_REGISTER).

v
v to identify the net_device data structure.

notifier_call_chain can return any of the NOTIFY_XXX values defined in include/linux/notifier.h:

NOTIFY_OK
NOTIFY_DONE
NOTIFY_BAD
NOTIFY_STOP
NOTIFY_STOP_MASK is checked by notifier_call_chain to see whether to stop invoking the callback routines or keep going. Both NOTIFY_BAD and NOTIFY_STOP include this flag in their definitions.

inetaddr_chain
inet6addr_chain

netdev_chain
reboot_notifier_list chain, which is a chain that warns when the system is about to reboot.

netdev_chain:

    int register_netdevice_notifier(struct notifier_block *nb)
    {
        return notifier_chain_register(&netdev_chain, nb);
    }

[un]register_xxx_notifier, xxx_[un]register_notifier, and xxx_[un]register.

ip_fib_init, which is the initialization routine used by the routing code that is described in the section "Routing Subsystem Initialization" in Chapter 32:

    static struct notifier_block fib_inetaddr_notifier = {
        .notifier_call = fib_inetaddr_event,
    };

    static struct notifier_block fib_netdev_notifier = {
        .notifier_call = fib_netdev_event,
    };

    void __init ip_fib_init(void)
    {
        ... ... ...
        /* Subscribe the routing code to device and IPv4 address events */
        register_netdevice_notifier(&fib_netdev_notifier);
        register_inetaddr_notifier(&fib_inetaddr_notifier);
    }
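
The following user-space sketch ties the pieces together: two callbacks sit on a chain, and an event is pushed through a copy of the notifier_call_chain loop shown earlier. Everything prefixed with demo_ or DEMO_ is hypothetical, including the demo_device structure that stands in for net_device. The NOTIFY_XXX values are written out as I recall them from include/linux/notifier.h and should be treated as assumptions rather than checked against a particular kernel version.

```c
#include <stddef.h>

/* Assumed to mirror include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

/* Hypothetical event codes for this demo only. */
#define DEMO_DEV_UP   1UL
#define DEMO_DEV_DOWN 2UL

struct notifier_block {
    int (*notifier_call)(struct notifier_block *self, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};

/* Hypothetical stand-in for net_device, with counters the test can inspect. */
struct demo_device {
    const char *name;
    int routes_flushed;
    int stats_updates;
};

/* Models the routing code: reacts to "down" and stops the chain. */
int demo_fib_event(struct notifier_block *self, unsigned long event, void *ptr)
{
    struct demo_device *dev = ptr;

    (void)self;
    if (event != DEMO_DEV_DOWN)
        return NOTIFY_DONE;     /* not interested in this event */
    dev->routes_flushed++;
    return NOTIFY_STOP;         /* handled; later callbacks are skipped */
}

/* Models an unrelated subscriber that counts every event it sees. */
int demo_stats_event(struct notifier_block *self, unsigned long event, void *ptr)
{
    struct demo_device *dev = ptr;

    (void)self;
    (void)event;
    dev->stats_updates++;
    return NOTIFY_OK;
}

/* The chain is linked by hand here: fib callback first, stats second. */
struct notifier_block demo_stats_nb = { .notifier_call = demo_stats_event };
struct notifier_block demo_fib_nb = {
    .notifier_call = demo_fib_event,
    .next = &demo_stats_nb,
};
struct notifier_block *demo_chain = &demo_fib_nb;

/* Same loop as notifier_call_chain in the text. */
int demo_call_chain(struct notifier_block **n, unsigned long val, void *v)
{
    int ret = NOTIFY_DONE;
    struct notifier_block *nb = *n;

    while (nb) {
        ret = nb->notifier_call(nb, val, v);
        if (ret & NOTIFY_STOP_MASK)
            return ret;
        nb = nb->next;
    }
    return ret;
}
```

Calling demo_call_chain(&demo_chain, DEMO_DEV_DOWN, &dev) runs only the first callback, because its return value carries NOTIFY_STOP_MASK. Any other event falls through to the second callback, and the caller sees the return value of the last callback that ran.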