So far, we’ve talked about the Linux kernel from the perspective of writing device drivers. Once you begin playing with the kernel, however, you may find that you want to “understand it all.” In fact, you may find yourself passing whole days navigating through the source code and grepping your way through the source tree to uncover the relationships among the different parts of the kernel.
This kind of “heavy grepping” is one of the tasks your authors perform quite often, and it is an efficient way to retrieve information from the source code. Nowadays you can even exploit Internet resources to understand the kernel source tree; some of them are listed in the preface. But despite Internet resources, wise use of grep,[62] less, and possibly ctags or etags can still be the best way to extract information from the kernel sources.
In our opinion, acquiring a bit of a knowledge base before sitting down in front of your preferred shell prompt can be helpful. Therefore, this chapter presents a quick overview of the Linux kernel source files based on version 2.4.2. If you’re interested in other versions, some of the descriptions may not apply literally. Whole sections may be missing (like the drivers/media directory that was introduced in 2.4.0-test6 by moving various preexisting drivers to this new directory). We hope the following information is useful, even if not authoritative, for browsing other versions of the kernel.
Every pathname is given relative to the source root (usually /usr/src/linux), while filenames with no directory component are assumed to reside in the “current” directory, the one being discussed. Header files (when named with < and > angle brackets) are given relative to the include directory of the source tree. We won’t dissect the Documentation directory, as its role is self-explanatory.
The usual way to look at a program is to start where execution begins. As far as Linux is concerned, it’s hard to tell where execution begins—it depends on how you define “begins.”
The architecture-independent starting point is start_kernel in init/main.c. This function is invoked from architecture-specific code, to which it never returns. It is in charge of spinning the wheel and can thus be considered the “mother of all functions,” the first breath in the computer’s life. Before start_kernel, there was chaos.
By the time start_kernel is invoked, the processor has been initialized, protected mode[63] has been entered, the processor is executing at the highest privilege level (sometimes called supervisor mode), and interrupts are disabled. The start_kernel function is in charge of initializing all the kernel data structures. It does this by calling external functions to perform subtasks, since each setup function is defined in the appropriate kernel subsystem.
The first function called by start_kernel, after acquiring the kernel lock and printing the Linux banner string, is setup_arch. This allows platform-specific C-language code to run. setup_arch receives a pointer to the local command_line pointer in start_kernel, so it can make that pointer refer to the real (platform-dependent) location where the command line is stored. As the next step, start_kernel passes the command line to parse_options (defined in the same init/main.c file) so that the boot options can be honored.
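The command-line hand-off relies on an ordinary C idiom. start_kernel passes the address of its own local pointer, and the callee stores a new address through it. The sketch below illustrates only that idiom. The names setup_arch_sketch, start_kernel_sketch, and platform_cmdline are hypothetical stand-ins, not kernel symbols, and the command-line contents are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: where some platform's boot code left the command line. */
static char platform_cmdline[] = "root=/dev/hda1 console=ttyS0";

/* Hypothetical stand-in for setup_arch: it receives a pointer to the
 * caller's pointer and redirects it to the platform-specific buffer. */
static void setup_arch_sketch(char **command_line_p)
{
    assert(command_line_p != NULL);
    *command_line_p = platform_cmdline;
}

/* Mimics the caller side: a local pointer that starts out empty and is
 * filled in by the architecture code. Returns the command line found. */
const char *start_kernel_sketch(void)
{
    char *command_line = NULL;
    setup_arch_sketch(&command_line);
    return command_line;
}
```

Because only a pointer is updated, no copy of the command line is needed at this stage. The caller simply ends up looking at whatever buffer the platform code chose.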
Command-line parsing is performed by calling handler functions associated with each kernel argument (for example, video= is associated with video_setup). Each function usually ends up setting variables that are used later, when the associated facility is initialized. The internal organization of command-line parsing is similar to the init calls mechanism, described later.
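The keyword-to-handler association can be pictured as a lookup table that is scanned for each option. This is a simplified sketch. The kernel builds its table in an ELF section rather than in a static array. The names setup_entry, check_setup_sketch, and video_setup_sketch, and the convention that a handler returns 1 when it consumes an option, are assumptions made for illustration. As in the text, the handler only records a value for later use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variable a driver would consult later, at its own init time. */
static char video_mode[32];

/* Hypothetical handler: records the value and returns 1 (option consumed). */
static int video_setup_sketch(char *value)
{
    strncpy(video_mode, value, sizeof(video_mode) - 1);
    video_mode[sizeof(video_mode) - 1] = '\0';
    return 1;
}

/* One entry per recognized keyword. */
struct setup_entry {
    const char *prefix;           /* e.g. "video=" */
    int (*handler)(char *value);  /* called with the text after the prefix */
};

static const struct setup_entry setup_table[] = {
    { "video=", video_setup_sketch },
};

/* Returns the handler's result if a prefix matches, 0 if none does. */
int check_setup_sketch(char *option)
{
    for (size_t i = 0; i < sizeof(setup_table) / sizeof(setup_table[0]); i++) {
        size_t n = strlen(setup_table[i].prefix);
        if (strncmp(option, setup_table[i].prefix, n) == 0)
            return setup_table[i].handler(option + n);
    }
    return 0;
}
```

Options that no handler claims (a return value of 0) are left for a later stage of parsing.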
After parsing, start_kernel activates the various basic functionalities of the system. This includes setting up interrupt tables, activating the timer interrupt, and initializing the console and memory management. All of this is performed by functions declared elsewhere in platform-specific code. The function continues by initializing less basic kernel subsystems, including buffer management, signal handling, and file and inode management.
Finally, start_kernel forks the init kernel thread (which gets 1 as a process ID) and executes the idle function (again, defined in architecture-specific code).
The initial boot sequence can thus be summarized as follows:
1. System firmware or a boot loader arranges for the kernel to be placed at the proper address in memory. This code is usually external to the Linux source code.
2. Architecture-specific assembly code performs very low-level tasks, like initializing memory and setting up CPU registers so that C code can run flawlessly. This includes selecting a stack area and setting the stack pointer accordingly. The amount of such code varies from platform to platform; it can range from a few dozen lines up to a few thousand lines.
3. start_kernel is called. It acquires the kernel lock, prints the banner, and calls setup_arch.
4. Architecture-specific C-language code completes low-level initialization and retrieves a command line for start_kernel to use.
5. start_kernel parses the command line and calls the handlers associated with the keywords it identifies.
6. start_kernel initializes basic facilities and forks the init thread.
It is the task of the init thread to perform all other initialization. The thread’s code is part of the same init/main.c file, and the bulk of the initialization (init) calls are performed by do_basic_setup. The function initializes all bus subsystems that it finds (PCI, SBus, and so on). It then invokes do_initcalls; device driver initialization is performed as part of the initcall processing.
The idea of init calls was added in version 2.3.13 and is not available in older kernels; it is designed to avoid hairy #ifdef conditionals all over the initialization code. Every optional kernel feature (device driver or whatever) must be initialized only if configured in the system, so the call to initialization functions used to be surrounded by #ifdef CONFIG_FEATURE and #endif. With init calls, each optional feature declares its own initialization function; the compilation process then places a reference to the function in a special ELF section. At boot time, do_initcalls scans the ELF section to invoke all the relevant initialization functions.
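The mechanism can be reproduced in user space with GCC or Clang on an ELF platform. Each registration macro drops a function pointer into a named section, and a loop walks that section from start to end. This is a sketch under stated assumptions. The macro initcall_sketch, the section name initcalls_sketch, and the sample init functions are hypothetical. The kernel marks its boundaries with its own linker script, whereas this sketch relies on the __start_/__stop_ symbols that GNU ld (and lld) define automatically for sections whose names are valid C identifiers.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*initcall_t)(void);

/* Register fn by placing a pointer to it in the "initcalls_sketch" section.
 * "used" keeps the compiler from discarding the otherwise unreferenced
 * variable. */
#define initcall_sketch(fn) \
    static initcall_t initcall_ptr_##fn \
    __attribute__((used, section("initcalls_sketch"))) = fn

/* Provided by the linker, bracketing the section's contents. */
extern initcall_t __start_initcalls_sketch[];
extern initcall_t __stop_initcalls_sketch[];

static int inits_run;

static int serial_init(void) { inits_run++; return 0; }
static int sound_init(void)  { inits_run++; return 0; }

/* Each optional "driver" registers itself; the caller needs no #ifdef. */
initcall_sketch(serial_init);
initcall_sketch(sound_init);

/* Walk the section and invoke every registered function. The order is
 * whatever the compiler and linker produced; this sketch does not
 * guarantee one. Returns how many functions were called. */
int do_initcalls_sketch(void)
{
    int count = 0;
    for (initcall_t *call = __start_initcalls_sketch;
         call < __stop_initcalls_sketch; call++) {
        (*call)();
        count++;
    }
    return count;
}
```

Leaving out one of the initcall_sketch lines (as a configuration option would) removes that feature without touching do_initcalls_sketch.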
The same idea is applied to command-line arguments. Each driver that can receive a command-line argument at boot time defines a data structure that associates the argument with a function. A pointer to the data structure is placed into a separate ELF section, so parse_options can scan this section for each command-line option and invoke the associated driver function, if a match is found. The remaining arguments end up in either the environment or the command line of the init process. All the magic for init calls and ELF sections is part of <linux/init.h>.
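The final step, routing leftover words to init, can be sketched as follows. The classification rule is an assumption for illustration: a word containing '=' is treated as an environment variable, and anything else as an argument. The structure init_params, the limits, and the function name are also hypothetical. Words already claimed by a driver handler are assumed to have been filtered out before this point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INIT_ARGS_SKETCH 8   /* hypothetical limits */
#define MAX_INIT_ENVS_SKETCH 8

struct init_params {
    char *argv[MAX_INIT_ARGS_SKETCH + 1];  /* NULL-terminated */
    char *envp[MAX_INIT_ENVS_SKETCH + 1];  /* NULL-terminated */
};

/* Sort unclaimed words into init's environment or argument list.
 * Words beyond the limits are silently dropped in this sketch. */
void route_leftovers_sketch(char **words, size_t nwords, struct init_params *p)
{
    size_t args = 0, envs = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (strchr(words[i], '=') != NULL) {
            if (envs < MAX_INIT_ENVS_SKETCH)
                p->envp[envs++] = words[i];
        } else if (args < MAX_INIT_ARGS_SKETCH) {
            p->argv[args++] = words[i];
        }
    }
    p->argv[args] = NULL;
    p->envp[envs] = NULL;
}
```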
Unfortunately, this init call idea works only when no ordering is required across the various initialization functions, so a few #ifdefs are still present in init/main.c.
It’s interesting to see how the idea of init calls and its application to the list of command-line arguments helped reduce the amount of conditional compilation in the code:
morgana% grep -c ifdef linux-2.[024]/init/main.c
linux-2.0/init/main.c:120
linux-2.2/init/main.c:246
linux-2.4/init/main.c:35
Despite the huge addition of new features over time, the amount of conditional compilation dropped significantly in 2.4 with the adoption of init calls. Another advantage of this technique is that device driver maintainers don’t need to patch main.c every time they add support for a new command-line argument. The addition of new features to the kernel has been greatly facilitated by this technique, and there are no more hairy cross references all over the boot code. But as a side effect, 2.4 can’t be compiled into older file formats that are less flexible than ELF. For this reason, uClinux[64] developers switched from COFF to ELF while porting their system from 2.0 to 2.4.
Another side effect of extensive use of ELF sections is that the final pass in compiling the kernel is no longer a conventional link pass. Every platform now defines exactly how to link the kernel image (the vmlinux file) by means of an ld script; the file is called vmlinux.lds in the source tree of each platform. Use of ld scripts is described in the standard documentation for the binutils package.
There is yet another advantage to putting the initialization code into a special section. Once initialization is complete, that code is no longer needed. Since this code has been isolated, the kernel is able to dump it and reclaim the memory it occupies.
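A user-space program cannot hand its own code back to the system, so the sketch below shows only the precondition that makes reclaiming possible. Once boot-only functions are grouped into one section, the linker knows exactly where that section starts and ends, and its size is a subtraction. The kernel’s real marker is __init from <linux/init.h>. INIT_SKETCH, the section name inittext_sketch, and the function names are hypothetical. The __start_/__stop_ symbols are supplied automatically by GNU ld rather than by a kernel linker script.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker, loosely modeled on __init: put boot-only code in a
 * section of its own ("used" keeps it even if never called). */
#define INIT_SKETCH __attribute__((used, noinline, section("inittext_sketch")))

static INIT_SKETCH int probe_hardware_sketch(void) { return 42; }
static INIT_SKETCH int setup_tables_sketch(void)   { return 7; }

/* Linker-provided bounds of the section holding the boot-only code. */
extern const char __start_inittext_sketch[];
extern const char __stop_inittext_sketch[];

/* Number of bytes a kernel could release once initialization is over.
 * Nothing is actually freed here; this only measures the region. */
size_t init_section_bytes(void)
{
    return (size_t)(__stop_inittext_sketch - __start_inittext_sketch);
}
```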
[62] Usually, find and xargs are needed to build a command line for grep. Although not trivial, proficient use of Unix tools is outside of the scope of this book.
[63] This concept only makes sense on the x86 architecture. More mature architectures don’t find themselves in a limited backward-compatible mode when they power up.
[64] uClinux is a version of the Linux kernel that can run on processors without an MMU. This is typical in the embedded world, and several M68k and ARM processors have no hardware memory management. uClinux stands for microcontroller Linux, since it’s meant to run on microcontrollers rather than full-fledged computers.