One of the most compelling problems for anyone writing kernel code is how to approach debugging. Kernel code cannot be easily executed under a debugger, nor can it be easily traced, because it is a set of functionalities not related to a specific process. Kernel code errors can also be exceedingly hard to reproduce and can bring down the entire system with them, thus destroying much of the evidence that could be used to track them down.
This chapter introduces techniques you can use to monitor kernel code and trace errors under such trying circumstances.
The most common debugging technique is monitoring, which in applications programming is done by calling printf at suitable points. When you are debugging kernel code, you can accomplish the same goal with printk.
We used the printk function in earlier chapters with the simplifying assumption that it works like printf. Now it’s time to introduce some of the differences.
One of the differences is that printk lets you
classify messages according to their severity by associating different
loglevels, or priorities, with the messages. You
usually indicate the loglevel with a macro. For example,
KERN_INFO
, which we saw prepended to some of the
earlier print statements, is one of the possible loglevels of the
message. The loglevel macro expands to a string, which is
concatenated to the message text at compile time; that’s why there is
no comma between the priority and the format string in the following
examples. Here are two examples of printk
commands, a debug message and a critical message:
printk(KERN_DEBUG "Here I am: %s:%i\n", __FILE__, __LINE_&_); printk(KERN_CRIT "I'm trashed; giving up on %p\n", ptr);
There are eight possible loglevel strings, defined in the header
<linux/kernel.h>
:
-
KERN_EMERG
Used for emergency messages, usually those that precede a crash.
-
KERN_ALERT
-
KERN_CRIT
Critical conditions, often related to serious hardware or software failures.
-
KERN_ERR
Used to report error conditions; device drivers will often use
KERN_ERR
to report hardware difficulties.-
KERN_WARNING
Warnings about problematic situations that do not, in themselves, create serious problems with the system.
-
KERN_NOTICE
Situations that are normal, but still worthy of note. A number of security-related conditions are reported at this level.
-
KERN_INFO
Informational messages. Many drivers print information about the hardware they find at startup time at this level.
-
KERN_DEBUG
Each string (in the macro expansion) represents an integer in angle brackets. Integers range from 0 to 7, with smaller values representing higher priorities.
A printk statement with no specified priority
defaults to DEFAULT_MESSAGE_LOGLEVEL
, specified in
kernel/printk.c
as an integer. The default
loglevel value has changed several times during Linux development, so
we suggest that you always specify an explicit loglevel.
Based on the loglevel, the kernel may print the message to the current
console, be it a text-mode terminal, a serial line printer, or a
parallel printer. If the priority is less than the integer variable
console_loglevel
, the message is displayed. If both
klogd and
syslogd are running on the system, kernel
messages are appended to /var/log/messages
(or
otherwise treated depending on your syslogd
configuration), independent of console_loglevel
. If
klogd is not running, the message won’t
reach user space unless you read /proc/kmsg
.
The variable console_loglevel
is initialized to
DEFAULT_CONSOLE_LOGLEVEL
and can be modified
through the sys_syslog system call. One way to
change it is by specifying the -c switch
when invoking klogd, as specified in the
klogd manpage. Note that to change the
current value, you must first kill klogd
and then restart it with the -c option.
Alternatively, you can write a program to change the console
loglevel. You’ll find a version of such a program in
misc-progs/setlevel.c
in the source files
provided on the O’Reilly FTP site. The new level is specified as an
integer value between 1 and 8, inclusive. If it is set to 1, only
messages of level 0 (KERN_EMERG
) will reach the
console; if it is set to 8, all messages, including debugging ones,
will be displayed.
You’ll probably want to lower the loglevel if you work on the console
and you experience a kernel fault (see Section 4.4
later in this chapter), because the fault-handling code raises the
console_loglevel
to its maximum value, causing
every subsequent message to appear on the console. You’ll want to
raise the loglevel if you need to see your debugging messages; this is
useful if you are developing kernel code remotely and the text console
is not being used for an interactive session.
From version 2.1.31 on it is possible to read and modify the console
loglevel using the text file
/proc/sys/kernel/printk
. The file hosts four
integer values. You may be interested in the first two: the current
console loglevel and the default level for messages.
With recent kernels, for instance, you can cause all kernel messages
to appear at the console by simply entering
#echo 8 > /proc/sys/kernel/printk
If you run 2.0, however, you still need the setlevel tool.
It should now be apparent why the hello.c
sample
had the <1>
markers; they are there to make
sure that the messages appear on the console.
Linux allows for some flexibility in console logging policies by
letting you send messages to a specific virtual console (if your
console lives on the text screen). By default, the “console” is the
current virtual terminal. To select a different virtual terminal to
receive messages, you can issue ioctl(TIOCLINUX)
on
any console device. The following program,
setconsole, can be used to choose which
console receives kernel messages; it must be run by the superuser and
is available in the misc-progs
directory.
This is how the program works:
int main(int argc, char **argv) { char bytes[2] = {11,0}; /* 11 is the TIOCLINUX cmd number */ if (argc==2) bytes[1] = atoi(argv[1]); /* the chosen console */ else { fprintf(stderr, "%s: need a single arg\n",argv[0]); exit(1); } if (ioctl(STDIN_FILENO, TIOCLINUX, bytes)<0) { /* use stdin */ fprintf(stderr,"%s: ioctl(stdin, TIOCLINUX): %s\n", argv[0], strerror(errno)); exit(1); } exit(0); }
setconsole uses the special
ioctl command TIOCLINUX
, which
implements Linux-specific functions. To use
TIOCLINUX
, you pass it an argument that is a
pointer to a byte array. The first byte of the array is a number that
specifies the requested subcommand, and the following bytes are
subcommand specific. In setconsole,
subcommand 11 is used, and the next byte (stored in
bytes[1]
) identifies the virtual console. The
complete description of TIOCLINUX
can be found in
drivers/char/tty_io.c
, in the kernel sources.
The printk function writes messages into a
circular buffer that is LOG_BUF_LEN
(defined in
kernel/printk.c) bytes long. It then wakes any
process that is waiting for messages, that is, any process that is
sleeping in the syslog system call or that
is reading /proc/kmsg
. These two interfaces to
the logging engine are almost equivalent, but note that reading from
/proc/kmsg
consumes the data from the log buffer,
whereas the syslog system call can optionally
return log data while leaving it for other processes as well. In
general, reading the /proc
file is easier, which
is why it is the default behavior for
klogd.
If you happen to read the kernel messages by hand, after stopping
klogd you’ll find that the
/proc
file looks like a FIFO, in that the reader
blocks, waiting for more data. Obviously, you can’t read messages this
way if klogd or another process is already
reading the same data because you’ll contend for it.
If the circular buffer fills up, printk wraps around and starts adding new data to the beginning of the buffer, overwriting the oldest data. The logging process thus loses the oldest data. This problem is negligible compared with the advantages of using such a circular buffer. For example, a circular buffer allows the system to run even without a logging process, while minimizing memory waste by overwriting old data should nobody read it. Another feature of the Linux approach to messaging is that printk can be invoked from anywhere, even from an interrupt handler, with no limit on how much data can be printed. The only disadvantage is the possibility of losing some data.
If the klogd process is running, it
retrieves kernel messages and dispatches them to
syslogd, which in turn checks
/etc/syslog.conf
to find out how to deal with
them. syslogd differentiates between
messages according to a facility and a priority; allowable values for
both the facility and the priority are defined in
<sys/syslog.h>
. Kernel messages are logged
by the LOG_KERN
facility, at a priority
corresponding to the one used in printk (for
example, LOG_ERR
is used for
KERN_ERR
messages). If
klogd isn’t running, data remains in the
circular buffer until someone reads it or the buffer overflows.
If you want to avoid clobbering your system log with the monitoring
messages from your driver, you can either specify the
-f (file) option to
klogd to instruct it to save messages to a
specific file, or modify /etc/syslog.conf
to suit
your needs. Yet another possibility is to take the brute-force
approach: kill klogd and verbosely print
messages on an unused virtual terminal,[21] or issue the command
cat /proc/kmsg from an unused
xterm.
During the early stages of driver development, printk can help considerably in debugging and testing new code. When you officially release the driver, on the other hand, you should remove, or at least disable, such print statements. Unfortunately, you’re likely to find that as soon as you think you no longer need the messages and remove them, you’ll implement a new feature in the driver (or somebody will find a bug) and you’ll want to turn at least one of the messages back on. There are several ways to solve both issues, to globally enable or disable your debug messages and to turn individual messages on or off.
Here we show one way to code printk calls so you can turn them on and off individually or globally; the technique depends on defining a macro that resolves to a printk (or printf) call when you want it to.
Each print statement can be enabled or disabled by removing or adding a single letter to the macro’s name.
All the messages can be disabled at once by changing the value of the
CFLAGS
variable before compiling.The same print statement can be used in kernel code and user-level code, so that the driver and test programs can be managed in the same way with regard to extra messages.
The following code fragment implements these features and comes
directly from the header scull.h
.
#undef PDEBUG /* undef it, just in case */ #ifdef SCULL_DEBUG # ifdef __KERNEL__ /* This one if debugging is on, and kernel space */ # define PDEBUG(fmt, args...) printk( KERN_DEBUG "scull: " fmt, ## args) # else /* This one for user space */ # define PDEBUG(fmt, args...) fprintf(stderr, fmt, ## args) # endif #else # define PDEBUG(fmt, args...) /* not debugging: nothing */ #endif #undef PDEBUGG #define PDEBUGG(fmt, args...) /* nothing: it's a placeholder */
The symbol PDEBUG
depends on whether or not
SCULL_DEBUG
is defined, and it displays information
in whatever manner is appropriate to the environment where the code is
running: it uses the kernel call printk when it’s
in the kernel, and the libc call
fprintf to the standard error when run in user
space. The PDEBUGG
symbol, on the other hand, does
nothing; it can be used to easily “comment” print statements without
removing them entirely.
To simplify the process further, add the following lines to your makefile:
# Comment/uncomment the following line to disable/enable debugging DEBUG = y # Add your debugging flag (or not) to CFLAGS ifeq ($(DEBUG),y) DEBFLAGS = -O -g -DSCULL_DEBUG # "-O" is needed to expand inlines else DEBFLAGS = -O2 endif CFLAGS += $(DEBFLAGS)
The macros shown in this section depend on a gcc extension to the ANSI C preprocessor that supports macros with a variable number of arguments. This gcc dependency shouldn’t be a problem because the kernel proper depends heavily on gcc features anyway. In addition, the makefile depends on GNU’s version of make; once again, the kernel already depends on GNU make, so this dependency is not a problem.
If you’re familiar with the C preprocessor, you can expand on the given definitions to implement the concept of a “debug level,” defining different levels and assigning an integer (or bit mask) value to each level to determine how verbose it should be.
But every driver has its own features and monitoring needs. The art of good programming is in choosing the best trade-off between flexibility and efficiency, and we can’t tell what is the best for you. Remember that preprocessor conditionals (as well as constant expressions in the code) are executed at compile time, so you must recompile to turn messages on or off. A possible alternative is to use C conditionals, which are executed at runtime and therefore permit you to turn messaging on and off during program execution. This is a nice feature, but it requires additional processing every time the code is executed, which can affect performance even when the messages are disabled. Sometimes this performance hit is unacceptable.
The macros shown in this section have proven themselves useful in a number of situations, with the only disadvantage being the requirement to recompile a module after any changes to its messages.
Get Linux Device Drivers, Second Edition now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.