The previous section described how printk works and how it can be used. What it didn’t talk about are its disadvantages.
Heavy use of printk can slow down the system noticeably, because syslogd keeps syncing its output files; thus, every line that is printed causes a disk operation. This is the right implementation from syslogd’s perspective. It tries to write everything to disk in case the system crashes right after printing the message; however, you don’t want to slow down your system just for the sake of debugging messages. This problem can be solved by prefixing the name of your log file, as it appears in /etc/syslog.conf, with a minus.[22] The problem with changing the configuration file is that the modification will likely remain there after you are done debugging, even though during normal system operation you do want messages to be flushed to disk as soon as possible. An alternative to such a permanent change is running a program other than klogd (such as cat /proc/kmsg, as suggested earlier), but this may not provide a suitable environment for normal system operation.
More often than not, the best way to get relevant information is to query the system when you need the information, instead of continually producing data. In fact, every Unix system provides many tools for obtaining system information: ps, netstat, vmstat, and so on.
Two main techniques are available to driver developers for querying the system: creating a file in the /proc filesystem and using the ioctl driver method. You may use devfs as an alternative to /proc, but /proc is an easier tool to use for information retrieval.
The /proc filesystem is a special, software-created filesystem that is used by the kernel to export information to the world. Each file under /proc is tied to a kernel function that generates the file’s “contents” on the fly when the file is read. We have already seen some of these files in action; /proc/modules, for example, always returns a list of the currently loaded modules.
/proc is heavily used in the Linux system. Many utilities on a modern Linux distribution, such as ps, top, and uptime, get their information from /proc. Some device drivers also export information via /proc, and yours can do so as well. The /proc filesystem is dynamic, so your module can add or remove entries at any time.
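From user space, a /proc entry behaves like any other text file, which is why ordinary tools can read it. The following is a minimal user-space sketch, not part of the driver; the function name demo_read_first_line is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a text file into buf (at most size - 1 bytes).
 * Returns the length of the line without its newline, or -1 if the
 * file cannot be opened or contains no data.
 * Hypothetical helper, for illustration only.
 */
int demo_read_first_line(const char *path, char *buf, size_t size)
{
    FILE *f;
    char *ok;

    assert(buf != NULL && size > 1);

    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fgets(buf, (int)size, f);   /* the kernel generates the data now */
    fclose(f);
    if (!ok)
        return -1;

    buf[strcspn(buf, "\n")] = '\0';  /* drop the trailing newline, if any */
    return (int)strlen(buf);
}
```

On a Linux system, calling demo_read_first_line("/proc/uptime", buf, sizeof(buf)) should fill buf with the text the kernel produces for that entry at the moment of the read.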
Fully featured /proc entries can be complicated beasts; among other things, they can be written to as well as read from. Most of the time, however, /proc entries are read-only files. This section will concern itself with the simple read-only case. Those who are interested in implementing something more complicated can look here for the basics; the kernel source may then be consulted for the full picture.
All modules that work with /proc should include <linux/proc_fs.h> to define the proper functions.
To create a read-only /proc
file, your driver
must implement a function to produce the data when the file is read.
When some process reads the file (using the read
system call), the request will reach your module by means of one of
two different interfaces, according to what you registered. We’ll
leave registration for later in this section and jump directly to the
description of the reading interfaces.
In both cases the kernel allocates a page of memory (i.e.,
PAGE_SIZE
bytes) where the driver can write data to
be returned to user space.
The recommended interface is read_proc, but an older interface named get_info also exists.
int (*read_proc)(char *page, char **start, off_t offset, int count, int *eof, void *data);
The page pointer is the buffer where you’ll write your data; start is used by the function to say where the interesting data has been written in page (more on this later); offset and count have the same meaning as in the read implementation. The eof argument points to an integer that must be set by the driver to signal that it has no more data to return, while data is a driver-specific data pointer you can use for internal bookkeeping.[23] The function is available in version 2.4 of the kernel, and 2.2 as well if you use our sysdep.h header.
int (*get_info)(char *page, char **start, off_t offset, int count);
get_info is an older interface used to read from a /proc file. The arguments all have the same meaning as for read_proc. What it lacks is the pointer to report end-of-file and the object-oriented flavor brought in by the data pointer. The function is available in all the kernel versions we are interested in (although it had an extra unused argument in its 2.0 implementation).
Both functions should return the number of bytes of data actually placed in the page buffer, just like the read implementation does for other files. Other output values are *eof and *start. eof is a simple flag, but the use of the start value is somewhat more complicated.
The main problem with the original implementation of user extensions to the /proc filesystem was use of a single memory page for data transfer. This limited the total size of a user file to 4 KB (or whatever was appropriate for the host platform). The start argument is there to implement large data files, but it can be ignored.
If your read_proc function does not set the *start pointer (it starts out NULL), the kernel assumes that the offset parameter has been ignored and that the data page contains the whole file you want to return to user space. If, on the other hand, you need to build a bigger file from pieces, you can set *start to be equal to page so that the caller knows your new data is placed at the beginning of the buffer. You should then, of course, skip the first offset bytes of data, which will have already been returned in a previous call.
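The piecewise case can be tried out in user space. The sketch below is an illustration under stated assumptions, not kernel code: demo_read_proc mimics the read_proc prototype, DEMO_PAGE_SIZE is a deliberately tiny stand-in for PAGE_SIZE, and demo_read_all is a simplified stand-in for the caller, which in the real kernel lives in fs/proc/generic.c and does more than this.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

#define DEMO_PAGE_SIZE 64   /* tiny "page" so that several calls are needed */

/* Hypothetical read_proc-style function serving a file bigger than a page.
 * The data pointer is assumed to hold the whole file as a C string. */
int demo_read_proc(char *page, char **start, off_t offset,
                   int count, int *eof, void *data)
{
    const char *src = data;
    size_t total = strlen(src);
    size_t off, n;

    assert(offset >= 0 && count >= 0);
    off = (size_t)offset;
    n = (size_t)count;

    if (off >= total) {             /* everything was already returned */
        *eof = 1;
        return 0;
    }
    if (n > total - off)
        n = total - off;
    if (n > DEMO_PAGE_SIZE)
        n = DEMO_PAGE_SIZE;

    memcpy(page, src + off, n);     /* skip the first offset bytes */
    *start = page;                  /* new data begins at the start of page */
    if (off + n == total)
        *eof = 1;
    return (int)n;
}

/* Simplified stand-in for the caller: collect the pieces into out.
 * Returns the number of bytes gathered. */
size_t demo_read_all(const char *src, char *out, size_t outsz)
{
    char page[DEMO_PAGE_SIZE];
    size_t pos = 0;
    int eof = 0;

    while (!eof) {
        char *start = NULL;
        int n = demo_read_proc(page, &start, (off_t)pos,
                               DEMO_PAGE_SIZE, &eof, (void *)src);
        if (n <= 0 || pos + (size_t)n > outsz)
            break;
        memcpy(out + pos, start, (size_t)n);
        pos += (size_t)n;           /* the next call gets this as its offset */
    }
    return pos;
}
```

With a 200-byte source string, the loop in this sketch needs four calls: three full pieces of 64 bytes and a final piece of 8 bytes that also sets *eof.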
There has long been another major issue with /proc files, which start is meant to solve as well. Sometimes the ASCII representation of kernel data structures changes between successive calls to read, so the reader process could find inconsistent data from one call to the next. If *start is set to a small integer value, the caller will use it to increment filp->f_pos independently of the amount of data you return, thus making f_pos an internal record number of your read_proc or get_info procedure. If, for example, your read_proc function is returning information from a big array of structures, and five of those structures were returned in the first call, start could be set to 5. The next call will provide that same value as the offset; the driver then knows to start returning data from the sixth structure in the array. This is defined as a “hack” by its authors and can be seen in fs/proc/generic.c.
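The record-number convention can also be modeled in user space. In this sketch, which rests on our own simplifying assumptions, demo_rec_read_proc treats offset as an index into a hypothetical demo_table and reports the number of records it produced through *start; demo_walk_records plays the part of the caller by advancing its position by that count rather than by the byte count. All names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

#define DEMO_NRECS    12    /* records in the hypothetical table */
#define DEMO_PER_CALL 5     /* records returned by each call */
#define DEMO_MAX_LINE 32    /* upper bound on one formatted record */

struct demo_rec { int id; int value; };
static struct demo_rec demo_table[DEMO_NRECS];

/* read_proc-style function that uses offset as a record number. */
int demo_rec_read_proc(char *page, char **start, off_t offset,
                       int count, int *eof, void *data)
{
    int i, len = 0, done = 0;

    (void)data;
    assert(offset >= 0);

    for (i = (int)offset;
         i < DEMO_NRECS && done < DEMO_PER_CALL && len + DEMO_MAX_LINE <= count;
         i++, done++)
        len += sprintf(page + len, "rec %d: %d\n",
                       demo_table[i].id, demo_table[i].value);

    if (i >= DEMO_NRECS)
        *eof = 1;
    /* A small integer, not an address: "advance by this many records" */
    *start = (char *)(uintptr_t)done;
    return len;
}

/* Simplified stand-in for the caller; returns the final record position. */
int demo_walk_records(void)
{
    char page[256];
    off_t pos = 0;
    int eof = 0;

    while (!eof) {
        char *start = NULL;
        int n = demo_rec_read_proc(page, &start, pos,
                                   (int)sizeof(page), &eof, NULL);
        if (n <= 0)
            break;
        pos += (off_t)(uintptr_t)start;  /* position counts records, not bytes */
    }
    return (int)pos;
}
```

Here the first two calls return five records each and the third returns the remaining two, so the position ends at 12 no matter how many bytes each call formatted.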
Time for an example. Here is a simple read_proc implementation for the scull device:
    int scull_read_procmem(char *buf, char **start, off_t offset,
                           int count, int *eof, void *data)
    {
        int i, j, len = 0;
        int limit = count - 80; /* Don't print more than this */

        for (i = 0; i < scull_nr_devs && len <= limit; i++) {
            Scull_Dev *d = &scull_devices[i];
            if (down_interruptible(&d->sem))
                return -ERESTARTSYS;
            len += sprintf(buf+len,"\nDevice %i: qset %i, q %i, sz %li\n",
                           i, d->qset, d->quantum, d->size);
            for (; d && len <= limit; d = d->next) { /* scan the list */
                len += sprintf(buf+len, " item at %p, qset at %p\n",
                               d, d->data);
                if (d->data && !d->next) /* dump only the last item - save space */
                    /* also check the limit here, so a large qset cannot overrun buf */
                    for (j = 0; j < d->qset && len <= limit; j++) {
                        if (d->data[j])
                            len += sprintf(buf+len," % 4i: %8p\n",
                                           j, d->data[j]);
                    }
            }
            up(&scull_devices[i].sem);
        }
        *eof = 1;
        return len;
    }
This is a fairly typical read_proc implementation. It assumes that there will never be a need to generate more than one page of data, and so ignores the start and offset values. It is, however, careful not to overrun its buffer, just in case.
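The headroom idea behind limit is easy to isolate. The user-space sketch below assumes that no formatted line is longer than DEMO_MAX_LINE bytes, and stops producing lines once less than that much room is left; demo_fill and its arguments are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_MAX_LINE 80    /* assumed upper bound on one formatted line */

/* Format values[] into buf, which holds count bytes.
 * A new line is started only while at least DEMO_MAX_LINE bytes remain,
 * so sprintf can never write past the end of buf.
 * Returns the number of bytes written (excluding the final NUL). */
int demo_fill(char *buf, int count, const int *values, int nvalues)
{
    int i, len = 0;
    int limit = count - DEMO_MAX_LINE;  /* don't print more than this */

    assert(buf != NULL && count >= DEMO_MAX_LINE);

    for (i = 0; i < nvalues && len <= limit; i++)
        len += sprintf(buf + len, "value %d: %d\n", i, values[i]);
    return len;
}
```

The output is silently truncated when the buffer fills up; that is acceptable for a debugging aid, but a file that must be complete needs the start and offset handling described earlier.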
A /proc function using the get_info interface would look very similar to the one just shown, with the exception that the last two arguments would be missing. The end-of-file condition, in this case, is signaled by returning less data than the caller expects (i.e., less than count).
Once you have a read_proc function defined, you need to connect it to an entry in the /proc hierarchy. There are two ways of setting up this connection, depending on what versions of the kernel you wish to support. The easiest method, only available in the 2.4 kernel (and 2.2 too if you use our sysdep.h header), is to simply call create_proc_read_entry. Here is the call used by scull to make its /proc function available as /proc/scullmem:
create_proc_read_entry("scullmem", 0 /* default mode */, NULL /* parent dir */, scull_read_procmem, NULL /* client data */);
The arguments to this function are, as shown, the name of the /proc entry, the file permissions to apply to the entry (the value 0 is treated as a special case and is turned to a default, world-readable mask), the proc_dir_entry pointer to the parent directory for this file (we use NULL to make the driver appear directly under /proc), the pointer to the read_proc function, and the data pointer that will be passed back to the read_proc function.
The directory entry pointer can be used to create entire directory hierarchies under /proc. Note, however, that an entry may be more easily placed in a subdirectory of /proc simply by giving the directory name as part of the name of the entry, as long as the directory itself already exists. For example, an emerging convention says that /proc entries associated with device drivers should go in the subdirectory driver/; scull could place its entry there simply by giving its name as driver/scullmem.
Entries in /proc, of course, should be removed when the module is unloaded. remove_proc_entry is the function that undoes what create_proc_read_entry did:
remove_proc_entry("scullmem", NULL /* parent dir */);
The alternative method for creating a /proc entry is to create and initialize a proc_dir_entry structure and pass it to proc_register_dynamic (version 2.0) or proc_register (version 2.2, which assumes a dynamic file if the inode number in the structure is 0). As an example, consider the following code that scull uses when compiled against 2.0 headers:
    static int scull_get_info(char *buf, char **start, off_t offset,
                              int len, int unused)
    {
        int eof = 0;
        return scull_read_procmem (buf, start, offset, len, &eof, NULL);
    }

    struct proc_dir_entry scull_proc_entry = {
        namelen:    8,
        name:       "scullmem",
        mode:       S_IFREG | S_IRUGO,
        nlink:      1,
        get_info:   scull_get_info,
    };

    static void scull_create_proc()
    {
        proc_register_dynamic(&proc_root, &scull_proc_entry);
    }

    static void scull_remove_proc()
    {
        proc_unregister(&proc_root, scull_proc_entry.low_ino);
    }
The code declares a function using the get_info interface and fills in a proc_dir_entry structure that is registered with the filesystem.
This code provides compatibility across the 2.0 and 2.2 kernels, with a little support from macro definitions in sysdep.h. It uses the get_info interface because the 2.0 kernel did not support read_proc. Some more work with #ifdef could have made it use read_proc with Linux 2.2, but the benefits would be minor.
ioctl, which we show you how to use in the next chapter, is a system call that acts on a file descriptor; it receives a number that identifies a command to be performed and (optionally) another argument, usually a pointer.
As an alternative to using the /proc
filesystem,
you can implement a few ioctl commands tailored
for debugging. These commands can copy relevant data structures from
the driver to user space, where you can examine them.
Using ioctl this way to get information is somewhat more difficult than using /proc, because you need another program to issue the ioctl and display the results. This program must be written, compiled, and kept in sync with the module you’re testing. On the other hand, the driver’s code is easier than what is needed to implement a /proc file.
There are times when ioctl is the best way to get information, because it runs faster than reading /proc. If some work must be performed on the data before it’s written to the screen, retrieving the data in binary form is more efficient than reading a text file. In addition, ioctl doesn’t require splitting data into fragments smaller than a page.
Another interesting advantage of the ioctl
approach is that information-retrieval commands can be left in the
driver even when debugging would otherwise be disabled. Unlike a
/proc
file, which is visible to anyone who looks
in the directory (and too many people are likely to wonder “what that
strange file is”), undocumented ioctl commands
are likely to remain unnoticed. In addition, they will still be there
should something weird happen to the driver. The only drawback is that
the module will be slightly bigger.
[22] The minus is a “magic” marker to prevent syslogd from flushing the file to disk at every new message, documented in syslog.conf(5), a manual page worth reading.
[23] We’ll find several of these pointers
throughout the book; they represent the “object” involved in this
action and correspond somewhat to this
in
C++.