One feature many drivers need is the ability to schedule execution of some tasks at a later time without resorting to interrupts. Linux offers three different interfaces for this purpose: task queues, tasklets (as of kernel 2.3.43), and kernel timers. Task queues and tasklets provide a flexible utility for scheduling execution at a later time, with various meanings for “later”; they are most useful when writing interrupt handlers, and we’ll see them again in Section 9.5, in Chapter 9. Kernel timers are used to schedule a task to run at a specific time in the future and are dealt with in Section 6.5, later in this chapter.
A typical situation in which you might use task queues or tasklets is to manage hardware that cannot generate interrupts but still allows blocking read. You need to poll the device, while taking care not to burden the CPU with unnecessary operations. Waking the reading process at fixed time intervals (for example, using current->timeout) isn’t a suitable approach, because each poll would require two context switches (one to run the polling code in the reading process, and one to return to a process that has real work to do), and often a suitable polling mechanism can be implemented only outside of a process’s context.
A similar problem is giving timely input to a simple hardware device. For example, you might need to feed steps to a stepper motor that is directly connected to the parallel port—the motor needs to be moved by single steps on a timely basis. In this case, the controlling process talks to your device driver to dispatch a movement, but the actual movement should be performed step by step at regular intervals after returning from write.
The preferred way to perform such floating operations quickly is to register a task for later execution. The kernel supports task queues, where tasks accumulate to be “consumed” when the queue is run. You can declare your own task queue and trigger it at will, or you can register your tasks in predefined queues, which are run (triggered) by the kernel itself.
This section first describes task queues, then introduces predefined task queues, which provide a good start for some interesting tests (and hang the computer if something goes wrong), and finally introduces how to run your own task queues. Following that, we look at the new tasklet interface, which supersedes task queues in many situations in the 2.4 kernel.
A task queue is a list of tasks, each task being represented by a function pointer and an argument. When a task is run, it receives a single void * argument and returns void. The pointer argument can be used to pass along a data structure to the routine, or it can be ignored. The queue itself is a list of structures (the tasks) that are owned by the kernel module declaring and queueing them. The module is completely responsible for allocating and deallocating the structures, and static structures are commonly used for this purpose.
A queue element is described by the following structure, copied directly from <linux/tqueue.h>:
struct tq_struct {
    struct tq_struct *next;     /* linked list of active bh's */
    int sync;                   /* must be initialized to zero */
    void (*routine)(void *);    /* function to call */
    void *data;                 /* argument to function */
};
The “bh” in the first comment means bottom half. A bottom half is “half of an interrupt handler”; we’ll discuss this topic thoroughly when we deal with interrupts in Section 9.5, in Chapter 9. For now, suffice it to say that a bottom half is a mechanism provided by a device driver to handle asynchronous tasks which, usually, are too large to be done while handling a hardware interrupt. This chapter should make sense without an understanding of bottom halves, but we will, by necessity, refer to them occasionally.
The most important fields in the data structure just shown are routine and data. To queue a task for later execution, you need to set both these fields before queueing the structure, while next and sync should be cleared. The sync flag in the structure is used by the kernel to prevent queueing the same task more than once, because this would corrupt the next pointer. Once the task has been queued, the structure is considered “owned” by the kernel and shouldn’t be modified until the task is run.
The other data structure involved in task queues is task_queue, which is currently just a pointer to struct tq_struct; the decision to typedef this pointer to another symbol permits the extension of task_queue in the future, should the need arise. task_queue pointers should be initialized to NULL before use.
The following list summarizes the operations that can be performed on task queues and struct tq_structs.

- DECLARE_TASK_QUEUE(name);
  This macro declares a task queue with the given name, and initializes it to the empty state.

- int queue_task(struct tq_struct *task, task_queue *list);
  As its name suggests, this function queues a task. The return value is 0 if the task was already present on the given queue, nonzero otherwise.

- void run_task_queue(task_queue *list);
  This function is used to consume a queue of accumulated tasks. You won’t need to call it yourself unless you declare and maintain your own queue.
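As a minimal sketch of how these operations fit together (the names my_queue, my_task, my_handler, and the helper functions are hypothetical, not part of the kernel API):

#include <linux/tqueue.h>

DECLARE_TASK_QUEUE(my_queue);        /* a private, initially empty queue */

static int my_counter;               /* state the deferred task operates on */

static void my_handler(void *data)   /* matches void (*routine)(void *) */
{
    int *count = (int *)data;        /* recover whatever was placed in data */
    (*count)++;                      /* the deferred work itself */
}

static struct tq_struct my_task;     /* static, so next and sync start at zero */

void my_setup(void)                  /* call once, for instance at init time */
{
    my_task.routine = my_handler;
    my_task.data = &my_counter;
}

void my_defer(void)                  /* accumulate the task on the queue */
{
    queue_task(&my_task, &my_queue);
}

void my_flush(void)                  /* consume everything queued so far */
{
    run_task_queue(&my_queue);
}

Since my_queue is a custom queue, nothing runs it until my_flush is called; the predefined queues described later take care of that step themselves.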
Before getting into the details of using task queues, we need to pause for a moment to look at how they work inside the kernel.
A task queue, as we have already seen, is in practice a linked list of functions to call. When run_task_queue is asked to run a given queue, each entry in the list is executed. When you are writing functions that work with task queues, you have to keep in mind when the kernel will call run_task_queue; the exact context imposes some constraints on what you can do. You should also not make any assumptions regarding the order in which enqueued tasks are run; each of them must do its task independently of the other ones.
And when are task queues run? If you are using one of the predefined task queues discussed in the next section, the answer is “when the kernel gets around to it.” Different queues are run at different times, but they are always run when the kernel has no other pressing work to do.
Most important, they almost certainly are not run when the process that queued the task is executing. They are, instead, run asynchronously. Until now, everything we have done in our sample drivers has run in the context of a process executing system calls. When a task queue runs, however, that process could be asleep, executing on a different processor, or could conceivably have exited altogether.
This asynchronous execution resembles what happens when a hardware interrupt arrives (which is discussed in detail in Chapter 9). In fact, task queues are often run as the result of a “software interrupt.” When running in interrupt mode (or interrupt time) in this way, your code is subject to a number of constraints. We will introduce these constraints now; they will be seen again in several places in this book. Repetition is called for in this case; the rules for interrupt mode must be followed or the system will find itself in deep trouble.
A number of actions require the context of a process in order to be executed. When you are outside of process context (i.e., in interrupt mode), you must observe the following rules:
- No access to user space is allowed. Because there is no process context, there is no path to the user space associated with any particular process.

- The current pointer is not valid in interrupt mode, and cannot be used.

- No sleeping or scheduling may be performed. Interrupt-mode code may not call schedule or sleep_on; it also may not call any other function that may sleep. For example, calling kmalloc(..., GFP_KERNEL) is against the rules. Semaphores also may not be used since they can sleep.
Kernel code can tell if it is running in interrupt mode by calling the function in_interrupt(), which takes no parameters and returns nonzero if the processor is running in interrupt time.
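As a hedged sketch of how a driver might honor these rules when the same code can be reached from both process context and a task queue (the helper name my_alloc is ours, not from jiq):

#include <linux/types.h>
#include <linux/slab.h>          /* kmalloc */
#include <linux/interrupt.h>     /* in_interrupt */

static void *my_alloc(size_t size)
{
    /*
     * GFP_KERNEL may sleep and is therefore off-limits in interrupt
     * mode; GFP_ATOMIC never sleeps but is more likely to fail.
     */
    if (in_interrupt())
        return kmalloc(size, GFP_ATOMIC);
    return kmalloc(size, GFP_KERNEL);
}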
One other feature of the current implementation of task queues is that a task can requeue itself in the same queue from which it was run. For instance, a task being run from the timer tick can reschedule itself to be run on the next tick by calling queue_task to put itself on the queue again. Rescheduling is possible because the head of the queue is replaced with a NULL pointer before consuming queued tasks; as a result, a new queue is built once the old one starts executing.
Although rescheduling the same task over and over might appear to be a pointless operation, it is sometimes useful. For example, consider a driver that moves a pair of stepper motors one step at a time by rescheduling itself on the timer queue until the target has been reached. Another example is the jiq module, where the printing function reschedules itself to produce its output—the result is several iterations through the timer queue.
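A hedged sketch of the stepper-motor idea (all names here are hypothetical): the task issues one step per timer tick and requeues itself on tq_timer until the requested number of steps has been performed.

#include <linux/tqueue.h>

static int steps_left;                /* how many steps remain to be issued */
static struct tq_struct step_task;    /* static: next and sync start at zero */

static void step_once(void *unused)
{
    /* the actual outb() to the parallel port would go here; this runs
       at interrupt time, so it must not sleep */
    if (--steps_left > 0)
        queue_task(&step_task, &tq_timer);  /* run again on the next tick */
}

void start_motion(int nsteps)         /* called from the driver's write method */
{
    steps_left = nsteps;
    step_task.routine = step_once;
    step_task.data = NULL;
    queue_task(&step_task, &tq_timer);
}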
The easiest way to perform deferred execution is to use the queues that are already maintained by the kernel. There are a few of these queues, but your driver can use only three of them, described in the following list. The queues are declared in <linux/tqueue.h>, which you should include in your source.
- The scheduler queue
  The scheduler queue is unique among the predefined task queues in that it runs in process context, implying that the tasks it runs have a bit more freedom in what they can do. In Linux 2.4, this queue runs out of a dedicated kernel thread called keventd and is accessed via a function called schedule_task. In older versions of the kernel, keventd was not used, and the queue (tq_scheduler) was manipulated directly.

- tq_timer
  This queue is run by the timer tick. Because the tick (the function do_timer) runs at interrupt time, any task within this queue runs at interrupt time as well.

- tq_immediate
  The immediate queue is run as soon as possible, either on return from a system call or when the scheduler is run, whichever comes first. The queue is consumed at interrupt time.
Other predefined task queues exist as well, but they are not generally of interest to driver writers.
The timeline of a driver using a task queue is represented in Figure 6-1. The figure shows a driver that queues a function in tq_immediate from an interrupt handler.
Examples of deferred computation are available in the jiq (“Just In Queue”) module, from which the source in this section has been extracted. This module creates /proc files that can be read using dd or other tools; this is similar to jit.
The process reading a jiq file is put to sleep until the buffer is full.[28] This sleeping is handled with a simple wait queue, declared as
DECLARE_WAIT_QUEUE_HEAD (jiq_wait);
The buffer is filled by successive runs of a task queue. Each pass through the queue appends a text string to the buffer being filled; each string reports the current time (in jiffies), the process that is current during this pass, and the return value of in_interrupt.
The code for filling the buffer is confined to the jiq_print_tq function, which executes at each run through the queue being used. The printing function is not interesting and is not worth showing here; instead, let’s look at the initialization of the task to be inserted in a queue:
struct tq_struct jiq_task; /* global: initialized to zero */

    /* these lines are in jiq_init() */
    jiq_task.routine = jiq_print_tq;
    jiq_task.data = (void *)&jiq_data;
There’s no need to clear the sync and next fields of jiq_task because static variables are initialized to 0 by the compiler.
The scheduler queue is, in some ways, the easiest to use. Because tasks executed from this queue do not run in interrupt mode, they can do more things; in particular, they can sleep. Many parts of the kernel use this queue to accomplish a wide variety of tasks.
As of kernel 2.4.0-test11, the actual task queue implementing the scheduler queue is hidden from the rest of the kernel. Rather than use queue_task directly, code using this queue must call schedule_task to put a task on the queue:
int schedule_task(struct tq_struct *task);
task, of course, is the task to be scheduled. The return value is directly from queue_task: nonzero if the task was not already on the queue.
Again, as of 2.4.0-test11, the kernel runs a special process, called keventd, whose sole job is running tasks from the scheduler queue. keventd provides a predictable process context for the tasks it runs (unlike the previous implementation, which would run tasks under an essentially random process’s context).
There are a couple of implications to the keventd implementation that are worth keeping in mind. The first is that tasks in this queue can sleep, and some kernel code takes advantage of that freedom. Well-behaved code, however, should take care to sleep only for very short periods of time, since no other tasks will be run from the scheduler queue while keventd is sleeping. It is also a good idea to keep in mind that your task shares the scheduler queue with others, which can also sleep. In normal situations, tasks placed in the scheduler queue will run very quickly (perhaps even before schedule_task returns). If some other task sleeps, though, the time that elapses before your tasks execute could be significant. Tasks that absolutely have to run within a narrow time window should use one of the other queues.
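As a minimal sketch of exploiting that process context (the names slow_task, slow_work, and defer_slow_work are ours, not from jiq), a task placed on the scheduler queue may perform operations that could sleep, something no other predefined queue allows:

#include <linux/tqueue.h>
#include <linux/slab.h>

static struct tq_struct slow_task;

static void slow_work(void *data)
{
    /*
     * This runs in keventd's process context, so a sleeping allocation
     * is legal here; keep any sleep short, since keventd is shared
     * with the rest of the kernel.
     */
    void *buf = kmalloc(32, GFP_KERNEL);
    if (buf)
        kfree(buf);
}

void defer_slow_work(void)
{
    slow_task.routine = slow_work;
    slow_task.data = NULL;
    schedule_task(&slow_task);       /* queued on the hidden scheduler queue */
}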
/proc/jiqsched is a sample file that uses the scheduler queue. The read function for the file dispatches everything to the task queue in the following way:
int jiq_read_sched(char *buf, char **start, off_t offset, int len,
                   int *eof, void *data)
{
    jiq_data.len = 0;                /* nothing printed, yet */
    jiq_data.buf = buf;              /* print in this place */
    jiq_data.jiffies = jiffies;      /* initial time */

    /* jiq_print will queue_task() again in jiq_data.queue */
    jiq_data.queue = SCHEDULER_QUEUE;

    schedule_task(&jiq_task);             /* ready to run */
    interruptible_sleep_on(&jiq_wait);    /* sleep till completion */

    *eof = 1;
    return jiq_data.len;
}
Reading /proc/jiqsched produces output like the following:
    time  delta interrupt  pid cpu command
  601687      0         0    2   1 keventd
  601687      0         0    2   1 keventd
  601687      0         0    2   1 keventd
  601687      0         0    2   1 keventd
  601687      0         0    2   1 keventd
  601687      0         0    2   1 keventd
  601687      0         0    2   1 keventd
  601687      0         0    2   1 keventd
  601687      0         0    2   1 keventd
In this output, the time field is the value of jiffies when the task is run, delta is the change in jiffies since the last time the task ran, interrupt is the output of the in_interrupt function, pid is the ID of the running process, cpu is the number of the CPU being used (always 0 on uniprocessor systems), and command is the command being run by the current process.
In this case, we see that the task is always running under the keventd process. It also runs very quickly—a task that resubmits itself to the scheduler queue can run hundreds or thousands of times within a single timer tick. Even on a very heavily loaded system, the latency in the scheduler queue is quite small.
The timer queue is different from the scheduler queue in that the queue (tq_timer) is directly available. Also, of course, tasks run from the timer queue are run in interrupt mode. Additionally, you’re guaranteed that the queue will run at the next clock tick, thus eliminating latency caused by system load.
The sample code implements /proc/jiqtimer with the timer queue. For this queue, it must use queue_task to get things going:
int jiq_read_timer(char *buf, char **start, off_t offset, int len,
                   int *eof, void *data)
{
    jiq_data.len = 0;                /* nothing printed, yet */
    jiq_data.buf = buf;              /* print in this place */
    jiq_data.jiffies = jiffies;      /* initial time */
    jiq_data.queue = &tq_timer;      /* reregister yourself here */

    queue_task(&jiq_task, &tq_timer);     /* ready to run */
    interruptible_sleep_on(&jiq_wait);    /* sleep till completion */

    *eof = 1;
    return jiq_data.len;
}
The following is what head /proc/jiqtimer returned on a system that was compiling a new kernel:
    time  delta interrupt  pid cpu command
45084845      1         1 8783   0 cc1
45084846      1         1 8783   0 cc1
45084847      1         1 8783   0 cc1
45084848      1         1 8783   0 cc1
45084849      1         1 8784   0 as
45084850      1         1 8758   1 cc1
45084851      1         1 8789   0 cpp
45084852      1         1 8758   1 cc1
45084853      1         1 8758   1 cc1
45084854      1         1 8758   1 cc1
45084855      1         1 8758   1 cc1
Note, this time, that exactly one timer tick goes by between each invocation of the task, and that an arbitrary process is running.
The last predefined queue that can be used by modularized code is the immediate queue. This queue is run via the bottom-half mechanism, which means that one additional step is required to use it. Bottom halves are run only when the kernel has been told that a run is necessary; this is accomplished by “marking” the bottom half. In the case of tq_immediate, the necessary call is mark_bh(IMMEDIATE_BH). Be sure to call mark_bh after the task has been queued; otherwise, the kernel may run the task queue before your task has been added.
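As a hedged sketch of that ordering (the handler and task names are hypothetical), an interrupt handler defers its heavier processing to the immediate queue like this:

#include <linux/sched.h>
#include <linux/tqueue.h>
#include <linux/interrupt.h>

static struct tq_struct my_bh_task;   /* routine and data set up at init time */

void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    /* acknowledge the device here, then defer the heavier processing */
    queue_task(&my_bh_task, &tq_immediate);
    mark_bh(IMMEDIATE_BH);            /* mark only after the task is queued */
}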
The immediate queue is the fastest queue in the system—it’s executed soonest and is run in interrupt time. The queue is consumed either by the scheduler or as soon as one process returns from its system call. Typical output can look like this:
    time  delta interrupt  pid cpu command
45129449      0         1 8883   0 head
45129453      4         1    0   0 swapper
45129453      0         1  601   0 X
45129453      0         1  601   0 X
45129453      0         1  601   0 X
45129453      0         1  601   0 X
45129454      1         1    0   0 swapper
45129454      0         1  601   0 X
45129454      0         1  601   0 X
45129454      0         1  601   0 X
45129454      0         1  601   0 X
45129454      0         1  601   0 X
45129454      0         1  601   0 X
45129454      0         1  601   0 X
It’s clear that the queue can’t be used to delay the execution of a task—it’s an “immediate” queue. Instead, its purpose is to execute a task as soon as possible, but at a safe time. This feature makes it a great resource for interrupt handlers, because it offers them an entry point for executing program code outside of the actual interrupt management routine. The mechanism used to receive network packets, for example, is based on a similar mechanism.
Please note that you should not reregister your task in this queue (although we do it in jiqimmed for explanatory purposes). The practice gains nothing and may lock the computer hard if run on some version/platform pairs. Some implementations used to rerun the queue until it was empty. This was true, for example, for version 2.0 running on the PC platform.
Declaring a new task queue is not difficult. A driver is free to declare a new task queue, or even several of them; tasks are queued just as we’ve seen with the predefined queues discussed previously.
Unlike a predefined task queue, however, a custom queue is not automatically run by the kernel. The programmer who maintains a queue must arrange for a way of running it.
The following macro declares the queue and expands to a variable declaration. You’ll most likely place it at the beginning of your file, outside of any function:
DECLARE_TASK_QUEUE(tq_custom);
After declaring the queue, you can invoke the usual functions to queue tasks. The call just shown pairs naturally with the following:
queue_task(&custom_task, &tq_custom);
The following line will run tq_custom when it is time to execute the task-queue entries that have accumulated:
run_task_queue(&tq_custom);
If you want to experiment with custom queues now, you need to register a function to trigger the queue in one of the predefined queues. Although this may look like a roundabout way to do things, it isn’t. A custom queue can be useful whenever you need to accumulate jobs and execute them all at the same time, even if you use another queue to select that “same time.”
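A hedged sketch of that arrangement (the helper names are hypothetical): jobs accumulate on a custom queue, and a single trigger task on tq_timer consumes them all on the next tick.

#include <linux/tqueue.h>

DECLARE_TASK_QUEUE(tq_custom);          /* jobs accumulate here */

static void run_custom(void *unused)
{
    run_task_queue(&tq_custom);         /* consume every job queued so far */
}

static struct tq_struct trigger_task;   /* static: next and sync start at zero */

void custom_init(void)                  /* call once, e.g. from the init function */
{
    trigger_task.routine = run_custom;
    trigger_task.data = NULL;
}

void custom_add_job(struct tq_struct *job)
{
    queue_task(job, &tq_custom);            /* accumulate the job... */
    queue_task(&trigger_task, &tq_timer);   /* ...and run the queue on the next tick */
}

Because queue_task refuses to queue a task that is already pending, queueing trigger_task repeatedly is harmless; all accumulated jobs run together when the tick arrives.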
Shortly before the release of the 2.4 kernel, the developers added a new mechanism for the deferral of kernel tasks. This mechanism, called tasklets, is now the preferred way to accomplish bottom-half tasks; indeed, bottom halves themselves are now implemented with tasklets.
Tasklets resemble task queues in a number of ways. They are a way of deferring a task until a safe time, and they are always run in interrupt time. Like task queues, tasklets will be run only once, even if scheduled multiple times, but tasklets may be run in parallel with other (different) tasklets on SMP systems. On SMP systems, tasklets are also guaranteed to run on the CPU that first schedules them, which provides better cache behavior and thus better performance.
Each tasklet has associated with it a function that is called when the tasklet is to be executed. The life of some kernel developer was made easier by giving that function a single argument of type unsigned long, which makes life a little more annoying for those who would rather pass it a pointer; casting the long argument to a pointer type is a safe practice on all supported architectures and pretty common in memory management (as discussed in Chapter 13). The tasklet function is of type void; it returns no value.
Software support for tasklets is part of <linux/interrupt.h>, and the tasklet itself must be declared with one of the following:
- DECLARE_TASKLET(name, function, data);
  Declares a tasklet with the given name; when the tasklet is to be executed (as described later), the given function is called with the (unsigned long) data value.

- DECLARE_TASKLET_DISABLED(name, function, data);
  Declares a tasklet as before, but its initial state is “disabled,” meaning that it can be scheduled but will not be executed until enabled at some future time.
The sample jiq driver, when compiled against 2.4 headers, implements /proc/jiqtasklet, which works like the other jiq entries but uses tasklets; we didn’t emulate tasklets for older kernel versions in sysdep.h. The module declares its tasklet as
void jiq_print_tasklet (unsigned long);
DECLARE_TASKLET (jiq_tasklet, jiq_print_tasklet, (unsigned long) &jiq_data);
When your driver wants to schedule a tasklet to run, it calls tasklet_schedule:
tasklet_schedule(&jiq_tasklet);
Once a tasklet is scheduled, it is guaranteed to be run once (if enabled) at a safe time. Tasklets may reschedule themselves in much the same manner as task queues. A tasklet need not worry about running against itself on a multiprocessor system, since the kernel takes steps to ensure that any given tasklet is only running in one place. If your driver implements multiple tasklets, however, it should be prepared for the possibility that more than one of them could run simultaneously. In that case, spinlocks must be used to protect critical sections of the code (semaphores, which can sleep, may not be used in tasklets since they run in interrupt time).
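A hedged sketch of that situation (the names are ours, not from jiq): two tasklets updating the same counter, which must be protected by a spinlock because they may run concurrently on an SMP machine.

#include <linux/interrupt.h>
#include <linux/spinlock.h>

static spinlock_t shared_lock = SPIN_LOCK_UNLOCKED;
static unsigned long shared_count;      /* data touched by both tasklets */

static void first_func(unsigned long data)
{
    spin_lock(&shared_lock);            /* tasklets never sleep, so a spinlock is fine */
    shared_count += data;
    spin_unlock(&shared_lock);
}

static void second_func(unsigned long data)
{
    spin_lock(&shared_lock);
    shared_count -= data;
    spin_unlock(&shared_lock);
}

DECLARE_TASKLET(first_tasklet, first_func, 1);
DECLARE_TASKLET(second_tasklet, second_func, 1);

/*
 * Each tasklet is serialized with respect to itself, but not with the
 * other one; hence the lock around shared_count.
 */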
The output from /proc/jiqtasklet looks like this:
    time  delta interrupt  pid cpu command
45472377      0         1 8904   0 head
45472378      1         1    0   0 swapper
45472379      1         1    0   0 swapper
45472380      1         1    0   0 swapper
45472383      3         1    0   0 swapper
45472383      0         1  601   0 X
45472383      0         1  601   0 X
45472383      0         1  601   0 X
45472383      0         1  601   0 X
45472389      6         1    0   0 swapper
Note that the tasklet always runs on the same CPU, even though this output was produced on a dual-CPU system.
The tasklet subsystem provides a few other functions for advanced use of tasklets:
- void tasklet_disable(struct tasklet_struct *t);
  This function disables the given tasklet. The tasklet may still be scheduled with tasklet_schedule, but its execution will be deferred until a time when the tasklet has been enabled again.

- void tasklet_enable(struct tasklet_struct *t);
  Enables a tasklet that had been previously disabled. If the tasklet has already been scheduled, it will run soon (but not directly out of tasklet_enable).

- void tasklet_kill(struct tasklet_struct *t);
  This function may be used on tasklets that reschedule themselves indefinitely. tasklet_kill will remove the tasklet from any queue that it is on. In order to avoid race conditions with the tasklet rescheduling itself, this function waits until the tasklet executes, then pulls it from the queue. Thus, you can be sure that tasklets will not be interrupted partway through. If, however, the tasklet is not currently running and rescheduling itself, tasklet_kill may hang. tasklet_kill may not be called in interrupt time.
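As a hedged sketch of shutting down a self-rescheduling tasklet from a module’s cleanup code (all names here are hypothetical): the shutting_down flag simply stops the requeueing, and tasklet_kill then waits out any run that is already scheduled.

#include <linux/interrupt.h>

static void my_tasklet_func(unsigned long data);   /* forward declaration */
static int shutting_down;                          /* set before killing */

DECLARE_TASKLET(my_tasklet, my_tasklet_func, 0);

static void my_tasklet_func(unsigned long data)
{
    /* one unit of deferred work would go here; data is unused */
    if (!shutting_down)
        tasklet_schedule(&my_tasklet);             /* requeue ourselves */
}

void my_cleanup(void)                              /* process context only */
{
    shutting_down = 1;                             /* stop the rescheduling... */
    tasklet_kill(&my_tasklet);                     /* ...then wait for any final run */
}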
[28] The buffer of a /proc file is a page of memory, 4 KB, or whatever is appropriate for the platform you use.