Work queues were introduced in Linux 2.6 and replace a similar construct called “task queue” used in Linux 2.4. They allow kernel functions to be activated (much like deferrable functions) and later executed by special kernel threads called worker threads.
Despite their similarities, deferrable functions and work queues are quite different. The main difference is that deferrable functions run in interrupt context while functions in work queues run in process context. Running in process context is the only way to execute functions that can block (for instance, functions that need to access some block of data on disk) because, as already observed in the section "Nested Execution of Exception and Interrupt Handlers" earlier in this chapter, no process switch can take place in interrupt context. Neither deferrable functions nor functions in a work queue can access the User Mode address space of a process. In fact, a deferrable function cannot make any assumption about the process that is currently running when it is executed. On the other hand, a function in a work queue is executed by a kernel thread, so there is no User Mode address space to access.
The main data structure associated with a work queue is a descriptor called workqueue_struct, which contains, among other things, an array of NR_CPUS elements, the maximum number of CPUs in the system.[*] Each element is a descriptor of type cpu_workqueue_struct, whose fields are shown in Table 4-12.
Table 4-12. The fields of the cpu_workqueue_struct structure
Field name | Description |
---|---|
lock | Spin lock used to protect the structure |
remove_sequence | Sequence number used by flush_workqueue( ) |
insert_sequence | Sequence number used by flush_workqueue( ) |
worklist | Head of the list of pending functions |
more_work | Wait queue where the worker thread waiting for more work to be done sleeps |
work_done | Wait queue where the processes waiting for the work queue to be flushed sleep |
wq | Pointer to the workqueue_struct structure containing this descriptor |
thread | Process descriptor pointer of the worker thread of the structure |
run_depth | Current execution depth of run_workqueue( ) |
The worklist field of the cpu_workqueue_struct structure is the head of a doubly linked list collecting the pending functions of the work queue. Every pending function is represented by a work_struct data structure, whose fields are shown in Table 4-13.
Table 4-13. The fields of the work_struct structure
Field name | Description |
---|---|
pending | Set to 1 if the function is already in a work queue list, 0 otherwise |
entry | Pointers to next and previous elements in the list of pending functions |
func | Address of the pending function |
data | Pointer passed as a parameter to the pending function |
wq_data | Usually points to the parent cpu_workqueue_struct descriptor |
timer | Software timer used to delay the execution of the pending function |
The create_workqueue("foo") function receives as its parameter a string of characters and returns the address of a workqueue_struct descriptor for the newly created work queue. The function also creates n worker threads (where n is the number of CPUs effectively present in the system), named after the string passed to the function: foo/0, foo/1, and so on. The create_singlethread_workqueue( ) function is similar, but it creates just one worker thread, no matter how many CPUs are in the system. To destroy a work queue, the kernel invokes the destroy_workqueue( ) function, which receives as its parameter a pointer to a workqueue_struct descriptor.
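The naming rule for worker threads can be illustrated with a small user-space sketch. It is not kernel code: model_wq, model_create_workqueue( ), and model_destroy_workqueue( ) are hypothetical names invented here, and the number of CPUs is passed in explicitly instead of being detected.

```c
#include <stdio.h>
#include <stdlib.h>

#define NAME_LEN 16

/* Hypothetical user-space stand-in for a work queue: it only records
 * the names that the worker threads would receive. */
struct model_wq {
    int nr_workers;
    char (*names)[NAME_LEN];
};

/* Builds one worker name per CPU: "<name>/0", "<name>/1", ... */
static struct model_wq *model_create_workqueue(const char *name, int ncpus)
{
    struct model_wq *wq = malloc(sizeof(*wq));
    if (!wq)
        return NULL;
    wq->nr_workers = ncpus;
    wq->names = calloc((size_t)ncpus, NAME_LEN);
    if (!wq->names) {
        free(wq);
        return NULL;
    }
    for (int i = 0; i < ncpus; i++)
        snprintf(wq->names[i], NAME_LEN, "%s/%d", name, i);
    return wq;
}

/* Releases everything allocated by model_create_workqueue(). */
static void model_destroy_workqueue(struct model_wq *wq)
{
    free(wq->names);
    free(wq);
}
```

Calling model_create_workqueue("foo", 2) in this sketch yields the names foo/0 and foo/1, matching the convention described above.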
queue_work( ) inserts a function (already packaged inside a work_struct descriptor) in a work queue; it receives a pointer wq to the workqueue_struct descriptor and a pointer work to the work_struct descriptor. queue_work( ) essentially performs the following steps:

1. Checks whether the function to be inserted is already present in the work queue (work->pending field equal to 1); if so, terminates.
2. Adds the work_struct descriptor to the work queue list, and sets work->pending to 1.
3. If a worker thread is sleeping in the more_work wait queue of the local CPU’s cpu_workqueue_struct descriptor, the function wakes it up.
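The three steps can be mirrored in a single-threaded, user-space sketch. This is a model under stated assumptions, not the kernel implementation: all model_ names are hypothetical, the list is singly linked for brevity, locking is omitted, and waking the worker thread is reduced to a counter.

```c
#include <stddef.h>

/* Hypothetical stand-in for work_struct. */
struct model_work {
    int pending;                /* 1 while the work sits on a list */
    struct model_work *next;    /* simplified: the kernel list is doubly linked */
    void (*func)(void *);
    void *data;
};

/* Hypothetical stand-in for the local CPU's cpu_workqueue_struct. */
struct model_cpu_wq {
    struct model_work *head, *tail; /* plays the role of worklist */
    int worker_sleeping;            /* 1 if a worker waits on more_work */
    int wakeups;                    /* number of simulated wake-ups */
};

/* Model convention: returns 1 if the work was queued, 0 if it was
 * already pending. */
static int model_queue_work(struct model_cpu_wq *cwq, struct model_work *work)
{
    if (work->pending)              /* step 1: already queued, terminate */
        return 0;

    work->pending = 1;              /* step 2: append to the list */
    work->next = NULL;
    if (cwq->tail)
        cwq->tail->next = work;
    else
        cwq->head = work;
    cwq->tail = work;

    if (cwq->worker_sleeping) {     /* step 3: wake a sleeping worker */
        cwq->worker_sleeping = 0;
        cwq->wakeups++;
    }
    return 1;
}
```

Queuing the same descriptor twice in this model leaves the list unchanged the second time, which is the point of the pending check.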
The queue_delayed_work( ) function is nearly identical to queue_work( ), except that it receives a third parameter representing a time delay in system ticks (see Chapter 6). It is used to ensure a minimum delay before the execution of the pending function. In practice, queue_delayed_work( ) relies on the software timer in the timer field of the work_struct descriptor to defer the actual insertion of the work_struct descriptor in the work queue list. cancel_delayed_work( ) cancels a previously scheduled work queue function, provided that the corresponding work_struct descriptor has not already been inserted in the work queue list.
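The interplay between the timer and cancellation can be sketched in user space as follows. The model_ names and the explicit tick argument are assumptions made for illustration; the real functions rely on the kernel timer subsystem instead of a manually advanced clock.

```c
/* Hypothetical stand-in for a delayed work_struct. */
struct model_delayed_work {
    int timer_active;       /* 1 while the software timer is armed */
    unsigned long expires;  /* tick at which the timer fires */
    int on_list;            /* 1 once inserted in the work queue list */
};

/* Arms the timer; insertion in the list is deferred until it fires. */
static void model_queue_delayed_work(struct model_delayed_work *w,
                                     unsigned long now, unsigned long delay)
{
    w->timer_active = 1;
    w->expires = now + delay;
    w->on_list = 0;
}

/* Simulates the timer: once the delay has elapsed, the descriptor is
 * inserted in the work queue list. */
static void model_timer_tick(struct model_delayed_work *w, unsigned long now)
{
    if (w->timer_active && now >= w->expires) {
        w->timer_active = 0;
        w->on_list = 1;
    }
}

/* Returns 1 if the pending timer was cancelled, 0 if it was too late
 * because the descriptor had already been inserted in the list. */
static int model_cancel_delayed_work(struct model_delayed_work *w)
{
    if (!w->timer_active)
        return 0;
    w->timer_active = 0;
    return 1;
}
```

In this sketch, cancellation succeeds only while the timer is still armed, which corresponds to the condition stated above.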
Every worker thread continuously executes a loop inside the worker_thread( ) function; most of the time the thread is sleeping and waiting for some work to be queued. Once awakened, the worker thread invokes the run_workqueue( ) function, which essentially removes every work_struct descriptor from the work queue list of the worker thread and executes the corresponding pending function. Because work queue functions can block, the worker thread can be put to sleep and even migrated to another CPU when resumed.[*]
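A user-space sketch of the draining loop is shown below. It is a simplified model with hypothetical model_ names: there is no sleeping, no locking, and the list is singly linked. Clearing pending before invoking the function is a choice of this sketch; it lets a function queue itself again while it runs.

```c
#include <stddef.h>

/* Hypothetical stand-in for work_struct. */
struct model_work {
    int pending;
    struct model_work *next;
    void (*func)(void *);
    void *data;
};

/* Hypothetical stand-in for the worker's cpu_workqueue_struct. */
struct model_cpu_wq {
    struct model_work *head, *tail;
};

/* Sample pending function: increments the int that data points to. */
static void model_add_one(void *data)
{
    (*(int *)data)++;
}

/* Removes every descriptor from the list and executes its function.
 * Returns the number of functions executed. */
static int model_run_workqueue(struct model_cpu_wq *cwq)
{
    int executed = 0;

    while (cwq->head) {
        struct model_work *work = cwq->head;

        cwq->head = work->next;     /* unlink the first descriptor */
        if (!cwq->head)
            cwq->tail = NULL;
        work->next = NULL;
        work->pending = 0;          /* no longer on the list */

        work->func(work->data);     /* run the pending function */
        executed++;
    }
    return executed;
}
```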
Sometimes the kernel has to wait until all pending functions in a work queue have been executed. The flush_workqueue( ) function receives a workqueue_struct descriptor address and blocks the calling process until all functions that are pending in the work queue terminate. The function, however, does not wait for any pending function that was added to the work queue after flush_workqueue( ) was invoked; the remove_sequence and insert_sequence fields of every cpu_workqueue_struct descriptor are used to recognize the newly added pending functions.
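One way the two sequence numbers could separate old work from new work is sketched below in user space. The scheme is an assumption made for illustration (a counter bumped on every insertion, another bumped on every completion, and a target snapshot taken when the flush starts); the model_ names are hypothetical and no blocking is modeled.

```c
/* Hypothetical stand-in for the two counters in cpu_workqueue_struct. */
struct model_cpu_wq {
    long insert_sequence;   /* bumped whenever a function is queued */
    long remove_sequence;   /* bumped whenever a function completes */
};

static void model_insert(struct model_cpu_wq *c)
{
    c->insert_sequence++;
}

static void model_complete(struct model_cpu_wq *c)
{
    c->remove_sequence++;
}

/* Snapshot taken when the flush starts: functions queued later get a
 * higher sequence number and are not waited for. */
static long model_flush_begin(const struct model_cpu_wq *c)
{
    return c->insert_sequence;
}

/* Nonzero once every function queued before the snapshot has run. */
static int model_flush_done(const struct model_cpu_wq *c, long target)
{
    return c->remove_sequence - target >= 0;
}
```

In this model, a function inserted after model_flush_begin( ) raises insert_sequence but does not move the target, so the flush completes without waiting for it.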
In most cases, creating a whole set of worker threads in order to run a function is overkill. Therefore, the kernel offers a predefined work queue called events, which can be freely used by every kernel developer. The predefined work queue is nothing more than a standard work queue that may include functions of different kernel layers and I/O drivers; the address of its workqueue_struct descriptor is stored in the keventd_wq variable. To make use of the predefined work queue, the kernel offers the functions listed in Table 4-14.
Table 4-14. Helper functions for the predefined work queue
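Predefined work queue function | Equivalent standard work queue function |
---|---|
schedule_work(w) | queue_work(keventd_wq,w) |
schedule_delayed_work(w,d) | queue_delayed_work(keventd_wq,w,d) (on any CPU) |
schedule_delayed_work_on(cpu,w,d) | queue_delayed_work(keventd_wq,w,d) (on a given CPU) |
flush_scheduled_work( ) | flush_workqueue(keventd_wq) |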
The predefined work queue saves significant system resources when the function is seldom invoked. On the other hand, functions executed in the predefined work queue should not block for a long time: because the execution of the pending functions in the work queue list is serialized on each CPU, a long delay negatively affects the other users of the predefined work queue.
In addition to the general events queue, you’ll find a few specialized work queues in Linux 2.6. The most significant is the kblockd work queue used by the block device layer (see Chapter 14).
[*] The reason for duplicating the work queue data structures in multiprocessor systems is that per-CPU local data structures yield much more efficient code (see the section "Per-CPU Variables" in Chapter 5).
[*] Strangely enough, a worker thread can be executed by any CPU, not just the CPU corresponding to the cpu_workqueue_struct descriptor to which the worker thread belongs. Therefore, queue_work( ) inserts a function in the queue of the local CPU, but that function may be executed by any CPU in the system.