As we explained earlier, most exceptions are handled simply by sending a Unix signal to the process that caused the exception. The action to be taken is thus deferred until the process receives the signal; as a result, the kernel is able to process the exception quickly.
This approach does not hold for interrupts because they frequently arrive long after the process to which they are related (for instance, a process that requested a data transfer) has been suspended and a completely unrelated process is running. So it would make no sense to send a Unix signal to the current process.
Interrupt handling depends on the type of interrupt. For our purposes, we’ll distinguish three main classes of interrupts:
- I/O interrupts
Some I/O devices require attention; the corresponding interrupt handler must query the device to determine the proper course of action. We cover this type of interrupt in Section 4.6.1 later in this chapter.
- Timer interrupts
Some timer, either a local APIC timer or an external timer, has issued an interrupt; this kind of interrupt tells the kernel that a fixed-time interval has elapsed. These interrupts are handled mostly as I/O interrupts; we discuss the peculiar characteristics of timer interrupts in Chapter 6.
- Interprocessor interrupts
A CPU issued an interrupt to another CPU of a multiprocessor system. We cover such interrupts in Section 4.6.2 later in this chapter.
In general, an I/O interrupt handler must be flexible enough to service several devices at the same time. In the PCI bus architecture, for instance, several devices may share the same IRQ line. This means that the interrupt vector alone does not tell the whole story. In the example shown in Table 4-3, the same vector 43 is assigned to the USB port and to the sound card. However, some hardware devices found in older PC architectures (like ISA) do not reliably operate if their IRQ line is shared with other devices.
Interrupt handler flexibility is achieved in two distinct ways, as discussed in the following list.
- IRQ sharing
The interrupt handler executes several interrupt service routines (ISRs). Each ISR is a function related to a single device sharing the IRQ line. Since it is not possible to know in advance which particular device issued the IRQ, each ISR is executed to verify whether its device needs attention; if so, the ISR performs all the operations that need to be executed when the device raises an interrupt.
- IRQ dynamic allocation
An IRQ line is associated with a device at the last possible moment; for instance, the IRQ line of the floppy device is allocated only when a user accesses the floppy disk device. In this way, the same IRQ vector may be used by several hardware devices even if they cannot share the IRQ line, although not at the same time.
Not all actions to be performed when an interrupt occurs have the same urgency. In fact, the interrupt handler itself is not a suitable place for all kinds of actions. Long noncritical operations should be deferred, since while an interrupt handler is running, the signals on the corresponding IRQ line are temporarily ignored. Most important, the process on behalf of which an interrupt handler is executed must always stay in the TASK_RUNNING state, or a system freeze can occur. Therefore, interrupt handlers cannot perform any blocking procedure such as an I/O disk operation.
Linux divides the actions to be performed following an interrupt into three classes:
- Critical
Actions such as acknowledging an interrupt to the PIC, reprogramming the PIC or the device controller, or updating data structures accessed by both the device and the processor. These can be executed quickly and are critical because they must be performed as soon as possible. Critical actions are executed within the interrupt handler immediately, with maskable interrupts disabled.
- Noncritical
Actions such as updating data structures that are accessed only by the processor (for instance, reading the scan code after a keyboard key has been pushed). These actions can also finish quickly, so they are executed by the interrupt handler immediately, with the interrupts enabled.
- Noncritical deferrable
Actions such as copying a buffer’s contents into the address space of some process (for instance, sending the keyboard line buffer to the terminal handler process). These may be delayed for a long time interval without affecting the kernel operations; the interested process will just keep waiting for the data. Noncritical deferrable actions are performed by means of separate functions that are discussed in Section 4.7 later in this chapter.
Regardless of the kind of circuit that caused the interrupt, all I/O interrupt handlers perform the same four basic actions:
- Save the IRQ value and the register contents in the Kernel Mode stack.
- Send an acknowledgment to the PIC that is servicing the IRQ line, thus allowing it to issue further interrupts.
- Execute the interrupt service routines (ISRs) associated with all the devices that share the IRQ.
- Terminate by jumping to the ret_from_intr( ) address.
Several descriptors are needed to represent both the state of the IRQ lines and the functions to be executed when an interrupt occurs. Figure 4-3 represents in a schematic way the hardware circuits and the software functions used to handle an interrupt. These functions are discussed in the following sections.
As illustrated in Table 4-2, physical IRQs may be assigned any vector in the range 32-238. However, Linux uses vector 128 to implement system calls.
The IBM-compatible PC architecture requires that some devices be statically connected to specific IRQ lines. In particular:
- The interval timer device must be connected to the IRQ0 line (see Chapter 6).
- The slave 8259A PIC must be connected to the IRQ2 line (although more advanced PICs are now being used, Linux still supports 8259A-style PICs).
- The external mathematical coprocessor must be connected to the IRQ13 line (although recent 80x86 processors no longer use such a device, Linux continues to support the hardy 80386 model).
In general, an I/O device can be connected to a limited number of IRQ lines. (As a matter of fact, when playing with an old PC where IRQ sharing is not possible, you might not succeed in installing a new card because of IRQ conflicts with other already present hardware devices.)
Table 4-2. Interrupt vectors in Linux
Vector range | Use |
---|---|
0-19 | Nonmaskable interrupts and exceptions |
20-31 | Intel-reserved |
32-127 | External interrupts (IRQs) |
128 | Programmed exception for system calls (see Chapter 9) |
129-238 | External interrupts (IRQs) |
239 | Local APIC timer interrupt (see Chapter 6) |
240-250 | Reserved by Linux for future use |
251-255 | Interprocessor interrupts (see Section 4.6.2 later in this chapter) |
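The ranges of Table 4-2 can be restated as a small lookup function. This is only a sketch for reading the table: the enum and the classify_vector( ) function are names invented here and do not exist in the kernel.

```c
/* Categories copied from Table 4-2. All names in this block are
 * hypothetical; they are not kernel identifiers. */
enum vector_use {
    VEC_EXCEPTION,       /* 0-19: nonmaskable interrupts and exceptions */
    VEC_INTEL_RESERVED,  /* 20-31 */
    VEC_EXTERNAL_IRQ,    /* 32-127 and 129-238 */
    VEC_SYSCALL,         /* 128 */
    VEC_APIC_TIMER,      /* 239 */
    VEC_LINUX_RESERVED,  /* 240-250 */
    VEC_IPI,             /* 251-255 */
    VEC_INVALID          /* anything outside 0-255 */
};

/* Map a vector number to the use listed for it in Table 4-2. */
enum vector_use classify_vector(int vector)
{
    if (vector < 0 || vector > 255)
        return VEC_INVALID;
    if (vector <= 19)
        return VEC_EXCEPTION;
    if (vector <= 31)
        return VEC_INTEL_RESERVED;
    if (vector == 128)
        return VEC_SYSCALL;
    if (vector <= 238)
        return VEC_EXTERNAL_IRQ;
    if (vector == 239)
        return VEC_APIC_TIMER;
    if (vector <= 250)
        return VEC_LINUX_RESERVED;
    return VEC_IPI;
}
```

For instance, classify_vector(43) yields VEC_EXTERNAL_IRQ, while classify_vector(128) yields VEC_SYSCALL even though 128 lies inside the range otherwise used for external interrupts.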
There are three ways to select a line for an IRQ-configurable device:
- By setting some hardware jumpers (only on very old device cards).
- By a utility program shipped with the device and executed when installing it. Such a program may either ask the user to select an available IRQ number or probe the system to determine an available number by itself.
- By a hardware protocol executed at system startup. Peripheral devices declare which interrupt lines they are ready to use; the final values are then negotiated to reduce conflicts as much as possible. Once this is done, each interrupt handler can read the assigned IRQ by using a function that accesses some I/O ports of the device. For instance, drivers for devices that comply with the Peripheral Component Interconnect (PCI) standard use a group of functions such as pci_read_config_byte( ) to access the device configuration space.
Table 4-3 shows a fairly arbitrary arrangement of devices and IRQs, such as those that might be found on one particular PC.
Table 4-3. An example of IRQ assignment to I/O devices
IRQ | INT | Hardware Device |
---|---|---|
0 | 32 | Timer |
1 | 33 | Keyboard |
2 | 34 | PIC cascading |
3 | 35 | Second serial port |
4 | 36 | First serial port |
6 | 38 | Floppy disk |
8 | 40 | System clock |
10 | 42 | Network interface |
11 | 43 | USB port, sound card |
12 | 44 | PS/2 mouse |
13 | 45 | Mathematical coprocessor |
14 | 46 | EIDE disk controller’s first chain |
15 | 47 | EIDE disk controller’s second chain |
The kernel must discover the correspondence between the IRQ number and the I/O device before enabling interrupts. Otherwise, how could the kernel handle a signal from, for example, a SCSI disk without knowing which vector corresponds to the device? The correspondence is established while initializing each device driver (see Chapter 13).
As always, when discussing complicated operations involving state transitions, it helps to understand first where key data is stored. Thus, this section explains the data structures that support interrupt handling and how they are laid out in various descriptors. Figure 4-4 illustrates schematically the relationships between the main descriptors that represent the state of the IRQ lines. (The figure does not illustrate the data structures needed to handle softirqs, tasklets, and bottom halves; they are discussed later in this chapter.)
An irq_desc array groups together NR_IRQS (usually 224) irq_desc_t descriptors, which include the following fields:
- status
A set of flags describing the IRQ line status (see Table 4-4).
Table 4-4. Flags describing the IRQ line status
Flag name | Description |
---|---|
 | A handler for the IRQ is being executed. |
 | The IRQ line has been deliberately disabled by a device driver. |
 | An IRQ has occurred on the line; its occurrence has been acknowledged to the PIC, but it has not yet been serviced by the kernel. |
 | The IRQ line has been disabled but the previous IRQ occurrence has not yet been acknowledged to the PIC. |
 | The kernel uses the IRQ line while performing a hardware device probe. |
 | The kernel uses the IRQ line while performing a hardware device probe; moreover, the corresponding interrupt has not been raised. |
 | Not used on the 80x86 architecture. |
 | Not used. |
 | Not used on the 80x86 architecture. |
- handler
Points to the hw_interrupt_type descriptor that identifies the PIC circuit servicing the IRQ line.
- action
Identifies the interrupt service routines to be invoked when the IRQ occurs. The field points to the first element of the list of irqaction descriptors associated with the IRQ. The irqaction descriptor is described later in the chapter.
- depth
Shows 0 if the IRQ line is enabled and a positive value if it has been disabled at least once. Every time the disable_irq( ) or disable_irq_nosync( ) function is invoked, the field is incremented; if depth was equal to 0, the function disables the IRQ line and sets its IRQ_DISABLED flag.[28] Conversely, each invocation of the enable_irq( ) function decrements the field; if depth becomes 0, the function enables the IRQ line and clears its IRQ_DISABLED flag.
- lock
A spin lock used to serialize the accesses to the IRQ descriptor (see Chapter 5).
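The nesting behavior of the depth field can be tried out in user space. The following is a minimal sketch, not kernel code: the irq_line structure, the hw_masked member (a stand-in for the mask bit in the PIC), the sketch_ function names, and the numeric value given to IRQ_DISABLED are all assumptions made for illustration, and the spin lock is omitted.

```c
#define IRQ_DISABLED 2   /* flag value chosen for this sketch */

/* Reduced stand-in for an IRQ descriptor (hypothetical layout). */
struct irq_line {
    unsigned int status;   /* holds IRQ_DISABLED when the line is off */
    unsigned int depth;    /* number of outstanding disable requests */
    int hw_masked;         /* models the line's mask bit in the PIC */
};

/* Only the first disable request touches the (simulated) hardware. */
void sketch_disable_irq(struct irq_line *line)
{
    if (line->depth++ == 0) {
        line->status |= IRQ_DISABLED;
        line->hw_masked = 1;
    }
}

/* Only the enable request that brings depth back to 0 unmasks the line. */
void sketch_enable_irq(struct irq_line *line)
{
    if (line->depth == 0)
        return;            /* unbalanced enable: ignored in this sketch */
    if (--line->depth == 0) {
        line->status &= ~IRQ_DISABLED;
        line->hw_masked = 0;
    }
}
```

With this scheme, two drivers (or two code paths of the same driver) can each disable and re-enable the line without knowing about each other; the line stays masked until the last of them calls the enable function.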
During system initialization, the init_IRQ( ) function sets the status field of each IRQ main descriptor to IRQ_DISABLED. Moreover, init_IRQ( ) updates the IDT by replacing the provisional interrupt gates with new ones. This is accomplished through the following statements:
for (i = 0; i < NR_IRQS; i++)
    if (i+32 != 128)
        set_intr_gate(i+32, interrupt[i]);
This code looks in the interrupt array to find the interrupt handler addresses that it uses to set up the interrupt gates. The interrupt handler for IRQn is named IRQn_interrupt( ) (see Section 4.6.1.4 later in this chapter).
Some of the interrupt gates will never be used; others will be used only in multiprocessor systems; finally, some of them are always used. Thus, some of the interrupt gates are set to their final values, while others aren’t. More precisely:
- The gates of the first 16 IRQs (vectors 32-47) are set to their final values.
- In multiprocessor systems, the gates of the interprocessor interrupts and the gate of the local APIC timer interrupt are also set properly (see Section 4.6.1.7 later in this chapter).
- Vector 128 is left untouched, since it is used for the system call’s programmed exception.
- All remaining gates are reserved for interrupts issued from devices connected to a PCI bus. In this case, the handler field of the irq_desc element is initialized to the no_irq_type null handler.
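To see which vectors the init_IRQ( ) loop touches, the loop can be replayed against a plain array. This is a user-space sketch under stated assumptions: the idt array, the stub functions, and init_gates( ) are invented here, set_intr_gate( ) is reduced to an array store, and every IRQ shares one stub handler, whereas the real interrupt array holds a distinct entry point per IRQ.

```c
#include <stddef.h>

#define NR_IRQS  224   /* value quoted in the text */
#define IDT_SIZE 256

typedef void (*gate_fn)(void);

static gate_fn idt[IDT_SIZE];        /* stand-in for the real IDT */
static gate_fn interrupt[NR_IRQS];   /* stand-in for the interrupt array */

static void stub_handler(void) { }   /* hypothetical IRQ entry point */
static void syscall_stub(void) { }   /* hypothetical system call gate */

/* Reduced set_intr_gate(): just records the handler for the vector. */
static void set_intr_gate(int vector, gate_fn handler)
{
    idt[vector] = handler;
}

/* Replays the loop shown in the text; returns how many gates it installed. */
int init_gates(void)
{
    int i, installed = 0;

    idt[128] = syscall_stub;          /* assume the gate was set up earlier */
    for (i = 0; i < NR_IRQS; i++)
        interrupt[i] = stub_handler;

    for (i = 0; i < NR_IRQS; i++)
        if (i + 32 != 128) {          /* skip the system call vector */
            set_intr_gate(i + 32, interrupt[i]);
            installed++;
        }
    return installed;
}
```

Since NR_IRQS is 224, the loop covers vectors 32 through 255 and skips only vector 128, so 223 gates are written and the system call gate keeps whatever value it had.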
In addition to the 8259A chip that was mentioned near the beginning of this chapter, Linux supports several other PIC circuits such as the SMP IO-APIC, PIIX4’s internal 8259 PIC, and SGI’s Visual Workstation Cobalt (IO-)APIC. To handle all such devices in a uniform way, Linux uses a “PIC object,” consisting of the PIC name and seven PIC standard methods. The advantage of this object-oriented approach is that drivers need not be aware of the kind of PIC installed in the system. Each driver-visible interrupt source is transparently wired to the appropriate controller. The data structure that defines a PIC object is called hw_interrupt_type (also called hw_irq_controller).
For the sake of concreteness, let’s assume that our computer is a uniprocessor with two 8259A PICs, which provide 16 standard IRQs. In this case, the handler field in each of the 16 irq_desc_t descriptors points to the i8259A_irq_type variable, which describes the 8259A PIC. This variable is initialized as follows:
struct hw_interrupt_type i8259A_irq_type = {
    "XT-PIC",
    startup_8259A_irq,
    shutdown_8259A_irq,
    enable_8259A_irq,
    disable_8259A_irq,
    mask_and_ack_8259A,
    end_8259A_irq,
    NULL
};
The first field in this structure, "XT-PIC", is the PIC name. Next come the pointers to six different functions used to program the PIC. The first two functions start up and shut down an IRQ line of the chip, respectively. But in the case of the 8259A chip, these functions coincide with the third and fourth functions, which enable and disable the line. The mask_and_ack_8259A( ) function acknowledges the IRQ received by sending the proper bytes to the 8259A I/O ports. The end_8259A_irq( ) function is invoked when the interrupt handler for the IRQ line terminates. The last set_affinity method is set to NULL: it is used in multiprocessor systems to declare the “affinity” of CPUs for specified IRQs (that is, which CPUs are enabled to handle specific IRQs).
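The “PIC object” idea is an ordinary table of function pointers. The sketch below imitates it in user space: the pic_ops structure is modeled on the description above, but its exact member types, the fake_ functions, and the 16-bit fake_mask variable that plays the role of the 8259A mask registers are assumptions made for illustration.

```c
#include <stddef.h>

/* Modeled on the PIC object described in the text: a name plus seven
 * methods. Member types are assumptions of this sketch. */
struct pic_ops {
    const char *typename;
    unsigned int (*startup)(unsigned int irq);
    void (*shutdown)(unsigned int irq);
    void (*enable)(unsigned int irq);
    void (*disable)(unsigned int irq);
    void (*ack)(unsigned int irq);
    void (*end)(unsigned int irq);
    void (*set_affinity)(unsigned int irq, unsigned long mask);
};

/* One bit per IRQ line; a set bit means the line is masked. */
static unsigned int fake_mask = 0xffff;

static void fake_enable(unsigned int irq)  { fake_mask &= ~(1u << irq); }
static void fake_disable(unsigned int irq) { fake_mask |= (1u << irq); }

/* As with the 8259A, startup and shutdown reduce to enable and disable. */
static unsigned int fake_startup(unsigned int irq)
{
    fake_enable(irq);
    return 0;
}

/* Acknowledging also masks the line; end unmasks it again. */
static void fake_mask_and_ack(unsigned int irq) { fake_disable(irq); }
static void fake_end(unsigned int irq)          { fake_enable(irq); }

static struct pic_ops fake_pic = {
    "FAKE-PIC",
    fake_startup,
    fake_disable,        /* shutdown coincides with disable */
    fake_enable,
    fake_disable,
    fake_mask_and_ack,
    fake_end,
    NULL                 /* no set_affinity on a uniprocessor */
};

/* Helper for inspecting the simulated mask register. */
int line_is_masked(unsigned int irq)
{
    return (fake_mask >> irq) & 1u;
}
```

Code that goes through fake_pic.ack( ) and fake_pic.end( ) never needs to know which controller sits behind the pointers; swapping in another pic_ops variable would change the hardware programming without changing the callers.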
As described earlier, multiple devices can share a single IRQ. Therefore, the kernel maintains irqaction descriptors, each of which refers to a specific hardware device and a specific interrupt. The descriptor includes the following fields:
- handler
Points to the interrupt service routine for an I/O device. This is the key field that allows many devices to share the same IRQ.
- flags
Describes the relationships between the IRQ line and the I/O device (see Table 4-5).
Table 4-5. Flags of the irqaction descriptor
Flag name | Description |
---|---|
 | The handler must execute with interrupts disabled. |
 | The device permits its IRQ line to be shared with other devices. |
 | The device may be considered a source of events that occurs randomly; it can thus be used by the kernel random number generator. (Users can access this feature by taking random numbers from the |
- name
The name of the I/O device (shown when listing the serviced IRQs by reading the /proc/interrupts file).
- dev_id
A private field for the I/O device. Typically, it identifies the I/O device itself (for instance, it could be equal to its major and minor numbers; see Section 13.2), or it points to a device driver’s data.
- next
Points to the next element of a list of irqaction descriptors. The elements in the list refer to hardware devices that share the same IRQ.
Finally, the irq_stat array includes NR_CPUS entries, one for each CPU in the system. Each entry is of type irq_cpustat_t, and includes a few counters and flags used by the kernel to keep track of what any CPU is currently doing. The most important fields are usually accessed through some macros having as a parameter the CPU logical number (that is, the index of the array).
In particular, the local_irq_count(n) macro selects the __local_irq_count field of the nth entry of the array. The field is a counter of how many interrupt handlers are stacked in the CPU, that is, how many interrupt handlers have been started and are not yet terminated.
Linux sticks to the Symmetric Multiprocessing model (SMP); this means, essentially, that the kernel should not have any bias toward one CPU with respect to the others. As a consequence, the kernel tries to distribute the IRQ signals coming from the hardware devices in a round-robin fashion among all the CPUs. Therefore, all the CPUs spend approximately the same fraction of their execution time servicing I/O interrupts.
In Section 4.2.1.1 earlier in this chapter, we said that the multi-APIC system has sophisticated mechanisms to dynamically distribute the IRQ signals among the CPUs. Therefore, the Linux kernel has to do very little to enforce the round-robin distribution scheme.
During system bootstrap, the booting CPU executes the setup_IO_APIC_irqs( ) function to initialize the I/O APIC chip. The 24 entries of the Interrupt Redirection Table of the chip are filled so that all IRQ signals from the I/O hardware devices can be routed to each CPU in the system according to the “lowest priority” scheme. During system bootstrap, moreover, all CPUs execute the setup_local_APIC( ) function, which takes care of initializing the local APICs. In particular, the task priority register (TPR) of each chip is initialized to a fixed value, meaning that the CPU is willing to handle any kind of IRQ signal, regardless of its priority. The Linux kernel never modifies this value after its initialization.
Since all task priority registers contain the same value, all CPUs always have the same priority. To break a tie, the multi-APIC system uses the values in the arbitration priority registers of local APICs, as explained earlier. Since such values are automatically changed after every interrupt, the IRQ signals are fairly distributed among all CPUs.[29]
In short, when a hardware device raises an IRQ signal, the multi-APIC system selects one of the CPUs and delivers the signal to the corresponding local APIC, which in turn interrupts its CPU. All other CPUs are not notified of the event. All this is magically done by the hardware, so it is of no concern for the kernel after multi-APIC system initialization.
When a CPU receives an interrupt, it starts executing the code at the address found in the corresponding gate of the IDT (see Section 4.2.4 earlier in this chapter).
As with other context switches, the need to save registers leaves the kernel developer with a somewhat messy coding job because the registers have to be saved and restored using assembly language code. However, within those operations, the processor is expected to call and return from a C function. In this section, we describe the assembly language task of handling registers; in the next, we show some of the acrobatics required in the C function that is subsequently invoked.
Saving registers is the first task of the interrupt handler. As already mentioned, the interrupt handler for IRQn is named IRQn_interrupt, and its address is included in the interrupt gate stored in the proper IDT entry.
In uniprocessor systems, the same BUILD_IRQ macro is duplicated 16 times, once for each IRQ number, in order to yield 16 different interrupt handler entry points. In multiprocessor systems, the macro is duplicated 14 × 16 times for a grand total of 224 interrupt handler entry points. Each macro occurrence expands to the following assembly language fragment:
IRQn_interrupt:
    pushl $n-256
    jmp common_interrupt
The result is to save on the stack the IRQ number associated with the interrupt minus 256.[30]
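The encoding loses no information, because the low-order byte of n-256 is n itself for every IRQ number below 256; this is what the mask with 0xff in do_IRQ( ) relies on. The arithmetic can be checked with the small sketch below, in which encode_irq( ) and decode_irq( ) are hypothetical helper names, not kernel functions.

```c
/* Value pushed on the stack by the IRQn_interrupt entry point. */
int encode_irq(int n)
{
    return n - 256;
}

/* Recover the IRQ number from the saved value by keeping the low byte.
 * The conversion to unsigned is done first so that the bit mask is
 * applied to a well-defined (modular) representation. */
int decode_irq(int orig_eax)
{
    return (int)((unsigned int)orig_eax & 0xff);
}
```

For example, IRQ 6 is saved as -250; in 32-bit two’s complement this is 0xffffff06, whose low byte is 0x06.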
The same code for all interrupt handlers can then be executed while referring to this number. The common code can be found in the BUILD_COMMON_IRQ macro, which expands to the following assembly language fragment:
common_interrupt:
    SAVE_ALL
    call do_IRQ
    jmp ret_from_intr
The SAVE_ALL macro, in turn, expands to the following fragment:
cld
push %es
push %ds
pushl %eax
pushl %ebp
pushl %edi
pushl %esi
pushl %edx
pushl %ecx
pushl %ebx
movl $__KERNEL_DS,%edx
movl %edx,%ds
movl %edx,%es
SAVE_ALL saves all the CPU registers that may be used by the interrupt handler on the stack, except for eflags, cs, eip, ss, and esp, which are already saved automatically by the control unit (see Section 4.2.4 earlier in this chapter). The macro then loads the selector of the kernel data segment into ds and es.
After saving the registers, BUILD_COMMON_IRQ invokes the do_IRQ( ) function. Then, when the ret instruction of do_IRQ( ) is executed (when that function terminates), control is transferred to ret_from_intr( ) (see Section 4.8 later in this chapter).
The do_IRQ( ) function is invoked to execute all interrupt service routines associated with an interrupt. When it starts, the kernel stack contains, from the top down:
- The do_IRQ( )’s return address (the starting address of ret_from_intr( ))
- The group of register values pushed on by SAVE_ALL
- The encoding of the IRQ number
- The registers saved automatically by the control unit when it recognized the interrupt
Since the C compiler places all the parameters on top of the stack, the do_IRQ( ) function is declared as follows:
unsigned int do_IRQ(struct pt_regs regs)
where the pt_regs structure consists of 15 fields:
- The first nine fields are the register values pushed by SAVE_ALL.
- The tenth field, referenced through a field called orig_eax, encodes the IRQ number.
- The remaining fields correspond to the register values pushed on automatically by the control unit.[31]
The do_IRQ( ) function is equivalent to the following code fragment. Don’t be scared by this function; we are going to explain the code line by line.
int irq = regs.orig_eax & 0xff;
spin_lock(&(irq_desc[irq].lock));
irq_desc[irq].handler->ack(irq);
irq_desc[irq].status &= ~(IRQ_REPLAY | IRQ_WAITING);
irq_desc[irq].status |= IRQ_PENDING;
if (!(irq_desc[irq].status & (IRQ_DISABLED | IRQ_INPROGRESS))
        && irq_desc[irq].action) {
    irq_desc[irq].status |= IRQ_INPROGRESS;
    do {
        irq_desc[irq].status &= ~IRQ_PENDING;
        spin_unlock(&(irq_desc[irq].lock));
        handle_IRQ_event(irq, &regs, irq_desc[irq].action);
        spin_lock(&(irq_desc[irq].lock));
    } while (irq_desc[irq].status & IRQ_PENDING);
    irq_desc[irq].status &= ~IRQ_INPROGRESS;
}
irq_desc[irq].handler->end(irq);
spin_unlock(&(irq_desc[irq].lock));
if (softirq_pending(smp_processor_id( )))
    do_softirq( );
First of all, the do_IRQ( ) function gets the IRQ vector passed as a parameter on the stack and puts it in the irq local variable. This value is used as an index to access the proper element of the irq_desc array (the IRQ main descriptor).
Before accessing the main IRQ descriptor, the kernel acquires the corresponding spin lock. We’ll see in Chapter 5 that the spin lock protects against concurrent accesses by different CPUs (in a uniprocessor system, the spin_lock( ) function does nothing). This spin lock is necessary in a multiprocessor system because other interrupts of the same kind may be raised, and other CPUs might take care of the new interrupt occurrences. Without the spin lock, the main IRQ descriptor would be accessed concurrently by several CPUs. As we’ll see, this situation must be absolutely avoided.
After acquiring the spin lock, the function invokes the ack method of the main IRQ descriptor. In a uniprocessor system, the corresponding mask_and_ack_8259A( ) function acknowledges the interrupt on the PIC and also disables the IRQ line. Masking the IRQ line ensures that the CPU does not accept further occurrences of this type of interrupt until the handler terminates. Remember that the do_IRQ( ) function runs with local interrupts disabled; in fact, the CPU control unit automatically clears the IF flag of the eflags register because the interrupt handler is invoked through an IDT’s interrupt gate. However, we’ll see shortly that the kernel might re-enable local interrupts before executing the interrupt service routines of this interrupt.
In a multiprocessor system, however, things are much more complicated. Depending on the type of interrupt, acknowledging the interrupt could either be done by the ack method or delayed until the interrupt handler terminates (that is, acknowledgement could be done by the end method). In either case, we can take for granted that the local APIC doesn’t accept further interrupts of this type until the handler terminates, although further occurrences of this type of interrupt may be accepted by other CPUs (the main IRQ descriptor’s spin lock comes to the rescue!).
The do_IRQ( ) function then initializes a few flags of the main IRQ descriptor. It sets the IRQ_PENDING flag because the interrupt has been acknowledged (well, sort of), but not yet really serviced; it also clears the IRQ_WAITING and IRQ_REPLAY flags (but we don’t have to care about them now).
Now do_IRQ( ) checks whether it must really handle the interrupt. There are three cases in which nothing has to be done. These are discussed in the following list.
- IRQ_DISABLED is set
A CPU might execute the do_IRQ( ) function even if the corresponding IRQ line is disabled; you’ll find an explanation for this nonintuitive case in Section 4.6.1.6 later in this chapter. Moreover, buggy motherboards may generate spurious interrupts even when the IRQ line is disabled in the PIC.
- IRQ_INPROGRESS is set
In a multiprocessor system, another CPU might be handling a previous occurrence of the same interrupt. Why not defer the handling of this occurrence to that CPU? This is exactly what is done by Linux. This leads to a simpler kernel architecture because device drivers’ interrupt service routines need not be reentrant (their execution is serialized). Moreover, the freed CPU can quickly return to what it was doing, without dirtying its hardware cache; this is beneficial to system performance. The IRQ_INPROGRESS flag is set whenever a CPU is committed to execute the interrupt service routines of the interrupt; therefore, the do_IRQ( ) function checks it before starting the real work.
- irq_desc[irq].action is NULL
This case occurs when there are no interrupt service routines associated with the interrupt. Normally, this happens only when the kernel is probing a hardware device.
Let’s suppose that none of the three cases holds, so the interrupt has to be serviced. do_IRQ( ) sets the IRQ_INPROGRESS flag and starts a loop. In each iteration, the function clears the IRQ_PENDING flag, releases the interrupt spin lock, and executes the interrupt service routines by invoking handle_IRQ_event( ) (described in Section 4.6.1.7 later in this chapter). When the latter function terminates, do_IRQ( ) acquires the spin lock again and checks the value of the IRQ_PENDING flag. If it is clear, no further occurrence of the interrupt has been delivered to another CPU, so the loop ends. Conversely, if IRQ_PENDING is set, another CPU has executed the do_IRQ( ) function for this type of interrupt while this CPU was executing handle_IRQ_event( ). Therefore, do_IRQ( ) performs another iteration of the loop, servicing the new occurrence of the interrupt.[32]
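The interplay of IRQ_PENDING and IRQ_INPROGRESS can be followed in a single-threaded simulation. The sketch below is not the kernel function: the spin lock, the PIC methods, and the check on the action field are left out, the flag values are chosen for the sketch, and the arrival of a new occurrence on another CPU is imitated by the hypothetical run_isrs( ) stand-in, which sets IRQ_PENDING while the ISRs are "running".

```c
#define IRQ_INPROGRESS 1   /* flag values chosen for this sketch */
#define IRQ_DISABLED   2
#define IRQ_PENDING    4

/* Reduced stand-in for an IRQ descriptor (hypothetical layout). */
struct line {
    unsigned int status;
    int serviced;          /* how many times the ISRs were run */
    int extra_arrivals;    /* occurrences that "another CPU" will report */
};

/* Stand-in for handle_IRQ_event(): counts the run and, while extra
 * arrivals remain, pretends that another CPU entered do_IRQ() for the
 * same line and therefore left IRQ_PENDING set. */
static void run_isrs(struct line *l)
{
    l->serviced++;
    if (l->extra_arrivals > 0) {
        l->extra_arrivals--;
        l->status |= IRQ_PENDING;
    }
}

/* Returns 1 if this invocation ran the ISRs, 0 if it had nothing to do. */
int sketch_do_irq(struct line *l)
{
    l->status |= IRQ_PENDING;
    if (l->status & (IRQ_DISABLED | IRQ_INPROGRESS))
        return 0;                      /* leave the occurrence pending */

    l->status |= IRQ_INPROGRESS;
    do {
        l->status &= ~IRQ_PENDING;
        run_isrs(l);
    } while (l->status & IRQ_PENDING); /* someone reported a new one */
    l->status &= ~IRQ_INPROGRESS;
    return 1;
}
```

With two simulated extra arrivals the loop body runs three times in a single invocation, which mirrors how one CPU ends up servicing the occurrences that other CPUs merely recorded. When the line is disabled, the function returns at once and leaves IRQ_PENDING set.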
Our do_IRQ( ) function is now going to terminate, either because it has already executed the interrupt service routines or because it had nothing to do. The function invokes the end method of the main IRQ descriptor. On uniprocessor systems, the corresponding end_8259A_irq( ) function re-enables the IRQ line (unless the interrupt occurrence was spurious). On multiprocessor systems, the end method acknowledges the interrupt (if not already done by the ack method).
Finally, do_IRQ( ) releases the spin lock: the hard work is finished! Before returning, however, the function checks whether deferrable kernel functions are waiting to be executed (see Section 4.7 later in this chapter). In the affirmative case, it invokes the do_softirq( ) function. When do_IRQ( ) terminates, the control is transferred to the ret_from_intr( ) function.
The do_IRQ( ) function is small and simple, yet it works properly in most cases. Indeed, the IRQ_PENDING, IRQ_INPROGRESS, and IRQ_DISABLED flags ensure that interrupts are correctly handled even when the hardware is misbehaving. However, things may not work so smoothly in a multiprocessor system.
Suppose that a CPU has an IRQ line enabled. A hardware device raises the IRQ line, and the multi-APIC system selects our CPU for handling the interrupt. Before the CPU acknowledges the interrupt, the IRQ line is masked out by another CPU; as a consequence, the IRQ_DISABLED flag is set. Right afterwards, our CPU starts handling the pending interrupt; therefore, the do_IRQ( ) function acknowledges the interrupt and then returns without executing the interrupt service routines because it finds the IRQ_DISABLED flag set. As a result, the interrupt occurred before the IRQ line was disabled, yet it got lost.
To cope with this scenario, when the enable_irq( ) function re-enables the IRQ line, it forces the hardware to generate a new occurrence of the lost interrupt:
spin_lock_irqsave(&(irq_desc[irq].lock), flags);
if (--irq_desc[irq].depth == 0) {
    irq_desc[irq].status &= ~IRQ_DISABLED;
    if ((irq_desc[irq].status & (IRQ_PENDING | IRQ_REPLAY))
            == IRQ_PENDING) {
        irq_desc[irq].status |= IRQ_REPLAY;
        send_IPI_self(irq+32);
    }
    irq_desc[irq].handler->enable(irq);
}
spin_unlock_irqrestore(&(irq_desc[irq].lock), flags);
The function detects that an interrupt was lost by checking the value of the IRQ_PENDING flag. The flag is always cleared when leaving the interrupt handler; therefore, if the IRQ line is disabled and the flag is set, then an interrupt occurrence has been acknowledged but not yet serviced. In this case it is necessary to issue a new interrupt. This is obtained by forcing the local APIC to generate a self-interrupt (see Section 4.6.2 later in this chapter). The role of the IRQ_REPLAY flag is to ensure that exactly one self-interrupt is generated. Remember that the do_IRQ( ) function clears that flag when it starts handling the interrupt.
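The "exactly one self-interrupt" property can be checked with a reduced model of the re-enabling step. In this sketch the self-interrupt is replaced by a counter, the spin lock and the PIC enable method are omitted, and the structure, the function name, and the flag values are assumptions made for illustration.

```c
#define IRQ_DISABLED 2   /* flag values chosen for this sketch */
#define IRQ_PENDING  4
#define IRQ_REPLAY   8

/* Reduced stand-in for an IRQ descriptor (hypothetical layout). */
struct replay_line {
    unsigned int status;
    unsigned int depth;
    int self_ipis;       /* counts simulated self-interrupts */
};

/* Re-enable the line; if an occurrence was acknowledged but never
 * serviced (PENDING set, REPLAY clear), request one replay. */
void sketch_enable_with_replay(struct replay_line *l)
{
    if (l->depth == 0)
        return;                   /* unbalanced enable: ignored here */
    if (--l->depth == 0) {
        l->status &= ~IRQ_DISABLED;
        if ((l->status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
            l->status |= IRQ_REPLAY;
            l->self_ipis++;       /* stands in for the self-interrupt */
        }
    }
}
```

If the line is disabled and re-enabled a second time before the replayed interrupt has been handled, IRQ_REPLAY is still set, the comparison with IRQ_PENDING fails, and no further self-interrupt is requested.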
As mentioned previously, an interrupt service routine implements a device-specific operation. When an interrupt handler must execute the ISRs, it invokes the handle_IRQ_event( ) function. This function essentially performs the steps shown in the following list.
- Invokes the irq_enter( ) function to increment the __local_irq_count field of the irq_stat entry of the executing CPU (to learn how many interrupt handlers are stacked in the CPU, see Section 4.6.1.2 earlier in this chapter). As we shall see in Chapter 5, this function also checks that interrupts are not globally disabled.
- Enables the local interrupts with the sti assembly language instruction if the SA_INTERRUPT flag is clear.
- Executes each interrupt service routine of the interrupt through the following code:
do {
    action->handler(irq, action->dev_id, regs);
    action = action->next;
} while (action);
At the start of the loop, action points to the start of a list of irqaction data structures that indicate the actions to be taken upon receiving the interrupt (see Figure 4-4 earlier in this chapter).
- Disables the local interrupts with the cli assembly language instruction.
- Invokes irq_exit( ) to decrement the __local_irq_count field of the irq_stat entry of the executing CPU.
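The list walk in the third step, combined with the rule that every ISR on a shared line checks its own device, can be sketched in user space as follows. The fake_ structures and functions are invented for the example, the regs parameter is dropped, and the interrupt enabling and counting steps are omitted.

```c
#include <stddef.h>

/* Reduced irqaction: only the members needed to walk the list. */
struct fake_irqaction {
    void (*handler)(int irq, void *dev_id);
    void *dev_id;
    struct fake_irqaction *next;
};

/* Hypothetical device state: did it raise the interrupt, and how many
 * times has it been serviced so far? */
struct fake_device {
    int needs_attention;
    int serviced;
};

/* An ISR for a shared line first checks whether its own device raised
 * the interrupt and returns at once if it did not. */
static void fake_isr(int irq, void *dev_id)
{
    struct fake_device *dev = dev_id;

    (void)irq;                      /* unused in this sketch */
    if (!dev->needs_attention)
        return;
    dev->needs_attention = 0;
    dev->serviced++;
}

/* Same loop shape as in the text: every ISR on the list is invoked.
 * The list must contain at least one element. */
void run_all_isrs(int irq, struct fake_irqaction *action)
{
    do {
        action->handler(irq, action->dev_id);
        action = action->next;
    } while (action);
}
```

If two devices share IRQ 11 and only the second one raised the interrupt, both handlers are called, but only the second device's serviced counter changes.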
All interrupt service routines act on the same parameters:
- irq
The IRQ number
- dev_id
The device identifier
- regs
A pointer to the Kernel Mode stack area containing the registers saved right after the interrupt occurred
The first parameter allows a single ISR to handle several IRQ lines, the second one allows a single ISR to take care of several devices of the same type, and the last one allows the ISR to access the execution context of the interrupted kernel control path. In practice, most ISRs do not use these parameters.
The SA_INTERRUPT flag of the main IRQ descriptor determines whether interrupts must be enabled or disabled when the do_IRQ( ) function invokes an ISR. An ISR that has been invoked with the interrupts in one state is allowed to put them in the opposite state. In a uniprocessor system, this can be achieved by means of the cli (disable interrupts) and sti (enable interrupts) assembly language instructions. Globally enabling or disabling interrupts in a multiprocessor system is a much more complicated task; we’ll deal with it in Chapter 5.
The structure of an ISR depends on the characteristics of the device handled. We’ll give a few examples of ISRs in Chapter 6, Chapter 13, and Chapter 18.
As noted in Section 4.6.1.1, a few vectors are reserved for specific devices, while the remaining ones are dynamically handled. There is, therefore, a way in which the same IRQ line can be used by several hardware devices even if they do not allow IRQ sharing. The trick is to serialize the activation of the hardware devices so that just one owns the IRQ line at a time.
Before activating a device that is going to use an IRQ line, the corresponding driver invokes request_irq( ). This function creates a new irqaction descriptor and initializes it with the parameter values; it then invokes the setup_irq( ) function to insert the descriptor in the proper IRQ list. The device driver aborts the operation if setup_irq( ) returns an error code, which means that the IRQ line is already in use by another device that does not allow interrupt sharing. When the device operation is concluded, the driver invokes the free_irq( ) function to remove the descriptor from the IRQ list and release the memory area.
Let’s see how this scheme works on a simple example.
Assume a program wants to address the /dev/fd0
device file, which corresponds to the first floppy disk on the
system.[33]
The program can do this either
by directly accessing /dev/fd0
or by mounting a
filesystem on it. Floppy disk controllers are usually assigned IRQ 6;
given this, the floppy driver issues the following request:
request_irq(6, floppy_interrupt, SA_INTERRUPT|SA_SAMPLE_RANDOM, "floppy", NULL);
As can be observed, the floppy_interrupt( ) interrupt service routine must execute with the interrupts disabled (SA_INTERRUPT set) and no sharing of the IRQ (SA_SHIRQ flag cleared). The SA_SAMPLE_RANDOM flag set means that accesses to the floppy disk are a good source of random events to be used for the kernel random number generator. When the operation on the floppy disk is concluded (either the I/O operation on /dev/fd0 terminates or the filesystem is unmounted), the driver releases IRQ 6:
free_irq(6, NULL);
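The serialization idea, in which only one device owns an unshared IRQ line at a time, can be modeled in a few lines of user-space C. This is a sketch under stated assumptions: demo_request_irq( ), demo_free_irq( ), and the demo_owner table are hypothetical and merely mimic the claim/release discipline of request_irq( ) and free_irq( ); they do not reproduce the kernel's signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_NR_IRQS 16   /* arbitrary size for the model */

/* Name of the current owner of each line, or NULL if the line is free. */
static const char *demo_owner[DEMO_NR_IRQS];

/* Claims an unshared line; fails with -EBUSY if another device holds it. */
static int demo_request_irq(unsigned int irq, const char *name)
{
    assert(name != NULL);
    if (irq >= DEMO_NR_IRQS)
        return -EINVAL;
    if (demo_owner[irq] != NULL)
        return -EBUSY;
    demo_owner[irq] = name;
    return 0;
}

/* Releases the line so that another device can claim it. */
static void demo_free_irq(unsigned int irq)
{
    if (irq < DEMO_NR_IRQS)
        demo_owner[irq] = NULL;
}
```

In this model, a second request for IRQ 6 fails while the floppy driver holds the line and succeeds once the line has been released, which is the behavior a driver relies on when it aborts on an error code.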
To insert an irqaction descriptor in the proper list, the kernel invokes the setup_irq( ) function, passing to it the parameters irq_nr, the IRQ number, and new (the address of a previously allocated irqaction descriptor). This function:
1. Checks whether another device is already using the irq_nr IRQ and, if so, whether the SA_SHIRQ flags in the irqaction descriptors of both devices specify that the IRQ line can be shared. Returns an error code if the IRQ line cannot be used.
2. Adds *new (the new irqaction descriptor pointed to by new) at the end of the list to which irq_desc[irq_nr].action points.
3. If no other device is sharing the same IRQ, clears the IRQ_DISABLED, IRQ_AUTODETECT, and IRQ_INPROGRESS flags in the status field of the irq_desc[irq_nr] descriptor and invokes the startup method of the irq_desc[irq_nr].handler PIC object to make sure that IRQ signals are enabled.
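The sharing check and the list insertion can be sketched in user space as follows. This is an illustrative model, not the kernel source: demo_irqaction keeps only the flags and next fields, DEMO_SA_SHIRQ is a stand-in with an arbitrary value, and the first_user output parameter (hypothetical) reports the case in which the real function would go on to enable the IRQ line.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_SA_SHIRQ 0x1UL   /* stand-in for SA_SHIRQ; value is arbitrary */

/* Reduced model of an irqaction descriptor. */
struct demo_irqaction {
    unsigned long flags;
    struct demo_irqaction *next;
};

/*
 * Appends new_action to the list rooted at *head.
 * Returns 0 on success, or -EBUSY if the line is in use and the
 * first descriptor and the new one do not both allow sharing.
 * *first_user is set to 1 when the list was empty before the call.
 */
static int demo_setup_irq(struct demo_irqaction **head,
                          struct demo_irqaction *new_action,
                          int *first_user)
{
    struct demo_irqaction **p = head;
    struct demo_irqaction *old;

    assert(head != NULL && new_action != NULL && first_user != NULL);
    old = *p;
    *first_user = (old == NULL);
    if (old != NULL) {
        /* Sharing must be allowed by both descriptors. */
        if (!(old->flags & new_action->flags & DEMO_SA_SHIRQ))
            return -EBUSY;
        /* Walk to the end of the list. */
        while (*p != NULL)
            p = &(*p)->next;
    }
    new_action->next = NULL;
    *p = new_action;
    return 0;
}
```

Testing only the first descriptor on the list is enough in this model, because a second descriptor can be present only if the first one already allowed sharing.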
Here is an example of how setup_irq( )
is used,
drawn from system initialization. The kernel initializes the
irq0
descriptor of the interval timer device by
executing the following instructions in the time_init( )
function (see Chapter 6):
struct irqaction irq0 = {timer_interrupt, SA_INTERRUPT, 0, "timer", NULL,};
setup_irq(0, &irq0);
First, the irq0 variable of type irqaction is initialized: the handler field is set to the address of the timer_interrupt( ) function, the flags field is set to SA_INTERRUPT, the mask field is set to 0, the name field is set to "timer", and the dev_id field is set to NULL to show that no dev_id value is used. Next, the kernel invokes setup_irq( ) to insert irq0 in the list of irqaction descriptors associated with IRQ0.
On multiprocessor systems, Linux defines the following five kinds of interprocessor interrupts (see also Table 4-2):
-
CALL_FUNCTION_VECTOR (vector 0xfb)
Sent to all CPUs but the sender, forcing those CPUs to run a function passed by the sender. The corresponding interrupt handler is named call_function_interrupt( ). The function passed as a parameter may, for instance, force all other CPUs to stop, or may force them to set the contents of the Memory Type Range Registers (MTRRs).[34] Usually this interrupt is sent to all CPUs except the CPU executing the calling function by means of the smp_call_function( ) facility function.
-
RESCHEDULE_VECTOR (vector 0xfc)
When a CPU receives this type of interrupt, the corresponding handler, named reschedule_interrupt( ), merely acknowledges the interrupt. All the rescheduling is done automatically when returning from the interrupt (see Section 4.8 later in this chapter).
-
INVALIDATE_TLB_VECTOR (vector 0xfd)
Sent to all CPUs but the sender, forcing them to invalidate their Translation Lookaside Buffers. The corresponding handler, named invalidate_interrupt( ), flushes some TLB entries of the processor as described in Section 2.5.7.
-
ERROR_APIC_VECTOR (vector 0xfe)
This interrupt should never occur.
-
SPURIOUS_APIC_VECTOR (vector 0xff)
This interrupt should never occur.
Thanks to the following group of functions, issuing interprocessor interrupts (IPIs) becomes an easy task:
-
send_IPI_all( )
Sends an IPI to all CPUs (including the sender)
-
send_IPI_allbutself( )
Sends an IPI to all CPUs except the sender
-
send_IPI_self( )
Sends an IPI to the sender CPU
-
send_IPI_mask( )
Sends an IPI to a group of CPUs specified by a bit mask
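The four functions differ only in the set of destination CPUs. The sketch below models that set as a bit mask, with bit n standing for CPU n. It is an assumption-laden illustration: the demo_ names are hypothetical, unsigned int is assumed to be 32 bits wide, and the real functions program the local APIC rather than compute a mask in this way.

```c
#include <assert.h>

/* Bit n set means CPU n is a destination (assumes at most 32 CPUs). */
typedef unsigned int demo_cpumask;

/* Destinations of an "all" IPI: every CPU, sender included. */
static demo_cpumask demo_mask_all(unsigned int nr_cpus)
{
    assert(nr_cpus >= 1 && nr_cpus <= 32);
    return nr_cpus == 32 ? ~0u : (1u << nr_cpus) - 1u;
}

/* Destination of a "self" IPI: the sender only. */
static demo_cpumask demo_mask_self(unsigned int self)
{
    assert(self < 32);
    return 1u << self;
}

/* Destinations of an "all but self" IPI: every CPU except the sender. */
static demo_cpumask demo_mask_allbutself(unsigned int nr_cpus,
                                         unsigned int self)
{
    return demo_mask_all(nr_cpus) & ~demo_mask_self(self);
}
```

On a four-CPU system with CPU 2 as the sender, the three masks are 0xf, 0x4, and 0xb respectively; an arbitrary mask, as accepted by send_IPI_mask( ), covers any other subset.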
The assembly language code of the interprocessor interrupt handlers
is generated by the BUILD_SMP_INTERRUPT
macro; the
code is almost identical to the code generated by the
BUILD_IRQ
macro (see the earlier Section 4.6.1.4).
Each interprocessor interrupt has a different high-level handler,
which has the same name as the low-level handler preceded by
smp_
. For instance, the high-level handler of the
RESCHEDULE_VECTOR
interprocessor interrupt that is
invoked by the low-level reschedule_interrupt( )
handler is named smp_reschedule_interrupt( )
. Each
high-level handler acknowledges the interprocessor interrupt on the
local APIC and then performs the specific action triggered by the
interrupt.
[28] Contrary to
disable_irq_nosync( )
,
disable_irq(n)
waits until all interrupt handlers
for IRQn that are running on other CPUs have
completed before returning.
[29] There is an exception, though. Linux usually sets up the local APICs in such a way as to honor the focus processor. When an IRQ signal is raised, the focus processor for that IRQ is the CPU to which a previous occurrence of the same IRQ has already been sent, provided that either the interrupt is still pending (waiting to be handled) or the CPU is still servicing the corresponding interrupt handler. When focus mode is enabled, an interrupt is always sent to its focus processor, if it exists. However, Intel has dropped support for focus processors in the Pentium 4 model.
[30] Subtracting 256 from an IRQ number yields a negative number. Positive numbers are reserved to identify system calls (see Chapter 9).
[31] The
ret_from_intr( )
return address is missing from
the pt_regs
structure because the C compiler
expects a return address on top of the stack. It takes this into
account when generating the instructions to address
parameters.
[32] Because IRQ_PENDING is a flag and not a counter, only the second occurrence of the interrupt can be recognized. Further occurrences in each iteration of the do_IRQ( ) loop are simply lost.
[33] Floppy disks are “old” devices that do not usually allow IRQ sharing.
[34] Starting with the Pentium Pro model, Intel microprocessors include these additional registers to easily customize cache operations. For instance, Linux may use these registers to disable the hardware cache for the addresses mapping the frame buffer of a PCI/AGP graphic card while maintaining the “write combining” mode of operation: the paging unit combines write transfers into larger chunks before copying them into the frame buffer.