Most exceptions issued by the CPU are interpreted by Linux
as error conditions. When one of them occurs, the kernel sends a signal
to the process that caused the exception to notify it of an anomalous
condition. If, for instance, a process performs a division by zero, the
CPU raises a “Divide error” exception, and the corresponding exception
handler sends a SIGFPE signal to the current process, which then takes
the necessary steps to recover or (if no signal handler is set for that
signal) abort.
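Seen from the process's side, this can be sketched in user space as follows. This is a minimal illustration, assuming an x86 Linux system where an integer division by zero raises the “Divide error” exception; the name checked_divide and its return convention are hypothetical and not taken from the kernel. The handler leaves through siglongjmp because simply returning from it would re-execute the faulting division.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recover_point;

/* Runs when the kernel delivers SIGFPE; jumps back past the division. */
static void on_sigfpe(int signo)
{
    siglongjmp(recover_point, signo);
}

/*
 * Hypothetical helper: returns 0 and stores the quotient on success,
 * or -1 if SIGFPE was delivered while dividing.
 * Division by zero is undefined behavior in ISO C; the trap relied on
 * here is an x86 property, not a language guarantee.
 */
int checked_divide(int dividend, int divisor, int *quotient)
{
    struct sigaction sa, old;
    volatile int n = dividend, d = divisor; /* keep the division at run time */
    int status = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigfpe;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGFPE, &sa, &old);

    if (sigsetjmp(recover_point, 1) == 0)
        *quotient = n / d;      /* may raise the "Divide error" exception */
    else
        status = -1;            /* reached through the signal handler */

    sigaction(SIGFPE, &old, NULL); /* restore the previous disposition */
    return status;
}
```

Without the handler installed, the default action for SIGFPE terminates the process, which is the "abort" case described above.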
There are a couple of cases, however, where Linux exploits CPU
exceptions to manage hardware resources more efficiently. A first case
is already described in the section "Saving and Loading the FPU, MMX,
and XMM Registers" in Chapter 3. The “Device not available” exception is
used together with the TS flag of the cr0 register to force the kernel
to load the floating point registers of the CPU with new values. A
second case involves the “Page Fault” exception, which is used to defer
allocating new page
frames to the process until the last possible moment. The corresponding
handler is complex because the exception may, or may not, denote an
error condition (see the section "Page Fault Exception Handler"
in Chapter 9).
Exception handlers have a standard structure consisting of three steps:
1. Save the contents of most registers in the Kernel Mode stack (this part is coded in assembly language).
2. Handle the exception by means of a high-level C function.
3. Exit from the handler by means of the ret_from_exception( ) function.
To take advantage of exceptions, the IDT must be properly
initialized with an exception handler function for each recognized
exception. It is the job of the trap_init( ) function to insert the
final values (the functions that handle the exceptions) into all IDT
entries that refer to nonmaskable interrupts and exceptions. This is
accomplished through the set_trap_gate( ), set_intr_gate( ),
set_system_gate( ), set_system_intr_gate( ), and set_task_gate( )
functions:
set_trap_gate(0,&divide_error);
set_trap_gate(1,&debug);
set_intr_gate(2,&nmi);
set_system_intr_gate(3,&int3);
set_system_gate(4,&overflow);
set_system_gate(5,&bounds);
set_trap_gate(6,&invalid_op);
set_trap_gate(7,&device_not_available);
set_task_gate(8,31);
set_trap_gate(9,&coprocessor_segment_overrun);
set_trap_gate(10,&invalid_TSS);
set_trap_gate(11,&segment_not_present);
set_trap_gate(12,&stack_segment);
set_trap_gate(13,&general_protection);
set_intr_gate(14,&page_fault);
set_trap_gate(16,&coprocessor_error);
set_trap_gate(17,&alignment_check);
set_trap_gate(18,&machine_check);
set_trap_gate(19,&simd_coprocessor_error);
set_system_gate(128,&system_call);
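The five helpers differ only in the gate type and the Descriptor Privilege Level (DPL) they write into the IDT entry. The sketch below is a user-space model of that encoding, following the 32-bit x86 gate descriptor layout. The names pack_gate and struct gate_desc, and the selector and offset values in the usage note, are hypothetical; this does not reproduce the kernel's own code.

```c
#include <stdint.h>

/* Type field values of a 32-bit x86 gate descriptor. */
enum gate_type { GATE_TASK = 5, GATE_INTERRUPT = 14, GATE_TRAP = 15 };

/* One 8-byte IDT entry, split into its two 32-bit words. */
struct gate_desc {
    uint32_t low;   /* segment selector (bits 31..16), offset 15..0 */
    uint32_t high;  /* offset 31..16, P, DPL, type */
};

/*
 * How each helper maps onto (type, dpl) in this model:
 *   set_trap_gate         -> GATE_TRAP,      DPL 0
 *   set_intr_gate         -> GATE_INTERRUPT, DPL 0
 *   set_system_gate       -> GATE_TRAP,      DPL 3
 *   set_system_intr_gate  -> GATE_INTERRUPT, DPL 3
 *   set_task_gate         -> GATE_TASK,      DPL 0 (offset unused)
 */
struct gate_desc pack_gate(enum gate_type type, unsigned dpl,
                           uint16_t selector, uint32_t offset)
{
    struct gate_desc g;

    g.low  = ((uint32_t)selector << 16) | (offset & 0xFFFFu);
    g.high = (offset & 0xFFFF0000u)
           | 0x8000u                        /* P: descriptor present */
           | ((uint32_t)(dpl & 3u) << 13)   /* DPL */
           | ((uint32_t)type << 8);         /* gate type */
    return g;
}

/* Extract the DPL back out of a packed descriptor. */
unsigned gate_dpl(struct gate_desc g)
{
    return (g.high >> 13) & 3u;
}

/* Extract the 4-bit gate type back out of a packed descriptor. */
unsigned gate_type_of(struct gate_desc g)
{
    return (g.high >> 8) & 0xFu;
}
```

For example, pack_gate(GATE_TRAP, 3, 0x60, 0xC0101234) yields the words 0x00601234 and 0xC010EF00. A DPL of 3 is what allows User Mode code to reach vectors such as 3, 4, 5, and 128 with a software interrupt instruction.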
The “Double fault” exception is handled by means of a task gate
instead of a trap or system gate, because it denotes a serious kernel
misbehavior. Thus, the exception handler that tries to print out the
register values does not trust the current value of the esp
register. When such an exception occurs,
the CPU fetches the Task Gate Descriptor stored in the entry at index 8
of the IDT. This descriptor points to the special TSS segment descriptor
stored in the 32nd entry of the GDT (index 31, the second argument of
set_task_gate( ) above). Next, the CPU loads the eip and esp registers
with the values stored in the corresponding TSS segment. As a result,
the processor executes the doublefault_fn() exception handler on its own
private stack.
Now we will look at what a typical exception handler does once it is invoked. Our description of exception handling will be a bit sketchy for lack of space. In particular, we won’t be able to cover:
The signal codes (see Table 11-8 in Chapter 11) sent by some handlers to the User Mode processes.
Exceptions that occur when the kernel is operating in MS-DOS emulation mode (vm86 mode), which must be dealt with differently.
Let’s use handler_name
to denote the name of a generic
exception handler. (The actual names of all the exception handlers
appear on the list of macros in the previous section.) Each exception
handler starts with the following assembly language
instructions:
handler_name:
pushl $0 /* only for some exceptions */
pushl $do_handler_name
jmp error_code
If the control unit is not supposed to automatically insert a
hardware error code on the stack when the exception occurs, the
corresponding assembly language fragment includes a pushl $0
instruction to pad the stack with a
null value. Then the address of the high-level C function is pushed on
the stack; its name consists of the exception handler name prefixed by
do_.
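Which handlers need the pushl $0 padding follows from the x86 architecture. As a small sketch, the helper below encodes the vectors for which the control unit itself pushes an error code, restricted to the vectors 0 through 19 installed by trap_init( ) above. The function names are hypothetical.

```c
#include <stdbool.h>

/*
 * True if the CPU pushes a hardware error code for this exception
 * vector. Only vectors 0..19 are considered here.
 */
bool cpu_pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8:   /* Double fault */
    case 10:  /* Invalid TSS */
    case 11:  /* Segment not present */
    case 12:  /* Stack segment fault */
    case 13:  /* General protection */
    case 14:  /* Page Fault */
    case 17:  /* Alignment check */
        return true;
    default:
        return false;
    }
}

/* True if the assembly stub must pad the stack with "pushl $0". */
bool stub_needs_pushl_zero(unsigned vector)
{
    return !cpu_pushes_error_code(vector);
}
```

With this padding, every handler reaches error_code with the same stack layout, whether or not the hardware supplied an error code.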
The assembly language fragment labeled as error_code
is the same for all exception
handlers except the one for the “Device not available” exception (see the section "Saving and Loading the FPU, MMX,
and XMM Registers" in Chapter 3). The code performs the
following steps:
1. Saves the registers that might be used by the high-level C function on the stack.
2. Issues a cld instruction to clear the direction flag DF of eflags, thus making sure that autoincreases on the edi and esi registers will be used with string instructions.[*]
3. Copies the hardware error code saved in the stack at location esp+36 in edx. Stores the value -1 in the same stack location. As we’ll see in the section "Reexecution of System Calls" in Chapter 11, this value is used to separate 0x80 exceptions from other exceptions.
4. Loads edi with the address of the high-level do_handler_name( ) C function saved in the stack at location esp+32; writes the contents of es in that stack location.
5. Loads in the eax register the current top location of the Kernel Mode stack. This address identifies the memory cell containing the last register value saved in step 1.
6. Loads the user data Segment Selector into the ds and es registers.
7. Invokes the high-level C function whose address is now stored in edi.
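The slot shuffling performed by error_code can be modeled in user space. The sketch below is only an illustration: it treats the saved registers as an array of word-sized slots, uses pointer-sized slots so that it also runs on 64-bit hosts, and cannot model the segment register loads or the cld instruction. All names (error_code_model, do_sample_handler, and the two globals) are hypothetical.

```c
#include <stdint.h>

/* Slot indexes matching the byte offsets esp+32 and esp+36 (4-byte slots). */
enum { SLOT_ES = 32 / 4, SLOT_ORIG_EAX = 36 / 4, FRAME_SLOTS = 10 };

/* Shape of a high-level handler in this model: (saved registers, error code). */
typedef void (*handler_fn)(uintptr_t *regs, long error_code);

/* Globals filled in by the sample handler so a caller can inspect them. */
long last_error_code;
uintptr_t *last_regs;

/* Hypothetical stand-in for a do_handler_name( ) C function. */
void do_sample_handler(uintptr_t *regs, long error_code)
{
    last_regs = regs;
    last_error_code = error_code;
}

/*
 * On entry, frame[SLOT_ES] holds the handler address pushed by the
 * stub and frame[SLOT_ORIG_EAX] holds the hardware error code (or 0).
 */
void error_code_model(uintptr_t frame[FRAME_SLOTS], uintptr_t es_value)
{
    /* Copy the error code into "edx", then overwrite its slot with -1. */
    long edx = (long)frame[SLOT_ORIG_EAX];
    frame[SLOT_ORIG_EAX] = (uintptr_t)-1;

    /* Fetch the handler address into "edi", then store es in its slot. */
    handler_fn edi = (handler_fn)frame[SLOT_ES];
    frame[SLOT_ES] = es_value;

    /* "eax" is the current top of the stack, that is, the saved registers. */
    uintptr_t *eax = frame;

    /* Call the handler with its two register-passed arguments. */
    edi(eax, edx);
}
```

After the call, the frame contains a real es value and -1 where the error code used to be, so the two slots that temporarily carried the handler address and the error code hold their final contents.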
The invoked function receives its arguments from the eax and edx
registers rather than from the stack. We have already run into a
function that gets its arguments from the CPU registers: the
__switch_to( ) function, discussed in the section "Performing the
Process Switch" in Chapter 3.
As already explained, the names of the C functions that
implement exception handlers always consist of the prefix do_
followed by the handler name. Most of
these functions invoke the do_trap()
function to store the hardware
error code and the exception vector in the process descriptor of
current
, and then send a suitable
signal to that process:
current->thread.error_code = error_code;
current->thread.trap_no = vector;
force_sig(sig_number, current);
The current process takes care of the signal right after the termination of the exception handler. The signal will be handled either in User Mode by the process’s own signal handler (if it exists) or in Kernel Mode. In the latter case, the kernel usually kills the process (see Chapter 11). The signals sent by the exception handlers are listed in Table 4-1.
The exception handler always checks whether the exception
occurred in User Mode or in Kernel Mode and, in the latter case,
whether it was due to an invalid argument passed to a system call.
We’ll describe in the section "Dynamic Address Checking: The
Fix-up Code" in Chapter
10 how the kernel defends itself against invalid arguments
passed to system calls. Any other exception raised in Kernel Mode is
due to a kernel bug. In this case, the exception handler knows the
kernel is misbehaving. In order to avoid data corruption on the hard
disks, the handler invokes the die( ) function, which prints the
contents of all CPU registers on the console (this dump is called
kernel oops) and terminates the current
process by calling do_exit( )
(see "Process Termination" in
Chapter 3).
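The decision just described can be condensed into a small model. The sketch below assumes that the privilege level of the interrupted code is read from the two low-order bits of the saved cs selector, and it ignores vm86 mode, which is not covered here. The names classify_trap, from_user_mode, and enum trap_outcome are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum trap_outcome {
    TRAP_SEND_SIGNAL,   /* User Mode: record the trap and signal current */
    TRAP_APPLY_FIXUP,   /* Kernel Mode, bad system call argument: run fix-up code */
    TRAP_KERNEL_OOPS    /* Kernel Mode, no fix-up: kernel bug, call die( ) */
};

/* The two low-order bits of the saved cs hold the privilege level (3 = User Mode). */
bool from_user_mode(uint16_t saved_cs)
{
    return (saved_cs & 3u) == 3u;
}

/*
 * has_fixup stands for "the faulting instruction has fix-up code",
 * the mechanism the kernel uses for invalid system call arguments.
 */
enum trap_outcome classify_trap(uint16_t saved_cs, bool has_fixup)
{
    if (from_user_mode(saved_cs))
        return TRAP_SEND_SIGNAL;
    return has_fixup ? TRAP_APPLY_FIXUP : TRAP_KERNEL_OOPS;
}
```

For instance, with illustrative selector values 0x73 (privilege level 3) and 0x60 (privilege level 0), classify_trap(0x73, false) selects the signal path, while classify_trap(0x60, false) selects the oops path.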
When the C function that implements the exception handling
terminates, the code performs a jmp instruction to the
ret_from_exception( ) function. This function is described in the later
section "Returning from Interrupts and Exceptions."
[*] A single assembly language “string instruction,” such as
rep;movsb, is able to act on a whole block of data (string).