We will finish the chapter by examining the termination phase of interrupt and exception handlers. (Returning from a system call is a special case, and we shall describe it in Chapter 10.) Although the main objective is clear — namely, to resume execution of some program — several issues must be considered before doing it:
- Number of kernel control paths being concurrently executed
If there is just one, the CPU must switch back to User Mode.
- Pending process switch requests
If there is any request, the kernel must perform process scheduling; otherwise, control is returned to the current process.
- Pending signals
If a signal is sent to the current process, it must be handled.
- Single-step mode
If a debugger is tracing the execution of the current process, single-step mode must be restored before switching back to User Mode.
- Virtual-8086 mode
If the CPU is in virtual-8086 mode, the current process is executing a legacy Real Mode program, thus it must be handled in a special way.
A few flags are used to keep track of pending process switch requests, of pending signals, and of single-step execution; they are stored in the flags field of the thread_info descriptor. The field stores other flags as well, but they are not related to returning from interrupts and exceptions. See Table 4-15 for a complete list of these flags.
Table 4-15. The flags field of the thread_info descriptor
Flag name | Description |
---|---|
TIF_SYSCALL_TRACE | System calls are being traced |
TIF_NOTIFY_RESUME | Not used in the 80x86 platform |
TIF_SIGPENDING | The process has pending signals |
TIF_NEED_RESCHED | Scheduling must be performed |
TIF_SINGLESTEP | Restore single-step execution on return to User Mode |
TIF_IRET | Force return from system call via iret rather than sysexit |
TIF_SYSCALL_AUDIT | System calls are being audited |
TIF_POLLING_NRFLAG | The idle process is polling the TIF_NEED_RESCHED flag |
TIF_MEMDIE | The process is being destroyed to reclaim memory (see the section "The Out of Memory Killer" in Chapter 17) |
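To make the layout of the flags field concrete, here is a minimal sketch in C. The bit numbers are an assumption taken from 2.6-era 80x86 kernel headers and should be checked against your own source tree; the tif_set( ), tif_clear( ), and tif_test( ) helpers are hypothetical names introduced only for this illustration and are not kernel functions.

```c
#include <stdint.h>

/* Bit numbers assumed from 2.6-era i386 headers; verify before relying on them. */
enum {
    TIF_SYSCALL_TRACE  = 0,
    TIF_NOTIFY_RESUME  = 1,
    TIF_SIGPENDING     = 2,
    TIF_NEED_RESCHED   = 3,
    TIF_SINGLESTEP     = 4,
    TIF_IRET           = 5,
    TIF_SYSCALL_AUDIT  = 7,
    TIF_POLLING_NRFLAG = 16,
    TIF_MEMDIE         = 17
};

/* Hypothetical helpers: each flag is a single bit of a 32-bit word. */
static inline uint32_t tif_set(uint32_t flags, int bit)
{
    return flags | (1u << bit);
}

static inline uint32_t tif_clear(uint32_t flags, int bit)
{
    return flags & ~(1u << bit);
}

static inline int tif_test(uint32_t flags, int bit)
{
    return (int)((flags >> bit) & 1u);
}
```

With these definitions, an expression such as tif_test(flags, TIF_NEED_RESCHED) plays the role of the testb $(1<<TIF_NEED_RESCHED) instruction used in the assembly language code of this section.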
The kernel assembly language code that accomplishes all these things is not, technically speaking, a function, because control is never returned to the functions that invoke it. It is a piece of code with two different entry points: ret_from_intr( ) and ret_from_exception( ). As their names suggest, the kernel enters the former when terminating an interrupt handler, and it enters the latter when terminating an exception handler. We shall refer to the two entry points as functions, because this makes the description simpler.
The general flow diagram with the corresponding two entry points is illustrated in Figure 4-6. The gray boxes refer to assembly language instructions that implement kernel preemption (see Chapter 5); if you want to see what the kernel does when it is compiled without support for kernel preemption, just ignore the gray boxes. The ret_from_exception( ) and ret_from_intr( ) entry points look quite similar in the flow diagram. A difference exists only if support for kernel preemption has been selected as a compilation option: in this case, local interrupts are immediately disabled when returning from exceptions.
The flow diagram gives a rough idea of the steps required to resume the execution of an interrupted program. Now we will go into detail by discussing the assembly language code.
The ret_from_intr( ) and ret_from_exception( ) entry points are essentially equivalent to the following assembly language code:

    ret_from_exception:
        cli                         # missing if kernel preemption is not supported
    ret_from_intr:
        movl $-8192, %ebp           # -4096 if multiple Kernel Mode stacks are used
        andl %esp, %ebp             # ebp = address of current's thread_info
        movl 0x30(%esp), %eax       # saved eflags
        movb 0x2c(%esp), %al        # low byte of saved cs
        testl $0x00020003, %eax     # VM flag set, or nonzero privilege level?
        jnz resume_userspace
        jmp resume_kernel

Recall that when returning from an interrupt, the local interrupts are disabled (see step 3 in the earlier description of handle_IRQ_event( )); thus, the cli assembly language instruction is executed only when returning from an exception.
The kernel loads the address of the thread_info descriptor of current in the ebp register (see "Identifying a Process" in Chapter 3).
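The address computation performed by the movl and andl instructions can be sketched in C as follows. This is only an illustration under the assumption that the Kernel Mode stack and the thread_info descriptor share one naturally aligned memory area of 8192 or 4096 bytes; thread_info_base( ) is a hypothetical helper name, not a kernel function.

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring "movl $-8192, %ebp ; andl %esp, %ebp".
 * thread_size must be a power of two (8192, or 4096 when multiple
 * Kernel Mode stacks are used).
 */
static inline uint32_t thread_info_base(uint32_t esp, uint32_t thread_size)
{
    /* In two's complement, -8192 has the same bit pattern as ~(8192 - 1). */
    return esp & ~(thread_size - 1u);
}
```

For instance, with an illustrative stack pointer of 0xc0123f40, the sketch yields 0xc0122000 for an 8 KB area and 0xc0123000 for a 4 KB area: masking the low-order bits of esp is enough, because the descriptor sits at the lowest address of the area.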
Next, the values of the cs and eflags registers, which were pushed on the stack when the interrupt or the exception occurred, are used to determine whether the interrupted program was running in User Mode, or if the VM flag of eflags was set.[*] In either case, a jump is made to the resume_userspace label. Otherwise, a jump is made to the resume_kernel label.
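The combined test on cs and eflags may be easier to follow in C. The sketch below assumes that the two low-order bits of the saved cs hold the privilege level of the interrupted program and that the VM flag is bit 17 of eflags; the returns_to_user( ) name and the two mask names are introduced here only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define VM_MASK  0x00020000u   /* VM flag of eflags (bit 17) */
#define RPL_MASK 0x00000003u   /* privilege level bits of cs */

/* Hypothetical helper mirroring the movl/movb/testl sequence. */
static inline bool returns_to_user(uint32_t saved_eflags, uint16_t saved_cs)
{
    /* movl 0x30(%esp), %eax ; movb 0x2c(%esp), %al:
     * the low byte of eflags is overwritten by the low byte of cs. */
    uint32_t mix = (saved_eflags & 0xffffff00u) | (saved_cs & 0x00ffu);

    /* testl $0x00020003, %eax ; jnz resume_userspace */
    return (mix & (VM_MASK | RPL_MASK)) != 0;
}
```

Merging the two values into one register lets a single testl instruction cover both conditions. This works because the VM flag lies outside the low byte of eflags, so overwriting that byte with the low byte of cs loses nothing that the test needs.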
The assembly language code at the resume_kernel label is executed if the program to be resumed is running in Kernel Mode:

    resume_kernel:
        cli                         # these three instructions are
        cmpl $0, 0x14(%ebp)         # missing if kernel preemption
        jz need_resched             # is not supported
    restore_all:
        popl %ebx
        popl %ecx
        popl %edx
        popl %esi
        popl %edi
        popl %ebp
        popl %eax
        popl %ds
        popl %es
        addl $4, %esp
        iret

If the preempt_count field of the thread_info descriptor is zero (kernel preemption enabled), the kernel jumps to the need_resched label. Otherwise, the interrupted program is to be restarted. The function loads the registers with the values saved when the interrupt or the exception started, and the function yields control by executing the iret instruction.
When this code is executed, none of the unfinished kernel control paths is an interrupt handler; otherwise, the preempt_count field would be greater than zero. However, as stated in "Nested Execution of Exception and Interrupt Handlers" earlier in this chapter, there could be up to two kernel control paths associated with exceptions (besides the one that is terminating).

    need_resched:
        movl 0x8(%ebp), %ecx                # flags field of thread_info
        testb $(1<<TIF_NEED_RESCHED), %cl
        jz restore_all
        testl $0x00000200, 0x30(%esp)       # IF flag of the saved eflags
        jz restore_all
        call preempt_schedule_irq
        jmp need_resched

If the TIF_NEED_RESCHED flag in the flags field of current->thread_info is zero, no process switch is required, thus a jump is made to the restore_all label. A jump to the same label is also made if the kernel control path that is being resumed was running with the local interrupts disabled. In this case a process switch could corrupt kernel data structures (see the section "When Synchronization Is Necessary" in Chapter 5 for more details).
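Taken together, the checks made at the resume_kernel and need_resched labels amount to three conditions that must all hold before the kernel preempts the resumed control path. The following C sketch is a simplified model, not kernel code: struct ti_model and should_preempt_kernel( ) are hypothetical names, and the bit number of TIF_NEED_RESCHED is assumed from 2.6-era 80x86 headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define TIF_NEED_RESCHED 3             /* assumed bit number */
#define IF_MASK          0x00000200u   /* IF flag of eflags */

/* Hypothetical model of the two thread_info fields involved. */
struct ti_model {
    uint32_t flags;
    int32_t  preempt_count;
};

static inline bool should_preempt_kernel(const struct ti_model *ti,
                                         uint32_t saved_eflags)
{
    if (ti->preempt_count != 0)                   /* cmpl $0, 0x14(%ebp) */
        return false;                             /* preemption disabled */
    if (!(ti->flags & (1u << TIF_NEED_RESCHED)))  /* testb ..., %cl */
        return false;                             /* no switch requested */
    if (!(saved_eflags & IF_MASK))                /* testl $0x200, 0x30(%esp) */
        return false;                             /* interrupts were disabled */
    return true;                                  /* call preempt_schedule_irq */
}
```

Whenever the function returns false, the model corresponds to the jump to restore_all; note that the interrupt flag is read from the eflags value saved on the stack, not from the live register, because local interrupts are always disabled at this point.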
If a process switch is required, the preempt_schedule_irq( ) function is invoked: it sets the PREEMPT_ACTIVE flag in the preempt_count field, temporarily sets the big kernel lock counter to -1 (see the section "The Big Kernel Lock" in Chapter 5), enables the local interrupts, and invokes schedule( ) to select another process to run. When the former process resumes, preempt_schedule_irq( ) restores the previous value of the big kernel lock counter, clears the PREEMPT_ACTIVE flag, and disables local interrupts. The schedule( ) function continues to be invoked as long as the TIF_NEED_RESCHED flag of the current process is set.
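The sequence of steps just described can be modeled in user space as shown below. This is a sketch under stated assumptions rather than the kernel's implementation: struct cpu_model, schedule_stub( ), and the two model functions are hypothetical, the value of PREEMPT_ACTIVE is assumed from 2.6-era 80x86 headers, and the stub merely counts invocations instead of switching processes.

```c
#include <stdbool.h>
#include <stdint.h>

#define PREEMPT_ACTIVE 0x10000000   /* assumed value */

/* Hypothetical model of the state touched by preempt_schedule_irq(). */
struct cpu_model {
    int32_t preempt_count;
    int     lock_depth;        /* big kernel lock counter */
    bool    irqs_enabled;
    bool    need_resched;      /* stands for TIF_NEED_RESCHED */
    int     schedule_calls;
    int     extra_requests;    /* requests raised while we were switched out */
};

/* Stub: counts the call and decides whether the flag is set on resume. */
static inline void schedule_stub(struct cpu_model *m)
{
    m->schedule_calls++;
    if (m->extra_requests > 0) {
        m->extra_requests--;
        m->need_resched = true;
    } else {
        m->need_resched = false;
    }
}

/* One pass through the steps listed in the text. */
static inline void preempt_schedule_irq_model(struct cpu_model *m)
{
    int saved_lock_depth;

    m->preempt_count += PREEMPT_ACTIVE;   /* set PREEMPT_ACTIVE */
    saved_lock_depth = m->lock_depth;
    m->lock_depth = -1;                   /* park the big kernel lock counter */
    m->irqs_enabled = true;               /* enable local interrupts */
    schedule_stub(m);                     /* select another process */
    m->irqs_enabled = false;              /* disable local interrupts */
    m->lock_depth = saved_lock_depth;     /* restore the counter */
    m->preempt_count -= PREEMPT_ACTIVE;   /* clear PREEMPT_ACTIVE */
}

/* Models the need_resched loop: repeat while a request is pending. */
static inline void need_resched_model(struct cpu_model *m)
{
    while (m->need_resched)
        preempt_schedule_irq_model(m);
}
```

Starting from a state with need_resched set and one extra request, the loop invokes the stub twice and leaves preempt_count, lock_depth, and the interrupt state exactly as it found them, which is the property the save and restore steps are meant to guarantee.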
If the program to be resumed was running in User Mode, a jump is made to the resume_userspace label:

    resume_userspace:
        cli
        movl 0x8(%ebp), %ecx        # flags field of thread_info
        andl $0x0000ff6e, %ecx
        je restore_all
        jmp work_pending

After disabling the local interrupts, a check is made on the value of the flags field of current->thread_info. If no flag except TIF_SYSCALL_TRACE, TIF_SYSCALL_AUDIT, or TIF_SINGLESTEP is set, nothing remains to be done: a jump is made to the restore_all label, thus resuming the User Mode program.
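The 0x0000ff6e constant used by the andl instruction can be derived from the three flags that the check ignores. The sketch below assumes the bit numbers found in 2.6-era 80x86 headers (0, 4, and 7); WORK_MASK and has_pending_work( ) are hypothetical names chosen for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit numbers of the three flags ignored by the check. */
enum {
    TIF_SYSCALL_TRACE = 0,
    TIF_SINGLESTEP    = 4,
    TIF_SYSCALL_AUDIT = 7
};

/* Low 16 bits of the flags field, minus the three ignored flags. */
#define WORK_MASK (0x0000ffffu & ~((1u << TIF_SYSCALL_TRACE) | \
                                   (1u << TIF_SINGLESTEP)    | \
                                   (1u << TIF_SYSCALL_AUDIT)))

/* Hypothetical helper mirroring "andl $0x0000ff6e, %ecx ; je restore_all". */
static inline bool has_pending_work(uint32_t flags)
{
    return (flags & WORK_MASK) != 0;
}
```

Under these assumptions WORK_MASK evaluates to 0x0000ff6e. Because the mask covers only the 16 low-order bits, flags stored at higher bit positions never cause a jump to work_pending.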
When the work_pending label is reached, the flags in the thread_info descriptor indicate that additional work is required before resuming the interrupted program:

    work_pending:
        testb $(1<<TIF_NEED_RESCHED), %cl
        jz work_notifysig
    work_resched:
        call schedule
        cli
        jmp resume_userspace

If a process switch request is pending, schedule( ) is invoked to select another process to run. When the former process resumes, a jump is made back to resume_userspace.
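Because work_resched jumps back to resume_userspace, the flags are examined again after every invocation of schedule( ). A user-space model of this loop might look as follows; struct task_model, schedule_stub( ), and resume_userspace_model( ) are hypothetical, and the bit number and the mask are the same assumptions used for the andl instruction above.

```c
#include <stdint.h>

#define TIF_NEED_RESCHED 3             /* assumed bit number */
#define WORK_MASK        0x0000ff6eu   /* constant used by resume_userspace */

/* Hypothetical model of the current process. */
struct task_model {
    uint32_t flags;
    int      schedule_calls;
    int      extra_requests;   /* times the flag is found set again on resume */
};

/* Stub: counts the call; the flag is cleared once no request remains. */
static inline void schedule_stub(struct task_model *t)
{
    t->schedule_calls++;
    if (t->extra_requests > 0)
        t->extra_requests--;
    else
        t->flags &= ~(1u << TIF_NEED_RESCHED);
}

/*
 * Returns 0 when the model would jump to restore_all, or the remaining
 * work bits when it would jump to work_notifysig.
 */
static inline uint32_t resume_userspace_model(struct task_model *t)
{
    for (;;) {
        uint32_t work = t->flags & WORK_MASK;     /* resume_userspace */
        if (work == 0)
            return 0;                             /* je restore_all */
        if (!(work & (1u << TIF_NEED_RESCHED)))   /* work_pending */
            return work;                          /* jz work_notifysig */
        schedule_stub(t);                         /* work_resched */
    }
}
```

The model makes one ordering property visible: rescheduling is always dealt with first, and the remaining work is considered only when no process switch request is pending.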
There is other work to be done besides process switch requests:

    work_notifysig:
        movl %esp, %eax
        testl $0x00020000, 0x30(%esp)   # VM flag of the saved eflags
        je 1f
    work_notifysig_v86:
        pushl %ecx
        call save_v86_state
        popl %ecx
        movl %eax, %esp
    1:
        xorl %edx, %edx
        call do_notify_resume
        jmp restore_all

If the VM control flag in the eflags register of the User Mode program is set, the save_v86_state( ) function is invoked to build up the virtual-8086 mode data structures in the User Mode address space. Then the do_notify_resume( ) function is invoked to take care of pending signals and single stepping. Finally, a jump is made to the restore_all label to resume the interrupted program.
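The control flow at the work_notifysig label reduces to one optional step followed by one mandatory step. The following C sketch only records which steps would be taken; it does not call the real save_v86_state( ) or do_notify_resume( ) functions, and the work_notifysig_model( ) name and the STEP_ constants are hypothetical.

```c
#include <stdint.h>

#define VM_MASK 0x00020000u   /* VM flag of eflags */

/* Hypothetical markers for the steps taken by the model. */
enum {
    STEP_SAVE_V86 = 1,   /* save_v86_state() would be called */
    STEP_NOTIFY   = 2    /* do_notify_resume() would be called */
};

static inline unsigned work_notifysig_model(uint32_t saved_eflags)
{
    unsigned steps = 0;

    if (saved_eflags & VM_MASK)   /* testl $0x00020000, 0x30(%esp) */
        steps |= STEP_SAVE_V86;   /* work_notifysig_v86 */
    steps |= STEP_NOTIFY;         /* label 1: always executed */
    return steps;                 /* then jmp restore_all */
}
```

In this model a program interrupted in virtual-8086 mode yields both markers, while any other User Mode program yields only STEP_NOTIFY.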
[*] When this flag is set, programs are executed in virtual-8086 mode; see the Pentium manuals for more details.