GCC Extensions to the C Language: Appendix - Linux System Programming
by Robert Love
This excerpt is from Linux System Programming.
This book is about writing software that makes the most effective use of the system you're running on -- code that interfaces directly with the kernel and core system libraries, including the shell, text editor, compiler, debugger, core utilities, and system daemons. Written primarily for engineers looking to program (better) at the low level, this book can give any programmer an understanding of core internals that makes for better code, no matter where it appears in the stack.
The GNU Compiler Collection (GCC) provides many extensions to the C language, some of which have proven to be of particular value to system programmers. The majority of the additions to the C language that we'll cover in this appendix offer ways for programmers to provide additional information to the compiler about the behavior and intended use of their code. The compiler, in turn, utilizes this information to generate more efficient machine code. Other extensions fill in gaps in the C programming language, particularly at lower levels.
GCC provides several extensions now available in the latest C standard, ISO C99. Some of these extensions function similarly to their C99 cousins, but ISO C99 implemented other extensions rather differently. New code should use the ISO C99 variants of these features. We won't cover such extensions here; we'll discuss only GCC-unique additions.
The flavor of C supported by GCC is often called GNU C. In the 1990s, GNU C filled in several gaps in the C language, providing features such as complex variables, zero-length arrays, inline functions, and named initializers. But after nearly a decade, C was finally upgraded, and with the standardization of ISO C99, GNU C extensions became less relevant. Nonetheless, GNU C continues to provide useful features, and many Linux programmers still use a subset of GNU C—often just an extension or two—in their C90- or C99-compliant code.
One prominent example of a GCC-specific code base is the Linux kernel, which is written strictly in GNU C. Recently, however, Intel has invested engineering effort in allowing the Intel C Compiler (ICC) to understand the GNU C extensions used by the kernel. Consequently, many of these extensions are now growing less GCC-specific.
The compiler copies the entire code of an "inline" function into the site where the function is called. Instead of storing the function externally and jumping to it whenever it is called, it runs the contents of the function directly. Such behavior saves the overhead of the function call, and allows for potential optimizations at the call site because the compiler can optimize the caller and callee together. This latter point is particularly valid if the parameters to the function are constant at the call site. Naturally, however, copying a function into each and every chunk of code that invokes it can have a detrimental effect on code size. Therefore, functions should be inlined only if they are small and simple, or are not called in many different places.
For many years, GCC has supported the inline keyword, instructing the compiler to inline the given function. C99 formalized this keyword:
static inline int foo (void) { /* ... */ }
Technically, however, the keyword is merely a hint: a suggestion to the compiler to consider inlining the given function. GCC further provides an extension for instructing the compiler to always inline the designated function:
static inline __attribute__ ((always_inline)) int foo (void) { /* ... */ }
The most obvious candidate for an inline function is a preprocessor macro. An inline function in GCC will perform as well as a macro, and, additionally, receives type checking. For example, instead of this macro:
#define max(a,b) ({ (a) > (b) ? (a) : (b); })
one might use the corresponding inline function:
static inline int max (int a, int b)
{
    if (a > b)
        return a;
    return b;
}
Programmers tend to overuse inline functions. Function call overhead on most modern architectures (the x86 in particular) is very, very low. Only the most worthy of functions should receive consideration!
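Beyond type checking, a simple macro and an inline function also differ in how often they evaluate their arguments. The following is a minimal sketch of that difference; MAX_MACRO, max_inline, and the call-counting helper next_value are hypothetical names chosen for illustration, not part of the original text.

```c
/* Naive macro: arguments are pasted in textually, so one may be evaluated twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline function: each argument is evaluated exactly once and is type-checked. */
static inline int max_inline (int a, int b)
{
    return a > b ? a : b;
}

/* Hypothetical helper with a visible side effect: it counts its own calls. */
static int calls;

static int next_value (void)
{
    return ++calls;
}

/* Number of next_value() calls made when it is passed to the macro. */
int macro_call_count (void)
{
    calls = 0;
    (void) MAX_MACRO (next_value (), 0);
    return calls;
}

/* Number of next_value() calls made when it is passed to the function. */
int inline_call_count (void)
{
    calls = 0;
    (void) max_inline (next_value (), 0);
    return calls;
}
```

With the macro, the argument expression is evaluated once for the comparison and once more for the result, so macro_call_count() reports two calls where inline_call_count() reports one.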
In its most aggressive optimization mode, GCC automatically selects functions that appear suitable for inlining and inlines them. This is normally a good idea, but sometimes the programmer knows that a function will perform incorrectly if inlined. One possible example of this is when using __builtin_return_address (discussed later in this appendix). To suppress inlining, use the noinline keyword:
__attribute__ ((noinline)) int foo (void) { /* ... */ }
A "pure" function is one that has no side effects, and whose return value reflects only the function's parameters or nonvolatile global variables. Any parameter or global variable access must be read-only. Loop optimization and subexpression elimination can be applied to such functions. Functions are marked as pure via the pure keyword:
__attribute__ ((pure)) int foo (int val) { /* ... */ }
A common example is strlen( ).
Given identical inputs, this function's return value is invariant across
multiple invocations, and thus it can be pulled out of a loop, and
called just once. For example, consider the following code:
/* character by character, print each letter in 'p' in uppercase */
for (i = 0; i < strlen (p); i++)
    printf ("%c", toupper (p[i]));
If the compiler did not know that strlen( ) was pure, it might invoke the function with each iteration of the loop!
Smart programmers (as well as the compiler, if strlen( ) were marked pure) would write or generate code like this:
size_t len;
len = strlen (p);
for (i = 0; i < len; i++)
    printf ("%c", toupper (p[i]));
Parenthetically, even smarter programmers (such as this book's readers) would write:
while (*p)
    printf ("%c", toupper (*p++));
It makes no sense for a pure function to return void, as the return value is the sole point of such functions.
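As a minimal sketch of a user-defined pure function, the following marks a string-scanning helper with the pure attribute. The names count_char and repeat_count are hypothetical; the point is only that the function reads its arguments and the memory they point to, and modifies nothing.

```c
#include <stddef.h>

/* Hypothetical pure function: the result depends only on the arguments
 * and the (unmodified) string they point to. */
__attribute__ ((pure)) size_t count_char (const char *s, char c)
{
    size_t n = 0;

    while (*s)
        if (*s++ == c)
            n++;
    return n;
}

/* The call inside the loop has identical inputs on every iteration, so the
 * compiler is permitted to evaluate it once and reuse the result. */
size_t repeat_count (const char *s, char c, int times)
{
    size_t total = 0;
    int i;

    for (i = 0; i < times; i++)
        total += count_char (s, c);
    return total;
}
```

Whether the compiler actually hoists the call depends on the optimization level; the attribute only grants permission.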
A "constant" function is a stricter variant of a pure function.
Such functions cannot access global variables, and cannot take pointers
as parameters. Thus, the constant function's return value reflects
nothing but the passed-by-value parameters. Additional optimizations, on
top of those possible with pure functions, are possible for such
functions. Math functions, such as abs( ), are examples of constant functions (presuming they don't save state or otherwise pull tricks in the name of optimization). A programmer marks a function constant via the const keyword:
__attribute__ ((const)) int foo (int val) { /* ... */ }
As with pure functions, it makes no sense for a constant function to return void.
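A small sketch of a constant function follows, assuming a hypothetical square( ) helper. It touches no globals and takes no pointers, so its result depends only on the value passed in.

```c
/* Hypothetical constant function: no global access, no pointer parameters. */
__attribute__ ((const)) int square (int val)
{
    return val * val;
}

/* Both calls to square (x) have the same argument, so the compiler may
 * compute the value once and reuse it. */
int twice_square (int x)
{
    return square (x) + square (x);
}
```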
If a function does not return (perhaps because it invariably calls exit( )), the programmer can mark the function with the noreturn keyword, enlightening the compiler to that fact:
__attribute__ ((noreturn)) void foo (int val) { /* ... */ }
In turn, the compiler can make additional optimizations, with the understanding that under no circumstances will the invoked function ever return. It does not make sense for such a function to return anything but void.
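The following sketch shows one common shape for a noreturn function: a fatal-error helper that prints a message and exits. The names die and parse_digit are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical fatal-error helper: prints a message and never returns. */
__attribute__ ((noreturn)) void die (const char *msg)
{
    fprintf (stderr, "fatal: %s\n", msg);
    exit (EXIT_FAILURE);
}

/* Because die( ) is marked noreturn, the compiler knows control cannot
 * reach the end of this function on the error path. */
int parse_digit (char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    die ("not a digit");
}
```

Besides enabling optimizations, the attribute lets the compiler understand that the path after die( ) is unreachable, so the missing return statement there is not a problem.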
If a function returns a pointer that can never alias[45] existing memory (almost assuredly because the function just allocated fresh memory, and is returning a pointer to it), the programmer can mark the function as such with the malloc keyword, and the compiler can in turn perform suitable optimizations:
#include <stdlib.h>    /* malloc( ) */
#include <unistd.h>    /* getpagesize( ) */

__attribute__ ((malloc)) void * get_page (void)
{
    int page_size;

    page_size = getpagesize ( );
    if (page_size <= 0)
        return NULL;
    return malloc (page_size);
}
Not an optimization, but a programming aid, the warn_unused_result attribute instructs the
compiler to generate a warning whenever the return value of a function
is not stored or used in a conditional statement:
__attribute__ ((warn_unused_result)) int foo (void) { /* ... */ }
This allows the programmer to ensure that all callers check and
handle the return value from a function where the value is of particular
importance. Functions with important but oft-ignored return values, such
as read( ), make excellent candidates
for this attribute. Such functions cannot return void.
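As a sketch of how this looks in practice, the following declares a hypothetical copy_name( ) function whose return value signals failure, and a caller that checks it. Both function names are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical function: copies src into dst, returning 0 on success
 * or -1 if the string (plus terminator) would not fit. */
__attribute__ ((warn_unused_result))
int copy_name (char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen (src);

    if (len >= dst_size)
        return -1;
    memcpy (dst, src, len + 1);
    return 0;
}

int store_name (char *dst, size_t dst_size, const char *src)
{
    /* Writing just "copy_name (dst, dst_size, src);" here would draw a
     * warning from GCC, because the result would be discarded. */
    if (copy_name (dst, dst_size, src) != 0)
        return -1;
    return 0;
}
```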
The deprecated attribute instructs the compiler to generate a warning at the call site whenever the function is invoked:
__attribute__ ((deprecated)) void foo (void) { /* ... */ }
This helps wean programmers off deprecated and obsolete interfaces.
Occasionally, no code visible to a compiler invokes a particular
function. Marking a function with the used attribute instructs the compiler that the
program uses that function, despite appearances that the function is
never referenced:
static __attribute__ ((used)) void foo (void) { /* ... */ }
The compiler therefore outputs the resulting assembly language, and does not display a warning about an unused function. This attribute is useful if a static function is invoked only from handwritten assembly code. Normally, if the compiler is not aware of any invocation, it will generate a warning, and potentially optimize away the function.
The unused attribute tells the
compiler that the given function or function parameter is unused, and
instructs it not to issue any corresponding warnings:
int foo (long __attribute__ ((unused)) value) { /* ... */ }
This is useful if you're compiling with -W or -Wunused, and you want to catch unused
function parameters, but you occasionally have functions that must match
a predetermined signature (as is common in event-driven GUI programming
or signal handlers).
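The signal handler case can be sketched as follows. The handler must accept an int to match the signature that signal( ) expects, but this particular handler has no use for it. The names on_signal, got_signal, and handler_ran are hypothetical.

```c
#include <signal.h>

static volatile sig_atomic_t got_signal;

/* The parameter is required by the handler signature but deliberately
 * ignored, so it is marked unused to silence the warning. */
static void on_signal (int __attribute__ ((unused)) signo)
{
    got_signal = 1;
}

/* Installs the handler, raises SIGINT in this process, and reports
 * whether the handler ran (1), did not run (0), or setup failed (-1). */
int handler_ran (void)
{
    got_signal = 0;
    if (signal (SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise (SIGINT);
    return got_signal;
}
```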
The packed attribute tells the
compiler that a type or variable should be packed into memory using the
minimum amount of space possible, potentially disregarding alignment
requirements. If specified on a struct or union, all variables therein are so packed. If
specified on just one variable, only that specific object is
packed.
The following packs all variables within the structure into the minimum amount of space:
struct __attribute__ ((packed)) foo { ... };
As an example, a structure containing a char followed by an int would most likely find the integer aligned
to a memory address not immediately following the char, but, say, three bytes later. The
compiler aligns the variables by inserting bytes of unused padding
between them. A packed structure lacks this padding, potentially
consuming less memory, but failing to meet architectural alignment
requirements.
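The effect of packing can be observed with sizeof and offsetof( ). This is a minimal sketch using two hypothetical structures, padded and packed_rec, that differ only in the packed attribute.

```c
#include <stddef.h>

/* Ordinary layout: the compiler may insert padding after 'tag'. */
struct padded {
    char tag;
    int value;
};

/* Packed layout: 'value' immediately follows 'tag', with no padding. */
struct __attribute__ ((packed)) packed_rec {
    char tag;
    int value;
};

size_t padded_size (void)
{
    return sizeof (struct padded);
}

size_t packed_size (void)
{
    return sizeof (struct packed_rec);
}

size_t packed_value_offset (void)
{
    return offsetof (struct packed_rec, value);
}
```

On a machine with 4-byte integers, the packed structure occupies 5 bytes, whereas the ordinary one typically occupies 8.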
As well as allowing packing of variables, GCC also allows
programmers to specify an alternative minimum alignment for a given
variable. GCC will then align the specified variable to at
least this value, as opposed to the minimum required
alignment dictated by the architecture and ABI. For example, this
statement declares an integer named beard_length with a minimum alignment of 32
bytes (as opposed to the typical alignment of 4 bytes on machines with
32-bit integers):
int beard_length __attribute__ ((aligned (32))) = 0;
Forcing the alignment of a type is generally useful only when dealing with hardware that may impose greater alignment requirements than the architecture itself, or when you are hand-mixing C and assembly code, and you want to use instructions that require specially aligned values. One example where this alignment functionality is utilized is for storing oft-used variables on processor cache lines to optimize cache behavior. The Linux kernel makes use of this technique.
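Whether a requested alignment was honored can be checked at run time by inspecting the variable's address. The following sketch reuses beard_length from the text; the helper is_aligned_to is a hypothetical name.

```c
#include <stdint.h>

/* Request a minimum alignment of 32 bytes for this variable. */
int beard_length __attribute__ ((aligned (32))) = 0;

/* Returns nonzero if the address p is a multiple of boundary. */
int is_aligned_to (const void *p, uintptr_t boundary)
{
    return ((uintptr_t) p % boundary) == 0;
}

/* Returns nonzero if beard_length landed on a 32-byte boundary. */
int beard_length_is_aligned (void)
{
    return is_aligned_to (&beard_length, 32);
}
```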
As an alternative to specifying a certain minimum alignment, you
can ask that GCC align a given type to the largest minimum alignment
that is ever used for any data type. For example, this instructs GCC to
align parrot_height to the largest
alignment it ever uses, which is probably the alignment of a double:
short parrot_height __attribute__ ((aligned)) = 5;
This decision generally involves a space/time tradeoff: variables aligned in this manner consume more space, but copying to or from them (along with other complex manipulations) may be faster because the compiler can issue machine instructions that deal with the largest amount of memory.
Various aspects of the architecture or the system's tool chain may impose maximum limits on a variable's alignment. For example, on some Linux architectures, the linker is unable to recognize alignments beyond a rather small default. In that case, an alignment provided using this keyword is rounded down to the largest allowed alignment. For example, if you request an alignment of 32, but the system's linker is unable to align to more than 8 bytes, the variable will be aligned along an 8-byte boundary.
GCC allows programmers to place global variables in a specific machine register, where the variables will then reside for the duration of the program's execution. GCC calls such variables global register variables.
The syntax requires that the programmer specify the machine register. The following example uses ebx:
register int *foo asm ("ebx");
The programmer must select a register that is not function-clobbered: that is, the selected register must be usable by local functions, saved and restored on function call invocation, and not specified for any special purpose by the architecture or operating system's ABI. The compiler will generate a warning if the selected register is inappropriate. If the register is appropriate (ebx, used in this example, is fine for the x86 architecture), the compiler will in turn stop using the register itself.
Such an optimization can provide huge performance boosts if the variable is frequently used. A good example is with a virtual machine. Placing the variable that holds, say, the virtual stack frame pointer in a register might lead to substantial gains. On the other hand, if the architecture is starved of registers to begin with (as the x86 architecture is), this optimization makes little sense.
Global register variables cannot be used in signal handlers, or by more than one thread of execution. They also cannot have initial values because there is no mechanism for executable files to supply default contents for registers. Global register variable declarations should precede any function definitions.
GCC allows programmers to annotate the expected value of an expression—for example, to tell the compiler whether a conditional statement is likely to be true or false. GCC, in turn, can then perform block reordering, and other optimizations to improve the performance of conditional branches.
The GCC syntax for branch annotation is horrendously ugly. To make branch annotation easier on the eyes, we use preprocessor macros:
#define likely(x) __builtin_expect (!!(x), 1)
#define unlikely(x) __builtin_expect (!!(x), 0)
Programmers can mark an expression as likely or unlikely true by
wrapping it in likely( ) or unlikely( ), respectively.
The following example marks a branch as unlikely true (that is, likely to be false):
int ret;
ret = close (fd);
if (unlikely (ret))
    perror ("close");
Conversely, the following example marks a branch as likely true:
const char *home;
home = getenv ("HOME");
if (likely (home))
    printf ("Your home directory is %s\n", home);
else
    fprintf (stderr, "Environment variable HOME not set!\n");
As with inline functions, programmers have a tendency to overuse
branch annotation. Once you start anointing expressions, you might be
tempted to mark all expressions. Be careful,
though—you should mark branches as likely or unlikely only if you know
a priori and with little doubt that the expressions
will be true or false nearly all of the time (say,
with 99 percent certainty). Seldom-occurring errors are good candidates
for unlikely( ). Bear in mind,
however, that a false prediction is worse than no prediction at
all.
GCC provides the typeof( ) keyword to obtain the type of a given expression. Syntactically, the keyword is used the same way as sizeof( ). For example, this expression yields the type of whatever x points at:
typeof (*x)
We can use this to declare an array, y, of those types:
typeof (*x) y[42];
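The declaration above can be exercised in a small sketch. It uses the alternate spelling __typeof__, which GCC also accepts when compiling in a strict ISO mode where the plain typeof keyword may be unavailable. The function names are hypothetical.

```c
#include <stddef.h>

/* Declares an array whose element type is taken from *x, then reports
 * how many elements it has. The operand of __typeof__ is not evaluated,
 * so the NULL pointer is never dereferenced. */
size_t typeof_array_length (void)
{
    long *x = NULL;
    __typeof__ (*x) y[42];    /* same as: long y[42]; */

    return sizeof (y) / sizeof (y[0]);
}

/* Reports the size of one element of such an array. */
size_t typeof_element_size (void)
{
    long *x = NULL;
    __typeof__ (*x) y[42];

    return sizeof (y[0]);
}
```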
A popular use for typeof( ) is to write "safe" macros, which can operate on any arithmetic value, and evaluate their parameters only once:
#define max(a,b) ({ \
    typeof (a) _a = (a); \
    typeof (b) _b = (b); \
    _a > _b ? _a : _b; \
})
GCC provides the keyword __alignof__ to obtain the alignment of a given object. The value is architecture- and ABI-specific. If the current architecture does not have a required alignment, the keyword returns the ABI's recommended alignment. Otherwise, the keyword returns the minimum required alignment.
The syntax is identical to sizeof( ):
__alignof__(int)
Depending on the architecture, this probably returns 4, as 32-bit integers are generally aligned along 4-byte boundaries.
The keyword works on lvalues, too. In that case, the returned alignment is the minimum alignment of the backing type, not the actual alignment of the specific lvalue. If the minimum alignment was changed via the aligned attribute (described earlier in this appendix), that change is reflected by __alignof__.
For example, consider this structure:
struct ship {
    int year_built;
    char canons;
    int mast_height;
};
along with this code snippet:
struct ship my_ship;
printf ("%zu\n", __alignof__(my_ship.canons));
The __alignof__ in this snippet will return 1, even though structure padding probably results in canons consuming four bytes.
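The distinction between a member's alignment and the space it ends up occupying can be sketched by combining __alignof__ with offsetof( ). The helper function names below are hypothetical.

```c
#include <stddef.h>

struct ship {
    int year_built;
    char canons;
    int mast_height;
};

/* Alignment of the char member: this reflects the type, not the padding. */
size_t canons_alignment (void)
{
    struct ship my_ship;

    return __alignof__ (my_ship.canons);
}

/* Distance from 'canons' to the next member: the char itself plus any
 * padding the compiler inserted to align 'mast_height'. */
size_t canons_slot_size (void)
{
    return offsetof (struct ship, mast_height)
           - offsetof (struct ship, canons);
}
```

On a typical machine with 4-byte integers, canons_alignment( ) yields 1 while canons_slot_size( ) yields 4.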
GCC provides a built-in keyword for obtaining the offset of a
member of a structure within that structure. The offsetof( ) macro, defined in <stddef.h>, is part of the ISO C
standard. Most definitions are horrid, involving obscene pointer
arithmetic and code unfit for minors. The GCC extension is simpler and
potentially faster:
#define offsetof(type, member) __builtin_offsetof (type, member)
A call returns the offset of member within type—that is, the number of bytes, starting
from zero, from the beginning of the structure to that member. For
example, consider the following structure:
struct rowboat {
    char *boat_name;
    unsigned int nr_oars;
    short length;
};
The actual offsets depend on the size of the variables, and the architecture's alignment requirements and padding behavior, but on a 32-bit machine, we might expect calling offsetof( ) on struct rowboat with boat_name, nr_oars, and length to return 0, 4, and 8, respectively.
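The offsets on a given machine can be printed with a short sketch like the following, which uses the standard offsetof( ) macro from <stddef.h>. The function name print_rowboat_offsets is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

struct rowboat {
    char *boat_name;
    unsigned int nr_oars;
    short length;
};

/* Prints the byte offset of each member within struct rowboat. */
void print_rowboat_offsets (void)
{
    printf ("boat_name: %zu\n", offsetof (struct rowboat, boat_name));
    printf ("nr_oars:   %zu\n", offsetof (struct rowboat, nr_oars));
    printf ("length:    %zu\n", offsetof (struct rowboat, length));
}
```

On a typical 64-bit Linux system, where pointers are 8 bytes wide, the offsets come out as 0, 8, and 12 instead.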
On a Linux system, the offsetof( ) macro should be defined using the GCC keyword, and need not be redefined.
GCC provides a keyword for obtaining the return address of the current function, or one of the callers of the current function:
void * __builtin_return_address (unsigned int level)
The parameter level specifies
the function in the call chain whose address should be returned. A value
of 0 asks for the return address of
the current function, a value of 1
asks for the return address of the caller of the current function, a
value of 2 asks for
that function's caller's return address, and so
on.
If the current function is an inline function, the address returned is that of the calling function. If this is unacceptable, use the noinline keyword (described earlier in this appendix) to force the compiler not to inline the function.
There are several uses for the __builtin_return_address keyword. One is for debugging or informational purposes. Another is to unwind a call chain, in order to implement introspection, a crash dump utility, a debugger, and so on.
Note that some architectures can return only the address of the
invoking function. On such architectures, a nonzero parameter value can
result in a random return value. Thus, any parameter other than 0 is nonportable, and should be used only for
debugging purposes.
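A minimal sketch of the portable case, level 0, follows. The function is marked noinline so that the address reported really is the return address of its own call; the names called_from and caller_address_is_known are hypothetical.

```c
#include <stddef.h>

/* Returns the address in the caller to which this function will return. */
__attribute__ ((noinline)) void *called_from (void)
{
    return __builtin_return_address (0);
}

/* Returns nonzero if a return address was obtained. */
int caller_address_is_known (void)
{
    return called_from ( ) != NULL;
}
```

A debugging aid might print the pointer with the %p format of printf( ) and look the address up in a symbol table.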
GCC allows case statement
labels to specify a range of values for a single block. The general
syntax is as follows:
case low ... high:
For example:
switch (val) {
case 1 ... 10:
    /* ... */
    break;
case 11 ... 20:
    /* ... */
    break;
default:
    /* ... */
}
This functionality is quite useful for ASCII case ranges, too:
case 'A' ... 'Z':
Note that there should be a space before and after the ellipsis. Otherwise, the compiler can become confused, particularly with integer ranges. Always do the following:
case 4 ... 8:
and never this:
case 4...8:
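Case ranges can be sketched in a complete function as follows. The enum and the function name classify are hypothetical, and the ranges assume an ASCII-compatible character set.

```c
enum char_class {
    CLASS_UPPER,
    CLASS_LOWER,
    CLASS_DIGIT,
    CLASS_OTHER
};

/* Classifies a character using GCC case ranges; note the spaces
 * around each ellipsis. */
enum char_class classify (int c)
{
    switch (c) {
    case 'A' ... 'Z':
        return CLASS_UPPER;
    case 'a' ... 'z':
        return CLASS_LOWER;
    case '0' ... '9':
        return CLASS_DIGIT;
    default:
        return CLASS_OTHER;
    }
}
```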
In GCC, addition and subtraction operations are allowed on pointers to void, and pointers to functions. Normally, ISO C does not allow arithmetic on such pointers because the size of a "void" is a silly concept, and is dependent on what the pointer is actually pointing to. To facilitate such arithmetic, GCC treats the size of the referential object as one byte. Thus, the following snippet advances a by one byte:
a++; /* a is a void pointer */
The option -Wpointer-arith
causes GCC to generate a warning when these extensions are used.
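A short sketch of void pointer arithmetic, relying on the GCC behavior described above, follows. The helper names advance_bytes and next_byte are hypothetical; the code is not valid ISO C and will draw a warning under -Wpointer-arith.

```c
#include <stddef.h>

/* Returns a pointer n bytes past base. GCC treats the pointed-to
 * object as one byte wide for the purposes of the addition. */
void *advance_bytes (void *base, size_t n)
{
    return base + n;
}

/* Increments a void pointer by one byte, as in the snippet above. */
void *next_byte (void *a)
{
    a++;
    return a;
}
```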
Let's face it, the __attribute__ syntax is not pretty. Some of the extensions we've looked at in this appendix essentially require preprocessor macros to make their use palatable, but all of them can benefit from a sprucing up in appearance.
With a little preprocessor magic, this is not hard. Further, in the same action, we can make the GCC extensions portable, by defining them away in the case of a non-GCC compiler (whatever that is).
To do so, stick the following code snippet in a header, and include that header in your source files:
#if __GNUC__ >= 3
# undef inline
# define inline inline __attribute__ ((always_inline))
# define __noinline __attribute__ ((noinline))
# define __pure __attribute__ ((pure))
# define __const __attribute__ ((const))
# define __noreturn __attribute__ ((noreturn))
# define __malloc __attribute__ ((malloc))
# define __must_check __attribute__ ((warn_unused_result))
# define __deprecated __attribute__ ((deprecated))
# define __used __attribute__ ((used))
# define __unused __attribute__ ((unused))
# define __packed __attribute__ ((packed))
# define __align(x) __attribute__ ((aligned (x)))
# define __align_max __attribute__ ((aligned))
# define likely(x) __builtin_expect (!!(x), 1)
# define unlikely(x) __builtin_expect (!!(x), 0)
#else
# define __noinline /* no noinline */
# define __pure /* no pure */
# define __const /* no const */
# define __noreturn /* no noreturn */
# define __malloc /* no malloc */
# define __must_check /* no warn_unused_result */
# define __deprecated /* no deprecated */
# define __used /* no used */
# define __unused /* no unused */
# define __packed /* no packed */
# define __align(x) /* no aligned */
# define __align_max /* no align_max */
# define likely(x) (x)
# define unlikely(x) (x)
#endif
For example, the following marks a function as pure, using our shortcut:
__pure int foo (void) { /* ... */ }
If GCC is in use, the function is marked with the pure attribute. If GCC is not the compiler, the preprocessor replaces the __pure token with a no-op. Note that you can place multiple attributes on a given definition, and thus you can use more than one of these defines on a single definition with no problems.
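The same pattern can be sketched in a self-contained form. The macro names ATTR_PURE and ATTR_MUST_CHECK and the function is_blank are hypothetical; they are spelled without leading double underscores here only because the C standard reserves such identifiers for the implementation, which can matter if a system header happens to use the same name.

```c
/* Define the shortcuts when GCC (or a compatible compiler) is in use,
 * and define them away otherwise. */
#if defined(__GNUC__) && __GNUC__ >= 3
# define ATTR_PURE       __attribute__ ((pure))
# define ATTR_MUST_CHECK __attribute__ ((warn_unused_result))
#else
# define ATTR_PURE       /* no pure */
# define ATTR_MUST_CHECK /* no warn_unused_result */
#endif

/* Two attributes on one definition: returns 1 if the string contains
 * only spaces and tabs (or is empty), and 0 otherwise. */
ATTR_PURE ATTR_MUST_CHECK int is_blank (const char *s)
{
    while (*s) {
        if (*s != ' ' && *s != '\t')
            return 0;
        s++;
    }
    return 1;
}
```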
Easier, prettier, and portable!
[45] A memory alias occurs when two or more pointer variables point at the same memory address. This can happen in trivial cases where a pointer is assigned the value of another pointer, and also in more complex, less obvious cases. If a function is returning the address of newly allocated memory, no other pointers to that same address should exist.
