GCC Extensions to the C Language: Appendix - Linux System Programming

by Robert Love

This excerpt is from Linux System Programming.

This book is about writing software that makes the most effective use of the system you're running on -- code that interfaces directly with the kernel and core system libraries, including the shell, text editor, compiler, debugger, core utilities, and system daemons. Written primarily for engineers looking to program (better) at the low level, this book can give any programmer an understanding of core internals that makes for better code, no matter where it appears in the stack.


The GNU Compiler Collection (GCC) provides many extensions to the C language, some of which have proven to be of particular value to system programmers. The majority of the additions to the C language that we'll cover in this appendix offer ways for programmers to provide additional information to the compiler about the behavior and intended use of their code. The compiler, in turn, utilizes this information to generate more efficient machine code. Other extensions fill in gaps in the C programming language, particularly at lower levels.

GCC provides several extensions now available in the latest C standard, ISO C99. Some of these extensions function similarly to their C99 cousins, but ISO C99 implemented other extensions rather differently. New code should use the ISO C99 variants of these features. We won't cover such extensions here; we'll discuss only GCC-unique additions.

GNU C

The flavor of C supported by GCC is often called GNU C. In the 1990s, GNU C filled in several gaps in the C language, providing features such as complex variables, zero-length arrays, inline functions, and named initializers. But after nearly a decade, C was finally upgraded, and with the standardization of ISO C99, GNU C extensions became less relevant. Nonetheless, GNU C continues to provide useful features, and many Linux programmers still use a subset of GNU C—often just an extension or two—in their C90- or C99-compliant code.

One prominent example of a GCC-specific code base is the Linux kernel, which is written strictly in GNU C. Recently, however, Intel has invested engineering effort in allowing the Intel C Compiler (ICC) to understand the GNU C extensions used by the kernel. Consequently, many of these extensions are now growing less GCC-specific.

Inline Functions

The compiler copies the entire code of an "inline" function into the site where the function is called. Instead of storing the function externally and jumping to it whenever it is called, it runs the contents of the function directly. Such behavior saves the overhead of the function call, and allows for potential optimizations at the call site because the compiler can optimize the caller and callee together. This latter point is particularly valid if the parameters to the function are constant at the call site. Naturally, however, copying a function into each and every chunk of code that invokes it can have a detrimental effect on code size. Therefore, functions should be inlined only if they are small and simple, or are not called in many different places.

For many years, GCC has supported the inline keyword, instructing the compiler to inline the given function. C99 formalized this keyword:

static inline int foo (void) { /* ... */ }

Technically, however, the keyword is merely a hint—a suggestion to the compiler to consider inlining the given function. GCC further provides an extension for instructing the compiler to always inline the designated function:

static inline __attribute__ ((always_inline)) int foo (void) { /* ... */ }

The most obvious candidate for an inline function is a preprocessor macro. An inline function in GCC will perform as well as a macro, and, additionally, receives type checking. For example, instead of this macro:

#define max(a,b) ({ a > b ? a : b; })

one might use the corresponding inline function:

static inline int max (int a, int b)
{
        if (a > b)
                return a;
        return b;
}
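
To see why the function is the safer choice, consider a minimal sketch in which an argument has a side effect. The names MAX_MACRO, max_int, next_value, and the two evals_with_* helpers are hypothetical, introduced only for illustration. The macro is written in the traditional form, so the larger argument is evaluated twice:

```c
/* Traditional macro: the larger argument is evaluated twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline function: each argument is evaluated exactly once. */
static inline int max_int (int a, int b)
{
        return a > b ? a : b;
}

static int calls;       /* number of times next_value() has run */

static int next_value (void)
{
        return ++calls;
}

/* Returns how many times next_value() ran inside the macro. */
int evals_with_macro (void)
{
        calls = 0;
        (void) MAX_MACRO (next_value (), 0);
        return calls;
}

/* Returns how many times next_value() ran for the function call. */
int evals_with_inline (void)
{
        calls = 0;
        (void) max_int (next_value (), 0);
        return calls;
}
```

Here evals_with_macro() reports two evaluations and evals_with_inline() reports one, which is the kind of surprise the inline function avoids.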

Programmers tend to overuse inline functions. Function call overhead on most modern architectures—the x86 in particular—is very, very low. Only the most worthy of functions should receive consideration!

Suppressing Inlining

In its most aggressive optimization mode, GCC automatically selects functions that appear suitable for inlining and inlines them. This is normally a good idea, but sometimes the programmer knows that a function will perform incorrectly if inlined. One possible example of this is when using __builtin_return_address (discussed later in this appendix). To suppress inlining, use the noinline attribute:

__attribute__ ((noinline)) int foo (void) { /* ... */ }

Pure Functions

A "pure" function is one that has no side effects, and whose return value reflects only the function's parameters or nonvolatile global variables. Any parameter or global variable access must be read-only. Loop optimization and subexpression elimination can be applied to such functions. Functions are marked as pure via the pure attribute:

__attribute__ ((pure)) int foo (int val) { /* ... */ }

A common example is strlen( ). Given identical inputs, this function's return value is invariant across multiple invocations, and thus it can be pulled out of a loop, and called just once. For example, consider the following code:

/* character by character, print each letter in 'p' in uppercase */
for (i = 0; i < strlen (p); i++)
        printf ("%c", toupper (p[i]));

If the compiler did not know that strlen( ) was pure, it might invoke the function with each iteration of the loop!

Smart programmers—as well as the compiler, if strlen( ) were marked pure—would write or generate code like this:

size_t len;

len = strlen (p);
for (i = 0; i < len; i++)
        printf ("%c", toupper (p[i]));

Parenthetically, even smarter programmers (such as this book's readers) would write:

while (*p)
        printf ("%c", toupper (*p++));

It is illegal, and indeed makes no sense, for a pure function to return void, as the return value is the sole point of such functions.
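
As a rough sketch of a user-defined pure function, the following marks a hypothetical count_char() as pure and then calls it in a loop condition. Both function names are assumptions made for this example. Whether the call is actually hoisted depends on the optimization level and the compiler version:

```c
#include <stddef.h>

/* Hypothetical pure function: reads its arguments, changes nothing. */
__attribute__ ((pure)) size_t count_char (const char *s, char c)
{
        size_t n = 0;

        for (; *s; s++)
                if (*s == c)
                        n++;
        return n;
}

/*
 * Because count_char() is pure and the string is not modified in the
 * loop, the compiler is free to evaluate the call once rather than on
 * every iteration.
 */
size_t sum_below_count (const char *s, char c)
{
        size_t i, total = 0;

        for (i = 0; i < count_char (s, c); i++)
                total += i;
        return total;
}
```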

Constant Functions

A "constant" function is a stricter variant of a pure function. Such functions cannot access global variables, and cannot take pointers as parameters. Thus, the constant function's return value reflects nothing but the passed-by-value parameters. Additional optimizations, on top of those possible with pure functions, are possible for such functions. Math functions, such as abs( ), are examples of constant functions (presuming they don't save state or otherwise pull tricks in the name of optimization). A programmer marks a function constant via the const attribute:

__attribute__ ((const)) int foo (int val) { /* ... */ }

As with pure functions, it makes no sense for a constant function to return void.
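
A small sketch, using the hypothetical functions square() and twice_square(), shows the kind of code that benefits. With the attribute in place, the compiler may merge the two identical calls into one:

```c
/* Hypothetical constant function: depends only on its by-value argument. */
__attribute__ ((const)) int square (int val)
{
        return val * val;
}

/* The two identical calls may be merged into a single call. */
int twice_square (int val)
{
        return square (val) + square (val);
}
```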

Functions That Do Not Return

If a function does not return (perhaps because it invariably calls exit( )), the programmer can mark the function with the noreturn attribute, alerting the compiler to that fact:

__attribute__ ((noreturn)) void foo (int val) { /* ... */ }

In turn, the compiler can make additional optimizations, with the understanding that under no circumstances will the invoked function ever return. It does not make sense for such a function to return anything but void.
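
One visible effect is on diagnostics. In the sketch below, die() and checked_div() are hypothetical names chosen for illustration. Because die() is marked noreturn, GCC does not need to complain that checked_div() can fall off its end without returning a value:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print a message and terminate; never returns. */
__attribute__ ((noreturn)) void die (const char *msg)
{
        fprintf (stderr, "fatal: %s\n", msg);
        exit (EXIT_FAILURE);
}

/*
 * No return statement follows die(). The compiler knows that the call
 * cannot return, so control never reaches the closing brace.
 */
int checked_div (int num, int den)
{
        if (den != 0)
                return num / den;
        die ("division by zero");
}
```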

Functions That Allocate Memory

If a function returns a pointer that can never alias[45] existing memory (almost assuredly because the function just allocated fresh memory, and is returning a pointer to it), the programmer can mark the function as such with the malloc attribute, and the compiler can in turn perform suitable optimizations:

__attribute__ ((malloc)) void * get_page (void)
{
        int page_size;

        page_size = getpagesize ();
        if (page_size <= 0)
                return NULL;

        return malloc (page_size);
}

Forcing Callers to Check the Return Value

Not an optimization, but a programming aid, the warn_unused_result attribute instructs the compiler to generate a warning whenever the return value of a function is not stored or used in a conditional statement:

__attribute__ ((warn_unused_result)) int foo (void) { /* ... */ }

This allows the programmer to ensure that all callers check and handle the return value from a function where the value is of particular importance. Functions with important but oft-ignored return values, such as read( ), make excellent candidates for this attribute. Such functions cannot return void.
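
The following sketch shows a caller that satisfies the attribute. The functions parse_digit() and digit_or_default() are hypothetical and exist only to illustrate the pattern:

```c
#include <stddef.h>

/* Hypothetical parser: returns 0 on success, -1 on failure. */
__attribute__ ((warn_unused_result)) int parse_digit (char c, int *out)
{
        if (out == NULL || c < '0' || c > '9')
                return -1;
        *out = c - '0';
        return 0;
}

/*
 * Correct caller: the result is tested. A bare statement such as
 * "parse_digit (c, &value);" would draw a warning instead.
 */
int digit_or_default (char c, int fallback)
{
        int value;

        if (parse_digit (c, &value) != 0)
                return fallback;
        return value;
}
```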

Marking Functions As Deprecated

The deprecated attribute instructs the compiler to generate a warning at the call site whenever the function is invoked:

__attribute__ ((deprecated)) void foo (void) { /* ... */ }

This helps wean programmers off deprecated and obsolete interfaces.

Marking Functions As Used

Occasionally, no code visible to a compiler invokes a particular function. Marking a function with the used attribute instructs the compiler that the program uses that function, despite appearances that the function is never referenced:

static __attribute__ ((used)) void foo (void) { /* ... */ }

The compiler therefore outputs the resulting assembly language, and does not display a warning about an unused function. This attribute is useful if a static function is invoked only from handwritten assembly code. Normally, if the compiler is not aware of any invocation, it will generate a warning, and potentially optimize away the function.

Marking Functions or Parameters As Unused

The unused attribute tells the compiler that the given function or function parameter is unused, and instructs it not to issue any corresponding warnings:

int foo (long __attribute__ ((unused)) value) { /* ... */ }

This is useful if you're compiling with -W or -Wunused, and you want to catch unused function parameters, but you occasionally have functions that must match a predetermined signature (as is common in event-driven GUI programming or signal handlers).
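
A signal handler makes a convenient illustration, since its signature is fixed but the argument is often ignored. In this sketch, on_signal() and handler_ran() are hypothetical names, and SIGINT is used only because ISO C guarantees it exists:

```c
#include <signal.h>

static volatile sig_atomic_t got_signal;

/* The handler must accept an int, but it never looks at it. */
static void on_signal (int signo __attribute__ ((unused)))
{
        got_signal = 1;
}

/* Installs the handler, raises SIGINT, and reports whether it ran. */
int handler_ran (void)
{
        got_signal = 0;
        if (signal (SIGINT, on_signal) == SIG_ERR)
                return -1;
        if (raise (SIGINT) != 0)
                return -1;
        signal (SIGINT, SIG_DFL);       /* restore the default action */
        return got_signal;
}
```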

Packing a Structure

The packed attribute tells the compiler that a type or variable should be packed into memory using the minimum amount of space possible, potentially disregarding alignment requirements. If specified on a struct or union, all variables therein are so packed. If specified on just one variable, only that specific object is packed.

The following packs all variables within the structure into the minimum amount of space:

struct __attribute__ ((packed)) foo { ... };

As an example, a structure containing a char followed by an int would most likely find the integer aligned to a memory address not immediately following the char, but, say, three bytes later. The compiler aligns the variables by inserting bytes of unused padding between them. A packed structure lacks this padding, potentially consuming less memory, but failing to meet architectural alignment requirements.
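
The effect is easy to observe with sizeof and offsetof. The two structures below, plain_pair and packed_pair, are hypothetical and differ only in the attribute. The exact size of the unpacked version depends on the architecture and ABI:

```c
#include <stddef.h>

/* Ordinary layout: the compiler may pad after 'tag' to align 'value'. */
struct plain_pair {
        char tag;
        int value;
};

/* Packed layout: 'value' immediately follows 'tag'. */
struct __attribute__ ((packed)) packed_pair {
        char tag;
        int value;
};

size_t plain_size (void)
{
        return sizeof (struct plain_pair);
}

size_t packed_size (void)
{
        return sizeof (struct packed_pair);
}

size_t packed_value_offset (void)
{
        return offsetof (struct packed_pair, value);
}
```

On a typical machine with 4-byte integers, the packed structure occupies 5 bytes, while the plain one usually occupies 8.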

Increasing the Alignment of a Variable

As well as allowing packing of variables, GCC also allows programmers to specify an alternative minimum alignment for a given variable. GCC will then align the specified variable to at least this value, as opposed to the minimum required alignment dictated by the architecture and ABI. For example, this statement declares an integer named beard_length with a minimum alignment of 32 bytes (as opposed to the typical alignment of 4 bytes on machines with 32-bit integers):

int beard_length __attribute__ ((aligned (32))) = 0;

Forcing the alignment of a type is generally useful only when dealing with hardware that may impose greater alignment requirements than the architecture itself, or when you are hand-mixing C and assembly code, and you want to use instructions that require specially aligned values. One example where this alignment functionality is utilized is for storing oft-used variables on processor cache lines to optimize cache behavior. The Linux kernel makes use of this technique.

As an alternative to specifying a certain minimum alignment, you can ask that GCC align a given type to the largest minimum alignment that is ever used for any data type. For example, this instructs GCC to align parrot_height to the largest alignment it ever uses, which is probably the alignment of a double:

short parrot_height __attribute__ ((aligned)) = 5;

This decision generally involves a space/time tradeoff: variables aligned in this manner consume more space, but copying to or from them (along with other complex manipulations) may be faster because the compiler can issue machine instructions that deal with the largest amount of memory.

Various aspects of the architecture or the system's tool chain may impose maximum limits on a variable's alignment. For example, on some Linux architectures, the linker is unable to recognize alignments beyond a rather small default. In that case, an alignment provided using this attribute is rounded down to the largest allowed alignment. For example, if you request an alignment of 32, but the system's linker is unable to align to more than 8 bytes, the variable will be aligned along an 8-byte boundary.
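
Whether a requested alignment was honored can be checked at runtime by inspecting the variable's address. The sketch below assumes a tool chain that supports 32-byte alignment, which is not guaranteed everywhere. The two helper functions are hypothetical:

```c
#include <stdint.h>

/* Aligned to at least 32 bytes, assuming the tool chain allows it. */
static int beard_length __attribute__ ((aligned (32))) = 0;

/* Returns nonzero if the variable's address is a multiple of 32. */
int beard_length_is_aligned (void)
{
        return ((uintptr_t) &beard_length % 32) == 0;
}

/* Returns the alignment that GCC reports for the variable itself. */
unsigned long beard_length_alignment (void)
{
        return __alignof__ (beard_length);
}
```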

Placing Global Variables in a Register

GCC allows programmers to place global variables in a specific machine register, where the variables will then reside for the duration of the program's execution. GCC calls such variables global register variables.

The syntax requires that the programmer specify the machine register. The following example uses ebx:

register int *foo asm ("ebx");

The programmer must select a register that is not function-clobbered: that is, the selected register must be usable by local functions, saved and restored on function call invocation, and not specified for any special purpose by the architecture or operating system's ABI. The compiler will generate a warning if the selected register is inappropriate. If the register is appropriate (ebx, used in this example, is fine for the x86 architecture), the compiler will in turn stop using the register itself.

Such an optimization can provide huge performance boosts if the variable is frequently used. A good example is with a virtual machine. Placing the variable that holds, say, the virtual stack frame pointer in a register might lead to substantial gains. On the other hand, if the architecture is starved of registers to begin with (as the x86 architecture is), this optimization makes little sense.

Global register variables cannot be used in signal handlers, or by more than one thread of execution. They also cannot have initial values because there is no mechanism for executable files to supply default contents for registers. Global register variable declarations should precede any function definitions.

Branch Annotation

GCC allows programmers to annotate the expected value of an expression—for example, to tell the compiler whether a conditional statement is likely to be true or false. GCC, in turn, can then perform block reordering, and other optimizations to improve the performance of conditional branches.

The GCC syntax for branch annotation is horrendously ugly. To make branch annotation easier on the eyes, we use preprocessor macros:

#define likely(x)    __builtin_expect (!!(x), 1)
#define unlikely(x)  __builtin_expect (!!(x), 0)

Programmers can mark an expression as likely or unlikely true by wrapping it in likely( ) or unlikely( ), respectively.

The following example marks a branch as unlikely true (that is, likely to be false):

int ret;

ret = close (fd);
if (unlikely (ret))
        perror ("close");

Conversely, the following example marks a branch as likely true:

const char *home;

home = getenv ("HOME");
if (likely (home))
        printf ("Your home directory is %s\n", home);
else
        fprintf (stderr, "Environment variable HOME not set!\n");

As with inline functions, programmers have a tendency to overuse branch annotation. Once you start anointing expressions, you might be tempted to mark all expressions. Be careful, though—you should mark branches as likely or unlikely only if you know a priori and with little doubt that the expressions will be true or false nearly all of the time (say, with 99 percent certainty). Seldom-occurring errors are good candidates for unlikely( ). Bear in mind, however, that a false prediction is worse than no prediction at all.

Getting the Type of an Expression

GCC provides the typeof( ) keyword to obtain the type of a given expression. Syntactically, the keyword operates the same as sizeof( ). For example, this expression yields the type of whatever x points at:

typeof (*x)

We can use this to declare an array, y, of those types:

typeof (*x) y[42];

A popular use for typeof( ) is to write "safe" macros, which can operate on any arithmetic value, and evaluate their parameters only once:

#define max(a,b) ({          \
        typeof (a) _a = (a); \
        typeof (b) _b = (b); \
       _a > _b ? _a : _b; \
})
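
The sketch below exercises such a macro with an argument that has a side effect. It spells the keyword __typeof__, the alternate form that GCC also accepts when compiling in strict ISO modes. The helpers tick(), max_evaluations(), and max_double() are hypothetical names used only for this illustration:

```c
/* Same macro as above, using the __typeof__ spelling. */
#define max(a, b) ({                    \
        __typeof__ (a) _a = (a);        \
        __typeof__ (b) _b = (b);        \
        _a > _b ? _a : _b;              \
})

static int ticks;       /* number of times tick() has run */

static int tick (void)
{
        return ++ticks;
}

/* Returns how many times tick() ran while evaluating max(). */
int max_evaluations (void)
{
        ticks = 0;
        (void) max (tick (), 0);
        return ticks;
}

/* The same macro works unchanged on doubles. */
double max_double (double x, double y)
{
        return max (x, y);
}
```

The operand of __typeof__ is not evaluated here, so tick() runs only once, in the initializer of _a.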

Getting the Alignment of a Type

GCC provides the keyword __alignof__ to obtain the alignment of a given object. The value is architecture- and ABI-specific. If the current architecture does not have a required alignment, the keyword returns the ABI's recommended alignment. Otherwise, the keyword returns the minimum required alignment.

The syntax is identical to sizeof( ):

__alignof__(int)

Depending on the architecture, this probably returns 4, as 32-bit integers are generally aligned along 4 byte boundaries.

The keyword works on lvalues, too. In that case, the returned alignment is the minimum alignment of the backing type, not the actual alignment of the specific lvalue. If the minimum alignment was changed via the aligned attribute (described earlier, in "Increasing the Alignment of a Variable"), that change is reflected by __alignof__.

For example, consider this structure:

struct ship {
        int year_built;
        char canons;
        int mast_height;
};

along with this code snippet:

struct ship my_ship;

printf ("%zu\n", __alignof__(my_ship.canons));

The __alignof__ in this snippet will return 1, even though structure padding probably results in canons consuming four bytes.
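
For readers who want to check this, here is a compilable sketch built around the same structure. The functions canons_alignment() and ship_size() are hypothetical wrappers added for illustration. The structure's total size depends on the ABI:

```c
#include <stddef.h>

struct ship {
        int year_built;
        char canons;
        int mast_height;
};

/* Alignment of the member's type (char), not of its padded slot. */
size_t canons_alignment (void)
{
        struct ship my_ship;

        return __alignof__ (my_ship.canons);
}

/* Total size of the structure, including any padding after 'canons'. */
size_t ship_size (void)
{
        return sizeof (struct ship);
}
```

On a typical machine with 4-byte integers, ship_size() reports 12: three bytes of padding follow canons so that mast_height is properly aligned.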

The Offset of a Member Within a Structure

GCC provides a built-in keyword for obtaining the offset of a member of a structure within that structure. The offsetof( ) macro, defined in <stddef.h>, is part of the ISO C standard. Most definitions are horrid, involving obscene pointer arithmetic and code unfit for minors. The GCC extension is simpler and potentially faster:

#define offsetof(type, member)  __builtin_offsetof (type, member)

A call returns the offset of member within type—that is, the number of bytes, starting from zero, from the beginning of the structure to that member. For example, consider the following structure:

struct rowboat {
        char *boat_name;
        unsigned int nr_oars;
        short length;
};

The actual offsets depend on the size of the variables, and the architecture's alignment requirements and padding behavior, but on a 32-bit machine, we might expect calling offsetof( ) on struct rowboat and boat_name, nr_oars, and length to return 0, 4, and 8, respectively.

On a Linux system, the offsetof( ) macro should be defined using the GCC keyword, and need not be redefined.
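
A short sketch using the standard macro from <stddef.h> shows how the offsets can be inspected. The three wrapper functions are hypothetical. The values they return vary with pointer size and alignment rules:

```c
#include <stddef.h>

struct rowboat {
        char *boat_name;
        unsigned int nr_oars;
        short length;
};

size_t name_offset (void)
{
        return offsetof (struct rowboat, boat_name);
}

size_t oars_offset (void)
{
        return offsetof (struct rowboat, nr_oars);
}

size_t length_offset (void)
{
        return offsetof (struct rowboat, length);
}
```

On a 64-bit machine with 8-byte pointers, one would expect 0, 8, and 12 rather than the 32-bit values quoted above.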

Obtaining the Return Address of a Function

GCC provides a keyword for obtaining the return address of the current function, or one of the callers of the current function:

void * __builtin_return_address (unsigned int level)

The parameter level specifies the function in the call chain whose address should be returned. A value of 0 asks for the return address of the current function, a value of 1 asks for the return address of the caller of the current function, a value of 2 asks for that function's caller's return address, and so on.

If the current function is an inline function, the address returned is that of the calling function. If this is unacceptable, use the noinline attribute (described earlier, in "Suppressing Inlining") to force the compiler not to inline the function.

There are several uses for the __builtin_return_address keyword. One is for debugging or informational purposes. Another is to unwind a call chain, in order to implement introspection, a crash dump utility, a debugger, and so on.

Note that some architectures can return only the address of the invoking function. On such architectures, a nonzero parameter value can result in a random return value. Thus, any parameter other than 0 is nonportable, and should be used only for debugging purposes.
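
A minimal sketch of the portable case, level 0, follows. The functions where_called_from() and return_address_known() are hypothetical names. The first is marked noinline so that it remains a real call and the returned address lies inside its caller:

```c
#include <stddef.h>

/* noinline keeps this a real call, so level 0 points into the caller. */
__attribute__ ((noinline)) void *where_called_from (void)
{
        return __builtin_return_address (0);
}

/* Returns nonzero if a return address was obtained. */
int return_address_known (void)
{
        return where_called_from () != NULL;
}
```

The address can be printed with the %p conversion of printf( ), or handed to a debugger or symbolizer to map it back to a function name.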

Case Ranges

GCC allows case statement labels to specify a range of values for a single block. The general syntax is as follows:

case low ... high:

For example:

switch (val) {
case 1 ... 10:
        /* ... */
        break;
case 11 ... 20:
        /* ... */
        break;
default:
        /* ... */
}

This functionality is quite useful for ASCII case ranges, too:

case 'A' ... 'Z':

Note that there should be a space before and after the ellipsis. Otherwise, the compiler can become confused, particularly with integer ranges. Always do the following:

case 4 ... 8:

and never this:

case 4...8:
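
Putting the range syntax to work, the following sketch classifies a character. The enum char_class and the function classify() are hypothetical names introduced for this example. The letter ranges assume an ASCII-compatible character set:

```c
/* Hypothetical character classes for the example. */
enum char_class { CLASS_DIGIT, CLASS_UPPER, CLASS_LOWER, CLASS_OTHER };

/* Classifies a character using GNU C case ranges. */
enum char_class classify (int c)
{
        switch (c) {
        case '0' ... '9':
                return CLASS_DIGIT;
        case 'A' ... 'Z':
                return CLASS_UPPER;
        case 'a' ... 'z':
                return CLASS_LOWER;
        default:
                return CLASS_OTHER;
        }
}
```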

Void and Function Pointer Arithmetic

In GCC, addition and subtraction operations are allowed on pointers to void and pointers to functions. Normally, ISO C does not allow arithmetic on such pointers because the size of a "void" is a silly concept, and is dependent on what the pointer is actually pointing to. To facilitate such arithmetic, GCC treats the size of the referenced object as one byte. Thus, the following snippet advances a by one byte:

a++;        /* a is a void pointer */

The option -Wpointer-arith causes GCC to generate a warning when these extensions are used.
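
As a small sketch of the extension in use, the hypothetical function skip_bytes() below advances a void pointer by a byte count without first casting it to char *. This compiles only as GNU C, and bytes_skipped() is another illustrative helper:

```c
#include <stddef.h>

/* GNU C only: arithmetic on void * moves in units of one byte. */
void *skip_bytes (void *p, size_t n)
{
        return p + n;
}

/* Returns the distance, in bytes, covered by skip_bytes (buf, 3). */
size_t bytes_skipped (void)
{
        char buf[16];

        return (size_t) ((char *) skip_bytes (buf, 3) - buf);
}
```

Portable code would write the addition as (char *) p + n, which behaves identically and is accepted by any ISO C compiler.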

More Portable and More Beautiful in One Fell Swoop

Let's face it, the __attribute__ syntax is not pretty. Some of the extensions we've looked at in this appendix essentially require preprocessor macros to make their use palatable, but all of them can benefit from a sprucing up in appearance.

With a little preprocessor magic, this is not hard. Further, in the same action, we can make the GCC extensions portable, by defining them away in the case of a non-GCC compiler (whatever that is).

To do so, stick the following code snippet in a header, and include that header in your source files:

#if __GNUC__ >= 3
# undef  inline
# define inline         inline __attribute__ ((always_inline))
# define __noinline     __attribute__ ((noinline))
# define __pure         __attribute__ ((pure))
# define __const        __attribute__ ((const))
# define __noreturn     __attribute__ ((noreturn))
# define __malloc       __attribute__ ((malloc))
# define __must_check   __attribute__ ((warn_unused_result))
# define __deprecated   __attribute__ ((deprecated))
# define __used         __attribute__ ((used))
# define __unused       __attribute__ ((unused))
# define __packed       __attribute__ ((packed))
# define __align(x)     __attribute__ ((aligned (x)))
# define __align_max    __attribute__ ((aligned))
# define likely(x)      __builtin_expect (!!(x), 1)
# define unlikely(x)    __builtin_expect (!!(x), 0)
#else
# define __noinline     /* no noinline */
# define __pure         /* no pure */
# define __const        /* no const */
# define __noreturn     /* no noreturn */
# define __malloc       /* no malloc */
# define __must_check   /* no warn_unused_result */
# define __deprecated   /* no deprecated */
# define __used         /* no used */
# define __unused       /* no unused */
# define __packed       /* no packed */
# define __align(x)     /* no aligned */
# define __align_max    /* no align_max */
# define likely(x)      (x)
# define unlikely(x)    (x)
#endif

For example, the following marks a function as pure, using our shortcut:

__pure int foo (void) { /* ... */ }

If GCC is in use, the function is marked with the pure attribute. If GCC is not the compiler, the preprocessor replaces the __pure token with a no-op. Note that you can place multiple attributes on a given definition, and thus you can use more than one of these defines on a single definition with no problems.

Easier, prettier, and portable!



[45] A memory alias occurs when two or more pointer variables point at the same memory address. This can happen in trivial cases where a pointer is assigned the value of another pointer, and also in more complex, less obvious cases. If a function is returning the address of newly allocated memory, no other pointers to that same address should exist.
