Attackers can often control the value of important environment variables, sometimes even remotely—for example, in CGI scripts, where invocation data is passed through environment variables.
You need to make sure that an attacker does not set environment variables to malicious values.
Many programs and libraries, including the shared library loader on both Unix and Windows systems, depend on environment variable settings. Because environment variables are inherited from the parent process when a program is executed, an attacker can easily sabotage variables, causing your program to behave in an unexpected and even insecure manner.
Typically, Unix systems are considerably more dependent on environment variables than are Windows systems. In fact, the only scenario common to both Unix and Windows is that there is an environment variable defining the path that the system should search to find an executable or shared library (although differently named variables are used on each platform). On Windows, one environment variable controls the search path for finding both executables and shared libraries. On Unix, these are controlled by separate environment variables. Generally, you should not specify a filename and then rely on these variables for determining the full path. Instead, you should always use absolute paths to known locations.[1]
Certain variables expected to be present in the environment can cause insecure program behavior if they are missing or improperly set. Make sure, therefore, that you never fully purge the environment and leave it empty. Instead, variables that should exist should be forced to sane values or, at the very least, treated as highly suspect and examined closely before they’re used. Remove any unknown variables from the environment altogether.
The standard C runtime library defines a global variable,[2] environ, as a NULL-terminated array of strings, where each string in the array is of the form “name=value”. Most systems do not declare the variable in any standard header file, Linux being the notable exception, providing a declaration in unistd.h. You can gain access to the variable by including the following extern statement in your code:

extern char **environ;
Several functions defined in stdlib.h, such as getenv() and putenv(), provide access to environment variables, and they all operate on this variable. You can therefore make changes to the contents of the array or even build a new array and assign it to the variable.
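Because the array is just a NULL-terminated list of “name=value” strings, it can be scanned directly. The following is a minimal sketch of that idea; env_lookup() is a hypothetical helper introduced here for illustration, not part of the C library or of this recipe's code.

```c
#include <stddef.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: scan a NULL-terminated array of "name=value"
 * strings and return a pointer to the value for `name`, or NULL if the
 * name is not present. The match must be exact, so "PATH" does not
 * match an entry such as "PATHX=1". */
const char *env_lookup(char **envp, const char *name)
{
    size_t len;
    char **p;

    if (!envp || !name || (len = strlen(name)) == 0) return NULL;
    for (p = envp; *p != NULL; p++) {
        if (strncmp(*p, name, len) == 0 && (*p)[len] == '=')
            return *p + len + 1;   /* skip past "name=" */
    }
    return NULL;
}
```

Calling env_lookup(environ, "PATH") inspects the live environment; passing any other array of the same shape works equally well, which is convenient when examining an environment before installing it.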
This variable also exists in the standard C runtime library on Windows; however, the C runtime on Windows is not as tightly bound to the operating system as it is on Unix. Directly manipulating the environ variable on Windows will not necessarily produce the same effects as it will on Unix; most Windows programs never use the C runtime at all, favoring instead the Win32 API functions that do the same job. Because of this, and because of Windows’ lack of dependence on environment variables, we do not recommend using the code in this recipe on Windows. It simply does not apply. However, we do recommend that you at least skim the text of this recipe so that you’re aware of potential pitfalls that could affect you on Windows.
On a Unix system, if you invoke the command printenv at a shell prompt, you’ll likely see a sizable list of environment variables as a result. Many of the environment variables you will see are set by whichever shell you’re using (e.g., bash or tcsh). You should neither use nor trust any of the environment variables that are set by the shell. In addition, a malicious user may be able to set other environment variables.
In most cases, the information contained in the environment variables set by the shell can be determined by much more reliable means. For example, most shells set the HOME environment variable, which is intended to be the user’s home directory. It’s much more reliable to call getuid() to determine who the user is, and then call getpwuid() to get the user’s password file record, which will contain the user’s home directory. For example:
#include <sys/types.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <pwd.h>

int main(int argc, char *argv[]) {
  uid_t         uid;
  struct passwd *pwd;

  uid = getuid();
  printf("User's UID is %d.\n", (int)uid);
  if (!(pwd = getpwuid(uid))) {
    printf("Unable to get user's password file record!\n");
    endpwent();
    return 1;
  }
  printf("User's home directory is %s\n", pwd->pw_dir);
  endpwent();
  return 0;
}
Warning
The code above is not thread-safe. Be sure multiple threads do not try to manipulate the password database at the same time.
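Where thread safety matters, one option is getpwuid_r(), the reentrant POSIX counterpart of getpwuid(), which writes its result into caller-supplied storage. The sketch below is an illustration under stated assumptions: home_dir_for_uid() is a hypothetical helper name, the 16384-byte fallback buffer size is an arbitrary choice, and a buffer that turns out to be too small is simply treated as a failure rather than retried.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy the home directory of `uid` into `out`.
 * Returns 0 on success, -1 if the lookup fails or `out` is too small. */
int home_dir_for_uid(uid_t uid, char *out, size_t outlen)
{
    struct passwd pwd, *result = NULL;
    long   hint   = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t buflen = (hint > 0) ? (size_t)hint : 16384; /* assumed fallback */
    char   *buf   = malloc(buflen);
    int    rc     = -1;

    if (!buf) return -1;
    /* getpwuid_r() returns 0 with result == NULL when no entry exists. */
    if (getpwuid_r(uid, &pwd, buf, buflen, &result) == 0 && result != NULL) {
        size_t n = strlen(result->pw_dir);
        if (n < outlen) {
            memcpy(out, result->pw_dir, n + 1);  /* include the NUL */
            rc = 0;
        }
    }
    free(buf);
    return rc;
}
```

A caller would pass getuid() and a buffer of its own, for example a char array of PATH_MAX bytes, and check the return value before using the result.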
In many cases, it is reasonably safe to throw away most of the environment variables that your program will inherit from its parent process, but you should make it a point to be aware of any environment variables that will be used by code you’re using, including the operating system’s dynamic loader and the standard C runtime library. In particular, dynamic loaders on ELF-based Unix systems and most standard implementations of malloc() recognize a wide variety of environment variables that control their behavior. (Among the Unix variants we’re explicitly supporting in this book, Darwin is the major exception, because it does not use ELF, the Executable and Linking Format, for its executables.)
In most cases, you should never be doing anything in your programs that will make use of the PATH environment variable. Circumstances do exist in which it may be reasonable to do so, but make sure to weigh your options carefully beforehand. Indeed, you should consider carefully whether you should be using any environment variable in your programs. Regardless, if you launch external programs from within your program, you may not have control over what the external programs do, so you should take care to provide any external programs you launch with a sane and secure environment.
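One way to hand a child process a known environment is to pass an explicit array to execve() instead of letting the child inherit the parent's. The sketch below illustrates the idea and is not this recipe's own code: run_with_clean_env() and clean_env are hypothetical names, the two variables chosen are an assumed minimum, and signal handling, file descriptor cleanup, and privilege dropping are all left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <paths.h>
#include <unistd.h>

/* Assumed minimal environment handed to every child. */
static char *const clean_env[] = {
    "IFS= \t\n",
    "PATH=" _PATH_STDPATH,
    NULL
};

/* Hypothetical helper: run the program at absolute `path` with a fixed
 * environment. Returns the child's exit status, or -1 on failure. */
int run_with_clean_env(const char *path, char *const argv[])
{
    pid_t pid;
    int   status;

    if (!path || path[0] != '/') return -1;  /* insist on an absolute path */
    if ((pid = fork()) < 0) return -1;
    if (pid == 0) {
        execve(path, argv, clean_env);       /* no PATH search, no inherited vars */
        _exit(127);                          /* reached only if execve() fails */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because execve() takes the full path and the full environment, nothing the parent inherited reaches the child unless it is listed in clean_env.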
In particular, the two environment variables IFS and PATH should always be forced to sane values. The IFS environment variable is somewhat obscure, but it is used by many shells to determine which characters separate command-line arguments. Modern Unix shells use a reasonable default value for IFS if it is not already set. Nonetheless, you should defensively assume that the shell does nothing of the sort. Therefore, instead of simply deleting the IFS environment variable, set it to something sane, such as the space, tab, and newline characters.
The PATH environment variable is used by the shell and some of the exec*() family of standard C functions to locate an executable if a path is not explicitly specified. The search path should never include relative paths, especially the current directory as denoted by a single period. To be safe, you should always force the setting of the PATH environment variable to _PATH_STDPATH, which is defined in paths.h. This value is what the shell normally uses to initialize the variable, but an attacker or naïve user could change it later. The definition of _PATH_STDPATH differs from platform to platform, so use the defined constant rather than a hardcoded string to get the right standard paths for the system your program is running on.
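If a program must inspect a search path rather than overwrite it, the rule about relative entries can be checked mechanically. The following is a small sketch; path_is_all_absolute() is a hypothetical helper name. It treats an empty entry (a leading, trailing, or doubled colon) as unsafe, because an empty entry in PATH refers to the current directory.

```c
#include <string.h>

/* Hypothetical helper: return 1 if every colon-separated entry in
 * `path` begins with '/', and 0 otherwise. Empty entries and "." are
 * rejected along with every other relative entry. */
int path_is_all_absolute(const char *path)
{
    const char *p = path;

    if (!path || !*path) return 0;
    for (;;) {
        if (*p != '/') return 0;   /* empty or relative entry */
        p = strchr(p, ':');
        if (!p) return 1;          /* last entry checked */
        p++;                       /* step past the ':' */
    }
}
```

For example, "/usr/bin:/bin" passes, while "/usr/bin:." and "/usr/bin:" are both rejected.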
Finally, the TZ environment variable denotes the time zone that the program should use, when relevant. All systems will use it if it is defined. Most get along fine without it, falling back to the machine's default time zone, but some will not function properly unless it is set. Because users may not be in the same time zone as the machine, you should preserve this variable if it is present. Note also that this variable is generally used by the OS, not the application. If you’re using it at the application level, make sure to do proper input validation to protect against problems such as buffer overflow.
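As an illustration of the kind of input validation meant here, the sketch below checks a time zone string before an application uses it. Everything specific in it is an assumption made for the example: the helper name tz_value_is_plausible(), the 64-byte length bound, and the set of permitted punctuation are not taken from any standard. It checks only the shape of the string, not whether the zone actually exists.

```c
#include <ctype.h>
#include <string.h>

#define TZ_MAX_LEN 64   /* assumed upper bound for this example */

/* Hypothetical helper: return 1 if `tz` is non-empty, no longer than
 * TZ_MAX_LEN, free of ".." sequences, and made up only of letters,
 * digits, and a small set of punctuation; return 0 otherwise. */
int tz_value_is_plausible(const char *tz)
{
    size_t i, len;

    if (!tz) return 0;
    len = strlen(tz);
    if (len == 0 || len > TZ_MAX_LEN) return 0;
    if (strstr(tz, "..")) return 0;          /* no directory climbing */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)tz[i];
        if (!isalnum(c) && !strchr("+-_/:,.<>", c)) return 0;
    }
    return 1;
}
```

Bounding the length before copying the value anywhere is the part that addresses the buffer overflow concern; the character check is an extra, more conservative restriction.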
Any other environment variables that are defined should be removed unless you know, for some reason, that you need the variable to be set. For any environment variables you preserve, be sure to treat them as untrusted user input. You may be expecting them to be set to reasonable values—and in most cases, they probably will be—but never assume they are. If for some reason you’re writing CGI code in C, the list of environment variables passed from the web server to your program can be somewhat large, but these are largely trustworthy unless an attacker somehow manages to wedge another program between the web server and your program.
Of particular interest among environment variables commonly passed from a web server to CGI scripts are any environment variables whose names begin with HTTP_ and those listed in Table 1-1.
Table 1-1. Environment variables commonly passed from web servers to CGI scripts
Environment variable name | Comments |
---|---|
AUTH_TYPE | If authentication was required to make the request, this contains the authentication type that was used, usually “BASIC”. |
CONTENT_LENGTH | The number of bytes of content, as specified by the client. |
CONTENT_TYPE | The MIME type of the content sent by the client. |
GATEWAY_INTERFACE | The version of the CGI specification with which the server complies. |
PATH_INFO | Extra path information from the URL. |
PATH_TRANSLATED | Extra path information from the URL, translated by the server. |
QUERY_STRING | The portion of the URL following the question mark. |
REMOTE_ADDR | The IP address of the remote client in dotted decimal form. |
REMOTE_HOST | The host name of the remote client. |
REMOTE_IDENT | If RFC1413 identification was used, this contains the user name that was retrieved from the remote identification server. |
REMOTE_USER | If authentication was required to make the request, this contains the user name that was authenticated. |
REQUEST_METHOD | The method used to make the current request, usually either “GET” or “POST”. |
SCRIPT_NAME | The name of the script that is running, canonicalized to the root of the web site’s document tree. |
SERVER_NAME | The host name or IP address of the server. |
SERVER_PORT | The port on which the server is running. |
SERVER_PROTOCOL | The protocol used to make the request, typically “HTTP/1.0” or “HTTP/1.1”. |
SERVER_SOFTWARE | The name and version of the server. |
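Values that originate with the client deserve the same suspicion as any other user input. As a sketch of what that can look like for CONTENT_LENGTH, the CGI variable that carries the client-specified byte count, the hypothetical helper below accepts only a plain decimal number no larger than a caller-chosen limit. The helper name and the idea of a per-application maximum are assumptions made for the example.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a content length given as a decimal string.
 * Stores the value in *out and returns 0 on success. Returns -1 if `s`
 * is NULL, empty, signed, padded, followed by junk, out of range for a
 * long, or greater than `max`. */
int parse_content_length(const char *s, long max, long *out)
{
    char *end;
    long  v;

    /* Requiring a leading digit rejects "", "-5", "+5", and " 42". */
    if (!s || !out || !isdigit((unsigned char)*s)) return -1;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE || *end != '\0' || v > max) return -1;
    *out = v;
    return 0;
}
```

A CGI program might call parse_content_length(getenv("CONTENT_LENGTH"), limit, &n), where limit is whatever request size the application is prepared to handle, and refuse the request when the call fails.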
The code presented in this section defines a function called spc_sanitize_environment() that will build a new environment with the IFS and PATH environment variables set to sane values, and with the TZ environment variable preserved from the original environment if it is present. You can also specify a list of environment variables to preserve from the original in addition to the TZ environment variable.
The first thing that spc_sanitize_environment() does is determine how much memory it will need to allocate to build the new environment. If the memory it needs cannot be allocated, the function will call abort() to terminate the program immediately. Otherwise, it will then build the new environment and replace the old environ pointer with a pointer to the newly allocated one. Note that the memory is allocated in one chunk rather than in smaller pieces for the individual strings. While this is not strictly necessary (and it does not provide any specific security benefit), it’s faster and places less strain on memory allocation. Note, however, that you should be performing this operation early in your program, so heap fragmentation shouldn’t be much of an issue.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <paths.h>

extern char **environ;

/* These arrays are both NULL-terminated. */
static char *spc_restricted_environ[] = {
  "IFS= \t\n",
  "PATH=" _PATH_STDPATH,
  0
};

static char *spc_preserve_environ[] = {
  "TZ",
  0
};

void spc_sanitize_environment(int preservec, char **preservev) {
  int    i;
  char   **new_environ, *ptr, *value, *var;
  size_t arr_size = 1, arr_ptr = 0, len, new_size = 0;

  /* First pass: count the entries and the bytes their strings need. */
  for (i = 0; (var = spc_restricted_environ[i]) != 0; i++) {
    new_size += strlen(var) + 1;
    arr_size++;
  }
  for (i = 0; (var = spc_preserve_environ[i]) != 0; i++) {
    if (!(value = getenv(var))) continue;
    new_size += strlen(var) + strlen(value) + 2; /* include the '=' */
    arr_size++;
  }
  if (preservec && preservev) {
    for (i = 0; i < preservec && (var = preservev[i]) != 0; i++) {
      if (!(value = getenv(var))) continue;
      new_size += strlen(var) + strlen(value) + 2; /* include the '=' */
      arr_size++;
    }
  }

  /* One allocation: the pointer array first, then the string data. */
  new_size += (arr_size * sizeof(char *));
  if (!(new_environ = (char **)malloc(new_size))) abort();
  new_environ[arr_size - 1] = 0;

  /* Second pass: copy the strings in behind the pointer array. */
  ptr = (char *)new_environ + (arr_size * sizeof(char *));
  for (i = 0; (var = spc_restricted_environ[i]) != 0; i++) {
    new_environ[arr_ptr++] = ptr;
    len = strlen(var);
    memcpy(ptr, var, len + 1);
    ptr += len + 1;
  }
  for (i = 0; (var = spc_preserve_environ[i]) != 0; i++) {
    if (!(value = getenv(var))) continue;
    new_environ[arr_ptr++] = ptr;
    len = strlen(var);
    memcpy(ptr, var, len);
    *(ptr + len) = '=';  /* the '=' goes directly after the name */
    memcpy(ptr + len + 1, value, strlen(value) + 1);
    ptr += len + strlen(value) + 2; /* include the '=' */
  }
  if (preservec && preservev) {
    for (i = 0; i < preservec && (var = preservev[i]) != 0; i++) {
      if (!(value = getenv(var))) continue;
      new_environ[arr_ptr++] = ptr;
      len = strlen(var);
      memcpy(ptr, var, len);
      *(ptr + len) = '=';  /* the '=' goes directly after the name */
      memcpy(ptr + len + 1, value, strlen(value) + 1);
      ptr += len + strlen(value) + 2; /* include the '=' */
    }
  }

  environ = new_environ;
}
[1] Note that the shared library environment variable can be relatively benign on modern Unix-based operating systems, because the environment variable will get ignored when a program that can change permissions (i.e., a setuid program) is invoked. Nonetheless, it is better to be safe than sorry!
[2] The use of the term “variable” can quickly become confusing because C defines variables and the environment defines variables. In this recipe, when we are referring to a C variable, we simply say “variable,” and when we are referring to an environment variable, we say “environment variable.”