The C preprocessor

Before the compiler even sees your code, the code first passes through the preprocessor. The preprocessor performs a set of simple text substitutions based on directives. Each directive goes on its own line and consists of a “#” symbol followed by a word and then the arguments. Probably the most common preprocessor directive is one we have already seen: #include. The #include directive takes one argument, namely a file name, and pastes the contents of that file in place of the directive. These directives make up a simple programming language for manipulating C source code.

Instead of trying to describe all of the details of the C preprocessor myself, I refer you to Wikipedia. What I will relate here are some particularly useful applications of the preprocessor along with some warnings. Please read the Wikipedia page on the C preprocessor before continuing.

The include directive

There are a number of different things you declare in C that aren’t, strictly speaking, code. This includes structure definitions and function prototypes as well as a few other things we have yet to discuss. The problem is that, in order to use these things, you need to have them declared in your source file. One option would be to simply copy and paste this information at the top of every source file where you plan to use that structure or function. However, copying and pasting is very error-prone, and if you ever change any of those structures or function prototypes, you will have to change them in every single file.

This is where the #include directive is used. Most C projects have several files called header files that usually end in “.h”. These files contain all of the structure definitions, function prototypes, and other things that are needed in multiple files. These header files are then included at the top of the C files by using #include. This way, if you ever have to change any of these prototypes, you only have to change them in the header file and nowhere else.

In general, you should not put actual code in header files. If you put the code for a function in a header file, then each C file that includes it will have its own copy of the function, and the linker will refuse to link the object files because the same function is defined more than once.
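To make the split concrete, here is a minimal sketch of what belongs where. Everything in it is made up for illustration: the file names point.h and point.c, the struct point, and the function point_dot. For the sake of a self-contained example, both parts are shown in one listing; in a real project they would be two files.

```c
/* ---- What would go in a hypothetical header, point.h ---- */

/* Structure definition: every file that uses points needs to see this. */
struct point {
    double x;
    double y;
};

/* Function prototype: says the function exists, but contains no code. */
double point_dot(const struct point *a, const struct point *b);

/* ---- What would go in exactly one source file, point.c ---- */
/* (In a real project this file would begin with: #include "point.h") */

/* The actual code lives here, so only one object file defines it. */
double point_dot(const struct point *a, const struct point *b)
{
    return a->x * b->x + a->y * b->y;
}
```

Any other source file that wants to call point_dot would include the header and nothing more; the linker then finds the single definition in the object file built from point.c.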

Using the define directive for constants

The #define directive creates macros that allow the preprocessor to effectively do a search-and-replace operation on your code. This can be very useful if your code has certain constants that you may, at some point, want to change. For example, suppose you had an algorithm that worked on matrices by cutting them into 16x16 blocks. Instead of putting the number 16 all over your code, you could put the following at the top of your file:

#define MATRIX_BLOCK_SIZE 16

Then, wherever you need the block size, you write MATRIX_BLOCK_SIZE. In the preprocessing stage, the preprocessor will replace every occurrence of MATRIX_BLOCK_SIZE with the number 16. If you ever want to change the block size, you simply have to change the one line. This allows us to avoid magic numbers, as they are often called in programming circles.
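As a small sketch of how the constant might then be used, consider the two helper functions below. Both blocks_needed and block_elements are hypothetical names invented for this example; they are not part of any particular matrix algorithm.

```c
#include <stddef.h>

#define MATRIX_BLOCK_SIZE 16

/* Hypothetical helper: how many blocks are needed to cover n rows
   (or columns), rounding up so a partial block at the edge counts. */
size_t blocks_needed(size_t n)
{
    return (n + MATRIX_BLOCK_SIZE - 1) / MATRIX_BLOCK_SIZE;
}

/* Hypothetical helper: number of elements in one square block. */
size_t block_elements(void)
{
    return (size_t)MATRIX_BLOCK_SIZE * MATRIX_BLOCK_SIZE;
}
```

Changing the #define line to, say, 32 would change the behavior of both functions without touching either of them.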

Because expanding macros is simply a search-and-replace operation, a macro can technically contain anything. However, because it is just search-and-replace, the preprocessor knows nothing about variables, types, or scope, so the macro may expand to something that makes no sense in the context. For example, having a macro expand to 2*n+1 is probably a bad idea because n may mean different things in different contexts.
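The following sketch illustrates the problem. The macro NEXT_ODD and both functions are invented for this example. The same macro produces different results depending on which variable named n happens to be in scope where it is used.

```c
/* Hypothetical macro that silently depends on a variable named n. */
#define NEXT_ODD (2 * n + 1)

int odd_from_small_n(void)
{
    int n = 3;
    return NEXT_ODD;   /* expands to (2 * n + 1) using this n */
}

int odd_from_large_n(void)
{
    int n = 10;
    return NEXT_ODD;   /* same text, but a different n */
}

/* In a function with no variable called n, using NEXT_ODD would not
   compile at all, and the error would point at code you never typed. */
```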

Using the define directive for macros

In order to avoid the problem that macros are unaware of context, they are allowed to take arguments. For example, one macro that I have used from time to time is the following:

#define MAX(X,Y) (((X) > (Y)) ? (X) : (Y))

Then, if I want the maximum of two numbers, I can just type MAX(a,b) instead of having to write an if statement every time.

The above macro is worth dissecting a bit more. First, what are the ? and : doing in there? This is what is called the conditional operator. The conditional operator is a ternary operator that takes a condition and two values. If the condition is true, then the expression takes on the first value; otherwise, the expression takes on the second value. The syntax for the conditional operator is <condition> ? <true_val> : <false_val>.
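For comparison, here is a rough sketch of the same logic written twice: once with an if statement and once with the macro. The function names max_with_if and max_with_macro are made up for this example.

```c
#define MAX(X,Y) (((X) > (Y)) ? (X) : (Y))

/* The long way: an if statement that picks the larger value. */
int max_with_if(int a, int b)
{
    if (a > b) {
        return a;
    }
    return b;
}

/* The short way: the conditional operator hidden inside the macro. */
int max_with_macro(int a, int b)
{
    return MAX(a, b);
}
```

Because the macro is only text, it is not tied to int; MAX(2.5, 1.5) works just as well on floating-point values.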

The other thing to notice about the above macro is the abundance of parentheses. Again, this goes back to the fact that all the C preprocessor does is search-and-replace. When writing the macro, you don’t know if the arguments will be a single variable or number or if they will be expressions or function calls. Therefore, you usually put parentheses around them to make sure that the operator precedence order doesn’t cause problems with the expanded macro. For the same reason, the entire expression of the macro is, again, put in parentheses to ensure proper order of operations.
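To see what goes wrong without the parentheses, here is a small sketch. The macros DOUBLE_BAD, DOUBLE_GOOD, SUM_BAD, and SUM_GOOD are hypothetical and exist only to show the two kinds of mistake: missing parentheses around an argument, and missing parentheses around the whole expression.

```c
/* Missing parentheses around the argument. */
#define DOUBLE_BAD(X)  X * 2
#define DOUBLE_GOOD(X) ((X) * 2)

/* Missing parentheses around the whole expression. */
#define SUM_BAD(X,Y)  (X) + (Y)
#define SUM_GOOD(X,Y) ((X) + (Y))

int double_bad(void)
{
    return DOUBLE_BAD(1 + 2);    /* expands to 1 + 2 * 2, which is 5 */
}

int double_good(void)
{
    return DOUBLE_GOOD(1 + 2);   /* expands to ((1 + 2) * 2), which is 6 */
}

int sum_bad_times_three(void)
{
    return SUM_BAD(1, 2) * 3;    /* expands to (1) + (2) * 3, which is 7 */
}

int sum_good_times_three(void)
{
    return SUM_GOOD(1, 2) * 3;   /* expands to ((1) + (2)) * 3, which is 9 */
}
```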

Another example of a potentially useful macro would be if you had a matrix structure and wanted to easily get elements out of it:

struct matrix {
    double *data;
    size_t width;
    size_t height;
};

#define MAT_ELEM(MAT, R, C) (MAT)->data[(R) * (MAT)->width + (C)]
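Here is a minimal sketch of how such a macro might be used. The function fill_demo is a hypothetical helper written for this example, and the sketch assumes the matrix is stored row by row, which is what the index calculation in the macro implies.

```c
#include <stddef.h>

struct matrix {
    double *data;
    size_t width;
    size_t height;
};

#define MAT_ELEM(MAT, R, C) (MAT)->data[(R) * (MAT)->width + (C)]

/* Hypothetical helper: set every element to row * 10 + column.
   The macro expands to an array element, so it can be assigned to. */
void fill_demo(struct matrix *m)
{
    for (size_t r = 0; r < m->height; r++) {
        for (size_t c = 0; c < m->width; c++) {
            MAT_ELEM(m, r, c) = (double)(r * 10 + c);
        }
    }
}
```

Note that the first argument must be a pointer to the structure, since the macro uses ->. With a matrix variable m that is not a pointer, you would write MAT_ELEM(&m, r, c).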

One final note about macros: be careful about putting function calls into macro arguments. When the macro gets expanded, the expanded version will contain the macro’s arguments exactly as they were given, function calls and all. This means that, in the case of our MAX macro, if a function call is one of its arguments, the function may get called twice because it shows up in the macro expansion twice. If you want to use the return value of the function call as a macro argument, it is usually better to save it off in a variable and use that in the macro.
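The double call is easy to demonstrate with a function that counts how often it runs. In the sketch below, next_value, call_count, and the two wrapper functions are all invented for this example.

```c
#define MAX(X,Y) (((X) > (Y)) ? (X) : (Y))

/* Counts how many times next_value() has been called. */
static int call_count = 0;

/* Hypothetical function with a visible side effect. */
int next_value(void)
{
    call_count++;
    return 5;
}

/* Passes the call directly to the macro and reports the call count. */
int calls_when_passed_directly(void)
{
    call_count = 0;
    /* Expands to (((next_value()) > (3)) ? (next_value()) : (3)):
       once for the comparison, once more to produce the result. */
    int result = MAX(next_value(), 3);
    (void)result;
    return call_count;
}

/* Saves the return value first, then hands the variable to the macro. */
int calls_when_saved_first(void)
{
    call_count = 0;
    int v = next_value();
    int result = MAX(v, 3);
    (void)result;
    return call_count;
}
```

Here the first function reports two calls and the second reports one. If next_value had returned something smaller than 3, the direct version would have called it only once, which makes this kind of bug depend on the data and harder to spot.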

Conditional compiling

Another major use of the preprocessor is to conditionally add or remove pieces of code. One example of where this is useful is for debugging. Sometimes, when trying to get all the bugs out of a program, it is useful to have it print out extra information or run extra checks. Then, when it comes time to let it run for a few hours or days on a computation, you don’t want all that extra code slowing it down. In this case you could use the preprocessor to remove the debugging code whenever the DEBUG macro is not set. For example:

#ifdef DEBUG
    /* My debug code goes here */
#endif

In this case, if the DEBUG macro exists, the debug code will get compiled; otherwise, it will get ignored.
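As a slightly fuller sketch, here is a function with a debug print that only exists when DEBUG is defined. The function average is a hypothetical example. DEBUG can be defined either with a #define line in the source or, with compilers such as gcc and clang, by passing -DDEBUG on the command line.

```c
#include <stdio.h>

/* Uncomment the next line (or compile with -DDEBUG) to enable the
   extra output. */
/* #define DEBUG */

/* Hypothetical example function; assumes count is greater than zero. */
double average(const double *values, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; i++) {
        sum += values[i];
#ifdef DEBUG
        /* Removed entirely by the preprocessor unless DEBUG is defined. */
        fprintf(stderr, "i=%d running sum=%f\n", i, sum);
#endif
    }
    return sum / count;
}
```

When DEBUG is not defined, the compiler never sees the fprintf line, so it costs nothing at run time.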