Before the compiler even sees your code, it first passes through the
preprocessor. The preprocessor performs a set of simple text
substitutions based on directives. These directives make up a simple
programming language for manipulating C source code. Each directive goes
on its own line and consists of a “#” symbol followed by a word and then
the arguments. Probably the most common preprocessor directive is one we
have already seen: #include. The #include directive takes one argument,
namely a file name, and pastes the contents of that file in place of the
directive.
Instead of trying to describe all of the details of the C preprocessor myself, I refer you to Wikipedia. What I will relate here are some particularly useful applications of the preprocessor along with some warnings. Please read the Wikipedia page on the C preprocessor before continuing.
The #include directive

There are a number of different things you declare in C that aren’t, strictly speaking, code. These include structure definitions and function prototypes as well as a few other things we have yet to discuss. The problem is that, in order to use these things, you need to have them declared in your source file. One option would be to simply copy and paste this information at the top of every source file where you plan to use that structure or function. However, copying and pasting is very error-prone, and if you ever change any of those structures or function prototypes, you will have to change them in every single file.
This is where the #include directive is used. Most C
projects have several files called header files that
usually end in “.h”. These files contain all of the structure
definitions, function prototypes, and other things that are needed in
multiple files. These header files are then included at the top of the C
files by using #include. This way, if you ever have to
change any of these prototypes, you only have to change them in the
header file and nowhere else.
In general, you should not put actual code in header files. If you put the code for a function in a header file, then each C file that includes the header will get its own copy of the function, and the linker will refuse to link the object files because the same function is defined more than once.
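As a rough sketch of this split, the block below shows what might go in a header and what belongs in a single source file. The file names matrix.h and matrix.c and the function matrix_sum are made up for illustration; in a real project the two halves would be separate files, and they are shown together here only so the example compiles on its own.

```c
#include <stddef.h>

/* ---- Would live in a hypothetical header, e.g. matrix.h ---- */

/* Structure definition shared by every file that works with matrices. */
struct matrix {
    double *data;
    size_t width;
    size_t height;
};

/* Function prototype: declares the function without defining it. */
double matrix_sum(const struct matrix *m);

/* ---- Would live in exactly one source file, e.g. matrix.c,  ---- */
/* ---- which would start by including the header above.       ---- */

/* The actual code: defined once, so the linker sees a single copy. */
double matrix_sum(const struct matrix *m)
{
    double total = 0.0;
    for (size_t i = 0; i < m->width * m->height; i++)
        total += m->data[i];
    return total;
}
```

Any other source file that needs matrix_sum would include the header and call the function, without ever containing its body.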
The #define directive for constants

The #define directive creates macros
that allow the preprocessor to effectively do a search-and-replace
operation on your code. This can be very useful if your code has certain
constants that you may, at some point, want to change. For example,
suppose you had an algorithm that worked on matrices by cutting them
into 16x16 blocks. Instead of putting the number 16 all over your code,
you could put the following at the top of your file:
#define MATRIX_BLOCK_SIZE 16
Then, wherever you need the block size, you write
MATRIX_BLOCK_SIZE. In the preprocessing stage, the preprocessor
will replace every occurrence of MATRIX_BLOCK_SIZE with the
number 16. If you ever want to change the block size, you simply have to
change the one line. This allows us to avoid magic
numbers, as they are often called in programming circles.
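To make this concrete, here is a small sketch of code that uses the constant. The helper blocks_needed is a hypothetical function invented for this example; it only illustrates that the number 16 appears in one place.

```c
#include <stddef.h>

#define MATRIX_BLOCK_SIZE 16

/* Hypothetical helper: number of blocks needed to cover n rows or
 * columns, rounding up so a partial block at the edge is counted. */
size_t blocks_needed(size_t n)
{
    return (n + MATRIX_BLOCK_SIZE - 1) / MATRIX_BLOCK_SIZE;
}
```

Changing the block size to 32 would then be a one-line edit to the #define, and every use of MATRIX_BLOCK_SIZE would pick up the new value the next time the file is compiled.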
Because expanding macros is simply a search-and-replace operation, a
macro can technically contain anything. However, because it is just
search-and-replace, the preprocessor knows nothing about variables,
types, or scope, so the macro may expand to something that makes no
sense in the context. For example, having a macro expand to 2*n+1 is
probably a bad idea because n may mean different things in
different contexts.
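The following sketch shows how such a macro quietly picks up whatever n happens to be in scope. The macro name ODD_TERM and both functions are invented for this illustration.

```c
/* Deliberately fragile: relies on whatever "n" is visible at the
 * point where the macro is used. */
#define ODD_TERM 2*n+1

/* Here n is the function parameter. */
int odd_from_count(int n)
{
    return ODD_TERM;
}

/* Here n is a local variable holding the length of a string. */
int odd_from_length(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
        n++;
    return ODD_TERM;
}
```

Both functions compile, but the macro means something different in each one, and in a function with no n in scope it would not compile at all.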
The #define directive for macros

In order to avoid the problem that macros are unaware of context, they are allowed to take arguments. For example, one macro that I have used from time to time is the following:
#define MAX(X,Y) (((X) > (Y)) ? (X) : (Y))
Then, if I want the maximum of two numbers, I can just type
MAX(a,b) instead of having to write an if statement every
time.
The above macro is worth dissecting a bit more. First, what are the
? and : doing in there? This is what is called
the conditional operator. The conditional operator is a
ternary operator that takes a condition and two values. If the condition
is true, then the expression takes on the first value; otherwise, the
expression takes on the second value. The syntax for the conditional
operator is <condition> ? <true_val> : <false_val>.
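As a small illustration, the two functions below compute the same result, once with an if statement and once with the conditional operator. The function names are made up for this comparison.

```c
/* Maximum of two ints written with an if statement. */
int max_with_if(int a, int b)
{
    int result;
    if (a > b)
        result = a;
    else
        result = b;
    return result;
}

/* The same logic as a single expression using ? and :. */
int max_with_conditional(int a, int b)
{
    return (a > b) ? a : b;
}
```

Because the conditional operator forms an expression rather than a statement, it can be used inside a macro that is meant to appear in the middle of a larger expression.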
The other thing to notice about the above macro is the abundance of parentheses. Again, this goes back to the fact that all the C preprocessor does is search-and-replace. When writing the macro, you don’t know if the arguments will be a single variable or number or if they will be expressions or function calls. Therefore, you usually put parentheses around them to make sure that the operator precedence order doesn’t cause problems with the expanded macro. For the same reason, the entire expression of the macro is, again, put in parentheses to ensure proper order of operations.
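A short sketch shows what can go wrong without the parentheses. The macros DOUBLE_BAD and DOUBLE_GOOD and the two wrapper functions are hypothetical and exist only for this comparison.

```c
/* No parentheses: the argument is pasted in as-is. */
#define DOUBLE_BAD(X)  X * 2

/* Parentheses around the argument and around the whole expression. */
#define DOUBLE_GOOD(X) ((X) * 2)

/* Expands to 1 + 3 * 2, so multiplication binds first. */
int bad_result(void)
{
    return DOUBLE_BAD(1 + 3);
}

/* Expands to ((1 + 3) * 2), which is what was intended. */
int good_result(void)
{
    return DOUBLE_GOOD(1 + 3);
}
```

Here bad_result returns 7 while good_result returns 8, even though both macros look like they should double their argument.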
Another example of a potentially useful macro would be if you had a matrix structure and wanted to easily get elements out of it:
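#include <stddef.h>  /* for size_t */

struct matrix {
    double *data;   /* elements stored row by row */
    size_t width;   /* number of columns */
    size_t height;  /* number of rows */
};

#define MAT_ELEM(MAT, R, C) ((MAT)->data[(R) * (MAT)->width + (C)])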
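Here is a minimal sketch of how such a macro might be used, assuming the elements are stored row by row. The functions matrix_set_identity and matrix_trace are invented for this example, and the structure and macro are repeated so the block compiles on its own.

```c
#include <stddef.h>

struct matrix {
    double *data;
    size_t width;
    size_t height;
};

#define MAT_ELEM(MAT, R, C) ((MAT)->data[(R) * (MAT)->width + (C)])

/* The macro expands to an array element, so it can be assigned to. */
void matrix_set_identity(struct matrix *m)
{
    for (size_t r = 0; r < m->height; r++)
        for (size_t c = 0; c < m->width; c++)
            MAT_ELEM(m, r, c) = (r == c) ? 1.0 : 0.0;
}

/* It can also be read from, here to sum the diagonal. */
double matrix_trace(struct matrix *m)
{
    size_t n = (m->width < m->height) ? m->width : m->height;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += MAT_ELEM(m, i, i);
    return sum;
}
```

The index arithmetic lives in one place, so the code that uses the matrix reads in terms of rows and columns rather than raw offsets.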
One final note about macros: be careful about putting function calls
into macro arguments. When the macro gets expanded, the expanded version
will contain the macro’s arguments exactly as they were given, function
calls and all. This means that, in the case of our MAX
macro, if a function call is one of its arguments, the function may get
called twice because it shows up in the macro expansion twice. If you
want to use the return value of the function call as a macro argument,
it is usually better to save it off in a variable and use that in the
macro.
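The double call is easy to observe with a function that counts how often it runs. In this sketch, next_value, call_count, max_direct, and max_saved are all hypothetical names made up for the demonstration.

```c
#define MAX(X,Y) (((X) > (Y)) ? (X) : (Y))

/* Counts how many times next_value() has been called. */
int call_count = 0;

/* Stand-in for some function with a side effect. */
int next_value(void)
{
    call_count++;
    return 10;
}

/* The call is pasted into the expansion twice: once in the
 * comparison and once more in whichever branch is selected. */
int max_direct(int other)
{
    return MAX(next_value(), other);
}

/* Saving the result first guarantees exactly one call. */
int max_saved(int other)
{
    int v = next_value();
    return MAX(v, other);
}
```

Calling max_direct(5) runs next_value twice, because 10 > 5 selects the first branch and evaluates the call again, while max_saved(5) runs it only once.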
Another major use of the preprocessor is to conditionally add or
remove pieces of code. One example of where this is useful is for
debugging. Sometimes, when trying to get all the bugs out of a program,
it is useful to have it print out extra information or run extra checks.
Then, when it comes time to let it run for a few hours or days on a
computation, you don’t want all that extra code slowing it down. In this
case you could use the preprocessor to remove the debugging code
whenever the DEBUG macro is not set. For example:
#ifdef DEBUG
/* My debug code goes here */
#endif
In this case, if the DEBUG macro exists, the debug code
will get compiled; otherwise, it will get ignored.
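A slightly fuller sketch of the same idea is shown below. The function average is a made-up example, and it assumes count is greater than zero.

```c
#include <stdio.h>

/* Hypothetical computation with optional debug output.
 * Assumes count > 0. */
double average(const double *values, size_t count)
{
    double total = 0.0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
#ifdef DEBUG
        /* Only compiled when the DEBUG macro is defined. */
        fprintf(stderr, "after element %zu: total = %f\n", i, total);
#endif
    }
    return total / (double)count;
}
```

The macro can be defined with a #define DEBUG line at the top of the file or, with GCC or Clang, by passing -DDEBUG on the command line. Without it, the fprintf call is removed before the compiler ever sees it, so it costs nothing at run time.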