Because header files are simply copied and pasted wherever the #include
statement is located, there are a lot of things
that can go wrong. The problems only get worse when you realize that
most header files include other header files which then, in turn,
include other header files and so on. What follows is a brief discussion
of some frequent problems with header files and best-practice ways to
get rid of those problems.
One of the most important parts of any programming project is keeping
things well-organized. You should work, as much as you can, to
compartmentalize and organize your code. If you have a data structure,
you should have that data structure and all the helper functions
required to work with it in one place. For example, if you have a data
structure that represents a matrix, you might have matrix.h and matrix.c
files that contain everything needed to work with your matrix
structure. As much as possible, nothing should directly access your
matrix structure except for the functions or macros in matrix.h and
matrix.c. If you do a good job organizing your code, you can make fairly
clean breaks between what belongs in one file versus another.
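As a rough sketch of this kind of organization, here is what a minimal matrix module might look like. It assumes a dense matrix of doubles stored in row-major order, and the names matrix_create, matrix_destroy, and MATRIX_AT are hypothetical. The header and source parts are shown in one listing, with comments marking where each would live.

```c
#include <stdlib.h>

/* ---- matrix.h: the interface the rest of the program uses ---- */
struct matrix {
    size_t rows, cols;
    double *data;               /* rows * cols values, row-major */
};

/* Element access; other code should use this instead of m->data directly. */
#define MATRIX_AT(m, r, c) ((m)->data[(r) * (m)->cols + (c)])

struct matrix *matrix_create(size_t rows, size_t cols);
void matrix_destroy(struct matrix *m);

/* ---- matrix.c: the implementation ---- */
struct matrix *matrix_create(size_t rows, size_t cols)
{
    struct matrix *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->rows = rows;
    m->cols = cols;
    /* calloc fills the storage with zero bytes. */
    m->data = calloc(rows * cols, sizeof *m->data);
    if (m->data == NULL && rows * cols != 0) {
        free(m);
        return NULL;
    }
    return m;
}

void matrix_destroy(struct matrix *m)
{
    if (m != NULL) {
        free(m->data);
        free(m);
    }
}
```

Because every access goes through matrix.h, changing the storage layout later (for example to column-major) would only require edits to these two files.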
It is usually also a good idea to limit what you put in your header files. Not every function needs to be accessible to your entire program. For example, suppose you are implementing a recursive algorithm. You might have one function for the recursive step and a second wrapper function that handles the base case and calls the recursive function. Most of the time, there is no reason to have a prototype for the recursive function in your header file. The only one of those two functions that the rest of your code has any business interacting with is the wrapper function.
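A sketch of the recursive case, using binary search over a sorted int array as an illustrative algorithm; the names array_find and find_range are hypothetical. Only the wrapper is declared in the header. Marking the recursive helper static, as done here, additionally gives it internal linkage, so no other source file can call it even by accident.

```c
#include <stddef.h>

/* ---- array_find.h: the only declaration other files see ---- */
long array_find(const int *arr, size_t n, int key);

/* ---- array_find.c ---- */

/* Recursive step: search the half-open range arr[lo..hi) for key.
 * static: internal linkage, so no prototype belongs in the header. */
static long find_range(const int *arr, size_t lo, size_t hi, int key)
{
    if (lo >= hi)
        return -1;                      /* empty range: not found */
    size_t mid = lo + (hi - lo) / 2;
    if (arr[mid] == key)
        return (long)mid;
    if (arr[mid] < key)
        return find_range(arr, mid + 1, hi, key);
    return find_range(arr, lo, mid, key);
}

/* Wrapper: handles the empty-array case and starts the recursion. */
long array_find(const int *arr, size_t n, int key)
{
    if (arr == NULL || n == 0)
        return -1;
    return find_range(arr, 0, n, key);
}
```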
The standard header files you include (such as stdio.h) will contain
things like structure definitions, function prototypes, and possibly
even preprocessor macros. Your own header files will contain the same
kinds of things. A name collision is when you have two things with the
same name. In this case the compiler (or, for functions defined in
separate source files, the linker) will usually stop with an error,
because there is no way in C to tell two things with the same name
apart. There are a few conventions you can follow that will help with
this problem.
One thing you can do is to use a namespace. Some languages provide an
explicit concept of a namespace; C does not. The common way to do
namespacing in C is to pick a prefix and then put prefix_ at the
beginning of every function or structure name to indicate that it
belongs to your project and not something else. If a function is
associated with a particular data structure, it is also common to use
the name of that data structure as a prefix for the function.
Another convention is to name macros in UPPER CASE and name functions and structures using lower case (with “_” instead of spaces) or CamelCase.
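The sketch below applies both conventions, using tetris_ as a hypothetical project prefix; the macro, structure, and function names are made up for illustration.

```c
/* Macros: UPPER CASE, with the project prefix. */
#define TETRIS_BOARD_WIDTH  10
#define TETRIS_BOARD_HEIGHT 20

/* Structures: lower case with underscores, prefixed with the project name. */
struct tetris_piece {
    int x, y;        /* position of the piece on the board */
    int rotation;    /* 0-3: number of quarter turns clockwise */
};

/* Functions: project prefix, then the structure they operate on. */
void tetris_piece_rotate(struct tetris_piece *piece)
{
    piece->rotation = (piece->rotation + 1) % 4;
}

int tetris_piece_on_board(const struct tetris_piece *piece)
{
    return piece->x >= 0 && piece->x < TETRIS_BOARD_WIDTH &&
           piece->y >= 0 && piece->y < TETRIS_BOARD_HEIGHT;
}
```

A reader (or a grep) can tell at a glance that tetris_piece_rotate belongs to this project and operates on struct tetris_piece, and the upper-case names are immediately recognizable as macros.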
Another issue that occurs frequently with header files is a header
file accidentally getting included twice. While function prototypes can
be repeated without causing harm, structure definitions cannot, and a
macro can only be redefined if the new definition is identical to the
old one. For this reason, you generally do not want a header file to be
included more than once. How can this happen? Usually, you don’t have
two #include <stdio.h> lines in one source file; however, if you have
header files that include other header files, it’s very easy to see how
a header file could get double-included. Suppose you have two header
files, a.h and b.h, and suppose they both include c.h. If you have a
source file that includes both a.h and b.h, c.h will get included twice.
There are a few different ways to solve this problem. One obvious
solution would be to simply be careful with your graph of included files
to make sure no double-includes happen. However, this is horribly
impractical and no one does it in practice. Another solution is that
some compilers support an extra #pragma once preprocessor directive that
tells the preprocessor to include the file at most once. The problem
with this solution is that #pragma once is not part of the C standard,
so not all compilers implement it, and those that do don’t always
implement it the same way.
The most common solution to this problem is to use include
guards. Include guards are a simple preprocessor trick that
takes advantage of conditional compilation. As an example, I have a
project called “tetris” with a header file named pieces.h. The header
file (with the contents removed) looks like this:
:::c
/* pieces.h: Functions and definitions for Tetris pieces */
#ifndef TETRIS_PIECES_H
#define TETRIS_PIECES_H
/* Structure definitions, macros, and function prototypes go here */
#endif /* ! defined TETRIS_PIECES_H */
Look at what the preprocessor does when it includes this file. All of
the actual content (not including comments) of the header file is inside
of a #ifndef/#endif block. If this is the first time that pieces.h has
been included in this C file, the TETRIS_PIECES_H macro will not be
defined. In this case, #ifndef will evaluate to true, and the code
between the #ifndef and the #endif will be used. The first line after
the #ifndef immediately defines the TETRIS_PIECES_H macro. If the header
file is ever included again, the #ifndef directive will evaluate to
false and the preprocessor will discard everything between the #ifndef
and the #endif. This way, even though it technically gets included
twice, it doesn’t matter because the body of the header file gets
discarded for all but the first time it is included. Note that the guard
macro does not start with underscores: names that begin with two
underscores, or with an underscore followed by a capital letter, are
reserved for the compiler and the standard library.
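To see the mechanism in isolation, the listing below is roughly what the source looks like for a file that includes both a.h and b.h, after each #include line has been replaced by the header's text but before the conditional directives are evaluated. It assumes c.h defines a hypothetical struct point and uses a hypothetical guard macro named PROJECT_C_H.

```c
/* ---- first copy of c.h (pulled in through a.h) ---- */
#ifndef PROJECT_C_H
#define PROJECT_C_H
struct point {
    int x, y;
};
#endif /* ! defined PROJECT_C_H */

/* ---- second copy of c.h (pulled in through b.h) ---- */
#ifndef PROJECT_C_H            /* already defined, so this body is skipped */
#define PROJECT_C_H
struct point {                 /* without the guard: redefinition error */
    int x, y;
};
#endif /* ! defined PROJECT_C_H */

/* Code after both includes can use struct point normally. */
int point_manhattan(struct point p)
{
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```

Deleting the #ifndef, #define, and #endif lines from both copies turns this into a file that fails to compile because struct point is defined twice.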
Notice that the macro used in the include guard above was fairly specific. It is extremely important to make sure that include guard macros are thoroughly namespaced. Normally, if you have a collision, the compiler simply tells you about it. If you have a collision in your include guard macro, the contents of the header file will get silently discarded. This usually isn’t caught until something fails to build because a structure definition or function prototype from that header is missing, and figuring out that the cause was an include guard collision can be very difficult.
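The following sketch simulates such a collision in a single file. Two unrelated headers, both hypothetically named pieces.h (one from a Tetris project, one from a chess library) and both guarded by the too-generic macro PIECES_H, are pasted one after the other. The compiler reports nothing; the second header's contents simply vanish.

```c
/* ---- pieces.h from a (hypothetical) Tetris project ---- */
#ifndef PIECES_H
#define PIECES_H
#define TETRIS_PIECE_COUNT 7
#endif

/* ---- pieces.h from a (hypothetical) chess library: same guard name ---- */
#ifndef PIECES_H               /* already defined by the Tetris header */
#define PIECES_H
#define CHESS_PIECE_COUNT 6    /* never seen by the compiler */
#endif

/* Returns 1 if the chess header's contents survived preprocessing. */
int chess_header_was_included(void)
{
#ifdef CHESS_PIECE_COUNT
    return 1;
#else
    return 0;                  /* the whole body was silently discarded */
#endif
}
```

Guards namespaced like TETRIS_PIECES_H and CHESS_PIECES_H would have kept both headers intact.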
In general, it is probably a good idea to use include guards on every header file in your project. They’re only 3 lines so they don’t take up much space or time and are thoroughly worth it in the long term.
While include guards help solve the multiple include problem, they
don’t fix all header file include order woes. Why would order matter?
Let’s say that you’re writing a program to help run a chicken farm. You
want to store some information about chickens, so you write a structure
called chicken. One thing we’d like to store about a chicken is which
eggs it has laid, so the chicken structure contains a pointer to an
array of eggs:
:::c
struct chicken {
struct egg *eggs;
/* Other fields */
};
OK, so we have chickens now. However, the chicken structure refers to a
structure called egg that we haven’t defined yet. That’s not a problem;
we just have to define it. Because we want to keep very good track of
our eggs, we want to record which chicken laid the egg. Let’s say the
egg structure looks like this:
:::c
struct egg {
struct chicken *mother;
/* Other fields */
};
Now we have a problem. Which structure do we define first, chicken or
egg? We can’t define egg until we know that there is a data type struct
chicken, and we can’t define chicken until we know that there is a data
type struct egg. To solve this problem, we can use a predeclaration. A
predeclaration for a structure is similar to a function prototype in
that it tells the compiler that the structure exists without telling it
what’s inside. If we want to define our chicken and egg structures, we
could do so as follows:
:::c
struct egg;
struct chicken {
struct egg *eggs;
/* Other fields */
};
struct egg {
struct chicken *mother;
/* Other fields */
};
The first line above is the predeclaration that tells C that we will
eventually define a structure named egg. Once you have a predeclaration,
you can create pointers to values of type struct egg, but you can’t
crack it open and look inside. In other words, you can’t dereference it
and look at fields, and you can’t create one on the stack. This is
because you have no idea from just the predeclaration what fields it
has, what types they may be, or even what size a struct egg should be.
All the predeclaration is good for is telling the compiler that the
structure exists so that you can create and pass around pointers to it.
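A sketch of what the predeclaration does and does not allow, reusing the chicken and egg structures with a hypothetical egg_count field and two hypothetical helpers, chicken_first_egg and egg_mother. The commented-out lines would be rejected because struct egg is still an incomplete type at that point.

```c
#include <stddef.h>

struct egg;                     /* predeclaration: struct egg is incomplete */

struct chicken {
    struct egg *eggs;           /* fine: pointer to an incomplete type */
    size_t egg_count;
};

/* Fine: only passes a pointer around, never looks inside it. */
struct egg *chicken_first_egg(const struct chicken *hen)
{
    return hen->egg_count > 0 ? hen->eggs : NULL;
}

/* Not allowed yet, because struct egg is still incomplete:
 *     struct egg e;               size of struct egg is unknown
 *     return hen->eggs->mother;   fields of struct egg are unknown
 */

struct egg {
    struct chicken *mother;
    int weight_grams;           /* hypothetical extra field */
};

/* Now that struct egg is complete, its fields can be accessed. */
struct chicken *egg_mother(const struct egg *e)
{
    return e->mother;
}
```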
Things get even more interesting if we put them in different header
files. For instance, say we define struct egg in egg.h but struct
chicken is defined in chicken.h. It is tempting to have egg.h include
chicken.h and vice versa, but that doesn’t really solve our problem. If
we don’t have include guards, the circular dependency will cause an
infinite include loop and the preprocessor will fail. If we do have
include guards, they will each only get included once, but the order in
which they get included will change based on whether chicken.h or egg.h
is included first.
To solve this, we can simply put a predeclaration for struct chicken at
the top of egg.h and vice versa. This way, no matter which gets included
first, both structures have enough information to be defined. Unlike the
actual structure definition, you can duplicate predeclarations as much
as you want without trouble.
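A sketch of the two headers, pasted into one listing with comments marking each file. The guard macro names FARM_CHICKEN_H and FARM_EGG_H, the egg_count field, and the helper chicken_count_own_eggs are hypothetical. Neither header includes the other; each relies on a predeclaration instead, so swapping the order of the two header sections still compiles.

```c
/* ---- chicken.h ---- */
#ifndef FARM_CHICKEN_H
#define FARM_CHICKEN_H

#include <stddef.h>

struct egg;                     /* predeclaration instead of including egg.h */

struct chicken {
    struct egg *eggs;           /* array of eggs this hen has laid */
    size_t egg_count;
};

#endif /* ! defined FARM_CHICKEN_H */

/* ---- egg.h ---- */
#ifndef FARM_EGG_H
#define FARM_EGG_H

struct chicken;                 /* predeclaration; harmless if already defined */

struct egg {
    struct chicken *mother;     /* hen that laid this egg */
};

#endif /* ! defined FARM_EGG_H */

/* ---- farm.c: a source file that includes both headers ---- */

/* Count how many of a hen's eggs record her as their mother. */
size_t chicken_count_own_eggs(const struct chicken *hen)
{
    size_t count = 0;
    for (size_t i = 0; i < hen->egg_count; i++)
        if (hen->eggs[i].mother == hen)
            count++;
    return count;
}
```

Only farm.c, which actually looks inside both structures, needs the complete definitions from both headers.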
Using predeclarations also allows you to avoid having header files include too many other header files. While this shouldn’t be a huge issue, it can sometimes be a nice simplification. However, you should be warned that, as soon as you start using a lot of predeclarations, you are going to run into the copy-and-paste problem we discussed in an earlier section. If you ever change the name of a structure, you have to go update all of the predeclarations.