Header file best practices

Because header files are simply copied and pasted wherever the #include statement is located, there are a lot of things that can go wrong. The problems only get worse when you realize that most header files include other header files which then, in turn, include other header files and so on. What follows is a brief discussion of some frequent problems with header files and best-practice ways to get rid of those problems.

Good organization and conservative header file usage

One of the most important parts of any programming project is keeping things well-organized. You should work, as much as you can, to compartmentalize and organize your code. If you have a data structure, you should have that data structure and all the helper functions required to work with it in one place. For example, if you have a data structure that represents a matrix, you would have matrix.h and matrix.c files that contain everything needed to work with your matrix structure. As much as possible, nothing should directly access your matrix structure except for the functions or macros in matrix.h and matrix.c. If you do a good job organizing your code, you can make fairly clean breaks between what belongs in one file versus another.

It is usually also a good idea to limit what you put in your header files. Not every function needs to be accessible to all of your program. For example, suppose you are implementing a recursive algorithm. You might have one function for the recursive step and a second wrapper function that handles the base case and calls the recursive function. Most of the time, there is no reason to have a prototype for the recursive function in your header file. The only one of those two functions that the rest of your code has any business interacting with is the wrapper function.
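As a rough sketch of that split, assume a hypothetical mathutil module whose only public function computes a factorial. The names mathutil.h, mathutil_factorial, and factorial_step are invented for illustration. Both files are shown in one listing so that it compiles on its own; the recursive step is additionally marked static, which is the usual C way to keep a function private to its source file.

```c
/* ---- mathutil.h (hypothetical): only the wrapper is declared ---- */
#ifndef MATHUTIL_H
#define MATHUTIL_H

unsigned long mathutil_factorial(unsigned int n);

#endif /* MATHUTIL_H */

/* ---- mathutil.c (hypothetical) ---- */

/* Recursive step. 'static' gives it internal linkage, so no other
 * source file can call it and it needs no prototype in the header. */
static unsigned long factorial_step(unsigned int n, unsigned long acc)
{
    if (n <= 1)
        return acc;
    return factorial_step(n - 1, acc * n);
}

/* Public wrapper: handles the trivial case, then starts the recursion. */
unsigned long mathutil_factorial(unsigned int n)
{
    if (n <= 1)
        return 1UL;
    return factorial_step(n, 1UL);
}
```

The rest of the program sees only mathutil_factorial, so the recursive helper can later be renamed or replaced by a loop without touching any other file.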

Namespacing and naming conventions

The header files you include (such as stdio.h) will contain things like structure definitions, function prototypes, and possibly even preprocessor macros. In your own header files, you will also have these things. A name collision is when you have two things with the same name. In this case the compiler will quit with an error because there is no way in C to tell two things with the same name apart. There are a few conventions you can follow that will help with this problem.

One thing you can do is to emulate a namespace. Some languages provide an explicit concept of a namespace; C does not. The common way to do namespacing in C is to pick a prefix and then put prefix_ at the beginning of every function or structure name to indicate that it belongs to your project and not something else. If a function is associated with a particular data structure, it is also common to use the name of that data structure as a prefix for the function.

Another convention is to name macros in UPPER CASE and name functions and structures using lower case (with “_” instead of spaces) or CamelCase.
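The two conventions above might look like this in practice. This is only a sketch: the tetris prefix is borrowed from the example project used later in this section, and the structure, macro, and function names are made up for illustration.

```c
/* Macro: upper case, carrying the project prefix. */
#define TETRIS_BOARD_WIDTH 10

/* Structure: lower case, carrying the project prefix. */
struct tetris_piece {
    int x;
    int y;
};

/* Functions: project prefix, then the structure they operate on. */
void tetris_piece_move(struct tetris_piece *piece, int dx, int dy)
{
    piece->x += dx;
    piece->y += dy;
}

/* Returns nonzero if the piece's column lies on the board. */
int tetris_piece_in_bounds(const struct tetris_piece *piece)
{
    return piece->x >= 0 && piece->x < TETRIS_BOARD_WIDTH;
}
```

A reader who meets tetris_piece_move in an unfamiliar file can tell at a glance which project and which structure it belongs to, and a library that happens to define its own move function will not collide with it.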

Double includes

Another issue that occurs frequently with header files is a header file accidentally getting included twice. While function prototypes can be repeated without causing harm, structure definitions cannot, and a macro can only be repeated if its definition is exactly the same each time. For this reason, you generally do not want a header file to be included more than once. How can this happen? Usually, you don’t have two #include <stdio.h> lines in one source file; however, if you have header files that include other header files, it’s very easy to see how a header file could get double-included. Suppose you have two header files, a.h and b.h, and suppose they both include c.h. If you have a source file that includes both a.h and b.h, c.h will get included twice.

There are a few different ways to solve this problem. One obvious solution would be to simply be careful with your graph of included files to make sure no double-includes happen. However, this is horribly impractical and no one does it in practice. Another solution is that some compilers support an extra #pragma once preprocessor directive that tells the preprocessor to include the file at most once. The problem with this solution is that it is not part of the C standard: not all compilers implement it and those that do don’t always implement it the same way.

The most common solution to this problem is to use include guards. Include guards are a simple preprocessor trick that takes advantage of conditional compilation. As an example, I have a project called “tetris” with a header file named pieces.h. The header file (with the contents removed) looks like this:

:::c
/* pieces.h: Functions and definitions for Tetris pieces */
#ifndef __TETRIS_PIECES_H__
#define __TETRIS_PIECES_H__

/* Structure definitions, macros, and function prototypes go here */

#endif /* ! defined __TETRIS_PIECES_H__ */

Look at what the preprocessor does when it includes this file. All of the actual content (not including comments) of the header file is inside of a #ifndef/#endif block. If this is the first time that pieces.h has been included in this C file, the __TETRIS_PIECES_H__ macro will not be defined. In this case #ifndef will evaluate to true, and the code between the #ifndef and the #endif will be used. The first line after the #ifndef immediately defines the __TETRIS_PIECES_H__ macro. If the header file is ever included again, the #ifndef statement will evaluate to false and the preprocessor will discard everything between the #ifndef and the #endif. This way, even though it technically gets included twice, it doesn’t matter because the body of the header file gets discarded for all but the first time it is included.
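To watch the guard work without creating several files, the sketch below pastes a guarded header body into one source file twice, which is roughly what the preprocessor produces after a double include. The structure contents and the accessor function are invented for illustration, and the guard is spelled TETRIS_PIECES_H here, without the leading underscores, because C reserves identifiers that begin with two underscores for the compiler and standard library.

```c
/* ---- first "#include" of pieces.h, pasted by hand ---- */
#ifndef TETRIS_PIECES_H
#define TETRIS_PIECES_H

struct tetris_piece {
    int rotation;
};

#endif /* TETRIS_PIECES_H */

/* ---- second "#include": the guard macro is already defined,
 *      so everything up to the #endif is discarded ---- */
#ifndef TETRIS_PIECES_H
#define TETRIS_PIECES_H

struct tetris_piece {
    int rotation;
};

#endif /* TETRIS_PIECES_H */

/* Exactly one definition of struct tetris_piece reached the compiler. */
int tetris_piece_rotation(const struct tetris_piece *piece)
{
    return piece->rotation;
}
```

Deleting the #ifndef, #define, and #endif lines from both copies turns this into a redefinition error, which is the failure the guard exists to prevent.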

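Notice that the macro used in the include guard above was fairly specific. It is extremely important to make sure that include guard macros are thoroughly namespaced. Normally, if you have a collision, the compiler simply tells you about it. If you have a collision in your include guard macro, the contents of one of the header files will get silently discarded. This won’t get caught until something fails to build because it was missing a function prototype or a structure definition. Figuring out that it was an include guard collision can be very difficult. One caveat about the spelling: strictly speaking, C reserves identifiers that begin with two underscores, or with an underscore followed by a capital letter, for the compiler and standard library, so a guard such as TETRIS_PIECES_H is the safer choice in new code.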
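The silent failure is easy to reproduce. In the sketch below, two unrelated headers (hypothetical liba/util.h and libb/util.h, pasted into one file so the listing stands alone) both chose the guard UTIL_H. The macro names and the probe function are assumptions made for the demonstration.

```c
/* ---- liba/util.h (hypothetical), pasted by hand ---- */
#ifndef UTIL_H
#define UTIL_H
#define LIBA_MAX_ITEMS 16
#endif /* UTIL_H */

/* ---- libb/util.h (hypothetical), which picked the same guard ---- */
#ifndef UTIL_H
#define UTIL_H
#define LIBB_MAX_ITEMS 32
#endif /* UTIL_H */

/* Probe which header bodies actually survived preprocessing. */
#ifdef LIBA_MAX_ITEMS
#define LIBA_VISIBLE 1
#else
#define LIBA_VISIBLE 0
#endif

#ifdef LIBB_MAX_ITEMS
#define LIBB_VISIBLE 1
#else
#define LIBB_VISIBLE 0
#endif

/* Returns 1 if libb's header contents were kept, 0 if they were dropped. */
int libb_is_visible(void)
{
    return LIBB_VISIBLE;
}
```

No diagnostic is produced at the point of the collision. The second header simply vanishes, and the problem only surfaces later when some code tries to use LIBB_MAX_ITEMS.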

In general, it is probably a good idea to use include guards on every header file in your project. They’re only 3 lines so they don’t take up much space or time and are thoroughly worth it in the long term.

Predeclarations

While include guards help solve the multiple include problem, they don’t fix all header file include order woes. Why would order matter? Let’s say that we’re writing a program to help run a chicken farm. We want to store some information about chickens, so we write a structure called chicken. One thing we’d like to store about a chicken is which eggs it has laid, so the chicken structure contains a pointer to an array of eggs:

:::c
struct chicken {
    struct egg *eggs;

    /* Other fields */
};

OK, so we have chickens now. However, the chicken structure refers to a structure called egg that we haven’t defined yet. That’s not a problem, we just have to define it. Because we want to keep very good track of our eggs, we want to record which chicken laid the egg. Let’s say the egg structure looks like this:

:::c
struct egg {
    struct chicken *mother;

    /* Other fields */
};

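Now we have a problem. Which structure do we define first, chicken or egg? Each definition mentions the other type, so whichever one comes first refers to a structure the compiler has not seen yet. (C is lenient about a pointer to an unseen structure inside another structure definition, but not everywhere: in a function prototype, for instance, a previously unseen structure name is local to that prototype and will not match the later definition.) The dependable way to solve this problem is a predeclaration, more commonly called a forward declaration. A predeclaration for a structure is similar to a function prototype in that it tells us that the structure exists without telling us what’s inside. If we want to define our chicken and egg structures, we could do so as follows: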

:::c
struct egg;

struct chicken {
    struct egg *eggs;

    /* Other fields */
};

struct egg {
    struct chicken *mother;

    /* Other fields */
};

The first line above is the predeclaration that tells C that we will eventually define a structure named egg. Once you have a predeclaration, you can create pointers to values of type struct egg but you can’t crack it open and look inside. In other words, you can’t dereference such a pointer to look at fields and you can’t create a struct egg on the stack. This is because you have no idea from just the predeclaration what fields it has, what types they may be, or even what size a struct egg should be. All the predeclaration is good for is telling the compiler that the structure exists so that you can create and pass around pointers to it.
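The limits described above can be sketched as follows. The egg_slot structure and its two functions are hypothetical; they exist only to show code that works with struct egg while knowing nothing about its contents. The operations that would fail are left in a comment so that the listing still compiles.

```c
#include <stddef.h>

struct egg;  /* predeclaration: the type exists, its contents are unknown */

/* Allowed: storing, comparing, and passing pointers to struct egg. */
struct egg_slot {
    struct egg *occupant;
};

int egg_slot_is_empty(const struct egg_slot *slot)
{
    return slot->occupant == NULL;
}

void egg_slot_put(struct egg_slot *slot, struct egg *e)
{
    slot->occupant = e;
}

/* Not allowed while struct egg is only predeclared. Each of these
 * lines would be a compile error if it appeared in real code:
 *
 *   struct egg e;                     (size unknown)
 *   size_t n = sizeof(struct egg);    (size unknown)
 *   slot->occupant->mother = NULL;    (fields unknown)
 */
```

Code like egg_slot never has to be recompiled when the fields of struct egg change, which is one reason libraries often hand out pointers to predeclared structures.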

Things get even more interesting if we put them in different header files. For instance, say we define struct egg in egg.h but struct chicken is defined in chicken.h. It is tempting to have egg.h include chicken.h and vice-versa, but that doesn’t really solve our problem. If we don’t have include guards, the circular dependency will cause an infinite include loop and the preprocessor will fail. If we do have include guards, they will each only get included once, but the order in which they get included will change based on whether chicken.h or egg.h is included first, so one of the two headers always ends up being processed before the structure it depends on has been defined.

To solve this, we can simply put a predeclaration for struct chicken at the top of egg.h and vice-versa. This way, no matter which gets included first, both structures have enough information to be defined. Unlike the actual structure definition, you can duplicate predeclarations as much as you want without trouble.
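Here is a minimal sketch of that arrangement, with the contents of both headers pasted into one listing so that it compiles on its own. The guard names, the chicken_lay function, and the use of eggs to point at a single egg are assumptions made for illustration.

```c
#include <stddef.h>

/* ---- chicken.h (hypothetical contents) ---- */
#ifndef FARM_CHICKEN_H
#define FARM_CHICKEN_H

struct egg;  /* predeclaration instead of #include "egg.h" */

struct chicken {
    struct egg *eggs;  /* here: the most recently laid egg, or NULL */
};

void chicken_lay(struct chicken *hen, struct egg *e);

#endif /* FARM_CHICKEN_H */

/* ---- egg.h (hypothetical contents) ---- */
#ifndef FARM_EGG_H
#define FARM_EGG_H

struct chicken;  /* predeclaration instead of #include "chicken.h" */

struct egg {
    struct chicken *mother;
};

#endif /* FARM_EGG_H */

/* ---- chicken.c: both full definitions are visible by now ---- */
void chicken_lay(struct chicken *hen, struct egg *e)
{
    e->mother = hen;  /* needs the full definition of struct egg */
    hen->eggs = e;
}
```

Swapping the order of the two header sections still compiles, because each one predeclares the only thing it needs from the other. Only the source file that actually looks inside both structures has to see both definitions.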

Using predeclarations also allows you to avoid having header files include too many other header files. While this shouldn’t be a huge issue, it can sometimes be a nice simplification. However, you should be warned that, as soon as you start using a lot of predeclarations, you are going to run into the copy-and-paste problem we discussed in an earlier section. If you ever change the name of a structure, you have to go update all of the predeclarations.