Error handling and memory management

Whenever you write code that does more than direct calculations, you will have to handle error cases. This includes anything that interacts with a user, parses command-line arguments, reads from a file, or even allocates dynamic memory. Consider the following example:

/* add.c: Adds two numbers and prints the result to standard output */

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int i,j;

    i = atoi(argv[1]);
    j = atoi(argv[2]);

    printf("%d\n", i + j);

    return 0;
}

This looks very simple, but a surprising number of things can go wrong. What happens if the user doesn’t give you two arguments? What happens if they give you more than two? What if one of the arguments isn’t a number? (atoi quietly returns 0 for something like "abc", so the mistake goes unnoticed.) Each of these is an exceptional case. When dealing with anything from outside your program, you have to assume that anything can happen and handle all of those cases. A better version of the above example would look something like this:

/* sum.c: An improved version of add.c */

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int i;
    long int sum;
    char *endptr;

    if (argc <= 2) {
        fprintf(stderr, "At least two arguments are required for a sum, ");
        fprintf(stderr, "%d given.\n", argc - 1);
        return 3;
    }

    sum = 0;
    for (i = 1; i < argc; i++) {
        sum += strtol(argv[i], &endptr, 10);
        if (endptr == argv[i] || *endptr != '\0') {
            /* no digits were parsed, or characters follow the digits
             * (as in "12abc"), so argv[i] is not an integer */
            fprintf(stderr, "Invalid Argument: %s\n", argv[i]);
            return 3;
        }
    }

    printf("%ld\n", sum);

    return 0;
}

As you can see, we check two basic things. The first is a simple check on the number of arguments. Then, as we loop through the arguments we check to make sure that each argument given is an actual integer. If we get an error in either of these cases, we simply print an error message to stderr and exit with a non-zero exit status.
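strtol can also fail in a way that a look at endptr alone never reveals: if the digits describe a number too large for a long int, it returns LONG_MAX or LONG_MIN, sets errno to ERANGE, and still advances endptr past the digits. A stricter check resets errno before the call, inspects it afterwards, and also requires endptr to reach the terminating '\0'. The sketch below packages those checks; the helper name parse_long and its 0/-1 return convention are illustrative choices, and EINVAL is a POSIX error code rather than an ISO C one.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses the whole string s as a base-10 long.
 * Returns 0 and stores the value in *out on success.
 * Returns -1 and sets errno to EINVAL (no digits, or trailing characters)
 * or ERANGE (the value does not fit in a long int). */
int parse_long(const char *s, long *out)
{
    char *endptr;
    long value;

    errno = 0;                      /* strtol only sets errno on failure */
    value = strtol(s, &endptr, 10);

    if (endptr == s || *endptr != '\0') {
        errno = EINVAL;             /* nothing parsed, or junk after the digits */
        return -1;
    }
    if (errno == ERANGE) {
        return -1;                  /* overflow or underflow; errno already set */
    }

    *out = value;
    return 0;
}
```

In sum.c the loop could then call parse_long(argv[i], &value) and reject the argument whenever it returns -1. Overflow of the running sum itself would still need its own check.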

The proper response to an error

The first thing to consider about any exceptional case is: What is an appropriate response? For many errors, particularly in command-line software, the best response is simply to give the user an error message and quit. These are referred to as fatal errors. For example, if malloc returns NULL, the allocation failed, usually because the system has run out of memory, and there is probably nothing you can do. There are very few times that you need to “nicely” handle an out-of-memory error. On the other hand, suppose your program was processing a series of files one at a time. What should you do if one of the files fails to open or if there is an error processing it? In this case, you may want to print an error letting the user know that one of the files failed and then continue with the remaining files. It’s also possible that the best response is simply to print an error and die, canceling the entire operation.
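For the series-of-files case, a non-fatal response might look like the following sketch: a file that cannot be opened or read is reported on stderr and skipped, and the function returns how many files failed so that the program can still exit with a non-zero status at the end. The names process_files and process_one are hypothetical, counting bytes stands in for whatever real work the program does, and the message for a failed fopen assumes a POSIX system, where fopen sets errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-file work: counts the bytes in an open file.
 * Returns 0 on success or -1 if a read error occurred. */
static int process_one(FILE *fp, long *bytes)
{
    *bytes = 0;
    while (fgetc(fp) != EOF)
        ++*bytes;
    return ferror(fp) ? -1 : 0;
}

/* Processes each of the count files in names. A file that cannot be
 * opened or read is reported on stderr and skipped instead of ending
 * the whole run. Returns the number of files that failed. */
int process_files(int count, char **names)
{
    int i, failures = 0;

    for (i = 0; i < count; i++) {
        long bytes;
        FILE *fp = fopen(names[i], "r");

        if (fp == NULL) {
            /* assumes fopen sets errno, as POSIX requires */
            fprintf(stderr, "%s: %s\n", names[i], strerror(errno));
            failures++;
            continue;               /* go on to the next file */
        }
        if (process_one(fp, &bytes) < 0) {
            fprintf(stderr, "%s: read error\n", names[i]);
            failures++;
        } else {
            printf("%s: %ld bytes\n", names[i], bytes);
        }
        fclose(fp);
    }
    return failures;
}
```

A main function could then end with return process_files(argc - 1, argv + 1) == 0 ? 0 : 1; so every file is attempted but the shell still learns that something went wrong.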

The proper response to an error depends on the type of the error, the type of program, and how the user interacts with it. Programs that have graphical interfaces should almost never have fatal errors. This is because users would much rather see a nice little error box than have the program crash. On the other hand, in a command-line program that is not meant to be interactive, fatal errors are much more acceptable. Suppose the only way to pass a filename to your program is via a command-line argument. Then, if the file fails to open, the best thing to do is probably to simply tell the user and quit.

Errors that need to be handled gracefully are a lot more difficult to deal with than errors that should simply kill the program. This is because, unlike in our simple example above, errors can occur deep inside the program, far away from the main function. In this case, each function along the way has to detect the error and then either handle it or clean up after itself and report the error to its caller.
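One common way to organize that clean-up in C is a single exit label reached with goto, so each failure path releases exactly what was acquired before it. The function read_first_line below is a hypothetical illustration: it opens a file and allocates a buffer, and if any step fails it frees the buffer, closes the file, and returns -1 with errno set so the caller can decide what to do. EIO and EINVAL are POSIX error codes, and relying on fopen to set errno is likewise a POSIX assumption.

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the first line of the file at path (at most
 * size - 1 characters) into a newly allocated buffer stored in *out, which
 * the caller must free. Returns 0 on success. On failure it releases
 * everything it acquired, leaves *out untouched, and returns -1 with errno
 * describing the problem. */
int read_first_line(const char *path, size_t size, char **out)
{
    FILE *fp = NULL;
    char *buf = NULL;
    int saved_errno;

    if (size < 2 || size > (size_t)INT_MAX) {
        errno = EINVAL;             /* fgets needs room for a char and '\0' */
        return -1;
    }

    fp = fopen(path, "r");
    if (fp == NULL)
        goto fail;                  /* fopen set errno (POSIX) */

    buf = malloc(size);
    if (buf == NULL) {
        errno = ENOMEM;
        goto fail;
    }

    if (fgets(buf, (int)size, fp) == NULL) {
        /* a read error, or the file was empty */
        errno = ferror(fp) ? EIO : EINVAL;
        goto fail;
    }

    fclose(fp);
    *out = buf;                     /* success: ownership passes to the caller */
    return 0;

fail:
    saved_errno = errno;            /* clean-up calls may overwrite errno */
    free(buf);                      /* free(NULL) does nothing */
    if (fp != NULL)
        fclose(fp);
    errno = saved_errno;
    return -1;
}
```

Saving errno before cleaning up matters because fclose (and, on some systems, free) may change it, and the caller needs the value that describes the original failure.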

Propagating errors

Whenever you have an error that doesn’t just kill the program, you have to have some way of propagating it up the call stack. For example, suppose you have a function that opens a file. If the file fails to open, say, because it doesn’t exist, then you may want to tell the user and allow them to pick a different one. The problem is that the function that actually opens the file may be several function calls away from the point where the user is interacting with the program. Instead of handling the file open error and interacting with the user in a function that’s primarily I/O focused, it makes more sense to propagate the error back to the function interacting with the user. This way the program can ask the user for a new file and start the process over again.
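A sketch of that split, assuming the file name comes from an interactive prompt: open_input is a hypothetical stand-in for the low-level code, possibly several calls deep, that only reports failure by returning NULL, and ask_for_file is the hypothetical function that talks to the user and decides to ask again. ask_for_file takes its input and output streams as parameters (normally stdin and stdout) so it can also be driven by a file, and printing strerror(errno) assumes fopen sets errno, as it does on POSIX systems.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Low-level, I/O-focused function: it never talks to the user.
 * On failure it returns NULL and leaves errno describing why. */
FILE *open_input(const char *name)
{
    return fopen(name, "r");
}

/* User-facing function: keeps asking for a file name until one opens.
 * Reads answers from in and writes prompts and errors to out.
 * Returns the open file, or NULL if the answers run out (EOF). */
FILE *ask_for_file(FILE *in, FILE *out)
{
    char name[256];
    FILE *fp;

    for (;;) {
        fputs("File to open: ", out);
        fflush(out);
        if (fgets(name, sizeof name, in) == NULL)
            return NULL;                    /* no more input */
        name[strcspn(name, "\n")] = '\0';   /* strip the newline */

        fp = open_input(name);
        if (fp != NULL)
            return fp;

        /* the error came back up from open_input; deal with it here */
        fprintf(out, "Could not open %s: %s\n", name, strerror(errno));
    }
}
```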

People have come up with a number of different solutions to the error handling problem. Many object-oriented languages, such as Java, C++, and C#, have exceptions: an exception represents an exceptional condition in the code, and the language itself propagates it up the call stack until some caller “catches” it. The C language has no exceptions, so you have to propagate errors manually. While this can look like a lot of work, it usually isn’t too bad.

With any error, there are usually two pieces of information: that an error occurred, and what happened. Sometimes these are combined into a single error code, where one reserved value means that no error occurred. The exact way that errors are handled varies from library to library and from program to program. In parts of the C standard library, and in most system calls, errors are reported with a combination of a return value and errno. Usually, the function has an integer return type and returns zero or a positive number on success and a negative number (usually -1) on failure. When it fails, many functions also set the errno variable (declared in errno.h) to a code such as ENOMEM that indicates the nature of the error. Because a successful call is allowed to leave a stale value in errno, you should only look at errno after the return value has signalled an error.
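As a sketch of that convention, the hypothetical function count_char below returns a non-negative count on success and -1 on failure, leaving the reason in errno. It relies on fopen setting errno, which POSIX guarantees but ISO C does not, and uses the POSIX code EIO for a read error.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical function following the return-value/errno convention:
 * returns how many times ch occurs in the file at path (zero or more),
 * or -1 with errno set if the file cannot be opened or read. */
long count_char(const char *path, int ch)
{
    FILE *fp;
    long count = 0;
    int c, failed;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;          /* errno was set by fopen (POSIX) */

    while ((c = fgetc(fp)) != EOF) {
        if (c == ch)
            count++;
    }

    failed = ferror(fp);
    fclose(fp);
    if (failed) {
        errno = EIO;        /* POSIX code for a low-level I/O error */
        return -1;
    }
    return count;
}
```

A caller tests the return value first, for example if (count_char("notes.txt", '\n') < 0) perror("notes.txt"); with a hypothetical file name, and only then looks at errno.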

For example, we could change the data_array_list_add_array function from our earlier array list example to report allocation failures as follows:

int
data_array_list_add_array(struct data_array_list *list,
        struct data_struct *data, size_t num_elements)
{
    size_t new_alloc, new_size, i;
    struct data_struct *new_data;

    /* See if the list is big enough */
    if (list->alloc < list->size + num_elements) {

        new_alloc = list->alloc * 2;

        if (new_alloc < list->size + num_elements) {
            new_alloc = list->size + num_elements;
        }

        new_data = malloc(new_alloc * sizeof(*list->data));
        if (new_data == NULL) {
            errno = ENOMEM;
            return -1;
        }
        
        /* Copy the old data over to the new list */
        for (i = 0; i < list->size; ++i) {
            new_data[i] = list->data[i];
        }

        free(list->data);
        list->data = new_data;
        list->alloc = new_alloc;    /* record the new capacity */
    }

    /* Copy the data in */
    for (i = 0; i < num_elements; ++i) {
        list->data[i + list->size] = data[i];
    }
    list->size += num_elements;

    return 0;
}

This way, if your system runs out of memory, the function returns -1 and sets errno to ENOMEM instead of simply crashing. Return values and errno are not the only way to propagate errors, though. Another error handling mechanism I have seen uses an extra pointer argument (C’s version of pass-by-reference) through which the function reports the error, as follows:

void my_function(int arg1, int arg2, int *error)
{
    /* do something */

    if (/* error condition */) {
        if (error)
            *error = -1;    /* write through the pointer, not to it */
        return;
    }

    if (error)
        *error = 0;         /* report success */
}
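Here is the same pattern with a concrete, hypothetical error condition filled in. safe_divide fails on division by zero and on INT_MIN / -1, which overflows an int; it writes the status through the pointer, and it accepts NULL for callers that do not care about the error.

```c
#include <limits.h>

/* Hypothetical example of the error out-parameter pattern: returns a / b.
 * If error is not NULL, *error is set to 0 on success or to -1 on failure
 * (division by zero, or INT_MIN / -1, which overflows an int). */
int safe_divide(int a, int b, int *error)
{
    if (b == 0 || (a == INT_MIN && b == -1)) {
        if (error)
            *error = -1;    /* report the failure through the pointer */
        return 0;           /* meaningless; callers must check *error */
    }
    if (error)
        *error = 0;
    return a / b;
}
```

Compared with errno, the status travels with the call itself and the return value stays free for the actual result; the cost is an extra argument on every call and a return value that callers must ignore whenever *error is -1.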

There are many different mechanisms for error propagation. I’m not going to try to describe them all here. You will see many different ways of doing error handling as you gain more programming experience. The point here is to make you aware of them and to get you to think about error handling and propagation.

Printing error messages and killing the program

There are a lot of cases where there is no good response to an error other than to simply quit the program. In this case, we call the error fatal. For example, if your system runs out of memory, there would likely be nothing your program could do. In this case, you might simply print an error to the user and quit the program as follows:

success = data_array_list_add_array(&list, data, num_elements);
if (success < 0) {
    perror("data_array_list_add_array");
    abort();
}

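There are a couple of new function calls there: perror and abort. The perror function writes its argument, a colon, and a message describing the current value of errno to standard error. This way you can tell the user what happened without knowing all of the possible errno values. The second function, abort, ends the program immediately and abnormally by raising SIGABRT, which the shell sees as a failure; it does not run functions registered with atexit, and whether unwritten output buffers are flushed is up to the implementation. This way you don’t have to bother propagating the error up to the main function. The exit function, typically called as exit(EXIT_FAILURE), is also good for this purpose and shuts the program down normally, flushing and closing open streams first. From a command-line perspective, there is no difference between the main function returning the value n and calling exit(n).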
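Since a fatal error does not need to travel up the call stack, a common idiom is to put the check and the exit in one wrapper and call it everywhere instead of malloc. The name xmalloc is a widespread convention rather than a standard function, and this sketch of it uses exit(EXIT_FAILURE) rather than abort so that buffered output is flushed before the program ends. It prints its own message because ISO C does not require malloc to set errno.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper for allocations where running out of memory is
 * treated as fatal: it never returns NULL for a non-zero size. */
void *xmalloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {
        /* malloc need not set errno in ISO C, so write our own message */
        fprintf(stderr, "xmalloc: out of memory (%zu bytes requested)\n", size);
        exit(EXIT_FAILURE);     /* flushes and closes streams, runs atexit handlers */
    }
    return p;
}
```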

In any case, it is almost always better to print an error message and quit than to simply let the program crash. This way you know what error occurred and where. Otherwise, you are liable to get a cascade of errors that build on each other until the program simply crashes. When this happens, it is much harder to find the problem because the program may crash in a very different piece of code than where the error originated.