Functions

In this section, we will discuss functions, one of C’s most subtly powerful features. Functions allow the programmer to break their code into smaller, easier to manage pieces. Each of these pieces has its own variables and accomplishes a specific task. When all of these tasks are put together you get the entire program. From a mathematical perspective, it’s like breaking a big proof into smaller theorems or even lemmas.

There are two primary ways you can think of functions. Some functions are more computational in nature: they take some inputs and compute a value. Others are more procedural in nature: they perform some action. An example of a computational function would be the strlen function in string.h that takes in a C string and counts the number of characters in it. An example of a procedural function is the printf function that we have used several times to write values to the terminal. The main function in most programs also tends to be procedural in nature. Most functions you will encounter will be somewhere in between.

The best way to explain the semantics of functions is with an example. Suppose we wanted to compute a binomial coefficient. One way to do that would be with the following two functions:

/* Computes the value n! */
unsigned int factorial(unsigned int n)
{
    unsigned int i;
    unsigned int product;

    product = 1;
    for (i = 1; i <= n; ++i) {
        product *= i;
    }

    return product;
}

/* Computes n choose k */
unsigned int binomial(unsigned int n, unsigned int k)
{
    return factorial(n) / (factorial(k) * factorial(n - k));
}

These two functions are examples of functions that are computational in nature. The first computes a factorial and the second computes a binomial coefficient using the factorial function. From this simple example, we can see all of the basic parts of a C function.

Function Declarations

The first part of any function is a function declaration:

unsigned int binomial(unsigned int n, unsigned int k)

The function declaration consists of three parts: a return type, a name, and a comma separated list of arguments. The first part of the function declaration is the return type. Calls to functions are, themselves, expressions with the value of the expression being the value returned from the function. The return type signifies the type of the returned value and, consequentially, the type of the function call as an expression. The fact that function calls are valid expressions can be seen in the binomial function above.

The second part is the function name. The function name can be anything so long as it follows the same rules as for variables. Function names can contain letters, numbers, and the underscore (_) character in any order so long as it does not start with a number. You also need to take care to make sure that function names are unique; you cannot have two functions with the same name. This can take some care because most header files define several functions such as stdio.h defining the printf function.

The third part of a function declaration is a comma-separated list of arguments surrounded by parentheses. The arguments are variable declarations for variables that act as inputs to the function. Each argument is a full-fledged variable and you can even assign a new value to it later in the function. (This is not usually recommended as it can make for error-prone and hard-to-read code.) When the body of the function is executed, these variables are already set to the values passed into the function when it was called.

Function Block

After the function declaration is a block that contains the body of the function.

{
    unsigned int i;
    unsigned int product;

    product = 1;
    for (i = 2; i <= n; ++i) {
        product *= i;
    }

    return product;
}

The body of the functions is a block and is surrounded by braces. The first part of the function body is a series of variable declarations. Some compilers let you declare variables anywhere in the function so long as it is before they are used. However, this capability is part of the C99 standard which may not be supported on older compilers. In order to ensure that your program will compile on older compilers, you should put your variable declarations at the top of the function body.

After the variable declarations come the rest of the statements of your function. In the case of our factorial function, we have a for loop that loops through all the numbers between 2 and n and multiplies them together. The final statement is the return statement that specifies the return value of the function.

Return Statements

The return statement requires a little discussion of its own. A return statement is a statement of the form return <expression>; where <expression> is any expression. The expression is evaluated and its value becomes the function’s return value. The code calling this function then receives this value as the value of the expression that is the function call. In order to make this all a little more clear, look at the second of our two functions:

unsigned int binomial(unsigned int n, unsigned int k)
{
    return factorial(n) / (factorial(k) * factorial(n - k));
}

Let’s say the statement binomial(5, 3); is executed somewhere in our program. The binomial function will receive the value 5 in the variable “n” and the value 3 in the variable “k”. This function only has one line in its body; when the computer runs that line, a number of things happen:

  1. In order to evaluate the expression, the factorial function is called three times:

    • Once with n = 5.
    • Once with n = 3.
    • Once with n = 2. (2 = 5 - 3)
  2. Each time the factorial function is called, it runs through its for loop and computes the factorial.

  3. The return values of the calls to the factorial functions are stored in temporary variables.

  4. The division and multiplication operation are carried out as per the expression in the binomial function. This yields the final result of

It should be noted that the factorial function in our example gets called 3 times for every computation of the binomial coefficient. Every time a function call occurs in your code, the function gets executed. This means that if the function is time-consuming and you use the value multiple times then you should probably store it in a variable instead of calling the function every time you need it. However, if the function is fast, this is probably unneeded.

Just like break and continue, the return statement may occur at any place in the function. As soon as a return statement is encountered, the function returns with the specified value even if it has not yet reached the end. This can be very useful in certain circumstances. For example, suppose we had a function is_prime that tested whether or not an integer was prime. Then we could write a next_prime function as follows to find the next prime number after a certain value:

unsigned int next_prime(unsigned int n)
{
    unsigned int i;

    for (i = n + 1;; ++i) {
        if (is_prime(i)) {
            return i;
        }
    }
}

We have to be careful with that function because if is_prime never returns a non-zero value, it will run forever. However, assuming our prime test works and that there is a next prime before we reach integer overflow, this function will indeed return the next prime number. In this case, we could use a break statement or something clever for the condition in our for loop. However, simply placing the return statement in the body of the loop is probably the most clear way to implement it.

Void Functions

The C language also allows for functions that do not return a value. These functions are called void functions. This is useful when writing functions that are purely procedural in nature. In order to specify that a function is a void function, you use void as the return type. There is no such thing as a void variable, the void type is just a placeholder variable type.

Because void functions do not return an actual value, they do not need a return statement. However, return statements can still be useful in the same way as break and continue statements are used in lists. They provide a way to prematurely leave the function. Inside a void function, a return statement does not have an expression associated with it. A void return statement is simply: return;.

Variable Scoping

I said at the beginning of this discussion that functions were one of C’s most powerful features. The thing that sets functions in C apart from similar concepts in languages like MATLAB or early versions of FORTRAN is variable scoping. The variables (and arguments) declared in a C function are local to that function. They are created before the function call begins and destroyed when it leaves.

On the one hand, this means that you can’t refer to variables in one function from another function without explicitly passing them in (as arguments) or out (as a return value). This also means that writing a function that computes multiple values is more difficult (but not impossible). However, it comes with the benefit of making your code much cleaner and less error-prone.

For example, both of our functions above use the variable name “n” for one of their input variables. If we didn’t have variable scoping, then the factorial function could change the value of “n” and that would make a mess out of the binomial function. Variable scoping allows a function to be able to run and know that none of the functions it calls will do anything to its variables. While this may not seem like a big deal at first, it becomes a necessity when writing more complex programs.

Recursion

Recursion in programming is where a function calls itself. If used carefully, this can be a very powerful feature. For example, we could rewrite our factorial function as follows:

unsigned int factorial(unsigned int n)
{
    if (n <= 1) {
        return 1;
    } else {
        return n * factorial(n - 1);
    }
}

A recursive function works on the exact same principle as mathematical induction. You define what the function should do on some base case and then define all of the other cases in terms of another call to the function. When writing recursive functions you have to be careful to make sure your base cases are correct, otherwise the function may run forever. Recursive functions may also run into trouble if there are too many nested function calls. It is possible for a recursive function, if it calls itself too many times, to make the computer run out of memory and have to kill the program. This is what’s called a stack overflow and we will discuss it later.

Another example of something that is usually defined recursively is the Fibonacci sequence. We can write a simple function to calculate the n’th Fibonacci number as follows:

unsigned int fibonacci(unsigned ing n)
{
    if (n <= 2) {
        return 1;
    } else {
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
}

While the above function looks very simple on the face of things, it is actually a very good example of when recursion is a very bad idea. Look a little closer and start tracing through the function calls. For n = 1 and n = 2, there is only one function call because it returns immediately. However, fibonacci(3) makes two more calls to fibonacci and fibonacci(4) makes two calls to fibonacci, one of which is fibonacci(3) which makes two more calls. If you carry this a little further, you’ll see that when you call fibonacci, the number of function calls that result is approximately exponential in n which is very bad for performance. If we actually wanted to compute Fibonacci numbers, we would probably actually write a function more like this:

unsigned int fibonacci(unsigned ing n)
{
    unsigned int i, f1, f2, f3;

    f1 = 1;
    f2 = 1;
    for (i = 3; i <= n; ++i) {
        f3 = f1 + f2;
        f1 = f2;
        f2 = f3;
    }
    return f2;
}

While the examples of a recursive function given above is fairly trivial, recursive functions will be crucial for working with trees and similar data structures in the future.