In this section, we will discuss functions, one of C’s most subtly powerful features. Functions allow the programmer to break their code into smaller, easier to manage pieces. Each of these pieces has its own variables and accomplishes a specific task. When all of these tasks are put together you get the entire program. From a mathematical perspective, it’s like breaking a big proof into smaller theorems or even lemmas.
There are two primary ways you can think of functions. Some functions
are more computational in nature: they take some inputs and compute a
value. Others are more procedural in nature: they perform some action.
An example of a computational function would be the strlen
function in string.h
that takes in a C string and counts
the number of characters in it. An example of a procedural function is
the printf
function that we have used several times to
write values to the terminal. The main
function in most
programs also tends to be procedural in nature. Most functions you will
encounter will be somewhere in between.
The best way to explain the semantics of functions is with an example. Suppose we wanted to compute a binomial coefficient. One way to do that would be with the following two functions:
/* Computes the value n! */
unsigned int factorial(unsigned int n)
{
unsigned int i;
unsigned int product;
product = 1;
for (i = 1; i <= n; ++i) {
product *= i;
}
return product;
}
/* Computes n choose k */
unsigned int binomial(unsigned int n, unsigned int k)
{
return factorial(n) / (factorial(k) * factorial(n - k));
}
These two functions are examples of functions that are computational in nature. The first computes a factorial and the second computes a binomial coefficient using the factorial function. From this simple example, we can see all of the basic parts of a C function.
The first part of any function is a function declaration:
unsigned int binomial(unsigned int n, unsigned int k)
The function declaration consists of three parts: a return type, a
name, and a comma separated list of arguments. The first part of the
function declaration is the return type. Calls to functions are,
themselves, expressions with the value of the expression being the value
returned from the function. The return type signifies the type of the
returned value and, consequentially, the type of the function call as an
expression. The fact that function calls are valid expressions can be
seen in the binomial
function above.
The second part is the function name. The function name can be
anything so long as it follows the same rules as for variables. Function
names can contain letters, numbers, and the underscore (_) character in
any order so long as it does not start with a number. You also need to
take care to make sure that function names are unique; you cannot have
two functions with the same name. This can take some care because most
header files define several functions such as stdio.h
defining the printf
function.
The third part of a function declaration is a comma-separated list of arguments surrounded by parentheses. The arguments are variable declarations for variables that act as inputs to the function. Each argument is a full-fledged variable and you can even assign a new value to it later in the function. (This is not usually recommended as it can make for error-prone and hard-to-read code.) When the body of the function is executed, these variables are already set to the values passed into the function when it was called.
After the function declaration is a block that contains the body of the function.
{
unsigned int i;
unsigned int product;
product = 1;
for (i = 2; i <= n; ++i) {
product *= i;
}
return product;
}
The body of the functions is a block and is surrounded by braces. The first part of the function body is a series of variable declarations. Some compilers let you declare variables anywhere in the function so long as it is before they are used. However, this capability is part of the C99 standard which may not be supported on older compilers. In order to ensure that your program will compile on older compilers, you should put your variable declarations at the top of the function body.
After the variable declarations come the rest of the statements of
your function. In the case of our factorial function, we have a for loop
that loops through all the numbers between 2 and n and multiplies them
together. The final statement is the return
statement that
specifies the return value of the function.
The return
statement requires a little discussion of its
own. A return statement is a statement of the form
return <expression>;
where
<expression>
is any expression. The expression is
evaluated and its value becomes the function’s return value. The code
calling this function then receives this value as the value of the
expression that is the function call. In order to make this all a little
more clear, look at the second of our two functions:
unsigned int binomial(unsigned int n, unsigned int k)
{
return factorial(n) / (factorial(k) * factorial(n - k));
}
Let’s say the statement binomial(5, 3);
is executed
somewhere in our program. The binomial
function will
receive the value 5 in the variable “n” and the value 3 in the variable
“k”. This function only has one line in its body; when the computer runs
that line, a number of things happen:
In order to evaluate the expression, the factorial
function is called three times:
n = 5
.n = 3
.n = 2
. (2 = 5 - 3)Each time the factorial function is called, it runs through its for loop and computes the factorial.
The return values of the calls to the factorial functions are stored in temporary variables.
The division and multiplication operation are carried out as per
the expression in the binomial
function. This yields the
final result of
It should be noted that the factorial
function in our
example gets called 3 times for every computation of the binomial
coefficient. Every time a function call occurs in your code, the
function gets executed. This means that if the function is
time-consuming and you use the value multiple times then you should
probably store it in a variable instead of calling the function every
time you need it. However, if the function is fast, this is probably
unneeded.
Just like break
and continue
, the
return
statement may occur at any place in the function. As
soon as a return statement is encountered, the function returns with the
specified value even if it has not yet reached the end. This can be very
useful in certain circumstances. For example, suppose we had a function
is_prime
that tested whether or not an integer was prime.
Then we could write a next_prime
function as follows to
find the next prime number after a certain value:
unsigned int next_prime(unsigned int n)
{
unsigned int i;
for (i = n + 1;; ++i) {
if (is_prime(i)) {
return i;
}
}
}
We have to be careful with that function because if
is_prime
never returns a non-zero value, it will run
forever. However, assuming our prime test works and that there is a next
prime before we reach integer overflow, this function will indeed return
the next prime number. In this case, we could use a break
statement or something clever for the condition in our for loop.
However, simply placing the return statement in the body of the loop is
probably the most clear way to implement it.
The C language also allows for functions that do not return a value.
These functions are called void functions. This is
useful when writing functions that are purely procedural in nature. In
order to specify that a function is a void function, you use
void
as the return type. There is no such thing as a
void
variable, the void
type is just a
placeholder variable type.
Because void functions do not return an actual value, they do not
need a return
statement. However, return statements can
still be useful in the same way as break
and
continue
statements are used in lists. They provide a way
to prematurely leave the function. Inside a void function, a return
statement does not have an expression associated with it. A void return
statement is simply: return;
.
I said at the beginning of this discussion that functions were one of C’s most powerful features. The thing that sets functions in C apart from similar concepts in languages like MATLAB or early versions of FORTRAN is variable scoping. The variables (and arguments) declared in a C function are local to that function. They are created before the function call begins and destroyed when it leaves.
On the one hand, this means that you can’t refer to variables in one function from another function without explicitly passing them in (as arguments) or out (as a return value). This also means that writing a function that computes multiple values is more difficult (but not impossible). However, it comes with the benefit of making your code much cleaner and less error-prone.
For example, both of our functions above use the variable name “n”
for one of their input variables. If we didn’t have variable scoping,
then the factorial
function could change the value of “n”
and that would make a mess out of the binomial
function.
Variable scoping allows a function to be able to run and know that none
of the functions it calls will do anything to its variables. While this
may not seem like a big deal at first, it becomes a necessity when
writing more complex programs.
Recursion in programming is where a function calls itself. If used carefully, this can be a very powerful feature. For example, we could rewrite our factorial function as follows:
unsigned int factorial(unsigned int n)
{
if (n <= 1) {
return 1;
} else {
return n * factorial(n - 1);
}
}
A recursive function works on the exact same principle as mathematical induction. You define what the function should do on some base case and then define all of the other cases in terms of another call to the function. When writing recursive functions you have to be careful to make sure your base cases are correct, otherwise the function may run forever. Recursive functions may also run into trouble if there are too many nested function calls. It is possible for a recursive function, if it calls itself too many times, to make the computer run out of memory and have to kill the program. This is what’s called a stack overflow and we will discuss it later.
Another example of something that is usually defined recursively is the Fibonacci sequence. We can write a simple function to calculate the n’th Fibonacci number as follows:
unsigned int fibonacci(unsigned ing n)
{
if (n <= 2) {
return 1;
} else {
return fibonacci(n - 1) + fibonacci(n - 2);
}
}
While the above function looks very simple on the face of things, it
is actually a very good example of when recursion is a very bad
idea. Look a little closer and start tracing through the function calls.
For n = 1
and n = 2
, there is only one
function call because it returns immediately. However,
fibonacci(3)
makes two more calls to fibonacci
and fibonacci(4)
makes two calls to fibonacci
,
one of which is fibonacci(3)
which makes two more calls. If
you carry this a little further, you’ll see that when you call
fibonacci
, the number of function calls that result is
approximately exponential in n
which is very bad for
performance. If we actually wanted to compute Fibonacci numbers, we
would probably actually write a function more like this:
unsigned int fibonacci(unsigned ing n)
{
unsigned int i, f1, f2, f3;
f1 = 1;
f2 = 1;
for (i = 3; i <= n; ++i) {
f3 = f1 + f2;
f1 = f2;
f2 = f3;
}
return f2;
}
While the examples of a recursive function given above is fairly trivial, recursive functions will be crucial for working with trees and similar data structures in the future.