An example: Root finding with GSL

The [GNU Scientific Library][1] (or GSL) provides implementations of various mathematical and statistical algorithms for use in scientific applications. Here, we will explore using some of the single-variable root solvers provided by GSL. The root solvers in GSL are designed so that the actual solver to use is determined by a single run-time parameter so that the solvers can be easily swapped out as long as they are of the same form. First, we will demonstrate how to use their basic root finding algorithms that need only the functions value and then we will demonstrate their newton’s method solver that also requires a derivative.

Making a general quadratic function

For this example, we will be solving a quadratic equation in the general form ax2 + bx + c. To begin, we create header file for our quadratic function called quadratic.h:

#ifndef __EXAMPLES_LIBGSL_QUADRATIC_H__
#define __EXAMPLES_LIBGSL_QUADRATIC_H__

struct quadratic_params {
    double a, b, c;
};

/* Returns the value of the quadratic at x */
double quadratic(double x, void *params);

/* Returns the derivative of the quadratic at x */
double quadratic_deriv(double x, void *params);

/* Evaluates both the quadratic and its derivative at x and stores the
 * value in *y and the derivative in *dy. */
void quadratic_fdf(double x, void *params, double *y, double *dy);

#endif /* !defined __EXAMPLES_LIBGSL_QUADRATIC_H__ */

This defines a simple data structure called quadratic_params that stores the coefficients for a general quadratic along with functions that evaluate the quadratic at a point. The value passed into params will always be of the type struct quadratic_params * even though it is specified as void * (more on that in a bit). We also create a C source file called quadratic.c that implements these functions:

#include "quadratic.h"

double quadratic(double x, void *params)
{
    struct quadratic_params *p;
    
    p = (struct quadratic_params *)params;
    return (p->a * x + p->b) * x + p->c;
}

double quadratic_deriv(double x, void *params)
{
    struct quadratic_params *p;
    
    p = (struct quadratic_params *)params;
    return 2.0 * p->a * x + p->b;
}

void quadratic_fdf(double x, void *params, double *y, double *dy)
{
    *y = quadratic(x, params);
    *dy = quadratic_deriv(x, params);
}

By now, most of the above code should look familiar to you. The one thing you may not recognize is the pointer cast. The second argument of each of these functions is a void pointer called params which we immediately cast to a struct quadratic_params * and store in p. The cast operation doesn’t change the address stored int the pointer, just the type of data to which it is pointing. We have seen this before when we use malloc. The malloc function returns a void pointer because it just returns an unused block of memory which doesn’t have any type associated to it until we give it one. In this case, we are getting something that we assume is a pointer to a quadratic_params structure with meaningful values in it. Before we can see why this is a valid assumption, we need to take a look at how GSL represents functions.

Making a GSL function

In order to represent a mathematical function, GSL provides a gsl_function type. To turn what we did above into a gsl_function representing the quadratic x2 − 5 we create a gsl_function called F as follows:

gsl_function F;
struct quadratic_params params = { 1.0, 0.0, -5.0 };
F.function = &quadratic;
F.params = &params;

The first line just declares the variable F of type gsl_function. The gsl_function type is just a structure with two fields. The reason we don’t need the struct keyword is that there is a typedef inside of gsl_math.h that defines the type gsl_function to be a particular structure. The second line creates a quadratic_params structure and fills it with data. The braces notation is a nice short-hand you can use when declaring structure variables; it is precisely equivalent to the following:

struct quadratic_params params;
params.a = 1.0;
params.b = 0.0;
params.c = -0.5;

Declaring with the nice braces notation simply makes things a little shorter and sometimes easier to read. The third and fourth lines fill the gsl_function structure with the function to use and the parameters we created. Inside of mathematical code, you will probably see this sort of thing a lot because it allows for a large amount of generality (as opposed to hard-coding things). Sure we could write a specific function such as

double x_squared_minus_5(double x)
{
    return x * x - 5;
}

but that would then we would have to write a separate C function for every mathematical function. Instead, we define a general quadratic function that takes the variable x and a pointer so a structure that contains the coefficients. This way we can easily solve any number different quadratics without writing a separate C function for each one.

If you look at how the function quadratic was declared above, you will see that the second argument is of type void * instead of struct quadratic_params *. This is because GSL doesn’t have any idea what type of parameters your function takes, so it simply accepts any pointer whatsoever and trusts you, the programmer, to pass it the right one. We can know that the types match because GSL guarantees that the only thing that will ever be passed into the params argument of quadratic is the pointer we give it in the params field of gsl_function. This way, as long as we give it a pointer to a variable of the correct type, the value passed into quadratic will always be a pointer to a variable of the correct type.

Function pointers

If you have been paying attention, the statement F.function = &quadratic should have bothered you a little. Is quadratic a variable? Where did it come from? The only thing we have seen named quadratic is a function. How are we putting a function in a variable?

In C, there is a special type of pointer called a function pointer. Function pointers, instead of pointing to variables or data on the stack, point to functions. When the computer runs your program, all of its compiled code is loaded into memory. A function pointer simply points to the location in memory where the code for the given function is stored. Unfortunately, the syntax for declaring variables that are function pointers is kind of strange. It’s one of those odd corners of C that most programmers don’t know very well. (As you will soon see, there are very good reasons why most programmers avoid them.) The gsl_function type is actually defined as follows (note the typedef):

struct gsl_function_struct 
{
    double (* function) (double x, void * params);
    void * params;
};

typedef struct gsl_function_struct gsl_function ;

The first field is named function that takes a pointer to a function of a particular type. The type of function is one that returns a double and takes two arguments: one of type double and one of type void * just like our quadratic function. The actual type of the function field is double (*)(double, void *) (yes, that’s gross). It is important to note that the prototype of the function is built into the function pointer’s type. This is so that C knows how to properly call the function from just the pointer. However, this also means that if you have a variable that is a function pointer, it can only ever store a pointer to a function with that exact prototype. This is why every function we use for gsl_function has to return a double and take a double and a void *. Whatever type of function we put into the gsl_function structure has to be able to cover every type of real-valued mathematical function you would want to solve. By using a void * to store a sort of generic extra structure, we can pass anything we want to the function by simply putting it in a custom structure and passing in a pointer.

Setting up the solver

GSL provides a variety of solvers for finding roots. In our first example, we will use a bracketed solver that requires only the function’s value. Later we will look at a Newton’s method solver. The people who wrote GSL have done a decent job of hiding most of the details of the solvers behind a generic gsl_root_fsolver type. A given root solver algorithm may have any number of state variables to keep track of different things between iterations. Instead of exposing all of those nasty details to us, the programmers, they have hidden them behind this gsl_root_fsolver type and simply provided functions to work with it. The first thing we have to do is to ask GSL to create a solver of the appropriate type for us.

const gsl_root_fsolver_type *solver_type;
gsl_root_fsolver *solver;

solver_type = gsl_root_fsolver_bisection;
solver = gsl_root_fsolver_alloc(solver_type);

The solver_type variable holds a solver type. In this case, it is gsl_root_fsolver_bisection which is some value (again, we don’t care what it really is) that represents a bisection solver. The symbol gsl_root_fsolber_bisection is a global variable provided in gsl/gsl_roots.h that contains whatever data is needed to specify a bisection solver. When we call gsl_root_fsolver_alloc, GSL creates a new bisection solver and gives it to us. We will be responsible for destroying it with a call to gsl_root_fsolver_free when we are done using it.

The next thing to do is to set a few parameters in our solver. Specifically, it’s a bracketing solver so it needs an initial interval. Also, it will need to know what function to solve. We have already created a gsl_function representing x * x - 5, so we can set the solver up as follows:

double x_lo = 0.0, x_hi = 5.0;
gsl_root_fsolver_set(solver, &F, x_lo, x_hi);

The gsl_root_fsolver_set call sets the function as well as the initial interval for the solution. We make x_lo and x_hi variables because we will want to use them later during the iteration step to test for convergence.

Iterating

Now that we have the solver set up, we need to iterate over it. In order to do that we call the gsl_root_fsolver_iterate function that simply runs one step of the iteration. Once that step is run, we can get current best root and error interval out of the solver and check for convergence. A single iteration step looks like this:

gsl_root_fsolver_iterate(solver);

/* get the solver's current best solution and bounds */
r = gsl_root_fsolver_root(solver);
x_lo = gsl_root_fsolver_x_lower(solver);
x_hi = gsl_root_fsolver_x_upper(solver);

/* Check to see if the solution is within 0.001 */
status = gsl_root_test_interval(x_lo, x_hi, 0, 0.001);
if (status == GSL_SUCCESS)
    printf("Converged:\n");

The first line actually performs the iteration. Then the next three statements gets the guess and bounds generated by the iteration step. Then, we use the gsl_root_test_interval to see if our interval is within the given tolerance. If status == GSL_SUCCESS then it is within tolerance and we say that it has converged. Otherwise, we will have status == GSL_CONTINUE and we need to run another step of the iteration.

Putting it all together

Putting everything together and adding a few printf statements and some error-checking, we get the following main function:

#include <stdio.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_roots.h>

#include "quadratic.h"

#define MAX_ITERATIONS 100

int main()
{
    int status;
    int i;
    const gsl_root_fsolver_type *solver_type;
    gsl_root_fsolver *solver;
    gsl_function F;
    double r;

    /* We want to solve x^2 - 5 */
    struct quadratic_params params = { 1.0, 0.0, -5.0 };
    double r_expected = sqrt(5.0);
    /* On the interval [0, 5] */
    double x_lo = 0.0, x_hi = 5.0;

    /* Set up the function */
    F.function = &quadratic;
    F.params = &params;

    /* Allocate a bisection solver and set it to use F */
    solver_type = gsl_root_fsolver_bisection;
    solver = gsl_root_fsolver_alloc(solver_type);
    gsl_root_fsolver_set(solver, &F, x_lo, x_hi);

    printf("using %s method\n", gsl_root_fsolver_name(solver));

    printf("%5s [%9s, %9s] %9s %10s %9s\n",
           "iter", "lower", "upper", "root", "err", "err(est)");

    status = GSL_CONTINUE;
    for (i = 1; i <= MAX_ITERATIONS && status == GSL_CONTINUE; ++i) {
        /* iterate one step of the solver */
        status = gsl_root_fsolver_iterate(solver);
        if (status != GSL_SUCCESS)
            break;

        /* get the solver's current best solution and bounds */
        r = gsl_root_fsolver_root(solver);
        x_lo = gsl_root_fsolver_x_lower(solver);
        x_hi = gsl_root_fsolver_x_upper(solver);

        /* Check to see if the solution is within 0.001 */
        status = gsl_root_test_interval(x_lo, x_hi, 0, 0.001);
        if (status == GSL_SUCCESS)
            printf("Converged:\n");

        printf("%5d [%.7f, %.7f] %.7f %+.7f %.7f\n",
               i, x_lo, x_hi, r, r - r_expected, x_hi - x_lo);
    }

    /* Free the solver */
    gsl_root_fsolver_free(solver);

    if (status == GSL_CONTINUE) {
        printf("error: too many iterations");
    } else if (status != GSL_SUCCESS) {
        printf("error: %s\n", gsl_strerror(status));
    }

    return status;
}

Switching out solvers

Now let’s say that we want to use a more efficient Brent solver. Because of the way that GSL is designed, this is really easy. All we have to do is change one line in the above code. Specifically,

/* Allocate a bisection solver and set it to use F */
solver_type = gsl_root_fsolver_brent;
solver = gsl_root_fsolver_alloc(solver_type);
gsl_root_fsolver_set(solver, &F, x_lo, x_hi);

The line we changed is the one that assigns solver_type now sets the solver type to gsl_root_fsolver_brent. Now, when we call gsl_root_fsolver_alloc, it creates a Brent solver instead of a bisection solver. Because GSL does everything with these generic gsl_root_fsolver pointers, we can easily swap things out and let GSL figure out how to run the other solver.

Solving using Newton’s method

Newton’s method is a little different than bisection or Brent because Newton’s method requires derivatives. In order to do this, we have to create a gsl_function_fdf instead of a gsl_function. We can do this as follows:

gsl_function_fdf FDF;
FDF.f = &quadratic;
FDF.df = &quadratic_deriv;
FDF.fdf = &quadratic_fdf;
FDF.params = &params;

The gsl_function_fdf type has a few more fields. In addition to a function field (now called simply f), it also has fields that take a derivative and a function that returns both the derivative and the value of the function itself. You should also note that we still have the params field just like we did before. There is no need to have multiple params fields because it is assumed that all three of the given functions correspond to a single mathematical function and that that function can be described by a single structure. The params structure given here will be passed into the params argument of any of those three functions whenever the function is called.

Most of the changes required to use the newton solver are simply a matter of changing everything from gsl_root_fsolver to gsl_root_fdfsolver. However things also change just a bit at the iteration step. Like with the bracketing solvers, the iteration step is performed by gsl_root_fdfsolver_iterate. However, the result is not an interval, but a new guess which we compare to the previous guess. We do that as follows:

/* Get the new approximate solution */
x0 = x;
x = gsl_root_fdfsolver_root(solver);

/* Check to see if the solution is within 0.001 */
status = gsl_root_test_delta(x, x0, 0, 0.001);
if (status == GSL_SUCCESS)
    printf("Converged:\n");

The gsl_root_test_delta function checks the absolute difference of x and x0 to see if it is within our specified tolerance. Once you put all of those pieces together, a full Newton’s method solving function looks like this:

#include <stdio.h>
#include <gsl/gsl_errno.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_roots.h>

#include "quadratic.h"

#define MAX_ITERATIONS 100

int main()
{
    int status;
    int i;
    const gsl_root_fdfsolver_type *solver_type;
    gsl_root_fdfsolver *solver;
    gsl_function_fdf FDF;
    double x0;

    /* We want to solve x^2 - 5 */
    struct quadratic_params params = { 1.0, 0.0, -5.0 };
    double r_expected = sqrt(5.0);
    /* Starting from a guess of 5.0 */
    double x = 5;

    /* Set up the function */
    FDF.f = &quadratic;
    FDF.df = &quadratic_deriv;
    FDF.fdf = &quadratic_fdf;
    FDF.params = &params;

    /* Allocate a newton solver and set it to use FDF */
    solver_type = gsl_root_fdfsolver_newton;
    solver = gsl_root_fdfsolver_alloc(solver_type);
    gsl_root_fdfsolver_set(solver, &FDF, x);

    printf("using %s method\n", gsl_root_fdfsolver_name(solver));

    printf ("%-5s %10s %10s %10s\n", "iter", "root", "err", "err(est)");

    status = GSL_CONTINUE;
    for (i = 1; i <= MAX_ITERATIONS && status == GSL_CONTINUE; ++i) {
        /* Iterate one step of the solver */
        status = gsl_root_fdfsolver_iterate(solver);
        if (status != GSL_SUCCESS)
            break;

        /* Get the new approximate solution */
        x0 = x;
        x = gsl_root_fdfsolver_root(solver);

        /* Check to see if the solution is within 0.001 */
        status = gsl_root_test_delta(x, x0, 0, 0.001);
        if (status == GSL_SUCCESS)
            printf("Converged:\n");

        printf ("%5d %10.7f %+10.7f %10.7f\n", i, x, x - r_expected, x - x0);
    }

    /* Free the solver */
    gsl_root_fdfsolver_free(solver);

    if (status == GSL_CONTINUE) {
        printf("error: too many iterations");
    } else if (status != GSL_SUCCESS) {
        printf("error: %s\n", gsl_strerror(status));
    }

    return status;
}

A note on derivatives and efficiency

You may be wondering why we have f, df, and fdf functions on the gsl_function_fdf structure when the fdf function can easily be written based on the other two. The reason is one of efficiency and this is something you will see in many applied math applications. In a lot of algorithms for root finding, maximization, etc. most of the time spent is in evaluating the function (or its derivative) so we want this step to be as fast as possible. In the above example, computing the value of the function and its derivative are basically completely different operations. However, this is often not the case. The usual case is one where computing the derivative and computing the value of the function itself require many of the same operations. By making a single function that computes both, we can avoid duplicating work and make the solver faster. For example, suppose we were trying to solve 1 − ex2. In this case, we would write our functions as follows:

double f(double x, void *params)
{
    return 1 - exp(x * x);
}

double df(double x, void *params)
{
    return 2 * x * exp(x * x);
}

void quadratic_fdf(double x, void *params, double *y, double *dy)
{
    double ex2 = exp(x * x);
    *y = 1 - ex2;
    *dy = 2 * x * ex2;
}

By writing the function this way, we can only evaluate exp once and get both the function value and the derivative. This is a very good thing because evaluating the exponential function is, itself, a numeric process involving a power series. By cutting down the number of evaluations of the exponential function, we can significantly increase the speed of the root finding algorithm.

Downloads

Full source code for these examples can be found here.