Variables and Types

While our little “Hello World!” example was interesting, it doesn’t do much. The first step towards making more functional programs is variables. First, we’ll start with an example.

Variables by Example

Consider the following little program:

#include <stdio.h>

int main(int argc, char **argv)
{
    int number;

    number = 5;
    number = number + 7;
    number = number / 2;

    printf("number = %d\n", number);

    return 0;
}

If you compile and run this program it should print “number = 6” to the terminal. Let’s look at it line by line.

int number;

This line is a variable declaration. All variables used in a function have to be declared before they can be used. A variable is declared by a type followed by the name of the new variable and then a semicolon. The type int specifies that this variable is to be an integer. The new variable is named “number”.

Variables should be declared at the top of the function before any statements. Compilers that support the C99 standard or later will allow you to declare variables anywhere in the function so long as you declare them before they are used. However, older C89 compilers do not support this, so you are better off simply declaring them at the top.

number = 5;
number = number + 7;
number = number / 2;

These three lines perform the following computations:

  1. Assign the value 5 to the variable “number”.
  2. Assign to “number” the sum of its value (currently 5) and 7.
  3. Assign to “number” its value (now 12) divided by 2.

Notice the use of the word “assign” here. In C and many other programming languages, variables are assigned and their values can change over time. This is unlike algebra, in which a variable is a placeholder that remains constant once you give it a value. In C, the “=” symbol denotes a variable assignment: the variable on the left side gets assigned the value of the expression on the right side. Another thing to observe is that the variable being assigned can show up in the expression on the right-hand side. While number = number + 7 makes no mathematical sense, it makes perfect sense in programming. This is because “=” represents an assignment, and the value of the variable changes only after the expression on the right is evaluated.

printf("number = %d\n", number);

return 0;

The last two statements are similar to our “Hello World!” example. The call to printf looks different because we are printing a number to the screen. We will discuss how this statement works in more detail in our section on input and output.

Expressions

Before we go too much further, we need to discuss the concept of an expression. In C, an expression is something that can be evaluated, i.e. something that has a value. For example, basic arithmetic operations such as a + b - 7 are expressions. In fact, most things in C are expressions. For instance, a variable assignment is an expression that evaluates to the value assigned to the variable. Because of this, x = y = z = 5 will assign the value 5 to all three of the variables x, y, and z.

Just like variables, each expression has an associated type. For variables and constants, the type of the expression is just the type of the variable or constant. For basic arithmetic operations, the type of the expression is automatically derived from the types of the inputs to the operation. For function calls, the type of the expression is the return type of the function.

If you want a more formal definition of an expression, it is not hard to define recursively. An expression is any of the following:

Data Types

In order to discuss data types, we need to know a little bit about how the computer stores data. While your program is running, all of the data (variables) are stored in Random Access Memory or RAM. The computer’s RAM is made up of billions of little switches that can be either on or off. Each of these switches is called a bit. Everything the computer stores, it has to store by flipping bits on and off in RAM.

The computer’s RAM is divided up into 8-bit blocks called bytes and each byte is given an address. This allows the computer to locate any single byte among the billions of bytes available. A variable then refers to a number of consecutive bytes in RAM in which the information is stored.

In order to store information, it has to be converted to a series of ons and offs or 1’s and 0’s. For example, we store positive integers in memory by simply writing them in base-2; so 23 would be written as 10111. The process of converting information to binary data is called encoding. In order for the computer to do anything useful with the binary data sitting in memory, it needs to know two things: how the data is encoded and how much space it consumes in memory. The purpose of data types is to provide these two pieces of information.

As you will see below, it can be very hard to know ahead of time what the exact size of a type might be. For this reason, C provides a special operator: sizeof. You can use sizeof to find out how much space a given type consumes in memory. The expression sizeof(<type>) tells you how much space that type consumes in bytes. The expression sizeof(<variable>) tells you how much space that variable consumes. The size of a variable and the size of its type are always the same; the two forms of sizeof are provided for convenience.

Character Types

The most basic type in C is char. The char type is designed for holding a single alphanumeric character. The exact size of char has varied from one platform to another over the years as different system designers have had different ideas about how to define a “character”. At this point, basically all systems define a char to be 8 bits.

In order to encode characters, a standard called ASCII has historically been used. The ASCII encoding can handle the standard English characters as well as punctuation and a few control characters. In order to handle other languages such as Mandarin Chinese, a much larger set of characters is needed. This is usually handled with one of the Unicode standards, which are far too complicated to discuss here. Everything we will be doing in this course will be done in ASCII for the sake of simplicity.

Character constants in C are written using single quotes (e.g. c = 'a'). This is different from strings, which are written using double quotes.

At its core, char is actually an integer type and you can perform all of the basic integer operations on it (addition, multiplication, etc.). Also, the ASCII encodings of the English letters are in alphabetical order. This can be very useful. For instance, if you have some variable “c” of type char that holds a letter, the expression c + 1 will give you the next letter in the alphabet. If you want to convert an upper-case character to lower-case, you can do c - 'A' + 'a'. You may also find yourself using char variables as just regular integers if you are concerned about size (char is the smallest integer type).

Integer Types

We already looked briefly at the example of positive integers and saw that we can store them by representing them in base-2. For the moment, consider the case of the unsigned char type, which stores unsigned integers in a single byte. The number 23 would be stored as “00010111”. Notice that three zeros have been added so that it is exactly one byte long. What about the number 255? That would be stored as “11111111” (go ahead and check it). When we add 1 to 255 we get 256 which, written in binary, is 100000000. However, this presents a problem since 100000000 is 9 bits long. In this case we get something called integer overflow. Integer overflow is what happens when the binary representation of a number is too large to fit in the specified data type. When this happens with an unsigned type, the computer simply throws away the digits that won’t fit; if you try to store 256 in an unsigned char you simply get 00000000 or 0. In general, an n-bit unsigned integer type can store any number between 0 and 2^n − 1 without overflow; larger numbers get stored modulo 2^n. When working with integer types you always have to be careful to choose the right size for your application.

Negative numbers are a little more complicated. In order to allow for negative numbers, something called two’s complement is usually used. In the two’s complement encoding, a number is negated by flipping all of the bits (1 to 0 and 0 to 1) and adding 1. If the data type has a size of n bits, this is equivalent to storing −x as 2^n − x. The reason for two’s complement is that, when you factor in integer overflow, adding and subtracting numbers encoded in two’s complement is the same as adding and subtracting unsigned integers (an easy exercise in modular arithmetic).

In C there are four basic integer types: char, short, int, and long. Each of these comes in both signed and unsigned varieties. An integer data type is said to be signed if it can store negative values and is said to be unsigned otherwise. When two’s complement is used for signed data types, this distinction makes very little difference most of the time. However, it makes a huge difference in some cases. For example, u < 0 will always evaluate to false if “u” has an unsigned data type.

The types short, int, and long are signed. The unsigned versions are called unsigned short, unsigned int, and unsigned long respectively. Depending on the system, char may be signed or unsigned. If you want to be specific, you can use unsigned char or signed char.

Unfortunately, figuring out the size of an integer data type in C can get a bit complicated. The char type is usually eight bits although it can be longer. This is because computers have evolved over time and some have used a different number of bits to store a character. The short and int types are required to be at least 16 bits, with int being at least as big as short. The long type is required to be at least 32 bits and at least as large as int. Fortunately, most computers these days use an eight-bit byte, but you still have to watch out for short, int, and long.

Below is a table describing the basic integer data types in C. The data type sizes are what you can expect to find on a typical 64-bit Linux or Mac system. On Windows, long is only 32 bits.

Type            Size (bits)  Signedness         Range
char            8            depends on system  depends on system
signed char     8            signed             −128 to 127
unsigned char   8            unsigned           0 to 255
short           16           signed             −2^15 to 2^15 − 1
unsigned short  16           unsigned           0 to 2^16 − 1
int             32           signed             −2^31 to 2^31 − 1
unsigned int    32           unsigned           0 to 2^32 − 1
long            64           signed             −2^63 to 2^63 − 1
unsigned long   64           unsigned           0 to 2^64 − 1

The C99 standard also defines a long long type, which is guaranteed to be at least 64 bits. If you are concerned about size and want a variable to have a specific size, the C standard library (since C99) includes a header file called stdint.h that supplies the programmer with some size-guaranteed data types such as int32_t and uint8_t.

Floating-point Types

In C, we also have a couple of floating-point types that allow us to store numbers with a fractional or decimal part. Floating-point types are encoded in something similar to scientific notation. It is called floating-point because the location of the decimal point is not fixed. In a fixed-point encoding, by contrast, you would fix a number n (such as 100) and store x * n as an integer. This guarantees the amount of precision but significantly limits the range of values you can store.

The exact details of how floating-point numbers are encoded depend on the processor, but most use the IEEE 754 floating-point specification. Before going into the details, think about how you could write decimal numbers in binary. The principle of decimal expansion works equally well in any base, so we could write 3.25 as 11.01 in binary. To encode a real number x in IEEE 754 floating-point, x is first written as x = m × 2^E where E is an integer chosen so that |m| ∈ [1, 2) (zero is handled as a special case). The part denoted by E is called the exponent and the part denoted by m is called the mantissa. The number is then stored in terms of a sign bit, a few bits for the exponent, and then the mantissa. As a slight optimization, since |m| is always in the range [1, 2), the integer part of its binary expansion is always 1, so it is simply thrown away.

Along with storing numbers, floating-point types have a couple of additional special values: infinity and NaN (Not a Number). These values usually occur when you have a division by zero or similar. Exactly when they occur is determined by the implementation, but you should be aware of them. Whenever you are doing mathematical computations, NaN and infinity are liable to crop up. Also, because floating-point numbers are stored in this exponential format, they can store numbers that are both extremely large and extremely small. However, if a result is too large for the exponent to represent, floating-point overflow occurs.

In C, three floating-point types are defined: float, double, and long double. As with the integer types, the exact sizes of these vary by implementation. On most PCs and Macs, float is 32-bit and double is 64-bit. The long double type is often an 80-bit extended format on x86 processors, but it can be 64-bit or 128-bit on other systems.

Variable Names

In C, you have a lot of freedom in choosing variable names. Variable names can contain letters (both upper and lower case), digits, and underscores (“_”). The only additional requirements are that a name cannot start with a digit and cannot be a keyword. A keyword is a word that is reserved for a special purpose in the C language. Examples of keywords include the basic types (short, int, float, etc.) as well as control-flow statements such as if, while, and return.

When choosing variable names, it is important to choose names that are descriptive. It is tempting to use names like “x” or “fd” throughout your code. However, it is often hard to remember what these variables represent when you come back later. Even though it is more typing, it is usually better to use descriptive names such as “median” or “left_value”.