While our little “Hello World!” example was interesting, it doesn’t do much. The first step toward writing more useful programs is learning to use variables. Let’s start with an example.
Consider the following little program:
```c
#include <stdio.h>

int main(int argc, char **argv)
{
    int number;

    number = 5;
    number = number + 7;
    number = number / 2;
    printf("number = %d\n", number);
    return 0;
}
```
If you compile and run this program, it should print “number = 6” to the terminal. Let’s look at it line by line.
```c
int number;
```
This line is a variable declaration. All variables used in a function have to be declared before they can be used. A variable is declared by writing a type, followed by the name of the new variable and then a semicolon. The type int specifies that this variable is to be an integer. The new variable is named “number”.
The original C standard (C89) requires variables to be declared at the top of a function (or, more precisely, at the start of a block), before any statements. Later standards (C99 and newer) allow you to declare variables anywhere in the function so long as you declare them before they are used. However, some older compilers only support C89, so you are better off simply declaring them at the top.
```c
number = 5;
number = number + 7;
number = number / 2;
```
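These three lines perform the following computations:
1. Assign the value 5 to number.
2. Add 7 to the current value of number (5) and assign the result, 12, back to number.
3. Divide the current value of number (12) by 2 and assign the result, 6, back to number.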
Notice the use of the word “assign” here. In C and many other programming languages, variables are assigned values, and those values can change over time. This is not like algebra, in which a variable is a placeholder that remains constant once you give it a value. In C, the “=” symbol denotes a variable assignment: the variable on the left side gets assigned the value of the expression on the right side. Another thing to observe is that the variable being assigned can show up in the expression on the right-hand side. While number = number + 7 makes no mathematical sense, it makes perfect sense in programming. This is because “=” represents an assignment: the expression on the right is evaluated first, using the old value of the variable, and only then is the result stored in the variable.
```c
printf("number = %d\n", number);
return 0;
```
The last two statements are similar to our “Hello World!” example. The call to printf looks different because we are printing a number to the screen. We will discuss how this statement works in more detail in our section on input and output.
Before we go much further, we need to discuss the concept of an expression. In C, an expression is something that can be evaluated, i.e. something that has a value. For example, basic arithmetic operations such as a + b - 7 are expressions. In fact, most things in C are expressions. For instance, a variable assignment is an expression that evaluates to the value assigned to the variable. Because of this, x = y = z = 5 will assign the value 5 to all three of the variables x, y, and z: assignment groups from right to left, so z = 5 is evaluated first, its value is assigned to y, and that value is in turn assigned to x.
Just like variables, each expression has an associated type. For variables and constants, the type of the expression is just the type of the variable or constant. For basic arithmetic operations, the type of the expression is automatically derived from the types of the inputs to the operation. For function calls, the type of the expression is the return type of the function.
If you want a more formal definition of an expression, it is not hard to define recursively. An expression is any of the following:
- a variable or a constant;
- an arithmetic operation (such as +, -, *, or /) whose operands are expressions;
- an assignment of an expression to a variable;
- a call to a function whose return type is not void, in which case the value of the expression is the function’s return value.
In order to discuss data types, we need to know a little bit about how the computer stores data. While your program is running, all of the data (variables) are stored in Random Access Memory, or RAM. The computer’s RAM is made up of billions of little switches that can be either on or off. Each of these switches is called a bit. Everything the computer stores, it has to store by flipping bits on and off in RAM.
The computer’s RAM is divided up into 8-bit blocks called bytes and each byte is given an address. This allows the computer to locate any single byte among the billions of bytes available. A variable then refers to a number of consecutive bytes in RAM in which the information is stored.
In order to store information, it has to be converted to a series of ons and offs or 1’s and 0’s. For example, we store positive integers in memory by simply writing them in base-2; so 23 would be written as 10111. The process of converting information to binary data is called encoding. In order for the computer to do anything useful with the binary data sitting in memory, it needs to know two things: how the data is encoded and how much space it consumes in memory. The purpose of data types is to provide these two pieces of information.
As you will see below, it can be very hard to know ahead of time what the exact size of a type might be. For this reason, C provides a special operator: sizeof. You can use sizeof to find out how much space a given type consumes in memory. The expression sizeof(<type>) tells you how much space that type consumes in bytes. The expression sizeof(<variable>) tells you how much space that variable consumes. The size of a variable and the size of its respective type are the same; the two forms of sizeof are provided for convenience.
The most basic type in C is char. The char type is designed for holding a single alphanumeric character. The exact size of char has varied from one platform to another over the years, as different system designers have had different ideas about how you should define a “character”. By definition, sizeof(char) is always 1, and at this point basically all systems define a char to be 8 bits.
In order to encode characters, a standard called ASCII has historically been used. The ASCII encoding can handle the standard English characters as well as punctuation and a few control characters. In order to handle other languages such as Mandarin Chinese, a much larger set of characters is needed. This is usually handled with one of the Unicode standards, which are far too complicated to discuss here. Everything we will be doing in this course will be done in ASCII for the sake of simplicity.
Character constants in C are written using single quotes (e.g. c = 'a'). This is different from strings, which are written using double quotes.
At its core, char is actually an integer type, and you can perform all of the basic integer operations on it (addition, multiplication, etc.). Also, the ASCII encodings of the English letters are in alphabetical order, with the upper-case and lower-case letters each forming a contiguous block. This can be very useful. For instance, if you have some variable c of type char holding a letter other than 'z' or 'Z', the expression c + 1 will give you the next letter in the alphabet. If you want to convert an upper-case character to lower case, you can use c - 'A' + 'a'. You may also find yourself using char variables as just regular integers if you are concerned about size (char is the smallest integer type).
We already looked briefly at the example of positive integers and saw that we can store them by representing them in base-2. For the moment, consider the case of the unsigned char type, which stores unsigned integers in a single byte. The number 23 would be stored as “00010111”. Notice that three zeros have been added so that it is exactly one byte long. What about the number 255? That would be stored as “11111111” (go ahead and check it). When we add 1 to 255 we get 256, which, written in binary, is 100000000. However, this presents a problem, since 100000000 is 9 bits long. In this case we get something called integer overflow. Integer overflow is what happens when the binary representation of a number is too large to fit in the specified data type. When this happens with an unsigned type, the extra high-order digits are simply thrown away; if you try to store 256 in an unsigned char you simply get 00000000, or 0. In general, an n-bit unsigned integer type can store any number between 0 and $2^n - 1$ without overflow; larger numbers get stored modulo $2^n$. (C guarantees this wrap-around only for unsigned types; overflowing a signed type is undefined behavior.) When working with integer types you always have to be careful to choose the right size for your application.
Negative numbers are a little more complicated. In order to allow for negative numbers, an encoding called two’s complement is usually used. In the two’s complement encoding, a number is negated by flipping all of the bits (1 to 0 and 0 to 1) and adding 1. If the data type has a size of n bits, this is equivalent to storing $-x$ as $2^n - x$. The reason for using two’s complement is that, once you factor in integer overflow, adding and subtracting numbers encoded in two’s complement is the same as adding and subtracting unsigned integers (an easy exercise in modular arithmetic).
In C there are four basic integer types: char, short, int, and long. Each of these comes in both signed and unsigned varieties. An integer data type is said to be signed if it can store negative values and is said to be unsigned otherwise. When two’s complement is used for signed data types, this distinction makes very little difference most of the time. However, it makes a huge difference in some cases. For example, u < 0 will always evaluate to false if u has an unsigned data type.
Each of the types short, int, and long is a signed type. The unsigned versions are called unsigned short, unsigned int, and unsigned long, respectively. Depending on the system, char may be signed or unsigned. If you want to be specific, you can use unsigned char or signed char.
Unfortunately, figuring out the size of an integer data type in C can get a bit complicated. The char type is usually eight bits, although it can be longer. This is because computers have evolved over time and some have used a different number of bits to store a character. For the other types, the standard only sets minimums: short and int must be at least 16 bits, with int being at least as big as short, and long must be at least 32 bits and at least as large as int. Fortunately, most computers these days use an eight-bit byte, but you still have to watch out for short, int, and long, whose actual sizes vary between systems.
Below is a table describing the basic integer data types in C. The data type sizes are what you can expect to find on a typical 64-bit PC or Mac running Linux or macOS. On 64-bit Windows, long and unsigned long are only 32 bits, with the same ranges as int and unsigned int.
Type | Size (bits) | Signed? | Range |
---|---|---|---|
char | 8 | depends on system | depends on system |
signed char | 8 | signed | $-128$ to $127$ |
unsigned char | 8 | unsigned | $0$ to $255$ |
short | 16 | signed | $-2^{15}$ to $2^{15} - 1$ |
unsigned short | 16 | unsigned | $0$ to $2^{16} - 1$ |
int | 32 | signed | $-2^{31}$ to $2^{31} - 1$ |
unsigned int | 32 | unsigned | $0$ to $2^{32} - 1$ |
long | 64 | signed | $-2^{63}$ to $2^{63} - 1$ |
unsigned long | 64 | unsigned | $0$ to $2^{64} - 1$ |
The C99 standard also added a long long type (and unsigned long long), which is guaranteed to be at least 64 bits.
If you are concerned about size and want a variable to have a specific size, the C standard library includes a header file called stdint.h that supplies the programmer with size-guaranteed data types such as int8_t, uint16_t, int32_t, and uint64_t.
In C, we also have a couple of floating-point types that allow us to store numbers with a fractional or decimal part. Floating-point types are encoded in something similar to scientific notation. It is called floating-point because the location of the decimal point is not fixed. In a fixed-point encoding, you would fix a number n (such as 100) and store x * n as an integer. This guarantees the amount of precision but significantly limits the range of values you can store.
The exact details of how floating-point numbers are encoded depend on the processor, but most use the IEEE 754 floating-point specification. Before going into the details, think about how you could write decimal numbers in binary. The principle of decimal expansion works equally well in any base, so we could write 3.25 as 11.01 in binary. To encode a real number x in IEEE 754 floating-point, x is first written as $x = m \cdot 2^E$, where E is an integer chosen so that $|m| \in [1, 2)$ (zero is handled as a special case). The part denoted by E is called the exponent and the part denoted by m is called the mantissa. The number is then stored in terms of a sign bit, a few bits for the exponent, and then the mantissa. As a slight optimization, since the mantissa is always in the range [1, 2), the integer part of its binary expansion is always 1, so it is simply thrown away.
Along with storing numbers, floating-point types have a couple of additional special values: infinity and NaN (Not a Number). These values usually occur when you divide by zero or perform a similar invalid operation. Exactly when they occur is determined by the implementation, but you should be aware of them: whenever you are doing mathematical computations, NaN and infinity are liable to crop up. Also, because floating-point numbers are stored in this exponential format, they can store numbers that are both extremely large and extremely small. However, if a result is too large in magnitude for the range of the exponent, floating-point overflow occurs; with IEEE 754 the result becomes infinity.
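In C, three floating-point types are defined: float, double, and long double. As with the integer types, the exact sizes of these vary by implementation. On most PCs and Macs, float is 32-bit and double is 64-bit. The size of long double varies the most: with most compilers on x86 processors it is the 80-bit extended-precision format (padded to 12 or 16 bytes in memory), but on some platforms, including Microsoft’s compiler and Apple Silicon Macs, it is simply the same as double.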
In C, you have a lot of freedom in choosing variable names. Variable names can contain letters (both upper and lower case), digits, and underscores (“_”). The only other requirements are that a name cannot start with a digit and cannot be a keyword. A keyword is a word that is reserved for a special purpose in the C language. Examples of keywords include the basic types (short, int, float, etc.) as well as control-flow statements such as if, while, and return.
When choosing variable names, it is important to choose names that are descriptive. It is tempting to use names like “x” or “fd” throughout your code. However, it is often hard to remember what these variables represent when you come back later. Even though it is more typing, it is usually better to use descriptive names such as “median” or “left_value”.