To develop in a GNU or other UNIX environment, you need to become familiar with the command line. The command line, or shell, is the heart and soul of a UNIX system. Even when you click an icon to run a graphical program, clicking the icon executes a command that starts the program; the only difference is that you don’t have to type.
Here I have tried to provide only a brief description of how a command line works. For more details on how to use the command line to get around in your system, check out Learning the Shell.
First, we should introduce some terminology.
A terminal is a program or device that allows you to interact with the computer. In the old days this frequently referred to the physical keyboard and monitor or even to a typewriter. In more recent times this is usually a graphical program with a single window in which you can type.
An executable is a file that contains binary code that the computer can run. On Windows, an executable has a “.exe” extension; on UNIX systems (including MacOS), an executable has no extension by default.
A shell is a program that talks to the terminal and makes sense out of what you type. It interprets your commands and runs the appropriate executable.
A command is a line you type into the shell to tell it what to do. In some cases a single command may actually run multiple executables.
Throughout the material in this course you will see blocks like this:
$ gcc -o my_program src1.c src2.c
These blocks represent a command to be executed in the shell. The beginning “$” character and space are not actually part of the command; they are simply there to indicate that it’s a command that you would type into the shell. “$” is chosen because many shells use “$” or “%” to indicate that they are ready for another command. This also serves to differentiate a command from the output it generates. For instance, running the hello world program would look like this:
$ ./hello
Hello World!
where the “Hello World!” is actually the output of the program.
Every command in a UNIX shell is made of an executable name followed by a series of arguments. For instance, the command to compile a program might look something like this:
$ gcc -o hello hello.c
The executable name is “gcc” and the arguments are “-o”, “hello”, and “hello.c”, in that order. The only rule about how arguments are specified is that they are separated by spaces. If you want to have an argument that contains spaces, you can put it in quotes:
$ cd "My Documents"
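To see how quoting changes what an executable receives, here is a minimal sketch of a shell function that simply reports how many arguments it was given. The name count_args is made up for this illustration; it is not a standard command.

```shell
# count_args is a hypothetical helper, not a standard command.
# "$#" expands to the number of arguments the function received.
count_args() {
    echo "$#"
}

count_args My Documents      # two separate arguments, prints 2
count_args "My Documents"    # one argument containing a space, prints 1
```

Without the quotes, “cd” would receive “My” and “Documents” as two separate arguments.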
There is no formal specification for exactly how arguments are formed, but there are some conventions. In general, there are four types of arguments an executable may take (more explanation below):
A flag is an argument that simply turns on or off some boolean option. One common example is “-v” which many programs use to enable extra “verbose” output.
A named parameter is one that specifies some value that’s more complicated than just a boolean. In the example above “-o” followed by “hello” told gcc to set the output filename to “hello”.
An unnamed parameter is one that isn’t associated with any other argument. Frequently the unnamed parameters are simply appended to the end of the command but they can sometimes also show up in the middle. Unnamed parameters are frequently used to specify things like input files. For instance, if you wanted to print multiple documents with “lpr” you could type “lpr file1.pdf file2.pdf file3.pdf”.
Sometimes, but less frequently, an executable will also take a command parameter that tells it what to do. This is usually the first argument after the executable name. While this doesn’t come up as often as the other types of arguments, you should still be aware of it.
Since there is no formal specification for how arguments must be parsed, parsing varies from program to program. I have tried to provide some general concepts here, but you have to look at a program’s documentation to really know how to use it. Fortunately, most programs will accept one of the “-h”, “-help”, or “--help” flags and print a brief description of how they work. For a more detailed guide (frequently with examples) you can type “man program-name” on UNIX or simply google for “program-name man page”.
Flags are specified in a number of different ways. They almost always start with one or two hyphen characters such as “-o” or “--option”. Many programs follow the convention of one hyphen before a single-letter flag and two hyphens before longer flags. However, many do not, so you shouldn’t assume that long flags have two hyphens.
Sometimes programs will also allow you to specify a bunch of one-letter flags at a time. With the “tar” command, the two commands below are the same:
$ tar xvjpf archive.tar.bz2
$ tar -x -v -j -p -f archive.tar.bz2
If you’d rather specify the flags explicitly, that’s ok. I’m mostly pointing this out because you will come across it in the real world.
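Many programs get this bundling behaviour from a shared option parser. As a rough sketch, the shell’s own “getopts” builtin accepts hyphen-prefixed single-letter flags either separately or bundled. The function name show_flags and the letters x, v, and f are chosen only for illustration, and the hyphen-less form that tar accepts is tar’s own convention, which getopts does not handle.

```shell
# show_flags is a hypothetical function; the option letters are illustrative.
# In "xvf:", the colon means that -f expects a value after it.
show_flags() {
    OPTIND=1    # reset the parser so the function can be called repeatedly
    while getopts "xvf:" opt; do
        case "$opt" in
            x) echo "flag x" ;;
            v) echo "flag v" ;;
            f) echo "parameter f = $OPTARG" ;;
            *) return 1 ;;
        esac
    done
}

# Both calls print the same three lines.
show_flags -x -v -f archive.tar
show_flags -xvf archive.tar
```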
Named parameters have probably the greatest variety in how they are specified. Named parameters usually look just like flags only they have an associated parameter. By “look just like flags”, I mean that they start with one or two hyphens followed by the parameter name and then the parameter’s value. There are several ways of specifying the parameter’s value; here are the three most common:
“-p text” or “--param text” where the first argument looks like a flag and specifies what parameter is being set and the second argument is the value to set.
“-p=text” or “--param=text” where an “=” character is used to specify that “param” is being set to “text”.
“-n5” where “-n” specifies the parameter to set and “5” is the value. This is really only ever used for one-letter parameter names.
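The three styles above can be sketched in a few lines of shell. This is only an illustration of how a program might accept all of them; the function get_count and the parameter names “-n” and “--count” are invented for the example and do not belong to any real tool.

```shell
# get_count is a hypothetical function that accepts a value written as
# "-n 5", "--count 5", "--count=5", or "-n5" and prints that value.
get_count() {
    count=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --count=*)
                count="${1#--count=}"        # strip the "--count=" prefix
                ;;
            -n|--count)
                [ "$#" -ge 2 ] || return 1   # the value is missing
                count="$2"                   # the value is the next argument
                shift
                ;;
            -n*)
                count="${1#-n}"              # value attached directly to -n
                ;;
        esac
        shift
    done
    echo "$count"
}

get_count -n 5          # prints 5
get_count --count=5     # prints 5
get_count -n5           # prints 5
```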
It’s worth noting that named parameters frequently look very much like flags. To know which a particular argument represents you need to know what arguments the program takes and remember that arguments are almost always parsed left-to-right. For instance, while “-v” is a valid flag to gcc, the following command will create an executable named “-v” and will not provide verbose output:
$ gcc -o -v hello.c
Unnamed parameters are usually the extra arguments that are not flags and aren’t associated with any name. One frequent use of these is input files. Some programs require them to be at the end after all of the other arguments, others don’t care. For instance, gcc takes every argument that isn’t a flag or a named parameter and assumes that it is an input file. The following command would attempt to build “hello” out of “hello.c”, “world.c”, and “other-file.c” using verbose output.
$ gcc hello.c -o hello world.c -v other-file.c
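The left-to-right parsing described above can be sketched as a small loop. This is not how gcc is actually implemented; classify_args is a made-up function that recognizes only “-o” and “-v” and treats everything else as an input file.

```shell
# classify_args is a hypothetical, heavily simplified stand-in for a
# gcc-style parser. It reads its arguments strictly left to right.
classify_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -o)
                [ "$#" -ge 2 ] || return 1
                echo "output=$2"     # whatever follows -o is the value
                shift
                ;;
            -v) echo "verbose" ;;
            *)  echo "input=$1" ;;
        esac
        shift
    done
}

classify_args hello.c -o hello world.c -v other-file.c
classify_args -o -v hello.c
```

The second call prints “output=-v” and then “input=hello.c”: because “-v” directly follows “-o”, it is consumed as the output name and never seen as a flag.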
Sometimes a command will have several “subcommands” that it can run. The “git” command that we will be using later has a variety of subcommands. Running “git add”, “git commit”, or “git log” each does something different. It is as if “git” is multiple commands, each with its own parameters, packaged into one. This is frequently used if you have a bunch of tools that you want to package together. Usually this “subcommand” will be the first argument after the executable name itself.
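A program with subcommands typically looks at its first argument and hands the rest to the matching piece of code. Here is a minimal sketch of that dispatch; mytool and its “add” and “log” subcommands are invented for the example and are not meant to mirror how git works internally.

```shell
# mytool is a hypothetical command with two made-up subcommands.
mytool() {
    subcommand="$1"
    if [ "$#" -gt 0 ]; then
        shift                       # the remaining arguments belong to the subcommand
    fi
    case "$subcommand" in
        add) echo "adding: $*" ;;
        log) echo "showing log" ;;
        *)
            echo "unknown subcommand: $subcommand" >&2
            return 1
            ;;
    esac
}

mytool add file1.c file2.c    # prints: adding: file1.c file2.c
mytool log                    # prints: showing log
```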
In the command below, “hello.c” is a parameter that specifies a source code file while “-o” is an option specifying that the following argument, “hello”, is the output file. Usually in this case, you would think of “-o” and “hello” going together and maybe even say “-o hello”.
$ gcc hello.c -o hello
To tell gcc to print some help information to the terminal, you can run it with just the “--help” option. Most programs, when called with the “-h” or “--help” option, will print some information about the arguments they accept.
$ gcc --help
Here is another command for compiling “hello.c” into “hello”. The “-o hello” does the same thing as before. However, we have two other arguments we haven’t seen. The “-v” argument tells gcc to use “verbose” mode where it will dump out extra information as it compiles. The “-std=ansi” tells gcc to use the ANSI C standard.
$ gcc -std=ansi -o hello -v hello.c
Another concept with which you may not be very familiar if you work on Windows or MacOS is that of a file path. While I’m sure you have seen them, you may never have noticed.
Most file systems are organized as a tree (or forest in the case of Windows). The file system is organized by directories (or folders) that can contain files as well as other directories. On UNIX systems such as MacOS or Linux, there is a root directory called “/” that contains everything on the system. Windows, on the other hand, separates things out into drives each of which is specified by a letter. The letters “A” and “B” are reserved for floppy disks while “C” is the primary hard drive, and everything else is arbitrary.
A file path is just a list of file or directory names separated by slashes that specifies where in the tree a given file or directory is located. The file or directory is located by reading the path left-to-right. For instance, you might have a home directory /home/bob with a file called stuff.txt and the full path to this file would be /home/bob/stuff.txt. This provides a convenient way to specify the exact location in the file system of any file.
Associated with this is a little more terminology:
A path is called absolute if it is specified from the root of the tree. On UNIX systems, the root of the file system tree is the “/” folder so /home/bob/stuff.txt is absolute. On Windows, absolute paths start with a drive specification such as C:; an example would be C:/windows/system32.
A path is called relative if it is not absolute. In this case, the meaning is different depending on the directory from which the relative path is interpreted.
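On UNIX systems the test for an absolute path is simply whether it begins with “/”. A small sketch, using a made-up function name path_kind and ignoring Windows drive letters entirely:

```shell
# path_kind is a hypothetical helper that handles UNIX-style paths only.
path_kind() {
    case "$1" in
        /*) echo "absolute" ;;     # starts at the root directory
        *)  echo "relative" ;;     # interpreted from the current directory
    esac
}

path_kind /home/bob/stuff.txt    # prints absolute
path_kind stuff.txt              # prints relative
path_kind ../alice               # prints relative
```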
Relative and absolute paths each have their place. Most of the time we use relative paths because they are frequently shorter. There are two special directories: . and .. that refer to the current and parent directories respectively. For instance, if the current directory is /home/bob, then .. refers to /home, ../alice refers to /home/alice, and ./ refers to /home/bob. In this way, you can refer to any file on the file system by a relative path.
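You can try the example above without touching your real /home. The sketch below builds a throwaway home/bob and home/alice tree in a temporary directory, starts in bob, follows a relative path with “cd”, and prints the name of the directory it ends up in. The function name resolve_from_bob is made up, and the sketch assumes the “mktemp” command is available, which it is on most systems.

```shell
# resolve_from_bob is a hypothetical helper. Its body is wrapped in
# parentheses so it runs in a subshell and the "cd" calls do not change
# the directory of the calling shell.
resolve_from_bob() (
    root=$(mktemp -d) || exit 1
    mkdir -p "$root/home/bob" "$root/home/alice"
    cd "$root/home/bob" && cd "$1" && basename "$PWD"
    status=$?
    rm -r "$root"                 # clean up the throwaway tree
    exit "$status"
)

resolve_from_bob ..          # prints home
resolve_from_bob ../alice    # prints alice
resolve_from_bob ./          # prints bob
```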
Earlier in our discussion of the command line we kept using simple one-word commands and I claimed that these referred to executables. How does the system know where to find them? Your shell has a variable called “PATH” that stores a list of directories to search for commands. If the executable name in the command is a single word with no slashes, then the shell will look in the directories listed in PATH, in order, to try to find it. This way you don’t have to know where every executable in the system is stored. On the other hand, if the executable name contains a slash, then the shell treats it as a path and will simply try to execute the file at that path.
It is worth noting that if you want to call something in the current directory, you have to use ./file not file. This is because file does not look like a path, so the shell will search the system path and try to find it. In order to make the shell execute exactly what you want, you have to use ./file.
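The lookup described above can be approximated in the shell itself. This is a simplified sketch, not the shell’s real implementation: find_command is a made-up name, and it ignores shell builtins, aliases, and functions. Real shells provide “command -v” for the same purpose.

```shell
# find_command is a hypothetical helper that mimics the PATH search.
find_command() {
    name="$1"
    case "$name" in
        */*)
            # The name contains a slash: treat it as a path, skip PATH.
            if [ -f "$name" ] && [ -x "$name" ]; then
                echo "$name"
                return 0
            fi
            return 1
            ;;
    esac
    old_ifs="$IFS"
    IFS=":"                          # PATH entries are separated by colons
    for dir in $PATH; do
        candidate="${dir:-.}/$name"  # an empty entry means the current directory
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            IFS="$old_ifs"
            echo "$candidate"
            return 0
        fi
    done
    IFS="$old_ifs"
    return 1
}

find_command sh                                        # prints where "sh" was found
find_command ./hello || echo "no executable ./hello"   # never searches PATH
```

Because “./hello” contains a slash, the function never looks in PATH for it, which is why typing just “hello” and typing “./hello” can behave differently.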