Best practices for organizing code

Unfortunately, this topic is impossible to cover thoroughly, because there is no single right way to write a particular program and every rule has an exception somewhere. The most important thing to understand about writing good code is that programming is very much an art. There are some guidelines, but much of it is a matter of aesthetics. Therefore, what I provide below are not rules, but rather guidelines to help you write code well.

Why should you care?

The first question is probably why you should bother trying to code well at all. There are several reasons. First, well-written code is much easier to read. You may say to yourself, “I can read my code just fine.” However, if you have to read back through your code to understand it again after not looking at it for two months, then it could probably be written better. Also, someone else, such as your adviser or a colleague, may well want to read your code, and they should be able to do so easily.

Second, well-written code is less error-prone. There are a lot of things going on inside your program when it is running. Keeping track of all these things and making sure you didn’t forget anything can be difficult. Pointers and dynamic memory are classic examples of where it is really easy to make mistakes. You, as the programmer, are less likely to forget edge cases and make mistakes if your code is well-organized and easy to read.

Third, well-written code is easier to debug. You will make errors; every programmer does. However, the better organized and easier to read your code is, the easier it is to find the bugs. It is tempting to skip writing well and just try to get it done as quickly as possible. However, you will spend far more time tracking down bugs than you save by writing sloppy code. Writing bad code rarely pays off in the long run.

Writing modular code

Probably the most important thing to remember about good code is to break it into reasonable pieces that each do one thing. When proving a result, you don’t just write one long proof with piles of details. Instead, you break the result into smaller theorems and lemmas, each with its own proof. Sometimes a lemma will be closely tied to one or two theorems, but usually the things you label as theorems have a clear purpose and provide an independently useful result. This structure lets the reader quickly get a good idea of the big picture of what your argument is trying to do.

Writing good code is a lot like that. Your program should be split into functions, each of which has a different, well-defined task. As an example, look at the array-based list example at the end of the previous page. At first, it may look like I’m wasting a lot of time writing those init and destroy functions. However, they have a number of benefits over doing the same work manually. The first is that they make it much more obvious to the reader what is happening. If you simply see a higher-level function setting fields of our structure or freeing pointers, it may not be obvious what is being accomplished. Calling data_array_list_init makes it obvious that the data structure is being initialized to its default values.
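To make the idea concrete, here is a minimal sketch of what such init and destroy functions might look like. The field names (data, size, capacity) and the double element type are assumptions for illustration; the structure on the previous page may be laid out differently.

```c
#include <stdlib.h>

/* Hypothetical layout: the real struct from the previous page may differ. */
struct data_array_list {
    double *data;     /* heap block owned by the list (assumed element type) */
    size_t size;      /* number of elements currently in use */
    size_t capacity;  /* number of elements the block can hold */
};

/* Put a list into a valid, empty state. */
void data_array_list_init(struct data_array_list *list)
{
    list->data = NULL;
    list->size = 0;
    list->capacity = 0;
}

/* Release the memory the list owns and leave it empty again. */
void data_array_list_destroy(struct data_array_list *list)
{
    free(list->data);           /* free(NULL) is a harmless no-op */
    data_array_list_init(list); /* reset the fields so the list is reusable */
}
```

A reader who sees data_array_list_destroy(&list) knows what happens without inspecting individual fields.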

Also, while creating a struct data_array_list variable is simple, it is a multi-step process, which means that you are likely to forget a step when creating one. Moreover, if you ever change the structure and add a field, you have to change every single place it’s used, and you are likely to forget one. On the other hand, if you never touch the data structure directly and always use these helper functions, the data structure and its associated functions can be completely rewritten without a single change to the rest of the code.
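The sketch below shows what "never touching the structure directly" can look like in practice. It assumes a hypothetical layout (data, size, capacity, with double elements) and hypothetical helpers data_array_list_append, data_array_list_get, and data_array_list_size. Callers only use these functions, so the growth strategy or the fields themselves could change without touching any calling code.

```c
#include <stdlib.h>

/* Hypothetical layout; only the functions below should touch these fields. */
struct data_array_list {
    double *data;
    size_t size;
    size_t capacity;
};

void data_array_list_init(struct data_array_list *list)
{
    list->data = NULL;
    list->size = 0;
    list->capacity = 0;
}

void data_array_list_destroy(struct data_array_list *list)
{
    free(list->data);
    data_array_list_init(list);
}

/* Append value, doubling the block when it is full.
 * Returns 0 on success, -1 if memory runs out (the list is unchanged). */
int data_array_list_append(struct data_array_list *list, double value)
{
    if (list->size == list->capacity) {
        size_t new_capacity = (list->capacity == 0) ? 4 : 2 * list->capacity;
        /* realloc(NULL, n) behaves like malloc(n) for the first allocation */
        double *new_data = realloc(list->data, new_capacity * sizeof *new_data);
        if (new_data == NULL)
            return -1;
        list->data = new_data;
        list->capacity = new_capacity;
    }
    list->data[list->size++] = value;
    return 0;
}

/* Read element index; the caller must ensure index < size. */
double data_array_list_get(const struct data_array_list *list, size_t index)
{
    return list->data[index];
}

size_t data_array_list_size(const struct data_array_list *list)
{
    return list->size;
}
```

If you later decided to grow by 1.5 instead of 2, or to add a field, only these few functions would need to change.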

Another benefit is that it avoids unneeded copying and pasting. If your code does the same thing in two or three different places, you should probably make that thing its own function. Again, think about how you would use lemmas when writing a paper: you prove a lemma once and cite it wherever you need it, rather than repeating the proof.
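As a small illustration, suppose several parts of a program average an array. Rather than pasting the same loop into each place, the loop can become its own function and be "cited" wherever it is needed. The names array_mean and mean_difference are hypothetical, as is the choice to return 0.0 for an empty array.

```c
#include <stddef.h>

/* Arithmetic mean of the first count values.
 * Returns 0.0 for an empty array (a hypothetical convention). */
double array_mean(const double *values, size_t count)
{
    if (count == 0)
        return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return sum / count;
}

/* Reuses array_mean twice instead of writing the averaging loop twice. */
double mean_difference(const double *before, size_t before_count,
                       const double *after, size_t after_count)
{
    return array_mean(after, after_count) - array_mean(before, before_count);
}
```

If a bug turns up in the averaging, there is now exactly one place to fix it.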

Later, we will even learn how to break a program into multiple code files. This allows you to make things even more modular and sometimes reuse the same code across multiple programming projects.

Comments and variable/function names

Another important way to keep your code readable is to use good comments and variable names. All programming languages provide you some way of leaving comments. Comments are simply lines or chunks of the code that the compiler ignores entirely. This allows you to leave notes for yourself or some future reader to help them know what is happening. For instance, you might put a comment at the top of a function saying what the function does and what the arguments and return value are. If you write these comments a certain way, there are even programs that will turn them into some nice web-based documentation for you.
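For example, here is a function with a header comment describing what it does, its arguments, and its return value. The /** ... */ form with @param and @return follows the conventions of Doxygen, one widely used documentation generator; the function itself is only an illustration.

```c
#include <stddef.h>

/**
 * Compute the dot product of two vectors.
 *
 * @param x      first vector, with at least length elements
 * @param y      second vector, with at least length elements
 * @param length number of elements to use from each vector
 * @return       the sum of x[i] * y[i] for i from 0 to length - 1
 */
double dot_product(const double *x, const double *y, size_t length)
{
    double sum = 0.0;
    for (size_t i = 0; i < length; i++)
        sum += x[i] * y[i];
    return sum;
}
```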

The other use for comments is to make in-line notes about what is happening in your code. This can help the reader better understand what your program is doing. However, comments are not a substitute for good code! As a rule, if a function needs a lot of comments to explain how it works, it probably needs to be rewritten. There are times when you need to do something very complex and there’s just no better way to do it. However, this does not happen often, and you should not assume that your code is the exception. Good code should not need many comments.
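One common way to need fewer in-line comments is to give a tricky expression a descriptive name. Instead of writing if ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0) followed by a comment saying "leap year", the condition can live in its own function, and the call site then explains itself. The function names here are illustrative.

```c
#include <stdbool.h>

/* Gregorian rule: divisible by 4, except centuries not divisible by 400. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Reads like a sentence; no comment is needed at the call site. */
int days_in_year(int year)
{
    return is_leap_year(year) ? 366 : 365;
}
```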

Another thing to keep in mind for readability is to use good variable and function names. The C language allows you to give variables basically any name you want, and you should take advantage of that. For example, force is far more descriptive than f, and size is better than s or, worse, n. It is easy to start feeling that your variable names are too long and to start abbreviating. You need to be careful with this because you may forget what an abbreviation means. That being said, variable name length is a balance. If every variable name is 10 or 20 characters long, your code is going to get very verbose, and that can also make it hard to read.

A good guideline is that the length of a variable or function name should be proportional to its lifetime. If a variable is simply serving as the counter for a single loop, go ahead and call it i or j. A local variable that is declared at the top of a function and used only for the duration of that function might have a name consisting of one or two words (abbreviated if the words are long). When choosing a name for something with global scope, such as a structure, a function, or a global variable (we haven’t talked about those yet), the name should be fully qualified and very descriptive. By fully qualified, I mean that it should be detailed enough that you would never give anything else the same name. For example, integrate, while descriptive, is a poor choice for a function name because there are several different methods for numerical integration.
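Here is a sketch of how those naming levels might look together. The globally visible function says which integration method it uses (the trapezoid rule), locals that live for the whole function get short one-word names, and the loop counter is just i. The names integrate_trapezoid and example_ramp are hypothetical.

```c
#include <stddef.h>

/* Global scope: a fully qualified name that states the method.
 * Approximates the integral of integrand over [lower, upper]
 * with the trapezoid rule; intervals must be at least 1. */
double integrate_trapezoid(double (*integrand)(double),
                           double lower, double upper, size_t intervals)
{
    /* Function-long locals: short one-word names. */
    double step = (upper - lower) / intervals;
    double sum = 0.5 * (integrand(lower) + integrand(upper));

    /* Single-loop counter: a one-letter name is fine. */
    for (size_t i = 1; i < intervals; i++)
        sum += integrand(lower + i * step);

    return sum * step;
}

/* Example integrand for testing: f(x) = 2x, whose integral over [0, 1] is 1. */
double example_ramp(double x)
{
    return 2.0 * x;
}
```

A second method could then be added as, say, integrate_simpson without any naming collision.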

Unfortunately, this balance gets even less obvious when dealing with mathematics. If you are explicitly implementing an algorithm from a paper, the code may actually be easier to read if you closely follow the paper’s naming scheme. However, following a math paper is one of the few exceptions, and outside of that algorithm you should use more descriptive names.

Memory ownership

On the previous page, I talked quite a bit about several dynamic memory pitfalls. Many of these can be avoided simply by keeping good track of your dynamically allocated memory. Whenever you are dealing with dynamic memory, you need a clear and well-documented idea of which part of the program owns that memory and is responsible for freeing it. In our data_array_list example, the block of memory pointed to by the data field was owned by the structure. The structure allocated memory as needed, and when the structure was destroyed, it freed the memory.

This is particularly important when functions pass pointers to and from each other. For example, a function usually shouldn’t free a block of memory passed in as an argument unless it is very clearly documented that it does so. There are a couple of reasons for this. First, the calling function might want to use the memory again and may not know you freed it; this can lead to a use-after-free or a double free. Second, that pointer may point to something on the stack or to the middle of an array; in either case, freeing it is an error. You also have to be very explicit about who owns memory passed out of a function as a return value. Often when a function returns a pointer, the caller is responsible for freeing it, but this is not always the case.
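A minimal sketch of documenting ownership in both directions follows. string_copy hands ownership of its result to the caller, while count_letter only borrows its argument and never frees it, so it works equally well on heap memory, a stack array, or a pointer into the middle of a string. Both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/**
 * Return a newly allocated copy of text, or NULL if allocation fails.
 * Ownership: the CALLER owns the returned block and must free() it.
 */
char *string_copy(const char *text)
{
    size_t length = strlen(text) + 1;  /* include the terminating '\0' */
    char *copy = malloc(length);
    if (copy != NULL)
        memcpy(copy, text, length);
    return copy;
}

/**
 * Count how many times letter appears in text.
 * Ownership: text is only borrowed; this function never frees it.
 */
size_t count_letter(const char *text, char letter)
{
    size_t count = 0;
    for (; *text != '\0'; text++)
        if (*text == letter)
            count++;
    return count;
}
```

Usage: the caller of string_copy writes the matching free(copy) once it is done, and can pass the same pointer to borrowing functions like count_letter in between.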

Along with tracking ownership, you should try to keep as few copies of a given pointer lying around as possible. Whoever owns the block of memory should hold the “primary” pointer to it, and everything else should get a copy of the pointer from the owner only when it needs one. By restricting how many pointers to a given block exist, you reduce the chance of segmentation faults and other errors caused by something accessing the block after it has been freed. Another thing you can do to help keep track of things is to set the “primary” pointer to NULL after the memory has been freed. This clearly indicates that the block of memory is no longer valid, so you can avoid using it.
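Here is a small sketch of an owner that holds the primary pointer and clears it on release. Because free(NULL) is defined to do nothing, calling the release function a second time is harmless instead of causing a double free. The struct sample_buffer type and its functions are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical owner of a heap block. */
struct sample_buffer {
    double *values;  /* the "primary" pointer to the block */
    size_t count;
};

/* Allocate count zero-initialized values. Returns 0 on success, -1 on failure. */
int sample_buffer_allocate(struct sample_buffer *buffer, size_t count)
{
    buffer->values = calloc(count, sizeof *buffer->values);
    if (buffer->values == NULL) {
        buffer->count = 0;
        return -1;
    }
    buffer->count = count;
    return 0;
}

/* Free the block and mark it as gone by setting the primary pointer to NULL. */
void sample_buffer_release(struct sample_buffer *buffer)
{
    free(buffer->values);   /* no-op if values is already NULL */
    buffer->values = NULL;
    buffer->count = 0;
}
```

Code that later checks buffer.values == NULL can tell the block is no longer valid instead of reading freed memory.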

How not to optimize your code

“Premature optimization is the root of all evil (or at least most of it) in programming.” - Donald Knuth

It can be very tempting to sacrifice readability for the sake of efficiency. Writing code that runs fast is important, especially when doing heavy calculations. However, you should never be too eager to optimize your code, particularly when doing so would sacrifice its quality and readability. There are a number of reasons for this:

  1. Frequently, well-organized code will run faster than poorly organized code even if it doesn’t look that way.

  2. Unless your optimizations yield huge benefits, you may spend more time debugging your over-complicated code than you will actually spend waiting for it to run.

  3. Your compiler already has a code optimizer built in, and it’s probably better at it than you are.

  4. You probably don’t actually know what parts of your code are slowing it down. You may be optimizing in the wrong place.

While you are developing your program, you will want to turn off optimizations and turn on debugging information (the -g flag in GCC). Doing so makes it much easier to find bugs. However, when it comes time to let your thesis code run for a week, you can turn on compiler optimizations (the -O flag in GCC, or -O2 and -O3 for more aggressive optimization), and the compiler will restructure things to make the code faster for you. Often the compiler already makes the optimizations you would have made yourself, and you simply don’t know about it. Also, a good compiler may know things about the system it is building for that you don’t.

All that being said, if your code is running more slowly than you want, there are a number of things you can do. The first is to make sure you are using appropriate data structures and algorithms; we will talk more about these toward the end of the class. Frequently, the biggest effect you have on the speed of your code comes from the data structures and algorithms you select. Another thing you can do is use a profiler. A profiler is a tool that watches your program as it runs and records where it spends most of its time. This way, you can optimize the parts of your code that take a significant amount of time and not waste effort optimizing some bit that consumes only a few milliseconds of running time. However, profiling should not be done until most of the code is written and you can get a decent idea of which parts take the most time.
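As one illustration of how algorithm choice can matter more than low-level tweaks, compare a linear scan with a binary search using the standard library's bsearch. The linear version may look at every element, while binary search on a sorted array needs roughly log2(n) comparisons, about 20 for a million elements. The function names are hypothetical, and this sketch assumes the array has already been sorted (for example with qsort).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Linear scan: may examine every element, O(n) per lookup. */
bool contains_linear(const int *values, size_t count, int target)
{
    for (size_t i = 0; i < count; i++)
        if (values[i] == target)
            return true;
    return false;
}

/* Comparison callback for bsearch/qsort on ints. */
static int compare_ints(const void *a, const void *b)
{
    int left = *(const int *)a;
    int right = *(const int *)b;
    return (left > right) - (left < right);  /* avoids overflow of left - right */
}

/* Binary search: requires values sorted ascending, O(log n) per lookup. */
bool contains_sorted(const int *values, size_t count, int target)
{
    return bsearch(&target, values, count, sizeof *values, compare_ints) != NULL;
}
```

Both functions are equally readable, so this kind of speedup costs nothing in clarity, unlike hand-tuning the loop in contains_linear.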