Last week I finished reading Kernighan & Plauger’s beautiful book The Elements of Programming Style, the classic that pioneered the term programming style. I’ve excerpted below some rules of style from that book. I hope these get you excited to read the book too!
How the Book Works
Here’s how the book works: The authors pick a simple “real-world” program (mostly from other programming texts) and comment critically on its style. To analyze the style, they look at the expressions and statements (at the lowest level), program and control structures, I/O handling, program efficiency and documentation. In addition to pointing out what’s wrong with the code, the authors fix it, sometimes rewriting the whole thing, to produce a simpler, cleaner, sometimes more efficient, and always more obviously correct program.
The experience is that of watching two master programmers doing a code-review!
You may read the text in one go. Or, you may challenge yourself and use it as a “problem-book” by studying the programs, analyzing them for style, and fixing them before reading what the masters have to say.
In all, it is a very pragmatic book — full of useful, practical advice.
A word of caution is due: The program fragments are in Fortran and PL/I. And while they use only the basic features of those languages, it is still something of a quest to figure out the meaning of the longer Fortran programs infested with GOTOs and arithmetic-IFs. PL/I is much easier to read.
Reaffirm Your Own Beliefs of Good Style
There’s probably not much new stuff you’ll find in this book. Most things the authors say are perhaps ones you already knew, believed, and agreed with (if only subconsciously). And yet it is a delight to read the whole thing if only because it reaffirms those beliefs. This is close to how I felt when reading The Pragmatic Programmer — that book hardly had anything new, and yet it was a sheer pleasure to read.
Programming is far from being an exact science; perhaps it is an art (or engineering?). As we programmers work day after day doing what we do, we build many notions (or guiding principles) about our craft. From reading books to blogging, from studying other people’s code to writing our own to fixing bugs, to discussions with colleagues, the entire software development experience helps us build various concepts of what constitutes good programming practice. And once in a while, it is reassuring to have all these notions validated by the masters: expert programmers, like Kernighan and Plauger, whom we can safely trust to know the art.
Reading this book will give such a reaffirmation to your own principles of programming. And if (alas!) you had been harboring some truly deviant ideas about programming, reading it would hopefully help set them right.
What Has Changed in These Three Decades?
It’s been more than three decades since the second edition of the book came out in 1978. So it is natural to wonder what, if anything, has changed about how we program and what we consider good programs. To what extent have our programming languages and programming principles advanced through the years? Is this book still relevant?
To be honest, I wasn’t even born until many years after the book came out, so I’m naturally not qualified to comment. But studying how the programs in the text, perhaps typical of their age, were written, I couldn’t help but notice the following.
There has been definite improvement in the following respects:
- Structured programming and structured control statements (like if/then/else, for, while, break, continue, yield) have made some of the points (mainly about goto, but also some others) in this book less relevant.
- Support for recursion appears to be universally available in all “modern” languages I can think of. (To contrast, the original Fortran didn’t allow recursion.)
- Python/Haskell-style layout has solved many problems related to block-indentation.
- Functional languages like Haskell have made a certain class of problems, like forgetting initialization, close to impossible. (And while functional languages definitely existed many decades ago, I think they are much more widely known now, though I may be wrong about this.)
- Profilers and other instrumentation tools for measuring timing performance and “hotspots” are available in plenty, though perhaps not used enough.
- Module/Package/Namespace systems are available in many mainstream languages.
But we still have to routinely solve the same old problems and we still make the same old mistakes when solving them: What is the best way to design/structure correct and reliable programs? How to best validate input? How to correctly program with floating point numbers? … The book would perhaps provide some guidance with respect to these questions.
The Rest of the Post…
I’ve tried to collect some of the timeless wisdom of Kernighan and Plauger’s words as quotations in the remainder of the post.
Each chapter in their book is sprinkled with several terse lines summarizing the essence of the discussion. I’ve also collected some of those towards the end. If you enjoy and appreciate these, you will definitely want to read the whole book. As far as programming books go, this one is quite thin (the size of K&R), so you have a good chance of actually finishing it!
On Obscure Code
The introductory chapter gives an example of how a Fortran program used the expression (I/J)*(J/I) to initialize V(I,J) to an identity matrix! Notice that with truncating integer arithmetic, and assuming I and J are positive (as array subscripts are here), (I/J)*(J/I) is the same as I == J ? 1 : 0: whenever I and J differ, one of the two quotients truncates to zero.
The authors point out why such code is wrong:
The problem with obscure code is that debugging and modification become much more difficult, and these are already the hardest aspects of computer programming. Besides, there is the added danger that a too-clever program may not say what you thought it said. (Page 2)
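To convince yourself that the (I/J)*(J/I) trick really does produce an identity matrix, here is a small sketch in Python rather than the book's Fortran. It assumes positive, 1-based indices, for which Python's floor division `//` agrees with Fortran's truncating integer division. The function names are my own, not the book's.

```python
def obscure_identity(n):
    """Build an n x n identity matrix with the (I/J)*(J/I) trick."""
    # Indices run from 1 to n, as in Fortran, so there is no division by zero.
    return [[(i // j) * (j // i) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


def clear_identity(n):
    """Build the same matrix by saying what is meant."""
    return [[1 if i == j else 0 for j in range(1, n + 1)]
            for i in range(1, n + 1)]


print(obscure_identity(3))                       # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(obscure_identity(3) == clear_identity(3))  # True
```

Both versions give the same answer, but only the second can be understood at a glance.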
On Healthy Skepticism
The first chapter ends with some advice on healthy skepticism:
Nevertheless, mistakes can occur. We encourage you to view with suspicion anything we say that looks peculiar. Test it, try it out. Don’t treat computer output as gospel. If you learn to be wary of everyone else’s programs, you will be better able to check your own. (Page 7)
This is reminiscent of Feynman’s words “You should, in science, believe logic and arguments, carefully drawn, and not authorities. … I am not sure how I did it, but I goofed. And you goofed, too, for believing me.” (Page x, The Feynman Lectures on Physics, Vol. I).
On Clever Programming
The chapter on expressions has the famous words on clever programming that are often quoted:
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it? (Page 10)
Simplicity and clarity trump stray microseconds:
Simplicity and clarity are often of more value than the microseconds possibly saved by clever coding… Trivia rarely affect efficiency. Are all the machinations worth it, when their primary effect is to make the code less readable? (Page 127)
On Temporary Variables
Here’s some advice on the demerits of arbitrary temporary variables:
The fewer temporary variables in a program, the less chance there is that one will not be properly initialized, or that one will be altered unexpectedly before it is used. “Temporary” is a dirty word in programming — it suggests that a variable can be used with less thought than a “normal” (permanent?) one, and it encourages the use of one variable for several unrelated calculations. Both are dangerous practices. (Page 11)
The Telephone Test
The authors present a peculiar test, to which the Elevator Test bears some resemblance, to assess code readability:
A useful way to decide if some piece of code is clear or not is the “telephone test.” If someone could understand your code when read aloud over the telephone, it’s clear enough. If not, then it needs rewriting. Use the “telephone test” for readability. (Page 21)
On the Shape of Programs
The text of the program should be close to the process it evokes:
It is a good rule of thumb that a program should read from top to bottom in the order that it will be executed; if this is not true, watch out for the bugs that often accompany poor structure. (Page 37)
On Program Structuring
Write short functions/classes, each with a well-defined purpose:
When a program is not broken up into small enough pieces, the larger modules often fail to deliver on these promises. They try to do too much, or too many different things, and hence are difficult to maintain and are too specialized for general use. (Page 59) … Combining too many functions in one module is a sure way to limit its usefulness, while at the same time making it more complex and harder to maintain. (Page 64)
On Premature Optimization
“Optimizing” too early in the life of a program can kill its chances for growth. (Page 61)
On Loosely Coupled Modules
It must be possible to describe the function performed by a module in the briefest of terms and it is necessary to minimize whatever relationships exist with other modules, and display those that remain as explicitly as possible. This is how we obtain the minimum “coupling”, and hence maximum independence, between modules. (Page 62)
As we have said several times, the hard part of programming is controlling complexity — keeping the pieces decoupled so they can be dealt with separately instead of all at once. And the need to separate into pieces is not some academically interesting point, but a practical necessity, to keep things from interacting with each other in unexpected ways. (Page 95)
One good test of the worth of a module, in fact, is how good a job it does of hiding some aspect of the problem from the rest of the code. (Page 65)
On Functions as Black-boxes
… break the job into five small functions, each one of which can be assimilated separately, then treated as a black box that does some part of the job. Once it works, we need no longer concern ourselves with how it does something, only with the fact that it does. We thus have some assurance that we can deal with the program a small section at a time without much concern for the rest of the code. There is no other way to retain control of a large program. (Page 77)
On Top-Down Design and Recursion
One of the better ways of [planning program structure] is what is often called “top-down design.” In a top-down design, we start with a very general pseudo-code statement of the program … and then elaborate this statement in stages, filling in details until we ultimately reach executable code. Not only does this help to keep the structure fairly well organized, and avoid getting bogged down in coding too early, but it also means that we can back up and alter bad decisions without losing too much investment. (Page 71)
Learning to think recursively takes some effort, but it is repaid with smaller and simpler programs. Not every problem benefits from a recursive approach, but those that deal with data that is recursively defined often lead to very complicated programs unless the code is also recursive. (Page 77)
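As an illustration of that last point (my own example, not one from the book), consider a list that may contain further lists. The data is defined recursively, so a recursive function follows its shape almost word for word. The name `depth` is hypothetical.

```python
def depth(tree):
    """Return the nesting depth of a value that may be a list of lists."""
    if not isinstance(tree, list):
        return 0  # a leaf adds no nesting
    # A list is one level deeper than its deepest element (an empty list is 1).
    return 1 + max((depth(child) for child in tree), default=0)


print(depth([1, [2, [3, 4]], 5]))  # 3
```

A non-recursive version would need an explicit stack and bookkeeping that has nothing to do with the problem itself.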
I/O Programming — Never Trust Any Data & Remember the User
Input/output is the interface between a program and its environment. Two rules govern all I/O programming: NEVER TRUST ANY DATA, and REMEMBER THE USER. This requires that a program be as foolproof as is reasonably possible, so that it behaves intelligently even when used incorrectly, and that it be easy to use correctly. Ask yourself: Will it defend itself against the stupidity and ignorance of its users (including myself)? Would I want to have to use it myself? (Page 97)
Some compilers allow a check during execution that subscripts do not exceed array dimensions. This is a help … many programmers do not use such compilers because “They’re not efficient.” (Presumably this means that it is vital to get the wrong answers quickly.) (Page 85)
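The first of those two rules, never trust any data, might look like this in practice. This is only a sketch under my own assumptions: the function name `parse_percentage` and the 0 to 100 range are invented for illustration and do not come from the book.

```python
def parse_percentage(text):
    """Parse user input as a percentage in [0, 100], or raise ValueError."""
    try:
        value = float(text.strip())
    except (ValueError, AttributeError):
        # Not a number at all, or not even a string: identify the bad input.
        raise ValueError(f"not a number: {text!r}") from None
    # float() accepts 'nan' and 'inf', so test plausibility as well as validity.
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"percentage out of range 0-100: {text!r}")
    return value


print(parse_percentage(" 42.5 "))  # 42.5
```

Inputs such as "abc", "150" or "nan" are rejected with a message that names the offending value, which also serves the second rule: remember the user.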
On Bug Infestation
Where there are two bugs, there is likely to be a third. (Page 102)
Floating Point Numbers Are Like Sandpiles
Floating point arithmetic adds a new spectrum of errors, all based on the fact that the machine can represent numbers only to a finite precision. (Page 115)
As a wise programmer once said, “Floating point numbers are like sandpiles: every time you move one, you lose a little sand and you pick up a little dirt.” And after a few computations, things can get pretty dirty. (Page 117)
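You can watch the sand trickle away with a few lines of Python. This is my own sketch, assuming the usual IEEE 754 double-precision floats. An explicit loop is used so that every addition is rounded separately.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1  # 0.1 has no exact binary representation; each add rounds

print(total == 1.0)              # False
print(math.isclose(total, 1.0))  # True: compare within a tolerance instead
print(0.1 + 0.2 == 0.3)          # False, for the same reason
```

Comparing with a tolerance (here via `math.isclose`) is one common way to follow the book's advice not to compare floating point numbers just for equality.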
Concerns of efficiency must strike a balance with those of overall cost.
Machines have become increasingly cheap compared to people; any discussion of computer efficiency that fails to take this into account is shortsighted. “Efficiency” involves the reduction of overall cost — not just machine time over the life of the program, but also time spent by the programmer and by the users of the program.
A clean design is more easily modified as requirements change or as more is learned about what parts of the code consume significant amounts of execution time. A “clever” design that fails to work or to run fast enough can often be salvaged only at great cost. Efficiency does not have to be sacrificed in the interest of writing readable code — rather, writing readable code is often the only way to ensure efficient programs that are also easy to maintain and modify.
To begin, let us state the obvious. If a program doesn’t work, it doesn’t matter how fast it runs. (Page 123)
Algorithmic Improvements versus Tuning
How can we really speed it up? Fundamental improvements in performance are most often made by algorithm changes, not by tuning … There are two lessons. First, time spent selecting a good algorithm is certain to pay larger dividends than time spent polishing an implementation of a poor method. Second, for any given algorithm, polishing is not likely to significantly improve a fundamentally sound, clean implementation. It may even make things worse. (Pages 133–134)
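Here is a small example of my own (not from the book) of what an algorithm change looks like, as opposed to tuning. Both hypothetical functions below answer the same question, whether a list contains a duplicate, but the first compares every pair while the second makes a single pass.

```python
def has_duplicates_quadratic(items):
    """Compare every pair: roughly n*n/2 comparisons for n items."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Remember what was seen: roughly n set lookups (items must be hashable)."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


print(has_duplicates_quadratic([3, 1, 4, 1, 5]))  # True
print(has_duplicates_linear([3, 1, 4, 1, 5]))     # True
```

No amount of polishing the inner loop of the first version will make it catch up with the second on large inputs, and the second is arguably the clearer of the two as well.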
Profile and measure your code before making performance improvements.
Beware of preconceptions about where a program spends its time. This avoids the error of looking in the wrong place for improvements. Of course, you have to have some working idea of which part of a program has the most effect on overall speed, but changes designed to improve efficiency should be based on solid measurement, not intuition.
A useful and cheap way to measure how a program spends its time is to count how many times each statement is executed. The resulting set of counts is called the program’s “profile” (a term first used by D. E. Knuth in an article in Software Practice and Experience, April, 1971). Some enlightened computer centers make available a “profiler” … (Page 136)
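A profile in exactly this sense, a count of how many times each statement runs, can be sketched in a few lines of Python using the standard `sys.settrace` hook. This is a toy for illustration, not a replacement for a real profiler. The names `count_line_executions` and `triangular` are my own.

```python
import sys
from collections import Counter


def count_line_executions(func, *args):
    """Run func(*args) and count how often each of its lines executes.

    Returns (result, counts), where counts maps a line offset relative to
    the 'def' line of func to the number of times that line was executed.
    """
    counts = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, counts


def triangular(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


result, counts = count_line_executions(triangular, 4)
print(result)  # 10
for offset in sorted(counts):
    print(f"line +{offset}: executed {counts[offset]} time(s)")
```

The loop body (line +3) shows up with a count of 4, once per iteration, which is precisely the kind of information that tells you where a program actually spends its time.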
The sole truth about a program is its text.
The only reliable documentation of a computer program is the code itself. The reason is simple — whenever there are multiple representations of a program, the chance for discrepancy exists. If the code is in error, artistic flowcharts and detailed comments are to no avail. Only by reading the code can the programmer know for sure what the program does. (Page 141)
On What Documentation Should Comprise
In a project of any size it is vital to maintain readable descriptions of what each program is supposed to do, how it is used, how it interacts with other parts of the system, and on what principles it is based. These form useful guides to the code. What is not useful is a narrative description of what a given routine actually does on a line-by-line basis. Anything that contributes no new information, but merely echoes the code, is superfluous. (Page 141)
On Following the Rules
The book ends with the following paragraph on following the rules of programming style:
To paraphrase an observation in The Elements of Style, rules of programming style, like those of English, are sometimes broken, even by the best writers. When a rule is broken, however, you will usually find in the program some compensating merit, attained at the cost of the violation. Unless you are certain of doing as well, you will probably do best to follow the rules. (Page 159)
A Treasure Trove of Pithy Rules
You have surely heard programming maxims like “make it right before you make it faster” or “don’t comment bad code, rewrite it”. Well, this book is generously sprinkled with such short, witty one-liners capturing the essence of each section. Below are some of those words of wisdom.
- From the Introduction: Write clearly – don’t be too clever.
- On Expressions: Say what you mean, simply and directly. Use library functions. Avoid temporary variables. Trying to outsmart a compiler defeats much of the purpose of using one. Write clearly – don’t sacrifice clarity for “efficiency”. Let the machine do the dirty work. Replace repetitive expressions by calls to a common function. Parenthesize to avoid ambiguity. Choose variable names that won’t be confused. Use the good features of a language; avoid the bad ones.
- On Control Structures: Use DO-END and indenting to delimit groups of statements. Use IF-ELSE to emphasize that only one of two actions is to be performed. Use DO and DO-WHILE to emphasize the presence of loops. Make your programs read from top to bottom. Use IF ... ELSE IF ... ELSE IF ... ELSE ... to implement multi-way branches. Use the fundamental control flow constructs. Write first in an easy-to-understand pseudo-language; then translate into whatever language you have to use. Avoid THEN-IF and null-ELSE. Avoid ELSE GOTO and ELSE RETURN. Follow each decision as closely as possible with its associated action. Use data arrays to avoid repetitive control sequences. Choose a data representation that makes the program simple. Don’t stop with your first draft.
- On Program Structures: Modularize. Use subroutines. Make the coupling between modules visible. Each module should do one thing well. Make sure every module hides something. Let the data structure the program. Don’t patch bad code – rewrite it. Write and test a big program in small pieces. Use recursive procedures for recursively-defined data structures.
- On Input/Output: Test input for validity and plausibility. Make sure input cannot violate the limits of the program. Terminate input by end-of-file or marker, not by count. Identify bad input; recover if possible. Treat end-of-file conditions in a uniform manner. Make input easy to prepare and output self-explanatory. Use uniform input formats. Make input easy to proofread. Use free-form input when possible. Use self-identifying input. Allow defaults. Echo both on output. Localize input and output in subroutines.
- On Common Blunders: Make sure all variables are initialized before use. Don’t stop at one bug. Use debugging compilers. Watch out for off-by-one errors. Take care to branch the right way on equality. Avoid multiple exits from loops. Make sure your code “does nothing” gracefully. Test programs at their boundary values. Program defensively. 10.0 times 0.1 is hardly ever 1.0. Don’t compare floating point numbers just for equality.
- On Efficiency and Instrumentation: Make it right before you make it faster. Keep it right when you make it faster. Make it clear before you make it faster. Don’t sacrifice clarity for small gains in “efficiency.” Let your compiler do the simple optimizations. Don’t strain to re-use code; reorganize instead. Make sure special cases are truly special. Keep it simple to make it faster. Don’t diddle code to make it faster – find a better algorithm. Instrument your programs. Measure before making “efficiency” changes.
- On Documentation: Make sure comments and code agree. Don’t just echo the code with comments — make every comment count. Don’t comment bad code — rewrite it. Use variable names that mean something. Format a program to help the reader understand it. Indent to show the logical structure of a program. Document your data layouts. Don’t over-comment.