======Deconstructing K&RC======
When I was a kid I read this awesome book called "The C Programming
Language" by the language's creators, Brian Kernighan and Dennis
Ritchie. This book taught me and many people of my generation, and a
generation before, how to write C code. You talk to anyone, whether
they know C or not, and they'll say, "You can't beat K&RC. It's the
best C book." It is an established piece of programmer lore that is not
soon to die.
I myself believed that until I started writing this book. You see, K&RC
is actually riddled with bugs and bad style. Its age is no excuse.
These were bugs when they wrote the first printing, and the 42nd
printing. I hadn't actually realized just how bad most of the code was
in this book and recommended it to many people. After reading through
it for just an hour I decided that it needs to be taken down from its
pedestal and relegated to history rather than vaunted as state of the
art.
I believe it is time to lay this book to rest, but I want to use it as
an exercise for you in finding hacks, attacks, defects, and bugs by
going through K&RC to break all the code. That's right, you are going
to destroy this sacred cow for me, and you're going to have no problem
doing it. When you are done doing this, you will have a finely honed
eye for defect. You will also have an informed opinion of the book's
actual quality, and will be able to make your own decisions on how to
use the knowledge it contains.
In this chapter we will use all the knowledge you've gained from this
book, and spend it reviewing the code in K&RC. What we will do is take
many pieces of code from the book, find all the bugs in it, and write a
unit test that exercises the bugs. We'll then run this test under
Valgrind to get statistics and data, and then we'll fix the bugs with a
redesign.
This will obviously be a long chapter so I'm going to only do a handful
of these and then I'm going have you do the rest. I'll provide a guide
that is each page, with the code on it, and hints to the bugs that it
has. Your job is to then tear that piece of code apart and try to think
like an attacker trying to break the code.
Note
As you read this, if you feel that I am being disrespectful to the
authors, then that's not my intent. I respect the authors more than
anything you know and owe them a debt of gratitude for writing their
book. My criticisms here are both for educational purposes of teaching
people modern C code, and to destroy the belief in their work as a item
of worship that cannot be questioned.
However, if when you read this you have feelings of me insulting you
then just stop reading. You will gain nothing from this chapter but
personal grief because you've attached your identity to K&RC and my
criticisms will only be taken personally.
======An Overall Critique Of Correctness======
The primary problem K&RC has is its view of "correctness" comes from
the first system it was used on: Unix. In the world of Unix software
programs have a particular set of properties:
* Programs are started and then exit, making resource allocation
easier.
* Most functions are only called by other parts of the same program
in set ways.
* The inputs to the program are limited to "expert" restricted users.
In the context of this 1970's computing style, K&RC is actually
correct. As long as only trusted people run complete cohesive programs
that exit and clean up all their resources then their code is fine.
Where K&RC runs into problems is when the functions or code snippets
are taken out of the book and used in other programs. Once you take
many of these code snippets and try use them in some other program they
fall apart. They then have blatant buffer overflows, bugs, and problems
that a beginner will trip over.
Another problem is that software these days doesn't exit right away,
but instead it stays running for long periods of time because they're
servers, desktop applications and mobile applications. The old style of
"leaving the cleanup to the OS" doesn't work in the modern world the
way it did back in the day.
The final problem though is that no software lives in a vacuum anymore.
Software is now frequently attacked by people over network connections
in an attempt to gain special privilege or simple street cred. The idea
that "nobody will ever do that" is dead, and actually that's probably
the first thing somebody will do.
The best way to summarize the problem of K&RC "correctness" is with an
example from English. Imagine if you have the pair of sentences, "Jack
and Jill went up the hill. He fell down." Well, from context clues you
know that "He" means Jack. However, if you have that sentence on its
own it's not clear who "He" is. Now, if you put that sentence at the
end of another sentence you can get an unclear pronoun reference: "Jack
and Frank went up the hill. He fell down." Which "He" are we talking
about in that sentence?
This is how the code in K&RC works. As long as that code is not used in
other programs without serious analysis of the entire software then it
works. The second you take many of the functions out and try to use
them in other systems they fall apart. And, what's the point of a book
full of code you can't actually use in your own programs?
======A First Demonstration Defect======
The following copy function is found in the very first chapter and is
an example of copying two strings. Here's a new source file to
demonstrate the defects in this function.
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#define MAXLINE 10 // in the book this is 1000
void copy(char to[], char from[])
{
int i;
i = 0;
while((to[i] = from[i]) != '\0')
++i;
}
int main(int argc, char *argv[])
{
int i;
// use heap memory as many modern systems do
char *line = malloc(MAXLINE);
char *longest = malloc(MAXLINE);
assert(line != NULL && longest != NULL && "memory error");
// initialize it but make a classic "off by one" error
for(i = 0; i < MAXLINE; i++) {
line[i] = 'a';
}
// cause the defect
copy(longest, line);
free(line);
free(longest);
return 0;
}
In the above example, I'm doing something that is fairly common:
switching from using stack allocation to heap allocation with malloc.
What happens is, typically malloc returns memory from the heap, and so
the bytes after it are not initialized. Then you see me use a loop to
accidentally initialize it wrong. This is a common defect, and one of
the reasons we avoided classic style C strings in this book. You could
also have this bug in programs that read from files, sockets, or other
external resources. It is a very common bug, probably the most common
in the world.
Before the switch to heap memory, this program probably ran just fine
because the stack allocated memory will probably have a '\0' character
at the end on accident. In fact, it would appear to run fine almost
always since it just runs and exits quickly.
What's the effect of running this new program with copy used wrong?
$ make 1.9-1
cc 1.9-1.c -o 1.9-1
$ ./1.9-1
$
$ valgrind ./1.9-1
==2162== Memcheck, a memory error detector
==2162== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
==2162== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright
info
==2162== Command: ./1.9-1
==2162==
==2162== Invalid read of size 1
==2162== at 0x4005C0: copy (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162== by 0x400651: main (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162== Address 0x51b104a is 0 bytes after a block of size 10 alloc'd
==2162== at 0x4C2815C: malloc (vg_replace_malloc.c:236)
==2162== by 0x4005E6: main (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162==
==2162== Invalid write of size 1
==2162== at 0x4005C3: copy (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162== by 0x400651: main (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162== Address 0x51b109a is 0 bytes after a block of size 10 alloc'd
==2162== at 0x4C2815C: malloc (vg_replace_malloc.c:236)
==2162== by 0x4005F4: main (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162==
==2162== Invalid read of size 1
==2162== at 0x4005C5: copy (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162== by 0x400651: main (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162== Address 0x51b109a is 0 bytes after a block of size 10 alloc'd
==2162== at 0x4C2815C: malloc (vg_replace_malloc.c:236)
==2162== by 0x4005F4: main (in /home/zedshaw/projects/books/learn-c-the-hard-
way/code/krc/1.9-1)
==2162==
==2162==
==2162== HEAP SUMMARY:
==2162== in use at exit: 0 bytes in 0 blocks
==2162== total heap usage: 2 allocs, 2 frees, 20 bytes allocated
==2162==
==2162== All heap blocks were freed -- no leaks are possible
==2162==
==2162== For counts of detected and suppressed errors, rerun with: -v
==2162== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 4 from 4)
$
As you've already learned, Valgrind will show you all of your sins in
full color. In this case, a perfectly harmless seeming program has a
ton of "Invalid read of size 1". If you kept running it you'd find
other errors pop up at random.
Now, in the context of the entire program in the original K&RC example,
this function will work correctly. However, the second this function is
called with longest and line uninitialized, initialized wrong, without
a trailing '\0' character, then you'll hit difficult to debug errors.
This is the failing of the book. While the code works in the book, it
does not work in many other situations leading to difficult to spot
defects, and those are the worst kind of defects for a beginner (or
expert). Instead of code that only works in this delicate balance, we
will strive to create code that has a higher probability of working in
any situation.
======Why copy() Fails======
Many people have looked at this copy function and thought that it is
not defective. They claim that, as long as it's used correctly, it is
correct. One person even went so far as to say, "It's not defective,
it's just unsafe." Odd, since I'm sure this person wouldn't get into a
car if the manufacturer said, "Our car is not defective, it's just
unsafe."
However, there is a way to formally prove that this function is
defective by enumerating the possible inputs and then seeing if any of
them cause the while loop to never terminate.
What we'll do is have two strings, A and B, and figure out what copy()
does with them:
* A = {'a','b','\0'}; B = {'a', 'b', '\0'}; copy(A,B);
* A = {'a','b'}; B = {'a', 'b', '\0'}; copy(A,B);
* A = {'a','b','\0'}; B = {'a', 'b'}; copy(A,B);
* A = {'a','b'}; B = {'a', 'b'}; copy(A,B);
This is all the basic permutations of strings that can be passed to the
function based on whether they are terminated with a '\0' or not. To be
complete I'm covering all possible permutations, even if they seem
irrelevant. You may think there's no need to include permutations on A,
but as you'll see in the analysis, not including A fails to find buffer
overflows that are possible.
We can then go through each case and determine if the while loop in
copy() terminates:
* while-loop finds '\0' in B, copy fits in A, terminates.
* while-loop finds '\0' in B, overflows A, terminates.
* while-loop does not find '\0' in B, overflows A, does not
terminate.
* while-loop does not find '\0' in B, overflows A, does not
terminate.
This provides a formal proof that the function is defective because
there are possible inputs that causes the while-loop to run forever or
overflow the target. If you were to try and use this function safely,
you would need to follow all paths to its usage, and confirm that the
data is correct along every path. That gives every path to this
function a 50% to 75% chance it will fail with just the inputs above.
You could find some more permutations of failure but these are the most
basic ones.
Let's now compare this to a copy function that knows the lengths of all
the inputs to see what it's probability of failure is:
* A = {'a','b','\0'}; B = {'a', 'b', '\0'}; safercopy(2, A, 2, B);
* A = {'a','b'}; B = {'a', 'b', '\0'}; safercopy(2, A, 2, B);
* A = {'a','b','\0'}; B = {'a', 'b'}; safercopy(2, A, 2, B);
* A = {'a','b'}; B = {'a', 'b'}; safercopy(2, A, 2, B);
Also assume that the safercopy() function uses a for-loop that does not
test for a '\0' only, but instead uses the given lengths to determine
the amount to copy. With that we can then do the same analysis:
* for-loop processes 2 characters of A, terminates.
* for-loop processes 2 characters of A, terminates.
* for-loop processes 2 characters of A, terminates.
* for-loop processes 2 characters of A, terminates.
In every case the for-loop variant with string length given as
arguments will terminate no matter what. To really test the for-loop
variant we'd need to add some permutations for differing lengths of
strings A and B, but in every case the for-loop will always stop
because it will only go through a fixed previously known finite number
of characters.
That means the for-loop will never loop forever, and as long as it
handles all the possible differing lengths of A and B, never overflow
either side. The only way to break safercopy() is to lie about the
lengths of the strings, but even then it will still always terminate.
The worst possible scenario for the safercopy() function is that you
are given an erroneous length for one of the strings and that string
does not have a '\0' properly, so the function buffer overflows.
This shows exactly why the copy() function is defective, because it
does not terminate cleanly for most possible inputs, and is only
reliable for one of the conditions: B terminated and A the right size.
It also shows why a for-loop variant with a given fixed length for each
input is superior.
Finally, the significance of this is that I've effectively done a
formal proof (well, mostly formal) that shows what you should be doing
to analyze code. Each function has to stand on its own and not have any
defects such as while-loops that do not terminate. In the above
discussion I've shown that the original K&RC is defective, and fatally
so since there is no way to fix it given the inputs. There's no way
from just a pointer to ask if a string is properly formed since the
only way to test that is to scan it, and scanning it runs into this
same problem.
======But, That's Not A C String======
Some folks then defend this function (despite the proof above) by
claiming that the strings in the proof aren't C strings. They want to
apply an artful dodge that says "the function is not defective because
you aren't giving it the right inputs", but I'm saying the function is
defective because most of the possible inputs cause it to crash the
software.
The problem with this mindset is there's no way to confirm that a C
string is valid. Imagine you wanted to write a little
assert_good_string function that checks if a C string is correctly
terminated before using it. This function needs to go to the end of the
string and see if there's a '\0' terminator. How does it do this? This
function would also have to scan the target function to confirm that it
ended in '\0', which means it has the same problem as copy() because
the input may not be terminated.
This may seem silly, but people actually do this with strlen(). They
take an input and think that they just have to run strlen() on the
input to confirm that it's the right length, but strlen() itself has
the same fatal flaw because it has to scan and if the string isn't
terminated it will also overflow.
This means any attempt to fix the problem using just C strings also has
this problem. The only way to solve it is to include the length of
every string and use that to scan it.
If you can't validate a C string in your function, then your only
choice is to do full code reviews manually. This introduces human error
and no matter what you do the error will happen.
======Just Don't Do That======
Another argument in favor of this copy() function is when the
proponents of K&RC state that you are "just supposed to not use bad
strings". Despite the mountains of empirical evidence that this is
impossible in C code, they are basically correct and that's what I'm
teaching in this exercise. But, instead of saying "just don't do that
by checking all possible inputs", I'm advocating "just don't do that by
not using this kind of function". I'll explain further.
In order to confirm that all inputs to this function are valid I have
to go through a code review process that involves this:
* Find all the places the copy() function is called.
* Trace backwards from that call point to where the inputs are
created.
* Confirm that the data is created correctly.
* Follow the path from the creation point of the data to where it's
used and confirm that no line of code alters the data.
* Repeat this for all paths and all branches, including all loops and
if-statements involving the data.
In my experience this is only possible in small programs like the
little ones that K&RC has. In real software the number of possible
branches you'd need to check is much too high for most people to
validate, especially in a team environment where individuals have
varying degrees of capability. A way to quantify this difficulty is
that each branch in the code leading to a function like copy() has a
50-70% chance of causing the defect.
However, if you can use a different function and avoid all of these
checks then doesn't that mean the copy() function is defective by
comparison? These people are right, the solution is to "just not do
that" by just not using the copy() function. You can change the
function to one that includes the sizes of the two strings and the
problem is solved. If that's the case then the people who think "just
don't do that" have just proved that the function is defective, because
the simpler way to "not do that" is to use a better function.
If you think copy() is valid as long as you avoid the errors I outline,
and if safercopy() avoids the errors, then safercopy() is superior and
copy() is defective by comparison.
======Stylistic Issues======
A more minor critique of the book is that the style is not only old,
but just error prone and annoyingly "clever". Take the code you just
saw again and look at the while-loop in copy. There's no reason to
write this loop this way, as the compiler can just as easily work with
a for-loop and without the clever triple-equality trick. The original
code also has a while-loop without braces, but an if-statement with
braces, which leads to even more confusion:
/* bad use of while loop with compound if-statement */
while ((len = getline(line, MAXLINE)) > 0)
if (len > max) {
max = len;
copy(longest, line);
}
if (max > 0) /* there was a line */
printf("%s", longest);
This code is incredibly error prone because you can't easily tell where
the pair of if-statements and the while-loop are paired. A quick glance
makes it seem like this while-loop will loop both if-statements, but it
doesn't. In modern C code you would instead just use braces all the
time and avoid the confusion completely.
While the book could be forgiven for this because of its age, it has
been republished in this form 42 times, and it was updated for the ANSI
standard. At some point in its history you'd think the authors or some
publisher ghostwriter could have been bothered to update the book's
style. However, this is the problem with sacred cows. Once they become
idols of worship people are reluctant to question them or modify them.
In the rest of this chapter though we will be modernizing the code in
K&RC to fit the style you've been learning throughout this book. It
will be more verbose, but it will be clearer and less error prone
because of this slight increase in verbosity.
======Chapter 1 Examples======
Now we begin...
Copyright (C) 2010 Zed. A. Shaw
Credits