Copyright © 2002 Paul Sheer.


22. Trivial Introduction to C

 C was invented for the purpose of writing an operating system that could be recompiled (ported) to different hardware platforms (different CPUs). Because the operating system is written in C, this language is the first choice for writing any kind of application that has to communicate efficiently with the operating system.

Many people who don't program very well in C think of C as an arbitrary language out of many. This point should be made at once: C is the fundamental basis of all computing in the world today. UNIX, Microsoft Windows, office suites, web browsers and device drivers are all written in C. Ninety-nine percent of your time spent at a computer is probably spent using an application written in C. About 70% of all "open source" software is written in C, and the remaining 30% written in languages whose compilers or interpreters are written in C. [C++ is also quite popular. It is, however, not as fundamental to computing, although it is more suitable in many situations.]

Further, there is no replacement for C. Since it fulfills its purpose almost flawlessly, there will never be a need to replace it. Other languages may fulfill other purposes, but C fulfills its purpose most adequately. For instance, all future operating systems will probably be written in C for a long time to come.

It is for these reasons that your knowledge of UNIX will never be complete until you can program in C. On the other hand, just because you can program in C does not mean that you should. Good C programming is a fine art which many veteran C programmers never manage to master, even after many years. It is essential to join a Free software project to properly master an effective style of C development.

22.1 C Fundamentals

We start with a simple C program and then add fundamental elements to it. Before going too far, you may wish to review bash functions in Section 7.7.

22.1.1 The simplest C program

A simple C program is:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    printf ("Hello World!\n");
    return 3;
}

Save this program in a file hello.c. We will now compile the program. [Compiling is the process of turning C code into assembler instructions. Assembler instructions are the program code that your 80?86/SPARC/RS6000 CPU understands directly. The resulting binary executable is fast because it is executed natively by your processor--it is the very chip that you see on your motherboard that fetches hello byte by byte from memory and executes each instruction. This is what is meant by million instructions per second (MIPS). The megahertz of the machine quoted by hardware vendors is very roughly the number of MIPS. Interpreted languages (like shell scripts) are much slower because the code itself is written in something not understandable to the CPU. The /bin/bash program has to interpret the shell program. /bin/bash itself is written in C, but the overhead of interpretation makes scripting languages many orders of magnitude slower than compiled languages. Shell scripts do not need to be compiled.] Run the command

 
gcc -Wall -o hello hello.c

The -o hello option tells gcc [GNU C Compiler. cc on other UNIX systems.] to produce the binary file hello instead of the default binary file named a.out. [Called a.out for historical reasons.] The -Wall option means to report all Warnings during the compilation. This is not strictly necessary but is most helpful for correcting possible errors in your programs. More compiler options are discussed on page [*].

Then, run the program with

 
./hello

Previously you should have familiarized yourself with bash functions. In C all code is inside a function. The first function to be called (by the operating system) is the main function.

Type echo $? to see the return code of the program. You will see it is 3, the return value of the main function.

Other things to note are the " on either side of the string to be printed. Quotes are required around string literals. Inside a string literal, the \n escape sequence indicates a newline character. ascii(7) shows some other escape sequences. You can also see a proliferation of ; everywhere in a C program. Every statement in C is terminated by a ; unlike statements in shell scripts where a ; is optional.

Now try:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    printf ("number %d, number %d\n", 1 + 2, 10);
    exit (3);
}

printf can be thought of as the command to send output to the terminal. It is also what is known as a standard C library function. In other words, it is specified that a C implementation should always have the printf function and that it should behave in a certain way.

The %d specifies that a decimal should go in at that point in the text. The number to be substituted will be the first argument to the printf function after the string literal--that is, the 1 + 2. The next %d is substituted with the second argument--that is, the 10. The %d is known as a format specifier. It essentially converts an integer number into a decimal representation. See printf(3) for more details.

22.1.2 Variables and types

With bash, you could use a variable anywhere, anytime, and the variable would just be blank if it had never been assigned a value. In C, however, you have to explicitly tell the compiler what variables you are going to need before each block of code. You do this with a variable declaration:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    int x;
    int y;
    x = 10;
    y = 2;
    printf ("number %d, number %d\n", 1 + y, x);
    exit (3);
}

The int x is a variable declaration. It tells the program to reserve space for one integer variable that it will later refer to as x. int is the type of the variable. x = 10 assigns a value of 10 to the variable. There are types for each kind of number you would like to work with, and format specifiers to convert them for printing:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    char a;
    short b;
    int c;
    long d;
    float e;
    double f;
    long double g;
    a = 'A';
    b = 10;
    c = 10000000;
    d = 10000000;
    e = 3.14159;
    f = 10e300;
    g = 10e300;
    printf ("%c, %hd, %d, %ld, %f, %f, %Lf\n", a, b, c, d, e, f, g);
    exit (3);
}

You will notice that %f is used for both floats and doubles. The reason is that a float is always converted to a double before an operation like this. Also try replacing %f with %e to print in exponential notation--that is, less significant digits.

22.1.3 Functions

Functions are implemented as follows:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
void mutiply_and_print (int x, int y)
{
    printf ("%d * %d = %d\n", x, y, x * y);
}
 
int main (int argc, char *argv[])
{
    mutiply_and_print (30, 5);
    mutiply_and_print (12, 3);
    exit (3);
}

Here we have a non-main function called by the main function. The function is first declared with

 
void mutiply_and_print (int x, int y)

This declaration states the return value of the function ( void for no return value), the function name ( mutiply_and_print), and then the arguments that are going to be passed to the function. The numbers passed to the function are given their own names, x and y, and are converted to the type of x and y before being passed to the function--in this case, int and int. The actual C code that comprises the function goes between curly braces { and }.

In other words, the above function is equivalent to:

 
 
 
 
void mutiply_and_print ()
{
    int x;
    int y;
    x = <first-number-passed>
    y = <second-number-passed>
    printf ("%d * %d = %d\n", x, y, x * y);
}
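
The function above prints its result and returns nothing (void). As a minimal sketch of the other common case, here is a function that hands a value back to its caller with return. The names multiply and print_product are hypothetical and are not part of the original listing.

```c
#include <stdio.h>

/* Hypothetical variant of the function above: instead of printing,
   it returns the product to whoever called it. */
int multiply (int x, int y)
{
    return x * y;
}

/* The returned value can be stored in a variable like any other number. */
void print_product (int x, int y)
{
    int result;
    result = multiply (x, y);
    printf ("%d * %d = %d\n", x, y, result);
}
```

Calling print_product (30, 5) from main would behave like the earlier listing, while multiply on its own can be reused wherever a product is needed without printing anything.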

22.1.4 for, while, if, and switch statements

As with shell scripting, we have the for, while, and if statements:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    int x;
 
    x = 10;
 
    if (x == 10) {
        printf ("x is exactly 10\n");
        x++;
    } else if (x == 20) {
        printf ("x is equal to 20\n");
    } else {
        printf ("No, x is not equal to 10 or 20\n");
    }
 
    if (x > 10) {
        printf ("Yes, x is more than 10\n");
    }
 
    while (x > 0) {
        printf ("x is %d\n", x);
        x = x - 1;
    }
 
    for (x = 0; x < 10; x++) {
        printf ("x is %d\n", x);
    }
 
    switch (x) {
        case 9:
            printf ("x is nine\n");
            break;
        case 10:
            printf ("x is ten\n");
            break;
        case 11:
            printf ("x is eleven\n");
            break;
        default:
            printf ("x is huh?\n");
            break;
    }
 
    return 0;
}

It is easy to see the format that these statements take, although they are vastly different from shell scripts. C code works in statement blocks between curly braces, in the same way that shell scripts have do's and done's.

Note that with most programming languages when we want to add 1 to a variable we have to write, say, x = x + 1. In C, the abbreviation x++ is used, meaning to increment a variable by 1.

The for loop takes three statements between ( ... ): a statement to start things off, a comparison, and a statement to be executed on each completion of the statement block. The statement block after the for is repeatedly executed until the comparison is untrue.

The switch statement is like case in shell scripts. switch considers the argument inside its ( ... ) and decides which case line to jump to. In this example it will obviously be printf ("x is ten\n"); because x was 10 when the previous for loop exited. The break tokens mean that we are through with the switch statement and that execution should continue from Line 46, just after the closing brace of the switch.

Note that in C the comparison == is used instead of =. The symbol = means to assign a value to a variable, whereas == is an equality operator.
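
To see these statements doing useful work, here is a small sketch of two functions. The names sum_to and days_in_month are hypothetical. The second one ignores leap years and treats any month it does not list as having 31 days, so take it as an illustration of switch and not as a calendar routine.

```c
/* Adds the integers 1 through n using a for loop. */
int sum_to (int n)
{
    int i;
    int total;
    total = 0;
    for (i = 1; i <= n; i++) {
        total = total + i;
    }
    return total;
}

/* Several case lines may share one block of code: execution jumps to
   the matching case and runs until it meets a break. */
int days_in_month (int month)
{
    int days;
    switch (month) {
        case 2:
            days = 28;          /* assumption: leap years are ignored */
            break;
        case 4:
        case 6:
        case 9:
        case 11:
            days = 30;
            break;
        default:
            days = 31;          /* every month not listed above */
            break;
    }
    return days;
}
```

Leaving out a break is legal C: execution then simply carries on into the next case, which is what lets months 4, 6, 9, and 11 share one assignment here.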

22.1.5 Strings, arrays, and memory allocation

You can define a list of numbers with:

 
int y[10];

This list is called an array:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    int x;
    int y[10];
    for (x = 0; x < 10; x++) {
        y[x] = x * 2;
    }
    for (x = 0; x < 10; x++) {
        printf ("item %d is %d\n", x, y[x]);
    }
    return 0;
}

If an array is of type character, then it is called a string:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    int x;
    char y[11];
    for (x = 0; x < 10; x++) {
        y[x] = 65 + x * 2;
    }
    for (x = 0; x < 10; x++) {
        printf ("item %d is %d\n", x, y[x]);
    }
    y[10] = 0;
    printf ("string is %s\n", y);
    return 0;
}

Note that a string has to be null-terminated. This means that the last character must be a zero. The code y[10] = 0 sets the 11th item in the array to zero. This also means that strings need to be one char longer than you would think.

(Note that the first item in the array is y[0], not y[1], as with some other programming languages.)

In the preceding example, the line char y[11] reserved 11 bytes for the string. But what if you want a string of 100,000 bytes? C allows you to request memory from the kernel. This is called allocating memory. Any non-trivial program will allocate memory for itself and there is no other way of getting large blocks of memory for your program to use. Try:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    int x;
    char *y;
    y = malloc (11);
    printf ("%p\n", (void *) y);
    for (x = 0; x < 10; x++) {
        y[x] = 65 + x * 2;
    }
    y[10] = 0;
    printf ("string is %s\n", y);
    free (y);
    return 0;
}

The declaration char *y means to declare a variable (a number) called y that points to a memory location. The * (asterisk) in this context means pointer. For example, if you have a machine with perhaps 256 megabytes of RAM + swap, then y potentially has a range of this much. The numerical value of y is also printed with printf ("%p\n", (void *) y);, where %p is the format specifier for pointers, but it is of no interest to the programmer.

When you have finished using memory you must give it back to the operating system by using free. Programs that don't free all the memory they allocate are said to leak memory.

Allocating memory often requires you to perform a calculation to determine the amount of memory required. In the above case we are allocating the space of 11 chars. Since each char is really a single byte, this presents no problem. But what if we were allocating 11 ints? An int on a PC is 32 bits--four bytes. To determine the size of a type, we use the sizeof keyword:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    int a;
    int b;
    int c;
    int d;
    int e;
    int f;
    int g;
    a = sizeof (char);
    b = sizeof (short);
    c = sizeof (int);
    d = sizeof (long);
    e = sizeof (float);
    f = sizeof (double);
    g = sizeof (long double);
    printf ("%d, %d, %d, %d, %d, %d, %d\n", a, b, c, d, e, f, g);
    return 0;
}

Here you can see the number of bytes required by all of these types. Now we can easily allocate arrays of things other than char.

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    int x;
    int *y;
    y = malloc (10 * sizeof (int));
    printf ("%p\n", (void *) y);
    for (x = 0; x < 10; x++) {
        y[x] = 65 + x * 2;
    }
    for (x = 0; x < 10; x++) {
        printf ("%d\n", y[x]);
    }
    free (y);
    return 0;
}

On many machines an int is four bytes (32 bits), but you should never assume this. Always use the sizeof keyword to allocate memory.
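
The allocation pattern in this section can be wrapped in a function so that the size calculation and the failure check live in one place. What follows is only a sketch: make_squares is a hypothetical name, it assumes n is at least 1, and it leaves it to the caller to free the block.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a newly allocated array holding the
   squares of 0, 1, ..., n - 1, or 0 if malloc fails.
   Assumes n is at least 1. The caller must free() the result. */
int *make_squares (int n)
{
    int x;
    int *y;
    y = malloc (n * sizeof (int));  /* n ints, whatever size an int is here */
    if (y == 0) {
        return 0;                   /* no memory: let the caller decide */
    }
    for (x = 0; x < n; x++) {
        y[x] = x * x;
    }
    return y;
}
```

A caller would write something like y = make_squares (10);, check that y is not zero, use y[0] through y[9], and finish with free (y); so that the memory is not leaked.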

22.1.6 String operations

 C programs probably do more string manipulation than anything else. Here is a program that divides a sentence into words:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
 
int main (int argc, char *argv[])
{
    int length_of_word;
    int i;
    int length_of_sentence;
    char p[256];
    char *q;
 
    strcpy (p, "hello there, my name is fred.");
 
    length_of_sentence = strlen (p);
 
    length_of_word = 0;
 
    for (i = 0; i <= length_of_sentence; i++) {
        if (p[i] == ' ' || i == length_of_sentence) {
            q = malloc (length_of_word + 1);
            if (q == 0) {
                perror ("malloc failed");
                abort ();
            }
            strncpy (q, p + i - length_of_word, length_of_word);
            q[length_of_word] = 0;
            printf ("word: %s\n", q);
            free (q);
            length_of_word = 0;
        } else {
            length_of_word = length_of_word + 1;
        }
    }
    return 0;
}

Here we introduce three more standard C library functions. strcpy stands for string copy. It copies bytes from one place to another sequentially, until it reaches a zero byte (i.e., the end of string). Line 13 of this program copies text into the character array p, which is called the target of the copy.

strlen stands for string length. It determines the length of a string, which is just a count of the number of characters up to the null character.

We need to loop over the length of the sentence. The variable i indicates the current position in the sentence.

Line 20 says that if we find a character 32 (denoted by ' '), we know we have reached a word boundary. We also know that the end of the sentence is a word boundary even though there may not be a space there. The token || means OR. At this point we can allocate memory for the current word and copy the word into that memory. The strncpy function is useful for this. It copies a string, but only up to a limit of length_of_word characters (the last argument). Like strcpy, the first argument is the target, and the second argument is the place to copy from.

To calculate the position of the start of the last word, we use p + i - length_of_word. This means that we are adding i to the memory location p and then going back length_of_word counts thereby pointing strncpy to the exact position.

Finally, we null-terminate the string on Line 27. We can then print q, free the used memory, and begin with the next word.

For a complete list of string operations, see string(3).
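
The same word-boundary logic can be used without allocating anything, for instance to count words. The sketch below also spells out what strlen does by hand. Both function names, count_chars and count_words, are hypothetical. Unlike the listing above, this version skips the empty "words" that appear between two consecutive spaces.

```c
/* What strlen does, written out: count characters up to, but not
   including, the terminating zero byte. */
int count_chars (char *s)
{
    int n;
    n = 0;
    while (s[n] != 0) {         /* != means "not equal to" */
        n = n + 1;
    }
    return n;
}

/* Counts the words in a sentence, using a space or the end of the
   string as a word boundary. */
int count_words (char *p)
{
    int i;
    int length_of_word;
    int length_of_sentence;
    int words;

    length_of_sentence = count_chars (p);
    length_of_word = 0;
    words = 0;

    for (i = 0; i <= length_of_sentence; i++) {
        if (p[i] == ' ' || i == length_of_sentence) {
            if (length_of_word > 0) {
                words = words + 1;
            }
            length_of_word = 0;
        } else {
            length_of_word = length_of_word + 1;
        }
    }
    return words;
}
```

In a real program you would call strlen from string.h instead of count_chars; the hand-written version is here only to show that a string's length is found by searching for the zero byte.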

22.1.7 File operations

Under most programming languages, file operations involve three steps: opening a file, reading or writing to the file, and then closing the file. You use the command fopen to tell the operating system that you are ready to begin working with a file.

The following program opens a file and spits it out on the terminal:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
 
int main (int argc, char *argv[])
{
    int c;
    FILE *f;
 
    f = fopen ("mytest.c", "r");
    if (f == 0) {
        perror ("fopen");
        return 1;
    }
    for (;;) {
        c = fgetc (f);
        if (c == -1)
            break;
        printf ("%c", c);
    }
    fclose (f);
    return 0;
}

A new type is presented here: FILE *. It is a file operations variable that must be initialized with fopen before it can be used. The fopen function takes two arguments: the first is the name of the file, and the second is a string explaining how we want to open the file--in this case "r" means reading from the start of the file. Other options are "w" for writing and several more described in fopen(3).

If the return value of fopen is zero, it means that fopen has failed. The perror function then prints a textual error message (for example, No such file or directory). It is essential to check the return value of all library calls in this way. These checks will constitute about one third of your C program.

The command fgetc gets a character from the file. It retrieves consecutive bytes from the file until it reaches the end of the file, when it returns a -1 (the value of the EOF macro on Linux). The break statement says to immediately terminate the for loop, whereupon execution will continue from line 21, the fclose call. break statements can appear inside while loops as well.

You will notice that the for statement is empty. This is allowable C code and means to loop forever.

Some other file functions are fread, fwrite, fputc, fprintf, and fseek. See fwrite(3), fputc(3), fprintf(3), and fseek(3).
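
Writing follows the same open, use, close pattern as reading. As a sketch, the function below opens a file with "w" and writes a few lines with fprintf, which works like printf but takes the FILE * as its first argument. The name write_squares and its return convention (0 for success, -1 for failure) are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes the squares of 1..n to the named file,
   one per line. Returns 0 on success and -1 on failure. */
int write_squares (char *filename, int n)
{
    FILE *f;
    int i;

    f = fopen (filename, "w");      /* "w" creates or truncates the file */
    if (f == 0) {
        perror ("fopen");
        return -1;
    }
    for (i = 1; i <= n; i++) {
        fprintf (f, "%d squared is %d\n", i, i * i);
    }
    if (fclose (f) != 0) {          /* fclose can fail too, e.g. disk full */
        perror ("fclose");
        return -1;
    }
    return 0;
}
```

Note that opening an existing file with "w" discards its previous contents, so try this on a scratch file name.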

22.1.8 Reading command-line arguments inside C programs

By now, you are probably wondering what the (int argc, char *argv[]) are for. These are the command-line arguments passed to the program by the shell. argc is the total number of command-line arguments, and argv is an array of strings of each argument. Printing them out is easy:

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
 
int main (int argc, char *argv[])
{
    int i;
    for (i = 0; i < argc; i++) {
        printf ("argument %d is %s\n", i, argv[i]);
    }
    return 0;
}
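
Programs usually do more with their arguments than print them. A common task is checking whether a particular option was given. The sketch below uses strcmp from string.h, which returns zero when two strings are equal. The function name has_flag is hypothetical, and the search starts at argv[1] because argv[0] conventionally holds the name of the program itself.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if any argument after the program
   name is exactly equal to flag, and 0 otherwise. */
int has_flag (int argc, char *argv[], char *flag)
{
    int i;
    for (i = 1; i < argc; i++) {
        if (strcmp (argv[i], flag) == 0) {
            return 1;
        }
    }
    return 0;
}
```

Inside main you might then write if (has_flag (argc, argv, "-v")) { ... } to switch on verbose output.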

22.1.9 A more complicated example

Here we put this all together in a program that reads in lots of files and dumps them as words. Here are some new notations you will encounter: != is the inverse of == and tests if not-equal-to; realloc reallocates memory--it resizes an old block of memory so that any bytes of the old block are preserved; \n, \t mean the newline character, 10, or the tab character, 9, respectively (see ascii(7)).

 
 
 
 
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
 
void word_dump (char *filename)
{
    int length_of_word;
    int amount_allocated;
    char *q;
    FILE *f;
    int c;
 
    c = 0;
 
    f = fopen (filename, "r");
    if (f == 0) {
        perror ("fopen failed");
        exit (1);
    }
 
    length_of_word = 0;
 
    amount_allocated = 256;
    q = malloc (amount_allocated);
    if (q == 0) {
        perror ("malloc failed");
        abort ();
    }
 
    while (c != -1) {
        if (length_of_word >= amount_allocated) {
            amount_allocated = amount_allocated * 2;
            q = realloc (q, amount_allocated);
            if (q == 0) {
                perror ("realloc failed");
                abort ();
            }
        }
 
        c = fgetc (f);
        q[length_of_word] = c;
 
        if (c == -1 || c == ' ' || c == '\n' || c == '\t') {
            if (length_of_word > 0) {
                q[length_of_word] = 0;
                printf ("%s\n", q);
            }
            amount_allocated = 256;
            q = realloc (q, amount_allocated);
            if (q == 0) {
                perror ("realloc failed");
                abort ();
            }
            length_of_word = 0;
        } else {
            length_of_word = length_of_word + 1;
        }
    }
 
    fclose (f);
}
 
int main (int argc, char *argv[])
{
    int i;
 
    if (argc < 2) {
        printf ("Usage:\n\twordsplit <filename> ...\n");
        exit (1);
    }
 
    for (i = 1; i < argc; i++) {
        word_dump (argv[i]);
    }
 
    return 0;
}

This program is more complicated than you might immediately expect. Reading in a file where we are sure that a word will never exceed 30 characters is simple. But what if we have a file that contains some words that are 100,000 characters long? GNU programs are expected to behave correctly under these circumstances.

To cope with normal as well as extreme circumstances, we start off assuming that a word will never be more than 256 characters. If it appears that the word is growing over 256 characters, we reallocate the memory space to double its size (lines 32 and 33). When we start with a new word, we can free up memory again, so we realloc back to 256 again (lines 48 and 49). In this way we are using the minimum amount of memory at each point in time.

We have hence created a program that can work efficiently with a 100-gigabyte file just as easily as with a 100-byte file. This is part of the art of C programming.

Experienced C programmers may actually scoff at the above listing because it really isn't as "minimalistic" as is absolutely possible. In fact, it is a truly excellent listing for the following reasons:

Readability in C is your first priority--it is imperative that what you do is obvious to anyone reading the code.
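
The word_dump listing assigns the result of realloc straight back to q and aborts if it is zero, which is fine for a program that gives up on failure. A program that wants to carry on needs to keep the old pointer, because a failed realloc leaves the old block allocated. The following is a minimal sketch of that variation, not a change to the listing. The name grow_buffer is hypothetical, and it uses pointers to the caller's variables (char **q and int *amount_allocated), a notation this chapter has not introduced.

```c
#include <stdlib.h>

/* Hypothetical helper: doubles the block that *q points to.
   Returns 0 on success. Returns -1 on failure, in which case *q and
   *amount_allocated are unchanged and the old block is still valid. */
int grow_buffer (char **q, int *amount_allocated)
{
    char *bigger;

    bigger = realloc (*q, *amount_allocated * 2);
    if (bigger == 0) {
        return -1;                  /* old block untouched, nothing leaked */
    }
    *q = bigger;                    /* old bytes are preserved in the new block */
    *amount_allocated = *amount_allocated * 2;
    return 0;
}
```

A caller holding q and amount_allocated would write grow_buffer (&q, &amount_allocated) and check the return value, in keeping with the rule of checking every library call.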

22.1.10 #include statements and prototypes

At the start of each program will be one or more #include statements. These tell the compiler to read in another C program. Now, "raw" C does not have a whole lot in the way of protecting against errors: for example, the strcpy function could just as well be used with one, three, or four arguments, and the C program would still compile. It would, however, wreak havoc with the internal memory and cause the program to crash. These other .h C programs are called header files. They contain templates for how functions are meant to be called. Every function you might like to use is contained in one or another template file. The templates are called function prototypes. [C++ has something called "templates." This is a special C++ term having nothing to do with the discussion here.]

A function prototype is written the same as the function itself, but without the code. A function prototype for word_dump would simply be:

 
void word_dump (char *filename);

The trailing ; is essential and distinguishes a function prototype from a function.

After a function prototype is defined, any attempt to use the function in a way other than intended--say, passing it too few arguments or arguments of the wrong type--will be met with fierce opposition from gcc.

You will notice that the #include <string.h> appeared when we started using string operations. Recompiling these programs without the #include <string.h> line gives the warning message

 
mytest.c:21: warning: implicit declaration of function `strncpy'

which is quite to the point.

The function prototypes give a clear definition of how every function is to be used. Man pages will always first state the function prototype so that you are clear on what arguments are to be passed and what types they should have.
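
Prototypes are also useful inside a single file, when a function is called before the point where its code appears. In this small sketch (sum_of_squares is a hypothetical name; sqr is the function used as an example in the next section), the prototype at the top is what allows sum_of_squares to call sqr with checked arguments.

```c
/* Prototype: tells the compiler how sqr is to be called before its
   code has been seen. Note the trailing semicolon. */
int sqr (int x);

/* Uses sqr even though sqr's code only appears further down. */
int sum_of_squares (int a, int b)
{
    return sqr (a) + sqr (b);
}

/* The function itself: same first line as the prototype, followed by
   code in curly braces instead of a semicolon. */
int sqr (int x)
{
    return x * x;
}
```

Header files do the same job across files: the .h file carries the prototypes, and each .c file that includes it gets the same checking.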

22.1.11 C comments

A C comment is denoted with /* <comment lines> */ and can span multiple lines. Anything between the /* and */ is ignored. Every function should be commented, and all nonobvious code should be commented. It is a good maxim that a program that needs lots of comments to explain it is badly written. Also, never comment the obvious, and explain why you do things rather than what you are doing. It is advisable not to make pretty graphics between each function, so rather:

 
 
 
 
/* returns -1 on error, takes a positive integer */
int sqr (int x)
{
    <...>

than

 
 
 
 
/***************************----SQR----******************************
 *                x = argument to make the square of                *
 *    return value  =                                               *
 *                         -1 (on error)                            *
 *                         square of x (on success)                 *
 ********************************************************************/
int sqr (int x)
{
    <...>

which is liable to cause nausea. In C++, the additional comment // is allowed, whereby everything between the // and the end of the line is ignored. It is accepted under gcc and has been part of standard C since C99, but should not be used in code that must also compile with older C compilers. In addition, programmers often "comment out" lines by placing a #if 0 ... #endif around them, which really does exactly the same thing as a comment (see Section 22.1.12) but allows you to have comments within comments. For example

 
 
 
 
    int x;
    x = 10;
#if 0
    printf ("debug: x is %d\n", x);     /* print debug information */
#endif
    y = x + 10;
    <...>

comments out Line 4.

22.1.12 #define and #if -- C macros

Anything starting with a # is not actually C, but a C preprocessor directive. A C program is first run through a preprocessor that removes all spurious junk, like comments, #include statements, and anything else beginning with a #. You can make C programs much more readable by defining macros instead of literal values. For instance,

 
#define START_BUFFER_SIZE 256

in our example program, #defines the text START_BUFFER_SIZE to be the text 256. Thereafter, wherever in the C program we have a START_BUFFER_SIZE, the text 256 will be seen by the compiler, and we can use START_BUFFER_SIZE instead. This is a much cleaner way of programming because, if, say, we would like to change the 256 to some other value, we only need to change it in one place. START_BUFFER_SIZE is also more meaningful than a number, making the program more readable.

Whenever you have a literal constant like 256, you should replace it with a macro defined near the top of your program.

You can also check for the existence of macros with the #ifdef and #ifndef directives. # directives are really a programming language all on their own:

 
 
 
 
/* Set START_BUFFER_SIZE to fine-tune performance before compiling: */
#define START_BUFFER_SIZE 256
/* #define START_BUFFER_SIZE 128 */
/* #define START_BUFFER_SIZE 1024 */
/* #define START_BUFFER_SIZE 16384 */
 
#ifndef START_BUFFER_SIZE
#error This code did not define START_BUFFER_SIZE. Please edit
#endif
 
#if START_BUFFER_SIZE <= 0
#error Wooow! START_BUFFER_SIZE must be greater than zero
#endif
 
#if START_BUFFER_SIZE < 16
#warning START_BUFFER_SIZE too small, program may be inefficient
#elif START_BUFFER_SIZE > 65536
#warning START_BUFFER_SIZE too large, program may be inefficient
#else
/* START_BUFFER_SIZE is ok, do not report */
#endif
 
void word_dump (char *filename)
{
    <...>
    amount_allocated = START_BUFFER_SIZE;
    q = malloc (amount_allocated);
    <...>
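
A related idiom uses #ifndef to supply a default only when no value has been given, so that the size can be chosen at compile time with gcc's -D option (for example, gcc -DSTART_BUFFER_SIZE=1024 ...) without editing the source. This is a sketch of that idiom and not part of the original program; the function name new_buffer is hypothetical.

```c
#include <stdlib.h>

/* Use the value given on the gcc command line if there is one;
   otherwise fall back to a default of 256. */
#ifndef START_BUFFER_SIZE
#define START_BUFFER_SIZE 256
#endif

/* Hypothetical helper: allocates the initial buffer and records its
   size in *amount_allocated. Returns 0 if malloc fails. */
char *new_buffer (int *amount_allocated)
{
    char *q;

    q = malloc (START_BUFFER_SIZE);
    if (q == 0) {
        return 0;
    }
    *amount_allocated = START_BUFFER_SIZE;
    return q;
}
```

Because the preprocessor substitutes the text before the compiler proper runs, the compiler only ever sees malloc (256), or whatever number was chosen.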

22.2 Debugging with gdb and strace

Programming errors, or bugs, can be found by inspecting program execution. Some developers claim that the need for such inspection implies a sloppy development process. Nonetheless it is instructive to learn C by actually watching a program work.

22.2.1 gdb

The GNU debugger, gdb, is a replacement for the standard UNIX debugger, db. To debug a program means to step through its execution line-by-line, in order to find programming errors as they happen. Use the command gcc -Wall -g -O0 -o wordsplit wordsplit.c to recompile your program above. The -g option enables debugging support in the resulting executable and the -O0 option disables compiler optimization (which sometimes causes confusing behavior). For the following example, create a test file readme.txt with some plain text inside it. You can then run gdb -q wordsplit. The standard gdb prompt will appear, which indicates the start of a debugging session:

 
(gdb) 

At the prompt, many one-letter commands are available to control program execution. The first of these is run, which executes the program as though it had been started from a regular shell:

 
 
 
 
(gdb) r
Starting program: /home/src/wordsplit/wordsplit 
Usage:
        wordsplit <filename> ...
 
Program exited with code 01.

Obviously, we will want to set some trial command-line arguments. This is done with the special command, set args:

 
(gdb) set args readme.txt readme2.txt

The break command is used like b [[<file>:]<line>|<function>], and sets a break point at a function or line number:

 
 
(gdb) b main
Breakpoint 1 at 0x8048796: file wordsplit.c, line 67.

A break point will interrupt execution of the program. In this case the program will stop when it enters the main function (i.e., right at the start). Now we can run the program again:

 
 
 
 
(gdb) r
Starting program: /home/src/wordsplit/wordsplit readme.txt readme2.txt
 
Breakpoint 1, main (argc=3, argv=0xbffff804) at wordsplit.c:67
67          if (argc < 2) {
(gdb) 

As specified, the program stops at the beginning of the main function at line 67.

If you are interested in viewing the contents of a variable, you can use the print command:

 
 
 
 
(gdb) p argc
$1 = 3
(gdb) p argv[1]
$2 = 0xbffff988 "readme.txt"

which tells us the value of argc and argv[1]. The list command displays the lines about the current line:

 
 
 
 
(gdb) l
63      int main (int argc, char *argv[])
64      {
65          int i;
66      
67          if (argc < 2) {
68              printf ("Usage:\n\twordsplit <filename> ...\n");
69              exit (1);
70          }

The list command can also take an optional file and line number (or even a function name):

 
 
 
 
(gdb) l wordsplit.c:1
1       #include <stdlib.h>
2       #include <stdio.h>
3       #include <string.h>
4       
5       void word_dump (char *filename)
6       {
7           int length_of_word;
8           int amount_allocated;

Next, we can try setting a break point at an arbitrary line and then using the continue command to proceed with program execution:

 
 
 
 
(gdb) b wordsplit.c:48
Breakpoint 2 at 0x804873e: file wordsplit.c, line 48.
(gdb) c
Continuing.
Zaphod
 
Breakpoint 2, word_dump (filename=0xbffff988 "readme.txt") at wordsplit.c:48
48                  amount_allocated = 256;

Execution obediently stops at line 48. At this point it is useful to run a back trace. This prints out the current stack which shows the functions that were called to get to the current line. This output allows you to trace the history of execution.

(gdb) bt
#0  word_dump (filename=0xbffff988 "readme.txt") at wordsplit.c:48
#1  0x80487e0 in main (argc=3, argv=0xbffff814) at wordsplit.c:73
#2  0x4003db65 in __libc_start_main (main=0x8048790 <main>, argc=3, ubp_av=0xbf
fff814, init=0x8048420 <_init>, 
    fini=0x804883c <_fini>, rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffff8
0c) at ../sysdeps/generic/libc-start.c:111

The clear command then deletes the break point at the current line:

(gdb) clear
Deleted breakpoint 2 

The most important commands for debugging are the next and step commands, abbreviated n and s. The n command simply executes one line of C code:

(gdb) n
49                  q = realloc (q, amount_allocated);
(gdb) n
50                  if (q == 0) {
(gdb) n
54                  length_of_word = 0;

This activity is called stepping through your program. The s command is identical to n except that it dives into functions instead of running them as a single line. To see the difference, step over line 73 first with n, and then with s, as follows:

(gdb) set args readme.txt readme2.txt
(gdb) b main
Breakpoint 1 at 0x8048796: file wordsplit.c, line 67.
(gdb) r
Starting program: /home/src/wordsplit/wordsplit readme.txt readme2.txt
 
Breakpoint 1, main (argc=3, argv=0xbffff814) at wordsplit.c:67
67          if (argc < 2) {
(gdb) n
72          for (i = 1; i < argc; i++) {
(gdb) n
73              word_dump (argv[i]);
(gdb) n
Zaphod
has
two
heads
72          for (i = 1; i < argc; i++) {
(gdb) s
73              word_dump (argv[i]);
(gdb) s
word_dump (filename=0xbffff993 "readme2.txt") at wordsplit.c:13
13          c = 0;
(gdb) s
15          f = fopen (filename, "r");
(gdb) 

An interesting feature of gdb is its ability to attach to running programs. Try the following sequence of commands:

[root@cericon]# lpd
[root@cericon]# ps awx | grep lpd
28157 ?        S      0:00 lpd Waiting
28160 pts/6    S      0:00 grep lpd
[root@cericon]# gdb -q /usr/sbin/lpd
(no debugging symbols found)...
(gdb) attach 28157
Attaching to program: /usr/sbin/lpd, Pid 28157
0x40178bfe in __select () from /lib/libc.so.6
(gdb) 

The lpd daemon was not compiled with debugging support, but the point is still made: you can halt and debug any running process on the system. Try running a bt for fun. Now release the process with

(gdb) detach
Detaching from program: /usr/sbin/lpd, Pid 28157

The debugger provides copious amounts of online help. The help command can be run to explain further. The gdb info pages also elaborate on an enormous number of display features and tracing features not covered here.

22.2.2 Examining core files

If your program has a segmentation violation (``segfault'') then a core file will be written to the current directory. This is known as a core dump. A core dump is caused by a bug in the program: it is the response to a SIGSEGV signal, which is sent to the program when it tries to access an area of memory outside of its allowed range. These files can be examined using gdb to (usually) reveal where the problem occurred. Simply run gdb <executable> ./core and then type bt (or any gdb command) at the gdb prompt. Typing file ./core will reveal something like

/root/core: ELF 32-bit LSB core file of '<executable>' (signal 11), Intel 80386, version 1

22.2.3 strace

The strace command prints every system call performed by a program. A system call is a function call made by a C library function to the LINUX kernel. Try

strace ls
strace ./wordsplit

If a program has not been compiled with debugging support, the only way to inspect its execution may be with the strace command. In any case, the command can provide valuable information about where a program is failing and is useful for diagnosing errors.

22.3 C Libraries

We made reference to the Standard C library. The C language on its own does almost nothing; everything useful is an external function. External functions are grouped into libraries. The Standard C library is the file /lib/libc.so.6. To list all the C library functions, run:

nm /lib/libc.so.6
nm /lib/libc.so.6 | grep ' T ' | cut -f3 -d' ' | grep -v '^_' | sort -u | less

Many of these have man pages, but some will have no documentation and require you to read the comments inside the header files (which are often most explanatory). It is better not to use functions unless you are sure that they are standard functions in the sense that they are common to other systems.

Creating your own library is simple. Let's say we have two files that contain several functions that we would like to compile into a library. The files are simple_math_sqrt.c

#include <stdlib.h>
#include <stdio.h>
 
static int abs_error (int a, int b)
{
    if (a > b)
        return a - b;
    return b - a;
}
 
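int simple_math_isqrt (int x)
{
    int result;
    if (x < 0) {
        fprintf (stderr,
         "simple_math_sqrt: taking the sqrt of a negative number\n");
        abort ();
    }
    result = 2;
    /* Newton's method; stop once result * result is within 1 of x */
    while (abs_error (result * result, x) > 1) {
        int next = (x / result + result) / 2;
        if (next == result)     /* no progress (e.g. x = 6): stop looping */
            break;
        result = next;
    }
    return result;
}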

and simple_math_pow.c

#include <stdlib.h>
#include <stdio.h>
 
int simple_math_ipow (int x, int y)
{
    int result;
    if (x == 1 || y == 0)
        return 1;
    if (x == 0 && y < 0) {
        fprintf (stderr,
         "simple_math_pow: raising zero to a negative power\n");
        abort ();
    }
    if (y < 0)
        return 0;
    result = 1;
    while (y > 0) {
        result = result * x;
        y = y - 1;
    }
    return result;
}

We would like to call the library simple_math. It is good practice to name all the functions in the library simple_math_??????. The function abs_error is not going to be used outside of the file simple_math_sqrt.c and so we put the keyword static in front of it, meaning that it is a local function.

We can compile the code with:

gcc -Wall -c simple_math_sqrt.c
gcc -Wall -c simple_math_pow.c

The -c option means compile only. The code is not turned into an executable. The generated files are simple_math_sqrt.o and simple_math_pow.o. These are called object files.

We now need to archive these files into a library. We do this with the ar command (a predecessor of tar):

ar rc libsimple_math.a simple_math_sqrt.o simple_math_pow.o
ranlib libsimple_math.a

The ranlib command indexes the archive.

The library can now be used. Create a file mytest.c:

#include <stdlib.h>
#include <stdio.h>
 
int main (int argc, char *argv[])
{
    printf ("%d\n", simple_math_ipow (4, 3));
    printf ("%d\n", simple_math_isqrt (50));
    return 0;
}

and run

gcc -Wall -c mytest.c
gcc -o mytest mytest.o -L. -lsimple_math

The first command compiles the file mytest.c into mytest.o, and the second command links the program, which assimilates mytest.o and the libraries into a single executable. The option -L. means to look in the current directory for any libraries (usually only /lib and /usr/lib are searched). The option -lsimple_math means to assimilate the library libsimple_math.a (lib and .a are added automatically). This operation is called static [Nothing to do with the ``static'' keyword.] linking because it happens before the program is run and includes all object files into the executable.

As an aside, note that it is often the case that many static libraries are linked into the same program. Here order is important: the library with the fewest dependencies should come last, or you will get so-called symbol referencing errors.

We can also create a header file simple_math.h for using the library.

/* calculates the integer square root, aborts on error */
int simple_math_isqrt (int x);
 
/* calculates the integer power, aborts on error */
int simple_math_ipow (int x, int y);

Add the line #include "simple_math.h" to the top of mytest.c:

#include <stdlib.h>
#include <stdio.h>
#include "simple_math.h"

This addition gets rid of the implicit declaration of function warning messages. Usually #include <simple_math.h> would be used, but here the header file is our own and lives in the current directory, so we use "simple_math.h" instead of <simple_math.h>.

22.4 C Projects -- Makefiles

What if you make a small change to one of the files (as you are likely to do very often when developing)? You could script the process of compiling and linking, but the script would build everything, and not just the changed file. What we really need is a utility that only recompiles object files whose sources have changed: make is such a utility.

make is a program that looks inside a Makefile in the current directory then does a lot of compiling and linking. Makefiles contain lists of rules and dependencies describing how to build a program.

Inside a Makefile you need to state a list of what-depends-on-what dependencies that make can work through, as well as the shell commands needed to achieve each goal.

22.4.1 Completing our example Makefile

Our first (last?) dependency in the process of completing the compilation is that mytest depends on both the library, libsimple_math.a, and the object file, mytest.o. In make terms we create a Makefile line that looks like:

mytest:   libsimple_math.a mytest.o

meaning simply that the files libsimple_math.a and mytest.o must exist and be up to date before mytest is built. mytest: is called a make target. Beneath this line, we also need to state how to build mytest:

        gcc -Wall -o $@ mytest.o -L. -lsimple_math

The $@ means the name of the target itself, which is just substituted with mytest. Note that the space before the gcc is a tab character and not 8 space characters.

The next dependency is that libsimple_math.a depends on simple_math_sqrt.o simple_math_pow.o. Once again we have a dependency, along with a shell script to build the target. The full Makefile rule is:

libsimple_math.a: simple_math_sqrt.o simple_math_pow.o
        rm -f $@
        ar rc $@ simple_math_sqrt.o simple_math_pow.o
        ranlib $@

Note again that the left margin consists of a single tab character and not spaces.

The final dependency is that the files simple_math_sqrt.o and simple_math_pow.o depend on the files simple_math_sqrt.c and simple_math_pow.c. This requires two make target rules, but make has a short way of stating such a rule in the case of many C source files,

.c.o:
        gcc -Wall -c -o $*.o $<

which means that any .o files needed can be built from a .c file of a similar name by means of the command gcc -Wall -c -o $*.o $<, where $*.o means the name of the object file and $< means the name of the file that $*.o depends on, one at a time.

22.4.2 Putting it all together

Makefiles can, in fact, have their rules put in any order, so it's best to state the most obvious rules first for readability.

There is also a rule you should always state at the outset:

all:    libsimple_math.a mytest

The all: target is the rule that make tries to satisfy when make is run with no command-line arguments, because make defaults to the first target in the Makefile. This just means that libsimple_math.a and mytest are the last two files to be built, that is, they are the top-level dependencies.

Makefiles also have their own form of environment variables, like shell scripts. You can see that we have used the text simple_math in three of our rules. It makes sense to define a macro for this so that we can easily change to a different library name.

Our final Makefile is:

# Comments start with a # (hash) character like shell scripts.
# Makefile to build libsimple_math.a and mytest program.
# Paul Sheer <psheer@cranzgot.co.za> Sun Mar 19 15:56:08 2000
 
OBJS    = simple_math_sqrt.o simple_math_pow.o
LIBNAME = simple_math
CFLAGS  = -Wall
 
all:    lib$(LIBNAME).a mytest
 
mytest:   lib$(LIBNAME).a mytest.o
        gcc $(CFLAGS) -o $@ mytest.o -L. -l${LIBNAME}
 
lib$(LIBNAME).a: $(OBJS)
        rm -f $@
        ar rc $@ $(OBJS)
        ranlib $@
 
.c.o:
        gcc $(CFLAGS) -c -o $*.o $<
 
clean:
        rm -f *.o *.a mytest

We can now easily type

make

in the current directory to cause everything to be built.

You can see we have added an additional disconnected target clean:. Targets can be run explicitly on the command-line like this:

make clean

which removes all built files.

Makefiles have far more uses than just building C programs. Anything that needs to be built from sources can employ a Makefile to make things easier.
