Thursday, 1 March 2012

Chapter 4: More about Declarations (and Initialization)

4.1 Arrays

So far, we’ve been declaring simple variables: the declaration
int i;
declares a single variable, named i, of type int. It is also possible to declare an array of several elements. The declaration
int a[10];
declares an array, named a, consisting of ten elements, each of type int. Simply speaking, an array is a variable that can hold more than one value. You specify which of the several values you’re referring to at any given time by using a numeric subscript. (Arrays in programming are similar to vectors or matrices in mathematics.) We can picture the array a above as a row of ten boxes, one for each element.

In C, arrays are zero-based: the ten elements of a 10-element array are numbered from 0 to 9. The subscript which specifies a single element of an array is simply an integer expression in square brackets. The first element of the array is a[0], the second element is a[1], etc. You can use these “array subscript expressions” anywhere you can use the name of a simple variable, for example:
a[0] = 10;
a[1] = 20;
a[2] = a[0] + a[1];
Notice that the subscripted array references (i.e. expressions such as a[0] and a[1]) can appear on either side of the assignment operator.

The subscript does not have to be a constant like 0 or 1; it can be any integral expression. For example, it’s common to loop over all elements of an array:
int i;

for(i = 0; i < 10; i = i + 1)
    a[i] = 0;
This loop sets all ten elements of the array a to 0.

Arrays are a real convenience for many problems, but there is not a lot that C will do with them for you automatically. In particular, you can neither set all elements of an array at once nor assign one array to another; both of the assignments
a = 0;  /* WRONG */
and
int b[10];

b = a;  /* WRONG */
are illegal.

To set all of the elements of an array to some value, you must do so one by one, as in the loop example above. To copy the contents of one array to another, you must again do so one by one:
int b[10];

for(i = 0; i < 10; i = i + 1)
    b[i] = a[i];
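As a minimal sketch of the element-by-element copy described above, the loop can be wrapped in a small helper function. The names copy_ints and copy_ints_memcpy and their parameters are hypothetical, chosen only for illustration. The second version assumes you are willing to use the standard library function memcpy (declared in <string.h>), which copies a given number of bytes and so has the same effect here.

```c
#include <stddef.h>
#include <string.h>

/* copy_ints (hypothetical helper): copies the first n elements of src
   into dst, one element at a time. dst must have room for n ints. */
void copy_ints(int dst[], const int src[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i = i + 1)
        dst[i] = src[i];
}

/* copy_ints_memcpy (hypothetical helper): the same copy, done by the
   standard library. memcpy counts bytes, hence the multiplication. */
void copy_ints_memcpy(int dst[], const int src[], size_t n)
{
    memcpy(dst, src, n * sizeof src[0]);
}
```

Either helper would be called as copy_ints(b, a, 10) for the two ten-element arrays used in this section.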
Remember that for an array declared
int a[10];
there is no element a[10]; the topmost element is a[9]. This is one reason that zero-based loops are also common in C. Note that the for loop
for(i = 0; i < 10; i = i + 1) ...
does just what you want in this case: it starts at 0, the number 10 suggests (correctly) that it goes through 10 iterations, but the less-than comparison means that the last trip through the loop has i set to 9. (The comparison i <= 9 would also work, but it would be less clear and therefore poorer style.)

In the little examples so far, we’ve always looped over all 10 elements of the sample array a. It’s common, however, to use an array that’s bigger than necessarily needed, and to use a second variable to keep track of how many elements of the array are currently in use. For example, we might have an integer variable
int na;  /* number of elements of a[] in use */
Then, when we wanted to do something with a (such as print it out), the loop would run from 0 up to (but not including) na, rather than 10 (or whatever a’s size was):
for(i = 0; i < na; i = i + 1)
    printf("%d\n", a[i]);
Naturally, we would have to ensure that na’s value was always less than or equal to the number of elements actually declared in a.
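One way to keep na within bounds is to funnel every addition to the array through a single function that checks the limit first. The following is only a sketch under assumptions: CAPACITY, append_value, and sum_in_use are hypothetical names, and the array and its counter are declared outside of any function so that both helpers can reach them.

```c
#define CAPACITY 10     /* declared size of a[] (assumed for this sketch) */

int a[CAPACITY];        /* only the first na elements are meaningful */
int na = 0;             /* number of elements of a[] in use */

/* append_value (hypothetical helper): stores v in the next free slot.
   Returns 1 on success, or 0 if the array is already full. */
int append_value(int v)
{
    if (na >= CAPACITY)
        return 0;
    a[na] = v;
    na = na + 1;
    return 1;
}

/* sum_in_use (hypothetical helper): adds up only the elements in use. */
int sum_in_use(void)
{
    int i;
    int total = 0;

    for (i = 0; i < na; i = i + 1)
        total = total + a[i];
    return total;
}
```

Because append_value refuses to store an eleventh value, na can never exceed the declared size of the array.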

Arrays are not limited to type int; you can have arrays of char or double or any other type.
Here is a slightly larger example of the use of arrays. Suppose we want to investigate the behavior of rolling a pair of dice. The total roll can be anywhere from 2 to 12, and we want to count how often each roll comes up. We will use an array to keep track of the counts: a[2] will count how many times we’ve rolled 2, etc.
We’ll simulate the roll of a die by calling C’s random number generation function, rand(). Each time you call rand(), it returns a different, pseudo-random integer. The values that rand() returns typically span a large range, so we’ll use C’s modulus (or “remainder”) operator % to produce random numbers in the range we want. The expression rand() % 6 produces random numbers in the range 0 to 5, and rand() % 6 + 1 produces random numbers in the range 1 to 6.
Here is the program:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int i;
    int d1, d2;
    int a[13];    /* uses [2..12] */

    /* clear the counts */
    for(i = 2; i <= 12; i = i + 1)
        a[i] = 0;

    /* roll the pair of dice 100 times */
    for(i = 0; i < 100; i = i + 1)
    {
        d1 = rand() % 6 + 1;
        d2 = rand() % 6 + 1;
        a[d1 + d2] = a[d1 + d2] + 1;
    }

    /* print how often each total came up */
    for(i = 2; i <= 12; i = i + 1)
        printf("%d: %d\n", i, a[i]);

    return 0;
}
We include the header <stdlib.h> because it contains the necessary declarations for the rand() function. We declare the array of size 13 so that its highest element will be a[12]. (We’re wasting a[0] and a[1]; this is no great loss.) The variables d1 and d2 contain the rolls of the two individual dice; we add them together to decide which cell of the array to increment, in the line
a[d1 + d2] = a[d1 + d2] + 1;
After 100 rolls, we print the array out. Typically (as craps players well know), we’ll see mostly 7’s, and relatively few 2’s and 12’s.

(By the way, it turns out that using the % operator to reduce the range of the rand function is not always a good idea. We’ll say more about this problem in an exercise.)
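One commonly cited concern with rand() % 6 is that, when the number of distinct values rand() can return is not a multiple of 6, some remainders come up slightly more often than others. The sketch below shows one possible way around that: discard the few leftover values and try again. It is not necessarily the problem or the remedy the exercise has in mind, and roll_die is a hypothetical name.

```c
#include <stdlib.h>

/* roll_die (hypothetical helper): returns a value from 1 to 6.
   rand() results at or above the largest multiple of 6 that fits in
   its range are thrown away, so every face corresponds to the same
   number of rand() values. */
int roll_die(void)
{
    unsigned long range = (unsigned long)RAND_MAX + 1UL; /* count of possible rand() values */
    unsigned long limit = range - range % 6UL;           /* largest multiple of 6 <= range */
    unsigned long r;

    do {
        r = (unsigned long)rand();
    } while (r >= limit);                                /* retry on a leftover value */

    return (int)(r % 6UL) + 1;
}
```

In the dice program, d1 = roll_die(); could stand in for d1 = rand() % 6 + 1; without any other change.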
4.1.1 Array Initialization
4.1.2 Arrays of Arrays (“Multidimensional” Arrays)

4.2 Visibility and Lifetime (Global Variables, etc.)

We haven’t said so explicitly, but variables are channels of communication within a program. You set a variable to a value at one point in a program, and at another point (or points) you read the value out again. The two points may be in adjoining statements, or they may be in widely separated parts of the program.
How long does a variable last? How widely separated can the setting and fetching parts of the program be, and how long after a variable is set does it persist? Depending on the variable and how you’re using it, you might want different answers to these questions.
The visibility of a variable determines how much of the rest of the program can access that variable. You can arrange that a variable is visible only within one part of one function, or in one function, or in one source file, or anywhere in the program. (We haven’t really talked about source files yet; we’ll be exploring them soon.)
Why would you want to limit the visibility of a variable? For maximum flexibility, wouldn’t it be handy if all variables were potentially visible everywhere? As it happens, that arrangement would be too flexible: everywhere in the program, you would have to keep track of the names of all the variables declared anywhere else in the program, so that you didn’t accidentally re-use one. Whenever a variable had the wrong value by mistake, you’d have to search the entire program for the bug, because any statement in the entire program could potentially have modified that variable. You would constantly be stepping all over yourself by using a common variable name like i in two parts of your program, and having one snippet of code accidentally overwrite the values being used by another part of the code. The communication would be sort of like an old party line–you’d always be accidentally interrupting other conversations, or having your conversations interrupted.
To avoid this confusion, we generally give variables the narrowest or smallest visibility they need. A variable declared within the braces {} of a function is visible only within that function; variables declared within functions are called local variables. If another function somewhere else declares a local variable with the same name, it’s a different variable entirely, and the two don’t clash with each other.
On the other hand, a variable declared outside of any function is a global variable, and it is potentially visible anywhere within the program. You use global variables when you do want the communications path to be able to travel to any part of the program. When you declare a global variable, you will usually give it a longer, more descriptive name (not something generic like i) so that whenever you use it you will remember that it’s the same variable everywhere.
Another word for the visibility of variables is scope.
How long do variables last? By default, local variables (those declared within a function) have automatic duration: they spring into existence when the function is called, and they (and their values) disappear when the function returns. Global variables, on the other hand, have static duration: they last, and the values stored in them persist, for as long as the program does. (Of course, the values can in general still be overwritten, so they don’t necessarily persist forever.)
Finally, it is possible to split a program up into several source files, for easier maintenance. When several source files are combined into one program (we’ll be seeing how in the next chapter) the compiler must have a way of correlating the global variables which might be used to communicate between the several source files. Furthermore, if a global variable is going to be useful for communication, there must be exactly one of it: you wouldn’t want one function in one source file to store a value in one global variable named globalvar, and then have another function in another source file read from a different global variable named globalvar. Therefore, a global variable should have exactly one defining instance, in one place in one source file. If the same variable is to be used anywhere else (i.e. in some other source file or files), the variable is declared in those other file(s) with an external declaration, which is not a defining instance. The external declaration says, “hey, compiler, here’s the name and type of a global variable I’m going to use, but don’t define it here, don’t allocate space for it; it’s one that’s defined somewhere else, and I’m just referring to it here.” If you accidentally have two distinct defining instances for a variable of the same name, the compiler (or the linker) will complain that it is “multiply defined.”
It is also possible to have a variable which is global in the sense that it is declared outside of any function, but private to the one source file it’s defined in. Such a variable is visible to the functions in that source file but not to any functions in any other source files, even if they try to issue a matching declaration.
You get any extra control you might need over visibility and lifetime, and you distinguish between defining instances and external declarations, by using storage classes. A storage class is an extra keyword at the beginning of a declaration which modifies the declaration in some way. Generally, the storage class (if any) is the first word in the declaration, preceding the type name. (Strictly speaking, this ordering has not traditionally been necessary, and you may see some code with the storage class, type name, and other parts of a declaration in an unusual order.)
We said that, by default, local variables had automatic duration. To give them static duration (so that, instead of coming and going as the function is called, they persist for as long as the program does), you precede their declaration with the static keyword:
static int i;

By default, a declaration of a global variable (especially if it specifies an initial value) is the defining instance. To make it an external declaration, of a variable which is defined somewhere else, you precede it with the keyword extern:
extern int j;
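Here is a small sketch of an external declaration and a defining instance working together. In a real program the two would usually sit in different source files; to stay self-contained, this sketch puts both in one file, which C also permits. The names shared_total and add_to_total are hypothetical.

```c
/* External declaration: gives the name and type of the variable,
   but does not allocate any space for it. */
extern int shared_total;

/* add_to_total (hypothetical helper): uses the variable knowing only
   its external declaration. */
int add_to_total(int n)
{
    shared_total = shared_total + n;
    return shared_total;
}

/* Defining instance: the one place where the variable is actually
   created (and, here, given an initial value). */
int shared_total = 0;
```

If the defining instance were missing from the whole program, the function would still compile, but the linker would typically report shared_total as undefined.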

Finally, to arrange that a global variable is visible only within its containing source file, you precede it with the static keyword:
static int k;

Notice that the static keyword can do two different things: it adjusts the duration of a local variable from automatic to static, or it adjusts the visibility of a global variable from truly global to private-to-the-file.
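The two uses of static can be seen side by side in the following sketch. All of the names here (total_requests, next_ticket, requests_so_far) are hypothetical and chosen only for illustration.

```c
/* static on a global: visible to every function in this source file,
   but private to the file. */
static int total_requests;

/* next_ticket (hypothetical helper): hands out 1, 2, 3, ... */
int next_ticket(void)
{
    static int last_ticket;   /* static on a local: keeps its value between calls */

    last_ticket = last_ticket + 1;
    total_requests = total_requests + 1;
    return last_ticket;
}

/* requests_so_far (hypothetical helper): reads the file-private global. */
int requests_so_far(void)
{
    return total_requests;
}
```

Successive calls to next_ticket() return 1, then 2, then 3, because last_ticket is not recreated on each call. No other function can touch last_ticket, and no other source file can touch total_requests.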
To summarize, we’ve talked about two different attributes of a variable: visibility and duration. These are orthogonal, as shown in this table:


visibility    automatic duration        static duration
local         normal local variables    static local variables
global        N/A                       normal global variables
We can also distinguish between file-scope global variables and truly global variables, based on the presence or absence of the static keyword.
We can also distinguish between external declarations and defining instances of global variables, based on the presence or absence of the extern keyword.

4.3 Default Initialization

The duration of a variable (whether static or automatic) also affects its default initialization.
If you do not explicitly initialize them, automatic-duration variables (that is, local, non-static ones) are not guaranteed to have any particular initial value; they will typically contain garbage. It is therefore a fairly serious error to attempt to use the value of an automatic variable which has never been initialized or assigned to: the program will either work incorrectly, or the garbage value may just happen to be “correct” such that the program appears to work correctly! However, the particular value that the garbage takes on can vary depending literally on anything: other parts of the program, which compiler was used, which hardware or operating system the program is running on, the time of day, the phase of the moon. (Okay, maybe the phase of the moon is a bit of an exaggeration.) So you hardly want to say that a program which uses an uninitialized variable “works”; it may seem to work, but it works for the wrong reason, and it may stop working tomorrow.
Static-duration variables (global and static local), on the other hand, are guaranteed to be initialized to 0 if you do not use an explicit initializer in the definition.
(Once upon a time, there was another distinction between the initialization of automatic vs. static variables: you could initialize aggregate objects, such as arrays, only if they had static duration. If your compiler complains when you try to initialize a local array, it’s probably an old, pre-ANSI compiler. Modern, ANSI-compatible compilers remove this limitation, so it’s no longer much of a concern.)
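The guarantees described in this section can be checked with a short sketch. The names below are hypothetical. Note that the sketch deliberately never reads an uninitialized automatic variable, since doing so is exactly the error warned about above.

```c
int zeroed_global[4];   /* static duration, no initializer: every element starts at 0 */

/* sum_static_defaults (hypothetical helper): adds up static-duration
   arrays that have never been assigned to. */
int sum_static_defaults(void)
{
    static int zeroed_local[4];   /* static local: also starts out all zeros */
    int total = 0;
    int i;

    for (i = 0; i < 4; i = i + 1)
        total = total + zeroed_global[i] + zeroed_local[i];
    return total;
}

/* sum_initialized_local (hypothetical helper): an automatic array with
   an explicit initializer, which ANSI-compatible compilers accept. */
int sum_initialized_local(void)
{
    int values[3] = { 1, 2, 3 };

    return values[0] + values[1] + values[2];
}
```

Under the rules above, sum_static_defaults() returns 0 and sum_initialized_local() returns 6.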

4.4 Examples

Here is an example demonstrating almost everything we’ve seen so far:
int globalvar = 1;
extern int anotherglobalvar;
static int privatevar;

void f(void)
{
    int localvar;
    int localvar2 = 2;
    static int persistentvar;
}

Here we have six variables, three declared outside and three declared inside of the function f().
globalvar is a global variable. The declaration we see is its defining instance (it happens also to include an initial value). globalvar can be used anywhere in this source file, and it could be used in other source files, too (as long as corresponding external declarations are issued in those other source files).
anotherglobalvar is a second global variable. It is not defined here; the defining instance for it (and its initialization) is somewhere else.
privatevar is a “private” global variable. It can be used anywhere within this source file, but functions in other source files cannot access it, even if they try to issue external declarations for it. (If other source files try to declare a global variable called “privatevar”, they’ll get their own; they won’t be sharing this one.) Since it has static duration and receives no explicit initialization, privatevar will be initialized to 0.
localvar is a local variable within the function f(). It can be accessed only within the function f(). (If any other part of the program declares a variable named “localvar”, that variable will be distinct from the one we’re looking at here.) localvar is conceptually “created” each time f() is called, and disappears when f() returns. Any value which was stored in localvar last time f() was running will be lost and will not be available next time f() is called. Furthermore, since it has no explicit initializer, the value of localvar will in general be garbage each time f() is called.
localvar2 is also local, and everything that we said about localvar applies to it, except that since its declaration includes an explicit initializer, it will be initialized to 2 each time f() is called.
Finally, persistentvar is again local to f(), but it does maintain its value between calls to f(). It has static duration but no explicit initializer, so its initial value will be 0.
The defining instances and external declarations we’ve been looking at so far have all been of simple variables. There are also defining instances and external declarations of functions, which we’ll be looking at in the next chapter.
(Also, don’t worry about static variables for now if they don’t make sense to you; they’re a relatively sophisticated concept, which you won’t need to use at first.)
The term declaration is a general one which encompasses defining instances and external declarations; defining instances and external declarations are two different kinds of declarations. Furthermore, either kind of declaration suffices to inform the compiler of the name and type of a particular variable (or function). If you have the defining instance of a global variable in a source file, the rest of that source file can use that variable without having to issue any external declarations. It’s only in source files where the defining instance hasn’t been seen that you need external declarations.
You will sometimes hear a defining instance referred to simply as a “definition,” and you will sometimes hear an external declaration referred to simply as a “declaration.” These usages are mildly ambiguous, in that you can’t tell out of context whether a “declaration” is a generic declaration (that might be a defining instance or an external declaration) or whether it’s an external declaration that specifically is not a defining instance. (Similarly, there are other constructions that can be called “definitions” in C, namely the definitions of preprocessor macros, structures, and typedefs, none of which we’ve met.) In these notes, we’ll try to make things clear by using the unambiguous terms defining instance and external declaration. Elsewhere, you may have to look at the context to determine how the terms “definition” and “declaration” are being used.
