Thursday, 1 March 2012

Chapter 12: Input and Output

So far, we’ve been calling printf to print formatted output to the “standard output” (wherever that is). We’ve also been calling getchar to read single characters from the “standard input,” and putchar to write single characters to the standard output. “Standard input” and “standard output” are two predefined I/O streams which are implicitly available to us. In this chapter we’ll learn how to take control of input and output by opening our own streams, perhaps connected to data files, which we can read from and write to.

12.1 File Pointers and fopen

[This section corresponds to K&R Sec. 7.5]
How will we specify that we want to access a particular data file? It would theoretically be possible to mention the name of a file each time it was desired to read from or write to it. But such an approach would have a number of drawbacks. Instead, the usual approach (and the one taken in C’s stdio library) is that you mention the name of the file once, at the time you open it. Thereafter, you use some little token–in this case, the file pointer–which keeps track (both for your sake and the library’s) of which file you’re talking about. Whenever you want to read from or write to one of the files you’re working with, you identify that file by using its file pointer (that is, the file pointer you obtained when you opened the file). As we’ll see, you store file pointers in variables just as you store any other data you manipulate, so it is possible to have several files open, as long as you use distinct variables to store the file pointers.
You declare a variable to store a file pointer like this:
FILE *fp;
The type FILE is predefined for you by <stdio.h>. It is a data structure which holds the information the standard I/O library needs to keep track of the file for you. For historical reasons, you declare a variable which is a pointer to this FILE type. The name of the variable can (as for any variable) be anything you choose; it is traditional to use the letters fp in the variable name (since we’re talking about a file pointer). If you were reading from two files at once you’d probably use two file pointers:
FILE *fp1, *fp2;
If you were reading from one file and writing to another you might declare an input file pointer and an output file pointer:
FILE *ifp, *ofp;

Like any pointer variable, a file pointer isn’t any good until it’s initialized to point to something. (Actually, no variable of any type is much good until you’ve initialized it.) To actually open a file, and receive the “token” which you’ll store in your file pointer variable, you call fopen. fopen accepts a file name (as a string) and a mode value indicating among other things whether you intend to read or write this file. (The mode value is also a string.) To open the file input.dat for reading you might call
ifp = fopen("input.dat", "r");
The mode string "r" indicates reading. Mode "w" indicates writing, so we could open output.dat for output like this:
ofp = fopen("output.dat", "w");
The other values for the mode string are less frequently used. The third major mode is "a" for append. (If you use "w" to write to a file which already exists, its old contents will be discarded.) You may also add a + character to the mode string to indicate that you want to both read and write, or a b character to indicate that you want to do “binary” (as opposed to text) I/O.
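To make the difference between "w" and "a" concrete, here is a minimal sketch. The helper name write_text and the file names used with it are made up for illustration; the helper also checks fopen's return value and calls fclose, both of which are discussed later in this chapter.

```c
#include <stdio.h>

/* Hypothetical helper (not part of stdio): open path with the given */
/* mode string, write text to it, and close it again. */
/* Returns 0 on success, or -1 if the file could not be opened. */
int write_text(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        return -1;
    fprintf(fp, "%s", text);
    fclose(fp);
    return 0;
}
```

Calling write_text("notes.txt", "w", "one\n") and then write_text("notes.txt", "a", "two\n") should leave both lines in the file, whereas a second call with "w" would discard the first line before writing.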

One thing to beware of when opening files is that it’s an operation which may fail. The requested file might not exist, or it might be protected against reading or writing. (These possibilities ought to be obvious, but it’s easy to forget them.) fopen returns a null pointer if it can’t open the requested file, and it’s important to check for this case before going off and using fopen’s return value as a file pointer. Every call to fopen will typically be followed with a test, like this:
ifp = fopen("input.dat", "r");
if(ifp == NULL) {
    printf("can't open file\n");
    /* exit or return */
}
If fopen returns a null pointer, and you store it in your file pointer variable and go off and try to do I/O with it, your program will typically crash.

It’s common to collapse the call to fopen and the assignment in with the test:
if((ifp = fopen("input.dat", "r")) == NULL) {
    printf("can't open file\n");
    /* exit or return */
}
You don’t have to write these “collapsed” tests if you’re not comfortable with them, but you’ll see them in other people’s code, so you should be able to read them.

12.2 I/O with File Pointers

For each of the I/O library functions we’ve been using so far, there’s a companion function which accepts an additional file pointer argument telling it where to read from or write to. The companion function to printf is fprintf, and the file pointer argument comes first. To print a string to the output.dat file we opened in the previous section, we might call
fprintf(ofp, "Hello, world!\n");

The companion function to getchar is getc, and the file pointer is its only argument. To read a character from the input.dat file we opened in the previous section, we might call
int c;
c = getc(ifp);

The companion function to putchar is putc, and the file pointer argument comes last. To write a character to output.dat, we could call
putc(c, ofp);
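Putting getc and putc together gives the classic character-copying loop. The following is a small sketch; the function name copy_stream is invented for this example, and it assumes both file pointers have already been opened successfully.

```c
#include <stdio.h>

/* Hypothetical helper: copy every remaining character from ifp to ofp. */
/* Returns the number of characters copied. */
long copy_stream(FILE *ifp, FILE *ofp)
{
    int c;          /* int rather than char, so EOF can be distinguished */
    long n = 0;

    while ((c = getc(ifp)) != EOF) {
        putc(c, ofp);
        n++;
    }
    return n;
}
```

With the ifp and ofp from the previous section, copy_stream(ifp, ofp) would copy input.dat to output.dat.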

Our own getline function calls getchar and so always reads the standard input. We could write a companion fgetline function which reads from an arbitrary file pointer:
#include <stdio.h>

/* Read one line from fp, */
/* copying it to line array (but no more than max chars). */
/* Does not place terminating \n in line array. */
/* Returns line length, or 0 for empty line, or EOF for end-of-file. */

int fgetline(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;  /* leave room for '\0' */

    while((c = getc(fp)) != EOF) {
        if(c == '\n')
            break;

        if(nch < max) {
            line[nch] = c;
            nch = nch + 1;
        }
    }

    if(c == EOF && nch == 0)
        return EOF;

    line[nch] = '\0';
    return nch;
}

Now we could read one line from ifp by calling
char line[MAXLINE];
...
fgetline(ifp, line, MAXLINE);
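Since fgetline returns EOF at end-of-file, it is usually called in a loop. Here is a sketch of one such loop; count_lines is a made-up name, MAXLINE is given an arbitrary value of 100, and fgetline is repeated from above only so that the fragment compiles on its own.

```c
#include <stdio.h>

#define MAXLINE 100     /* assumed value, chosen for this example */

/* fgetline as shown above, repeated so this sketch is self-contained */
int fgetline(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;      /* leave room for '\0' */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            break;
        if (nch < max) {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if (c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}

/* Hypothetical helper: count the lines remaining in fp. */
/* An empty line makes fgetline return 0, which is not EOF, */
/* so empty lines are counted too. */
int count_lines(FILE *fp)
{
    char line[MAXLINE];
    int nlines = 0;

    while (fgetline(fp, line, MAXLINE) != EOF)
        nlines++;
    return nlines;
}
```

Note that the loop compares against EOF rather than testing for a nonzero value; a test like while(fgetline(...)) would stop at the first empty line.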

12.3 Predefined Streams

Besides the file pointers which we explicitly open by calling fopen, there are also three predefined streams. stdin is a constant file pointer corresponding to standard input, and stdout is a constant file pointer corresponding to standard output. Both of these can be used anywhere a file pointer is called for; for example, getchar() is the same as getc(stdin) and putchar(c) is the same as putc(c, stdout). The third predefined stream is stderr. Like stdout, stderr is typically connected to the screen by default. The difference is that stderr is not redirected when the standard output is redirected. For example, under Unix or MS-DOS, when you invoke
program > filename
anything printed to stdout is redirected to the file filename, but anything printed to stderr still goes to the screen. The intent behind stderr is that it is the “standard error output”; error messages printed to it will not disappear into an output file. For example, a more realistic way to print an error message when a file can’t be opened would be
if((ifp = fopen(filename, "r")) == NULL) {
    fprintf(stderr, "can't open file %s\n", filename);
    /* exit or return */
}
where filename is a string variable indicating the file name to be opened. Not only is the error message printed to stderr, but it is also more informative in that it mentions the name of the file that couldn’t be opened. (We’ll see another example in the next chapter.)
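If a program opens several files, the test and the message can be wrapped up in one place. The following is only a sketch of that idea; open_or_complain is a hypothetical name, not a standard library function. It leaves the choice between exiting and returning to the caller by passing fopen's null pointer back.

```c
#include <stdio.h>

/* Hypothetical wrapper: behaves like fopen, but on failure also */
/* prints a message naming the file to stderr. */
/* Returns the file pointer, or a null pointer on failure. */
FILE *open_or_complain(const char *filename, const char *mode)
{
    FILE *fp = fopen(filename, mode);

    if (fp == NULL)
        fprintf(stderr, "can't open file %s\n", filename);
    return fp;
}
```

A caller still has to check the result, for example if((ifp = open_or_complain(filename, "r")) == NULL) followed by whatever cleanup is appropriate, but the message is now written in only one place.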

12.4 Closing Files

Although you can open multiple files, there’s a limit to how many you can have open at once. If your program will open many files in succession, you’ll want to close each one as you’re done with it; otherwise the standard I/O library could run out of the resources it uses to keep track of open files. Closing a file simply involves calling fclose with the file pointer as its argument:
fclose(fp);
Calling fclose arranges that (if the file was open for output) any last, buffered output is finally written to the file, and that those resources used by the operating system (and the C library) for this file are released. If you forget to close a file, it will be closed automatically when the program exits.
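As an illustration of closing each file as soon as you are done with it, here is a sketch that processes a list of files one at a time. The function name total_chars and its interface are assumptions made for this example.

```c
#include <stdio.h>

/* Hypothetical helper: total number of characters in the n named files. */
/* Each file is closed as soon as it has been read, so at most one */
/* file is open at any time.  Returns -1 if a file can't be opened. */
long total_chars(const char *names[], int n)
{
    long total = 0;
    int i;
    int c;

    for (i = 0; i < n; i++) {
        FILE *fp = fopen(names[i], "r");
        if (fp == NULL) {
            fprintf(stderr, "can't open file %s\n", names[i]);
            return -1;
        }
        while ((c = getc(fp)) != EOF)
            total++;
        fclose(fp);     /* release this file before opening the next */
    }
    return total;
}
```

Without the fclose call, a long enough list of names would eventually make fopen fail, even though every file exists and is readable.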

12.5 Example: Reading a Data File

Suppose you had a data file consisting of rows and columns of numbers:
1    2    34
5    6    78
9    10   112
Suppose you wanted to read these numbers into an array. (Actually, the array will be an array of arrays, or a “multidimensional” array; see section 4.1.2.) We can write code to do this by putting together several pieces: the fgetline function we just showed, and the getwords function from chapter 10. Assuming that the data file is named input.dat, the code would look like this:
#include <stdio.h>
#include <stdlib.h>     /* for exit, EXIT_FAILURE, and atoi */

#define MAXLINE 100
#define MAXROWS 10
#define MAXCOLS 10

int array[MAXROWS][MAXCOLS];
char *filename = "input.dat";
FILE *ifp;
char line[MAXLINE];
char *words[MAXCOLS];
int nrows = 0;
int n;
int i;

ifp = fopen(filename, "r");
if(ifp == NULL) {
    fprintf(stderr, "can't open %s\n", filename);
    exit(EXIT_FAILURE);
}

while(fgetline(ifp, line, MAXLINE) != EOF) {
    if(nrows >= MAXROWS) {
        fprintf(stderr, "too many rows\n");
        exit(EXIT_FAILURE);
    }

    n = getwords(line, words, MAXCOLS);
    for(i = 0; i < n; i++)
        array[nrows][i] = atoi(words[i]);
    nrows++;
}
Each trip through the loop reads one line from the file, using fgetline. Each line is broken up into “words” using getwords; each “word” is actually one number. The numbers are however still represented as strings, so each one is converted to an int by calling atoi before being stored in the array. The code checks for two different error conditions (failure to open the input file, and too many lines in the input file) and if one of these conditions occurs, it prints an error message, and exits. The exit function is a Standard library function which terminates your program. It is declared in <stdlib.h>, and accepts one argument, which will be the exit status of the program. EXIT_FAILURE is a code, also defined by <stdlib.h>, which indicates that the program failed. Success is indicated by a code of EXIT_SUCCESS, or simply 0. (These values can also be returned from main(); calling exit with a particular status value is essentially equivalent to returning that same status value from main.)
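For readers who want to try this out, the fragment can be packaged as a function. What follows is a sketch under several assumptions: read_table is a made-up name, it returns -1 instead of calling exit so that the caller decides what to do, and the getwords shown here is a simple whitespace-splitting stand-in written for this example, which may differ in detail from the chapter 10 version.

```c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define MAXLINE 100
#define MAXROWS 10
#define MAXCOLS 10

/* fgetline as shown in section 12.2 */
int fgetline(FILE *fp, char line[], int max)
{
    int nch = 0;
    int c;
    max = max - 1;      /* leave room for '\0' */

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            break;
        if (nch < max) {
            line[nch] = c;
            nch = nch + 1;
        }
    }
    if (c == EOF && nch == 0)
        return EOF;
    line[nch] = '\0';
    return nch;
}

/* Stand-in for getwords (an assumption, not the chapter 10 code): */
/* split line in place at whitespace, storing pointers to at most */
/* maxwords words in words[].  Returns the number of words found. */
int getwords(char *line, char *words[], int maxwords)
{
    char *p = line;
    int nwords = 0;

    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0' || nwords >= maxwords)
            return nwords;
        words[nwords++] = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            return nwords;
        *p++ = '\0';    /* terminate this word, move past it */
    }
}

/* Hypothetical wrapper around the loop above: read rows of numbers */
/* from filename into array.  Returns the number of rows read, */
/* or -1 on error (can't open the file, or too many rows). */
int read_table(const char *filename, int array[][MAXCOLS], int maxrows)
{
    FILE *ifp;
    char line[MAXLINE];
    char *words[MAXCOLS];
    int nrows = 0;
    int n, i;

    ifp = fopen(filename, "r");
    if (ifp == NULL) {
        fprintf(stderr, "can't open %s\n", filename);
        return -1;
    }
    while (fgetline(ifp, line, MAXLINE) != EOF) {
        if (nrows >= maxrows) {
            fprintf(stderr, "too many rows\n");
            fclose(ifp);
            return -1;
        }
        n = getwords(line, words, MAXCOLS);
        for (i = 0; i < n; i++)
            array[nrows][i] = atoi(words[i]);
        nrows++;
    }
    fclose(ifp);
    return nrows;
}
```

A main function could then declare int array[MAXROWS][MAXCOLS], call read_table("input.dat", array, MAXROWS), and return EXIT_FAILURE if the result is -1.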
