eTech Arena: Chapter 2: Basic Data Types and Operators

The type of a variable determines what kinds of values it may take on. An operator computes new values out of old ones. An expression consists of variables, constants, and operators combined to perform some useful computation. In this chapter, we’ll learn about C’s basic types, how to write constants and declare variables of these types, and what the basic operators are.

As Kernighan and Ritchie say, “The type of an object determines the set of values it can have and what operations can be performed on it.” This is a fairly formal, mathematical definition of what a type is, but it is traditional (and meaningful). There are several implications to remember:

The “set of values” is finite. C’s int type can not represent all of the integers; its float type can not represent all floating-point numbers.
When you’re using an object (that is, a variable) of some type, you may have to remember what values it can take on and what operations you can perform on it. For example, there are several operators which play with the binary (bit-level) representation of integers, but these operators are not meaningful for and may not be applied to floating-point operands.
When declaring a new variable and picking a type for it, you have to keep in mind the values and operations you’ll be needing.

In other words, picking a type for a variable is not some abstract academic exercise; it’s closely connected to the way(s) you’ll be using that variable.

2.1 Types

[This section corresponds to K&R Sec. 2.2]

There are only a few basic data types in C. The first ones we’ll be encountering and using are:

char        a character
int         an integer, in the range -32,767 to 32,767
long int    a larger integer (up to ±2,147,483,647)
float       a floating-point number
double      a floating-point number, with more precision and perhaps greater range than float

If you can look at this list of basic types and say to yourself, “Oh, how simple, there are only a few types, I won’t have to worry much about choosing among them,” you’ll have an easy time with declarations. (Some masochists wish that the type system were more complicated so that they could specify more things about each variable, but those of us who would rather not have to specify these extra things each time are glad that we don’t have to.)

The ranges listed above for types int and long int are the guaranteed minimum ranges. On some systems, either of these types (or, indeed, any C type) may be able to hold larger values, but a program that depends on extended ranges will not be as portable. Some programmers become obsessed with knowing exactly what the sizes of data objects will be in various situations, and go on to write programs which depend on these exact sizes. Determining or controlling the size of an object is occasionally important, but most of the time we can sidestep size issues and let the compiler do most of the worrying.

(From the ranges listed above, we can determine that type int must be at least 16 bits, and that type long int must be at least 32 bits. But neither of these sizes is exact; many systems have 32-bit ints, and some systems have 64-bit long ints.)

You might wonder how the computer stores characters. The answer involves a character set, which is simply a mapping between some set of characters and some set of small numeric codes. Most machines today use the ASCII character set, in which the letter A is represented by the code 65, the ampersand & is represented by the code 38, the digit 1 is represented by the code 49, the space character is represented by the code 32, etc. (Most of the time, of course, you have no need to know or even worry about these particular code values; they’re automatically translated into the right shapes on the screen or printer when characters are printed out, and they’re automatically generated when you type characters on the keyboard. Eventually, though, we’ll appreciate, and even take some control over, exactly when these translations, from characters to their numeric codes, are performed.) Character codes are usually small: the largest printable code value in ASCII is 126, which is the ~ (tilde) character. Characters usually fit in a byte, which is usually 8 bits. In C, type char is defined as occupying one byte, so it is usually 8 bits.

Most of the simple variables in most programs are of types int, long int, or double. Typically, we’ll use int and double for most purposes, and long int any time we need to hold integer values greater than 32,767. As we’ll see, even when we’re manipulating individual characters, we’ll usually use an int variable, for reasons to be discussed later. Therefore, we’ll rarely use individual variables of type char, although we’ll use plenty of arrays of char.

2.2 Constants

[This section corresponds to K&R Sec. 2.3]

A constant is just an immediate, absolute value found in an expression. The simplest constants are decimal integers, e.g. 0, 1, 2, 123. Occasionally it is useful to specify constants in base 8 or base 16 (octal or hexadecimal); this is done by prefixing an extra 0 (zero) for octal, or 0x for hexadecimal: the constants 100, 0144, and 0x64 all represent the same number. (If you’re not using these non-decimal constants, just remember not to use any leading zeroes. If you accidentally write 0123 intending to get one hundred and twenty three, you’ll get 83 instead, which is 123 base 8.)

We write constants in decimal, octal, or hexadecimal for our convenience, not the compiler’s. The compiler doesn’t care; it always converts everything into binary internally, anyway. (Until C23 added a 0b prefix, however, standard C provided no way to specify constants in source code in binary.)

A constant can be forced to be of type long int by suffixing it with the letter L (in upper or lower case, although upper case is strongly recommended, because a lower case l looks too much like the digit 1).

A constant that contains a decimal point or the letter e (or both) is a floating-point constant: 3.14, 10., .01, 123e4, 123.456e7. The e indicates multiplication by a power of 10; 123.456e7 is 123.456 times 10 to the 7th, or 1,234,560,000. (Floating-point constants are of type double by default.)

We also have constants for specifying characters and strings. (Make sure you understand the difference between a character and a string: a character is exactly one character; a string is a sequence of zero or more characters; a string containing one character is distinct from a lone character.) A character constant is simply a single character between single quotes: 'A', '.', '%'. The numeric value of a character constant is, naturally enough, that character’s value in the machine’s character set. (In ASCII, for example, 'A' has the value 65.)

A string is represented in C as a sequence or array of characters. (We’ll have more to say about arrays in general, and strings in particular, later.) A string constant is a sequence of zero or more characters enclosed in double quotes: "apple", "hello, world", "this is a test".

Within character and string constants, the backslash character \ is special, and is used to represent characters not easily typed on the keyboard or for various reasons not easily typed in constants. The most common of these “character escapes” are:

\n    a “newline” character
\b    a backspace
\r    a carriage return (without a line feed)
\'    a single quote (e.g. in a character constant)
\"    a double quote (e.g. in a string constant)
\\    a single backslash

For example, "he said \"hi\"" is a string constant which contains two double quotes, and '\'' is a character constant consisting of a (single) single quote. Notice once again that the character constant 'A' is very different from the string constant "A".

2.3 Declarations

[This section corresponds to K&R Sec. 2.4]

Informally, a variable (also called an object) is a place you can store a value. So that you can refer to it unambiguously, a variable needs a name. You can think of the variables in your program as a set of boxes or cubbyholes, each with a label giving its name; you might imagine that storing a value “in” a variable consists of writing the value on a slip of paper and placing it in the cubbyhole.

A declaration tells the compiler the name and type of a variable you’ll be using in your program. In its simplest form, a declaration consists of the type, the name of the variable, and a terminating semicolon:

char c;
int i;
float f;

You can also declare several variables of the same type in one declaration, separating them with commas:

int i1, i2;

Later we’ll see that declarations may also contain initializers, qualifiers and storage classes, and that we can declare arrays, functions, pointers, and other kinds of data structures.

The placement of declarations is significant. In traditional (C89) C, you can’t place them just anywhere (i.e. they cannot be interspersed with the other statements in your program). They must either be placed at the beginning of a function, or at the beginning of a brace-enclosed block of statements (which we’ll learn about in the next chapter), or outside of any function. (C99 and later relax this rule and allow declarations to appear among statements.) Furthermore, the placement of a declaration, as well as its storage class, controls several things about its visibility and lifetime, as we’ll see later.

You may wonder why variables must be declared before use. There are two reasons:

It makes things somewhat easier on the compiler; it knows right away what kind of storage to allocate and what code to emit to store and manipulate each variable; it doesn’t have to try to intuit the programmer’s intentions.
It forces a bit of useful discipline on the programmer: you cannot introduce variables willy-nilly; you must think about them enough to pick appropriate types for them. (The compiler’s error messages to you, telling you that you apparently forgot to declare a variable, are as often helpful as they are a nuisance: they’re helpful when they tell you that you misspelled a variable, or forgot to think about exactly how you were going to use it.)

Although there are a few places where declarations can be omitted (in which case the compiler will assume an implicit declaration), making use of these removes the advantages of reason 2 above, so I recommend always declaring everything explicitly.

Most of the time, I recommend writing one declaration per line. For the most part, the compiler doesn’t care what order declarations are in. You can order the declarations alphabetically, or in the order that they’re used, or so that related declarations are next to each other. Collecting all variables of the same type together on one line essentially orders declarations by type, which isn’t a very useful order (it’s only slightly more useful than random order).

A declaration for a variable can also contain an initial value. This initializer consists of an equals sign and an expression, which is usually a single constant:

int i = 1;
int i1 = 10, i2 = 20;

2.4 Variable Names

[This section corresponds to K&R Sec. 2.1]

Within limits, you can give your variables and functions any names you want. These names (the formal term is “identifiers”) consist of letters, digits, and underscores. For our purposes, names must begin with a letter. Theoretically, names can be as long as you want, but extremely long ones get tedious to type after a while, and the compiler is not required to keep track of extremely long ones perfectly. (What this means is that if you were to name a variable, say, supercalafragalisticespialidocious, the compiler might get lazy and pretend that you’d named it supercalafragalisticespialidocio, such that if you later misspelled it supercalafragalisticespialidociouz, the compiler wouldn’t catch your mistake. Nor would the compiler necessarily be able to tell the difference if for some perverse reason you deliberately declared a second variable named supercalafragalisticespialidociouz.)

The capitalization of names in C is significant: the variable names variable, Variable, and VARIABLE (as well as silly combinations like variAble) are all distinct.

A final restriction on names is that you may not use keywords (the words such as int and for which are part of the syntax of the language) as the names of variables or functions (or as identifiers of any kind).

2.5 Arithmetic Operators

[This section corresponds to K&R Sec. 2.5]

The basic operators for performing arithmetic are the same in many computer languages:

+    addition
-    subtraction
*    multiplication
/    division
%    modulus (remainder)

The - operator can be used in two ways: to subtract two numbers (as in a - b), or to negate one number (as in -a + b or a + -b).

When applied to integers, the division operator / discards any remainder, so 1 / 2 is 0 and 7 / 4 is 1. But when either operand is a floating-point quantity (type float or double), the division operator yields a floating-point result, with a potentially nonzero fractional part. So 1 / 2.0 is 0.5, and 7.0 / 4.0 is 1.75.

The modulus operator % gives you the remainder when two integers are divided: 1 % 2 is 1; 7 % 4 is 3. (The modulus operator can only be applied to integers.)

An additional arithmetic operation you might be wondering about is exponentiation. Some languages have an exponentiation operator (typically ^ or **), but C doesn’t. (To square or cube a number, just multiply it by itself.)

Multiplication, division, and modulus all have higher precedence than addition and subtraction. The term “precedence” refers to how “tightly” operators bind to their operands (that is, to the things they operate on). In mathematics, multiplication has higher precedence than addition, so 1 + 2 * 3 is 7, not 9. In other words, 1 + 2 * 3 is equivalent to 1 + (2 * 3). C is the same way.

All of these operators “group” from left to right, which means that when two or more of them have the same precedence and participate next to each other in an expression, the evaluation conceptually proceeds from left to right. For example, 1 - 2 - 3 is equivalent to (1 - 2) - 3 and gives -4, not +2. (“Grouping” is sometimes called associativity, although the term is used somewhat differently in programming than it is in mathematics. Not all C operators group from left to right; a few group from right to left.)

Whenever the default precedence or associativity doesn’t give you the grouping you want, you can always use explicit parentheses. For example, if you wanted to add 1 to 2 and then multiply the result by 3, you could write (1 + 2) * 3.

By the way, the word “arithmetic” as used in the title of this section is an adjective, not a noun, and it’s pronounced differently than the noun: the accent is on the third syllable.

2.6 Assignment Operators

[This section corresponds to K&R Sec. 2.10]

The assignment operator = assigns a value to a variable. For example,

x = 1

sets x to 1, and

a = b

sets a to whatever b's value is. The expression

i = i + 1

is, as we’ve mentioned elsewhere, the standard programming idiom for increasing a variable’s value by 1: this expression takes i's old value, adds 1 to it, and stores it back into i. (C provides several “shortcut” operators for modifying variables in this and similar ways, which we’ll meet later.)

We’ve called the = sign the “assignment operator” and referred to “assignment expressions” because, in fact, = is an operator just like + or -. C does not have “assignment statements”; instead, an assignment like a = b is an expression and can be used wherever any expression can appear. Since it’s an expression, the assignment a = b has a value, namely, the same value that’s assigned to a. This value can then be used in a larger expression; for example, we might write

c = a = b

which is equivalent to

c = (a = b)

and assigns b's value to both a and c. (The assignment operator, therefore, groups from right to left.) Later we’ll see other circumstances in which it can be useful to use the value of an assignment expression.

It’s usually a matter of style whether you initialize a variable with an initializer in its declaration or with an assignment expression near where you first use it. That is, there’s no particular difference between

int a = 10;

and

int a;
/* later... */
a = 10;

2.7 Function Calls

We’ll have much more to say about functions in a later chapter, but for now let’s just look at how they’re called. (To review: a function is a piece of code, written by you or by someone else, which performs some useful, compartmentalizable task.) You call a function by mentioning its name followed by a pair of parentheses. If the function takes any arguments, you place the arguments between the parentheses, separated by commas. These are all function calls:

printf("Hello, world!\n")
printf("%d\n", i)
sqrt(144.)
getchar()

The arguments to a function can be arbitrary expressions. Therefore, you don't have to say things like

int sum = a + b + c;
printf("sum = %d\n", sum);

if you don't want to; you can instead collapse it to

printf("sum = %d\n", a + b + c);

Many functions return values, and when they do, you can embed calls to these functions within larger expressions:

c = sqrt(a * a + b * b)
x = r * cos(theta)
i = f1(f2(j))

The first expression squares a and b, computes the square root of the sum of the squares, and assigns the result to c. (In other words, it computes a * a + b * b, passes that number to the sqrt function, and assigns sqrt's return value to c.) The second expression passes the value of the variable theta to the cos (cosine) function, multiplies the result by r, and assigns the result to x. The third expression passes the value of the variable j to the function f2, passes the return value of f2 immediately to the function f1, and finally assigns f1's return value to the variable i.


Thursday, 1 March 2012
