CS 111 Assignment 6
String class objects as parameters to functions



    Some of the examples below, and some of the homework problems, will require a little more than the usual amount of disk space. So, please be sure to get rid of all a.out, l.out, and core files before you begin. To find where they are, type the following at the "forbin>" prompt, in your home directory:

       find . | grep a.out
       find . | grep l.out
       find . | grep core
    

    Then go to the indicated directories and remove the files.


  1. Brief review of classification of characters.   In your hw06 folder, compile classifyChar1A.cpp and run it. When prompted to enter a string, you may enter any string of up to 15 characters, including letters, digits, punctuation marks, control characters, and even spaces. Also, try entering a string containing some non-ASCII characters. (To enter a non-ASCII character, hold down the [Alt] key while typing a number between 128 and 255 on the numeric keypad, NOT the row of digit keys at the top of the main alphanumeric keypad. To enter an ASCII control character, type the backspace or the tab, or hold down the [Ctrl] key while typing any letter.) The program will then display, in the terminal window, a table classifying each character as "ASCII letter," "ASCII digit," "ASCII space," "ASCII control character," "ASCII punctuation mark," or "non-ASCII character."

    Look now at the source code. The characters are classified based on the numeric values of the binary codes that are used to represent the characters in the computer's memory. Study this program for a brief review of how a program can detect letters, digit characters, non-ASCII characters, control characters, and so on.

    One of the branches of the nested if/else checks for a space (' ') character. Note that the entered string can indeed contain a space character. This would not be possible if the string were entered via command-line arguments, because spaces separate command-line arguments and hence cannot be part of them. Nor would it be possible using cin's extraction operator (>>), which stops reading at the first whitespace character. However, a string entered via the getline function can contain any character that can be represented by a 1-byte code (other than the newline that ends the input line).


  2. More about right-justifying columns of numbers.   Compile classifyChar1B.cpp and run it. This program is similar to classifyChar1A.cpp, except that it sends the output to a text file classification.txt, rather than to the terminal window. After running the program, you can view classification.txt by typing either:

       cat classification.txt
    
    or:
       more classification.txt
    

    Now generate a detailed file listing, by typing:

       ls -l
    

    (The character after the hyphen should be a lowercase L, not a digit 1.)

    In the file listing, observe how large a.out is, relative to the source code files. That's because the executable file includes not only a machine code version of your source code files, but also the C++ library's definitions of all the functions and other things declared in the header files. In particular, classifyChar1B.cpp includes four library header files:

       #include <iostream>
       #include <string>
       #include <iomanip>
       #include <fstream>
    

    Your executable file contains definitions of ALL the functions and other things declared in the header files, including a lot of functions and other things that your program does not actually use.

    We can make the executable file a little smaller (though not a lot smaller) by defining our own home-grown functions to handle right-justification of the columns, rather than using the setw manipulator, thereby eliminating the need to include the <iomanip> header file.

    Look now at classifyChar2A.cpp and classifyChar2B.cpp. These programs behave exactly like classifyChar1A.cpp and classifyChar1B.cpp. But, instead of using the setw manipulator, they define and use their own printRightJustified functions, which in turn call a function digitCount, also defined in the program.

    In both classifyChar2A.cpp and classifyChar2B.cpp, look carefully at the printRightJustified and digitCount functions, and make sure you understand how they work. The two programs have identical digitCount functions. The two printRightJustified functions are almost identical, differing only in that one uses cout, whereas the other one uses an ofstream object that has been passed in as a parameter.


  3. Programmer-defined libraries.   So far, we have used various functions, classes, etc. that are defined in the C++ standard library. We can also define our own libraries of functions we want to reuse, supplementing the standard library.

    For example, we might want to be able to reuse the printRightJustified and digitCount functions. To be able to re-use them without making endless copies of the source code, we will need to define them in a separate file, to be compiled together with other files containing the programs that use these functions.

    To compile a program consisting of more than one file, type the names of all the relevant .cpp files as command-line arguments to g++. For example:

       g++ classifyChar3A.cpp textUtility.cpp
    

    where classifyChar3A.cpp is a program which calls functions defined in the separate file textUtility.cpp. The functions defined in textUtility.cpp can be reused in other programs, such as classifyChar3B.cpp, which can be compiled by typing:

       g++ classifyChar3B.cpp textUtility.cpp
    

    thereby saving the programmer the trouble of physically copying the functions defined in textUtility.cpp to other source code files.

    If a program consists of more than one file, each of the files must contain, at the top, function prototypes (declarations) for BOTH (1) the functions that are defined in that file and (2) the functions that are called in that file, though possibly defined in a different file. To save us the trouble of physically typing identical prototypes in multiple files, we can include a header file instead. The header file contains the set of prototypes we need in more than one file. When we compile our programs, the preprocessor automatically copies the contents of the "included" header files into our source code files before the source code files are compiled.

    Look now at the source code of classifyChar3A.cpp and classifyChar3B.cpp. These programs are intended to behave exactly like classifyChar2A.cpp and classifyChar2B.cpp.

    In both classifyChar3A.cpp and classifyChar3B.cpp, the main function calls a function named printRightJustified. However, these files do not define or declare the printRightJustified function, nor do they define or declare a digitCount function. Instead, the printRightJustified and digitCount functions are defined in source code file textUtility.cpp and declared in header file textUtility.h.

    In both classifyChar3A.cpp and classifyChar3B.cpp, the characters in the entered string are classified via calls to boolean functions isAscii, isAsciiDigit, isAsciiLetter, isSpace, and isAsciiControl. These functions too are defined in textUtility.cpp and declared in textUtility.h.

    The header file textUtility.h contains prototypes (declarations) of all the functions defined in textUtility.cpp.

    The source code files textUtility.cpp, classifyChar3A.cpp, and classifyChar3B.cpp all include the header file textUtility.h at the top. Observe that the name of the header file is enclosed in quote marks, not angle brackets, in the include directives. We use quote marks for a programmer-defined header file, whereas we use angle brackets for a C++ standard library header file.

    Observe that textUtility.cpp does NOT contain a main function. Every program must contain a main function, but textUtility.cpp is not a program. Instead, textUtility.cpp just contains a collection of functions intended to be used in various programs.

    Observe also that the header file textUtility.h, itself, contains the following include statements for C++ standard library header files:

       #include <iostream>
       #include <string>
    

    Because the entire contents of textUtility.h will be copied, automatically, into textUtility.cpp, classifyChar3A.cpp, and classifyChar3B.cpp, it is not necessary for the programmer to put include statements for <iostream> and <string> directly in textUtility.cpp, classifyChar3A.cpp, and classifyChar3B.cpp. The preprocessor will automatically copy the contents of <iostream> and <string> into textUtility.cpp, classifyChar3A.cpp, and classifyChar3B.cpp, along with the other contents of textUtility.h.


  4. A very brief introduction to class hierarchies and polymorphism.   As we saw earlier, the printRightJustified functions defined in classifyChar2A.cpp and classifyChar2B.cpp are almost identical, differing only in that one uses cout, whereas the other one uses an ofstream object that has been passed in as a parameter.

    We could, if we wanted to, define both versions of the printRightJustified function in textUtility.cpp. However, it would be better if we could write just one function which does both jobs and avoid writing almost-identical multiple versions of functions to do essentially the same job. Writing just one function is better for two reasons. First, it saves disk space in the executable file. Second, if we ever have to modify the function, we don't have to remember to modify all the multiple versions.

    So, instead of defining two separate printRightJustified functions, one of which uses cout and the other of which uses an object of class ofstream, our source code file textUtility.cpp defines only one version, which uses an object of class ostream. This one printRightJustified function can accommodate both cout and an ofstream object, because (1) cout, itself, is an object of class ostream, and (2) class ofstream is a subclass of class ostream.

    You'll learn a lot more next semester about what a "subclass" is. For now, just think of "subclass" as meaning "subcategory" or "subset." To say that class ofstream is a subclass of ostream means that every object of class ofstream is also an object of class ostream. But the converse is not necessarily true. Although every object of class ofstream is also an object of class ostream, not every object of class ostream is necessarily an object of class ofstream. For example, cout is an object of class ostream, but not of class ofstream. An ofstream object is a special kind of ostream object, just as cout is another special kind of ostream object.

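    Because cout and an ofstream object are both ostream objects, a parameter of type ostream& (a reference to an ostream) can refer to either cout or an ofstream object passed as the argument. The parameter must be a reference: stream objects cannot be copied, so an ostream cannot be passed by value.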

    Thus we can define one function to handle both cout and an ofstream object. However, exactly what the function does will differ, depending on whether the ostream parameter refers to cout or to an ofstream object when the function is called. If the actual argument to the function is cout, it will print to the terminal window, whereas, if the argument to the function is an ofstream object, it will write to the file.

    Thus we can use one object reference to do two different (though similar) jobs, depending on the actual object to which it refers. Our ability to do this is known as polymorphism. You'll learn much more about polymorphism next semester. Polymorphism is one of the most powerful features of object-oriented programming languages such as C++ and Java.


  5. Examples of boolean expressions and boolean functions.   Recall that type bool is a primitive data type having only two possible values: true and false. Recall also that the condition of an if branch structure, a while loop, or a for loop is an expression whose value is either true or false. Thus the conditions are boolean expressions, i.e. expressions of type bool. Any expression of type bool can be used as the condition of a branch or loop. This includes bool variables, calls to boolean functions, and other kinds of boolean expressions you've seen more often.

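    For example, to test whether the numeric value of a character's binary code is within the ASCII range (0 to 127), you might use an expression like the following. (This test relies on char being a signed type, as it is on forbin, so that characters outside the ASCII range have negative values.)

          if ( x >= 0 )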
    

    as we did in classifyChar1A.cpp. To be able to make this test routinely, you could define and use a boolean function to do the test, so you could call the function when needed, as we did in classifyChar3A.cpp:

          if ( isAscii(x) )
    

    where the function isAscii is defined in textUtility.cpp. Here's the definition of the isAscii function:

    bool isAscii(char x)
    {
       return( x >= 0 );
    }  // function isAscii
    

    Boolean expressions can also be combined using logical operators such as && (AND) and || (OR). For example, suppose you wanted to test whether a character was a digit. You would test to see whether the numeric value of its binary code was between the ASCII values of the digit characters '0' and '9', i.e. you would use an expression like:

          if ( x >= '0' && x <= '9' )
    

    To do this test routinely, you could define and use a function like isAsciiDigit, defined in textUtility.cpp:

    bool isAsciiDigit(char x)
    {
       return( (x >= '0' && x <= '9'));
    }  // function isAsciiDigit
    

    To test whether a character is a letter, you would test to see whether it is either a capital letter OR a lower case letter, as is done in function isAsciiLetter in textUtility.cpp:

    bool isAsciiLetter(char x)
    {
       return( (x >= 'A' && x <= 'Z') ||
               (x >= 'a' && x <= 'z'));
    }  // function isAsciiLetter
    

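    To test whether a character is one of the control characters within the ASCII range (e.g. the newline, the backspace, or the tab), we can use the isAsciiControl function. The test x >= 0 is needed because, with a signed char, every non-ASCII character has a negative value and would otherwise also pass the test x < 32:

    bool isAsciiControl(char x)
    {
       return( (x >= 0 && x < 32) || x == 127 );
    }  // function isAsciiControl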
    


  6. Searching a string for a particular kind of character.   Compile checkNonAscii.cpp and run it. Enter any line of text, as prompted. This program detects whether your line of text contains any non-ASCII characters. Try it both with and without non-ASCII characters.

    Then examine the source code. This program uses a variable of type bool:

       bool nonAsciiFound = false;
    
    The bool variable nonAsciiFound is used within a loop to indicate whether a non-ASCII character has been found so far:
       for ( int i = 0; i < text.length() && !nonAsciiFound; i++ )  {
          char x = text[i];
          if ( x < 0 )     // i.e. if the leading bit is a 1
             nonAsciiFound = true;
       }  // for i
    

    Note the exclamation point (!) in front of nonAsciiFound in the for loop condition. In this context, the exclamation point is the NOT operator, negating the boolean expression that immediately follows it. Thus !nonAsciiFound is true whenever nonAsciiFound is false, and it is false whenever nonAsciiFound is true.

    The value of nonAsciiFound is initially false. Each character in the string is then tested, and the value of nonAsciiFound becomes true when a non-ASCII character is found, which also ends the loop. This technique is sometimes called "innocent until proven guilty."

    Now compile testContainsAsciiControl.cpp and run it. This program uses a function defined in textUtility.cpp, so, to compile it, you must type:

       g++ testContainsAsciiControl.cpp textUtility.cpp
    

    When you run the program, enter a string. Try it both for strings containing control characters and for strings containing no control characters.

    Source code file textUtility.cpp defines a function containsAsciiControl which uses the isAsciiControl function to determine whether a string contains any ASCII control characters:

    bool containsAsciiControl(const string& text)
    {
       // Search for an ASCII control character,
       // and return true when one is found:
       for ( int i = 0; i < text.length(); i++ )
          if ( isAsciiControl(text[i]) )
              return true;
    
       // Assertion:  If we have reached this point,
       // text contains no ASCII control characters.
    
       return false;
    }  // function containsAsciiControl
    

    Observe the use of more than one return statement, so that the function quits, returning a value of true, as soon as an ASCII control character is found. Compare the above technique with the "innocent until proven guilty" technique that was used in checkNonAscii.cpp.

    In your homework, you will be asked to write a boolean function isNaturalNumber which tests whether a string represents a non-negative integer. If the string is non-empty, i.e. if the string contains at least one character, then testing whether a string represents a non-negative integer is equivalent to testing whether it contains only digit characters, i.e. testing that it contains no non-digit characters. You can use techniques similar to those shown above.


  7. Constant parameters.   In the function containsAsciiControl in textUtility.cpp, observe that the string reference parameter has been declared const. Declaring a parameter const ensures that its value cannot be changed within the function. It is a good idea to declare a reference parameter const when it is being used as an in-parameter only, as explained below.

    We want the function to test a string WITHOUT changing the contents of the string argument in any way. Thus, we want the string parameter text to be an in-parameter only. If the parameter were of a simple data type, rather than a string, we would declare it as a value parameter, because value parameters are guaranteed to act as in-parameters only. However, a value parameter makes a COPY of the argument when the function is called, and we would rather not copy a string if we can avoid it: if the string happens to be very long, making the copy might use up a lot of extra memory. So we've declared it a reference parameter, because a reference parameter is just another name for the argument's already-existing region in memory; it does not make a copy.

    However, if the value of a reference parameter is changed within the function, then the value of the argument variable is likewise changed outside the function, again because the parameter and the argument are two names for the same region in memory. So, to ensure that the value of the argument variable corresponding to a reference parameter is not changed, we must ensure that the reference parameter's value is not changed within the function. This can be accomplished by declaring the reference parameter const.

    In the comment above the containsAsciiControl function heading, we have not bothered to specify preconditions and postconditions for the parameter text, even though it's a reference parameter. Because the parameter has been declared const, its value cannot be changed; hence anything we say about it is automatically a precondition, not a postcondition.


  8. Driver programs and stubs.   The program testContainsAsciiControl.cpp does nothing but test the function containsAsciiControl, defined in textUtility.cpp.

    Likewise, as we will see later, testToDigitValue.cpp is a program which does nothing but test the function toDigitValue, defined in textUtility.cpp.

    A simple program whose sole purpose is to test a function is known as a test program, or driver program. When writing a longer program, it is strongly recommended that, as an intermediate step, you also write simple driver programs to test your functions individually. Doing so will make your longer programs much easier to debug.

    In textUtility.cpp, look now at the function parseNaturalNumber. The body of this function begins with the following call to another function defined in textUtility.cpp:

       if ( !isNaturalNumber(text) )    // stub - doesn't work yet
          return false;
    

    The isNaturalNumber function has not been written yet. You will be asked to write it for homework. But, because the isNaturalNumber function is already being called in parseNaturalNumber, we do already need the isNaturalNumber function to be defined somehow, or else our programs which use textUtility.cpp won't link. So, we've temporarily defined the function isNaturalNumber as follows, in anticipation of you writing the real thing later:

    bool isNaturalNumber(const string& text)
    {
       // This function has not yet been written.
       // You will write it for homework.
    
       cout << "Function isNaturalNumber has not "
             << "yet been implemented." << endl;
       return true;
    }  // function isNaturalNumber
    

    Such a trivial preliminary function definition is known as a stub. When you write a large program little by little, compiling and running after each step (as you always should when writing a large program), stubs are often a useful intermediate step.


  9. Converting a digit character to its intended numeric value.   In textUtility.cpp, look now at the toDigitValue function, which converts a digit character (in the range '0' to '9') to its intended numeric integer value (in the range 0 to 9). If the character is not a digit, it returns -1.
    int toDigitValue(char x)
    {
       if ( isAsciiDigit(x) )
          return ( x - '0' );
       else
          return  -1;
    }  // function toDigitValue
    

    First, the toDigitValue function calls the function isAsciiDigit to check whether the character x is a valid digit character. If so, it computes the intended numeric value of the digit character. Otherwise, it returns -1.

    The expression x - '0' should be understood to mean "the numeric value of the binary representation of the character represented by the variable x, minus the numeric value of the binary representation of the digit character '0'." Note that the ASCII value of the digit character '0' is not 0, it is 48. Recall from the ASCII chart that the ASCII values of the digit characters are in consecutive numerical order, starting with 48 for '0'. Thus, for example, the ASCII value of the digit '1' is 49, which is 1 greater than the ASCII value of the digit '0', i.e. 48. Likewise the ASCII value of the digit '2' is 50, which is 2 greater than the ASCII value of '0', and so on. (You can verify this by running DigitASCII again.) Hence a digit character can be converted to its intended numeric value by subtracting the ASCII value of the character '0' (i.e. 48) from the digit character's own ASCII value.


  10. Converting a string representation of a non-negative integer to its intended numeric value.   Look now at the function parseNaturalNumber in textUtility.cpp. It takes a string as a const reference parameter (hence as an in-parameter only, even though it's a reference parameter) and, if the string represents a non-negative integer, puts the intended numeric value of the string in an int reference parameter number. The function returns true if the conversion was successful, false otherwise.

    bool parseNaturalNumber(const string& text,
                            int& number)
    {
       // Check validity of text:
       if ( !isNaturalNumber(text) )    // stub - doesn't work yet
          return false;
    
       // Assertion:  If we have reached this point,
       // text represents a valid natural number.
    
       // Read from text the natural number that
       // it represents:
       number = 0;
       for ( int i = 0; i < text.length(); i++ )
          number = (number * 10) + toDigitValue(text[i]);
       return true;
    }  // function parseNaturalNumber
    

    First, the parseNaturalNumber function calls the isNaturalNumber function to determine whether the string represents a valid natural number, i.e. a non-negative integer. As we have seen, the isNaturalNumber function has not been written yet, except for a stub which always returns true no matter what. So, the preliminary test does not yet work. Hence the parseNaturalNumber function does not yet work correctly for invalid strings. It will work correctly for strings that do represent valid natural numbers (at least provided those numbers are no larger than the highest possible int value), but will give nonsense results, without returning false as the function should, for strings that do not represent valid natural numbers.

    Even after the isNaturalNumber function has been fully written, the parseNaturalNumber function still will not work correctly for all possible cases. It still will not work correctly for those strings that represent valid natural numbers which are, however, higher than the largest int value on forbin. The isNaturalNumber function is intended to test whether a string represents a natural number, NOT whether the represented natural number is also small enough to be contained within a memory location of type int on forbin. In the homework, you'll be asked to fix this problem by writing yet another function isForbinInt and calling it within the parseNaturalNumber function, in addition to calling isNaturalNumber.

    Let's now look at how the parseNaturalNumber function does work, for those strings for which it does already work correctly:

    The for loop converts all the digit characters in string text to their intended numeric values, beginning with the leftmost digit (at position 0) and ending with the rightmost digit. Each digit character is converted to its intended numeric value via a call to the toDigitValue function.

    Once a digit has been converted to its intended numeric value, it must then be combined correctly with those digits that have already been converted. For example, suppose that text is "123". First, the digit '1' is converted to its intended numeric value of 1. Then, the '2' is converted to its intended numeric value of 2. Then, the 1 must be combined correctly with the 2. This is done by multiplying the 1 by 10, to get 10, which is then added to the 2, to get 12. Then, the digit '3' is converted to its intended numeric value of 3 and must be combined correctly with the 12. This is accomplished by multiplying the 12 by 10, to get 120, which is then added to the 3, to get 123. In short, each time a digit is converted, it must be added to ten times the already-converted portion of the string. In our parseNaturalNumber function above, the value of the already-converted portion of the string is held in the int reference parameter number, which is first given a value of 0 so that the leftmost digit too will be processed correctly by the following statement in the for loop:

             number = (number * 10) + toDigitValue(text[i]);
    


  11. Comparison of strings.   Compile stringCompareDemo.cpp and run it. Try entering various pairs of words as command-line arguments. This program compares two words and tells you which word precedes the other in a lexicographical ordering.

    A lexicographical ordering is similar to an alphabetical ordering, except that the ordering is based on the ASCII values of the characters. A lexicographical ordering is the same as an alphabetical ordering if both words are either all uppercase or all lowercase. However, the ASCII values of all the uppercase letters precede the ASCII values of all the lowercase letters; so, for example, "ZEBRA" precedes "ant". Similarly, all the digits precede all the letters.

    A lexicographical comparison first compares the leftmost characters of the two strings, i.e. the characters at position 0. If those characters are different, the strings are ordered based on the ASCII values of the characters at position 0. If the characters at position 0 are the same, then the characters at position 1 are compared. If the characters at position 0 are the same but the characters at position 1 are different, the strings are ordered based on the ASCII values of the characters at position 1. If both the characters at position 0 are alike and the characters at position 1 are alike, then the characters at position 2 are compared. And so on. If one string runs out of characters before any difference is found, the shorter string comes first.

    Let's now consider a lexicographical comparison of strings representing non-negative integers. In the homework, you'll need to make such a comparison in order to determine whether such a string represents a value not too big to be stored as an int.

    For strings representing non-negative integers, does a lexicographical ordering correspond to a numerical ordering of the two represented numbers? Yes, IF the two numbers have the same number of digits. Otherwise, not necessarily.

    For example, the number 432 precedes the number 1234 in a numerical ordering, but the string "1234" precedes the string "432" in a lexicographical ordering. Make sure you understand why. Do an actual character-by-character lexicographical comparison of the two strings yourself, as described above.

