Stream I/O

Introduction

Up until now, we have been working with programs that call classes, manipulate data, and occasionally print something out. We have been building classes and testing them with fixed values. There is a limit to how useful such a program can be, as it has limited interaction with the outside world. In the next few lectures we are going to deal with how to get information into and out of our programs.

Today we are talking about stream I/O, which will allow us to read from and write to files and terminal windows (as well as network connections, as we'll see next week).

The Java stream library is particularly (and unnecessarily) complex, so we will first discuss the general properties of stream I/O and then map these ideas onto the Java implementation.

Where does stream input come from (or go)?

Stream semantics

Data Formats

In general, we do not want to read and write uninterpreted bytes. We are interested in reading higher-level datatypes: integers, floats, characters, objects, text strings, etc. This leads to the problem of how to read bytes and assemble them into the datatypes we are interested in. To do this, we need to know how the data is stored in the file we are reading (or produced by the source we are reading from).

Note on binary format: You may be used to looking at text files. However, storing large amounts of numerical or uninterpreted data (such as compiled code) as text is very inefficient. For example, a Java int takes 4 bytes of memory in binary form; if we store it in text form, it can take up to about 10 bytes (one per decimal digit). Hence, non-text data is often stored by directly dumping the memory representation into a file. These are known as binary files.
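To make the size difference concrete, here is a minimal sketch that writes the same int once in binary form and once as decimal text, and counts the bytes. The class and method names (IntSizeDemo, binarySize, textSize) are made up for this illustration; the stream classes are the standard java.io ones.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class IntSizeDemo {
    // Bytes used when the int is written in its raw binary form.
    public static int binarySize(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        } catch (IOException e) {
            // Not expected for an in-memory buffer; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    // Bytes used when the same int is written as decimal ASCII text.
    public static int textSize(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        int value = 2000000000;
        System.out.println("binary: " + binarySize(value) + " bytes"); // binary: 4 bytes
        System.out.println("text:   " + textSize(value) + " bytes");   // text:   10 bytes
    }
}
```

Small values are actually shorter as text (7 is a single byte), so the saving from binary files shows up mainly for large numbers and large volumes of data.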

For example, an integer in Java is 4 bytes. This would usually be stored as 4 consecutive bytes in a file, so we can read 4 bytes to construct our integer. However, in which order should the bytes be assembled? It turns out that different machines and systems organize this differently. In some languages, such as C, you actually have to assemble all of your datatypes by hand from bytes, and worry about what type of machine they were originally written on. Fortunately, Java saves us much of that work (at the price of complexity in the number of stream classes).
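The byte-order question can be sketched as follows: the same 4 bytes give two different integers depending on which end we treat as most significant. The class and method names here are hypothetical; ByteBuffer and ByteOrder are the standard java.nio classes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of any library.
public class ByteOrderDemo {
    // Assemble by hand, treating the first byte as the most significant.
    public static int assembleBigEndian(byte[] b) {
        // "& 0xFF" undoes sign extension, since Java bytes are signed.
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)  |  (b[3] & 0xFF);
    }

    // Let the library assemble, treating the first byte as the least significant.
    public static int assembleLittleEndian(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x00, 0x01, 0x02};
        System.out.println(assembleBigEndian(bytes));    // 258
        System.out.println(assembleLittleEndian(bytes)); // 33619968
    }
}
```

The hand-written version is roughly what a C programmer has to do; in Java we would normally let a stream class or ByteBuffer do the assembly for us.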

This problem of data formats occurs in text strings as well. Many text files in Western systems are stored using 1 byte per character. For Western alphabets, this works fine and there are standard encodings (such as ASCII and Latin 1). For non-Western languages, such as Chinese, we need more bytes per character and the encoding becomes more complex.
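As a rough illustration of the encoding problem, the sketch below encodes a short Western string and a short Chinese string with a few of Java's standard charsets and counts the bytes. The class and method names are invented for this example, and the Chinese text is written with Unicode escapes so the source file itself stays plain ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
public class EncodingDemo {
    // Number of bytes the text occupies in the given encoding.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // True if encoding and then decoding gives back the original text.
    public static boolean survivesRoundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset).equals(text);
    }

    public static void main(String[] args) {
        String western = "cafe";
        String chinese = "\u4e2d\u6587"; // two Chinese characters

        System.out.println(encodedLength(western, StandardCharsets.ISO_8859_1)); // 4
        System.out.println(encodedLength(western, StandardCharsets.UTF_8));      // 4
        System.out.println(encodedLength(chinese, StandardCharsets.UTF_8));      // 6
        System.out.println(encodedLength(chinese, StandardCharsets.UTF_16BE));   // 4

        // Latin 1 has no codes for these characters, so they are replaced by '?'.
        System.out.println(survivesRoundTrip(chinese, StandardCharsets.ISO_8859_1)); // false
        System.out.println(survivesRoundTrip(chinese, StandardCharsets.UTF_8));      // true
    }
}
```

The point to take away is that a reader must know which encoding the writer used; guessing wrong either garbles the text or loses characters outright.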

The Java stream library

Some examples

Read a line of input from the terminal and print it back.

  InputStream in1 = System.in; // get InputStream from system, byte stream
  InputStreamReader in2 = new InputStreamReader(in1); // adds Unicode conversion
  BufferedReader instream = new BufferedReader(in2); // add buffering

  // BufferedReader supports readLine(), so we can get a line (strips newline).
  // readLine() can throw IOException, so the enclosing method must catch or declare it.
  String line = instream.readLine();
  System.out.println(line);

Read binary data from a file (note the try/catch to handle IOExceptions).

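  // needs: import java.io.*;
  // try-with-resources closes the stream for us, even if an exception is thrown
  try (DataInputStream in =
         new DataInputStream(
            new BufferedInputStream(
               new FileInputStream("infile.bin")))) {

    int data = in.readInt(); // reads 4 bytes, high-order byte first
  }
  catch (IOException e) {
    // code to handle IO exceptions (no file, etc)
  }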
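To try the binary example without already having an infile.bin, we can write a file ourselves with the matching output classes and read it back. The following is only a sketch: the class name, the helper methods and the temporary file name are all made up for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
public class BinaryRoundTrip {
    // Write each int as 4 bytes, one after another.
    static void writeInts(File file, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int v : values) {
                out.writeInt(v);
            }
        }
    }

    // Read back 'count' ints; the reader has to know how many were written.
    static int[] readInts(File file, int count) throws IOException {
        int[] values = new int[count];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < count; i++) {
                values[i] = in.readInt();
            }
        }
        return values;
    }

    // Write the values to a temporary file and read them back.
    public static int[] roundTrip(int[] values) {
        try {
            File file = File.createTempFile("stream-demo", ".bin");
            file.deleteOnExit();
            writeInts(file, values);
            return readInts(file, values.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = roundTrip(new int[] {1, -2, 2000000000});
        System.out.println(Arrays.toString(result)); // [1, -2, 2000000000]
    }
}
```

Notice that the file contains no markers saying where one int ends and the next begins. The reader and writer simply have to agree on the format, which is exactly the data-format problem discussed earlier.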

Tokenizing

One other class that is useful enough to be worth mentioning is StringTokenizer. This class will break up a line of text into a sequence of words (or a sequence of tokens separated by some other character). See the description in the book.
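A small sketch of StringTokenizer in use is given here. The wrapper class and its tokens method are hypothetical helpers for this example; StringTokenizer itself is the standard java.util class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of any library.
public class TokenizerDemo {
    // Split a line into tokens, using any character in 'delimiters' as a separator.
    public static List<String> tokens(String line, String delimiters) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delimiters);
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("the quick  brown fox", " ")); // [the, quick, brown, fox]
        System.out.println(tokens("a,b,,c", ","));               // [a, b, c]
    }
}
```

Runs of delimiters are treated as a single separator, so the double space and the double comma above do not produce empty tokens.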

File and directory manipulation

Recitation