Parsing a Text File Using StringTokenizer

Files are often saved with some sort of "delimiter" separating the different parts of the file, whether those parts are thought of as records, entries, or something else. A good example is exporting data from a spreadsheet like Microsoft Excel: there needs to be a way of keeping track of which data belongs in each cell, so when exporting or importing you can declare which character will serve as the delimiter. Often this is a comma, and often it is a colon. In fact, the CSV file format, commonly used by financial institutions for saving account information, stands for Comma-Separated Values; i.e., comma-delimited data.

Java has a quite simple class for finding delimiters and thereby slicing a file up into its parts. StringTokenizer has only eight methods of its own, and you really only need to take advantage of three of them to separate a file by delimiter characters.

StringTokenizer(String s, String d)
The constructor takes in a String to be sliced up and a String that will be the delimiter. The delimiter is usually a single character, but it can be more than one, in which case each one of the characters acts as a delimiter on its own.

boolean hasMoreTokens()
This will allow us to loop through a file while there is still more to be read. A "token" is an individual sliced-out piece of data.

String nextToken()
This returns the text between two delimiters.
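Together, these three methods are all you need: construct the tokenizer, loop while hasMoreTokens() is true, and pull out each piece with nextToken(). A minimal sketch (the input String here is made up for illustration):

        import java.util.StringTokenizer;

        public class TokenDemo {
            public static void main(String[] args) {
                // Slice a colon-delimited String into its three tokens.
                StringTokenizer st = new StringTokenizer("alpha:beta:gamma", ":");
                while (st.hasMoreTokens()) {
                    System.out.println(st.nextToken());  // prints alpha, beta, gamma on separate lines
                }
            }
        }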

So here it is in action:

        String readIn = br.readLine();  //For this example, assume that this is from a BufferedReader object instantiated with a certain file.
        StringTokenizer st = new StringTokenizer(readIn, ":");

In the above example, we assume that the file was saved with a : written between each "token" of information.
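To make that example concrete, here is one way to finish the job: a small helper method that takes a colon-delimited record (such as a line read in with readLine()) and collects its tokens. The record "Smith:John:12345" is a made-up example of the kind of account data described above.

        import java.util.ArrayList;
        import java.util.List;
        import java.util.StringTokenizer;

        public class RecordParser {
            // Split one colon-delimited record into its individual fields.
            static List<String> parseRecord(String record) {
                List<String> fields = new ArrayList<>();
                StringTokenizer st = new StringTokenizer(record, ":");
                while (st.hasMoreTokens()) {
                    fields.add(st.nextToken());
                }
                return fields;
            }

            public static void main(String[] args) {
                System.out.println(parseRecord("Smith:John:12345"));  // prints [Smith, John, 12345]
            }
        }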

Why Parse a Text File?

If you think back to the way BufferedWriter works, you'll recall that if we add a \n character at the end of each String (or line), then we can later read one String (or line) at a time with the readLine() method. So why, you may wonder, would we ever want to forfeit that approach and use this one? The answer is efficiency. Calling readLine() over and over can mean going back to the hard drive repeatedly as we read; it can be much more efficient to read from the (slow) hard drive once and then, with the whole file in (fast) RAM, do all of the chopping up into different Strings (lines) there.
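That in-memory approach can be sketched as follows. Assume the entire file's contents have already been read into a single String in one pass; StringTokenizer can then do the line-splitting entirely in RAM, using \n itself as the delimiter:

        import java.util.ArrayList;
        import java.util.List;
        import java.util.StringTokenizer;

        public class InMemorySplit {
            // Given the whole file's contents already in memory,
            // slice it into lines in RAM using \n as the delimiter.
            static List<String> splitLines(String contents) {
                List<String> lines = new ArrayList<>();
                StringTokenizer st = new StringTokenizer(contents, "\n");
                while (st.hasMoreTokens()) {
                    lines.add(st.nextToken());
                }
                return lines;
            }
        }

Each String in the resulting list can then be handed to a second StringTokenizer with ":" as the delimiter to recover the individual fields, all without touching the hard drive again.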