Java Revolution Spiral 1 - Files

Let the Revolution Begin

All these "Java Revolution Spiral 1" documents will be initial, brief, non-programming passes over the HL programming concepts from IB CS Topics 5 and 7. For the most part we'll go backwards from the destination. The idea is that knowing where you are heading puts the getting there in much better context. Another thing hoped for with this approach is that some of these concepts will nicely ferment in the back of your mind over the months between when they are first introduced and when they are actually taught in detail, and with the programming solution presented.

Saving Data

The first important point to make at this stage is that data is no good if it cannot be permanently stored and retrieved. So far we have meandered to and fro with some neat programming constructs and algorithms, and actually produced some pretty schmantzy programs, on up to full GUI database programs not that dissimilar in terms of user experience from full IB CS dossiers and other "entry-level" real databases. But all of the data we entered and then manipulated/sorted/searched went away as soon we closed our program. Yes, our programs themselves were saved, and were able to be re-opened. But the data that we worked with disappeared as soon as we quit Netbeans.

So, yes, saving data files is important, and fairly easy to do - very analogous to the way we used BufferedReader etc. to read, there are pre-made Java classes which specialize in both reading (this time from files, rather than input from the keyboard), and saving.

But in terms of the conceptual approach to the way we should choose to implement files, there are choices to be made.

A Bit of History

Much of the theory on file saving strategies dates from the time when data was saved primarily to tapes. And it can all seem a bit antiquated to talk about sequential files, since now we save to hard drives most often, not to tapes, and you would think that we must therefore be using direct access files all the time. But in fact there are still very good reasons to work with sequential files, and theres no reason in the world that a hard drive cannot work in a sequential way. Both memory efficiency and speed efficiency need to be considered when choosing the kind of file organization to use. More on this soon.

The All Important Topic 7 Chart: Text Book Analogies

I've titled this section what I did because, ultimately, if you understand what is in the chart at the beginning of Topic 7 of the IB CS syllabus, you've got the main Topic 7 points.
Here's a link to the syllabus; Topic 7 is on numbered pages 44-45, though in Safari pages 50-51 gets you there directly (as opposed to sequentially...).

I won't copy and paste the syllabus here, and you will need to get used to using this syllabus, so it is assumed that, accompanying these notes, you've got the syllabus open to the correct place in another tab. On the second spiral of Topic 7, I'll put more detailed notes in the "Assessment Statements" part of the website. They are not done yet, and are not needed for the general overview of the topic.

Text Book Analogies of File Structures:

Sequential File - To find a page, you start at page one and flip the pages until you find the page you are looking for.

Partially-indexed File - You use the Table of Contents, which takes you to the general area/chapter of the text book, and then sequentially look for the page in question.

Fully-indexed File - This would be a text book where each and every page is separately listed in an index. So you sequentially go through the index until you get to the page you want, and then you directly go there.

Direct Access File - You would somehow, magically, be able to do a calculation for the page on which the information you were to look for is located, and directly go to that page.

A Bit More History

"Oh tell us a story Mr. Rayworth! What was it Really like waaaay back in the old, old, old days??? Tell us, oh tell us do!!..." "Well, children...in the olden days we really did save stuff on tapes. And ask me about punch cards one day... "

In fact the idea of direct file access was as much a Shangrila notion back in the day, as the notion would be of having a magical way of being able to open a big text book to the exact page you are looking for each time. Because, just like with a cassette tape, the magnetic media on which the data was stored, had to be re-wound and fast-forwarded to find the exact part of the file that was needed. If you have ever worked with a cassette deck you'll know the frustration, versus using a CD player which directly accesses the track you are looking for. And if that picture doesn't work for you, try to picture an old James Bond movie where down in the secret lair, a wall of computers' tape drives are whirling back and forth, reading and writing data. So, yes, back in the day, it was all sequential file access; there was no other way. Then along came hard drives, and the other means of more quickly accessing data became possible.

Files, Records and Fields?

So what are saving anyway, and how is it fundamentally organized?

Well we're saving data, that's what. And the basic structure of organized, saved data is the file.
The file itself is made up of individual records.
And those records are made up of fields, which are like attributes of an object. Each field has a specific data type.
(And by the way, a bunch of related files together make a database.)

So, here's the hierarchy:

database --------> files ---------> records ----------> fields

Example:
File: iTunes Music Library

Media (.mp3)	Name (String)	Artist (String)	Length (int)	Like (boolean)
Dancing-in-the-dark.mp3	Dancing In The Dark	Lady Gaga	190	true
Stairway-to-heaven.mp3	Stairway To Heaven	Led Zeplin	385	true
Teenage-lobotomy.mp3	Teenage Lobotomy	The Ramones	120	true

The file has three records. And each record is made up of five fields: two of the String data type, along with one int, one boolean, and one media file field. (Which makes this a curious example as it is a file which has one of its fields other files.)

File Structure # 1: Sequential File

This is a file structure in which data is written sequentially, one record after the other to the storage media. To find a specific record in the file, a sequential search must be performed starting at the beginning of the file and progressing until what is being searched for is found.

File Structures # 2 and 3: Fully Indexed and Partially Indexed

The analogy above of using a Table of Contents vs. using a really complete textbook index works very well at conveying how these structures work. And for the partially indexed file, another good example would be how a music CD works, with you choosing the song number, and the reading head moving to the beginning of that song file.

The only thing to keep in mind is that there are two parts to the file searching of each; first there's the looking through the index part, and then there's the going to the group of records (partially indexed) or specific record (fully indexed) part. And these lookin- through-the-index parts, are, in fact, mini sequential searches.

So for the partially indexed file, it's three steps:

Sequential search through the index file.
Direct access of the address found, which is the beginning of a the group of records.
Sequential search though the group for the specific record.

And for the fully indexed file, it's two steps:

Sequential search through the (probably relatively big) index file.
Direct access to the address found, which will be the specific record being looked for.

File Structure # 4: Direct Access

More on how shortly, but for now, just accept that there's a way to perform a calculation on a certain record name which yields a number that can in turn be used as an address. And so by performing the calculation you get the address and put the record there. So then, to find the record, just perform that exact same calculation, and that will yield the address where you should look.

Let's say that we have 25 records in a database of songs. We perform a calculation on the name of the song, "Dancing in the Dark", for example, and get the result 15. So we put that record in the 15th position of the file. And so when we go to find the song, we perform the exact same calculation on the exact same name of the song, "Dancing in the Dark", and we get exactly the same result, 15. So that's the position where we look, and lo and behold that's where the song record is. (But do read on in Part 2 of Files for the rest of the story.)

Revolution-Spiral1-Files-2.php