Explain why compression of data is often necessary when transmitting across a network.


Teaching Note:

S/E, INT Compression has enabled information to be disseminated more rapidly.


Sample Question:


JSR Notes:


Well, taken at face value, this is pretty straightforward: when you compress something, like a full quality image to a JPEG, or a full quality song to an MP3, the resulting files are smaller, and smaller files hog less network bandwidth and take less time to transfer over a network.

To take an example, at a certain transmission speed - say 8 Mbps (Megabits per second), which is a common download capacity in 2013, a 100 Megabyte file (that's 800 Megabits) would take 100 seconds to be downloaded (assuming all 8 Mbps were available, which is rare, given the shared nature of community networks). If that file were compressed to half its size (50 Megabytes), then it would take half the time to download, i.e. 50 seconds.
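The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name transfer_seconds is made up for illustration, and it assumes the full 8 Mbps is available with no protocol overhead.

```python
def transfer_seconds(size_megabytes: float, speed_mbps: float) -> float:
    """Ideal transfer time, ignoring overhead and sharing of the link."""
    size_megabits = size_megabytes * 8  # 1 byte = 8 bits
    return size_megabits / speed_mbps

print(transfer_seconds(100, 8))  # 100.0 seconds for the original file
print(transfer_seconds(50, 8))   # 50.0 seconds once compressed to half size
```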

Though do note that the assessment statement uses the word "necessary". Why is compression necessary? Well, it's nice to download your file in half the time, but in terms of being necessary, you should be thinking about the macro view of the overall network. Any given network reaches a point where it is simply impractical or unusable if there is too much traffic. One of the ways to keep it practically useful is to limit the amount of data being transmitted, and one great way to do that is to compress files routinely.

The other thing to note in the Teaching Note is the idea of disseminating information more rapidly. This hints at the direction the Internet and data/application distribution is currently heading. Companies, in particular, are keen to introduce and expand the "cloud" model of computing, in which they keep control of not only all data, but all applications. And so not only is data to be stored in various "clouds" (actually mega warehouses with thousands of servers), but also when users run an application (for example Word or Photoshop), they are actually running it across the Internet from the "cloud". This means massive amounts of information moving back and forth across the Internet, and so the more techniques that can be used for limiting and controlling that amount of information, the better. Compression will be one of the things that allows this expansion of data transmission.

Certainly there are other ways of limiting the amount of data being transmitted around the Internet, like not streaming video to watch movies but using traditional TV broadcast or DVDs, and saying No in general to The Cloud (and, dare I say, to the control of Us by Them that it brings with it). But whatever else is going on, if you take all the data sloshing around the Internet and compress it by 50%, the same network can carry twice as much information - it is a simple and easy technique to employ.

DO LOOK AT ST. JULIAN'S PAGE. In particular, the lossy vs. lossless image is good.


So in terms of compression techniques, though not referred to directly in the assessment statement or the Teaching Note, you kinda have to have an idea of what they are, I think. So don't take too much time with what follows, but do have a general understanding of compression algorithms/techniques.

Lossy vs. Lossless

First of all you should realize that there are two possible results of compression, in terms of the loss of information. In lossy compression techniques, data is indeed lost through the compression. When expanded, the file does not contain all of its original data. In the case of lossless compression techniques, upon expansion after compression, the file is back to its original state; no data at all is lost (thus the term "lossless").
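To see what "lossless" means in practice, here is a small sketch using Python's standard zlib module. zlib is just a convenient stand-in here for any lossless technique; it is not mentioned in the syllabus, and the sample text is made up.

```python
import zlib

# Made-up, very repetitive data, so there is plenty to compress.
original = b"white white white white black white white white " * 50

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed size is much smaller
print(restored == original)            # True: every byte came back
```

A lossy technique would fail that last check: what you get back is only close to the original.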

Image Compression

A photograph taken by a digital camera can be Megabytes and Megabytes. For under $100 (2013) you can buy a camera which takes 16 Megapixel pictures. So at full quality, uncompressed, 24 bit color (refer to 2.1.10), that's 16 million x 3 bytes, or 48 Megabytes for one snapshot of your friends making faces at each other on the weekend. That's a lot of storage for something that does not have to be perfect quality; perfect quality is really only useful if you are going to zoom way in and work with the image in Photoshop, for example. So it can be saved in one of several compressed formats, including GIF and JPEG.
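The same size calculation as a quick sketch (the function name is made up, and a Megapixel is taken to be exactly one million pixels):

```python
def uncompressed_image_bytes(megapixels: float, bits_per_pixel: int = 24) -> int:
    """Bytes needed to store every pixel with no compression at all."""
    pixels = int(megapixels * 1_000_000)
    return pixels * bits_per_pixel // 8  # 8 bits per byte

size = uncompressed_image_bytes(16)
print(size)              # 48000000 bytes
print(size / 1_000_000)  # 48.0 Megabytes
```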

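With GIF-style compression, runs of pixels that are exactly the same color are found and recorded. So, say there are 20 white pixels in a row: the file will not save 20 three-byte (assuming 24 bit color) pieces of information (60 bytes in total); rather it will record the color once and then the number of pixels in the run, so three bytes for the color white and two bytes to store the count (5 bytes in total). (Strictly speaking, this describes run-length encoding. GIF itself uses a more general algorithm called LZW, which records repeated patterns rather than only repeated single colors, and it limits an image to a palette of 256 colors. The principle is the same.) See pages 178-181 of the yellow and red text book for a discussion of this, with a good set of diagrams. This is a lossless kind of compression.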
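The "20 white pixels" example can be sketched as follows. Note that this is plain run-length encoding, a simplification of the idea and not the actual GIF file format; the function names and the 3 + 2 byte layout simply follow the description above.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), one byte each

def rle_encode(pixels: List[Pixel]) -> List[Tuple[Pixel, int]]:
    """Collapse each run of identical pixels into one (color, count) pair."""
    runs: List[Tuple[Pixel, int]] = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1] = (p, runs[-1][1] + 1)  # same color: extend the run
        else:
            runs.append((p, 1))              # new color: start a new run
    return runs

def rle_decode(runs: List[Tuple[Pixel, int]]) -> List[Pixel]:
    """Expand the (color, count) pairs back into individual pixels."""
    pixels: List[Pixel] = []
    for color, count in runs:
        pixels.extend([color] * count)
    return pixels

WHITE = (255, 255, 255)
row = [WHITE] * 20

runs = rle_encode(row)
raw_bytes = len(row) * 3         # 3 bytes per pixel
rle_bytes = len(runs) * (3 + 2)  # 3 bytes of color + 2 bytes of count per run

print(runs)                     # [((255, 255, 255), 20)]
print(raw_bytes, rle_bytes)     # 60 5
print(rle_decode(runs) == row)  # True, so nothing was lost
```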

With JPEG compression, (basically) square regions of an image are turned into gradients, whose mathematical representations are stored, rather than all the pixel information of each individual pixel. So, from one corner of a region which is dark brown, to the opposite corner which is light brown, the colors of in-between browns are distributed. There's lots more to it than this, but you get the idea. This is a lossy means of compression, as you are losing the actual pixel information; it is replaced by the various gradients. But zoomed out, and done in a limited way (choosing jpeg high quality when compressing), it can be hard to see the difference.
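As a toy illustration of the gradient idea only (real JPEG works on 8 x 8 pixel blocks with different mathematics, the discrete cosine transform), the sketch below keeps just the two end values of a block of pixels and rebuilds the in-between values as an even gradient. The function names and pixel values are made up.

```python
def compress_block(values):
    """Lossy: keep only the first value, the last value, and the length."""
    return values[0], values[-1], len(values)

def expand_block(start, end, length):
    """Rebuild the block as an even gradient between the two stored values."""
    if length == 1:
        return [start]
    step = (end - start) / (length - 1)
    return [round(start + i * step) for i in range(length)]

# Brightness of 8 neighbouring pixels, dark brown to light brown.
block = [60, 64, 71, 80, 84, 95, 101, 110]

stored = compress_block(block)
rebuilt = expand_block(*stored)

print(stored)            # (60, 110, 8): three numbers instead of eight
print(rebuilt)           # close to the original values, but not identical
print(rebuilt == block)  # False: the exact pixel information is gone
```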

Recall me being zoomed in on a Photoshop image when saving as jpeg, and seeing, live, the gradients appearing.

A common uncompressed image format (which therefore saves each and every pixel) is TIFF, though TIFF files can optionally be compressed too.


See Photoshop Save for Web

Audio Compression

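With digital audio, a common uncompressed format is AIFF (.wav is another). It typically uses the "CD standard" of 44,100 samples per second and 16 bit sample size (or bit depth). So this means that each sample records the amplitude (the height of the sound wave) at one instant, with one sample taken every 1/44,100 of a second, and each sample can be one of 65,536 (2 ^ 16) amplitude levels. (The pitches you hear come from how quickly those amplitudes rise and fall from sample to sample.)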

So each second of an AIFF is 16 bits (2 bytes) x 44,100, or 88,200 bytes. A 3 minute single on the radio would thus be 180 seconds x 88,200 bytes, or about 16 Megabytes, x 2 because it's actually two channels (i.e. stereo), so about 32 Megabytes. And from your personal experience you know that an average (compressed) mp3 is only about 3 Megabytes, so that's lots of compression.
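Worked through exactly, as a sketch (the function name and defaults are made up; the defaults are the "CD standard" figures):

```python
def uncompressed_audio_bytes(seconds, sample_rate=44_100,
                             bits_per_sample=16, channels=2):
    """Bytes of raw audio: samples per second x bytes per sample x channels."""
    return seconds * sample_rate * (bits_per_sample // 8) * channels

mono = uncompressed_audio_bytes(180, channels=1)
stereo = uncompressed_audio_bytes(180)

print(mono)    # 15876000 bytes, roughly 16 Megabytes
print(stereo)  # 31752000 bytes, roughly 32 Megabytes
```

Compared with a 3 Megabyte mp3, that is a compression ratio of roughly 10 to 1.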

To save time, I'm going to copy and paste images from the Internet, but will replace them with my own diagram some time:

Sound wave (analogue - i.e. continuous change in amplitude - an infinite number of points along it.)

Digital sampling of that sound wave.

Compression could be done in a lossless way, similarly to GIF, by finding groups of exactly the same samples. So in this diagram there are two groups of three. To be efficient, you would be looking for more in a row than this, but you get the idea.

A lossy audio compression technique (such as MP3) could be a lot "rougher" and look for groups of samples that are close to each other in value, and then save that group as all the same value. (So from sample # 27,123 in second 44, to sample # 27,300 in second 44, digitally save the single amplitude level 17,125 - whereas in the uncompressed file, there would be a lot of variation in those 177 or so samples.) Real MP3 encoding is considerably more sophisticated than this, but the principle of throwing away detail you are unlikely to notice is the same.
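The "rougher" grouping idea can be sketched like this. It is only an illustration of the paragraph above, not how MP3 actually encodes audio; the function names, the tolerance parameter and the sample values are all made up.

```python
def lossy_group(samples, tolerance):
    """Merge consecutive samples that stay within `tolerance` of the first
    sample of their group; store each group as (average value, count)."""
    runs = []
    start = 0
    for i in range(1, len(samples) + 1):
        # Close the current group at the end of the data, or when a sample
        # strays too far from the sample that started the group.
        if i == len(samples) or abs(samples[i] - samples[start]) > tolerance:
            group = samples[start:i]
            runs.append((round(sum(group) / len(group)), len(group)))
            start = i
    return runs

def expand(runs):
    """Rebuild a sample list; every sample in a group gets the same value."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

samples = [100, 102, 101, 101, 150, 152, 151, 100]
runs = lossy_group(samples, tolerance=3)

print(runs)          # [(101, 4), (151, 3), (100, 1)]
print(expand(runs))  # [101, 101, 101, 101, 151, 151, 151, 100]
```

Eight samples become three pairs, but the small variations inside each group cannot be recovered, which is exactly what makes this lossy.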


Libor reports that the size ratio of a full .wav to an .mp3 is about 10 to 1. So you have lost a great deal of information and fidelity.

Mastering studios now record in 24 bit, 96 kHz. A 24 bit sample size means 2 ^ 24, or about 16.7 million, different amplitude levels are possible for each sample.