CPSC 457: Operating Systems

Professor Carey Williamson

Fall 2008

Assignment 4 (30 marks) Due: December 2, 2008 (11:59pm)

The purpose of this assignment is to learn about the Linux file system. In particular, you will study the layout of files on the disk, and the impacts on disk I/O performance. You will do so via a combination of systems-level programming in the OS kernel, as well as some application-level programming. The intent is for you to do your development and testing within User Mode Linux (UML), but if you have root (superuser) access to your own dedicated Linux system, you are welcome to try your luck there! See also the Assignment 4 tips.

My Favourite Year

The year 1984 was a special one. And not just because of George Orwell's book. Even better than that. Former movie actor Ronald Reagan was in the White House as President. Musical performers like Prince, Tina Turner, Duran Duran, Cyndi Lauper, Madonna, and Lionel Richie (gag!) were topping the charts. The Oakland Raiders were in their glory years, capturing Super Bowl XVIII. Wayne Gretzky was in his prime as a player, leading the (hated) Oilers to their first-ever Stanley Cup. And perhaps best of all, there was the release of the new Unix Fast File System. A perfect file system, for the world's most perfect operating system. Ahhh! Those were the days.

The Unix Fast File System (FFS) was a major milestone in the history of operating systems. It represented a complete redesign of the file system, with the primary emphasis on performance. While the early design of the Unix file system was simple and elegant, the popularity of Unix and the growing size of its system installations were making file system performance an issue. The UC Berkeley folks met this challenge head-on, with a total revamp of the file system layout in BSD Unix. Clever use of inodes and in-memory caching. Consideration of fragmentation issues and block sizes for rapidly-improving disk technologies. Clustering of file system directories. Careful layout of data blocks on the disk. Conscientious use of indirection blocks to support ever-growing file sizes. Many of these innovations remain in the Linux file system today.

revcat (5 marks)

Write a simple application-layer program revcat that allows you to display the contents of a file backwards (i.e., with the bytes in reverse order). For example, if you have a file called "foozle", then "cat foozle" might show:
    Mary had a little lamb;
    A little pork, a little ham.
while "revcat foozle" might show:
    .mah elttil a ,krop elttil A
    ;bmal elttil a dah yraM
Note that this representation is approximate only, since there might be carriage return and line feed characters in the file as well, which you should handle in a reasonable way.

Construct a simple text file that is small (less than 1 KB). Show sample output to indicate that your revcat works properly (e.g., "revcat < foozle", "revcat < foozle | revcat > foozle2", "diff foozle foozle2", "head foozle", "tail foozle"). Then construct a medium-sized file (more than 10 KB, but less than 100 KB), and one that is large (more than 1 MB). Using these three files, test "cat" and "revcat" on the command line to see if there are any observable performance differences when displaying and/or copying files (e.g., "revcat < foozle > foozle2"). Show sample output for your measurement results. Comment on your observations.

Changing Data Block Contents (10 marks)

A Unix file is basically a linear sequence of bytes, from start to finish. While your application-layer program doesn't actually change the layout of files on the disk, now it is time to start messing with the file system, and slowing it down.

Imagine a really stupid file system, wherein the data contents of a file are actually written in reverse order onto the data blocks, even though the data blocks of the file are allocated properly (e.g., in sequential order) on the disk. Make a file system layout that is stupid like this, using the default file system data block size on your system.

For example, if the file system is using ridiculously-small 8-byte blocks, then you want to change the file system from using:
    Block i: Mary had
    Block i+1: a little
    Block i+2: lamb; A
    Block i+3: little
    Block i+4: pork, a
    Block i+5: little h
    Block i+6: am.
to a new layout like this:
    Block i: dah yraM
    Block i+1: elttil a
    Block i+2: A ;bmal
    Block i+3: elttil
    Block i+4: a ,krop
    Block i+5: h elttil
    Block i+6: .ma
Again, this representation is approximate only, because of carriage return and line feed characters that might be present in the file. Also, you as a file system designer have to decide whether a partially-full data block goes at the start or end of the file, as well as which end of that data block gets used, and what happens when a user appends to a file (e.g., "cat >> foozle").

For a small file, use cat and revcat to show how the file appears to the user. (The file should look "normal", even though it is stored differently.)

Changing Data Block Layout (10 marks)

Next, we will mess with the file system even further, to slow it down even more. We will do so by altering the data block layout on the disk. We will try two different layouts: one with the data blocks laid out in reverse sequential order on the disk, and the other with the data blocks laid out at random locations on the disk.

For example, if the file system is using 8-byte blocks, then you want to change the file system from using:
    Block i: Mary had
    Block i+1: a little
    Block i+2: lamb; A
    Block i+3: little
    Block i+4: pork, a
    Block i+5: little h
    Block i+6: am.
first to a new reverse-order layout like this (5 marks):
    Block i: .ma
    Block i+1: h elttil
    Block i+2: a ,krop
    Block i+3: elttil
    Block i+4: A ;bmal
    Block i+5: elttil a
    Block i+6: dah yraM
and then to a random-order layout like this (5 marks):
    Block h: elttil a
    Block j: A ;bmal
    Block p: .ma
    Block n: h elttil
    Block w: dah yraM
    Block x: a ,krop
    Block z: elttil
These layouts should defeat any sequential read-ahead optimizations that the file system or disk are doing.

For a small file, use cat and revcat to show how the file appears to the user. (The file should look "normal", even though it is stored differently.) For a medium-sized file, show the data blocks actually used in each of your two file system layouts.

File System Benchmarking (5 marks)

Using three existing text files (one small, one medium, and one large) do some simple benchmarking tests (e.g., using cat, cp, diff, head, tail, or a loop containing one or more of these) to see if there are any observable performance differences between your different file layouts compared to the default file system (probably ffs, ext2, or ext3). Show sample output for your results, supplementing with iostat or file system statistics if needed. Comment on your observations.

Submitting Your Solution

Use the submit command to send your assignment solution to your TA. Among the materials submitted, please include any kernel files that you modified, along with a brief description of what you did, what works, and what does not. Also include your user-level code, and some sample output to show that your wacky file system modifications are working.

Make sure to take a look at the Assignment 4 tips.