Assignment 4 Tips

Here are some tips for Assignment 4.

First, some general advice and tips about information sources. It is always worth re-reading the Assignment 2 tips for general information about working with UML.

Second, it is really worth reading Sections 11.1 to 11.5 (inclusive) in the textbook. It is an hour well spent, since it sets the context for the assignment.

Third, it is worth looking at the Useful Links page for some information about the Linux file system. It is detailed technical reference material, there if you need it.

Now for some hints and suggestions directly related to the assignment...

Start by looking at Figure 11.1 in the textbook. It shows the different layers at which one can deal with the file system. The assignment basically has you playing with the file system from different layers (starting with the application layer, and working downwards).

First, write revcat. It is easy and fun. About 20 lines of code. Five marks in the bank.

Now for the more substantial parts of the assignment. There are at least three different ways to do it: the VFS way, the shim way, and the system call way.

The VFS way is the most elegant, but perhaps a bit ambitious. It would involve creating a new file system type, perhaps called wackyfs, and then running this as a local file system with a VFS interface. One could start from an existing file system type (such as ufs, ffs, xfs, ext2, or ext3), and then modify the functionality to alter the data layouts and disk block layouts. You could then mount this new file system at a place of your choosing inside UML, and use it for your testing. Definite bonus material if you do it this way! (One word of caution: you really do not want wackyfs to apply to everything you do inside UML, because it will undoubtedly mess up your Linux shell, executable files, compilation, etc. You just want the wacky stuff to happen in your special mounted wackyfs area. CPSC 457 TA David Ma has some info on his Web page about how to create and mount your very own file system, if you are thinking about doing things this way.)

The shim way is all about layering. Recall Figure 11.1 in the textbook, which shows the progression of layers from application layer view of the file system all the way down to I/O requests on secondary storage devices. In essence, you want to add a new "shim" layer somewhere in this chain that alters the data representation before it makes it all the way down to the disk. For safety's sake, you probably only want to apply this for a particular user, which is sufficient for the purposes of the assignment. To elaborate, imagine creating two users in your UML world, perhaps called "Mat" and "Tam". For user Mat, the file system works as normal in Linux. For user Tam, the file system is wacky, with data layouts and block selection completely different. From Mat's account, you can look at Mat's normal files, as well as at Tam's files to see how they are being stored. Similarly, from Tam's account, you can look at Mat's normal files, as well as at Tam's files to see if they look right or not. (As root, you can use "chown" to change the ownership of files as well to facilitate your testing.)

In terms of implementing the shim approach, you basically want to insert a line of code in the "write path" of the file system code that does something like this:

    if UserID == Tam
        then reverse the bytes in the buffer before writing them to the data block

By symmetry, you will insert a line of code in the "read path" of the file system code that does something like this:

    if UserID == Tam
        then reverse the bytes in the buffer after reading them from the data block
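
To make the symmetry concrete, here is a minimal user-space sketch in C of the byte reversal and the user check. It is not kernel code and not the required solution: the names reverse_bytes, shim_transform, and WACKY_UID are made up for illustration, and the numeric user ID for Tam is an assumption that depends on how you created the account. Inside the kernel you would have to find the right buffer and the right way to identify the current user yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical numeric user ID for "Tam"; the real value depends on your UML setup. */
#define WACKY_UID 1001u

/* Reverse the bytes of buf in place. Applying it twice restores the original data. */
static void reverse_bytes(unsigned char *buf, size_t len)
{
    size_t i, j;

    assert(buf != NULL || len == 0);
    if (len < 2)
        return;

    for (i = 0, j = len - 1; i < j; i++, j--) {
        unsigned char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
}

/*
 * Stand-in for the check you would add to both the write path and the
 * read path: only the wacky user's buffers are transformed.
 */
static void shim_transform(unsigned int uid, unsigned char *buf, size_t len)
{
    if (uid == WACKY_UID)
        reverse_bytes(buf, len);
}
```

Because reversing a buffer twice gives back the original bytes, the same transform can serve in both directions: Tam writes "hello", the data block holds "olleh", and Tam reads back "hello". Mat, whose buffers are never transformed, would see "olleh" when looking at Tam's file.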

Code like the above can basically get you the 10 marks for the second part of the assignment. (Extra tip: if you really want to be cautious, you could restrict the wackiness to just Tam's files that start with the name "foo". This is a very good idea until you are sure you know what is going on in the file system code.)

The disk block layout stuff is a lot harder. But once you find the right place in the code, you can probably do the "small" file trivially (one data block), and the "medium" file manually (maybe a dozen data blocks). Generalizing your code to the "large" file will take a bit more work. Do what you can, and try to get some or all of those 10 marks.

The system call way is much more constrained, and should be doable in the short time that you have available. Think of doing Assignment 4 a bit like you did Assignment 2, as an evil system administrator. In particular, you will apply the wacky file layout one file at a time, rather than system-wide. This is perhaps an easier and safer approach, and it would also suffice for the purposes of the assignment.

With this approach, you basically need a system call that takes a file name or inode number, and then re-arranges the layout of that file. From user space, you should be able to view the results. You probably also want a system call that puts things back the way they were. Perhaps a new field in the inode structure (i_wacky) can help you keep track of which layout is which.
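
The bookkeeping behind such a pair of system calls can be sketched in user space as a toy model in C. Everything here is an assumption for illustration: struct model_inode is not the kernel's inode, model_make_wacky and model_make_normal are hypothetical names, and reversing the block order is just a placeholder for whatever layout the assignment asks for. The point is only how a flag like i_wacky keeps the two operations from being applied twice.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_BLOCKS 16

/* Toy stand-in for an inode; NOT the kernel's struct inode. */
struct model_inode {
    unsigned long blocks[MODEL_MAX_BLOCKS]; /* logical-to-physical block map */
    size_t nblocks;                         /* number of entries in use */
    int i_wacky;                            /* 1 if the layout is currently wacky */
};

/* Placeholder "wacky" layout: reverse the order of the block map. */
static void reverse_block_map(struct model_inode *ino)
{
    size_t i, n = ino->nblocks;

    for (i = 0; i < n / 2; i++) {
        unsigned long tmp = ino->blocks[i];
        ino->blocks[i] = ino->blocks[n - 1 - i];
        ino->blocks[n - 1 - i] = tmp;
    }
}

/* Switch to the wacky layout. Returns 0 on success, -1 if already wacky. */
static int model_make_wacky(struct model_inode *ino)
{
    assert(ino != NULL && ino->nblocks <= MODEL_MAX_BLOCKS);
    if (ino->i_wacky)
        return -1;
    reverse_block_map(ino);
    ino->i_wacky = 1;
    return 0;
}

/* Put things back the way they were. Returns 0 on success, -1 if not wacky. */
static int model_make_normal(struct model_inode *ino)
{
    assert(ino != NULL && ino->nblocks <= MODEL_MAX_BLOCKS);
    if (!ino->i_wacky)
        return -1;
    reverse_block_map(ino); /* the reversal is its own inverse */
    ino->i_wacky = 0;
    return 0;
}
```

This model only shuffles the block map. A real implementation would also have to keep the file's data consistent with the new map, so that reads still return the right bytes.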

Then do some of the system benchmark testing for correctness, functionality, and performance. Up to 5 more marks here and you are done.

Finally, here are some specific files you might want to know about:
/tmp/YOURNAME/uml/linux/include/linux/fs.h (for declarations of key VFS file system data structures)
/tmp/YOURNAME/uml/linux/include/linux/ext3_fs.h (inode structure and manifest constants for EXT3 file system)
/tmp/YOURNAME/uml/linux/include/linux/buffer_head.h (struct buffer_head used for managing data blocks and buffer cache)
/tmp/YOURNAME/uml/linux/include/linux/bio.h (declarations of struct bio for block I/O)
/tmp/YOURNAME/uml/linux/fs/read_write.c (for routines like vfs_read(), vfs_write(), and do_sync_write() )
/tmp/YOURNAME/uml/linux/mm/filemap.c (for routines like file_read_actor() )
/tmp/YOURNAME/uml/linux/fs/mpage.c (for routines like do_mpage_readpage() )
/tmp/YOURNAME/uml/linux/fs/ext3/file.c (for routines like ext3_file_write() )
/tmp/YOURNAME/uml/linux/fs/ext3/inode.c (for routines like ext3_read_inode() and ext3_update_inode() )
/tmp/YOURNAME/uml/linux/fs/ext3/balloc.c (for routines like ext3_new_blocks() and ext3_try_to_allocate() )

CPSC 457: Operating Systems

Professor Carey Williamson

Fall 2008
