CPSC 457: Operating Systems

Professor Carey Williamson

Winter 2010

Assignment 4 (30 marks)
Due: April 12, 2010 (11:59pm)

The purpose of this assignment is to learn about file systems as well as network-based inter-process communication (IPC) in the Linux operating system. In particular, you will develop a simple RAM file server application, and demonstrate its use by one or more clients. You will do so using application-level and/or systems-level programming. The intention is for you to do your development and testing in a User Mode Linux (UML) environment, but if you have access to an ordinary Linux system, you are welcome to try your luck there instead!

Miniature RAM File Server

Many files in a typical Linux file system are "small" (e.g., 4 kilobytes or smaller). This property is often true for user files, source code files, data files, Web server files, and many more. While these files consume very little space on the external storage system, accessing them is still slow because of the disk seeks and transfer times required. One solution to this problem is to do in-memory storage of files, using a so-called RAM file server, so that files can be accessed quickly, at memory speeds. This technique is especially suitable for small files.

In this assignment, you will design and implement a simple RAM file server that is intended for very small files. For example, you might limit the maximum file size allowed (e.g., 4 KB) on your server, and you might limit the total storage capacity as well (e.g., 1 MB). Despite the conceptual simplicity of your file server, there are technical issues to sort out regarding how files are stored, how directories are represented, and how basic functionality such as "ls", "cat", and "rm" is to be supported. In addition to your RAM file server implementation, you will write some socket-based networking code that allows a client process to communicate with your file server. You must build the server part, the client part, and the IPC mechanism between the two.
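
As a starting point for thinking about these storage questions, here is a minimal sketch of one possible in-memory table. It is only an illustration under stated assumptions: the linked-list layout, the contiguous data buffer, MAX_NAME_LEN, and the names stash_find, stash_remove, and stash_store are all hypothetical choices, not a required design. It enforces the two size limits but simply rejects a file when the table is full, whereas the assignment asks you to make room instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_FILE_SIZE      4096            /* per-file limit (4 KB) */
#define TOTAL_STORAGE_SIZE (1024 * 1024)   /* overall limit (1 MB) */
#define MAX_NAME_LEN       256             /* assumed limit on name length */

/* One stored file; entries form a singly linked list (an assumed layout). */
struct stash_entry {
    char name[MAX_NAME_LEN];
    size_t size;
    time_t stored_at;            /* when the file was most recently stored */
    unsigned views;              /* access count since it was stored */
    unsigned char *data;         /* contiguous copy of the contents */
    struct stash_entry *next;
};

struct stash {
    struct stash_entry *head;
    size_t used;                 /* bytes currently held by file contents */
};

/* Look up an entry by name; returns NULL if it is not stored. */
struct stash_entry *stash_find(struct stash *s, const char *name)
{
    for (struct stash_entry *e = s->head; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Unlink and free the named entry; returns 0 on success, -1 if absent. */
int stash_remove(struct stash *s, const char *name)
{
    for (struct stash_entry **pp = &s->head; *pp != NULL; pp = &(*pp)->next) {
        struct stash_entry *e = *pp;
        if (strcmp(e->name, name) == 0) {
            *pp = e->next;
            s->used -= e->size;
            free(e->data);
            free(e);
            return 0;
        }
    }
    return -1;
}

/* Store a copy of data under name, replacing any older copy. Returns 0 on
 * success, or -1 if a limit is exceeded or memory cannot be allocated. */
int stash_store(struct stash *s, const char *name,
                const void *data, size_t size)
{
    assert(s != NULL && name != NULL && (data != NULL || size == 0));
    if (size > MAX_FILE_SIZE || strlen(name) >= MAX_NAME_LEN)
        return -1;

    struct stash_entry *old = stash_find(s, name);
    size_t reclaimable = (old != NULL) ? old->size : 0;
    if (s->used - reclaimable + size > TOTAL_STORAGE_SIZE)
        return -1;               /* a full solution would make room here */

    struct stash_entry *e = malloc(sizeof *e);
    unsigned char *copy = malloc(size > 0 ? size : 1);
    if (e == NULL || copy == NULL) {
        free(e);
        free(copy);
        return -1;
    }
    if (size > 0)
        memcpy(copy, data, size);
    if (old != NULL)
        stash_remove(s, name);

    strcpy(e->name, name);       /* safe: length was checked above */
    e->size = size;
    e->stored_at = time(NULL);
    e->views = 0;
    e->data = copy;
    e->next = s->head;
    s->head = e;
    s->used += size;
    return 0;
}
```

A contiguous buffer per file keeps the sketch short; a block-based layout would exercise more of the file system ideas covered in class, at the cost of more bookkeeping.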

A suggested plan for carrying out the assignment is as follows:

  1. (15 marks) Write a C program "stasher" that represents a very simple file server. This "stasher" is designed to stash (i.e., store) copies of small files that a client wishes to keep handy. Each file is stored in RAM (i.e., in a data structure maintained by the "stasher" process when it is running). For resource reasons, we will limit the maximum file size that can be stored (e.g., 4 KB), as well as the total space consumed by the files stored in the stasher. Both MAX_FILE_SIZE and TOTAL_STORAGE_SIZE should be settable constants in your "stasher" code.

    Your "stasher" program should support four simple operations:
    • STASH filename: The stasher retrieves a copy of "filename" from the (normal) Linux file system, and stores a copy into the file storage area in memory used by the stasher. The current date and time of the store operation are recorded, along with the file name and size.

      Note that YOU must decide how the file contents are actually stored in the stasher (e.g., contiguous, blocks, linked list, other). Please choose something reasonable, bearing in mind that the point of the assignment is to learn about file systems as well as inter-process communication. Please do NOT completely reimplement the Linux file system!! In particular, you do not need to worry about link counts, permissions, etc.

      If the specified filename does not exist, or you do not have permission to read it, or the file is larger than MAX_FILE_SIZE, an error message should result. No change occurs to the stashed file set.

      If the size of the specified file exceeds the currently available storage space in the stasher (but is less than or equal to MAX_FILE_SIZE), then one or more unpopular files are purged (one at a time) from the stasher, until sufficient space is available. The user does not have to be notified of these removals, but you may find debugging messages regarding these removals to be useful in your development process.
    • LIST: This operation prints out a list of the files currently stored by the "stasher". The output should show the name of each file, the size of each file, the date and time at which the file was (most recently) loaded into the stasher, and an integer count indicating how many times the file has been accessed (i.e., viewed) since it was (most recently) stored. The output can be printed in any reasonable ordering of your choosing (e.g., age, alphabetical, popularity, size).
    • VIEW filename: This operation displays the data contents of the specified file, and its popularity count is incremented. If the file does not currently exist in the stasher, then an error message should result.
    • REMOVE filename: This operation removes the specified file from the stasher. If the file does not exist in the stasher at this time, then an error message should result.
  2. (5 marks) Write a simple main program for your stasher that exercises its functionality. For example, you could use a simple "promptuser-readcommand-docommand" loop for testing and debugging. This can initially be part of the main() for stasher.c if you wish. Do some basic testing to convince yourself (and the TA) that your stasher is working properly.
  3. (5 marks) Modify your "stasher" server-side code so that it can run as a separate (long-lived) background server process, which listens on a socket port address for client requests. Make your server announce (e.g., via printf) the socket port address that it is using, so that the client knows how to connect to it.
  4. (5 marks) Modify your client (main) application so that it can connect to and communicate with your stasher, using sockets. You can have all of your client code in one file if you wish, with a main loop of "promptuser-readcommand-docommand", for testing purposes. Or you can split it into separate files for each command.

    When debugging your socket-based inter-process communication, make sure to do so with the client and server running as separate processes on the SAME machine. In the UML world, you will likely need to use the loopback interface and/or localhost.localdomain for your socket setup. On an actual Linux machine, you can work with real host names and IP addresses, with your client and server processes running on DIFFERENT machines, connected by a network. However, you should do so only once you are sure that things are working properly! You can also try multiple clients sending requests to your server concurrently if you want to, but this is optional.
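
For steps 3 and 4, the following is a minimal sketch of the socket setup on the loopback interface, assuming TCP over IPv4. The helper names open_listener and connect_local are hypothetical, and binding to port 0 so that the kernel picks a free port is just one way to obtain a port that the server can then announce. The request format exchanged over the connection is left entirely to you.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on the loopback interface. Port 0 asks the
 * kernel to choose a free port; the chosen port is written to *port_out so
 * the server can announce it. Returns the listening descriptor, or -1. */
int open_listener(unsigned short *port_out)
{
    assert(port_out != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* 127.0.0.1 */
    addr.sin_port = htons(0);

    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Connect to a server on the loopback interface at the given port.
 * Returns the connected descriptor, or -1 on error. */
int connect_local(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server built on this sketch would print the port returned through port_out (e.g., via printf), then loop on accept(), reading one command per connection or per line. To accept connections from other machines, the server would bind to INADDR_ANY instead of the loopback address.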

Bonus (up to 5 marks)

Add a REFRESH filename command to your stasher. This operation updates (if necessary) the contents of the file stored in the stasher. That is, the contents of the stasher's version of the file and the contents of the actual file in the Linux file system are compared to see if they are identical or not. If identical, then no further action is required by the stasher. If they are not identical, the stasher retrieves and stores a fresh copy of the file, with its new contents (and possibly size, as well). The popularity count is reset to zero if a new version of the file was just loaded, and left unchanged otherwise. The output resulting from this command should indicate whether a new version had to be loaded or not. Note that refreshing a file could result in having to remove some other files to make more room if the new updated version is larger than the previous version. As usual, if the specified filename does not exist in the stasher, then an error message should result.
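
The comparison step of REFRESH can be sketched as follows. This is one possible approach, assuming the stashed contents are available as a contiguous buffer; the function name refresh_needed and its return convention are hypothetical. It compares sizes first and only then compares bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILE_SIZE 4096   /* same settable limit as in the stasher */

/* Compare the file at path with the stashed copy. Returns 0 if the contents
 * are identical, 1 if they differ, and -1 if the file cannot be read. */
int refresh_needed(const char *path,
                   const unsigned char *stored, size_t stored_size)
{
    assert(path != NULL && (stored != NULL || stored_size == 0));
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    /* Read up to one byte past the limit, so that a file that has grown
     * beyond MAX_FILE_SIZE is reported as different rather than truncated. */
    unsigned char buf[MAX_FILE_SIZE + 1];
    size_t n = fread(buf, 1, sizeof buf, fp);
    int failed = ferror(fp);
    fclose(fp);
    if (failed)
        return -1;

    if (n != stored_size)
        return 1;
    return (n > 0 && memcmp(buf, stored, n) != 0) ? 1 : 0;
}
```

When the result is 1, the stasher would go through its normal store path, which is where the size limits and any purging of other files would apply.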

Comments, Tips, and Hints

This assignment is actually much easier than Assignment 3. In fact, you can probably do most of it solely with application-level programming in C, with no new system calls or kernel code required. Be sure to attend the tutorial on Linux file system organization, as it will give you good insights into some of the design tradeoffs and technical issues involved in your file server. If you have never done socket programming before, you had best attend the tutorial on March 25 as well. As usual, it is best to get started on the assignment early!

Note that the user of the stasher can specify filenames using either a relative pathname or an absolute pathname, and may do so from any directory that he or she chooses. Be careful about this. For example, you probably want to work only with absolute pathnames inside the stasher, so that you are not dependent upon the working directory in which the stasher process was started, and so that you do not run into naming conflicts on relative file names.
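
One way to obtain absolute pathnames is the POSIX realpath() function, sketched below. The wrapper name to_absolute is hypothetical. Note that realpath() resolves relative names against the working directory of the calling process, so in the client/server version it may make sense to resolve the name in the client before sending it; that placement is an assumption about your design, not a requirement.

```c
#define _XOPEN_SOURCE 700   /* exposes realpath() under strict -std= modes */
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Resolve name (relative or absolute) to a canonical absolute pathname.
 * out must have room for PATH_MAX bytes. Returns 0 on success, or -1 if the
 * path cannot be resolved (for example, because the file does not exist). */
int to_absolute(const char *name, char *out)
{
    assert(name != NULL && out != NULL);
    return realpath(name, out) != NULL ? 0 : -1;
}
```

Because realpath() also removes "." and ".." components and follows symbolic links, two different spellings of the same file map to the same stashed name.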

Submitting Your Assignment

When you are finished, send your solutions directly to your assigned TA via email, using a single email attachment (e.g., gzipped tar file, including a README file, relevant source code, and sample output). Multiple repeated submissions from the same student are frowned upon, as are multiple email attachments. Please put an appropriate subject line on your email. The email subject line should be in the following format: {Tutorial section}_{Assignment number}_{First name}_{Last name} (e.g., T01_A4_Ali_Abedi) and the name of the attached file should be: {First name}_{Last name} (e.g., Ali_Abedi.tar). Submissions must be received on or before the stated submission deadline; otherwise, a late penalty of 10% (3 marks) per day will apply.