How to write a Linux VFS filesystem module - StaticFS - addresses

March 15, 2004

StaticFS

Addresses

Once again, the structure we're concerned with populating (from /usr/src/linux/include/linux/fs.h):
struct address_space_operations {
        int (*writepage)(struct page *);
        int (*readpage)(struct file *, struct page *);
        int (*sync_page)(struct page *);
        /*
         * ext3 requires that a successful prepare_write() call be followed
         * by a commit_write() call - they must be balanced
         */
        int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
        int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
        /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
        int (*bmap)(struct address_space *, long);
        int (*flushpage) (struct page *, unsigned long);
        int (*releasepage) (struct page *, int);
#define KERNEL_HAS_O_DIRECT /* this is for modules out of the kernel */
        int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int);
        void (*removepage)(struct page *); /* called when page gets removed from the inode */
};
Once more, the fact that our filesystem is read-only lets us ignore a lot of things: writepage, prepare_write and commit_write are out, and direct_IO and removepage can probably be left out as well. I'll look into them later. sync_page, flushpage and releasepage are likely write-related, or if not, don't seem to matter to us now. bmap I have no idea about, yet another thing I don't understand about VFS. Hopefully the "Don't use it" comment above the bmap entry really is about bmap, but I'm not so sure, as so many of the VFS modules currently use it.

March 16, 2004

Let's look at how romfs implements readpage (we've already seen ramfs's version).

/*
 * Ok, we do readpage, to be able to execute programs.  Unfortunately,
 * we can't use bmap, since we may have looser alignments.
 */

static int
romfs_readpage(struct file *file, struct page * page)
{
        struct inode *inode = page->mapping->host;
        unsigned long offset, avail, readlen;
        void *buf;
        int result = -EIO;

        page_cache_get(page);
        lock_kernel();
        buf = kmap(page);
        if (!buf)
                goto err_out;

        /* 32 bit warning -- but not for us :) */
        offset = page->index << PAGE_CACHE_SHIFT;
        if (offset < inode->i_size) {
                avail = inode->i_size-offset;
                readlen = min_t(unsigned long, avail, PAGE_SIZE);
                if (romfs_copyfrom(inode, buf, inode->u.romfs_i.i_dataoffset+offset, readlen) == readlen) {
                        if (readlen < PAGE_SIZE) {
                                memset(buf + readlen,0,PAGE_SIZE-readlen);
                        }
                        SetPageUptodate(page);
                        result = 0;
                }
        }
        if (result) {
                memset(buf, 0, PAGE_SIZE);
                SetPageError(page);
        }
        flush_dcache_page(page);

        UnlockPage(page);

        kunmap(page);
err_out:
        page_cache_release(page);
        unlock_kernel();

        return result;
}
The comment at the top is interesting. First, there's "to be able to execute programs". Does this mean that if we had implemented bmap instead of readpage, we'd be unable to allow execution in our filesystem? The next part, about being unable to use bmap because they "may have looser alignments", is also interesting. Now I really want to know what bmap is for, and how to use it. But first we write our own readpage.

romfs starts off by getting the inode that owns the page (page->mapping->host) and then "getting" the page with page_cache_get(), which, after following the code, just increments a reference counter in the page structure to say that we're using it. Then it calls lock_kernel(), which takes the BKL, the Big Kernel Lock.
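To make the reference counting concrete, here is a small userspace sketch. struct mock_page, mock_page_get() and mock_page_put() are hypothetical stand-ins for struct page, page_cache_get() and page_cache_release(). In the 2.4 kernel the real counter is atomic and the last release hands the page back to the allocator, which this sketch only models with a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page: just a use count
 * and a flag recording whether the "page" would have been freed. */
struct mock_page {
    int count;
    bool freed;
};

/* Like page_cache_get(): take an extra reference. */
static void mock_page_get(struct mock_page *page)
{
    assert(!page->freed);
    page->count++;
}

/* Like page_cache_release(): drop a reference; the last one frees. */
static void mock_page_put(struct mock_page *page)
{
    assert(page->count > 0);
    if (--page->count == 0)
        page->freed = true;
}
```

A readpage that brackets its work with one get and one put leaves the count where it found it, so the page cache's own reference still keeps the page alive afterwards.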

Now, if you look at /usr/src/linux/Documentation/filesystems/Locking, you'll see that they have the rules on locking with VFS. Unfortunately, they're not that clear. This is what it says for address_space_operations:

locking rules:
        All may block
                BKL     PageLocked(page)
writepage:      no      yes, unlocks
readpage:       no      yes, unlocks
sync_page:      no      maybe
prepare_write:  no      yes
commit_write:   no      yes
bmap:           yes
flushpage:      no      yes
releasepage:    no      yes
They have this "BKL" column, but what does it mean? That they can or cannot use the BKL? That they should or should not? That they must or must not?

It's not entirely clear, but the comments afterwards are a little more enlightening:

        ->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
may be called from the request handler (/dev/loop).
        ->readpage() and ->writepage() unlock the page.
        ->sync_page() locking rules are not well-defined - usually it is called
with lock on page, but that is not guaranteed. Considering the currently
existing instances of this method ->sync_page() itself doesn't look
well-defined...
        ->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some
filesystems and by the swapper. The latter will eventually go away. All
instances do not actually need the BKL. Please, keep it that way and don't
breed new callers.
        ->flushpage() is called when the filesystem must attempt to drop
some or all of the buffers from the page when it is being truncated.  It
returns zero on success.  If ->flushpage is zero, the kernel uses
block_flushpage() instead.
        ->releasepage() is called when the kernel is about to try to drop the
buffers from the page in preparation for freeing it.  It returns zero to
indicate that the buffers are (or may be) freeable.  If ->releasepage is zero,
the kernel assumes that the fs has no private interest in the buffers.

        Note: currently almost all instances of address_space methods are
using BKL for internal serialization and that's one of the worst sources
of contention. Normally they are calling library functions (in fs/buffer.c)
and pass foo_get_block() as a callback (on local block-based filesystems,
indeed). BKL is not needed for library stuff and is usually taken by
foo_get_block(). It's an overkill, since block bitmaps can be protected by
internal fs locking and real critical areas are much smaller than the areas
filesystems protect now.
The last paragraph is interesting. I think we'll avoid using the BKL then, and try to make sure we're not doing anything multithread-bad. The line about bmap is also interesting: bmap is there to support the legacy FIBMAP ioctl() and the swapper, and the documentation asks us not to breed new callers. It sounds like the kernel folks want those ioctl() calls to die out, so we'll avoid bmap after all.

Back to romfs's readpage. After the kernel is locked, it calls kmap() to get a kernel virtual address for the page's memory so it can fill it. We saw this in ramfs as well, when it filled the page with zeroes. Later on kunmap() is called to release the mapping: for pages in high memory, kmap() has to set up a temporary mapping into the kernel's address space, and kunmap() tears it down.

Next is the page->index value. We told the VFS that our blocksize is PAGE_CACHE_SIZE, so, much like romfs, we can shift the index left by PAGE_CACHE_SHIFT to turn it into a byte offset. That's if we needed an offset! As it is, we should never be asked for an index greater than 0, because none of our files are larger than PAGE_CACHE_SIZE. If we are asked, we should return -EIO, much like romfs does when asked for an offset beyond the end of its data.
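The index-to-offset step, together with romfs's end-of-file check, can be tried out in userspace. This sketch assumes 4 KB pages (a shift of 12, as on i386); MOCK_PAGE_SHIFT, MOCK_PAGE_SIZE and mock_readlen() are hypothetical names, and the -EIO return mirrors what romfs_readpage() reports.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: 4 KB pages, as on i386, so PAGE_CACHE_SHIFT would be 12. */
#define MOCK_PAGE_SHIFT 12
#define MOCK_PAGE_SIZE  (1UL << MOCK_PAGE_SHIFT)

/* Mirrors romfs: turn a page index into a byte offset, then work out
 * how many bytes of the file land in that page.  Returns the number
 * of bytes to copy, or -EIO if the page starts at or past end of file. */
static long mock_readlen(unsigned long index, unsigned long i_size)
{
    unsigned long offset = index << MOCK_PAGE_SHIFT;
    unsigned long avail;

    if (offset >= i_size)
        return -EIO;
    avail = i_size - offset;
    return (long)(avail < MOCK_PAGE_SIZE ? avail : MOCK_PAGE_SIZE);
}
```

For staticfs, where every file is shorter than a page, only index 0 yields a length: assert(mock_readlen(0, 35) == 35) and assert(mock_readlen(1, 35) == -EIO) both hold. A multi-page file would get full 4096-byte reads until its last page, which gets the remainder.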

romfs loads the buffer with the data, and if the data doesn't fill the page, it fills the rest of the buffer with zeroes. Since we know our data will never fill a page, I think we'll take a different approach: fill the page with zeroes first, then copy in our data.
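Both orders produce the same page. Here is a userspace sketch with a hypothetical 4 KB MOCK_PAGE_SIZE and made-up function names: fill_copy_then_pad() follows romfs, and fill_zero_then_copy() follows the plan for staticfs.

```c
#include <assert.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096  /* assumption: 4 KB pages */

/* romfs order: copy the data, then zero whatever is left of the page. */
static void fill_copy_then_pad(char *page, const char *data, size_t len)
{
    assert(len <= MOCK_PAGE_SIZE);
    memcpy(page, data, len);
    if (len < MOCK_PAGE_SIZE)
        memset(page + len, 0, MOCK_PAGE_SIZE - len);
}

/* staticfs order: zero the whole page, then copy the data over it. */
static void fill_zero_then_copy(char *page, const char *data, size_t len)
{
    assert(len <= MOCK_PAGE_SIZE);
    memset(page, 0, MOCK_PAGE_SIZE);
    memcpy(page, data, len);
}
```

Zeroing first writes the data bytes twice, but for a 35-byte file that cost is negligible, and the code no longer needs the readlen < PAGE_SIZE branch.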

On success, romfs calls SetPageUptodate() to set a flag inside the page structure marking its contents as valid. If not successful, the page is zeroed out and SetPageError() is used to tell the caller that things went bad. flush_dcache_page() is then called, and looking at the source in /usr/src/linux/include/asm-i386/pgtable.h, it does nothing, on Intel at least. That's great, I'll call it too. Apparently other platforms require more guts, but I don't care at this point.

The last few things romfs does are: call UnlockPage(), as the Documentation/filesystems/Locking file said it should; call kunmap(), as we mentioned before; call page_cache_release() to undo the counter increment from the beginning; and finally release the BKL with unlock_kernel(). Seems simple enough!
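The flag handling at the end of romfs_readpage() can be mimicked in userspace. The MOCK_* bits and struct mock_page are hypothetical stand-ins for the page flags that SetPageUptodate(), SetPageError() and UnlockPage() manipulate. flush_dcache_page() and kunmap() are left out: the first is a no-op on i386, and the second has no counterpart in this sketch.

```c
#include <assert.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096  /* assumption: 4 KB pages */

/* Hypothetical flag bits standing in for the real page flags. */
enum { MOCK_LOCKED = 1u << 0, MOCK_UPTODATE = 1u << 1, MOCK_ERROR = 1u << 2 };

struct mock_page {
    unsigned flags;
    char data[MOCK_PAGE_SIZE];
};

/* The tail end of romfs_readpage(): record the outcome in the flags,
 * wipe the page on failure, and always unlock, since readpage is
 * entered with the page locked and must unlock it. */
static int mock_finish_read(struct mock_page *page, int result)
{
    assert(page->flags & MOCK_LOCKED);
    if (result == 0) {
        page->flags |= MOCK_UPTODATE;            /* SetPageUptodate() */
    } else {
        memset(page->data, 0, MOCK_PAGE_SIZE);
        page->flags |= MOCK_ERROR;               /* SetPageError() */
    }
    page->flags &= ~MOCK_LOCKED;                 /* UnlockPage() */
    return result;
}
```

Whatever the outcome, the page leaves the function unlocked, and exactly one of the up-to-date and error bits is set.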

Let's write our own.

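static int staticfs_readpage(struct file *file, struct page *page) {
  /* Uses <linux/fs.h>, <linux/mm.h>, <linux/pagemap.h>, <linux/highmem.h>
     and <linux/string.h>, included at the top of the module. */
  struct inode *inode = page->mapping->host;
  const char *contents = NULL;
  void *buf;
  int result=-EIO;

  page_cache_get(page);
  buf=kmap(page);

  if (buf) {
    /* Our files never fill a page, so zero it all first. */
    memset(buf,0,PAGE_CACHE_SIZE);
    /* Every file fits in page 0; any other index is an error. */
    if (page->index==0) {
      switch (inode->i_ino) {
      case 1:contents="These are the characters in file a.";break;
      case 3:contents="These are the characters in file c.";break;
      }
      if (contents) {
        memcpy(buf,contents,strlen(contents));
        SetPageUptodate(page);
        result=0;
      }
    }
    flush_dcache_page(page);
    kunmap(page);
  }
  if (result) {
    SetPageError(page);
  }
  /* readpage is called with the page locked and must unlock it,
     even on failure, or anyone waiting on the page would block forever. */
  UnlockPage(page);
  page_cache_release(page);

  return result;
}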
Looks a lot like romfs except for the method of getting the contents of the files. You'll note that I rearranged my code so it doesn't use the gotos. romfs has this to say:
/*
 * Sorry about some optimizations and for some goto's.  I just wanted
 * to squeeze some more bytes out of this code.. :)
 */
I don't know that I believe there was anything squeezed out -- compilers these days are smarter than we are. There's nothing new to my code above, so we won't go over it.
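Even though there's nothing new in it, the content-selection logic is easy to check outside the kernel. This is a userspace sketch, not module code: mock_contents() and mock_staticfs_fill() are hypothetical names, the page is a plain char array of an assumed 4 KB size, and the inode numbers 1 and 3 come straight from staticfs_readpage().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MOCK_PAGE_SIZE 4096  /* assumption: 4 KB pages */

/* Which text belongs to which inode, as hard-coded in staticfs_readpage().
 * NULL means the inode has no file data to hand out. */
static const char *mock_contents(unsigned long ino)
{
    switch (ino) {
    case 1: return "These are the characters in file a.";
    case 3: return "These are the characters in file c.";
    default: return NULL;
    }
}

/* Userspace mirror of the body of staticfs_readpage(): zero the page,
 * then copy in the file text if this is page 0 of a known file. */
static int mock_staticfs_fill(char *buf, unsigned long ino, unsigned long index)
{
    const char *text = mock_contents(ino);

    memset(buf, 0, MOCK_PAGE_SIZE);
    if (index != 0 || text == NULL)
        return -EIO;
    memcpy(buf, text, strlen(text));
    return 0;
}
```

With these definitions, mock_staticfs_fill(page, 1, 0) returns 0 and leaves the 35-character text of file a at the start of a zeroed page, while mock_staticfs_fill(page, 1, 1), or any inode other than 1 or 3, gives -EIO and an all-zero page; assert() calls along those lines make a quick check.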

Now we just need to put together our address_space_operations structure.

static struct address_space_operations staticfs_aops = {
        readpage:staticfs_readpage,
};
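staticfs_aops uses GCC's old "member: value" initializer syntax, which the GCC manual lists as obsolete; C99 spells the same thing ".readpage = staticfs_readpage". Either way, every member that isn't named is zero-initialized, so all the write-side methods end up as NULL pointers. The sketch below shows this in userspace with a cut-down, hypothetical struct mock_aops rather than the real address_space_operations.

```c
#include <assert.h>
#include <stddef.h>

struct mock_page;   /* opaque: only pointers are needed here */
struct mock_file;

/* Cut-down, hypothetical stand-in for address_space_operations. */
struct mock_aops {
    int (*writepage)(struct mock_page *);
    int (*readpage)(struct mock_file *, struct mock_page *);
    int (*sync_page)(struct mock_page *);
};

static int mock_readpage(struct mock_file *file, struct mock_page *page)
{
    (void)file;
    (void)page;
    return 0;
}

/* C99 spelling of "readpage:staticfs_readpage"; every member not
 * named here is zero-initialized, i.e. a NULL function pointer. */
static const struct mock_aops staticfs_like_aops = {
    .readpage = mock_readpage,
};
```

Checks such as assert(staticfs_like_aops.writepage == NULL) hold, which is what lets a read-only filesystem fill in readpage and nothing else.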
©2002-2017 Wayne Pearson