
How to write a Linux VFS filesystem module - StaticFS - finalizing

March 16, 2004

StaticFS

Finalizing

Well, we've got most of a module written. Now we've just got to get a bunch of includes in there and see if it compiles! For includes, I've just stolen all of them from romfs:
#include <linux/module.h>
#include <linux/types.h>
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/romfs_fs.h>
#include <linux/fs.h>
#include <linux/locks.h>
#include <linux/init.h>
#include <linux/smp_lock.h>

#include <asm/uaccess.h>
I probably don't need all of those, but I'm too excited to see if I've written something that resembles a filesystem! We also need a few forward declarations to help the compiler out.
static struct inode_operations staticfs_inode_operations;
static struct file_operations staticfs_dir_operations;
static struct address_space_operations staticfs_aops;
static struct super_operations staticfs_ops;
But now what do I do? How do I compile it? It's not a standalone piece of code. It's a small part of a larger system. My strategy: since I've stolen nearly everything from existing kernel filesystems, I might as well put my module in the same place!

I've gone and made a /usr/src/linux/fs/staticfs directory, put my inode.c into it, and copied the Makefile from the romfs directory. I've then edited the Makefile in /usr/src/linux/fs to add my filesystem in amongst the rest. Will this work? Hell if I know! My kernel's happily building all of my modules now, so we'll see.

Well, that was fun! What happened, you ask? Well, I got myself a module! Of course, it's compiled for a custom version of my kernel, so it complained when I tried to load it into my running one. That's fine -- insmod has a nice --force option, which I used. Oh boy!

Warning: loading staticfs.o will taint the kernel: forced load
  See http://www.tux.org/lkml/#export-tainted for information about tainted modules
Module staticfs loaded, with warnings
Unbelievable. Have I really done it? Written a nice, useless VFS module?? Let's try it!
% mkdir -p /mnt/static
% mount -t staticfs none /mnt/static
mount: Not a directory
%
Well carp. Now I know what I'm doing tomorrow.

March 17, 2004

So I guess I need to find out who doesn't like me. Is it VFS, deciding that "none" isn't a directory (I also tried it with "/tmp")? Is it my own module that's returning something that says it isn't a directory? Is mount just saying that message because it doesn't have a better one to describe what's happening?

My guess is that it would be the filesystem's fault. I can picture VFS asking the module if the supplied target ("none", "/tmp", "/dev/hda1", etc.) is a directory, when it asks for the root inode. Did we forget to say that that was a directory? I think the easiest way (or perhaps the most interesting) will be to use printk() liberally throughout my code to watch it work.
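Here's a rough user-space sketch of that kind of "In <function>" tracing. The names format_trace(), TRACE_ENTRY() and demo_read_super() are made up for illustration; in the module itself the output would go through printk() rather than stderr.

```c
#include <stdio.h>

/* Formats an "In <function>" line, like the ones in the logs, into buf.
 * Returns the number of characters written (as snprintf does). */
static int format_trace(char *buf, size_t len, const char *func)
{
    return snprintf(buf, len, "In %s\n", func);
}

/* Drop this at the top of any function to announce that it was entered.
 * In kernel code the fputs() would be a printk() call instead. */
#define TRACE_ENTRY() do { \
        char line_[64]; \
        format_trace(line_, sizeof(line_), __func__); \
        fputs(line_, stderr); \
    } while (0)

/* Hypothetical stand-in for one of the module's entry points. */
static int demo_read_super(void)
{
    TRACE_ENTRY();
    return 0;
}
```

Using __func__ means the message can't drift out of sync with the function name when code gets copied around.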

Okay, so I did that, and what do I see? As I thought, VFS called staticfs_read_super() and staticfs_read_inode(). It called my module (great!), got the superblock (super!) and looked up the root inode (awesome!). It then decided that the root inode wasn't a directory, so it failed.

Sure enough, my staticfs_read_inode() didn't set the type as it should. romfs did it here

        /* Compute permissions */
        ino = romfs_modemap[nextfh & ROMFH_TYPE];
and ramfs does it in its ramfs_get_inode(), where the mode is passed in as one of the parameters. Oops! Let's fix it. We'll change
  switch (ino) {
  case 0:
  case 2: i->i_fop = &staticfs_dir_operations;break;
  case 1:
  case 3: i->i_fop = &generic_ro_fops;break;
  }
to
  switch (ino) {
  case 0:
  case 2: i->i_fop = &staticfs_dir_operations; i->i_mode |= S_IFDIR; break;
  case 1:
  case 3: i->i_fop = &generic_ro_fops; i->i_mode |= S_IFREG; break;
  }
Let's try it again.
% mount -t staticfs none /mnt/static
%
Victory! I've mounted my own filesystem! For some reason it doesn't show up in a df, but it's there in /etc/mtab.
% cd /mnt/static
% ls -al

Hmn. Well, I was able to cd to it, which is cool, but ls didn't work. Checking the logs, I see
Mar 17 09:31:20 whiz17 kernel: In staticfs_read_super
Mar 17 09:31:20 whiz17 kernel: In staticfs_read_inode
Mar 17 09:31:22 whiz17 kernel: In staticfs_statfs
Mar 17 09:31:29 whiz17 kernel: In staticfs_lookup
Mar 17 09:31:30 whiz17 last message repeated 11 times
Mar 17 09:31:31 whiz17 kernel: In staticfs_readdir
Mar 17 09:31:31 whiz17 last message repeated 12 times
Mar 17 09:31:31 whiz17 kernel: readdir
Mar 17 09:31:31 whiz17 kernel: In staticfs_readdir
Mar 17 09:31:32 whiz17 last message repeated 176 times
The staticfs_read_super() and staticfs_read_inode() calls are from the mount attempt that I saw before. It looks like once it was successfully mounted (yay!) it called staticfs_statfs(), and that seemed to go okay. The call to staticfs_lookup() was when I cded into the directory. There are multiple lookups, 11 (or 12?) in all. Should there have been that many? There are only four objects in our filesystem. Is that the source of the problem? Or is it our staticfs_readdir() function, which never seems to return an answer that satisfies VFS? I'm going to look closer at both of them.

I decided to add a little debugging to staticfs_readdir(), getting it to spit out the name of the file it was about to insert into the directory, like this:

    DMSG(fsname);

    if (filldir(dirent,fsname,1,offset,ino,ftype)<0) {
      return stored;
    }
This returned these results:
% ls /mnt/static
Mar 17 13:33:14 whiz17 kernel: In staticfs_readdir
Mar 17 13:33:14 whiz17 kernel: In staticfs_a
Mar 17 13:33:14 whiz17 kernel: In staticfs_b
Mar 17 13:33:14 whiz17 kernel: In staticfs_readdir
Mar 17 13:33:14 whiz17 kernel: In staticfs_a
Mar 17 13:33:14 whiz17 kernel: In staticfs_b
This seems good and bad. It looks like it's asking about the right directory (the root one), and that our function is returning the right values ("a" and "b"). But VFS keeps asking us for more. Are we not telling it we're done, somehow? Of course! We aren't going back into the code where we got the offset
  unsigned long offset=fp->f_pos;
and updating it as we read it! This means that whenever we returned from staticfs_readdir(), we were saying, "here are some entries, oh, and we left off at the same entry we started at", because we didn't update the fp->f_pos record! We add that at the very end:
    stored++;
    offset++;
    fp->f_pos=offset;
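The effect of that missing update can be simulated in user space. This is only a sketch: struct fake_file, fake_filldir(), fake_readdir() and calls_until_done() are invented stand-ins, and the caller is assumed to keep asking until a call hands back no entries.

```c
#include <stddef.h>

struct fake_file { unsigned long f_pos; };   /* stand-in for struct file */

static const char *root_entries[] = { "a", "b" };
#define NENTRIES (sizeof(root_entries) / sizeof(root_entries[0]))

/* Stand-in for the filldir callback; the real one copies the entry out. */
static void fake_filldir(const char *name) { (void)name; }

/* Hands out every entry from f_pos onward; returns how many it stored. */
static int fake_readdir(struct fake_file *fp, int update_pos)
{
    unsigned long offset = fp->f_pos;
    int stored = 0;

    while (offset < NENTRIES) {
        fake_filldir(root_entries[offset]);
        stored++;
        offset++;
        if (update_pos)
            fp->f_pos = offset;    /* the fix: remember where we stopped */
    }
    return stored;
}

/* Mimics a caller that repeats until a call yields nothing, giving up
 * after max_calls. Returns the number of calls it made. */
static int calls_until_done(int update_pos, int max_calls)
{
    struct fake_file f = { 0 };
    int calls = 0;

    while (calls < max_calls) {
        calls++;
        if (fake_readdir(&f, update_pos) == 0)
            break;
    }
    return calls;
}
```

With the update in place the caller stops on its second call; without it, every call starts again from entry 0 and the loop only ends when the cap is hit, which matches the "last message repeated 176 times" flavour of the earlier logs.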
Let's give this a whirl. Well look at that -- it doesn't freeze up! Okay, it didn't return a directory listing either. Something is wrong with the structure of our entries, I'm guessing. Let's try some other things, then, while we're picking on staticfs.

% cd /mnt/static/b
That didn't complain. So it thinks that b is a directory. Good!
% ls -al
total 0
-rwxrwxrwx    1 root     wheel          35 Dec 31  1969 c*
Amazing! It's showing us the content of b! So, why does it not work for the root directory? And note that the "." and ".." directories aren't there. Are those our responsibilities? Does ramfs support them? I'll have to go look.
% cat c
These are the characters in file c.%
Now I'm ecstatic! I have a file in my file system! Let's try other things.
% cd ..
% cat b
cat: b: Is a directory
% cd d
d: No such file or directory.
% cat a
These are the characters in file a.%
Wow. This is looking great! It knows that I've got directories that can't be catted, because I didn't implement the readpage() function in staticfs_dir_operations. staticfs_lookup() is correctly informing the VFS that there's no d directory. And I can cat a, too! Incredible!
% cd a
%
Oops. It allows me to cd into a. So, who's responsible for this? Is it up to staticfs_lookup()? It shouldn't be, because it's handed an inode and a filename (in a dentry), to find a dentry for that filename's inode. At this point, VFS has decided that the inode is a directory and that the filename mentioned is within it. That information would only come from readdir(). So is it readdir() that needs some more fixing? Or is it a problem with read_inode(), not telling VFS that a is ONLY a file, and that it can't be cded into?
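To keep straight what lookup is actually responsible for, here's a user-space sketch of the mapping it performs: (directory inode, name) to inode number. The table layout is inferred from the rest of this page (root is 0, a is 1, b is 2, c is 3), and struct static_entry and static_lookup() are illustrative names, not the module's real code.

```c
#include <string.h>

struct static_entry {
    unsigned long parent;   /* inode number of the containing directory */
    const char   *name;
    unsigned long ino;      /* inode number of the entry itself */
};

/* Assumed layout: root(0) holds a(1) and b(2); b(2) holds c(3). */
static const struct static_entry table[] = {
    { 0, "a", 1 },
    { 0, "b", 2 },
    { 2, "c", 3 },
};

/* Returns the inode number for name inside parent, or -1 if absent. */
static long static_lookup(unsigned long parent, const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].parent == parent && strcmp(table[i].name, name) == 0)
            return (long)table[i].ino;
    }
    return -1;
}
```

Nothing in this mapping says whether the result is a file or a directory. That has to come from whatever fills in the inode afterwards, which is why read_inode() is the next suspect.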

Looking at staticfs_read_inode() again, and comparing it to ramfs_get_inode(), I see that in the case of S_IFREG inodes, the inode->i_op doesn't get populated at all, so let's try changing

  case 1:
  case 3: i->i_fop = &generic_ro_fops; i->i_mode |= S_IFREG; break;
to
  case 1:
  case 3: i->i_fop = &generic_ro_fops; i->i_mode |= S_IFREG; i->i_op=NULL; break;
Well, that half-worked!
% cd /mnt/static/b/c
/mnt/static/b/c: Not a directory.
% cd /mnt/static/a
%
So now c is correctly being treated like a file, not a directory, but a isn't. Are we looking up inodes incorrectly?

March 18, 2004

Well, I've played around with a bunch of things, trying to figure out what I might be doing wrong. I tried adding in a staticrootfs_fs_type, like ramfs has, to see if there's something special for the "rootfs". I tried moving all of my inode numbers up by one, in case inode 0 had a special meaning, and it's that one that's causing me problems. I don't think I changed much more, but now my module won't mount again because of the mount: Not a directory message. I think I'm still saying that inode 1 (the root's new number) is a directory! Also, when I unmount, I get an interesting message in my logs:

Mar 18 15:25:31 whiz17 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
This is obviously my fault, somehow not clearing all my inodes somewhere. This is the least of my worries right now - I want my filesystem to mount again!

March 19, 2004

I'm kinda bummed. I still haven't figured out why my filesystem won't mount again. I've regressed back to an older version of the code, one that I'm sure "worked", but it still wouldn't mount. Perhaps these "Busy inodes" are preventing it from working? Time for a reboot.

Did I tell you? I'm also getting a nice little segmentation fault it seems. Must be something I'm corrupting somewhere. Isn't that nice? Definitely a little frustrating. I guess it's time to compare my module with the other two, and see what I might be doing wrong. This is the tedious part of learning. Learning the understandable -- by reading the code -- is kinda fun. Learning the hidden stuff isn't. I just hope I'm able to make enough sense of it all to finally put together a webpage that helps future filesystem authors, because I could sure use one now!

Okay. I don't get it. I did the reboot to get rid of those inodes. Sure enough, that was the reason I couldn't mount. But now ... my root directory is visible. I don't get it! This is the older code that I brought back in, so my inodes are numbered 0-3 again. But look!

% mount /mnt/static
% cd /mnt/static
% ls -al
total 0
-rwxrwxrwx    1 root     wheel          35 Dec 31  1969 a*
drwxrwxrwx    1 root     wheel           1 Dec 31  1969 b/
%
I don't know what to say! And look!
% more a
These are the characters in file a.
% cd a
a: Not a directory.
% more b

*** b: directory ***

% cd b
% ls -al
total 0
-rwxrwxrwx    1 root     wheel          35 Dec 31  1969 c*
% more c
These are the characters in file c.
% cd c
c: Not a directory.
% more d
d: No such file or directory
% cd d
d: No such file or directory.
%
I don't get it. Did someone sneak in and fix my filesystem? What was going on yesterday? The day before? Do I dare question it? Heck no!

But what happens if I unmount it? Will I still get my dangling inodes? That, and the fact that I might want . and .. to show up, are the only two things left to do for staticfs! Oh, and that seg fault. Right.

Okay, so I still get these busy inodes. Let's figure that out. The message comes from /usr/src/linux/fs/super.c, in the kill_super() function. It logs this message when invalidate_inodes() returns non-zero with our superblock. That function lives in /usr/src/linux/fs/inode.c. It calls invalidate_list() on four lists of inodes: inode_in_use, inode_unused, sb->s_dirty and sb->s_locked_inodes. The first two are global lists maintained by VFS, and the last two are within our own superblock. The question is: which one is reporting a non-zero when it returns? Do I have to go so far as to edit inode.c? Sounds like the fastest way! invalidate_list() would be the best place to put that code - basically, every time it gets to the busy = 1; line, have it spit out the inode's info. It might also help to know which of the four lists the inode was in.
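The idea of that debugging hook can be sketched in user space. This is a simplified model, not the kernel's invalidate_list(): struct fake_inode and fake_invalidate_list() are invented here, a plain singly linked list stands in for the kernel's list_head, and printf() stands in for printk().

```c
#include <stdio.h>

struct fake_inode {
    unsigned long      i_ino;
    void              *i_sb;      /* owning superblock */
    int                i_count;   /* outstanding references */
    struct fake_inode *next;
};

/* Walks one list; returns 1 if any inode belonging to sb is still
 * referenced, printing each offender along with the list it was on. */
static int fake_invalidate_list(const struct fake_inode *head, void *sb,
                                const char *list_name)
{
    const struct fake_inode *inode;
    int busy = 0;

    for (inode = head; inode != NULL; inode = inode->next) {
        if (inode->i_sb != sb)
            continue;              /* belongs to another filesystem */
        if (inode->i_count == 0)
            continue;              /* unreferenced: safe to dispose of */
        /* the added debugging: name the culprit and its list */
        printf("%s: inode->i_ino=%lu  inode->i_sb=%p\n",
               list_name, inode->i_ino, inode->i_sb);
        busy = 1;
    }
    return busy;
}
```

Passing the list name in from the caller answers the "which of the four lists" question without touching the walk itself.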

Okay, so that's done, kernel's recompiled, and off we go. This is what I get now when I unmount my filesystem:

% umount /mnt/static
Mar 19 13:02:35 whiz17 kernel: inode->i_ino=2  inode->i_sb=dbc0a200
Mar 19 13:02:35 whiz17 kernel: inode->i_ino=1  inode->i_sb=dbc0a200
Mar 19 13:02:35 whiz17 kernel: busy=1
%
And to be sure it was my superblock, I added in a printk() to staticfs_read_super() to spit out the superblock's pointer value when I mount:
% mount /mnt/static
Mar 19 13:02:39 whiz17 kernel: In staticfs_read_super
Mar 19 13:02:39 whiz17 kernel: sb=dbc0a200
Mar 19 13:02:39 whiz17 kernel: In staticfs_read_inode
%
Yup, sure enough, it's me. inodes 1 and 2 are a and b. But then,
% ls /mnt/static/b
/mnt/static/b*
Mar 19 13:06:08 whiz17 kernel: In staticfs_lookup
% umount /mnt/static
Mar 19 13:06:15 whiz17 kernel: inode->i_ino=0  inode->i_sb=dbc0a200
Mar 19 13:06:15 whiz17 kernel: inode->i_ino=2  inode->i_sb=dbc0a200
Mar 19 13:06:15 whiz17 kernel: inode->i_ino=1  inode->i_sb=dbc0a200
Mar 19 13:06:15 whiz17 kernel: busy=1
%
So, when I ls the directory, my root directory's inode is stuck. Just to get everyone in the mix,
% ls /mnt/static/b/c
ls: /mnt/static/b/c: Not a directory
Mar 19 13:06:54 whiz17 kernel: In staticfs_lookup
% cd /mnt/static
% cd b
b: Not a directory.
% ls -al
total 0
%
Well carp. I'm going to reboot and hope that clearing the inodes makes my filesystem act right.

Nope. Well now what! Hmn. It looks like I was screwing up copying the module in each time I changed it, so now I'm confused (I never said that this was easy!) as to which version DID show me the root directory. So I'm currently at the following: no root directory, no released inodes, no dot files. Just where I was a few days ago. Oh, no, except for the following which just happened when I went to reload my module:

% mount /mnt/static
Segmentation fault
Mar 19 13:25:11 whiz17 kernel: In staticfs_read_super
Mar 19 13:25:11 whiz17 kernel: sb=dc973600
Mar 19 13:25:11 whiz17 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000020
Mar 19 13:25:11 whiz17 kernel:  printing eip:
Mar 19 13:25:11 whiz17 kernel: e090a440
Mar 19 13:25:11 whiz17 kernel: *pde = 00000000
Mar 19 13:25:11 whiz17 kernel: Oops: 0000
Mar 19 13:25:11 whiz17 kernel: CPU:    0
Mar 19 13:25:11 whiz17 kernel: EIP:    0010:[<e090a440>]    Not tainted
Mar 19 13:25:11 whiz17 kernel: EFLAGS: 00010202
Mar 19 13:25:11 whiz17 kernel: eax: 00000000   ebx: bfffe780   ecx: ddd19ae0   edx: 00000001
Mar 19 13:25:11 whiz17 kernel: esi: d9b10240   edi: 00001000   ebp: fffffffb   esp: d8bcdf70
Mar 19 13:25:11 whiz17 kernel: ds: 0018   es: 0018   ss: 0018
Mar 19 13:25:11 whiz17 kernel: Process mount (pid: 2956, stackpage=d8bcd000)
Mar 19 13:25:11 whiz17 kernel: Stack: bfffe780 d9b10240 00001000 bffff7a8 c013cb30 ddd19ae0 bfffe780 00001000 
Mar 19 13:25:11 whiz17 kernel:        d9b10240 ddd19ae0 dff73be0 c0108a74 0000000e c0108a96 00000008 00000001 
Mar 19 13:25:11 whiz17 kernel:        c01127e0 d8bcc000 0805a29b 0805a900 c0106ffb 0805a900 bfffe780 00001000 
Mar 19 13:25:11 whiz17 kernel: Call Trace: [] [] [] [] [] 
Mar 19 13:25:11 whiz17 kernel: 
Mar 19 13:25:11 whiz17 kernel: Code: 8b 40 20 8b 70 28 68 7e a5 90 e0 68 27 a5 90 e0 e8 cb c6 80 
Oh goodie. For now, I'm going to work on the assumption that whatever I'm doing wrong for the other things is causing this - that remounting the filesystem while there are floating inodes is A Bad Thing, and causing this messy segfault. What a way to lead into the weekend.
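One thing the oops does give away: the faulting address is 00000020, eax is 00000000, and the Code line starts with 8b 40 20, which (if I'm reading the x86 encoding right) is a load from 0x20 bytes past eax. That looks like reading a struct member at offset 0x20 through a NULL pointer. Here's a sketch of the arithmetic, using a made-up struct example rather than any real kernel structure:

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* uint32_t */

/* Hypothetical layout: eight 4-byte fields precede "target", so it
 * sits 0x20 bytes into the struct. Not a real kernel structure. */
struct example {
    uint32_t pad[8];
    uint32_t target;
};

/* The address the CPU would touch when reading base->target. */
static unsigned long fault_address(unsigned long base)
{
    return base + offsetof(struct example, target);
}
```

With a base of 0 the result is 0x20, the same small number the kernel reported. So the thing to hunt for is a pointer that's NULL at mount time and a member 0x20 bytes into whatever it should have pointed at.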
©2002-2018 Wayne Pearson