How to write a Linux VFS filesystem module - StaticFS - finalizing continued

StaticFS

Finalizing, continued.

March 22, 2004

Alright, so it's a new day, a new week, and I'm determined to fix this thing. The busy inodes are the ones in the inode_in_use list, and we know they're owned by us. I can safely mount and unmount the filesystem if I don't do anything else. If I mount it, ls the root, and unmount, everything still seems okay. However, if I mount it and cat a, then I get the busy file.

What happens when we cat a file? Let's look again at what I tried:

% mount /mnt/static
Mar 22 09:18:57 whiz17 kernel: In staticfs_read_super         
Mar 22 09:18:57 whiz17 kernel: sb=dd41b600
Mar 22 09:18:57 whiz17 kernel: In staticfs_read_inode
% umount /mnt/static
Mar 22 09:19:00 whiz17 kernel: busy=0
%
% mount /mnt/static                                             
Mar 22 09:18:39 whiz17 kernel: In staticfs_init_fs            
Mar 22 09:18:39 whiz17 kernel: In staticfs_read_super
Mar 22 09:18:39 whiz17 kernel: sb=dd41b600
Mar 22 09:18:39 whiz17 kernel: In staticfs_read_inode
% ls /mnt/static
Mar 22 09:18:45 whiz17 kernel: In staticfs_readdir            
Mar 22 09:18:45 whiz17 kernel: In staticfs_a
Mar 22 09:18:45 whiz17 kernel: In staticfs_b
Mar 22 09:18:45 whiz17 kernel: In staticfs_readdir
% umount /mnt/static
Mar 22 09:18:49 whiz17 kernel: busy=0
%
% mount /mnt/static 
Mar 22 09:19:04 whiz17 kernel: In staticfs_read_super
Mar 22 09:19:04 whiz17 kernel: sb=dd41b600
Mar 22 09:19:04 whiz17 kernel: In staticfs_read_inode         
% ls /mnt/static/b
c*
Mar 22 09:19:18 whiz17 kernel: In staticfs_lookup
Mar 22 09:19:18 whiz17 kernel: In staticfs_read_inode
Mar 22 09:19:18 whiz17 kernel: In staticfs_readdir
Mar 22 09:19:18 whiz17 kernel: In staticfs_c
Mar 22 09:19:18 whiz17 kernel: In staticfs_lookup             
Mar 22 09:19:18 whiz17 kernel: In staticfs_read_inode
Mar 22 09:19:18 whiz17 kernel: In staticfs_readdir
% umount /mnt/static
Mar 22 09:19:22 whiz17 kernel: inode->i_ino=3  inode->i_sb=dd41b600
Mar 22 09:19:22 whiz17 kernel: busy=1
Mar 22 09:19:22 whiz17 kernel: inode->i_ino=3  inode->i_sb=dd41b600
Mar 22 09:19:22 whiz17 kernel: busy=1
Mar 22 09:19:22 whiz17 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
%
% mount /mnt/static
Mar 22 09:19:44 whiz17 kernel: In staticfs_read_super         
Mar 22 09:19:44 whiz17 kernel: sb=dd41b600
Mar 22 09:19:44 whiz17 kernel: In staticfs_read_inode
% cat /mnt/static/a
These are the characters in file a.
Mar 22 09:19:48 whiz17 kernel: In staticfs_lookup
Mar 22 09:19:48 whiz17 kernel: In staticfs_read_inode
Mar 22 09:19:48 whiz17 kernel: In staticfs_readpage
% umount /mnt/static                                            
Mar 22 09:19:52 whiz17 kernel: inode->i_ino=1  inode->i_sb=dd41b600
Mar 22 09:19:52 whiz17 kernel: inode->i_ino=0  inode->i_sb=dd41b600
Mar 22 09:19:52 whiz17 kernel: inode->i_ino=3  inode->i_sb=dd41b600
Mar 22 09:19:52 whiz17 kernel: busy=1
Mar 22 09:19:52 whiz17 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
%
So, ls on the root is okay, but ls on a directory inside is not. The differences are calls to staticfs_lookup() and staticfs_readdir(). Looking at staticfs_readdir(), we see that we don't do any locking or anything with the inode -- the only thing we do with it is use it to determine what directory we're talking about. So, unless the caller did something that we're expected to undo, I think that function is off the hook. That leaves staticfs_lookup(). Let's look at it again.
static struct dentry *staticfs_lookup(struct inode *dir, struct dentry *dentry) {
  int ino;
  struct inode *inode;
  const char *name;

  DMSG("lookup");

  ino = -1;
  inode = NULL;
  name = dentry->d_name.name;

  switch (dir->i_ino) {
  case 0:
    if (strcmp(name,"a")==0) ino=1;
    if (strcmp(name,"b")==0) ino=2;
    break;
  case 2:
    if (strcmp(name,"c")==0) ino=3;
    break;
  default:
    DMSG("default in lookup");
  }

  if (ino>=0) {
    inode=iget(dir->i_sb,ino);
  }

  d_add(dentry,inode);
  return NULL;
}
Sure enough, we've got a call to iget() here. But we modelled this after romfs_lookup()! Could it be doing things incorrectly? I hope not. It's part of the Linux distribution, so it's correct, right? RIGHT? I'm going to go look at another, more complicated filesystem.

Here's ext2_lookup() from /usr/src/linux/fs/ext2/namei.c:

static struct dentry *ext2_lookup(struct inode * dir, struct dentry *dentry)
{
        struct inode * inode;
        ino_t ino;

        if (dentry->d_name.len > EXT2_NAME_LEN)
                return ERR_PTR(-ENAMETOOLONG);

        ino = ext2_inode_by_name(dir, dentry);
        inode = NULL;
        if (ino) {
                inode = iget(dir->i_sb, ino);
                if (!inode)
                        return ERR_PTR(-EACCES);
        }
        d_add(dentry, inode);
        return NULL;
}

March 24, 2004

So I've tried modifying staticfs_lookup() to be more like ext2_lookup(); the biggest difference between romfs_lookup() and ext2_lookup() that I can see is that romfs still has a call to d_add() if its call to iget() fails, where ext2 does not. romfs does this based on that comment it has in the code, that we've seen before:

        /*
         * it's a bit funky, _lookup needs to return an error code
         * (negative) or a NULL, both as a dentry.  ENOENT should not
         * be returned, instead we need to create a negative dentry by
         * d_add(dentry, NULL); and return 0 as no error.
         * (Although as I see, it only matters on writable file
         * systems).
         */
I do like to trust ext2 a little more, though, since it's the core Linux filesystem. Regardless, changing it from one module's way to another didn't solve it, so it doesn't really matter, for now, who's right about this.

March 25, 2004

It looks like I didn't try much yesterday, but that's not true -- I just didn't get around to committing my discoveries to the page. I have rewritten staticfs_lookup() a few times now, making it simpler and simpler, trying to rule it out as the problem. Did I? More on that in a minute. I also decided that if it wasn't the way I was holding the inodes (with iget() calls), then it might be the way that I'm releasing (or NOT releasing) them. There were a couple of members of struct super_operations that we had ignored, because at the time it seemed that we didn't have to worry about them -- put_inode and delete_inode. romfs ignored them both, so we assumed that we could too. ramfs defined put_inode as force_delete(), as we talked about back when we were writing our superblock code, and we wrote it off as unnecessary, as romfs seemed to.

Just out of curiosity, I looked into /usr/src/linux/fs/inode.c to see what happens when those functions aren't defined.

                if (op && op->put_inode)
                        op->put_inode(inode);
This is in iput(). Pretty simple -- if a module defines a put_inode method, it's called. If not, it isn't.
                if (op && op->delete_inode) {
                        void (*delete)(struct inode *) = op->delete_inode;
                        if (!is_bad_inode(inode))
                                DQUOT_INIT(inode);
                        /* s_op->delete_inode internally recalls clear_inode() */
                        delete(inode);
                } else
                        clear_inode(inode);
If delete_inode is defined, it's used; if not, clear_inode() is called instead. To watch the flow of such things, I added the following to staticfs:
static void staticfs_put_inode(struct inode *inode) {
  DMSG("put_inode");
  force_delete(inode);
}

static void staticfs_delete_inode(struct inode *inode) {
  DMSG("delete_inode");
  clear_inode(inode);
}

static struct super_operations staticfs_ops = {
  read_inode:staticfs_read_inode,
  put_inode:staticfs_put_inode,
  delete_inode:staticfs_delete_inode,
  statfs:staticfs_statfs,
};
% mount /mnt/static
Mar 25 09:59:25 whiz17 kernel: In staticfs_init_fs
Mar 25 09:59:25 whiz17 kernel: In staticfs_read_super
Mar 25 09:59:25 whiz17 kernel: sb=dc993600
Mar 25 09:59:25 whiz17 kernel: In staticfs_read_inode
Mar 25 09:59:25 whiz17 kernel: Out staticfs_read_inode
Mar 25 09:59:25 whiz17 kernel: Out staticfs_read_super
% ls /mnt/static
Mar 25 09:59:29 whiz17 kernel: In staticfs_readdir
Mar 25 09:59:29 whiz17 kernel: In staticfs_a
Mar 25 09:59:29 whiz17 kernel: In staticfs_b
Mar 25 09:59:29 whiz17 kernel: Out staticfs_readdir end
Mar 25 09:59:29 whiz17 kernel: In staticfs_readdir
Mar 25 09:59:29 whiz17 kernel: Out staticfs_readdir end
% umount /mnt/static
Mar 25 09:59:33 whiz17 kernel: In staticfs_put_inode
Mar 25 09:59:33 whiz17 kernel: In staticfs_delete_inode
Mar 25 09:59:33 whiz17 kernel: busy=0
You probably noticed I've added a few more debugging lines, too. Well that seemed to go all right. Yet
% mount /mnt/static
Mar 25 10:09:44 whiz17 kernel: In staticfs_read_super
Mar 25 10:09:44 whiz17 kernel: sb=dc993600
Mar 25 10:09:44 whiz17 kernel: In staticfs_read_inode
Mar 25 10:09:44 whiz17 kernel: Out staticfs_read_inode
Mar 25 10:09:44 whiz17 kernel: Out staticfs_read_super
% ls /mnt/static/a
/mnt/static/a*
Mar 25 10:09:48 whiz17 kernel: In staticfs_lookup
Mar 25 10:09:48 whiz17 kernel: In staticfs_read_inode
Mar 25 10:09:48 whiz17 kernel: Out staticfs_read_inode
Mar 25 10:09:48 whiz17 kernel: Out staticfs_lookup
% umount /mnt/static
Mar 25 10:09:52 whiz17 kernel: inode->i_ino=1  inode->i_sb=dc993600
Mar 25 10:09:52 whiz17 kernel: inode->i_ino=0  inode->i_sb=dc993600
Mar 25 10:09:52 whiz17 kernel: busy=1
No call to staticfs_put_inode() or staticfs_delete_inode()? Note the difference between the two cases -- staticfs_lookup() is called in the second, but not the first. Things still point to it as the source of our problems, then, since I assume that if the VFS doesn't want to call put_inode or delete_inode, it knows best!

Well this is ... interesting.

% mount /mnt/static
Mar 25 14:29:09 whiz17 kernel: In staticfs_read_super
Mar 25 14:29:09 whiz17 kernel: sb=de58e600
Mar 25 14:29:09 whiz17 kernel: In staticfs_read_inode
Mar 25 14:29:09 whiz17 kernel: Out staticfs_read_inode
Mar 25 14:29:09 whiz17 kernel: Out staticfs_read_super
% cd /mnt/static
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-[
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-)
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-setenv
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-LANG
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-"C";
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-setenv
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-SUPPORTED
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-"en_CA:en";
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-setenv
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-SYSFONT
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-"lat0-sun16";
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-setenv
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-SYSFONTACM
Mar 25 14:29:11 whiz17 kernel: In staticfs_lookup
Mar 25 14:29:11 whiz17 kernel: lookup() combo: 0-"iso15";
What's all that, you ask? Good question! Sure looks like we're stepping on someone's toes. That seems to be a little bit of csh script, but why on earth are we seeing it when trying to look up a dentry? Do I really need another thing to go wrong? Or could this be the cause of all my problems? It could certainly be the cause of the segfault I get!

March 26, 2004

On a whim, I thought that I'd up my inode numbers, in case I'm using some verboten value that no one told me about. I've made inodes zero through three be 100 to 103 now. And guess what?

% mount /mnt/static
Mar 26 12:12:09 whiz17 kernel: In staticfs_read_super
Mar 26 12:12:09 whiz17 kernel: sb=de58e600
Mar 26 12:12:09 whiz17 kernel: In staticfs_read_inode
Mar 26 12:12:09 whiz17 kernel: Out staticfs_read_inode
Mar 26 12:12:09 whiz17 kernel: Out staticfs_read_super
% ls /mnt/static
a*  b/
Mar 26 12:12:13 whiz17 kernel: In staticfs_readdir
Mar 26 12:12:13 whiz17 kernel: In staticfs_a
Mar 26 12:12:13 whiz17 kernel: In staticfs_b
Mar 26 12:12:13 whiz17 kernel: Out staticfs_readdir end
Mar 26 12:12:13 whiz17 kernel: In staticfs_lookup
Mar 26 12:12:13 whiz17 kernel: lookup() combo: 100-a
Mar 26 12:12:13 whiz17 kernel: In staticfs_read_inode
Mar 26 12:12:13 whiz17 kernel: Out staticfs_read_inode
Mar 26 12:12:13 whiz17 kernel: Out staticfs_lookup
Mar 26 12:12:13 whiz17 kernel: In staticfs_readdir
Mar 26 12:12:13 whiz17 kernel: Out staticfs_readdir end
% ls -al /mnt/static
total 0
-rwxrwxrwx    1 root     wheel          35 Dec 31  1969 a*
drwxrwxrwx    1 root     wheel           1 Dec 31  1969 b/
%
Can you believe it? I can see my root directory! Hallelujah! I'm going to reboot just to make sure it's not some artifact from previous versions...

While we're waiting for the reboot, I'll have you know that rebooting has been a regular occurrence, because of the mess I seem to make of the kernel with my naughty filesystem.

Okay, so I have some good news and some interesting news. The good news is that my root directory is still ls-worthy. Hooray! So there's something special about inode number zero. Good to know! Now the interesting news. As the machine was rebooting, I was thinking on this crazy mess of directory lookups. They seem environmental. They only happen when my current directory is in my filesystem. My current directory. Why should that matter? Why is it affected by my environment?

Wait a minute. I've been doing this all as root (to recompile the kernel and kernel modules, to mount filesystems, and to read the system logs). The first thing I've been doing is sourcing a file to get my environment the way I like (source my .tcshrc, set up some aliases for quickly recompiling the module, installing the module, etc.). Could that be it? What if I don't source that file?

Would you look at that. No more funny staticfs_lookup() calls. It was me, the whole time. Now, that doesn't really answer the WHY, but who cares for now? Hell, I don't know that that hasn't been happening with every other filesystem.

So, we've gotten rid of two issues at once. Excellent. We still have the dangling inode problem, and still need to get . and .. to show up, and I've found a new problem:

% cat /mnt/static/c
cat: /mnt/static/c: Permission denied
Not quite right, that! We really want "No such file or directory". Let's go tidy that up. I mentioned before that staticfs_lookup() got a rewrite; here's what it looks like now.
#define MAXINODE 104

static struct dentry *staticfs_lookup(struct inode *dir, struct dentry *dentry) {
  ino_t ino,root;
  struct inode *inode;
  const char *name;

  root=dir->i_ino;
  name = dentry->d_name.name;

  ino=MAXINODE;
  if (root==100 && !strcmp(name,"a")) ino=101;
  if (root==100 && !strcmp(name,"b")) ino=102;
  if (root==102 && !strcmp(name,"c")) ino=103;

  if (ino==MAXINODE) {
    return ERR_PTR(-ENOENT);
  }

  inode=iget(dir->i_sb,ino);

  d_add(dentry,inode);
  return NULL;
}
I've now used ENOENT instead of EACCES. It makes a lot more sense -- we used EACCES because romfs did, but romfs really used it when it had a problem reading the filesystem, where "permission denied" is more fitting. What does this mean for romfs, then, if I tried catting a file that didn't exist? Who cares! Mine works.

So, if we're fixing the easy problems first, let's look at the . and .. issue. How does romfs do it? Well, actually, I don't know that it does. One thing that I've never actually done is mount a romfs filesystem. This requires a premade filesystem using some program that I haven't bothered to go get.

Okay, so ramfs then. Now, I mentioned this briefly before, when we were talking about inodes, and said how ramfs had changed between different kernels. Because of this, I kinda missed discussing how ramfs does its directories, because that's one of the things that changed. One version has this

                case S_IFDIR:
                        inode->i_op = &ramfs_dir_inode_operations;
                        inode->i_fop = &dcache_dir_ops;
                        break;
and the other has
                case S_IFDIR:
                        inode->i_op = &ramfs_dir_inode_operations;
                        inode->i_fop = &ramfs_dir_operations;
                        break;
The first one uses built-in calls for all directory operations (it's the newer version). The second one uses
static struct file_operations ramfs_dir_operations = {
        read:           generic_read_dir,
        readdir:        dcache_readdir,
        fsync:          ramfs_sync_file,
};
That's the same as dcache_dir_ops, except for the ramfs_sync_file(), which was a useless function anyway.

So, the answer for ramfs in either case is dcache_readdir(), a function we haven't looked at yet. Let's do so.

int dcache_readdir(struct file * filp, void * dirent, filldir_t filldir)
{
        int i;
        struct dentry *dentry = filp->f_dentry;

        i = filp->f_pos;
        switch (i) {
                case 0:
                        if (filldir(dirent, ".", 1, i, dentry->d_inode->i_ino, DT_DIR) < 0)
                                break;
                        i++;
                        filp->f_pos++;
                        /* fallthrough */
                case 1:
                        if (filldir(dirent, "..", 2, i, dentry->d_parent->d_inode->i_ino, DT_DIR) < 0)
                                break;
                        i++;
                        filp->f_pos++;
                        /* fallthrough */
                default: {
                ...
The stuff I've clipped out goes through the dentry cache and finds out what it knows. Not important to us, because right there, clear as day, is the answer. Can we just call dcache_readdir() to get it to do all this mojo for us? No, because it only contains cached information (such as the parent/child relationship of . and ..), but we may not have returned this information yet. We're just going to fill in the two dot entries explicitly, as I thought we might have to do. The one thing I didn't know how to do until looking at the code above was how to find out the inode of the parent of our root directory. That's what the dentry->d_parent->d_inode->i_ino does for us. Here's our new staticfs_readdir().
static int staticfs_readdir(struct file *fp, void *dirent, filldir_t filldir) {
  struct inode *i=fp->f_dentry->d_inode;
  struct dentry *dentry = fp->f_dentry;
  unsigned long offset=fp->f_pos;
  ino_t ino=i->i_ino, dino;
  int stored = 0;
  unsigned char ftype=DT_REG;
  char fsname[3];

  dino=100;

  for (;;) {
    fsname[0]='\0';

    switch (ino) {
    case 100:
      switch (offset) {
      case 0:
        fsname[0]='.';fsname[1]='\0';ftype=DT_DIR;
        dino=dentry->d_inode->i_ino; // dino=100;
        break;
      case 1:
        fsname[0]='.';fsname[1]='.';fsname[2]='\0';ftype=DT_DIR;
        dino=dentry->d_parent->d_inode->i_ino;
        break;
      case 2:
        fsname[0]='a';fsname[1]='\0';ftype=DT_REG;
        dino=101;
        break;
      case 3:
        fsname[0]='b';fsname[1]='\0';ftype=DT_DIR;
        dino=102;
        break;
      }
      break;
    case 102:
      switch (offset) {
      case 0:
        fsname[0]='.';fsname[1]='\0';ftype=DT_DIR;
        dino=dentry->d_inode->i_ino; // dino=102;
        break;
      case 1:
        fsname[0]='.';fsname[1]='.';fsname[2]='\0';ftype=DT_DIR;
        dino=dentry->d_parent->d_inode->i_ino; // dino=100;
        break;
      case 2:
        fsname[0]='c';fsname[1]='\0';ftype=DT_REG;
        dino=103;
        break;
      }
      break;
    default: // Should never happen!
      fsname[0]='z';fsname[1]='\0';ftype=DT_REG;
      dino=104;
      break;
    }

    if (fsname[0]=='\0') {
      return stored;
    }

    if (filldir(dirent,fsname,strlen(fsname),offset,dino,ftype)<0) {
      return stored;
    }
    stored++;
    offset++;
    fp->f_pos=offset;
  }
}
And how well does it work?
% cd /mnt/static
% ls -al
total 4
drwxrwxrwx    1 root     wheel           2 Dec 31  1969 ./
drwxr-xr-x    9 root     wheel        4096 Mar 16 15:31 ../
-rwxrwxrwx    1 root     wheel          35 Dec 31  1969 a*
drwxrwxrwx    1 root     wheel           1 Dec 31  1969 b/
% more a
These are the characters in file a.
% more b

*** b: directory ***

% cd b
% ls -al
total 0
drwxrwxrwx    1 root     wheel           1 Dec 31  1969 ./
drwxrwxrwx    1 root     wheel           2 Dec 31  1969 ../
-rwxrwxrwx    1 root     wheel          35 Dec 31  1969 c*
% more c
These are the characters in file c.
% more d
d: No such file or directory
% more .

*** .: directory ***

% more ..

*** ..: directory ***

% cd ..
% ls -al
total 4
drwxrwxrwx    1 root     wheel           2 Dec 31  1969 ./
drwxr-xr-x    9 root     wheel        4096 Mar 16 15:31 ../
-rwxrwxrwx    1 root     wheel          35 Dec 31  1969 a*
drwxrwxrwx    1 root     wheel           1 Dec 31  1969 b/
% ls ..
cdrom/  floppy/  profs/  ramdisk/  static/  updates/  zip250.0/
%
Fantastic. Oh yeah, and there's one more thing that I've yet to try, but keep thinking about on the drive home.
% cat >> a
a: Read-only file system.
% touch b
touch: creating `b': Is a directory
% touch c
touch: creating `c': No such file or directory
% mkdir d
mkdir: cannot create directory `d': No such file or directory
%
On the plus side, it didn't explode when I tried to create new files, but it didn't quite return useful messages for all of the cases. If we look at the debugging messages, we see this:
% touch /mnt/static/c
touch: creating `/mnt/static/c': No such file or directory
Mar 26 15:21:52 whiz17 kernel: In staticfs_lookup
Mar 26 15:21:52 whiz17 kernel: lookup() combo: 100-c
Mar 26 15:21:52 whiz17 kernel: In staticfs_lookup
Mar 26 15:21:52 whiz17 kernel: lookup() combo: 100-c
%
This is only a guess, but here's what I see happening above. The touch command first checks to see if the file is there (this is the first call to staticfs_lookup()), because if it exists, it'll update the date on the file. Because the lookup fails, it likely then calls open() with O_CREAT set. It doesn't seem to check the result of that call, but rather checks again whether the file exists (thus the second call to staticfs_lookup()). When it doesn't find it there, it complains that there's no file to touch, instead of reporting its failure to create the file for you. Maybe touch isn't the best test?
% cp /etc/hosts /mnt/static/c
cp: cannot create regular file `/mnt/static/c': No such file or directory
Mar 26 15:34:31 whiz17 kernel: In staticfs_lookup
Mar 26 15:34:31 whiz17 kernel: lookup() combo: 100-c
Mar 26 15:34:31 whiz17 kernel: In staticfs_lookup
Mar 26 15:34:31 whiz17 kernel: lookup() combo: 100-c
Same actions as touch, but a better answer. Looks like we're okay! So what's left? Just that one last little problem - the sticking inodes. If that doesn't sound like something for Monday, I don't know what does!

March 27, 2004

One interesting thing I've noticed is that when I log out, the kernel messages about the locked inodes stop appearing. Almost immediately in fact. But if I come back, and mount a different filesystem, such as ramfs, they reappear!

For instance, I logged out yesterday at 15:40, but at 15:44, the logs are no longer complaining about the busy inodes. I then logged in today at 11:56, and still no busy inodes. I then mounted a ramfs filesystem, not my own, and as soon as I unmounted it, I started getting complaints about the old inodes from before -- the same inodes, the same superblock -- definitely mine. What does this mean? That the inodes were forgotten about somewhere until the VFS was "tickled"? I don't get it. I mean, if those inodes somehow got "unbusy", through whatever means (obviously not the right ones, since we're having this problem), then how would they get "rebusy" just because I've logged in again and mounted another filesystem?

On a whim, I decided to see if anyone else had reported the "Busy inodes after unmount" error in their travels, and came upon quite a few interesting posts on the net. Most of them pertained to NFS and to autofs, though other filesystems were mentioned. Some people were saying that it was the way that such-and-such module was doing things, others were saying it was a problem in that version of the kernel, etc. So maybe it isn't my fault? Should I perhaps try a different kernel? Or is it the way that I'm doing certain things that just doesn't agree with the kernel, and that it should still be me that changes my ways?

One post, talking about NFS's problems with this message, mentioned something that caught my eye.

>   Previously anonymous dentries were hashed (though with no name, the
>   hash was pretty meaningless).  This meant that they would hang around
>   after the last reference was dropped.  This was actually fairly
>   pointless as they would never get referenced again, and caused a real
>   problem as umount wouldn't discard them and so you got the message
>                 printk("VFS: Busy inodes after unmount. "
>                         "Self-destruct in 5 seconds.  Have a nice day...\n");
> 
>   In 2.4.6-pre3 I stopped hashing those dentries so now when the last
>   reference is dropped, the dentry is freed.  So now there will never be
>   more anonymous dentries than there are active nfsd threads.
This is very interesting, since we do the same thing -- add anonymous dentries -- in our staticfs_lookup(). Maybe we shouldn't add these anonymous entries, even though romfs has a comment that says that it and we should do exactly that? But didn't we try this before? I'm losing track! Yeah, I suppose that's the whole point of this journal.

March 29, 2004

I'm currently waiting for a newer kernel to compile. Why? Because I want to rule out that this is a kernel issue. Of course, even if it is a kernel issue, I don't know that the version I'm compiling has it fixed, but we shall see. I'm not sure what result I'm hoping for, really; if a newer kernel fixes it, I'm going to be both happy and upset -- happy that the last problem is fixed, but upset that I wasted so much time trying to fix it.

Okay, that's done, and staticfs is installed, and the new kernel is running, and what happens? The same old stuck inodes? Yes. But no. What do I mean? Well, they're stuck to begin with, but this newer version of the kernel seems to properly "self-destruct" them, because they only complain the once! Then it seems all clear. Kinda. If I mount another filesystem, like ramfs, and unmount it, I get the reappearance of my bad inodes, like before. And after a reboot? Yes, it still crashes, but this time I got something useful from it! I was told that the crash happened in dcache.c, on line 345. Let's take a look.

void prune_dcache(int count)
{
        spin_lock(&dcache_lock);
        for (;;) {
                struct dentry *dentry;
                struct list_head *tmp;

                tmp = dentry_unused.prev;

                if (tmp == &dentry_unused)
                        break;
                list_del_init(tmp);
                dentry = list_entry(tmp, struct dentry, d_lru);

                /* If the dentry was recently referenced, don't free it. */
                if (dentry->d_vfs_flags & DCACHE_REFERENCED) {
                        dentry->d_vfs_flags &= ~DCACHE_REFERENCED;
                        list_add(&dentry->d_lru, &dentry_unused);
                        continue;
                }
                dentry_stat.nr_unused--;

                /* Unused dentry with a count? */
                if (atomic_read(&dentry->d_count))
                        BUG();

                prune_one_dentry(dentry);
                if (!--count)
                        break;
        }
        spin_unlock(&dcache_lock);
}
It's the BUG(); line that's causing the error at shutdown. So, one (or more) unused dentry objects still have a reference count when they shouldn't. It's got to be my fault, but I still can't see what I'm doing wrong -- any more wrong than my "guides" of ramfs and romfs! I'm going to add some debugging code to dcache.c to find out a little more about the dentry that's causing the problem. Maybe I can then find out what that problem is!

March 30, 2004

Okay, well that did it! Here's what I get from my debug code, which I inserted just before the BUG(); above: it's crashing because of the dcache in my superblock (I think we all knew that); it's crashing on the dcache entry for c (this is after I ls /mnt/static/b/c to cause the error); and it's crashing because dentry->d_count is -1. -1! We must not be incrementing it when we should, and thus something is decrementing it from the zero we left it at, and therefore the system is upset when it's NOT zero at that point!

So I went looking through the VFS code for where d_count might be incremented or set. The only use with atomic_inc() is in dcache.c, in dget_locked(), which seems to only be called by filesystems, not internally. There's only one use of atomic_set() on d_count, too, also in dcache.c, in d_alloc(). Following through the source, in namei.c, we can see that real_lookup() is the VFS function that calls our staticfs_lookup() function, and before that it calls d_alloc() to create the dentry for our proposed new entry. This means that the dentry passed into our function already has its d_count set to 1. If we return non-NULL, real_lookup() calls dput() on the newly-created dentry, dropping the d_count and thus effectively discarding that new dentry object. Otherwise it returns the new dentry to the caller of real_lookup().

That sure seems like it makes sense. So where's the problem? From that, it looks like the dentry gets its d_count incremented elsewhere, and that it's not our responsibility. Looking further into the VFS code, I find only three points where atomic_dec() is called, all in dcache.c. One is in dput(), as we saw before. The other two are in a function called d_genocide(). Nice.

This is what I've concluded. The dentries might be getting messed up because of the other problem we have -- the sticking inodes. In super.c, in the kill_super() function, we see the whole gang appear. This is where d_genocide() is called, and also where invalidate_inodes() is called. So, fix one, fix the other? That still requires fixing one.

Let's take the same approach with inode->i_count as we did with dentry->d_count, seeing where all the manipulations take place. But first, let's see what value i_count has when we realize it's causing problems.

Remember a while back we decided to add in staticfs_put_inode() and staticfs_delete_inode()? Well I've just added in some debugging there to see if, when and on what they're being called, and that yielded something interesting.

% mount /mnt/static
% ls /mnt/static/b/c
/mnt/static/b/c*
% umount /mnt/static
Mar 30 10:22:38 whiz17 kernel: In staticfs_put_inode
Mar 30 10:22:38 whiz17 kernel: put: inode->i_ino=102  inode->i_count=1  inode->i_nlink=1
Mar 30 10:22:38 whiz17 kernel: put: inode->i_ino=102  inode->i_count=1  inode->i_nlink=0
Mar 30 10:22:38 whiz17 kernel: In staticfs_delete_inode
Mar 30 10:22:38 whiz17 kernel: del: inode->i_ino=102  inode->i_count=0  inode->i_nlink=0
Mar 30 10:22:38 whiz17 kernel: del: inode->i_ino=102  inode->i_count=0  inode->i_nlink=0
Mar 30 10:22:38 whiz17 kernel: In staticfs_put_inode
Mar 30 10:22:38 whiz17 kernel: put: inode->i_ino=100  inode->i_count=1  inode->i_nlink=1
Mar 30 10:22:38 whiz17 kernel: put: inode->i_ino=100  inode->i_count=1  inode->i_nlink=0
Mar 30 10:22:38 whiz17 kernel: In staticfs_delete_inode
Mar 30 10:22:38 whiz17 kernel: del: inode->i_ino=100  inode->i_count=0  inode->i_nlink=0
Mar 30 10:22:38 whiz17 kernel: del: inode->i_ino=100  inode->i_count=0  inode->i_nlink=0
Mar 30 10:22:38 whiz17 kernel: inode->i_count=1  inode->i_ino=103  inode->i_sb=dd7dcc00
Mar 30 10:22:38 whiz17 kernel: busy=1
The pairs of messages in staticfs_put_inode() and staticfs_delete_inode() are the before and after of the calls to force_delete() and clear_inode() respectively. It looks good: the first inode, 102, which is for b, gets sent through staticfs_put_inode(), which decreases i_nlink, and then through staticfs_delete_inode(), which sets inode->i_state to I_CLEAR. The same happens for inode 100, our root directory. But why not for c? That's the one (inode 103) that then becomes a problem. But there's no attempt to call staticfs_put_inode() or staticfs_delete_inode() on it.

It's iput() that calls both of those functions, so this means that inode 103 is never being passed to iput(). Who calls that? There are a couple of calls in inode.c, regarding syncing, but I don't think those are the ones. There's also one in dcache.c, in dentry_iput(). That's called from three different functions: prune_one_dentry(), d_delete() and dput(). prune_one_dentry() is called when the kernel is trying to reclaim memory because the dcache has grown large -- that doesn't sound like our situation. d_delete() is used when a file is deleted from a filesystem -- again, not us. So it's the simple dput() function that either isn't being called on our dentry, or is deciding not to go any further with it.

dput() does have one part that jumps out at me:

        if (!atomic_dec_and_lock(&dentry->d_count, &dcache_lock))
                return;
If the d_count on the dentry for c wasn't 1, then the rest of dput() wouldn't be run, which would explain the absence of everything else. Looking back at the logs, though:
% ls /mnt/static/b/c
/mnt/static/b/c*
Mar 30 10:22:35 whiz17 kernel: In staticfs_lookup
Mar 30 10:22:35 whiz17 kernel: lookup() combo: 100-b
Mar 30 10:22:35 whiz17 kernel: In staticfs_read_inode
Mar 30 10:22:35 whiz17 kernel: Out staticfs_read_inode
Mar 30 10:22:35 whiz17 kernel: Out staticfs_lookup
Mar 30 10:22:35 whiz17 kernel: In staticfs_lookup
Mar 30 10:22:35 whiz17 kernel: lookup() combo: 102-c
Mar 30 10:22:35 whiz17 kernel: In staticfs_read_inode
Mar 30 10:22:35 whiz17 kernel: Out staticfs_read_inode
Mar 30 10:22:35 whiz17 kernel: Out staticfs_lookup
From here it looks like b and c are being looked up the same number of times. But maybe the difference is that b is a directory and c is a file?

I've just spent a few hours trying this and that, to no avail. I tried watching dput() to see if c was coming in with some odd values, but let me tell you, dput() gets called a LOT! It was painful trying to run with all that logging, but even after that, I didn't see a call to dput() for c.

So what does this mean? It means that whatever should be calling dput() on c isn't, and it happens when we do an ls. Now, ls ends up calling namei(), which does all the looking-up (and eventually calls staticfs_lookup()), so does that mean we should trace the events of a directory lookup?

March 31, 2004

I'm stuck again. I don't really want to trace through ls, nor the namei() function. But how else can I figure out the problem? I did just try something else, though, and noticed something I hadn't before.

% mount /mnt/static
Mar 31 12:57:51 whiz17 kernel: In staticfs_init_fs
Mar 31 12:57:51 whiz17 kernel: In staticfs_read_super
Mar 31 12:57:51 whiz17 kernel: sb=df79dc00
Mar 31 12:57:51 whiz17 kernel: In staticfs_read_inode
Mar 31 12:57:51 whiz17 kernel: Out staticfs_read_inode
Mar 31 12:57:51 whiz17 kernel: Out staticfs_read_super
% ls /mnt/static
a*  b/
Mar 31 12:57:53 whiz17 kernel: In staticfs_readdir
Mar 31 12:57:53 whiz17 kernel: In staticfs_.
Mar 31 12:57:53 whiz17 kernel: In staticfs_..
Mar 31 12:57:53 whiz17 kernel: In staticfs_a
Mar 31 12:57:53 whiz17 kernel: In staticfs_b
Mar 31 12:57:53 whiz17 kernel: Out staticfs_readdir end
Mar 31 12:57:53 whiz17 kernel: In staticfs_lookup
Mar 31 12:57:53 whiz17 kernel: lookup() combo: 100-a
Mar 31 12:57:53 whiz17 kernel: In staticfs_read_inode
Mar 31 12:57:53 whiz17 kernel: Out staticfs_read_inode
Mar 31 12:57:53 whiz17 kernel: lookup dentry->d_inode=0  dentry->d_count=1  inode->i_count=1
Mar 31 12:57:53 whiz17 kernel: lookup dentry->d_inode=dc5226c0  dentry->d_count=1  inode->i_count=1
Mar 31 12:57:53 whiz17 kernel: Out staticfs_lookup
Mar 31 12:57:53 whiz17 kernel: In staticfs_readdir
Mar 31 12:57:53 whiz17 kernel: Out staticfs_readdir end
% umount /mnt/static
Mar 31 12:58:04 whiz17 kernel: inode->i_count=1  inode->i_ino=101  inode->i_sb=df79dc00
Mar 31 12:58:04 whiz17 kernel: inode->i_count=1  inode->i_ino=100  inode->i_sb=df79dc00
Mar 31 12:58:04 whiz17 kernel: busy=1
For some reason, doing an ls on the root directory did a lookup on a, but not on b. Why one and not the other? As a matter of fact, staticfs_readdir() returned four entries, and only one was looked up. That's fishy. We wouldn't see a request for ., of course, since that's not in our filesystem. We might not see a request for .. either, since that inode is known from when we passed up the superblock. But why not ask about b?

April 2, 2004

I didn't enter anything into this journal yesterday. It's not that I didn't look at staticfs, just that I didn't accomplish anything. I tried a few things, all desperate and unfounded in logic, which is why I didn't bother to document them. I have, though, reduced the filesystem to just the root and the file a, to make the code smaller and perhaps help me see what I'm doing wrong a little more easily. It hasn't worked yet.

April 5, 2004

Yeah, I'm still here. And I'm still stuck at the same place. I've been reading webpages, linux-kernel mailing lists, VFS presentations, and anything else that might shed some light, but to no avail. It still doesn't work.

April 6, 2004

I did it. Kinda. I got rid of the sticking inode. I got rid of the -1 dentry count. How? I changed

  d_add(dentry,inode);
to
  d_instantiate(dentry,inode);
I did this because I had concluded that if there weren't any entries in the dentry cache to clean up, then there couldn't be any to screw up. If you remember, d_add() was defined like this:
static __inline__ void d_add(struct dentry * entry, struct inode * inode)
{
        d_instantiate(entry, inode);
        d_rehash(entry);
}
So we've taken out the d_rehash() step. The good news is, of course, that our filesystem no longer triggers the bug. The bad news is that because we're not caching anything in the dentry cache, the VFS has to ask our module EVERY TIME it wants to know about a certain file. This is bad, and not satisfactory. On the other hand, I've spent over three weeks trying to figure this out, and to be honest, I'm almost ready to say "screw it" and have an inefficient module instead of a buggy one. Though I guess some would say that the module is still buggy BECAUSE it doesn't cache. And I'm inclined to agree.

April 12, 2004

I'm going to waste a little more time on this bug. I figure if I don't figure it out by the end of this week, I'll give up on it and move ahead to a read/write filesystem, so I can at least learn something new.

As we covered before, some dentries aren't being handled properly; our "fix" above proved as much. We also know that when we get that kernel panic, it's because there's a dentry with a -1 in its d_count. We ruled out the possibility that d_count wasn't being incremented as it should, and concluded that the problem lay in the decrementing. At that point, we started looking at the inode as the problem, not the dentry. Let's try to really rule the dentry in or out as the cause.

We determined before that there are two places where VFS decrements a dentry's d_count -- dput() and d_genocide(). I never did see if-and-when d_genocide() was called, so I'm going to do that now.

Well there it is. d_genocide() is to blame for decrementing d_count to -1. Maybe it should be checking that d_count isn't already too low? Or maybe it's still someone else's fault that it's already too low at this point?

So, back to dput(). A few more debug statements, and I see this:

% ls /mnt/static
a*  b/
Apr 12 16:22:54 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=3 dentry->d_name.name=/
Apr 12 16:22:54 whiz17 kernel: In staticfs_readdir
Apr 12 16:22:54 whiz17 kernel: In staticfs_.
Apr 12 16:22:54 whiz17 kernel: In staticfs_..
Apr 12 16:22:54 whiz17 kernel: In staticfs_a
Apr 12 16:22:54 whiz17 kernel: In staticfs_b
Apr 12 16:22:54 whiz17 kernel: Out staticfs_readdir end
Apr 12 16:22:54 whiz17 kernel: In staticfs_lookup
Apr 12 16:22:54 whiz17 kernel: lookup() combo: 100-a
Apr 12 16:22:54 whiz17 kernel: In staticfs_read_inode
Apr 12 16:22:54 whiz17 kernel: Out staticfs_read_inode
Apr 12 16:22:54 whiz17 kernel: lookup dentry->d_inode=0  dentry->d_count=1  inode->i_count=1
Apr 12 16:22:54 whiz17 kernel: lookup dentry->d_inode=dc406740  dentry->d_count=1  inode->i_count=1
Apr 12 16:22:54 whiz17 kernel: Out staticfs_lookup
Apr 12 16:22:54 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=5 dentry->d_name.name=/
Apr 12 16:22:54 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=1 dentry->d_name.name=a
Apr 12 16:22:54 whiz17 kernel: In staticfs_readdir
Apr 12 16:22:54 whiz17 kernel: Out staticfs_readdir end
The key line is this one
Apr 12 16:22:54 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=1 dentry->d_name.name=a
This happens in dput(), just before the atomic_dec_and_lock() call. We see, then, that the dentry for a is being decremented right after the lookup. And why not? It's done with it, isn't it? But if that's so, then why is it being decremented again in d_genocide()? Here's the pertinent part of that function:
     while (next != &this_parent->d_subdirs) {
             struct list_head *tmp = next;
             struct dentry *dentry = list_entry(tmp, struct dentry, d_child);
             next = tmp->next;
             if (d_unhashed(dentry)||!dentry->d_inode)
                     continue;
             if (!list_empty(&dentry->d_subdirs)) {
                     this_parent = dentry;
                     goto repeat;
             }
             atomic_dec(&dentry->d_count);
     }
What jumps out at me is the d_unhashed(dentry) test. If the dentry is unhashed, the loop is supposed to skip it, but it looks like at this point ours isn't unhashed. Should it be? Does being hashed imply having a d_count greater than zero? Or should whoever decrements d_count be responsible for removing it from the dentry cache? I'll try to find that out tomorrow.

April 14, 2004

I admit defeat. All I've accomplished over the last two days is to make a mess of my kernel with a thousand debug statements, and still I can't figure out why the dcache is causing me grief. I should probably document the last of what I did, though.

I went through every source file in /usr/src/linux(version)/fs and found every call to dput(). I then put this bit of code before every call:

          if (dentry->d_inode) {
            if (dentry->d_inode->i_sb) {
              if (dentry->d_inode->i_sb->s_magic==0x61626364) {
                printk("dputE(): dentry->d_count (BEFORE dec)=%d dentry->d_name.name=%s\n",
                       atomic_read(&dentry->d_count), dentry->d_name.name);
              }
            }
          }
The dputE(): part got a different letter for every instance. This was put RIGHT before the call to dput(), because I wanted to find the discrepancy that was appearing when dput() was called differently depending on whether or not the dentries were put into the dcache. This is a sample of the output when the dentry was cached:
Apr 14 08:34:41 whiz17 kernel: A: dentry->d_count=1 dentry->d_name.name=a
Apr 14 08:34:41 whiz17 kernel: dputQ(): dentry->d_count (BEFORE dec)=5 dentry->d_name.name=/
Apr 14 08:34:41 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=5 dentry->d_name.name=/
Apr 14 08:34:41 whiz17 kernel: B: dentry->d_count=1 dentry->d_name.name=a
Apr 14 08:34:41 whiz17 kernel: dputE(): dentry->d_count (BEFORE dec)=1 dentry->d_name.name=a
Apr 14 08:34:41 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=1 dentry->d_name.name=a
and here when I don't cache it:
Apr 14 08:57:21 whiz17 kernel: A: dentry->d_count=1 dentry->d_name.name=a
Apr 14 08:57:21 whiz17 kernel: dputQ(): dentry->d_count (BEFORE dec)=5 dentry->d_name.name=/
Apr 14 08:57:21 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=5 dentry->d_name.name=/
Apr 14 08:57:21 whiz17 kernel: B: dentry->d_count=1 dentry->d_name.name=a
Apr 14 08:57:21 whiz17 kernel: dputE(): dentry->d_count (BEFORE dec)=1 dentry->d_name.name=a
Apr 14 08:57:21 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=1 dentry->d_name.name=a
Apr 14 08:57:21 whiz17 kernel: dput(): dentry->d_count (BEFORE dec)=4 dentry->d_name.name=/
See the difference? There's an extra dput() call when the dentry is cached, one that's missing when it isn't. But where is dput() being called from? I DON'T KNOW! Notice there's no "lettered" dput() message before the last dput() call. I want to find the code that is deciding to call dput() there, to see how the caching is causing a problem. But I can't. From this, it doesn't seem that that last dput() call is being done from the VFS -- at least, not in the directory I'm looking in! Where else to look?

I don't know. I should really just continue on with learning the "write" portion of the VFS, and worry about it another time, but this has gotten me down, and I think I need to take a break from VFS for now. A month chasing down one bug is a little much, even for me.

April 29, 2004

I just built a machine with the 2.6.5 kernel, and thought that I'd give staticfs a try on that -- who knows, maybe it was a kernel problem the whole time!

Trying to slap the module into the kernel didn't work; they've changed quite a few things in VFS, covered in /usr/src/linux(version)/Documentation/filesystems/porting. After changing over everything mentioned there, and looking at the diffs between the inode.c for romfs in both 2.4.26 and 2.6.5, I got a clean compile -- and a clean segmentation fault when I tried to ls it. Oh well. Now that the VFS has changed quite a bit, I'll probably have to go through ramfs and romfs once again. That's not going to happen immediately, though, since I'm working on a network scanning project right now. Hopefully I'll be back soon.
<--staticfs finalizing ^--VFS--^

©2002-2018 Wayne Pearson