March 4, 2004

inodes

Now that the VFS has our superblock and a root inode, what can it do? Well, the inode represents information about a specific file, much as a superblock represents information about a specific filesystem. The corresponding structure is struct inode, in the same fs.h file where struct super_block was found. This structure has a member called i_op, which, much like its super_block counterpart, holds a structure of pointers to functions that act upon inodes. Here's the definition found in ramfs/inode.c for the structure used in ramfs:
static struct inode_operations ramfs_dir_inode_operations = {
        create:         ramfs_create,
        lookup:         ramfs_lookup,
        link:           ramfs_link,
        unlink:         ramfs_unlink,
        symlink:        ramfs_symlink,
        mkdir:          ramfs_mkdir,
        rmdir:          ramfs_rmdir,
        mknod:          ramfs_mknod,
        rename:         ramfs_rename,
};
A lot more functions have been supplied in this structure than in the superblock structure. This made me wonder, and I went and looked at the definition for struct inode_operations (also in fs.h). Here it is:
struct inode_operations {
        int (*create) (struct inode *,struct dentry *,int);
        struct dentry * (*lookup) (struct inode *,struct dentry *);
        int (*link) (struct dentry *,struct inode *,struct dentry *);
        int (*unlink) (struct inode *,struct dentry *);
        int (*symlink) (struct inode *,struct dentry *,const char *);
        int (*mkdir) (struct inode *,struct dentry *,int);
        int (*rmdir) (struct inode *,struct dentry *);
        int (*mknod) (struct inode *,struct dentry *,int,int);
        int (*rename) (struct inode *, struct dentry *,
                        struct inode *, struct dentry *);
        int (*readlink) (struct dentry *, char *,int);
        int (*follow_link) (struct dentry *, struct nameidata *);
        void (*truncate) (struct inode *);
        int (*permission) (struct inode *, int);
        int (*revalidate) (struct dentry *);
        int (*setattr) (struct dentry *, struct iattr *);
        int (*getattr) (struct dentry *, struct iattr *);
};
There are a bunch of members that aren't assigned anything! So what does this mean? Obviously the ramfs filesystem doesn't have anything to say about the last seven operations shown, and it appears that the VFS doesn't mind. What I don't know is what happens when these operations are attempted. Since the structure has null values, does the VFS just take NULL as "do nothing"? As "error"? Or does it have a set of default actions that it takes? But would that make sense? All questions for the future...
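
One way a caller could cope with those NULL members, sketched in user-space C. The idea is that the code invoking a hook tests the pointer first and then decides, operation by operation, whether a missing hook means "refuse" or "fall back to generic behaviour". All the names here (demo_inode, demo_ops, demo_mkdir, demo_permission, default_permission, toy_ops) are invented for the illustration, and the choice of -EPERM and -EACCES is an assumption; what the real VFS does for each operation still has to be checked in the kernel source. The `.mkdir =` spelling is the standard C99 form of the older `mkdir:` GCC syntax that ramfs uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct inode and struct inode_operations. */
struct demo_inode;

struct demo_ops {
        int (*mkdir)(struct demo_inode *dir, const char *name, int mode);
        int (*permission)(struct demo_inode *inode, int mask);
};

struct demo_inode {
        int mode;                       /* permission bits only */
        const struct demo_ops *ops;     /* may be NULL; members may be NULL */
};

/* Generic fallback used when the filesystem supplies no permission hook. */
static int default_permission(struct demo_inode *inode, int mask)
{
        return (inode->mode & mask) == mask ? 0 : -EACCES;
}

/* Missing hook treated as "not supported": report an error. */
int demo_mkdir(struct demo_inode *dir, const char *name, int mode)
{
        assert(dir != NULL);
        if (!dir->ops || !dir->ops->mkdir)
                return -EPERM;          /* assumed error code */
        return dir->ops->mkdir(dir, name, mode);
}

/* Missing hook treated as "use the default behaviour". */
int demo_permission(struct demo_inode *inode, int mask)
{
        assert(inode != NULL);
        if (inode->ops && inode->ops->permission)
                return inode->ops->permission(inode, mask);
        return default_permission(inode, mask);
}

/* A toy filesystem that implements mkdir and nothing else. */
static int toy_mkdir(struct demo_inode *dir, const char *name, int mode)
{
        (void)dir; (void)name; (void)mode;
        return 0;
}

/* Members not named here (permission) are implicitly NULL. */
const struct demo_ops toy_ops = {
        .mkdir = toy_mkdir,
};
```

With this arrangement a filesystem only fills in the hooks it cares about, which would explain why ramfs can leave seven of them out without anything breaking.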

Of course, now I want to peek back at the struct super_operations:

struct super_operations {
        void (*read_inode) (struct inode *);

        /* reiserfs kludge.  reiserfs needs 64 bits of information to
        ** find an inode.  We are using the read_inode2 call to get
        ** that information.  We don't like this, and are waiting on some
        ** VFS changes for the real solution.
        ** iget4 calls read_inode2, iff it is defined
        */
        void (*read_inode2) (struct inode *, void *) ;
        void (*dirty_inode) (struct inode *);
        void (*write_inode) (struct inode *, int);
        void (*put_inode) (struct inode *);
        void (*delete_inode) (struct inode *);
        void (*put_super) (struct super_block *);
        void (*write_super) (struct super_block *);
        void (*write_super_lockfs) (struct super_block *);
        void (*unlockfs) (struct super_block *);
        int (*statfs) (struct super_block *, struct statfs *);
        int (*remount_fs) (struct super_block *, int *, char *);
        void (*clear_inode) (struct inode *);
        void (*umount_begin) (struct super_block *);
        /* Following are for knfsd to interact with "interesting" filesystems
         * Currently just reiserfs, but possibly FAT and others later
         *
         * fh_to_dentry is given a filehandle fragement with length, and a type flag
         *   and must return a dentry for the referenced object or, if "parent" is
         *   set, a dentry for the parent of the object.
         *   If a dentry cannot be found, a "root" dentry should be created and
         *   flaged as DCACHE_NFSD_DISCONNECTED. nfsd_iget is an example implementation.
         *
         * dentry_to_fh is given a dentry and must generate the filesys specific
         *   part of the file handle.  Available length is passed in *lenp and used
         *   length should be returned therein.
         *   If need_parent is set, then dentry_to_fh should encode sufficient information
         *   to find the (current) parent.
         *   dentry_to_fh should return a 1byte "type" which will be passed back in
         *   the fhtype arguement to fh_to_dentry.  Type of 0 is reserved.
         *   If filesystem was exportable before the introduction of fh_to_dentry,
         *   types 1 and 2 should be used is that same way as the generic code.
         *   Type 255 means error.
         *
         * Lengths are in units of 4bytes, not bytes.
         */
        struct dentry * (*fh_to_dentry)(struct super_block *sb, __u32 *fh, int len, int fhtype, int parent);
        int (*dentry_to_fh)(struct dentry *, __u32 *fh, int *lenp, int need_parent);
        int (*show_options)(struct seq_file *, struct vfsmount *);
};

Ugh. What a mess. And ramfs only defines two of the values? What's going on? We'll figure all that out later for sure.

Looking back at our inode operations:

static struct inode_operations ramfs_dir_inode_operations = {
        create:         ramfs_create,
        lookup:         ramfs_lookup,
        link:           ramfs_link,
        unlink:         ramfs_unlink,
        symlink:        ramfs_symlink,
        mkdir:          ramfs_mkdir,
        rmdir:          ramfs_rmdir,
        mknod:          ramfs_mknod,
        rename:         ramfs_rename,
};
we can see that we define a lot of functionality when it comes to files in our filesystem, which only makes sense. Let's look at all of these functions.
static int ramfs_create(struct inode *dir, struct dentry *dentry, int mode)
{
        return ramfs_mknod(dir, dentry, mode | S_IFREG, 0);
}
Not very interesting, but there is one thing to note: the create member of the inode_operations structure is used for creating "regular" files, not directories, which is why S_IFREG gets OR'ed into the mode before everything is handed off to ramfs_mknod().

March 5, 2004

static struct dentry * ramfs_lookup(struct inode *dir, struct dentry *dentry)
{
        d_add(dentry, NULL);
        return NULL;
}
This, I believe, looks up a file in a directory. I'm not sure what's going on here, but the comment preceding the above code says
/*
 * Lookup the data. This is trivial - if the dentry didn't already
 * exist, we know it is negative.
 */
Because this is ramfs, perhaps the assumption is that the VFS will have entries in its cache regarding all files, so if it doesn't already know, then it can't exist? I'll investigate further with other filesystems at a later time. I'll have to look up what d_add() does, too.
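
Here is that guess written out as a user-space sketch: a name cache where a miss can only mean "no such file", so the lookup simply records the miss as a negative entry. The cache layout and every name in it (demo_cache, demo_entry, demo_cache_find, demo_cache_add, demo_lookup) are hypothetical, and using inode number 0 to mean "negative" is an assumption made for the example; the real dentry cache is far more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CACHE_SLOTS 8

/* Hypothetical cache entry: a name plus an inode number (0 = negative). */
struct demo_entry {
        char name[32];
        int  ino;       /* 0 means "known not to exist" */
        int  used;
};

struct demo_cache {
        struct demo_entry slot[DEMO_CACHE_SLOTS];
};

/* Return the cached entry for name, or NULL if the cache never saw it. */
struct demo_entry *demo_cache_find(struct demo_cache *c, const char *name)
{
        int i;

        for (i = 0; i < DEMO_CACHE_SLOTS; i++)
                if (c->slot[i].used && strcmp(c->slot[i].name, name) == 0)
                        return &c->slot[i];
        return NULL;
}

/* Record name -> ino in the first free slot; returns NULL if the cache is full. */
struct demo_entry *demo_cache_add(struct demo_cache *c, const char *name, int ino)
{
        int i;

        for (i = 0; i < DEMO_CACHE_SLOTS; i++) {
                struct demo_entry *e = &c->slot[i];

                if (!e->used) {
                        strncpy(e->name, name, sizeof(e->name) - 1);
                        e->name[sizeof(e->name) - 1] = '\0';
                        e->ino = ino;
                        e->used = 1;
                        return e;
                }
        }
        return NULL;
}

/* Memory-only lookup: there is no disk to consult, so a miss is final. */
int demo_lookup(struct demo_cache *c, const char *name)
{
        struct demo_entry *e = demo_cache_find(c, name);

        if (!e)
                e = demo_cache_add(c, name, 0); /* remember the miss */
        return e ? e->ino : 0;
}
```

A disk-based filesystem would have to read the directory from storage at the point where this sketch just records the miss, which is presumably why ramfs gets away with a two-line lookup.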
static int ramfs_link(struct dentry *old_dentry, struct inode * dir, struct dentry * dentry)
{
        struct inode *inode = old_dentry->d_inode;

        if (S_ISDIR(inode->i_mode))
                return -EPERM;

        inode->i_nlink++;
        atomic_inc(&inode->i_count);    /* New dentry reference */
        dget(dentry);           /* Extra pinning count for the created dentry */
        d_instantiate(dentry, inode);
        return 0;
}
Links are ways for more than one file name to refer to the same inode. These are hard links, not symbolic links. First we check that the target is not a directory, because you can't have a hard link to a directory (I'm not sure why, actually). Then we bump the inode's link count, call atomic_inc() (which I'm guessing is a thread-safe way to increase a usage counter), and make yet more calls to d_() functions that we'll have to look into.
static int ramfs_unlink(struct inode * dir, struct dentry *dentry)
{
        int retval = -ENOTEMPTY;

        if (ramfs_empty(dentry)) {
                struct inode *inode = dentry->d_inode;

                inode->i_nlink--;
                dput(dentry);                   /* Undo the count from "create" - this does all the work */
                retval = 0;
        }
        return retval;
}
Unlinking means deleting, specifically for a file. Because more than one file name can refer to the same inode, the inode itself keeps track of how many names are referring to it (thus the inode->i_nlink-- line). The ENOTEMPTY error is there because the ramfs_unlink() function is also used to remove directories, and so there's a call to ramfs_empty() to ensure that the directory is really empty. I'm assuming ramfs_empty() returns true for a file (which isn't a container). And again with the dput(), another function I'll have to figure out.
static int ramfs_symlink(struct inode * dir, struct dentry *dentry, const char * symname)
{
        int error;

        error = ramfs_mknod(dir, dentry, S_IFLNK | S_IRWXUGO, 0);
        if (!error) {
                int l = strlen(symname)+1;
                struct inode *inode = dentry->d_inode;
                error = block_symlink(inode, symname, l);
        }
        return error;
}
A symlink is a symbolic link, creating a reference to some other file. ramfs_mknod() is called as it was in ramfs_create(), but a few more things are happening. The type is now S_IFLNK, and all the permission flags are being set on the file (the S_IRWXUGO). After the file is created, the block_symlink() function is called with the target path and its length (including the terminating NUL), and I'm not sure what that does. Another one for the to-do list.
static int ramfs_mkdir(struct inode * dir, struct dentry * dentry, int mode)
{
        return ramfs_mknod(dir, dentry, mode | S_IFDIR, 0);
}
Pretty simple, it's the same as ramfs_create(), but with the S_IFDIR flag instead of S_IFREG.
#define ramfs_rmdir ramfs_unlink
Here we see that ramfs_rmdir() is really just ramfs_unlink(), as we mentioned above.
static int ramfs_mknod(struct inode *dir, struct dentry *dentry, int mode, int dev)
{
        struct inode * inode = ramfs_get_inode(dir->i_sb, mode, dev);
        int error = -ENOSPC;

        if (inode) {
                d_instantiate(dentry, inode);
                dget(dentry);           /* Extra count - pin the dentry in core */
                error = 0;
        }
        return error;
}
Finally, we get to see this ramfs_mknod() function that has been used to create regular files, symbolic links and directories! The first thing done is to create an inode with ramfs_get_inode(), which makes sense. We'll have to wait until we see that function to see what mode and dev are used for. d_instantiate() is another one of those d_() functions that we'll have to look into. They all have to do with dentry structures, which are instances of a file. We'll talk about them later (and we'll probably go learn about these d_() functions at the same time).
static int ramfs_rename(struct inode * old_dir, struct dentry *old_dentry, struct inode * new_dir,struct dentry *new_dentry)
{
        int error = -ENOTEMPTY;

        if (ramfs_empty(new_dentry)) {
                struct inode *inode = new_dentry->d_inode;
                if (inode) {
                        inode->i_nlink--;
                        dput(new_dentry);
                }
                error = 0;
        }
        return error;
}
ramfs_rename() is passed in a few things. That's because the idea of renaming a file under Linux is really done as a move, though often the move is into the same directory. This comment precedes the code above:
/*
 * The VFS layer already does all the dentry stuff for rename,
 * we just have to decrement the usage count for the target if
 * it exists so that the VFS layer correctly free's it when it
 * gets overwritten.
 */
I'm still not sure why ramfs doesn't have to worry about removing the file from one directory and placing it into another. The code seems to do the following: determine if the new name is a non-empty directory, and if it is, return the ENOTEMPTY error; if it's empty (because it's either a file or an empty directory), then grab the inode if it exists. If it doesn't, then we're done. If it does, that means the file existed already, and we're overwriting it with this rename, so we decrement the inode's links by one.

What bothers me about this function and ramfs_unlink() is that they decrement the link counter in the inode, but what if it's now at zero? Should the inode not be erased? Or does it float around as an unreferenced inode until something cleans it up? Really, this is a file taking up space in our filesystem that no one can access. Does something go through periodically and find all the zero-linked inodes and do something about it? Perhaps the answer lies back in the unexplored reaches of the superblock.
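
One possible answer, sketched in user-space C: nothing needs to sweep for zero-linked inodes if the cleanup happens at the moment the last in-memory reference is dropped. The sketch keeps two counters, as the ramfs code does (i_nlink for names, i_count for users), and destroys the inode only when both have run out. This is a simplified model under that assumption: it folds the dentry's reference and the inode's reference into one counter, and demo_inode, demo_new_inode, demo_link, demo_put, demo_unlink and demo_freed are all invented names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical inode with the two counters the ramfs code touches. */
struct demo_inode {
        int nlink;      /* names pointing at this inode (like i_nlink) */
        int count;      /* in-memory users of this inode (like i_count) */
};

int demo_freed;         /* number of destroyed inodes, for the example only */

/* Create an inode with one name and one reference held by that name. */
struct demo_inode *demo_new_inode(void)
{
        struct demo_inode *inode = calloc(1, sizeof(*inode));

        if (inode) {
                inode->nlink = 1;
                inode->count = 1;
        }
        return inode;
}

/* Add a second name: both counters go up, as in ramfs_link(). */
void demo_link(struct demo_inode *inode)
{
        inode->nlink++;
        inode->count++;
}

/* Drop one reference; destroy only when no users and no names remain. */
void demo_put(struct demo_inode *inode)
{
        assert(inode->count > 0);
        if (--inode->count == 0 && inode->nlink == 0) {
                free(inode);
                demo_freed++;
        }
}

/* Remove one name, then release the reference that name was holding. */
void demo_unlink(struct demo_inode *inode)
{
        inode->nlink--;
        demo_put(inode);
}
```

In this model the "this does all the work" comment next to dput() in ramfs_unlink() makes sense: the decrement of the link count only marks the inode as removable, and dropping the reference is what triggers the actual cleanup.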

That covers all of the VFS interface functions for inode-related activities, but we still have a few other internal functions that we've seen and not looked at.

struct inode *ramfs_get_inode(struct super_block *sb, int mode, int dev)
{
        struct inode * inode = new_inode(sb);

        if (inode) {
                inode->i_mode = mode;
                inode->i_uid = current->fsuid;
                inode->i_gid = current->fsgid;
                inode->i_blksize = PAGE_CACHE_SIZE;
                inode->i_blocks = 0;
                inode->i_rdev = NODEV;
                inode->i_mapping->a_ops = &ramfs_aops;
                inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
                switch (mode & S_IFMT) {
                default:
                        init_special_inode(inode, mode, dev);
                        break;
                case S_IFREG:
                        inode->i_fop = &ramfs_file_operations;
                        break;
                case S_IFDIR:
                        inode->i_op = &ramfs_dir_inode_operations;
                        inode->i_fop = &ramfs_dir_operations;
                        break;
                case S_IFLNK:
                        inode->i_op = &page_symlink_inode_operations;
                        break;
                }
        }
        return inode;
}
This function was called in ramfs_mknod(). A new inode is created with new_inode(), and its fields are initialized with a bunch of values. The mode is the mode passed in through ramfs_mknod(), referring to the type of file and the permission flags set. The uid and gid are set from a current structure, which I'm not sure about yet. The i_mapping->a_ops gets set with another structure we define in our module, which we'll look at later. The i_fop and i_op pointers are also set to other structures with additional pointers within our module; these differ depending on whether the inode is a regular file, a directory or a symbolic link. (Note: the code above might differ a bit from what you have, as it seems to have changed between kernels, but it should be similar. The code above is from the linux-2.4.18-10 kernel.)

Basically, then, the ramfs_get_inode() function creates a new structure and populates it. Just as we might expect. The one other thing I noticed was the default: case for the inode mode, which called a function init_special_inode(). This is probably for block devices, sockets, pipes, etc. I'll look into that later, too.
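
To see what the switch (mode & S_IFMT) is doing, here is a small user-space sketch using the same constants from sys/stat.h. The mode word carries both the file type and the permission bits; masking with S_IFMT keeps only the type. The helper names demo_inode_kind and demo_inode_perms are made up for the example, and 0777 stands in for the kernel-only S_IRWXUGO.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* make sure S_IFMT and friends are visible */
#endif

#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: name the branch ramfs_get_inode() would take. */
const char *demo_inode_kind(mode_t mode)
{
        switch (mode & S_IFMT) {        /* keep only the file-type bits */
        case S_IFREG:
                return "regular";
        case S_IFDIR:
                return "directory";
        case S_IFLNK:
                return "symlink";
        default:                        /* devices, FIFOs, sockets */
                return "special";
        }
}

/* The permission bits are whatever is left once the type is masked off. */
mode_t demo_inode_perms(mode_t mode)
{
        return mode & ~(mode_t)S_IFMT;
}
```

So ramfs_create() passing mode | S_IFREG and ramfs_mkdir() passing mode | S_IFDIR are just stamping the type into the same word that already holds the caller's permission bits.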

static int ramfs_empty(struct dentry *dentry)
{
        struct list_head *list;

        spin_lock(&dcache_lock);
        list = dentry->d_subdirs.next;

        while (list != &dentry->d_subdirs) {
                struct dentry *de = list_entry(list, struct dentry, d_child);

                if (ramfs_positive(de)) {
                        spin_unlock(&dcache_lock);
                        return 0;
                }
                list = list->next;
        }
        spin_unlock(&dcache_lock);
        return 1;
}
The ramfs_empty() function was used a few times to determine whether something was safe for deletion -- because it was either just a file or it was a directory with nothing in it.

The spin_lock() and spin_unlock() functions are used to prevent new files from being added to the directory while we're checking it out, so we don't return a false answer about its emptiness.

The code seems to loop through the contents of the directory, and if it ever hits anything considered not empty (determined by the ramfs_positive() function, below), then it returns zero, which means the directory wasn't empty. If it makes it through its loop without returning, then one is returned, stating it's empty. Easy.
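
The loop is easier to follow once the list machinery is spelled out. Below is a user-space re-creation of the circular list and the list_entry() trick, with a stripped-down dentry that has only the fields the walk needs. It is a sketch, not the kernel's code: demo_dentry, demo_dentry_init, demo_add_child and demo_empty are invented names, has_inode stands in for the d_inode check, and both the locking and the d_unhashed() test are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal re-creation of a circular doubly linked list. */
struct list_head {
        struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

void list_init(struct list_head *head)
{
        head->next = head->prev = head; /* empty list points at itself */
}

void list_add_tail(struct list_head *item, struct list_head *head)
{
        item->prev = head->prev;
        item->next = head;
        head->prev->next = item;
        head->prev = item;
}

/* Hypothetical, stripped-down dentry. */
struct demo_dentry {
        int has_inode;                  /* stands in for d_inode != NULL */
        struct list_head d_subdirs;     /* head of this entry's children */
        struct list_head d_child;       /* our link in the parent's list */
};

void demo_dentry_init(struct demo_dentry *d, int has_inode)
{
        d->has_inode = has_inode;
        list_init(&d->d_subdirs);
        list_init(&d->d_child);
}

void demo_add_child(struct demo_dentry *parent, struct demo_dentry *child)
{
        list_add_tail(&child->d_child, &parent->d_subdirs);
}

/* Same shape as ramfs_empty(), minus the locking and the hash check. */
int demo_empty(struct demo_dentry *dentry)
{
        struct list_head *list = dentry->d_subdirs.next;

        assert(list != NULL);           /* entry must have been initialized */
        while (list != &dentry->d_subdirs) {
                struct demo_dentry *de =
                        list_entry(list, struct demo_dentry, d_child);

                if (de->has_inode)
                        return 0;       /* found a live child */
                list = list->next;
        }
        return 1;                       /* no children, or only negative ones */
}
```

Note that an entry with no children at all, such as a regular file, has a list head that points back at itself, so the loop body never runs and the function reports "empty". That matches the assumption made earlier about ramfs_empty() returning true for plain files.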

static inline int ramfs_positive(struct dentry *dentry)
{
        return dentry->d_inode && !d_unhashed(dentry);
}
We'll have to look at d_unhashed() to see what it returns to fully understand the logic here. Again, that falls under the dentry discussion.