OpenBSD in the Classroom

John Aycock
Department of Computer Science
University of Calgary
aycock@cpsc.ucalgary.ca

Introduction

Q: How do you get a hundred students in an operating systems class to work on real kernel code, using outdated machines and a lab barely big enough for a quarter of them?

A: Very carefully.

Like most university computer science programs, ours has a mandatory course on operating systems. It is a third-year (junior) course, with high enrollment -- 238 students over the last two semesters -- and assignments traditionally based on OS simulation or toy problems.

It would be nice to have students working with real OS code, though. Students get little experience studying large pieces of software, much less software that is well written. There is also something to be said for working on the Real Thing rather than abstracted academic contrivances.

The kernel-hacking initiative was started here by a sessional instructor, who taught operating systems over the summer using the Linux kernel. Because it was a summer term, enrollment was relatively low and, equally important, demand for student workstations was reduced -- a bank of about 30 SPARC-5 machines was allocated for the course, and students could modify and reboot their kernels with impunity.

How could kernel work be done during a regular semester? There are three issues:

Cost. In economically-challenged times, it's hard to justify hardware and software expenditures for what might be a one-time experiment. Free is good.
Equipment. During the busy fall and winter terms, there are few machines to dedicate to a single course.
People. Ideally, teaching assistants with kernel programming experience would be available to teach the labs and mark assignments. Unfortunately, while good TAs could be found, none had the right background: no faculty here research operating systems, and the TAs used for the Linux version of the course were busy with their graduate theses and unavailable to teach.

It was the lack of an experienced TA pool that led me to OpenBSD, strangely enough. Basically, there was no Linux kernel talent to rely upon, so there was no compelling reason to stick with Linux. OpenBSD had two other advantages, from the point of view of what I wanted to teach students. First, it doesn't enjoy the same popularity as Linux, so students would have to study the code; they couldn't easily find "how-to" kernel code on the Internet or in books. Second, and I'm sure someone will flame me for saying this, the code quality tends to be better in OpenBSD than in Linux. The OpenBSD kernel would not only have to be studied, but was good to study.

The other pieces of the puzzle, cost and equipment, were solved rather serendipitously. Our campus IT department gave us some dusty PCs they considered obsolete: IBM Pentium 166 machines with 64MB of RAM, 2GB of disk, and floppy and CD-ROM drives. Free is good.

Configuring the Machines

Our support staff set up 28 of the castoff machines in a separate lab, hidden behind a firewall that let nothing in and allowed only outbound ssh and sftp connections.

I had two priorities for the OpenBSD configuration on these machines. First, students had to be able to rebuild the machine from an unknown state quickly. Second, kernel compiles had to happen quickly.

To rebuild the machines quickly, I created a configuration where as much of the filesystem as possible resided on CD-ROM. OpenBSD's caching of blocks from the CD-ROM gave good enough performance to make extensive use of the CD-ROM feasible. Tracking down all the programs that wanted to write to the filesystem took a while; I must confess that it took ten attempts to iron out all the details! Once I was done, the basic machine rebuilding sequence the students had to follow took a matter of minutes:

Booting from floppy. The machines were first booted single-user off the floppy, mounting the root filesystem from CD-ROM. (Unfortunately, the computers were unable to boot from the CD-ROM, which would have simplified matters.) Even single-user mode seemed to want writable /dev entries, so I set up a 1000-block MFS filesystem and changed /etc/rc to untar device nodes into it (this was far faster than running MAKEDEV).
Installing a writable filesystem. I put a script in /sbin which the students ran as root to do a laundry list of tasks:
1. Run fdisk and write a new disk label, with a 512M "a" partition and a 128M swap partition. The remainder of the disk was unused, which reduced rebuilding time a bit.
2. Run newfs on the "a" partition, and mount the resulting filesystem on /w. All things writable on the system, like /tmp, were symlinked to /w.
3. Populate /w. This was done by untarring an uncompressed tarfile located on the CD-ROM, containing the contents of /root and /var. To permit booting from the disk, /boot, /etc/boot.conf, and a kernel image were also placed in /w.
4. Run installboot on /w and unmount it.
Booting multi-user. Students had a usable system at this point and could log in as root. No writable kernel source was installed during machine rebuilding; it was left as a separate step.
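
The symlink arrangement in step 2 can be illustrated with a minimal Python sketch. It is not the script used in the course; it only shows how a read-only image tree might be staged so that its writable directories point into /w. The function name link_writable_dirs and the staging directory are hypothetical, and the directory list is taken from the text (/tmp, plus /var and /root, whose contents were unpacked into /w).

```python
import os

# Mount point of the writable disk partition, as described in the text.
WRITABLE_ROOT = "/w"
# Directories that must be writable at run time (taken from the article;
# a real image would probably need more entries).
WRITABLE_DIRS = ["tmp", "var", "root"]


def link_writable_dirs(staging_root):
    """Create symlinks in a staged read-only tree that point into WRITABLE_ROOT.

    staging_root is a hypothetical directory holding the future CD-ROM
    contents. Returns a dict mapping each link path to its target.
    """
    links = {}
    for name in WRITABLE_DIRS:
        link_path = os.path.join(staging_root, name)
        target = WRITABLE_ROOT + "/" + name
        if os.path.lexists(link_path):
            # Refuse to clobber a real directory already in the image.
            raise FileExistsError(link_path)
        # The target need not exist yet; it appears once /w is mounted.
        os.symlink(target, link_path)
        links[link_path] = target
    return links
```

Because the links are absolute, they dangle in the staging tree and resolve only on a booted machine where /w has been mounted and populated.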

Setting up a fresh, writable copy of the kernel source as quickly as possible took quite a bit of experimentation. I also wanted to have a prebuilt set of object files for the kernel to reduce kernel build time -- building a kernel from scratch on these machines took almost 15 minutes!

Using the union filesystem for kernel source would have been perfect, but it proved to be far too unstable and was abandoned. I also tried using a recursive cp, restore, and untarring symlink trees. The fastest and easiest method I found, however, was untarring a compressed version of the kernel source plus accompanying object files. The time was reduced further by omitting kernel source for architectures other than the x86. Again, I incorporated this into a script, which students would run after rebuilding the machine. This script would take about two minutes to complete.
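
The packing idea can be sketched in Python with the standard tarfile module, as a stand-in for the tar command actually used. The sketch assumes the source tree keeps per-architecture code under arch/<port> and that the x86 port directory is named i386; both are assumptions, as are the names pack_tree, unpack_tree, and the "sys" archive prefix.

```python
import os
import tarfile

# Assumed name of the x86 port directory under arch/.
KEEP_ARCH = "i386"


def keep_member(tarinfo):
    """Filter for TarFile.add: drop arch/<port> subtrees other than KEEP_ARCH."""
    parts = tarinfo.name.split("/")
    if "arch" in parts:
        idx = parts.index("arch")
        # parts[idx + 1] is the port name, if this entry is below arch/.
        if idx + 1 < len(parts) and parts[idx + 1] != KEEP_ARCH:
            return None  # excluded; tarfile does not descend into it
    return tarinfo


def pack_tree(src_root, archive_path):
    """Write a gzip-compressed tar of src_root (sources plus object files)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_root, arcname="sys", filter=keep_member)


def unpack_tree(archive_path, dest):
    """Extract the archive under dest, giving a fresh writable copy."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
```

Returning None from the filter for a directory skips everything beneath it, so the unwanted ports never enter the archive and cost nothing at unpack time.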

Students could then modify the kernel source and build their own kernels. To test kernels, students would copy them into /w and simply boot from the hard drive.

I supplied students with a script to find files they had added or modified in the kernel source tree. Because their kernel work could conceivably perturb the clock setting, detecting changes by file modification time would have been unreliable. Instead, I precomputed an MD5 hash for each file in the source tree and stored these hashes on the CD-ROM; my script would then compute new MD5 hashes and look for differences. The output was a list of added or modified files that could be used as input to tar.
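
The hash comparison can be sketched in a few lines of Python. This is an illustration of the technique rather than the original script; the function names and the in-memory dictionary standing in for the hash list stored on the CD-ROM are assumptions.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skips dangling symlinks
                hashes[os.path.relpath(full, root)] = md5_of_file(full)
    return hashes


def changed_files(root, baseline):
    """List files under root that are new or differ from the baseline hashes."""
    current = hash_tree(root)
    return sorted(p for p, h in current.items() if baseline.get(p) != h)
```

The baseline would be computed once from the pristine tree with hash_tree; running changed_files on a student's working tree then yields relative paths, one per added or modified file, suitable for handing to tar. Deleted files are not reported, which matches the stated goal of finding added or modified files.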

Configuring the Assignments

Our operating systems course has four assignments, which students are to do themselves, i.e., no group work is permitted. To accommodate the sheer size of the class, I actually set up eight assignments, divided into two four-assignment "streams": an OpenBSD lab stream whose assignments must be done in the OpenBSD lab, and a non-lab stream whose (traditional simulation-based) assignments could be done on any of the more plentiful workstations. Each student had to do one OpenBSD-stream assignment, and three non-lab-stream assignments.

Each student could pick which OpenBSD assignment they wanted to do. I supplied a summary of the assignments at the beginning of the course to help them make an informed choice. In an ideal world, each student would have chosen an OpenBSD assignment whose topic interested them. I also made it clear to students in lectures that it was their responsibility to distribute themselves over the four assignments. Naive on my part, at best.

By tradition, the first operating systems assignment is an easier introductory one; not all students know C at the start of the course, for instance. I followed this tradition, assuming that students facing a steeper learning curve would avoid the first OpenBSD assignment. I also made the final OpenBSD assignment a challenging one, to encourage students not to procrastinate.

Results and Lessons

In hindsight, what happened next was predictable. Of the 97 submissions for the first assignment, 89 were for the OpenBSD assignment -- word got out that it was an easier assignment. I don't want to dwell on how 89 students crammed themselves into a lab meant for 28, though! What I am pleased to mention is that five students waited and did the final OpenBSD assignment, despite knowing that it was going to be harder than the rest.

What did I learn from this? Based on my experience and the results from a survey I gave the students, several lessons are clear.

A lot of work is involved in dual assignment streams. Essentially, my workload for assignments doubled by doing this: twice the assignment specifications, twice the marking guides. The TAs didn't fare too badly here because most people opted for the first OpenBSD lab assignment, so they didn't have twice the assignment variety in their marking load. Interestingly, the students' survey responses indicated that a large majority of them preferred having the choice of which assignment to do.
One approach I am considering as an alternative to the dual streams is to have a set of three traditional assignments, and make the OpenBSD work into a term project.
When a choice of assignment is given, assignments must be of equal difficulty. This is a hard goal to achieve, especially considering that the assignments need to change from year to year for pragmatic reasons. Perhaps either more, easier assignments, or fewer, harder assignments, are called for.
Lab space is critical. This was perhaps the key limiting factor in what I was able to do. Even the obvious approach of allowing students to work together would have resulted in too-large groups. I am currently investigating running OpenBSD on an x86 simulator, which would enable students to work on any of our computers, and eliminate the need for both a specialized OpenBSD distribution and dedicated lab machines.
Handling real OS code challenges many students. Part of the problem was not the OS code itself, but simply the speed of the lab machines. Students are used to having a very fast edit-compile-run cycle now, and found waiting for a slow reboot to complete very frustrating. This could be mitigated by having students develop loadable kernel modules instead, though forcing students to think carefully about what they're doing through a slow development cycle is not necessarily a bad thing.
The other part of the problem was the students' fledgling code-reading skills. The TAs taught students how to use the OpenBSD machines, build kernels, and use tools like grep to search for things in the kernel source, but I asked them to give students limited guidance in the code itself. I had high expectations of the students' ability to read code, perhaps too high -- students are not exposed to a lot of C code now, and the OpenBSD kernel documentation, written by experts for experts, was of little help to students. A far better approach, which I plan to adopt, is to have the TAs walk through a series of small lab exercises with the students, to give them practice with the OpenBSD code structure and finding things of interest in the code.

According to the survey results, the majority of students liked working with OpenBSD kernel code, at least in principle. The real problems lay in the implementation of the idea, but with some refinements I expect using OpenBSD will be a very educational experience for the students.

Acknowledgments

Thanks to Jim Parker for suggesting the dual streams, Theo de Raadt for feedback on the assignments, and the technical support staff (especially Debbie Mazurek) for setting up the lab. Tim Williams taught the Linux version of the course. Shannon Jaeger proofread a draft of this article.