
Masses and Springs


Information visualization has long been an interest of mine. With today's online living, and the information domains that have been created with the introduction of computing and networking, there exist enormous amounts of information that require new methods for comprehension. This includes files in a filesystem, webpages on a webserver, routers connected on the internet and the latency between them, and Internet domain names.

A caveat: I have no formal background in information visualization, so I apologize now to anyone in the field for my probable misuse or ignorance of the terminology that exists for this area of research.

A visual graph, to me, seems an obvious choice for representing large amounts of hierarchical data. By graph I mean the computer science meaning of a graph, with nodes and edges, as opposed to the general idea of a chart that might come to mind. All the examples above map easily onto the idea of nodes and edges, objects and connections, or items and their relationships. While it's certainly easy to represent these relationships within data structures for computational purposes, there comes a time when the data wants to be seen as a whole, to easily find meaning at a larger scale than the individual connections represent.
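To make the nodes-and-edges idea concrete, here is a minimal sketch of such a data structure in Java. The class and method names (SimpleGraph, addEdge, neighboursOf) and the sample paths are made up for illustration and are not taken from my library; it simply stores, for each node, the list of nodes it is connected to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal graph; all names here are illustrative assumptions.
class SimpleGraph {
    // Each node name maps to the names of the nodes it shares an edge with.
    private final Map<String, List<String>> neighbours = new LinkedHashMap<>();

    // Registers a node if it is not already known.
    void addNode(String name) {
        neighbours.computeIfAbsent(name, k -> new ArrayList<>());
    }

    // Adds an undirected edge, creating either endpoint if needed.
    void addEdge(String a, String b) {
        addNode(a);
        addNode(b);
        neighbours.get(a).add(b);
        neighbours.get(b).add(a);
    }

    // Returns the nodes connected to the given node (empty if unknown).
    List<String> neighboursOf(String name) {
        return neighbours.getOrDefault(name, Collections.emptyList());
    }

    int nodeCount() {
        return neighbours.size();
    }

    public static void main(String[] args) {
        // A tiny filesystem-like hierarchy: directories are nodes,
        // "contains" relationships are edges.
        SimpleGraph g = new SimpleGraph();
        g.addEdge("/", "/home");
        g.addEdge("/", "/etc");
        g.addEdge("/home", "/home/user");
        System.out.println(g.neighboursOf("/")); // prints [/home, /etc]
    }
}
```

The same shape works for routers and links or domains and subdomains. What it cannot do is show you the whole picture at once, which is the problem the rest of this page is about.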

What is needed, then, is a way to visually represent these relationships on a screen. There are a lot of approaches you can take for laying out information on a screen, from a hyperbolic view to the tree format most of us are familiar with in Windows Explorer. These are quite different solutions, each with their own pros and cons.

Visualization goals

So what is the best way to represent data visually? I think that's a subjective question, but we can put together a list of a few important issues. Scalability is the ability to handle a small or a large amount of data using the same visualization. Expandability, to me, is the ability to change the scope of your view -- whether you want just a small subset or the whole domain of your data. Readability refers to the ease with which the user can find the data on the screen and understand the relationships between the data. Navigability addresses the interface users have for changing their view.

Of the two approaches mentioned above -- the hyperbolic view and Windows Explorer -- I certainly like the hyperbolic view better. Its scalability is excellent, being able to map huge amounts of data onto the hyperbolic sphere. The expandability is lacking, which is my biggest complaint with it; there's only so much data you can visualize at one time, and because of the edge-receding nature of the hyperbolic sphere, the "usable" space is a bit limited. Readability is great, as all child nodes are evenly arranged around the parent, connected by edges. Navigability changes with the implementation, but generally it's done well, with the ability to rotate the sphere and center a node easily with the mouse.

The Windows Explorer directory tree (I don't really have a good name for it, though I'm sure there is one) is not so great. Scalability is effectively unlimited, bounded only by the computer's capability. Expandability is quite good, actually, because the main concept of the tree is the expand and collapse buttons -- the plus and minus boxes beside the directory names. Readability is average -- depending on the amount of content under a parent node, it can sometimes be difficult to trace the hierarchy to a given item. Quite different from the hyperbolic sphere, the Explorer tree displays children as objects indented and below the parent (some implementations of the same idea use horizontal and vertical lines to connect them further, which can sometimes help, but sometimes hinder, depending on the depth). Navigability is poor, in my opinion. Quite often, you must use a scrollbar to move about the current view, only to have to use it again if you expand or collapse any node.

So is there a perfect system? Probably not, as some of the issues above can be mutually opposing. So, what's the best? As I said, I do like the hyperbolic sphere, but I admit I don't know the math behind it. Sure, I could learn it, but one of the other features that I'd like is one I haven't mentioned before -- self-arranging.

Self-arranging data

My first desire to implement a visualization system came from thinking about internet routing, probably spurred by a Map of the Internet that I saw in a grad student's cubicle. Two things bothered me about it, though: it was static (it was a poster, after all), and unnavigable (it's pleasing to look at, but doesn't relay much information at all). What if I were to build my own?

Thinking about how I'd collect the data, I of course thought about how to display it, and what kind of interface I'd want. Layout was important, because unlike the Map of the Internet, I wanted to be able to read every name on every point in my map, which meant I had to be able to zoom in to a readable "depth", and to arrange the data so it was readable. And I wanted to see it collect the data, live. This meant that it had to rearrange the layout as new data was found. But I didn't want a sudden jump whenever the data was rearranged. I wanted to see it "grow", and watch the new information be arranged within the existing data set.

Masses and Springs

A co-worker put me onto the idea of masses and springs -- to represent my data as a universe of objects that affect each other through a set of physical rules. The masses repel each other, ensuring that they don't sit too close together, and thus are easily viewable. The springs represent relationships between the objects, and thus hold them to each other, ensuring that they don't drift so far apart that their relationship can't be seen. Just drop the objects into the universe, and let physics sort them out! Introduce a new object, and let it settle in with the others as the rules of the universe require.
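To show what that universe looks like in code, here is a minimal 2D sketch in Java. It is not the implementation discussed on these pages: the class names, the constants (REPULSION, STIFFNESS, REST_LENGTH, DAMPING), the inverse-square repulsion, and the one-unit time step with unit mass are all assumptions chosen to keep the example short. Each step clears the forces, pushes every pair of masses apart, pulls connected masses toward a rest length, and then moves everything a little.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and constants are assumptions.
class MassSpringSketch {
    static class Mass {
        double x, y;    // position
        double vx, vy;  // velocity
        double fx, fy;  // force accumulated during the current step
        Mass(double x, double y) { this.x = x; this.y = y; }
    }

    static class Spring {
        final Mass a, b;
        Spring(Mass a, Mass b) { this.a = a; this.b = b; }
    }

    static final double REPULSION = 1000.0;  // strength of mass-to-mass push
    static final double STIFFNESS = 0.05;    // spring constant
    static final double REST_LENGTH = 50.0;  // length at which a spring is relaxed
    static final double DAMPING = 0.85;      // fraction of velocity kept per step

    final List<Mass> masses = new ArrayList<>();
    final List<Spring> springs = new ArrayList<>();

    Mass addMass(double x, double y) {
        Mass m = new Mass(x, y);
        masses.add(m);
        return m;
    }

    void connect(Mass a, Mass b) {
        springs.add(new Spring(a, b));
    }

    static double distance(Mass a, Mass b) {
        double dx = b.x - a.x, dy = b.y - a.y;
        // Clamp so that very close masses never cause a division by zero.
        return Math.max(Math.sqrt(dx * dx + dy * dy), 0.01);
    }

    // Advances the universe by one time step.
    void step() {
        for (Mass m : masses) { m.fx = 0; m.fy = 0; }

        // Every pair of masses pushes apart, more strongly when close.
        for (int i = 0; i < masses.size(); i++) {
            for (int j = i + 1; j < masses.size(); j++) {
                Mass a = masses.get(i), b = masses.get(j);
                double dist = distance(a, b);
                double f = REPULSION / (dist * dist);
                double ux = (b.x - a.x) / dist, uy = (b.y - a.y) / dist;
                a.fx -= f * ux; a.fy -= f * uy;
                b.fx += f * ux; b.fy += f * uy;
            }
        }

        // Each spring pulls its ends together when stretched past the
        // rest length, and pushes them apart when compressed.
        for (Spring s : springs) {
            double dist = distance(s.a, s.b);
            double f = STIFFNESS * (dist - REST_LENGTH);
            double ux = (s.b.x - s.a.x) / dist, uy = (s.b.y - s.a.y) / dist;
            s.a.fx += f * ux; s.a.fy += f * uy;
            s.b.fx -= f * ux; s.b.fy -= f * uy;
        }

        // Unit mass and unit time step: force changes velocity directly,
        // and damping bleeds off energy so the layout comes to rest.
        for (Mass m : masses) {
            m.vx = (m.vx + m.fx) * DAMPING;
            m.vy = (m.vy + m.fy) * DAMPING;
            m.x += m.vx;
            m.y += m.vy;
        }
    }

    public static void main(String[] args) {
        MassSpringSketch universe = new MassSpringSketch();
        Mass a = universe.addMass(0, 0);
        Mass b = universe.addMass(10, 0);
        universe.connect(a, b);
        for (int i = 0; i < 500; i++) universe.step();
        System.out.println("separation after settling: " + distance(a, b));
    }
}
```

With these numbers the two connected masses should come to rest a little beyond REST_LENGTH, where the repulsion and the spring tension balance. Adding a mass later with addMass and connect, then continuing to call step, lets it settle in among the others without a sudden jump. Note that two masses placed at exactly the same point get no push in any direction in this sketch, so new masses should start slightly offset from existing ones.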

This has worked out great, for the most part. While this is still a work in progress, and thus these pages may continue to change, a lot had been done before I sat down to write this. As a matter of fact, the implementation I'll be discussing here is my fourth attempt at such a system -- the first two were in Java, the third in Proce55ing, and the fourth again in Java. Why Java? I think it's a great language. Its code is cleaner than C++ (in readability), it has an enormous set of libraries, and it is highly portable. C, my first love, doesn't have the object-oriented features I want. C++ would require me to learn how to render graphics on some platform, and then again on another. Sure, I could use Qt, and OpenGL, and in the future I might. But as it is, I'm quite comfortable putting pixels in a Java window, where I'm not in C++. Actually, my next implementation is likely going to be in C#, as I figure this library is probably a good body of code to learn from. And I do plan on rewriting it in something other than Java, because Java eventually hits a performance ceiling which I can avoid with a compiled language.

Why so many versions already? My first version had some really bad physics and ran very slowly. It was also badly designed, in that I didn't abstract some of the concepts well enough for expansion. It was a nice proof of concept, and was nice to watch. The second version was an attempt to properly apply OOP practices, but never got to a point where it was usable. There was too much second-guessing on whether this or that should be a class, an interface, etc. The idea of visualization lay dormant for a while after that. I think it was Wired magazine that pointed me to Proce55ing, and I gave it a whirl. Proce55ing's a nice little system, and I threw together a 2D and 3D implementation in no time. Proce55ing, though, was limited in its library, so I went back to Java and redesigned the system once more.

©2002-2017 Wayne Pearson