Supporting Collaboration through Multimedia Digital Document Archives

9 Advanced Developments

The examples given so far have been chosen to demonstrate the use of multimedia digital formats, CD-ROM technology and Internet communications that are routinely available and simply used without computer programming. This section gives an overview of related developments showing that what is already being done, however significant in its own right, is only the beginning of the use of digital multimedia technology to support collaborative communities. Experimental systems are already in operation that are a natural evolution of World-Wide Web and yet are revolutionary in the capabilities they offer. Implementing such systems currently requires some degree of programming capability, which the examples in the preceding sections did not.

9.1 The Advent of HTML Forms

In March 1993 WWW was still being presented (Berners-Lee, 1993b) as primarily a hypermedia retrieval system, but in November that year a development took place that so changed the nature of WWW as to constitute a revolution in its own right. Andreessen (1993) issued NCSA Mosaic version 2 using SGML tags to encode definitions of Motif widgets embedded as forms within a hypermedia document, and allowed the state of those widgets within the client to be transmitted to the server. Suddenly the WWW protocols transcended their original conception to become the basis of general interactive, distributed, client-server information systems.

Figure 9.1 shows the client-server architecture of World-Wide Web as it has already been described in this article. A client accesses servers on the Internet using various protocols. It communicates with various helper applications that extend its functionality. When it accesses a World-Wide Web server using the HTTP protocol that server can also access various helper applications through server gateways.

Figure 9.1 Client-server architecture of World-Wide Web

What changed with the advent of the forms capability was that the client in Figure 9.1 became able to transmit structured information from the user back to an arbitrary application gatewayed through the server. The server could then process that information and generate an HTML document which it sent back as a reply. This document could itself contain forms for further interaction with the user, thus supporting a sequence of client-server transactions.

In essence, the adoption of an SGML tagged encoding schema allows HTML documents to include not only text, typographic and multimedia material but also to carry arbitrary additional data through simple, backwards compatible extensions. The HTML DTD is currently being standardized at four levels (Berners-Lee et al., 1994):

Level 0 functionality can be supported on an alphanumeric terminal. Level 1 adds typography and pictures. Level 2 adds embedded GUIs. Level 3 is still somewhat open-ended and will evolve through prototype implementations (Raggett, 1994).

9.2 Graphic User Interfaces through HTML Forms

Figure 7.5 has already shown the genesis of using HTML documents as Graphic User Interfaces (GUIs) through hypertext links attached to pictures. It is a level 1 document resembling, and having the functionality of, an iconic GUI. To the user it appears as a computer interface where clicking on one of 6 icons gives access to one of 6 services. The document generating Figure 7.5 is shown in Figure 9.2.
<TITLE>Faculty of Physical Education home page</TITLE>
<IMG SRC="pelogo.gif">
<H1>University of Calgary Faculty of Physical Education</H1>
<B><A HREF="calendar.html">
<IMG SRC="cal.gif" ALT=" "> Calendar Info</A></B>
<B><A HREF="memos/readmem.html">
<IMG SRC="memo.gif"> Faculty Memos</A></B>
<B><A HREF = "execute/runmail.jcl">
<IMG SRC="mail.gif"> Email</A></B><BR>
<B><A HREF="meet.html">
<IMG SRC="meet.gif" ALT=" "> Meet the staff</A></B>
<B><A HREF="mission/mission.html">
<IMG SRC="mission.gif" ALT=" "> Faculty Mission</A></B>
<B><A HREF="media.html">
<IMG SRC="media.gif" ALT=" "> Multi-Media</A></B>
Figure 9.2 Document generating Figure 7.5

Clicking on an icon in Figure 7.5 requests another document but this can have the side-effect of taking other actions at either server or client. For example, the "runmail.jcl" document in line 8 of Figure 9.2 runs a script in a job control application on the client that starts the user's normal email application and brings its window to the front.

The interface of Figure 7.5 is achieved through level 1 HTML hypermedia functionality. However, such applications suggested the provision of embedded GUIs as a natural extension to HTML. The level 2 forms extension enhances the capability of HTML documents to act as GUIs by allowing other widgets such as buttons, check boxes, radio buttons, popup menus, scrolling lists, and text entry boxes to be embedded. Figure 9.3 shows the memo entry facility selected through the "Faculty Memos" icon in Figure 7.5. The user can type information into what appears to be a normal GUI dialog box, and submit that information to the server.

Figure 9.3 Memo entry in a university information system

Figure 9.4 shows the HTML specification for the form of Figure 9.3. The text boxes are simply specified, and the ACTION field gives the URL to which their contents will be sent using the POST facility of the HTTP protocol. When the "Submit" button is clicked the client sends the contents of the form to the server which passes them to the program "post-memo" which files them. Users may retrieve memos through an index document with hypertext links to available memos.

<FORM METHOD="POST" ACTION="http://pe.cpsc.ucalgary.ca/cgi-bin/post-memo">
<H2>Enter the following information and press the "submit" key</H2> <BR>
<STRONG>From: </STRONG>
<INPUT SIZE=20 NAME="username">
<STRONG>Urgent:</STRONG>
<INPUT TYPE=CHECKBOX NAME="urgent" VALUE="yes"> <BR>
<STRONG>To: </STRONG>
<INPUT TYPE=RADIO NAME="to" VALUE="All" CHECKED>
<STRONG>All</STRONG>
<INPUT TYPE=RADIO NAME="to" VALUE="Faculty">
<STRONG>Faculty</STRONG>
<INPUT TYPE=RADIO NAME="to" VALUE="Students">
<STRONG>Students</STRONG>
<BR> <STRONG>Subject: </STRONG>
<INPUT SIZE=40 NAME="subject"> <BR>
<STRONG>Memo Text: </STRONG><BR>
<TEXTAREA NAME="Content" cols=50 rows=6></textarea> <BR>
<STRONG>Attachments:</STRONG>
<INPUT SIZE=40 NAME="attachments"><BR>
<STRONG>Number of days to keep active: </STRONG>
<INPUT SIZE=4 NAME="duration"> <BR>
<INPUT TYPE=SUBMIT VALUE="Submit">
<INPUT TYPE=RESET VALUE="Clear Form">
</FORM>
Figure 9.4 Form generating user interface in Figure 9.3
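
The exchange described for Figure 9.4 can be sketched roughly as follows. This is an illustrative Python sketch only, since the actual "post-memo" program is not shown in this article. The function names encode_form and decode_form and the sample values are hypothetical, while the field names are those of Figure 9.4. It encodes the form contents as a client would for a POST, and decodes them as a gateway program might before filing the memo.

```python
from urllib.parse import urlencode, parse_qs

def encode_form(fields):
    """Encode form fields as an application/x-www-form-urlencoded body,
    the format a client sends when the Submit button is clicked."""
    return urlencode(fields)

def decode_form(body):
    """Decode a POST body into a dict of single values, as a gateway
    program such as post-memo might do before filing (an assumption)."""
    parsed = parse_qs(body, keep_blank_values=True)
    # Each field in Figure 9.4 is sent at most once, so keep the first value.
    return {name: values[0] for name, values in parsed.items()}

# Hypothetical sample values. The "urgent" checkbox is left unchecked,
# so the client does not send it at all.
memo = {
    "username": "A. Smith",
    "to": "Faculty",
    "subject": "Gym closure",
    "Content": "The gym is closed on Friday.",
    "attachments": "",
    "duration": "7",
}
body = encode_form(memo)
received = decode_form(body)
```

Note that only the selected radio button in the "to" group is transmitted, so the gateway program sees a single value for that field.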

From a human-computer interaction perspective, what is significant about Figures 7.5 and 9.3 is that they represent graphic user interfaces operating on Unix, Mac, PC and other platforms, giving access to functionality on a local or remote server operating on any platform supporting TCP/IP connections. Moreover, the user interface is programmed through a simple script that can be written in any text editor. It is this simplicity of development of interactive applications which has generated widespread interest in programming client-server applications through World-Wide Web in the short period since the inception of the forms capability.

9.3 Group Writing through HTML Forms

The examples above were chosen because they show the capabilities of HTML forms in the rapid development of conventional GUIs. However, the embedding of the user interface in high-quality documents offers the possibility of innovative applications in which the interface functionality emerges naturally from the application context. Figure 9.5 shows a screen from GroupWriteNet (GWN), a system for supporting collaboration in document production. GWN is an extension of the KSI group writing (Gaines and Malcolm, 1993) and active document (Gaines and Shaw, 1993) tools to operate across the Internet, and part of the development of a suite of tools supporting distributed scientific communities (Gaines and Shaw, 1994b, c).

GWN documents are stored in HTML format and can be accessed normally through WWW. A member of a registered group can request that a document belonging to that group be retrieved in annotatable form as shown in Figure 9.5. Each paragraph in the annotatable document is followed by a form allowing annotation to be entered and submitted to the server. Members of the group can then retrieve the document with each paragraph followed by hypertext links to the related annotation.

Figure 9.5 Annotation in collaborative writing

Figure 9.6 shows the HTML specification for the forms in Figure 9.5. Note the HIDDEN field in line 2 which contains an identification code for the annotator. The HTTP protocol is designed to be stateless so that the server does not in itself keep track of a sequence of related transactions with the same client. However, the required state information can be embedded in documents sent to the client so that when the server receives a request the associated service can determine the state of a transaction sequence. Since a client can save an HTML document and reopen it at a later time, this also allows transaction sequences to be suspended with the state information retained by the client. The fact that this is achieved through the normal local document storage mechanism also makes it a simple and natural activity for the user.

<FORM ACTION="ISC4.script" METHOD=POST>
<INPUT TYPE=HIDDEN NAME="Annotator" VALUE="BRGpGF9y">
<TEXTAREA NAME="ISC4.1.1-1" ROWS=3 COLS=60>
</TEXTAREA> <BR>
<INPUT TYPE=RESET VALUE="Clear Annotation">
<INPUT TYPE=SUBMIT VALUE="Send Annotation">
</FORM>
Figure 9.6 Annotation forms in Figure 9.5
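
The way state travels with the document rather than being held by the server can be illustrated with a minimal Python sketch. It is not the GWN implementation, which is not shown here. The function names render_annotation_form and read_annotation are hypothetical, and the assumption that each submission carries exactly one annotation field besides "Annotator" is taken from the form in Figure 9.6.

```python
from html import escape
from urllib.parse import parse_qs

def render_annotation_form(action, annotator, paragraph_id):
    """Return an HTML form whose HIDDEN field carries the annotator code,
    so the server need not remember it between requests."""
    return (
        f'<FORM ACTION="{escape(action)}" METHOD=POST>\n'
        f'<INPUT TYPE=HIDDEN NAME="Annotator" VALUE="{escape(annotator)}">\n'
        f'<TEXTAREA NAME="{escape(paragraph_id)}" ROWS=3 COLS=60>\n'
        f'</TEXTAREA> <BR>\n'
        f'<INPUT TYPE=RESET VALUE="Clear Annotation">\n'
        f'<INPUT TYPE=SUBMIT VALUE="Send Annotation">\n'
        f'</FORM>'
    )

def read_annotation(body):
    """Recover (annotator, paragraph_id, text) from a submitted form body.
    Everything needed to file the annotation arrives with the request."""
    fields = parse_qs(body, keep_blank_values=True)
    annotator = fields.pop("Annotator")[0]
    # Assumed: the one remaining field is named after the annotated paragraph.
    paragraph_id, values = next(iter(fields.items()))
    return annotator, paragraph_id, values[0]
```

Because the identification code is part of the saved document, a form reopened days later still submits the same state, which is what allows a transaction sequence to be suspended and resumed.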

9.4 Teaching Programming through HTML Forms

The level 2 HTML capabilities make WWW an excellent environment for developing client-server applications, and have triggered a wide range of innovative interactive systems. Figure 9.7 shows interaction with Ibrahim's (1994) system for teaching Pascal programming through the web. The student sees a Pascal program in a scrolling text area and can ask the server to run the program and report the results, step through it, run to a breakpoint, or show the values of selected variables and allow them to be changed. Breakpoints and variables to be displayed are defined by clicking on the appropriate program line.

Figure 9.7 Teaching Pascal over the web

9.5 Knowledge-Based Simulation through HTML Maps

Figure 9.8 shows Gruber and Gauthier's (1993) Device Modeling Environment being used in the simulation of a leak in the Space Shuttle's reaction control system.

Figure 9.8 Fault diagnosis over the web

When the user clicks on a component in the schematic at the bottom of Figure 9.8 the server returns a document giving information about its role in the device model linked to other information, such as its current state. This uses another HTML level 1 interface feature, the ISMAP attribute that designates an image as a 'map' which when clicked returns the position of the cursor in the image to the server. The Device Modeling Environment system is also interesting because it uses a distributed server based on agents communicating through the KQML protocol (Finin, Weber, Wiederhold, Genesereth, Fritzson, McKay, McGuire, Shapiro and Beck, 1992) to make inferences about the model.
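
How a server-side program might turn an ISMAP click into a component can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the Device Modeling Environment's own code. It relies on the ISMAP convention that the client appends the click position to the link URL as a query of the form "x,y". The function names, the rectangular regions, and the component names are all hypothetical.

```python
def parse_ismap_query(query):
    """Parse the 'x,y' query string a client appends for an ISMAP click."""
    x_text, y_text = query.split(",")
    return int(x_text), int(y_text)

def component_at(x, y, regions):
    """Return the name of the first rectangular region containing (x, y),
    or None when the click falls outside every region."""
    for name, (left, top, right, bottom) in regions.items():
        if left <= x <= right and top <= y <= bottom:
            return name
    return None

# Hypothetical layout of a schematic image, in pixel coordinates
# (left, top, right, bottom).
REGIONS = {
    "tank": (10, 10, 110, 60),
    "valve": (130, 20, 170, 50),
}

x, y = parse_ismap_query("140,35")
clicked = component_at(x, y, REGIONS)
```

The server would then return the document associated with the selected component. A real schematic would probably need non-rectangular regions, which this sketch does not attempt.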

9.6 Concept Mapping through HTML Maps

Maps allow the web user interface to be extended with additional widgets provided a single click interaction is adequate. For example, Figure 9.11 shows KSI's KMapServer using KMap (Gaines and Shaw, 1994a), the concept mapping tool shown indexing the GNOSIS CD-ROM in Figure 5.1, to provide access to a multimedia database. The difference between the applications is that in the CD-ROM situation KMap is running on the user's local machine whereas in the World-Wide Web situation it is running on the remote server. The server uses KMap to determine in which concept the user has clicked, treating the click as if it had occurred in a server window holding the map. This allows any concept map developed in KMap to be delivered as an interactive widget on World-Wide Web.

Attaching the concept mapping tool as a helper to the client allows local use of its full interactive functionality. Attaching it through a gateway to the server allows significant parts of its functionality to be offered through any client. Both are useful system architectures, and determining where to place such functionality is one of the interesting decisions to be made in designing distributed client-server applications.

Figure 9.11 Concept map server

9.7 Implementing MUDs through HTML Forms and Maps

The Multi-User Dungeons described in Section 6.5 can also be implemented using HTML forms for user interaction, and this opens up interesting possibilities for collaborative activities on the Internet. Figure 9.12 shows the initial screen from David Blair's MUD providing access to a large archive of multimedia material derived from his film, WAX. Users can register as developers and use forms to add material to the already massive WAX environment which contains more than 900 pages of hypertext in English, French and Japanese, together with 1,500 photographs, 500 video clips and 2,000 sound clips.

The multilingual, multimedia, interactive facilities of WAX illustrate what has become possible in the support of special-interest communities through MUD-like systems on World-Wide Web (Rossello, 1994).

Figure 9.12 A MUD operating through World-Wide Web

9.8 Telepresence through HTML Forms and Maps

Servers can interact with the real world, for example to support telepresence at a remote site. Figure 9.13 shows the Mercury Project (Goldberg and Mascha, 1994) developed by the Departments of Anthropology and Computer Science at USC in which users tele-operate a robot arm moving over a terrain filled with buried artifacts. A CCD camera and pneumatic nozzle mounted on the robot allow users to select viewpoints and to direct short bursts of compressed air into the terrain. Thus users can "excavate" regions within the sand by positioning the arm, delivering a burst of air, and viewing the newly cleared region. The project seeks a coherent theory that explains the buried artifacts.

The capability to support telepresence, where the user interacts with real systems in a remote situation, opens up many innovative applications of World-Wide Web. It changes the nature of the web from one of supporting discourse between people to one of supporting action in the real world. Experiments involving specialist equipment can take place at one site and be controlled and monitored at another. New forms of research collaboration can be explored.

Figure 9.13 Tele-operation of a robot arm

9.9 Developments in World-Wide Web Technology

World-Wide Web technology is evolving rapidly. Impediments to effective interaction are being removed as they are noted. In analyzing existing limitations one must distinguish between current constraints, those that will change in the near future, and those that are comparatively long-term. For example, lack of incremental updating of the screen will disappear in a short time as clients become more intelligent, lack of down-loadable interactive widgets will disappear as a standard is developed, and delays in communication between client and server will decrease as the information highway becomes a reality. However, there may always be users with minimally-featured clients and lower speed connections, so taking such limitations into account will remain significant for many years.

The technology involves four main areas of human-computer interaction:

The client-server computing issues most strongly affecting the usability of WWW are:

While it is good design practice to factor out the human interface from the application functionality, communication delays on the web make it impossible to implement what has become standard practice in local application design. Designers have become used to programming the state of the interface to change as the user interacts with it so that its affordances accurately track meaningful user choices. They have also become used to providing rapid context-sensitive help that does not disturb the interaction. Neither of these is possible within the HTML level 2 specification, although the SGML tagging could allow an indefinite amount of user interface programming to be communicated to the client. Raggett (1994) is addressing this in the HTML level 3 specification through the provision of a script language for controlling the dynamic features of the interface.

In future developments it would be simple for clients to support script languages of the power of Tcl and Tk (Ousterhout, 1994), which would allow interactivity to be programmed equivalent to that of local operation. In particular this would allow widgets to be programmed offering the full variety of click and drag interfaces. There are security problems in clients supporting scripts with full access to file and operating system facilities, but these can be overcome through the use of trusted interpreters with well-defined restrictions such as those proposed for General Magic's Telescript (Knaster, 1994).

The stateless transaction protocol adds an interesting design factor to World-Wide Web systems in that state information is normally associated with the computations taking place at the server rather than with the user interaction taking place at the client. However, one soon becomes used to passing state information back to a client, and making provision to store it there through operations that are natural to the user. This would be aided if clients made better provision for local action to be taken as a result of user interaction with a document. Essentially, one needs to be able to post forms to a local mini-server that can interact directly with the client. This is becoming easier as clients come to support scripting through inter-application protocols.
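
The idea of posting forms to a local mini-server can be sketched with Python's standard http.server module. This is a hedged illustration of the architecture only, not a description of any existing client. The names LOCAL_STATE, handle_local_form, LocalFormHandler and serve_locally, and the port number, are hypothetical.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# State kept on the user's own machine rather than at the remote server.
LOCAL_STATE = {}

def handle_local_form(body, state):
    """Store each submitted field in local state and return an HTML reply
    listing the names of everything stored so far."""
    for name, values in parse_qs(body, keep_blank_values=True).items():
        state[name] = values[0]
    items = "".join(f"<LI>{escape(name)}</LI>" for name in sorted(state))
    return f"<TITLE>Saved locally</TITLE><UL>{items}</UL>"

class LocalFormHandler(BaseHTTPRequestHandler):
    """Accept form POSTs addressed to the local machine."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", "0"))
        body = self.rfile.read(length).decode("utf-8")
        reply = handle_local_form(body, LOCAL_STATE).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

def serve_locally(port=8080):
    """Run the mini-server on the loopback interface only (blocks)."""
    HTTPServer(("127.0.0.1", port), LocalFormHandler).serve_forever()
```

A form whose ACTION points at the loopback address would then be handled entirely on the user's machine, with the reply document displayed by the client in the usual way.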

Another addition to client functionality that would greatly improve interactivity is the support of incremental updating of the screen. Much could be achieved by the normal flicker reduction technique of writing new data into an offscreen bitmap and refreshing the complete screen. World-Wide Web was designed to retrieve a succession of different items of information, but it is being used increasingly in applications such as those illustrated in Figures 9.11 and 9.13 where the user envisions a screen of changing information, not a succession of different documents.
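
The flicker reduction technique mentioned above can be shown in miniature. The sketch below is a simplified Python model that uses a grid of characters in place of a bitmap. The class name DoubleBufferedScreen and its methods are hypothetical and do not correspond to any particular client.

```python
class DoubleBufferedScreen:
    """A text 'screen' with a visible buffer and an offscreen buffer."""

    def __init__(self, width, height, fill=" "):
        self.visible = [[fill] * width for _ in range(height)]
        self.offscreen = [row[:] for row in self.visible]

    def draw(self, x, y, text):
        """Write text into the offscreen buffer only; nothing visible changes."""
        for offset, char in enumerate(text):
            self.offscreen[y][x + offset] = char

    def refresh(self):
        """Copy the complete offscreen buffer to the visible one in one step."""
        self.visible = [row[:] for row in self.offscreen]

    def show(self):
        """Return the visible buffer as newline-separated rows."""
        return "\n".join("".join(row) for row in self.visible)

screen = DoubleBufferedScreen(5, 2)
screen.draw(0, 0, "temp")
before = screen.show()   # still blank: the new data is offscreen
screen.refresh()
after = screen.show()    # the whole screen changes at once
```

The user never sees a partially drawn document, only the transition from one complete screen to the next.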

