Supporting Collaboration through Multimedia Digital Document Archives

6 Collaboration through the Internet

The interconnection of large numbers of computer networks across the world to form the Internet has provided a communication medium supporting many forms of discourse in geographically dispersed communities. Since digital networks are content-neutral, they provide the means for disseminating multimedia of arbitrary type in a wide variety of formats. Since the networks provide access to digital storage resources, they provide the means for establishing on-line archives of multimedia material. Since the networks operate in real-time, they provide the means for a wide variety of forms of interaction. Since the networks also provide access to computing resources, they provide the means for offering a wide variety of computational services from those for indexing and information retrieval, through those for data analysis, to those for intelligent agents.

The growth of public interest in the Internet has given rise to a proliferation of guides to the net and its services. Krol's (1992) Whole Internet is a good starting point for those setting up services for a professional community. The following sub-sections give an overview of Internet facilities relevant to supporting collaboration.

6.1 Discourse on the Net--Electronic Mail

The foundation for many other services on the Internet is the capability to send electronic mail (email) from one individual to another. This is possible because each machine on the net has a unique address, and each person with an account on that machine has a unique name on that machine. Thus, point to point transmission can be supported in which a digital message is sent to a specific individual and placed in their digital mailbox. Because the message is digital it can comprise not only text but also multimedia attachments such as pictures and sounds. Because the addressee is a computer file it is possible for the effective addressee to be a computer program that takes action based on the mail, thus providing a basis for many services.
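The addressing scheme and the idea of a digital message carrying attachments can be illustrated with a minimal sketch using Python's standard email package. The addresses, the file name and the helper names `build_message` and `split_address` are hypothetical and chosen only for illustration; they are not part of any system described here.

```python
from email.message import EmailMessage


def split_address(address):
    """Split 'name@machine' into the account name and the machine address."""
    name, _, machine = address.partition("@")
    return name, machine


def build_message(sender, recipient, subject, body, attachment=None):
    """Compose a text message, optionally carrying one binary attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    if attachment is not None:
        filename, data = attachment
        # Arbitrary bytes (a picture, a sound) travel as a separate MIME part.
        msg.add_attachment(data, maintype="application",
                           subtype="octet-stream", filename=filename)
    return msg


# Hypothetical addresses and attachment, for illustration only.
msg = build_message("alice@host-a.example", "bob@host-b.example",
                    "Meeting notes", "Notes attached.",
                    ("diagram.bin", b"\x00\x01\x02"))
```

Because the message is a structured object, a program acting as the addressee could inspect `msg["Subject"]` or iterate over `msg.iter_attachments()` and take action on what it finds.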

Mail readers provide a user interface to the mail system and a wide range of facilities for sorting, filing, indexing and filtering mail. The serious use of email requires the use of a mail reader with such facilities if a user is not to become overloaded in managing their mail. Figure 6.1 shows POPmail on the Macintosh being used to access some person-to-person mail relating to GNOSIS communications. At the top is a set of function icons for replying to mail, saving it in a separate file, fetching it from the server to the local machine, printing it, initiating a new item, recycling it to one of a set of user-named archives, and returning to the current message browser. Beneath these is a scrolling list of messages indexed by source, subject and date, and sortable on any one of those keys. The selected message is shown in the scrolling window below. It consists of a from, date, to and subject header, followed by the message. The indented portion of the message was automatically inserted by the "Reply" function so that some portion of the message sent can be used to provide a context for the reply. The three `signature' lines at the bottom are also routinely inserted.

POPmail provides facilities for searching mail by content, for archiving it under user control, and automatically unpacking other files sent as attachments to the mail. More complex mailers such as Eudora also provide facilities for automatically filtering mail to different archives by source, subject line or content.
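Filtering mail to archives by source, subject line or content might be sketched as follows. This is an illustrative rule table only, not the filtering mechanism of Eudora or any other mailer; the `Mail` record, the rules and the archive names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Mail:
    source: str
    subject: str
    body: str


# Hypothetical rules: (field to inspect, text to look for, archive to file under).
RULES = [
    ("source", "@ipa.fhg.de", "gnosis"),
    ("subject", "conference", "conferences"),
    ("body", "ontolingua", "ontologies"),
]


def choose_archive(mail, rules=RULES, default="inbox"):
    """Return the archive named by the first matching rule, else the default."""
    for field, needle, archive in rules:
        if needle.lower() in getattr(mail, field).lower():
            return archive
    return default
```

The first matching rule wins, so the order of the rules expresses the user's priorities.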

Figure 6.1 Accessing mail related to GNOSIS archived through a mail reader

6.2 Discourse on the Net--News

The News facility on the Internet provides a parallel service to email in which messages are not sent to an individual but to a community, again with a unique name but this time specifying a news group. Messages sent to a news group are distributed to all sites on the net that maintain news archives. Users at each site access these archives to read news items in much the same way as they access their mail boxes to read email. Multimedia attachments can be incorporated in news messages but this is not normally done since the volume of news is already a burden on networking resources--it is better to provide an address from which such material may be fetched using the file transfer protocol. Because of the volume of news and the way that it is moved from site to site, it may take several days for a news message to reach a site remote from the sender, which limits its utility in supporting spontaneous discourse.

Figure 6.2 Accessing News groups through a news reader

The volume of news is very high with many groups receiving several hundred messages a day, and it is again important to use a reader that provides an excellent user interface for managing access to news. Figure 6.2 shows NewsWatcher on the Macintosh being used to access an item in the group comp.ai. The window at the top left lists the number of items unread in a list of user-specified news groups that interest the user. The group comp.ai has 47 items and when it is double clicked the window to the right appears giving a list of those items by source and topic, grouped by the threads of initial item and related replies. When the item posted by Dickson Lukose is double clicked it opens in the window at the bottom of the Figure, giving information on a conference relevant to GNOSIS's knowledge systematization theme.

It is important to use a news reader that gives easy access to the news tracks through simple browsing mechanisms that support selective browsing by relevant groups and by content. Internet News is a valuable resource but the sheer volume of news can be overwhelming if the user interface is inadequate. Good news readers provide simple mechanisms for printing and filing interesting news items similar to those of a mail reader.

6.3 Discourse on the Net--List Servers

Email supports private individual discourse, and News supports public broadcasting to major communities at large. List servers provide an intermediate service supporting selective dissemination of private information to specific communities. The principle is simple--an email message is sent to an address which is not that of a person but rather that of a list server which resends the mail to a list of email addresses maintained on the list server machine. To the person sending the message it presents a convenient way of sending mail to all members of a specific community. However, many benefits accrue from the use of a list server: Since list servers use electronic mail, access to them is through the same mail readers as used for person-to-person email. Figure 6.3 shows POPmail being used to access mail sent by one partner to all partners through the list server. The only difference from the person-to-person mail of Figure 6.1 is that the "To" field in the mail itself specifies that it is to "gnosis-all@IPA.FhG.de" which is the address of a list server at the Fraunhofer Institute in Germany. The list server resends the mail to all those GNOSIS partners on its list.
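The resending principle can be shown in a few lines. The sketch below is a deliberately naive illustration with a hypothetical list address and membership; it omits everything that distinguishes a well-designed list server from a simple mail exploder.

```python
# Hypothetical list address and membership, held on the list server machine.
LISTS = {
    "project-all@lists.example": [
        "ann@site-a.example",
        "ben@site-b.example",
        "cho@site-c.example",
    ],
}


def expand(message, lists=LISTS):
    """Return one (delivery address, message) pair per copy to be sent.

    Mail addressed to a known list is resent to every member; the "To"
    header is left untouched, so each copy still names the list.
    """
    members = lists.get(message["To"].lower())
    if members is None:
        # Ordinary person-to-person mail: a single delivery.
        return [(message["To"], message)]
    return [(member, message) for member in members]


deliveries = expand({"From": "ann@site-a.example",
                     "To": "project-all@lists.example",
                     "Subject": "Draft agenda",
                     "Body": "Comments welcome."})
```

Each recipient sees the list address in the "To" field, which is the only visible difference from person-to-person mail.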

Setting up a list server has proved to be a very effective way of supporting collaboration in geographically dispersed communities through the Internet. However, while the principle of resending mail to a list is very simple, there are many pitfalls for the unwary, and it is important to use a well-designed list server of which many are now available. A good list server incorporates techniques to:

Figure 6.3 Accessing mail through a GNOSIS list server

6.4 Real-Time Discourse on the Net--Internet Relay Chat

The speed of Internet communications between many sites is such that electronic mail reaches its destination within a minute or so of being sent, and hence some degree of real-time, `conversational' interaction is supported. In some situations, an even more rapid `conference' mode of interaction is desirable, and this is supported through a number of chat facilities.

Internet relay chat (IRC) is the major system supporting conferencing world-wide (Pioch, 1993). A conference participant runs an IRC client program on his or her workstation which communicates with one of the major IRC relay sites that coordinate communication across IRC world-wide. Anyone can set up an IRC conference by defining a named IRC `channel', and specifying the terms of membership to be public or private to an invited group. One joins the conference through a simple command, sends messages to it a line at a time, and sees these lines from all participants appear in a log screen shortly after they are sent.

Figure 6.4 shows IRCLE, a Macintosh IRC client, connected to the #Macintosh conference on IRC which is used by Macintosh users on IRC to discuss problems, new software, and so on. Each user specifies a nickname which appears in angle brackets at the beginning of their messages to identify the source. As shown in the top window, the order of messages is chronological and multiple simultaneous conversations become intertwined, which can be confusing but supports spontaneous interactions in a way that a more formal protocol would not. The small window at the bottom is where the local user types messages and commands.
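The channel model described above (named channels, public or invitation-only membership, messages sent a line at a time and echoed to every participant's log) can be sketched in memory as follows. This is an illustration of the idea only and does not implement the IRC network protocol; the class and the nicknames are hypothetical.

```python
class Channel:
    """An in-memory stand-in for a named conference channel."""

    def __init__(self, name, invited=None):
        self.name = name
        # None means the channel is public; otherwise only invited nicknames may join.
        self.invited = None if invited is None else set(invited)
        self.members = {}  # nickname -> that participant's log of lines

    def join(self, nickname):
        """Add a participant; return False if the channel is private and they are not invited."""
        if self.invited is not None and nickname not in self.invited:
            return False
        self.members.setdefault(nickname, [])
        return True

    def say(self, nickname, line):
        """Append one line, tagged with the sender's nickname, to every member's log."""
        if nickname not in self.members:
            raise ValueError(f"{nickname} has not joined {self.name}")
        entry = f"<{nickname}> {line}"
        for log in self.members.values():
            log.append(entry)


channel = Channel("#project")
channel.join("ann")
channel.join("ben")
channel.say("ann", "has anyone tried the new release?")
channel.say("ben", "yes, it works here")
```

Lines are appended strictly in order of arrival, which is why simultaneous conversations interleave in each participant's log.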

IRC discourse can be logged by any user, and there are famous logs available on the net relating to Gorbachev's disappearance, operation Desert Storm, and so on. There are also interesting sociological studies of the cultures on IRC, behaviors of participants, and so on (Reid, 1991).

Figure 6.4 Real time discourse through Internet Relay Chat

6.5 Real-Time Discourse on the Net--MUDs

Multi-user dungeons (MUDs) are, as the name suggests, environments for playing fantasy games originating from Dungeons and Dragons. However, the technology is increasingly being used to support a variety of professional community activities (Curtis, 1993). The basic MUD concept is to provide a spatial metaphor for the discourse--the MUD equivalent of an IRC channel is a `room'. The spatial metaphor extends naturally from participants to objects, and this provides a natural interface to other Internet services. A `library' might contain a set of `books' that can be `read' by a participant. Asking what is in the room corresponds to requesting a directory listing. Asking to read a book corresponds to opening the corresponding file in a text reader. The MUD provides a conceptual interface to Internet services which is natural in terms of everyday experience. There are again interesting sociological studies of cultural formations in MUDs (Reid, 1994).
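The correspondence between the spatial metaphor and file operations can be made concrete with a small sketch in which a `room' is backed by a directory and its `books' are the text files inside it. The `Room` class and the file names are hypothetical; no particular MUD server works this way as far as this sketch assumes.

```python
import tempfile
from pathlib import Path


class Room:
    """A 'room' backed by a directory; its 'books' are the files it contains."""

    def __init__(self, directory):
        self.directory = Path(directory)

    def look(self):
        """Asking what is in the room: a directory listing."""
        return sorted(p.name for p in self.directory.iterdir() if p.is_file())

    def read(self, book):
        """Asking to read a book: opening the corresponding file as text."""
        return (self.directory / book).read_text(encoding="utf-8")


def demo():
    """Build a temporary 'library' with two 'books' and visit it."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "report.txt").write_text("Final report text", encoding="utf-8")
        (Path(tmp) / "minutes.txt").write_text("Meeting minutes", encoding="utf-8")
        library = Room(tmp)
        return library.look(), library.read("report.txt")
```

Calling `demo()` returns the listing of the library and the text of one book, the two operations named in the paragraph above.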

Astro-VR is a MUD for use by the international astronomical community:

"The system is intended to provide a place for working astronomers to talk with one another, give short presentations, and otherwise collaborate on astronomical research. In most cases, this system will provide the only available means for active collaboration at a level beyond electronic mail and telephones. Initially, Astro-VR will provide the following facilities of interest to our user community:

A MUD may be used as a convenient conferencing system by a collaborative group, such as a system support team:

"We have found that the MUD is an effective way to hold pre-arranged meetings for people who can't be in the same physical location. We save a transcript of the meeting and email it to people who weren't present. It's common for us to have a five-minute conversation on the MUD about a small systems issue. Previously, these conversations would have happened through slower email, through office visits, or at regular systems meetings. All of these mechanisms are more cumbersome, and would have happened much less frequently. Thus the MUD has enabled new communications patterns." (Evard, 1993)

Because of their association with game playing, both IRC and MUDs are often neglected as resources for the support of professional communities. However, MUDs in particular offer an attractive way of managing access to Internet resources through a natural spatial metaphor, and they, or their derivatives, can be expected to play an increasingly significant role in the future. The main limitation currently is the text-based interface, and this is being overcome by the development of MUDs on the World-Wide Web using the interactive color graphic interfaces available to the web.

6.6 Multimedia Archives on the Net--File Transfer Protocol

The availability of the Internet File Transfer Protocol (FTP) has been the major basis for supporting multimedia archives on the net. The protocol allows a user at one machine to request that files be transferred to or from any other machine on the Internet. To provide normal security a user must supply an account and a password to the remote machine corresponding to the capability to log in to that machine. However, it has become conventional to establish an `anonymous login' account, whose account name is `anonymous' and whose password is symbolic, with access to a well-defined sub-file system that is publicly accessible. The symbolic password requested is generally the email address of the user, which can be used to monitor access. Some systems check that a syntactically correct email address is given, others that the domain corresponds to that of the calling machine, while others perform no checks at all. It is generally accepted that anonymous FTP access is open to all. The only controls are to restrict the number of simultaneous anonymous accesses allowed at popular sites whose Internet communications might otherwise become overloaded.
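The three levels of password checking mentioned above can be sketched as a server-side policy function. The policy names (`none`, `syntax`, `domain`), the function name and the simple address pattern are assumptions made for illustration; actual FTP servers differ in how, and whether, they apply such checks.

```python
import re

# A deliberately loose pattern: something@something.something, no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def accept_anonymous(user, password, caller_domain, policy="none"):
    """Decide whether an anonymous login is accepted under a given policy.

    policy "none"   : any password is accepted
    policy "syntax" : the password must look like an email address
    policy "domain" : as "syntax", and the address's domain must match
                      the domain of the calling machine
    """
    if user != "anonymous":
        return False  # ordinary accounts are authenticated elsewhere
    if policy == "none":
        return True
    if not EMAIL_RE.match(password):
        return False
    if policy == "syntax":
        return True
    if policy == "domain":
        domain = password.rsplit("@", 1)[1].lower()
        caller = caller_domain.lower()
        return caller == domain or caller.endswith("." + domain)
    raise ValueError(f"unknown policy: {policy}")
```

For example, under the `domain` policy a password of `ann@site.example` is accepted from `ws1.site.example` but refused from `other.example`.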

The Unix file transfer protocol is simple and it is supported by many different tools. For example, an email or news tool may support anonymous FTP access to files specified within email or news items. A good FTP tool will provide a user interface to remote files similar to file directory access on the local machine, a user-definable directory of commonly addressed sites, and facilities for unpacking files sent in one of the common compressed formats. Figure 6.5 shows Fetch on the Macintosh being used to access a sub-directory of the GNOSIS archives containing transcripts of presentations given by officials of the US Department of Commerce at an IMS meeting in Dallas. The directory of files on the remote machine on the left looks like a local Macintosh directory, file transfer is effected through simple "Put" and "Get" buttons, and the status of a transfer in progress is shown on the right.

In many scholarly communities large digital archives have already been established which fulfill the role previously filled by the circulation of paper preprints (Anderson, 1991). The high-energy physics archive contains over 10,000 papers and has become the major source of recent results for the associated research community. Archives exist across all disciplines including poetry, philosophy, sociology, economics and mathematics. As well as providing easy circulation of preprints, they have also been the basis of successful `electronic journals' that are not published in paper form (Gaines, 1993; Manitoba, 1993).

Figure 6.5 Access to document archives through the File Transfer Protocol

6.7 Indexing Archives on the Net--Archie

The growth of digital publication and archiving on the Internet has been such that finding specific material is a major problem. There is no directory for all machines on the Internet, but it is common for individual disciplines to attempt to keep track of all the anonymous FTP archives relevant to that discipline. Such lists are generally published through the relevant news groups or list servers. There are also tools that attempt to index material on the net in a variety of ways.

Archie is a system for indexing material at anonymous FTP sites and allowing users to search for specific documents by the content of their titles. There are a number of Archie servers world-wide that index different collections of machines. They may be accessed through a variety of tools which usually provide not only search facilities but also the capability to transfer and decompress files once their location is found. Figure 6.6 shows Anarchie on the Macintosh being used to search for files on STEP, and to access files on Ontolingua. Anarchie provides similar FTP facilities to Fetch, and the window at the top is a user-defined list of commonly accessed FTP sites which may be accessed directly without search. The two windows below it are ongoing searches on two different servers for files whose name contains the string "STEP". The bottom window is the result of a previous search for files whose name contains the string "ontolingua". A directory and two files have been found at Stanford. An FTP transfer may be initiated by double-clicking on one of the file names.
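The kind of name-based search described above can be sketched as a substring match over an index of file names gathered from several sites. The site names, paths and function names below are hypothetical, and the sketch says nothing about how a real Archie server builds or stores its index.

```python
def build_index(listings):
    """Turn {site: [path, ...]} into a flat list of (lowercased name, site, path)."""
    index = []
    for site, paths in listings.items():
        for path in paths:
            # The name is the last path component; directories end in "/".
            name = path.rstrip("/").rsplit("/", 1)[-1]
            index.append((name.lower(), site, path))
    return index


def search(index, text):
    """Return (site, path) for every entry whose name contains the text, ignoring case."""
    needle = text.lower()
    return sorted((site, path) for name, site, path in index if needle in name)


# Hypothetical anonymous FTP listings, for illustration only.
LISTINGS = {
    "ftp.site-a.example": ["/pub/kb/ontolingua/",
                           "/pub/kb/ontolingua.tar.Z",
                           "/pub/readme.txt"],
    "ftp.site-b.example": ["/standards/step-overview.ps"],
}
INDEX = build_index(LISTINGS)
```

With this index, `search(INDEX, "ontolingua")` finds a directory and a file at one site, and `search(INDEX, "STEP")` finds the single matching file at the other. Only names are matched, never file contents.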

Figure 6.6 Searching document archives using Archie

6.8 Indexing Archives on the Net--Gopher

Gopher is a system for indexing material on the Internet not by name but by cataloguing it in arbitrary hierarchical indices. It was originally developed at the University of Minnesota to provide a campus-wide information system (CWIS), and such use is common across universities in North America. However, the Gopher cataloguing scheme proved so simple and yet so powerful that it has been widely adopted for indexing material on the Internet world-wide. The power of Gopher comes from the structure of its catalogs, which are documents whose entries each contain the title of a document, the type of document, the Internet address of the machine on which it is held, and a file path to the document on that machine. Document types include text, various graphic formats, PostScript, and Gopher catalog documents--this last providing the basis of hierarchical indices with parts scattered over many machines world-wide. Thus a researcher in France might maintain a Gopher catalog of manufacturing research reports from French and German research agencies. A researcher in Japan might set up an index to manufacturing research reports world-wide and include those in France and Germany by simply including a pointer to the French researcher's Gopher catalog.
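The catalog structure, and the way one catalog can include another by pointing to it, can be sketched as follows. The entries mirror the four fields named above, but the host names, paths and titles are hypothetical, and the sketch models catalogs in memory rather than reproducing the Gopher wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    title: str
    kind: str   # e.g. "text", "postscript", or "catalog"
    host: str   # machine on which the item is held
    path: str   # file path to the item on that machine


# Hypothetical catalogs, keyed by the (host, path) at which each is held.
CATALOGS = {
    ("gopher.fr.example", "/manufacturing"): [
        Entry("French agency report", "text", "gopher.fr.example", "/reports/fr1.txt"),
        Entry("German agency report", "postscript", "gopher.de.example", "/reports/de1.ps"),
    ],
    ("gopher.jp.example", "/world"): [
        Entry("Japanese report", "text", "gopher.jp.example", "/reports/jp1.txt"),
        # A pointer to the French catalog, not a copy of its contents.
        Entry("France and Germany", "catalog", "gopher.fr.example", "/manufacturing"),
    ],
}


def documents(host, path, catalogs=CATALOGS, seen=None):
    """List every non-catalog entry reachable from the catalog at (host, path)."""
    seen = set() if seen is None else seen
    if (host, path) in seen:
        return []  # guard against catalogs that point back at each other
    seen.add((host, path))
    found = []
    for entry in catalogs.get((host, path), []):
        if entry.kind == "catalog":
            found.extend(documents(entry.host, entry.path, catalogs, seen))
        else:
            found.append(entry)
    return found
```

Because the Japanese catalog holds only a pointer, any report later added to the French catalog becomes visible through it without further maintenance.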

Figure 6.7 shows TurboGopher on the Macintosh being used to access Internet archives. The window shown is the top level one at the University of Minnesota, and the origins as a CWIS are apparent. However, by double-clicking on the catalog item "Other Gopher and Information Services" one opens a second-level catalog of areas of the world that have Gopher sites. Double-clicking on an area opens a third-level catalog of sites, and so on until one reaches files which are fetched. Gopher has an interface to Archie which can be used to search as described above. The Gopher network is itself indexed by Veronica, which can be accessed through Gopher to search for named items in Gopher space in the same way as Archie does for FTP space.

Figure 6.7 Accessing document archives through Gopher

6.9 HyperMedia Archives on the Net--World-Wide Web

It will have been noted that the tools described have been cumulative in their functionality--Archie supports FTP; Gopher supports FTP, Archie and Veronica. The various Internet protocols and facilities are not competitive but complementary and cumulative. The ultimate cumulative system currently is the World-Wide Web which subsumes all the protocols so far described, and adds to them hypertext links between active documents that support general client-server computing through graphic user interfaces.

World-Wide Web was conceived by Berners-Lee in March 1989 (CERN, 1994) as a "hypertext project" to organize documents at CERN in an information retrieval system (Berners-Lee and Cailliau, 1990). The design involved: a simple hypertext markup language that authors could enter through a word processor; distributed servers running on machines anywhere on the network; and access through any terminal, even line mode browsers. World-Wide Web today still conforms to this basic model.

A poster and demonstration at HT91 in December 1991 announced World-Wide Web to the computing community. However, major usage only began to grow with the February 1993 release of Andreessen's Mosaic for X (Andreessen, 1993). Whereas the original proposal specifically states it will not aim to "do research into fancy multimedia facilities such as sound and video" (Berners-Lee and Cailliau, 1990), the HTTP protocol for document transmission was designed to be content neutral and is as well-suited to multimedia material as to text. The availability of the rich X-Windows environment on workstations supporting color graphics and sound led naturally to multimedia support, although the initial objective of meaningful access through any terminal was retained. Most web material can still be browsed effectively through a line mode browser.

Figure 6.8 shows MacWeb on the Macintosh providing access to the GNOSIS archives encoded as a hypermedia document collection. The GNOSIS final report has been fetched from a remote server across the Internet. It appears on the screen, and can be printed, with the typography, layout, and embedded colored diagrams expected of a high-quality document processor. It supports embedded hypertext links which can be used to access other documents across the net. For example, the GNOSIS logo near the top of the page in Figure 6.8 is an embedded picture. The underlined term "Section 8" at the end of line 3 of the Overview is a hypertext link. Clicking on the underlined term causes the document referenced to be fetched, in this case Section 8 of the report which itself has hypertext links to the other GNOSIS technical reports.
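The distinction between an embedded picture and a hypertext link can be seen by scanning a page of hypertext markup for the two kinds of tag. The sketch below uses the HTML parser in Python's standard library; the page fragment and its file names are hypothetical and are not taken from the GNOSIS archives.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the targets of hypertext links and the sources of embedded images."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])     # fetched when the reader clicks
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])     # fetched and shown in the page


# A hypothetical page fragment, for illustration only.
PAGE = """<h1>Final Report</h1>
<img src="logo.gif">
<p>Overview of the project; see <a href="section8.html">Section 8</a>.</p>"""


def collect(page):
    """Return (link targets, image sources) found in the page."""
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links, parser.images
```

A browser treats the two lists differently: images are fetched at once and laid out in the document, whereas a link target is fetched only when the reader selects the underlined text.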

Figure 6.8 Access to hypertext multimedia document archives through World-Wide Web

One can now perceive an evolutionary sequence: FTP can fetch documents across the net for viewing in other applications; Gopher can fetch and display document catalogs and simple textual documents; World-Wide Web can fetch and display documents with typography, embedded images, and embedded hypertext links to other documents. Also the functionality is cumulative in that World-Wide Web can FTP a document for another application and display Gopher catalogs. Figure 6.9 shows the Gopher catalog of Figure 6.7 being accessed through World-Wide Web hypertext links. And World-Wide Web is itself evolving to include general two-way interaction through active documents providing a client-server computing environment on the Internet (Section 9).

Figure 6.9 Access to Gopher through World-Wide Web

6.10 Growth of Discourse on the Net

This cumulative evolution of services has led to massive growth in the use of the Internet to support access to information. Figure 6.10 shows a plot of the data transfer statistics for Gopher and World-Wide Web collected by Merit NIC Services from the NSFNET backbone traffic. The exponential growth of World-Wide Web usage in the past 18 months is apparent, with Gopher showing similar growth a year ago but increasing at a lower rate as the web became widely accessible. Figure 6.11 includes the FTP transfers also. These dominate in volume because they are typically used to transfer large PostScript and data files. This difference in usage makes the significance of World-Wide Web growth even more apparent--it is primarily responsible for the growth of commercial and government interest in the potential of the information highway.

Figure 6.10 Growth of data transferred through Gopher and World-Wide Web

Figure 6.11 Growth of data transferred through FTP, Gopher and World-Wide Web

The statistical plots of Figures 6.10 and 6.11 were themselves obtained through World-Wide Web from a server that James Pitkow has developed at the Graphics and Usability Center of Georgia Institute of Technology (http://www.cc.gatech.edu/gvu/stats/NSF/merit.htm).

