Supporting Collaboration through Multimedia Digital Document Archives

5 Archiving a Research Program on CD-ROM

At the end of the first year of the GNOSIS project, the project reports, slide sequences, photographs, movies, and so on, were issued on a CD-ROM. To enable widespread access to the reports, a hybrid format was used that could be read on Macintosh, Windows and Unix platforms, and all the reports were issued in Microsoft Word, Farallon Replica and Adobe PostScript formats. The total volume of material on the CD-ROM was: 57 reports totaling 1590 pages, each in 3 formats; 11 movies totaling 70 minutes; plus software and maps of the material. Three test masters were made of the CD-ROM using a Philips CDD521 recorder and OMI QuickTOPiX software. The third was sent to a CD pressing company that produced 500 copies in 10 days at a cost of $1500. The second test master was very close to the final one and was sent at the same time to a Japanese partner in GNOSIS who was scheduled to demonstrate the GNOSIS materials at a major IMS conference shortly after the production versions would be available.

The combination of high-speed production of one-off CD-ROMs and low-cost production of quantity CD-ROMs provides an extremely powerful and flexible technology for the dissemination of technical material. In particular, the space required to archive CD-ROMs is very much less than that of the equivalent paper documents, and the cost of mailing them is also very much lower. Additionally, the CD-ROM can contain colored diagrams, photographs, sound movies, simulations and other software, all of which would be very much more expensive to issue in other formats.

5.1 Read-Only and Speed Constraints

The various technologies for digital document and CD-ROM production are now mature enough to be highly significant in the issue of multimedia materials. However, they are changing very rapidly and it is probably useful to go through the stages of production, the decisions made and technologies used at each stage, the details of what has already changed and the forecasts of what is expected in the near future.

The CD-ROM technology in itself is extremely simple. A CD-ROM for the Macintosh is a bit-by-bit image of a normal magnetic drive and hence appears to the user as a disk drive whose only peculiarities are that it is read-only and somewhat slower than a magnetic drive--typically 150KBytes/sec for a so-called `single-speed' drive, 300KBytes/sec for `double-speed' and so on. To a large extent one can treat the preparation of a CD-ROM exactly as if one were loading the document archive onto a standard magnetic drive.

The `read-only' limitation means that one has to be very careful to prepare the CD-ROM accurately. Errors cannot be corrected, although it is possible to `update' a CD-ROM by using software that checks an associated magnetic drive for updates before it reads files from the CD.
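The update mechanism mentioned above can be sketched as a lookup that prefers a copy on a writable drive over the copy on the CD. This is only a minimal sketch in Python: the function name `resolve` and the two directory arguments are illustrative assumptions, not details of any particular product.

```python
from pathlib import Path

def resolve(relative_path, update_root, cd_root):
    """Return the path to read: the update copy if one exists, else the CD copy.

    update_root is a directory on a writable magnetic drive (hypothetical layout);
    cd_root is the mount point of the read-only CD-ROM.
    """
    candidate = Path(update_root) / relative_path
    if candidate.is_file():
        return candidate
    return Path(cd_root) / relative_path
```

A reader application would call `resolve` for every file it opens, so that a corrected report placed in the update directory silently replaces the faulty one on the disc.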

The speed limitation (compared with a typical magnetic drive speed of 1.5MBytes/sec) means that one has to be very careful to avoid multiple disk accesses in fetching one item of information. `Defragmentation' software needs to be run before an image of a magnetic drive is transferred to CD-ROM so that directory and file structures are in contiguous segments and not scattered on different tracks. This can make the difference between an access of 1 second and one of 30 seconds which greatly affects the usability of the CD. There are also more subtle impacts on speed such as the need to demount a hard drive on a Macintosh before transferring its image to CD-ROM. This ensures that the status information about the drive indicates that the directory structure is correct and avoids a long delay while the Macintosh operating system checks this when the disk is mounted. Good quality CD-ROM preparation software checks for fragmentation and demounts the disk before it commences the irrevocable process of writing to a one-off CD-ROM.
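The effect of fragmentation on access time can be illustrated with a rough model in which each disk access costs a fixed seek time and the data then transfers at the drive's rate. The following Python sketch is an assumption-laden illustration: the transfer rates come from the text, but the seek time per access is a hypothetical value chosen only to show the shape of the effect.

```python
def read_seconds(size_kbytes, rate_kbytes_per_sec, accesses=1, seek_sec=0.0):
    """Estimated time to fetch one item: seek overhead plus transfer time."""
    return accesses * seek_sec + size_kbytes / rate_kbytes_per_sec

SINGLE_SPEED = 150   # KBytes/sec, from the text
MAGNETIC = 1500      # KBytes/sec, taking 1.5 MBytes/sec as 1500 KBytes/sec
ASSUMED_SEEK = 0.5   # seconds per access; hypothetical, not from the text

# Same 150 KByte item, read contiguously or scattered over many accesses.
for accesses in (1, 10, 50):
    t = read_seconds(150, SINGLE_SPEED, accesses, ASSUMED_SEEK)
    print(f"{accesses:2d} accesses: {t:.1f} s")
```

Under these assumed values the contiguous read takes 1.5 seconds and the read scattered over 50 accesses takes 26 seconds, which is the same order of difference as the 1-second versus 30-second contrast described above.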

Another impact of the speed limitation with multimedia material is the need to ensure that it can still be read effectively at the lower data rate of the CD-ROM. As noted above, programs such as MovieShop and Premiere have options to filter a digital movie so that the data rate required to play it is low enough for playback from a CD-ROM. The typical data rate specified for a single-speed CD is 90KBytes/sec, which leaves time for both data access and movie decompression. The filtering essentially reduces the effective frame rate and lowers the quality of reproduction, but this is far preferable to the erratic stop-start playing of both sound and video that results if the data rate is inappropriate to the medium.
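The data-rate check that such filtering aims to satisfy can be written down directly. The sketch below, in Python, uses the 90 KBytes/sec figure from the text as the budget; the function names are illustrative and the calculation uses the average rate only, whereas real movies also have peaks that this ignores.

```python
CD_MOVIE_BUDGET = 90  # KBytes/sec, the single-speed figure quoted in the text

def required_rate(size_kbytes, duration_sec):
    """Average data rate (KBytes/sec) needed to play a movie without stalling."""
    return size_kbytes / duration_sec

def fits_budget(size_kbytes, duration_sec, budget=CD_MOVIE_BUDGET):
    """True if the movie's average rate is within the playback budget."""
    return required_rate(size_kbytes, duration_sec) <= budget

def max_size_kbytes(duration_sec, budget=CD_MOVIE_BUDGET):
    """Largest file size (KBytes) a movie of this length can have within budget."""
    return duration_sec * budget
```

For example, if all 70 minutes of movie material were filtered to exactly this budget, `max_size_kbytes(70 * 60)` gives an upper bound of 378,000 KBytes for the video on the disc.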

5.2 Multi-Platform Formatting

CD-ROMs for the Macintosh, PC and Unix platforms can each be based on bit-by-bit images of the file structures on the standard magnetic drives for the respective platforms. However, these structures are very different between platforms so that some other approach is necessary to create a single CD-ROM that can be used on all three platforms. One approach is to produce a CD-ROM in ISO 9660 format, which closely resembles the MS-DOS file format and can be read by CD-ROM drive software on all three platforms. This is a common format for many cross-platform CD-ROMs containing largely textual data such as programs.

The problem with the ISO 9660 format is that Macintosh users in particular are used to accessing their disks through the spatial layout of an iconic `desk top' with descriptive file names in upper and lower case, and these capabilities are lost in the MS-DOS 8 character name plus 3 character extension, uppercase file naming format. Fortunately, it is possible to produce hybrid CD-ROMs whose directory structure appears as a normal desk top database to the Macintosh system, and as an ISO 9660 MS-DOS name structure to the DOS, Windows and Unix operating systems. This possibility is not a design feature of the operating systems but rather a convenient `hack' made possible by the differing conventions of where the different disk drive software expects to find directory information.
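What is lost in the 8 plus 3 uppercase naming format can be seen by mapping a descriptive name into it. The Python sketch below is a simplified illustration, not the algorithm of any mastering tool: it uppercases the name, replaces anything outside A-Z, 0-9 and underscore, and truncates, and it makes no attempt to resolve collisions between names that shorten to the same result. The function name `to_8_3` and the sample file name are hypothetical.

```python
import re

def _clean(part):
    """Uppercase and replace characters outside A-Z, 0-9 and underscore."""
    return re.sub(r"[^A-Z0-9_]", "_", part.upper())

def to_8_3(name):
    """Map a descriptive file name to an uppercase 8.3 name (simplified)."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    stem, ext = _clean(stem)[:8], _clean(ext)[:3]
    return f"{stem}.{ext}" if ext else stem

print(to_8_3("GNOSIS Final Report.doc"))
```

Several descriptive names can collapse to the same short name under such a mapping, which is one reason the hybrid format that preserves the Macintosh names is attractive.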

5.3 Portable Document Formats

The choice of document formats for the issue of archives on CD-ROM is not an easy one. If the documents are to be reused it is helpful to issue them in a common word processor format such as Microsoft Word or Novell WordPerfect. Most readers will have a word processor that either reads such documents directly or can import them while preserving most formatting information. However, word processed documents are designed for editing and reformatting, and do not preserve their pagination across different platforms and paper sizes. One rarely obtains an identical document when one exports between word processors, across the same word processor on different platforms, or between countries using different `letter' paper sizes. There is also the problem that the word processors commonly used on the Macintosh and PC are rarely available on Unix workstations.

These problems have prompted the development of new technologies and products for portable electronic documents that can be read and printed across all platforms. The concept is to provide a system that:

There are now a number of products that support this concept, notably Adobe Acrobat, Farallon Replica, No Hands Common Ground and Novell Envoy. They vary greatly in maturity, capabilities and royalty arrangements, and both the products and the nature of the market are in flux (Seybold, 1994). Portable document support is seen as a growth market but it has been very difficult for commercial organizations to find a product and royalty structure that matches the needs of customers and is profitable. For example, when Adobe introduced Acrobat it was first to market but required payment for even the simplest viewer, as did No Hands for Common Ground. This limited the utility of sending out documents on the Internet or CD-ROM since there was no guarantee that the user would have a viewer and it was expensive for the document supplier to provide one. Farallon overcame this with Replica by offering a free viewer and putting it up for FTP on the Internet for ease of access. No Hands rapidly followed suit, and Adobe has done the same with its release of Acrobat 2. Novell has done something similar with Envoy but only supports a free viewer integrated with a document, which limits cross-platform portability--one has to prepare and issue one document for the Macintosh and one for the PC.

For CD-ROMs for which payment is made, it is attractive to consider the extended viewers which offer various forms of indexing, hypertext linking, document corpus management, and extension capabilities through application program interfaces (APIs). The cost of the associated tools for document preparation and the royalty arrangements for issuing the viewers on CD-ROM vary greatly, and are changing with time. It is necessary to check the current state of each vendor's product each time one makes a CD.

On more complex material, such as slides from a presentation package such as Adobe Persuasion, it is important to check the quality of document conversion both on the original platform and cross-platform. The usual mechanism for conversion to a portable document is to insert a specialist `printer driver' that emulates the actual printer driver on a particular platform and operating system. The document generation process then consists simply of `printing' to the specialist driver. However, it is not a simple task to capture complex color layouts into a portable document format from an arbitrary range of packages which have in common only that they can produce printed output. The existing products still have `glitches' in the conversion of some material, and careful quality checking is required.

There are also other alternatives for portable documents. Frame Corporation's FrameMaker has an output format that is portable across Macintosh, PC and Unix platforms, but its royalty costs for a reader are currently high and one is restricted to documents in FrameMaker. The Standard Generalized Markup Language (SGML) makes provision for Document Type Definitions (DTDs) that allow structures in documents to be marked up in a way that supports portability not only across platforms but also across different formats and media (Bryan, 1988; Goldfarb, 1990). The HyperText Markup Language (HTML) used for World-Wide Web documents is a DTD in SGML (Berners-Lee, Connolly and Muldrow, 1994; Raggett, 1994), and it is attractive to use it as a portable document format on CD-ROMs also since there are public domain browsers available for all platforms.

Adobe PostScript is also a useful cross-platform document format since it can be produced by most document systems, printed by most laser printers, and there are public domain browsers for all platforms. There can be portability problems with arbitrary PostScript files produced on different machines using different software, but portable PostScript is readily produced in the same way as the other portable document formats by the use of proper tools and careful quality checking. Its primary disadvantages are the comparatively large size of the files produced and the lack of search and clipping capabilities. The specially formatted and compressed PostScript of Acrobat is intended to be the preferred alternative but, until it is in widespread use, raw PostScript is still useful.

When the CD-ROM described in this section was produced, Farallon Replica had the only royalty-free viewer, and documents were issued in Replica, Word and PostScript formats. The seven final reports occupied 2.3 MBytes each in Replica and Word, and 4.7 MBytes in PostScript. Thus, the storage required for all three formats is about four times that for Word or Replica alone. The storage requirement was not significant in this application because the 1590 pages of reports take up only some 10% of the CD-ROM capacity when issued in all three formats. Video material usually establishes the dominant requirement for storage.
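The "about four times" figure follows directly from the sizes quoted above, as this short Python check shows. The dictionary and variable names are illustrative only.

```python
# Sizes in MBytes as quoted in the text.
sizes = {"Replica": 2.3, "Word": 2.3, "PostScript": 4.7}

total = sum(sizes.values())        # storage for all three formats
ratio = total / sizes["Word"]      # relative to Word (or Replica) alone

print(f"total {total:.1f} MBytes, ratio {ratio:.2f}")
```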

5.4 Indexing

One factor significant to the usability of a CD-ROM is the type of indexing provided. It is also very relevant to the CD-ROM as a presentation medium. The `corporate image' projected by access to material through a simple Unix or Windows directory is not the same as that projected by access through some multimedia navigational tool that provides a user-friendly contextual map of the material. There are commercial tools that index document collections by content and provide an information retrieval interface to the material. These are appropriate to reference materials that need to be searched by content. However, they again present a rather `technical' and unattractive interface, and even when they are used it is appropriate to provide higher-level maps of the material.

Apple's HyperCard on the Macintosh and Asymetrix's ToolBook on the PC provide a basis for developing attractive multimedia access to a CD-ROM of multimedia materials. Both products support graphics, color, sound, video and direct manipulation interfaces. They are able to open files in other applications such as Word or Replica, and hence can be used to access heterogeneous materials using the native application. While the products do not operate cross-platform, it is possible to develop stacks in HyperCard that are virtually identical in appearance and functionality to those in ToolBook, and vice versa, so that users can be largely unaware of cross-platform variations.

For the GNOSIS archives, it was appropriate to use as an indexing tool Mediator (Gaines and Norrie, 1994), a system that had been developed to support collaborative activities across the network as part of the GNOSIS research program. The Mediator implementation was based on groupware concept-mapping tools that were already in use for indexing multimedia materials (Gaines and Shaw, 1994a). Figure 5.1 shows the GNOSIS project archives being accessed through layered concept maps. The map in the window at the upper left is a top level "Server Agent" that manages a particular collection of material. In the example shown a local user is accessing material directly through this agent. Remote users connect to the server agent over the network using client agents that give them the same functionality through calls to the server.

Figure 5.1 Accessing the GNOSIS archives through layered concept maps

The concept map at the top left is currently write-disabled, and the cursor has changed to a button as the user mouses over the "Group Photo" node. Clicking at this point will display the photograph in a separate window. The user has already clicked on the node "GNOSIS Final Reports" to open the concept map shown at the lower right. This has a node for each report, and clicking on one will open the appropriate report, in this application using Farallon's Replica. The node at the top left gives access to a series of slides on the project displayed using Replica. A similar node in the original concept map at the top left gives access to a movie on the GNOSIS project that will be opened in Apple's MoviePlayer.

The user has already clicked on the "TW4 Workshop Papers" node in the concept map at the top left, and opened the relevant concept map at the bottom left. She has then clicked on the node "Click here to see the report in Replica", and opened the report visible at the back on the right of Figure 5.1. She has also clicked on the "Click here to see all the movies" node and opened the KWrite document visible behind the concept maps. This displays eight QuickTime movies of various demonstrations given at the Workshop, any of which can be played by double clicking on it.
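The navigation described above, in which clicking a node either opens another concept map or opens a document in its native application, can be modelled with a very small data structure. The Python sketch below is purely illustrative and is not Mediator's implementation: the class names, the `click` function and the file path are all hypothetical, and only the node labels are taken from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    target: str  # a file to open, or the name of another concept map
    kind: str    # "document" or "map"

@dataclass
class ConceptMap:
    name: str
    nodes: dict = field(default_factory=dict)

    def add(self, label, target, kind):
        self.nodes[label] = Node(label, target, kind)

def click(maps, map_name, label):
    """Return ('open_map', ConceptMap) or ('open_document', path) for a clicked node."""
    node = maps[map_name].nodes[label]
    if node.kind == "map":
        return ("open_map", maps[node.target])
    return ("open_document", node.target)

# Two layers of maps; the photo path is a made-up placeholder.
top = ConceptMap("Server Agent")
top.add("Group Photo", "photos/group.pict", "document")
top.add("GNOSIS Final Reports", "Final Reports", "map")
maps = {"Server Agent": top, "Final Reports": ConceptMap("Final Reports")}
```

In a design along these lines, the "open_document" result would be handed to whatever application is registered for that file type, which is how a single map can front heterogeneous materials.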

