Title: LOC Workshop on Electronic Texts
Author: Library of Congress
Editor: James Daly
Language: English

Edited by James Daly

9-10 June 1992
Library of Congress

Supported by a Grant from the David and Lucile Packard Foundation

*** *** *** ****** *** *** ***

TABLE OF CONTENTS

Acknowledgements
Introduction
Proceedings
Session I. Content in a New Form: Who Will Use It and What Will They Do?
Session II. Show and Tell
Session III. Distribution, Networks, and Networking:
Session IV. Image Capture, Text Capture, Overview of Text and
Session V. Approaches to Preparing Electronic Texts
Session VI. Copyright Issues
Session VII. Conclusion
Appendix I: Program
Appendix II: Abstracts
Appendix III: Directory of Participants

*** *** *** ****** *** *** ***

Acknowledgements

I would like to thank Carl Fleischhauer and Prosser Gifford for the opportunity to learn about areas of human activity unknown to me a scant ten months ago, and the David and Lucile Packard Foundation for supporting that opportunity. The help given by others is acknowledged on a separate page.

19 October 1992

*** *** *** ****** *** *** ***

INTRODUCTION

The Workshop on Electronic Texts (1) drew together representatives of various projects and interest groups to compare ideas, beliefs, experiences, and, in particular, methods of placing and presenting historical textual materials in computerized form. Most attendees gained much in insight and outlook from the event. But the assembly did not form a new nation, or, to put it another way, the diversity of projects and interests was too great to draw the representatives into a cohesive, action-oriented body.(2)

Everyone attending the Workshop shared an interest in preserving and providing access to historical texts. But within this broad field the attendees represented a variety of formal, informal, figurative, and literal groups, with many individuals belonging to more than one.
These groups may be defined roughly according to the following topics or activities:

* Imaging

This summary is arranged thematically and does not follow the actual sequence of presentations.

NOTES:
(2) The Workshop was held at the Library of Congress on 9-10 June

PRESERVATION AND IMAGING

Preservation, as that term is used by archivists,(3) was most explicitly discussed in the context of imaging. Anne KENNEY and Lynne PERSONIUS explained how the concept of a faithful copy and the user-friendliness of the traditional book have guided their project at Cornell University.(4) Although interested in computerized dissemination, participants in the Cornell project are creating digital image sets of older books in the public domain as a source for a fresh paper facsimile or, in a future phase, microfilm. The books returned to the library shelves are high-quality and useful replacements on acid-free paper that should last a long time. To date, the Cornell project has placed little or no emphasis on creating searchable texts; one would not be surprised to find that the project participants view such texts as new editions, and thus not as faithful reproductions.

In her talk on preservation, Patricia BATTIN struck an ecumenical and flexible note as she endorsed the creation and dissemination of a variety of types of digital copies. Do not be too narrow in defining what counts as a preservation element, BATTIN counseled; for the present, at least, digital copies made with preservation in mind cannot be as narrowly standardized as, say, microfilm copies with the same objective. Setting standards precipitously can inhibit creativity, but delay can result in chaos, she advised.

In part, BATTIN's position reflected the unsettled nature of image-format standards, and attendees could hear echoes of this unsettledness in the comments of various speakers.
For example, Jean BARONAS reviewed the status of several formal standards moving through committees of experts; and Clifford LYNCH encouraged the use of a new guideline for transmitting document images on Internet.

Testimony from participants in the National Agricultural Library's (NAL) Text Digitization Program and LC's American Memory project highlighted some of the challenges to the actual creation or interchange of images, including difficulties in converting preservation microfilm to digital form. Donald WATERS reported on the progress of a master plan for a project at Yale University to convert books on microfilm to digital image sets, Project Open Book (POB).

The Workshop offered rather less of an imaging practicum than planned, but "how-to" hints emerge at various points, for example, throughout KENNEY's presentation and in the discussion of arcana such as thresholding and dithering offered by George THOMA and FLEISCHHAUER.

NOTES:
(4) Titles and affiliations of presenters are given at the

THE MACHINE-READABLE TEXT: MARKUP AND USE

The sections of the Workshop that dealt with machine-readable text tended to be more concerned with access and use than with preservation, at least in the narrow technical sense. Michael SPERBERG-McQUEEN made a forceful presentation on the Text Encoding Initiative's (TEI) implementation of the Standard Generalized Markup Language (SGML). His ideas were echoed by Susan HOCKEY, Elli MYLONAS, and Stuart WEIBEL. While the presentations made by the TEI advocates contained no practicum, their discussion focused on the value of the finished product, what the European Community calls reusability, but what may also be termed durability. They argued that marking up (that is, coding) a text in a well-conceived way will permit it to be moved from one computer environment to another, as well as to be used by various users.
Two kinds of markup were distinguished: 1) procedural markup, which describes the features of a text (e.g., dots on a page), and 2) descriptive markup, which describes the structure or elements of a document (e.g., chapters, paragraphs, and front matter). The TEI proponents emphasized the importance of texts to scholarship. They explained how heavily coded (and thus analyzed and annotated) texts can underlie research, play a role in scholarly communication, and facilitate classroom teaching. SPERBERG-McQUEEN reminded listeners that a written or printed item (e.g., a particular edition of a book) is merely a representation of the abstraction we call a text. To concern ourselves with faithfully reproducing a printed instance of the text, SPERBERG-McQUEEN argued, is to concern ourselves with the representation of a representation ("images as simulacra for the text"). The TEI proponents' interest in images tends to focus on corollary materials for use in teaching, for example, photographs of the Acropolis to accompany a Greek text. By the end of the Workshop, SPERBERG-McQUEEN confessed to having been converted to a limited extent to the view that electronic images constitute a promising alternative to microfilming; indeed, an alternative probably superior to microfilming. But he was not convinced that electronic images constitute a serious attempt to represent text in electronic form. HOCKEY and MYLONAS also conceded that their experience at the Pierce Symposium the previous week at Georgetown University and the present conference at the Library of Congress had compelled them to reevaluate their perspective on the usefulness of text as images. Attendees could see that the text and image advocates were in constructive tension, so to say. Three nonTEI presentations described approaches to preparing machine-readable text that are less rigorous and thus less expensive. 
In the case of the Papers of George Washington, Dorothy TWOHIG explained that the digital version will provide a not-quite-perfect rendering of the transcribed text (some 135,000 documents), available for research during the decades while the perfect or print version is completed. Members of the American Memory team and the staff of NAL's Text Digitization Program (see below) also outlined a middle ground concerning searchable texts. In the case of American Memory, contractors produce texts with about 99-percent accuracy that serve as "browse" or "reference" versions of written or printed originals. End users who need faithful copies or perfect renditions must refer to accompanying sets of digital facsimile images or consult copies of the originals in a nearby library or archive. American Memory staff argued that the high cost of producing 100-percent accurate copies would prevent LC from offering access to large parts of its collections.

THE MACHINE-READABLE TEXT: METHODS OF CONVERSION

Although the Workshop did not include a systematic examination of the methods for converting texts from paper (or from facsimile images) into machine-readable form, various speakers touched upon this matter. For example, WEIBEL reported that OCLC has experimented with a merging of multiple optical character recognition systems that will reduce errors from an unacceptable rate of 5 characters out of every 1,000 to an unacceptable rate of 2 characters out of every 1,000.

Pamela ANDRE presented an overview of NAL's Text Digitization Program and Judith ZIDAR discussed the technical details. ZIDAR explained how NAL purchased hardware and software capable of performing optical character recognition (OCR) and text conversion and used its own staff to convert texts. The process, ZIDAR said, required extensive editing, and project staff found themselves considering alternatives, including rekeying and/or creating abstracts or summaries of texts. NAL reckoned costs at $7 per page.
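The merging of multiple OCR systems that WEIBEL described can be illustrated with a minimal sketch. The Workshop summary does not say how OCLC combined the outputs, so the approach here, a simple character-by-character majority vote, is an assumption, as are the function names and the three sample "engine" strings. The sketch also assumes the outputs are already aligned and of equal length, which real OCR output generally is not.

```python
from collections import Counter


def vote(outputs):
    """Merge pre-aligned OCR outputs by majority vote at each character position."""
    if len({len(o) for o in outputs}) != 1:
        raise ValueError("this sketch assumes pre-aligned, equal-length outputs")
    merged = []
    for chars in zip(*outputs):
        # Pick the most frequent character; on a tie, the first one seen wins.
        merged.append(Counter(chars).most_common(1)[0][0])
    return "".join(merged)


def errors_per_thousand(candidate, truth):
    """Character errors per 1,000 characters, the unit used in the summary."""
    wrong = sum(1 for c, t in zip(candidate, truth) if c != t)
    return 1000 * wrong / len(truth)


# Hypothetical data: each engine makes one error, each at a different position.
truth = "optical character recognition"
engine_a = "optical charactcr recognition"
engine_b = "optica1 character recognition"
engine_c = "optical character rccognition"

merged = vote([engine_a, engine_b, engine_c])
print(merged)
print(errors_per_thousand(engine_a, truth), errors_per_thousand(merged, truth))
```

Voting helps only where the engines disagree; an error that all of them share survives the merge, which is one reason a residual error rate would remain.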
By way of contrast, Ricky ERWAY explained that American Memory had decided from the start to contract out conversion to external service bureaus. The criteria used to select these contractors were cost and quality of results, as opposed to methods of conversion. ERWAY noted that historical documents or books often do not lend themselves to OCR. Bound materials represent a special problem. In her experience, quality control (inspecting incoming materials, counting errors in samples) posed the most time-consuming aspect of contracting out conversion. ERWAY reckoned American Memory's costs at $4 per page, but cautioned that fewer cost-elements had been included than in NAL's figure.

OPTIONS FOR DISSEMINATION

The topic of dissemination proper emerged at various points during the Workshop. At the session devoted to national and international computer networks, LYNCH, Howard BESSER, Ronald LARSEN, and Edwin BROWNRIGG highlighted the virtues of Internet today and of the network that will evolve from Internet. Listeners could discern in these narratives a vision of an information democracy in which millions of citizens freely find and use what they need. LYNCH noted that a lack of standards inhibits disseminating multimedia on the network, a topic also discussed by BESSER. LARSEN addressed the issues of network scalability and modularity and commented upon the difficulty of anticipating the effects of growth in orders of magnitude. BROWNRIGG talked about the ability of packet radio to provide certain links in a network without the need for wiring. However, the presenters also called attention to the shortcomings and incongruities of present-day computer networks. For example: 1) Network use is growing dramatically, but much network traffic consists of personal communication (E-mail). 2) Large bodies of information are available, but a user's ability to search across their entirety is limited.
3) There are significant resources for science and technology, but few network sources provide content in the humanities. 4) Machine-readable texts are commonplace, but the capability of the system to deal with images (let alone other media formats) lags behind. A glimpse of a multimedia future for networks, however, was provided by Maria LEBRON in her overview of the Online Journal of Current Clinical Trials (OJCCT), and the process of scholarly publishing on-line.

The contrasting form of the CD-ROM disk was never systematically analyzed, but attendees could glean an impression from several of the show-and-tell presentations. The Perseus and American Memory examples demonstrated recently published disks, while the descriptions of the IBYCUS version of the Papers of George Washington and Chadwyck-Healey's Patrologia Latina Database (PLD) told of disks to come. According to Eric CALALUCA, PLD's principal focus has been on converting Jacques-Paul Migne's definitive collection of Latin texts to machine-readable form. Although everyone could share the network advocates' enthusiasm for an on-line future, the possibility of rolling up one's sleeves for a session with a CD-ROM containing both textual materials and a powerful retrieval engine made the disk seem an appealing vessel indeed. The overall discussion suggested that the transition from CD-ROM to on-line networked access may prove far slower and more difficult than has been anticipated.

WHO ARE THE USERS AND WHAT DO THEY DO?

Although concerned with the technicalities of production, the Workshop never lost sight of the purposes and uses of electronic versions of textual materials. As noted above, those interested in imaging discussed the problematical matter of digital preservation, while the TEI proponents described how machine-readable texts can be used in research. This latter topic received thorough treatment in the paper read by Avra MICHELSON.
She placed the phenomenon of electronic texts within the context of broader trends in information technology and scholarly communication. Among other things, MICHELSON described on-line conferences that represent a vigorous and important intellectual forum for certain disciplines. Internet now carries more than 700 conferences, with about 80 percent of these devoted to topics in the social sciences and the humanities. Other scholars use on-line networks for "distance learning." Meanwhile, there has been a tremendous growth in end-user computing; professors today are less likely than their predecessors to ask the campus computer center to process their data. Electronic texts are one key to these sophisticated applications, MICHELSON reported, and more and more scholars in the humanities now work in an on-line environment. Toward the end of the Workshop, Michael LESK presented a corollary to MICHELSON's talk, reporting the results of an experiment that compared the work of one group of chemistry students using traditional printed texts and two groups using electronic sources. The experiment demonstrated that in the event one does not know what to read, one needs the electronic systems; the electronic systems hold no advantage at the moment if one knows what to read, but neither do they impose a penalty. DALY provided an anecdotal account of the revolutionizing impact of the new technology on his previous methods of research in the field of classics. His account, by extrapolation, served to illustrate in part the arguments made by MICHELSON concerning the positive effects of the sudden and radical transformation being wrought in the ways scholars work. Susan VECCIA and Joanne FREEMAN delineated the use of electronic materials outside the university. 
The most interesting aspect of their use, FREEMAN said, could be seen as a paradox: teachers in elementary and secondary schools requested access to primary source materials but, at the same time, found that "primariness" itself made these materials difficult for their students to use.

OTHER TOPICS

Marybeth PETERS reviewed copyright law in the United States and offered advice during a lively discussion of this subject. But uncertainty remains concerning the price of copyright in a digital medium, because a solution remains to be worked out concerning management and synthesis of copyrighted and out-of-copyright pieces of a database.

As moderator of the final session of the Workshop, Prosser GIFFORD directed discussion to future courses of action and the potential role of LC in advancing them. Among the recommendations that emerged were the following:

* Workshop participants should 1) begin to think about working with image material, but structure and digitize it in such a way that at a later stage it can be interpreted into text, and 2) find a common way to build text and images together so that they can be used jointly at some stage in the future, with appropriate network support, because that is how users will want to access these materials. The Library might encourage attempts to bring together people who are working on texts and images.

* A network version of American Memory should be developed or consideration should be given to making the data in it available to people interested in doing network multimedia. Given the current dearth of digital data that is appealing and unencumbered by extremely complex rights problems, developing a network version of American Memory could do much to help make network multimedia a reality.
* Concerning the thorny issue of electronic deposit, LC should initiate a catalytic process in terms of distributed responsibility, that is, bring together the distributed organizations and set up a study group to look at all the issues related to electronic deposit and see where we as a nation should move. For example, LC might attempt to persuade one major library in each state to deal with its state equivalent publisher, which might produce a cooperative project that would be equitably distributed around the country, and one in which LC would be dealing with a minimal number of publishers and minimal copyright problems. LC must also deal with the concept of on-line publishing, determining, among other things, how serials such as OJCCT might be deposited for copyright.

* Since a number of projects are planning to carry out preservation by creating digital images that will end up in on-line or near-line storage at some institution, LC might play a helpful role, at least in the near term, by accelerating work on how to catalog that information into the Research Libraries Information Network (RLIN) and then into OCLC, so that it would be accessible. This would reduce the possibility of multiple institutions digitizing the same work.

CONCLUSION

The Workshop was valuable because it brought together partisans from various groups and provided an occasion to compare goals and methods. The more committed partisans frequently communicate with others in their groups, but less often across group boundaries. The Workshop was also valuable to attendees (including those involved with American Memory) who came less committed to particular approaches or concepts. These attendees learned a great deal, and plan to select and employ elements of imaging, text-coding, and networked distribution that suit their respective projects and purposes.

Still, reality rears its ugly head: no breakthrough has been achieved.
On the imaging side, one confronts a proliferation of competing data-interchange standards and a lack of consensus on the role of digital facsimiles in preservation. In the realm of machine-readable texts, one encounters a reasonably mature standard but methodological difficulties and high costs. These latter problems, of course, represent a special impediment to the desire, as it is sometimes expressed in the popular press, "to put the [contents of the] Library of Congress on line." In the words of one participant, there was "no solution to the economic problems—the projects that are out there are surviving, but it is going to be a lot of work to transform the information industry, and so far the investment to do that is not forthcoming" (LESK, per litteras).

*** *** *** ****** *** *** ***

PROCEEDINGS

WELCOME

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GIFFORD * Origin of Workshop in current Librarian's desire to make LC's collections more widely available * Desiderata arising from the prospect of greater interconnectedness *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

After welcoming participants on behalf of the Library of Congress, American Memory (AM), and the National Demonstration Lab, Prosser GIFFORD, director for scholarly programs, Library of Congress, located the origin of the Workshop on Electronic Texts in a conversation he had had considerably more than a year ago with Carl FLEISCHHAUER concerning some of the issues faced by AM. On the assumption that numerous other people were asking the same questions, the decision was made to bring together as many of these people as possible to ask the same questions together. In a deeper sense, GIFFORD said, the origin of the Workshop lay in the desire of the current Librarian of Congress, James H.
Billington, to make the collections of the Library, especially those offering unique or unusual testimony on aspects of the American experience, available to a much wider circle of users than those few people who can come to Washington to use them. This meant that the emphasis of AM, from the outset, has been on archival collections of the basic material, and on making these collections themselves available, rather than selected or heavily edited products.

From AM's emphasis followed the questions with which the Workshop began: who will use these materials, and in what form will they wish to use them. But an even larger issue deserving mention, in GIFFORD's view, was the phenomenal growth in Internet connectivity. He expressed the hope that the prospect of greater interconnectedness than ever before would lead to: 1) much more cooperative and mutually supportive endeavors; 2) development of systems of shared and distributed responsibilities to avoid duplication and to ensure accuracy and preservation of unique materials; and 3) agreement on the necessary standards and development of the appropriate directories and indices to make navigation straightforward among the varied resources that are, and increasingly will be, available. In this connection, GIFFORD requested that participants reflect from the outset upon the sorts of outcomes they thought the Workshop might have. Did those present constitute a group with sufficient common interests to propose a next step or next steps, and if so, what might those be? They would return to these questions the following afternoon.

******

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FLEISCHHAUER * Core of Workshop concerns preparation and production of materials * Special challenge in conversion of textual materials * Quality versus quantity * Do the several groups represented share common interests?
*
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress, emphasized that he would attempt to represent the people who perform some of the work of converting or preparing materials and that the core of the Workshop had to do with preparation and production. FLEISCHHAUER then drew a distinction between the long term, when many things would be available and connected in the ways that GIFFORD described, and the short term, in which AM not only has wrestled with the issue of what is the best course to pursue but also has faced a variety of technical challenges.

FLEISCHHAUER remarked on AM's endeavors to deal with a wide range of library formats, such as motion picture collections, sound-recording collections, and pictorial collections of various sorts, especially collections of photographs. In the course of these efforts, AM kept coming back to textual materials: manuscripts or rare printed matter, bound materials, etc. Text posed the greatest conversion challenge of all. Thus, the genesis of the Workshop, which reflects the problems faced by AM.

These problems include physical problems. For example, those in the library and archive business deal with collections made up of fragile and rare manuscript items and bound materials, especially the notoriously brittle bound materials of the late nineteenth century. These are precious cultural artifacts, however, as well as interesting sources of information, and LC desires to retain and conserve them. AM needs to handle things without damaging them. Guillotining a book to run it through a sheet feeder must be avoided at all costs.

Beyond physical problems, issues pertaining to quality arose. For example, the desire to provide users with a searchable text is affected by the question of acceptable level of accuracy. One hundred percent accuracy is tremendously expensive.
On the other hand, the output of optical character recognition (OCR) can be tremendously inaccurate. Although AM has attempted to find a middle ground, uncertainty persists as to whether or not it has discovered the right solution. Questions of quality arose concerning images as well. FLEISCHHAUER contrasted the extremely high level of quality of the digital images in the Cornell Xerox Project with AM's efforts to provide a browse-quality or access-quality image, as opposed to an archival or preservation image. FLEISCHHAUER therefore welcomed the opportunity to compare notes. FLEISCHHAUER observed in passing that conversations he had had about networks have begun to signal that for various forms of media a determination may be made that there is a browse-quality item, or a distribution-and-access-quality item that may coexist in some systems with a higher quality archival item that would be inconvenient to send through the network because of its size. FLEISCHHAUER referred, of course, to images more than to searchable text. As AM considered those questions, several conceptual issues arose: ought AM occasionally to reproduce materials entirely through an image set, at other times, entirely through a text set, and in some cases, a mix? There probably would be times when the historical authenticity of an artifact would require that its image be used. An image might be desirable as a recourse for users if one could not provide 100-percent accurate text. Again, AM wondered, as a practical matter, if a distinction could be drawn between rare printed matter that might exist in multiple collections—that is, in ten or fifteen libraries. In such cases, the need for perfect reproduction would be less than for unique items. Implicit in his remarks, FLEISCHHAUER conceded, was the admission that AM has been tilting strongly towards quantity and drawing back a little from perfect quality. 
That is, it seemed to AM that society would be better served if more things were distributed by LC—even if they were not quite perfect—than if fewer things, perfectly represented, were distributed. This was stated as a proposition to be tested, with responses to be gathered from users. In thinking about issues related to reproduction of materials and seeing other people engaged in parallel activities, AM deemed it useful to convene a conference. Hence, the Workshop. FLEISCHHAUER thereupon surveyed the several groups represented: 1) the world of images (image users and image makers); 2) the world of text and scholarship and, within this group, those concerned with language—FLEISCHHAUER confessed to finding delightful irony in the fact that some of the most advanced thinkers on computerized texts are those dealing with ancient Greek and Roman materials; 3) the network world; and 4) the general world of library science, which includes people interested in preservation and cataloging. FLEISCHHAUER concluded his remarks with special thanks to the David and Lucile Packard Foundation for its support of the meeting, the American Memory group, the Office for Scholarly Programs, the National Demonstration Lab, and the Office of Special Events. He expressed the hope that David Woodley Packard might be able to attend, noting that Packard's work and the work of the foundation had sponsored a number of projects in the text area. ****** SESSION I. 
CONTENT IN A NEW FORM: WHO WILL USE IT AND WHAT WILL THEY DO?

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
DALY * Acknowledgements * A new Latin authors disk * Effects of the new technology on previous methods of research *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Serving as moderator, James DALY acknowledged the generosity of all the presenters for giving of their time, counsel, and patience in planning the Workshop, as well as of members of the American Memory project and other Library of Congress staff, and the David and Lucile Packard Foundation and its executive director, Colburn S. Wilbur.

DALY then recounted his visit in March to the Center for Electronic Texts in the Humanities (CETH) and the Department of Classics at Rutgers University, where an old friend, Lowell Edmunds, introduced him to the department's IBYCUS scholarly personal computer, and, in particular, the new Latin CD-ROM, containing, among other things, almost all classical Latin literary texts through A.D. 200. Packard Humanities Institute (PHI), Los Altos, California, released this disk late in 1991, with a nominal triennial licensing fee.

Playing with the disk for an hour or so at Rutgers brought home to DALY at once the revolutionizing impact of the new technology on his previous methods of research. Had this disk been available two or three years earlier, DALY contended, when he was engaged in preparing a commentary on Book 10 of Virgil's Aeneid for Cambridge University Press, he would not have required a forty-eight-square-foot table on which to spread the numerous, most frequently consulted items, including some ten or twelve concordances to key Latin authors, an almost equal number of lexica to authors who lacked concordances, and where either lexica or concordances were lacking, numerous editions of authors antedating and postdating Virgil.
Nor, when checking each of the average six to seven words contained in the Virgilian hexameter for its usage elsewhere in Virgil's works or other Latin authors, would DALY have had to maintain the laborious mechanical process of flipping through these concordances, lexica, and editions each time. Nor would he have had to frequent as often the Milton S. Eisenhower Library at the Johns Hopkins University to consult the Thesaurus Linguae Latinae. Instead of devoting countless hours, or the bulk of his research time, to gathering data concerning Virgil's use of words, DALY—now freed by PHI's Latin authors disk from the tyrannical, yet in some ways paradoxically happy scholarly drudgery— would have been able to devote that same bulk of time to analyzing and interpreting Virgilian verbal usage. Citing Theodore Brunner, Gregory Crane, Elli MYLONAS, and Avra MICHELSON, DALY argued that this reversal in his style of work, made possible by the new technology, would perhaps have resulted in better, more productive research. Indeed, even in the course of his browsing the Latin authors disk at Rutgers, its powerful search, retrieval, and highlighting capabilities suggested to him several new avenues of research into Virgil's use of sound effects. This anecdotal account, DALY maintained, may serve to illustrate in part the sudden and radical transformation being wrought in the ways scholars work. 
******

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
MICHELSON * Elements related to scholarship and technology * Electronic texts within the context of broader trends within information technology and scholarly communication * Evaluation of the prospects for the use of electronic texts * Relationship of electronic texts to processes of scholarly communication in humanities research * New exchange formats created by scholars * Projects initiated to increase scholarly access to converted text * Trend toward making electronic resources available through research and education networks * Changes taking place in scholarly communication among humanities scholars * Network-mediated scholarship transforming traditional scholarly practices * Key information technology trends affecting the conduct of scholarly communication over the next decade * The trend toward end-user computing * The trend toward greater connectivity * Effects of these trends * Key transformations taking place * Summary of principal arguments *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Avra MICHELSON, Archival Research and Evaluation Staff, National Archives and Records Administration (NARA), argued that establishing who will use electronic texts and what they will use them for involves a consideration of both information technology and scholarship trends. This consideration includes several elements related to scholarship and technology: 1) the key trends in information technology that are most relevant to scholarship; 2) the key trends in the use of currently available technology by scholars in the nonscientific community; and 3) the relationship between these two very distinct but interrelated trends. The investment in understanding this relationship being made by information providers, technologists, and public policy developers, as well as by scholars themselves, seems to be pervasive and growing, MICHELSON contended.
She drew on collaborative work with Jeff Rothenberg on the scholarly use of technology. MICHELSON sought to place the phenomenon of electronic texts within the context of broader trends within information technology and scholarly communication. She argued that electronic texts are of most use to researchers to the extent that the researchers' working context (i.e., their relevant bibliographic sources, collegial feedback, analytic tools, notes, drafts, etc.), along with their field's primary and secondary sources, also is accessible in electronic form and can be integrated in ways that are unique to the on-line environment. Evaluation of the prospects for the use of electronic texts includes two elements: 1) an examination of the ways in which researchers currently are using electronic texts along with other electronic resources, and 2) an analysis of key information technology trends that are affecting the long-term conduct of scholarly communication. MICHELSON limited her discussion of the use of electronic texts to the practices of humanists and noted that the scientific community was outside the panel's overview. MICHELSON examined the nature of the current relationship of electronic texts in particular, and electronic resources in general, to what she maintained were, essentially, five processes of scholarly communication in humanities research. Researchers 1) identify sources, 2) communicate with their colleagues, 3) interpret and analyze data, 4) disseminate their research findings, and 5) prepare curricula to instruct the next generation of scholars and students. This examination would produce a clearer understanding of the synergy among these five processes that fuels the tendency of the use of electronic resources for one process to stimulate its use for other processes of scholarly communication. 
For the first process of scholarly communication, the identification of sources, MICHELSON remarked on the opportunity scholars now enjoy to supplement traditional word-of-mouth searches for sources among their colleagues with new forms of electronic searching. So, for example, instead of having to visit the library, researchers are able to explore descriptions of holdings from their offices. Furthermore, if their own institutions' holdings prove insufficient, scholars can access more than 200 major American library catalogues over Internet, including those of the universities of California, Michigan, Pennsylvania, and Wisconsin. Direct access to the bibliographic databases offers intellectual empowerment to scholars by presenting a comprehensive means of browsing through libraries from their homes and offices at their convenience. The second process involves communication among scholars. Beyond the most common methods of communication, scholars are using E-mail and a variety of new electronic communications formats derived from it for further academic interchange. E-mail exchanges are growing at an astonishing rate, reportedly 15 percent a month. They currently constitute approximately half the traffic on research and education networks. Moreover, the global spread of E-mail has been so rapid that it is now possible for American scholars to use it to communicate with colleagues in close to 140 other countries. Other new exchange formats created by scholars and operating on Internet include more than 700 conferences, with about 80 percent of these devoted to topics in the social sciences and humanities. The rate of growth of these scholarly electronic conferences also is astonishing. From 1990 to 1991, 200 new conferences were identified on Internet. From October 1991 to June 1992, an additional 150 conferences in the social sciences and humanities were added to this directory of listings. 
Scholars have established conferences in virtually every field, within every different discipline. For example, there are currently close to 600 active social science and humanities conferences on topics such as art and architecture, ethnomusicology, folklore, Japanese culture, medical education, and gifted and talented education. The appeal to scholars of communicating through these conferences is that, unlike any other medium, electronic conferences today provide a forum for global communication with peers at the front end of the research process. Interpretation and analysis of sources constitutes the third process of scholarly communication that MICHELSON discussed in terms of texts and textual resources. The methods used to analyze sources fall somewhere on a continuum from quantitative analysis to qualitative analysis. Typically, evidence is culled and evaluated using methods drawn from both ends of this continuum. At one end, quantitative analysis involves the use of mathematical processes such as a count of frequencies and distributions of occurrences or, on a higher level, regression analysis. At the other end of the continuum, qualitative analysis typically involves nonmathematical processes oriented toward language interpretation or the building of theory. Aspects of this work involve the processing—either manual or computational—of large and sometimes massive amounts of textual sources, although the use of nontextual sources as evidence, such as photographs, sound recordings, film footage, and artifacts, is significant as well. Scholars have discovered that many of the methods of interpretation and analysis that are related to both quantitative and qualitative methods are processes that can be performed by computers. For example, computers can count. They can count brush strokes used in a Rembrandt painting or perform regression analysis for understanding cause and effect. 
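As a minimal sketch of the quantitative end of this continuum, the following Python fragment counts word frequencies in a short passage. The sample text, the tokenizing rule, and the function name word_frequencies are assumptions chosen purely for illustration; they are not drawn from MICHELSON's talk or from any project described at the Workshop.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return a Counter mapping each lowercased word to its number of occurrences."""
    # Assumed tokenizing rule: a word is any run of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Sample passage (Shakespeare, The Merchant of Venice), used only as test data.
sample = (
    "The quality of mercy is not strain'd, "
    "It droppeth as the gentle rain from heaven "
    "Upon the place beneath: it is twice blest; "
    "It blesseth him that gives and him that takes."
)

freq = word_frequencies(sample)

# List the three most frequent words with their counts.
for word, count in freq.most_common(3):
    print(word, count)
```

A real study would need a more careful definition of a word (spelling variants, inflected forms, punctuation conventions), which is one reason the encoding of machine-readable text matters for this kind of analysis.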
By means of advanced technologies, computers can recognize patterns, analyze text, and model concepts. Furthermore, computers can complete these processes faster with more sources and with greater precision than scholars who must rely on manual interpretation of data. But if scholars are to use computers for these processes, source materials must be in a form amenable to computer-assisted analysis. For this reason many scholars, once they have identified the sources that are key to their research, are converting them to machine-readable form. Thus, a representative example of the numerous textual conversion projects organized by scholars around the world in recent years to support computational text analysis is the TLG, the Thesaurus Linguae Graecae. This project is devoted to converting the extant ancient texts of classical Greece. (Editor's note: according to the TLG Newsletter of May 1992, TLG was in use in thirty-two different countries. This figure updates MICHELSON's previous count by one.) The scholars performing these conversions have been asked to recognize that the electronic sources they are converting for one use possess value for other research purposes as well. As a result, during the past few years, humanities scholars have initiated a number of projects to increase scholarly access to converted text. So, for example, the Text Encoding Initiative (TEI), about which more is said later in the program, was established as an effort by scholars to determine standard elements and methods for encoding machine-readable text for electronic exchange. In a second effort to facilitate the sharing of converted text, scholars have created a new institution, the Center for Electronic Texts in the Humanities (CETH). The center estimates that there are 8,000 series of source texts in the humanities that have been converted to machine-readable form worldwide. 
CETH is undertaking an international search for converted text in the humanities, compiling it into an electronic library, and preparing bibliographic descriptions of the sources for the Research Libraries Information Network's (RLIN) machine-readable data file. The library profession has begun to initiate large conversion projects as well, such as American Memory. While scholars have been making converted text available to one another, typically on disk or on CD-ROM, the clear trend is toward making these resources available through research and education networks. Thus, the American and French Research on the Treasury of the French Language (ARTFL) and the Dante Project are already available on Internet. MICHELSON summarized this section on interpretation and analysis by noting that: 1) increasing numbers of humanities scholars in the library community are recognizing the importance to the advancement of scholarship of retrospective conversion of source materials in the arts and humanities; and 2) there is a growing realization that making the sources available on research and education networks maximizes their usefulness for the analysis performed by humanities scholars. The fourth process of scholarly communication is dissemination of research findings, that is, publication. Scholars are using existing research and education networks to engineer a new type of publication: scholarly-controlled journals that are electronically produced and disseminated. Although such journals are still emerging as a communication format, their number has grown, from approximately twelve to thirty-six during the past year (July 1991 to June 1992). Most of these electronic scholarly journals are devoted to topics in the humanities. 
As with network conferences, scholarly enthusiasm for these electronic journals stems from the medium's unique ability to advance scholarship in a way that no other medium can do by supporting global feedback and interchange, practically in real time, early in the research process. Beyond scholarly journals, MICHELSON remarked on the delivery of commercial full-text products, such as articles in professional journals, newsletters, magazines, wire services, and reference sources. These are being delivered via on-line local library catalogues, especially through CD-ROMs. Furthermore, according to MICHELSON, there is general optimism that the copyright and fees issues impeding the delivery of full text on existing research and education networks soon will be resolved. The final process of scholarly communication is curriculum development and instruction, and this involves the use of computer information technologies in two areas. The first is the development of computer-oriented instructional tools, which includes simulations, multimedia applications, and computer tools that are used to assist in the analysis of sources in the classroom, etc. The Perseus Project, a database that provides a multimedia curriculum on classical Greek civilization, is a good example of the way in which entire curricula are being recast using information technologies. It is anticipated that the current difficulty in exchanging computer-based instructional software electronically, which in turn makes it difficult for one scholar to build upon the work of others, will be resolved before too long. Stand-alone curricular applications that involve electronic text will be sharable through networks, reinforcing their significance as intellectual products as well as instructional tools. The second aspect of electronic learning involves the use of research and education networks for distance education programs. 
Such programs interactively link teachers with students in geographically scattered locations and rely on the availability of electronic instructional resources. Distance education programs are gaining wide appeal among state departments of education because of their demonstrated capacity to bring advanced specialized course work and an array of experts to many classrooms. A recent report found that at least 32 states operated at least one statewide network for education in 1991, with networks under development in many of the remaining states. MICHELSON next turned to the second element of the framework she proposed at the outset of her talk for evaluating the prospects for electronic text, namely the key information technology trends affecting the conduct of scholarly communication over the next decade: 1) end-user computing and 2) connectivity. End-user computing means that the person touching the keyboard, or performing computations, is the same as the person who initiates or consumes the computation. The emergence of personal computers, along with a host of other forces, such as ubiquitous computing, advances in interface design, and the on-line transition, is prompting the consumers of computation to do their own computing, and is thus rendering obsolete the traditional distinction between end users and ultimate users. The trend toward end-user computing is significant to consideration of the prospects for electronic texts because it means that researchers are becoming more adept at doing their own computations and, thus, more competent in the use of electronic media. By avoiding programmer intermediaries, computation is becoming central to the researcher's thought process. 
This direct involvement in computing is changing the researcher's perspective on the nature of research itself, that is, the kinds of questions that can be posed, the analytical methodologies that can be used, the types and amount of sources that are appropriate for analyses, and the form in which findings are presented. The trend toward end-user computing means that, increasingly, electronic media and computation are being infused into all processes of humanities scholarship, inspiring remarkable transformations in scholarly communication. The trend toward greater connectivity suggests that researchers are using computation increasingly in network environments. Connectivity is important to scholarship because it erases the distance that separates students from teachers and scholars from their colleagues, while allowing users to access remote databases, share information in many different media, connect to their working context wherever they are, and collaborate in all phases of research. The combination of the trend toward end-user computing and the trend toward connectivity suggests that the scholarly use of electronic resources, already evident among some researchers, will soon become an established feature of scholarship. The effects of these trends, along with ongoing changes in scholarly practices, point to a future in which humanities researchers will use computation and electronic communication to help them formulate ideas, access sources, perform research, collaborate with colleagues, seek peer review, publish and disseminate results, and engage in many other professional and educational activities. In summary, MICHELSON emphasized four points: 1) A portion of humanities scholars already consider electronic texts the preferred format for analysis and dissemination. 2) Scholars are using these electronic texts, in conjunction with other electronic resources, in all the processes of scholarly communication. 
3) The humanities scholars' working context is in the process of changing from print technology to electronic technology, in many ways mirroring transformations that have occurred or are occurring within the scientific community. 4) These changes are occurring in conjunction with the development of a new communication medium: research and education networks that are characterized by their capacity to advance scholarship in a wholly unique way. MICHELSON also reiterated her three principal arguments: 1) Electronic texts are best understood in terms of the relationship to other electronic resources and the growing prominence of network-mediated scholarship. 2) The prospects for electronic texts lie in their capacity to be integrated into the on-line network of electronic resources that comprise the new working context for scholars. 3) Retrospective conversion of portions of the scholarly record should be a key strategy as information providers respond to changes in scholarly communication practices.

******

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
VECCIA * AM's evaluation project and public users of electronic resources * AM and its design * Site selection and evaluating the Macintosh implementation of AM * Characteristics of the six public libraries selected * Characteristics of AM's users in these libraries * Principal ways AM is being used *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Susan VECCIA, team leader, and Joanne FREEMAN, associate coordinator, American Memory, Library of Congress, gave a joint presentation. First, by way of introduction, VECCIA explained her and FREEMAN's roles in American Memory (AM). Serving principally as an observer, VECCIA has assisted with the evaluation project of AM, placing AM collections in a variety of different sites around the country and helping to organize and implement that project. 
FREEMAN has been an associate coordinator of AM and has been involved principally with the interpretative materials, preparing some of the electronic exhibits and printed historical information that accompanies AM and that is requested by users. VECCIA and FREEMAN shared anecdotal observations concerning AM with public users of electronic resources. Notwithstanding a fairly structured evaluation in progress, both VECCIA and FREEMAN chose not to report on specifics in terms of numbers, etc., because they felt it was too early in the evaluation project to do so. AM is an electronic archive of primary source materials from the Library of Congress, selected collections representing a variety of formats— photographs, graphic arts, recorded sound, motion pictures, broadsides, and soon, pamphlets and books. In terms of the design of this system, the interpretative exhibits have been kept separate from the primary resources, with good reason. Accompanying this collection are printed documentation and user guides, as well as guides that FREEMAN prepared for teachers so that they may begin using the content of the system at once. VECCIA described the evaluation project before talking about the public users of AM, limiting her remarks to public libraries, because FREEMAN would talk more specifically about schools from kindergarten to twelfth grade (K-12). Having started in spring 1991, the evaluation currently involves testing of the Macintosh implementation of AM. Since the primary goal of this evaluation is to determine the most appropriate audience or audiences for AM, very different sites were selected. This makes evaluation difficult because of the varying degrees of technology literacy among the sites. AM is situated in forty-four locations, of which six are public libraries and sixteen are schools. Represented among the schools are elementary, junior high, and high schools. District offices also are involved in the evaluation, which will conclude in summer 1993. 
VECCIA focused the remainder of her talk on the six public libraries, one of which doubles as a state library. They represent a range of geographic areas and a range of demographic characteristics. For example, three are located in urban settings, two in rural settings, and one in a suburban setting. A range of technical expertise is to be found among these facilities as well. For example, one is an "Apple library of the future," while two others are rural one-room libraries—in one, AM sits at the front desk next to a tractor manual. All public libraries have been extremely enthusiastic, supportive, and appreciative of the work that AM has been doing. VECCIA characterized various users: Most users in public libraries describe themselves as general readers; of the students who use AM in the public libraries, those in fourth grade and above seem most interested. Public libraries in rural sites tend to attract retired people, who have been highly receptive to AM. Users tend to fall into two additional categories: people interested in the content and historical connotations of these primary resources, and those fascinated by the technology. The format receiving the most comments has been motion pictures. The adult users in public libraries are more comfortable with IBM computers, whereas young people seem comfortable with either IBM or Macintosh, although most of them seem to come from a Macintosh background. This same tendency is found in the schools. What kinds of things do users do with AM? In a public library there are two main goals or ways that AM is being used: as an individual learning tool, and as a leisure activity. Adult learning was one area that VECCIA would highlight as a possible application for a tool such as AM. She described a patron of a rural public library who comes in every day on his lunch hour and literally reads AM, methodically going through the collection image by image. 
At the end of his hour he makes an electronic bookmark, puts it in his pocket, and returns to work. The next day he comes in and resumes where he left off. Interestingly, this man had never been in the library before he used AM. In another small, rural library, the coordinator reports that AM is a popular activity for some of the older, retired people in the community, who ordinarily would not use "those things,"—computers. Another example of adult learning in public libraries is book groups, one of which, in particular, is using AM as part of its reading on industrialization, integration, and urbanization in the early 1900s. One library reports that a family is using AM to help educate their children. In another instance, individuals from a local museum came in to use AM to prepare an exhibit on toys of the past. These two examples emphasize the mission of the public library as a cultural institution, reaching out to people who do not have the same resources available to those who live in a metropolitan area or have access to a major library. One rural library reports that junior high school students in large numbers came in one afternoon to use AM for entertainment. A number of public libraries reported great interest among postcard collectors in the Detroit collection, which was essentially a collection of images used on postcards around the turn of the century. Train buffs are similarly interested because that was a time of great interest in railroading. People, it was found, relate to things that they know of firsthand. For example, in both rural public libraries where AM was made available, observers reported that the older people with personal remembrances of the turn of the century were gravitating to the Detroit collection. These examples served to underscore MICHELSON's observation re the integration of electronic tools and ideas—that people learn best when the material relates to something they know. 
VECCIA made the final point that in many cases AM serves as a public-relations tool for the public libraries that are testing it. In one case, AM is being used as a vehicle to secure additional funding for the library. In another case, AM has served as an inspiration to the staff of a major local public library in the South to think about ways to make its own collection of photographs more accessible to the public.

******

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
FREEMAN * AM and archival electronic resources in a school environment * Questions concerning context * Questions concerning the electronic format itself * Computer anxiety * Access and availability of the system * Hardware * Strengths gained through the use of archival resources in schools *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Reiterating an observation made by VECCIA, that AM is an archival resource made up of primary materials with very little interpretation, FREEMAN stated that the project has attempted to bridge the gap between these bare primary materials and a school environment, and in that cause has created guided introductions to AM collections. Loud demand from the educational community, chiefly from teachers working with the upper grades of elementary school through high school, greeted the announcement that AM would be tested around the country. FREEMAN reported not only on what was learned about AM in a school environment, but also on several universal questions that were raised concerning archival electronic resources in schools. She discussed several strengths of this type of material in a school environment as opposed to a highly structured resource that offers a limited number of paths to follow. FREEMAN first raised several questions about using AM in a school environment. There is often some difficulty in developing a sense of what the system contains. 
Many students sit down at a computer resource and assume that, because AM comes from the Library of Congress, all of American history is now at their fingertips. As a result of that sort of mistaken judgment, some students are known to conclude that AM contains nothing of use to them when they look for one or two things and do not find them. It is difficult to discover that middle ground where one has a sense of what the system contains. Some students grope toward the idea of an archive, a new idea to them, since they have not previously experienced what it means to have access to a vast body of somewhat random information. Other questions raised by FREEMAN concerned the electronic format itself. For instance, in a school environment it is often difficult both for teachers and students to gain a sense of what it is they are viewing. They understand that it is a visual image, but they do not necessarily know that it is a postcard from the turn of the century, a panoramic photograph, or even machine-readable text of an eighteenth-century broadside, a twentieth-century printed book, or a nineteenth-century diary. That distinction is often difficult for people in a school environment to grasp. Because of that, it occasionally becomes difficult to draw conclusions from what one is viewing. FREEMAN also noted the obvious fear of the computer, which constitutes a difficulty in using an electronic resource. Though students in general did not suffer from this anxiety, several older students feared that they were computer-illiterate, an assumption that became self-fulfilling when they searched for something but failed to find it. FREEMAN said she believed that some teachers also fear computer resources, because they believe they lack complete control. FREEMAN related the example of teachers shooing away students because it was not their time to use the system. 
This was a case in which the situation had to be extremely structured so that the teachers would not feel that they had lost their grasp on what the system contained. A final question raised by FREEMAN concerned access and availability of the system. She noted the occasional existence of a gap in communication between school librarians and teachers. Often AM sits in a school library and the librarian is the person responsible for monitoring the system. Teachers do not always take into their world new library resources about which the librarian is excited. Indeed, at the sites where AM had been used most effectively within a library, the librarian was required to go to specific teachers and instruct them in its use. As a result, several AM sites will have in-service sessions over a summer, in the hope that perhaps, with a more individualized link, teachers will be more likely to use the resource. A related issue in the school context concerned the number of workstations available at any one location. Centralization of equipment at the district level, with teachers invited to download things and walk away with them, proved unsuccessful because the hours these offices were open were also school hours. Another issue was hardware. As VECCIA observed, a range of sites exists, some technologically advanced and others essentially acquiring their first computer for the primary purpose of using it in conjunction with AM's testing. Users at technologically sophisticated sites want even more sophisticated hardware, so that they can perform even more sophisticated tasks with the materials in AM. But once they acquire a newer piece of hardware, they must learn how to use that also; at an unsophisticated site it takes an extremely long time simply to become accustomed to the computer, not to mention the program offered with the computer. 
All of these small issues raise one large question, namely, are systems like AM truly rewarding in a school environment, or do they simply act as innovative toys that do little more than spark interest? FREEMAN contended that the evaluation project has revealed several strengths that were gained through the use of archival resources in schools, including:

* Psychic rewards from using AM as a vast, rich database, with teachers assigning various projects to students (oral presentations, written reports, a documentary, a turn-of-the-century newspaper), projects that start with the materials in AM but are completed using other resources; AM thus is used as a research tool in conjunction with other electronic resources, as well as with books and items in the library where the system is set up.

* Students are acquiring computer literacy in a humanities context.

* This sort of system is overcoming the isolation between disciplines that often exists in schools. For example, many English teachers are requiring their students to write papers on historical topics represented in AM. Numerous teachers have reported that their students are learning critical thinking skills using the system.

* On a broader level, AM is introducing primary materials, not only to students but also to teachers, in an environment where often simply none exist, an exciting thing for the students because it helps them learn to conduct research, to interpret, and to draw their own conclusions. In learning to conduct research and what it means, students are motivated to seek knowledge. That relates to another positive outcome: a high level of personal involvement of students with the materials in this system and greater motivation to conduct their own research and draw their own conclusions.

* Perhaps the most ironic strength of these kinds of archival electronic resources is that many of the teachers AM interviewed were desperate, it is no exaggeration to say, not only for primary materials but for unstructured primary materials. These would, they thought, foster personally motivated research, exploration, and excitement in their students. Indeed, these materials have done just that. Ironically, however, this lack of structure produces some of the confusion to which the newness of these kinds of resources may also contribute. The key to effective use of archival products in a school environment is a clear, effective introduction to the system and to what it contains.