Humanities and Cyberinfrastructure: Changes in Disciplinary Practice
Coalition for Networked Information, Portland, Oregon, December 9th
Closing plenary panel, with Don Waters and Mark Kornbluh
1. Changes in disciplinary practices within the humanities, provoked by networked information technology.
To begin with data, consider Elaine Toms' survey on humanities scholars' use of electronic texts (carried out with Ray Siemens, Stéfan Sinclair, Lynn Siemens, and Geoffrey Rockwell). This is work in progress, under the auspices of the Text Analysis Portal for Research (http://tapor.ualberta.ca/), funded with $6M Canadian by the Canada Foundation for Innovation:
"As of November 15th, ninety-six scholars (half male and half female) from more than a dozen countries had responded to the survey. Three quarters were under the age of 45 and most were long term and frequent users of computers and the Web. These respondents came from a range of disciplines working in a range of genre (mostly prose) and primarily using textual material for their research (it should be noted that the survey was especially directed at text-based humanities scholars; this focus is something that subsequent surveys may wish to broaden).
Over 80% use e-text and about half use text analysis tools. In general they believe that e-text are available for their use and expect to find them downloadable off the Web. They prefer to find them in a stable, legal form that is freely available from a reliable institution. In terms of mark-up, respondents appear to be a bipolar group with half expecting to acquire text with no mark-up and half with rich XML.
In general, respondents believe that they need text analysis tools, although not complex tools, and are not happy with the tools that are currently available. Somewhat surprisingly, over 50% did not know about commonly available tools such as TACT, WordCruncher and Concordancer. The one most highly used was TACT but few found it useful. In addition to our list of about ten tools, participants added another two dozen tools that they employ in their work. These included tools such as the Wordsmith Tools as well as common Microsoft Office products such as Word and Access.
We inquired about their collaboration and communication habits. Most use e-mail regularly and subscribe to listservs. But they tend to work as solitary scholars, rarely collaborating with their own graduate students and do not see the need for collaborating with other scholars. That said, they like to communicate with other scholars at various points in the research process. They share some of their materials, but tend not to share notes and tools, although they expect others to share tools."
So: all of the changes I identify in what follows are supported by observation, but the Toms survey makes it clear that some are more widespread than others. Still, I would say that networked information technology has provoked the following changes in humanities scholarship (in varying degrees) [AHR cover article in December, out of the Valley of the Shadow]:
A. A growing expectation that adequate information resources should be freely available online, and that exceptional ones should be licensed and provided online by the library
B. Much greater use of images in research and teaching, in previously "textual" disciplines
C. A return to primary/archival sources
D. Collaboration, interdisciplinarity & community (Romantic Circles, H-Net, Blake Archive, Stoa Consortium, Perseus, etc.) including collaboration with computer professionals, librarians, others outside the discipline.
E. Increasing awareness of issues of scholarly communication, including copyright, ownership of research results, permanence, authenticity and reliability, audience.
F. Emerging understanding of the importance of ontologies for modeling disciplinary knowledge
G. A growing understanding of the importance of maps and GIS in a range of applications, especially in modeling processes that unfold over time, in a particular space
H. A growing interest in high-resolution models and reconstructions of physical structures, building sites, cities, landscapes, etc.
I. A growing interest in live internet-mediated collaborative performances in theater and dance.
Then there are the changes for which there's currently little evidence, but which we might predict, based on the foregoing:
A. Demand for a national humanities digital library, and a concomitant demand for tools that would allow us to *do* things (beyond searching and browsing) with digital libraries, like text-mining and visualization, search and retrieval of non-textual data (images, music, video, etc.). Demand for tools and standards to allow for the stand-off reprocessing (e.g., markup, annotation, etc.) of content in digital libraries.
B. Demand for better tools for annotating, comparing, sharing, overlaying, excerpting, and analyzing the semantic content of images. Demand for extremely high-fidelity imaging. Demand for extremely detailed information about the provenance and production of digital information, and demand for tools to authenticate it.
C. Demand for uniform terms of access to and use of digital representations of material from libraries, archives, and museums, for education and research. Questions of transferable rights are likely to surface here, too. Demand for better software for OCR of pre-modern and handwritten materials, and for video transcription and annotation.
D. A demand for better facilities to support online collaboration, e.g. access grid/conferencing tools, systems for managing project workflow, peer review, and other distributed, collaborative processes, better tools for sharing applications, annotating web-based materials, excerpting with metadata, etc., tools for multilingual collaboration.
E. Move toward open-access publishing, value placed on library stewardship of digital content
F. Greater involvement by scholars in the disciplines with the process of establishing standard schemas for interchange of disciplinary information; interest in interdisciplinary ontologies, perhaps organized around data types; interest in formal expressions of the semantics of markup
G. Interest in large-scale, user-manipulable models of historical processes, events, etc. driven by high-resolution data, which implies a need for high-level computing resources
H. Need to integrate very different types of models (buildings, landscapes, ornate objects) in a single multi-scale environment
I. Need for very high bandwidth
2. Based on 1), what would an advanced cyberinfrastructure for the humanities and social sciences have in common with cyberinfrastructure for science?
A. High Bandwidth
B. High-speed computing resources
C. Facilities for exploration and discovery as well as experimentation and testing of hypotheses.
D. Data available for tools that may be running elsewhere, for real-time manipulation
E. Tools and collections (distributed, federated) available over the web
F. Open-access as a design principle
G. Need for expert human resources to support the full exploitation of cyberinfrastructure in the humanities (and social sciences)
H. Need for representation of humanities in specialized applications programming centers (focused on visualization, for example, or data mining)
I. Need for support for standards development and maintenance
What needs or characteristics would be peculiar to the humanities?
Don has already mentioned a need for access to semantic content and for pluralistic content management systems. Mark has already mentioned the need for a National Humanities Digital Library, and for campus-level support of computational humanities. I'd add some other things:
A. Libraries as data repositories
B. Less emphasis on the currency of information, more emphasis on its preservation
C. Multi-linguality as a design requirement
D. Auto-didactic tool design (necessary for non-computational users)
E. In many (but not all) cases, a focus on complexity rather than scale (though even relatively simple operations on complex data will rapidly scale up in terms of computational requirements): if the humanities and social sciences have a bid to become driver applications for computer-science research, it is probably on this point.
F. Much lower requirements for security than in parts of the sciences (say, defense-related research)
G. Compared to the sciences, more critical examination of visualizations and other representations of patterns, processes, artifacts, and results with respect to both the artifacts of the tools used and the assumptions underlying their design
What should be disciplinary, what should be institutional?
Disciplines should commission and supervise the development and maintenance of ontologies, at the disciplinary level. Broader institutions (ACLS?) might take on the task of commissioning and supervising interdisciplinary ontologies and cross-walks between ontologies.
Institutions should transplant and carry out their traditional missions with respect to information in new media--publishers should publish, libraries should store and provide access, etc. They will need to be capitalized to do this, by their provosts.
Funding agencies need to look after funding shared infrastructure--the human resources dedicated to shared systems and applications, for example.
Science funding agencies need to recognize that they are building not only the science and engineering infrastructure for information, teaching, and research in the 21st century, but the national infrastructure for information, teaching, and research. They need to enfranchise humanities and social sciences in the process from the beginning--because of what we know, because of the problems we bring into focus, because of what we can contribute to designing an effective, sustainable, and inclusive cyberinfrastructure.