Digital Resource Aggregation
These publications consider how resources are preserved, accessed, interacted with, and aggregated in the space of the digital archive. Some of these articles consider the structure of the archive (including the construction of metadata) and how it plays a crucial role in the way a scholar accesses and interacts with information. Other publications discuss reading habits on screen, including the question of effective annotation in a digital environment. Additionally, the shifting practices of scholarly publication and research are interrogated. Scholars consider the differing qualities of print and digital publications (such as the static life of print and the dynamic, ever-changing qualities of the digital) and explore how these differences affect publication. Finally, some of these articles consider aggregation, specifically the challenges of interoperability and data collection that are inherent in working across a diverse cross-section of digital resources.
Biblio Citation | Abstract |
---|---|
The Semantic Web. Scientific American. 284, 35–43 (2001). | A new form of web content that is meaningful to computers will unleash a revolution of new possibilities. |
Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials. Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries. 262–270 (2005). | Arwen Hutt and Jenn Riley discuss the heightened interest in and development of aggregated cultural heritage resource collections. They focus on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), whose goal is to provide a “low-barrier method for the sharing of metadata” between centralized repositories. Hutt and Riley conduct an in-depth study of how the UIUC Digital Gateway to Cultural Heritage Materials uses the creator, contributor, and date fields of Dublin Core. Their results show that while there is high compliance and accuracy in the use of the date and creator fields, the contributor field has a high level of inappropriate values, indicating that there is confusion over this concept. Hutt and Riley emphasize that useful, shareable metadata relies on communicative relationships between OAI-the object and OAI-search environment. They argue that there is a disconnect between the structure of Dublin Core and the descriptive needs of institutions. To remedy this, they suggest (i) removing the requirement of Dublin Core, (ii) developing best practice documentation, (iii) educating metadata providers, and (iv) facilitating the sharing of resources between service providers. |
Search and discovery across collections: the IMLS digital collections and content project. Library Hi Tech. 22, 307–322 (2004). | Cole and Shreeves open this article by establishing that while the proliferation of digital resources has increased opportunities for engagement with materials online, the magnitude of this potential "is tempered for many end‐users by the difficulties in locating specific, desired information resources within the almost overwhelming aggregation of information now available." In this essay, Cole and Shreeves examine a grant-funded research initiative tasked with addressing these issues of aggregation. The goal of the project is to collect "item-level metadata for digital collections and content associated with IMLS (Institute of Museum and Library Services) and NLG (National Leadership Grant) projects." Its objective is to build a network or map of resources in order to situate them in context and make them more accessible to end-users. Cole and Shreeves argue that digital resources need to strike a middle ground between Google-like approaches and the laborious undertaking of "large scale (and accordingly high‐cost) monolithic digital library solutions." In conclusion, Cole and Shreeves reiterate that the "advent of the Web and other related digital technologies presents a good opportunity for increased content sharing and collaboration in the development of information systems." Creating frameworks that make this content more visible and accessible will only increase its potential for research. |
The Next Big Thing in Humanities, Arts and Social Science Computing: 18thConnect. HPCwire (2008). | For the humanities scholar who may have only recently mastered library and archival finding aids beyond the archaic card catalog, the possibility of retrieving source materials at the flash of a keystroke (well maybe a few...) is very heady stuff. |
Metadata Quality for Federated Collections (2004). | The aim of the essay is to showcase an "approach to conceptualizing, measuring, and assessing metadata quality" by presenting the results of an empirical study on the quality of metadata across large corpora. The methodology of this study involved creating a framework of quality dimensions, which was then used to assess the quality of the data. These dimensions were intrinsic information quality, relational information quality, and reputational information quality. Stvilia et al. concluded in their study that poor quality metadata is equal to a value loss, that quality is equal to the amount of interaction with an object, and that quality is also equal to the effectiveness (and efficiency) of the metadata. In a random sample of 150 OAI Simple DC records (taken from a total corpus of 154,782 records), the research team identified six major quality issues: "(1) lack of completeness, (2) redundant metadata; (3) lack of clarity; (4) incorrect use of DC schema elements or semantic inconsistency; (5) structural inconsistency and (6) inaccurate representation." In conclusion, Stvilia et al. state that their future research will "focus on user valuations of metadata quality." |
Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on the Semantic Web: Theory and Technology. 1, 1–136 (2011). | Heath and Bizer open their monograph by illustrating the predominance and prevalence of data in our modern world. Heath and Bizer argue that just "as the World Wide Web has revolutionized the way we connect and consume documents, so can it revolutionize the way we discover, access, integrate and use data." In order to facilitate this, they propose linked data. Heath and Bizer assert that a "key factor in the re-usability of data is the extent to which it is well structured." Heath and Bizer propose that well-structured, linked data across the Web would enable "the extension of the Web with a global data space." |
How Scholarly Is Google Scholar? A Comparison to Library Databases. College & Research Libraries. 70, 227–234 (2009). | Howland, Wright, Boughan, and Roberts conduct a comparative study of resource results from Google Scholar and proprietary library databases. The objective of their research is to address some of the central critiques and queries of Google Scholar: Are Google Scholar result sets more or less scholarly than licensed library database result sets? Does the scholarliness of Google Scholar vary across disciplines? Is comprehensiveness of content the primary indicator of a resource’s usefulness? Howland, Wright, Boughan, and Roberts position their work against other studies by arguing that they are much more interested in evaluating the results than in measuring the database. The authors developed a modified Kapoun collection-assessment rubric as the backbone of their research. The study's results demonstrated that, on average, resources found on Google Scholar have higher scholarliness ratings than those found in the library databases alone. Additionally, there was no statistically significant difference in the results among disciplines. By way of conclusion, Howland, Wright, Boughan, and Roberts suggest several avenues for further study. |
Getting the Word Out: Making Digital Project Metadata Available to Aggregators. First Monday. 11 (2006). | In this action-oriented article, Hillman discusses the benefits of aggregated metadata for collections. Hillman argues that digital library collections currently rely on search engine results to promote their materials, but that this does not provide enough exposure. Hillman suggests that joining an aggregated database (specifically under the OAI-PMH) provides another avenue for users to access resources that might be missed when searching only through an individual portal. Hillman acknowledges several obstacles (including technical limitations) but suggests that project leaders use the following checklist to best prepare their collection for aggregation: adhere to the Simple Dublin Core metadata schema; consider context; use standard vocabulary; enforce consistency; and document practices. |
Killer Applications in Digital Humanities. Literary and Linguistic Computing. 23, 73–83 (2008). | In this article Patrick Juola addresses stagnancy in digital humanities. Juola argues that, even though the discipline has been around for over forty years, digital humanities is still considered to be an emerging sub-discipline of the larger humanities. Juola acknowledges that this presents major problems for digital humanities scholars because tenure-track positions are rare, and digital humanities publications are significantly under-read and under-valued. Juola analyses these "patterns of neglect" and tries to identify their cause. Juola argues that a lack of participation and a lack of awareness are two main factors. Juola speculates that the lack of awareness and use of digital humanities tools is motivated by a mismatch between the tools needed/wanted by the humanities generally and the tools being produced. Juola suggests that digital humanities scholars focus on creating a "killer application": a tool whose uses justify its development and support. Juola identifies a couple of major areas that he thinks could use a super tool. To conclude, Juola advocates for the continued financial support of the digital humanities in order to fuel the creation of a "killer application". |
Taking the Long View: From e-Science Humanities to Humanities Digital Ecosystems. Historical Social Research / Historische Sozialforschung. 37, 147–164 (2012). | In this article, Anderson and Blanke examine the digital research infrastructures and technological structures at work in the humanities. Anderson and Blanke argue that the development of technology must be understood as evolving alongside social, political, and legal elements. Despite the astonishing increase in digital content, Anderson and Blanke see the humanities clinging to traditional methods and only embracing the digital in fragmented, segregated, and conservative applications. This failure to penetrate mainstream humanities is seen as a major barrier. Anderson and Blanke then turn to e-science to uncover how this movement has transferred successfully and unsuccessfully to the humanities. By way of conclusion, Anderson and Blanke propose understanding the humanities as a new digital ecosystem. |
The Semantic Web Revisited. IEEE Intelligent Systems. 21, 96–101 (2006). | In this article, Berners-Lee, Hall, and Shadbolt examine the development of the semantic web over the last 50 years. The semantic web is characterized as a "web of actionable information" composed of documents for humans to read and data for computers to manipulate. The authors argue that the semantic web requires ontologies and data integration to function properly. The article details data frameworks, types of ontologies, and the rise of folksonomies. By way of conclusion, Berners-Lee, Hall, and Shadbolt point towards a new wave of development in the methods, challenges, and techniques of the semantic web. |
A Scholar’s Guide to Research, Collaboration, and Publication in NINES. Romanticism and Victorianism on the Net (2007). | In this article, Bethany Nowviskie discusses the rationale, development, and usefulness of the NINES tool Collex. The article functions both as a scholarly exploration of Collex and as a user handbook. Nowviskie recounts her personal experience working as the design editor for the Rossetti archive and overhauling the digital project in 2004. As a component of this redesign, Nowviskie explains how she set out to design a tool that "would combine the best elements of social bookmarking or collecting systems (such as Connotea and del.icio.us) and of specialized online curation and exhibit architectures." This tool became Collex. Nowviskie's step-by-step explanation of working efficiently and effectively through Collex and NINES is a great resource for any beginning user. Nowviskie clearly describes the various features in Collex such as clouds, views, feeds, and permalinks. Nowviskie concludes this article by accounting for the trajectory of Collex and its role in the NINES environment. |
The Continuum of Metadata Quality: Defining, Expressing, Exploiting (2004). | In this article, Bruce and Hillmann explore the challenges of defining metadata. They argue that past efforts on the part of librarians have been relatively inconsequential in defining the parameters and standards for metadata. Bruce and Hillmann assert that there is a tension between metadata efficiency and quality that makes setting standards a particular challenge. Quality metadata is characterized in this article through the evocation and explanation of several criteria: completeness, accuracy, provenance, conformance to expectations, consistency, coherence, timeliness, and accessibility. In order to assist in the practical implementation of these standards, Bruce and Hillmann construct a table with each element, a series of questions that should be asked when addressing that element, and where the information should be located in the metadata schema. |
Iter: Where Does the Path Lead? Early Modern Literary Studies: A Journal of Sixteenth- and Seventeenth-Century English Literature. 5, 2.1–26 (2000). | In this article, Iter founder William Bowen traces the inception, progress, and future goals of the Iter project, developed out of the University of Toronto's Centre for Reformation and Renaissance Studies, the Renaissance Society of America, and the Arizona Centre for Medieval and Renaissance Studies based at Arizona State University. Bowen begins by recounting the rapid growth of Iter between its founding in 1995 and its establishment as a main database in 1998. Within this period the Iter project received two substantial Mellon grants and financial and labour support from several other organizations, and saw its database subscriptions reach 115 institutions. Bowen continues by discussing the search and view functions of the site. With a goal of reaching 500,000 records by the end of 2001, Bowen explores the thoughtful design that has accompanied the creation of Iter's resource database. In conclusion, Bowen discusses the future plans for the project. Among these is the goal of providing access to materials "in new and exciting ways which are not always possible or practical in a print environment." |
Culture and technology: the way we live now, what is to be done? Interdisciplinary Science Reviews. 30, 179–189 (2005). | In this article, Jerome McGann argues that the crisis in the humanities is not the result of a crisis in critical theory or cultural studies, but rather of a failure to fully embrace and switch into a digital mode. He argues that the humanities should cultivate a realistic attitude and accept the inevitability of moving into online scholarly production, which could maintain its reliability by practicing online peer review. McGann points to the NINES (Networked Infrastructure for Nineteenth-Century Electronic Scholarship) project, which implements “integrated online peer-reviewed research in nineteenth-century British and American studies,” as an example of a platform that has made use of digital resources and offers a “functioning, standards based model for uniformly coded digital materials, along with a suite of computerized analytic and interpretive tools.” McGann also shows how the reluctance to switch to a digital mode stems from institutional and political reasons, rather than technical or economic ones. |
Metadata practices and implications for federated collections. Proceedings of the American Society for Information Science and Technology. 41, 456–462 (2004). | In this article, Palmer and Knutson present empirical survey evidence on the creation of federated metadata. Palmer and Knutson begin by arguing that digital collection research has shifted focus from content development to interoperability and that this transition calls for new research and empirical studies. Palmer and Knutson's work helps to fill this gap by presenting the results of their multi-method, large sample survey of institutions working to develop federated metadata. The results revealed that over 50% of the institutions used Dublin Core to structure their metadata and 34% of the survey respondents used multiple schemes (including MARC and TEI). The survey respondents noted that collaborative development of metadata was challenging but beneficial, overall, to the projects. Palmer and Knutson conclude by suggesting that institutions use the sub collection feature to denote unique characteristics in the collection. |
Collection Definition in Federated Digital Resource Development. Proceedings of the American Society for Information Science and Technology. 43, 1–16 (2006). | In this article, the authors explore the use and meaning of the word "collection." The authors observe that creating individual collections with individual metadata definitions is acceptable on a small scale, but is no longer a viable option once the shift is made to federated or distributed collections. The authors use the IMLS Digital Collection Registry as a case study for their research. They explore the results of a staged, multi-method research strategy that addresses issues such as collection scope, collection consistency, collection enrichment, and audience. Finally, in their discussion the authors conjecture about the influence of the digital world on federated collections. |
Toward principles for the design of ontologies used for knowledge sharing? International Journal of Human-Computer Studies. 43, 907–928 (1995). | In this article, Thomas Gruber explores the development of ontologies for knowledge sharing. Gruber defines an ontology as an explicit conceptualization, where conceptualization means a collection of like entities. Gruber emphasizes that most ontologies are about consistency, not completeness. Because ontologies are designed, researchers can choose how something is represented. Gruber advocates for the use of five different criteria when developing an ontology: clarity, coherence, extendibility, minimal encoding bias, and minimal ontological commitment. Gruber acknowledges that some "tradeoffs" must be made because not all criteria work together. Gruber then transitions into a series of case studies where he creates and critiques examples of ontologies in both engineering and bibliographical situations. Gruber concludes by articulating that it is critical to define conceptual entities instead of just specifying elements when creating an ontology. |
Using NINES Collex in the Classroom. The Chronicle of Higher Education Blogs: ProfHacker (2010). | In this blog post published on the popular ProfHacker feed, Amy Earhart discusses the learning benefits of integrating the NINES Collex into the literature classroom. Earhart's course requires students to navigate the NINES database to locate six items they could "use in a research paper on a topic related to our class readings and discussions." The students assemble these items into an exhibit with analysis and a conclusion. Earhart argues that the process of this project, not the outcome, was her main focus in its creation. Earhart uses the common hiccups and errors made by students in using databases as teaching moments to emphasize quality research practices. By exposing the technology and economics behind scholarly databases, Earhart argues, she helped students become more efficient and informed researchers. |
Creating an Online Portal into the Medieval World. The Abstract (2012). | In this brief article Matt Shipman provides an introduction to the Medieval node of ARC: Medieval Electronic Scholarly Alliance (MESA). Shipman notes that MESA covers the period between 450 and 1450 AD and is the first aggregated research platform for this period. Shipman emphasizes the usefulness of this resource and briefly addresses the project's ability to avoid violating copyright. |