About the Project
Project Description
This project will present emblem books in an innovative digital environment and develop a portal for a key genre of Renaissance texts and images. Emblematica Online is fulfilling its goals through its three constituent activities: 1) Emblem Digitization: the complete digitization of two premier emblem collections of worldwide prominence; 2) The German Emblem Databases: the creation of extensive metadata with broad functionality for the German emblems of both institutions in mirror websites; and 3) The OpenEmblem Portal: the development of the portal as an open access research site incorporating book-level metadata from emblem digitization projects worldwide and emblem-level metadata from Illinois and the Herzog August Bibliothek (HAB). The OpenEmblem Portal hosted at Illinois will have a mirror portal at the HAB. The OpenEmblem Portal offers the ability to search and browse across significant levels of granularity, creating functional access to the entire collections of emblem books at Illinois and HAB, to book-level metadata for a number of projects worldwide, and to a large corpus of emblem-level metadata for German emblems from the collections of Illinois and the HAB. Because major search engines such as Google can find the data from these projects, the mass digitization undertaken for Emblematica Online will serve scholarly communities in Germany, the US, and beyond, for research and in higher education.
The term "semantic web" is so often used that it has almost become a meaningless buzzword. That is very unfortunate since a semantic web is exactly what the portal is spinning over a unique corpus of early modern imagery and texts. By gathering well over 10,000 specimens of one of the most popular and widespread art forms of the Renaissance and by offering access to its subject matter in unprecedented depth and detail, completely new forms of research become feasible. Creating a database of the mottos and indexing the meaning of the imagery and the visual means (situations, persons, objects) that were used to express it makes possible highly associative searching and browsing, which by its very nature offers the opportunity for what may be called "knowledge discovery." This concept, often used to describe new forms of research that become possible when biomedical or chemical data are collected in huge databases such as PubMed, will also be applicable to Emblematica Online. The essential analogy is that a large quantity of material is combined with sophisticated information about its content. Reliable quantitative information will become available about the occurrence of themes and motifs in artistic and literary sources, a hitherto unknown phenomenon. Scholars using this material will no longer have to describe many thousands of images to grasp their content; they can devote their energy to new research questions.
Methodology and Standards
Scanning
The project will involve digital capture of the entire book, cover to cover. The two institutions will scan all emblem books at 300 dpi in color with an archival copy as JPEG2000 (Illinois) or TIFF (HAB). The digital facsimiles of all emblem books digitized by the project will be available through the on-line catalogues of the respective institutions, which also means that all digital volumes will be searchable via web search engines such as Google. The data will also become part of the Illinois Harvest and the Open Archives Initiative, both at Illinois. For presentation on the internet the digital masters are reduced in size and converted into JPEGs, which allow researchers to investigate the resource in every scientifically relevant respect. At Illinois, the following observations apply. For books 14" x 9" or smaller, which represent the greatest part of our collection, we get 300 dpi or better. At the HAB, scanning is carried out using, depending on the size of the original object, either a Canon Marc III (21 megapixels) camera or a Nikon DX2s (12 megapixels).
Pilot digitization projects have allowed both Illinois and HAB to develop digitization standards and gain experience in digital reformatting and quality control. Illinois will leverage its ongoing relationship with the Open Content Alliance (OCA), with whom we have digitized more than 10,000 books to date. OCA uses trained staff working on custom-designed equipment, follows accepted best practices for setting and verifying scanner configurations for each book scanned, and uses sampling-based post-scan quality control techniques for all books digitized. Raw scans are saved as JPEG2000 files. Technical metadata detailing book-specific camera settings, skew angle, operator, date and time of scanning, etc., are saved as machine-readable XML. Checksums for all files are generated. Illinois downloads scanned page image files and does additional verification of quality as part of creating our added derivatives. While page images are captured and archived singly, Illinois has demonstrated the feasibility of stitching related page images together, e.g. to make viewable as a single image file all text and graphics associated with a given emblem. Such access derivative formats and views facilitate identification and use of resources. Illinois provides two digital copies of each work; for example, Meinhard's Geistliche Emblemata is available in the Internet Archive and is searchable in the German Emblem Books Project.
The HAB explicitly declares that the project will be fully compliant with the so-called "Praxisregeln" (rules of practice) of the DFG. The library guarantees long term availability and reliable quoting. Scanning at the HAB is carried out in-house by using the so-called book reflector, or Buchspiegel. It operates at a 45-degree angle and was developed to digitize precious and rare books without causing stress or damage to the valuable objects during the reproduction process. The HAB has gained long experience in in-house scanning in various projects funded by the DFG such as Festival Culture Online, Digitization of Scientific, Technical and Medical Literature Read by Leibniz, Archaeological Finds in the Early Modern Period, and, most recently, the mass digitization project Dünnhaupt digital. Currently, more than 5,000 editions of the early modern period with almost one million pages are online at the HAB.
Metadata
The research teams will create book-level metadata from their on-line catalogues and transcribe all emblem mottos. The on-line catalogues have already been checked for quality control and will provide accurate data. However, in each case the digital copy must be checked against the bibliographic description. Some volumes contain several editions that must be carefully differentiated in the bibliographic description. While the main goal of the project is access to the individual emblem, the project must also provide the context. Therefore, the project presents the entire book in which the emblems are contained. To make the electronic books easier to navigate, we will also provide structural metadata marking up page numbers, covers, flyleaves, manuscript annotations, tables of contents, divisions, paragraphs, and illustrations other than emblems, such as printers' devices. At the HAB the project will draw on the structural data list provided by the so-called DFG-viewer project. The HAB will make the necessary structural metadata METS files (containing MODS descriptive metadata) to comply with requirements of the DFG book viewer. Illinois will also be creating METS files (with MODS) containing structural metadata for all digitized books and for the German subset only, but Illinois' structural metadata will be designed to work with a different, somewhat less sophisticated METS Navigator from Indiana University. We will specially enrich the structural metadata detail of our METS files for the German subset of Illinois emblem books to be roughly as expressive and detailed as those created by the HAB, but meant to work with the Indiana University METS Navigator application. Before the end of the project we will analyze samples of the HAB METS files against those from Illinois and provide an assessment of what it might take to transform ours into DFG viewer compliant METS.
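As an illustration of the kind of file described above, the following is a minimal sketch of a METS document that wraps a MODS title and lists page-level divisions in a physical structure map. The function name, the division types, and the sample title are hypothetical; the actual Illinois and HAB profiles are not reproduced here, and a complete METS file would also need a file section linking each division to its page image.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"


def build_mets(title, pages):
    """Build a minimal METS tree.

    pages is a list of (order, label, div_type) tuples; the div_type
    values ("cover", "emblem", ...) are illustrative, not a project vocabulary.
    """
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("mods", MODS_NS)

    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Book-level descriptive metadata: MODS wrapped inside a METS dmdSec.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", ID="DMD1")
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # Structural metadata: one div for the book, one child div per page.
    struct = ET.SubElement(mets, f"{{{METS_NS}}}structMap", TYPE="PHYSICAL")
    book = ET.SubElement(struct, f"{{{METS_NS}}}div", TYPE="book", DMDID="DMD1")
    for order, label, div_type in pages:
        ET.SubElement(
            book,
            f"{{{METS_NS}}}div",
            TYPE=div_type,
            ORDER=str(order),
            LABEL=label,
        )
    return mets


if __name__ == "__main__":
    tree = build_mets(
        "Example emblem book",
        [(1, "Front cover", "cover"), (2, "Emblem 1", "emblem")],
    )
    print(ET.tostring(tree, encoding="unicode"))
```

Comparing two METS profiles, as proposed for the HAB and Illinois files, would largely come down to comparing the TYPE vocabularies and nesting used in such structure maps.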
Illinois will try to implement the viewer in a way that access to the German books through our respective book viewers will be similar, if not exactly the same. Illinois will also examine what it might take to transform metadata for a book so it could be viewed in either viewer. Both book viewers have somewhat similar approaches, and we want to learn more about how they relate.
The HAB has converted metadata from the Munich emblem project, funded previously by the DFG, into the Emblem Schema format; these data are ready to be integrated into the OpenEmblem Portal. Professor Dietmar Peil, PI for this project, has placed the metadata at the project's disposal (appendix). Further programming and conversion work concerning these data will be conducted by the HAB.
At Illinois, digitized books also will be made discoverable through our Illinois Harvest Portal, and metadata describing books digitized will be made available for open harvesting through our Open Archives Initiative-Protocol for Metadata Harvesting Data Provider. All German metadata from both projects will be made accessible on the OpenEmblem Portal. Besides making German emblem books accessible through CONTENTdm, all emblem books digitized by Illinois also will be browseable using the METS Navigator Web-based application developed by Indiana University. This allows cover-to-cover page access to an entire book, with all pages available in both medium resolution and high-resolution (near as-scanned dpi) views. Especially for longer works, this allows quick navigation to any page while preserving the ability to look closely at any scanned image.
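For readers unfamiliar with OAI-PMH harvesting, the sketch below builds a ListRecords request URL of the kind a harvester would send to a data provider. The base URL and the set name are placeholders, not the project's actual endpoint; `verb`, `metadataPrefix`, and `set` are standard OAI-PMH request arguments, and `oai_dc` is the Dublin Core format every OAI-PMH provider must support.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return an OAI-PMH ListRecords request URL.

    base_url is a placeholder for a data provider's endpoint;
    set_spec optionally restricts the harvest to one set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and set name, for illustration only.
    print(list_records_url("https://example.org/oai", set_spec="emblems"))
```

A harvester would fetch this URL, parse the XML response, and repeat the request with the returned resumption token until the provider signals that the list is complete.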
The German emblem books of both projects will be outsourced for Iconclass notation and the creation of a caption. Iconclass is a hierarchical systematic notation with multi-language thesauri; thus all metadata from these notations will be available in both English and German, as well as other languages. The Iconclass system is available on-line in a new version that is being developed into a freely available web service. All projects working in the OpenEmblem group have unanimously approved the classification system Iconclass. It is an internationally recognized system for indexing art objects and provides a hierarchical depth of indexing appropriate to emblem research. The projects expect to index with Iconclass notations in order to provide powerful search capabilities via the OpenEmblem Portal. They intend to make the German emblems available through book-level metadata (author, title, place, year, publisher), a union catalogue of the mottos in a searchable database, and Iconclass (that is, emblem-level) metadata from the pictura. This level of indexing will allow scholars to find the research materials they need and locate emblem objects for further analysis. Additionally, the OpenEmblem Portal will host a database of all book-level metadata for digitized emblem books from the universities of Utrecht and Glasgow. The transcription of mottos requires subject expertise in early modern German, the ability to read old German typeface, and the skill to transcribe the early modern German motto and provide a standard German translation for searchability in the database.
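Because Iconclass notations are hierarchical, a search for a broad concept can also retrieve emblems indexed with narrower ones. The sketch below shows the idea under a deliberately simplified assumption: each additional character of a notation is treated as one level deeper in the hierarchy, and bracketed text is ignored. Real Iconclass has further rules (doubled letters, keys, bracketed names) that this does not model, and the emblem identifiers are invented for illustration.

```python
def broader_notations(notation):
    """Return the notation and all broader prefixes, broadest first.

    Simplification: text from the first "(" onward is dropped, and every
    remaining character is treated as one hierarchy level.
    """
    base = notation.split("(", 1)[0]
    return [base[:i] for i in range(1, len(base) + 1)]


def search(index, query):
    """Return ids of emblems having a notation at or below the query notation.

    index maps a (hypothetical) emblem id to its list of Iconclass notations.
    """
    return sorted(
        emblem_id
        for emblem_id, notations in index.items()
        if any(query in broader_notations(n) for n in notations)
    )


if __name__ == "__main__":
    index = {"E000001": ["25F23(LION)"], "E000002": ["11H"]}
    print(broader_notations("25F23(LION)"))
    print(search(index, "25F"))
```

Storing the expanded list of broader notations alongside each emblem at indexing time is one common way to make such hierarchical queries fast; whether the portal does this is not stated in the source.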
Organization of and Access to Material
In the workshops on emblem digitization held at the HAB and Illinois in 2003, 2005, and 2007, all groups formally agreed to international standards, known as the "template," for emblem digitization. The template is intended as a means of identifying the basic requirements of description, image capture, and textual transcription that are appropriate in an emblem digitization project. All practices proposed here are compliant with the international emblem community as set down in the "Template" (formerly known as the "Spine").
Emblems and their text descriptions present special challenges in the creation and linking of images and textual information that we propose to address by using two metadata schemes: XML TEI and the Dublin Core element set. Currently in the metadata community, there exist only a handful of schemas for describing various types of objects and literature. A new Emblem Schema has been developed by Dr. Thomas Staecker, HAB, for validation of emblem metadata. The controlled vocabulary is inherent in the Iconclass notation. On the basis of the "template," the HAB has developed a special emblem XML schema in order to allow data exchange between the various projects, which in part vary considerably in their application of formats (various TEI or DC dialects, or non-XML data, such as data in databases). Partners planning to support the OpenEmblem group by sending data to the portal can convert their data according to the "electronic template." An XML schema validation tool is provided to help projects create conformant metadata. In a pilot project, data from various sources have been successfully converted to this format and then integrated into the union motto database. Emblem metadata from the portal will be offered according to this schema via OAI. The HAB will extend its OAI interface (already delivering DC, MODS and METS) with this emblem format. Both projects will include the emblem namespace in OAI, allowing other projects to harvest our emblem data.
Storage, Maintenance, and Protection of Data
Because Illinois will be digitizing our emblem books in collaboration with the Open Content Alliance (OCA), part of the Internet Archive, a copy of all page scans will be maintained (publicly accessible) by the Internet Archive. The Internet Archive, founded in 1996, has a proven track record of maintaining large amounts of digital content long-term. The Internet Archive currently stores more than 3 PBs of data. Data is stored both on hard disk and on DLT Brand tape, with tape storage being phased out in favor of redundant/mirrored online storage. While pro-active format migration has not yet been implemented by the Internet Archive, plans for this are currently under development.
As a complement to OCA storage and archiving of scanned content, Illinois currently captures and downloads copies of all scans (JPEG2000 format) and derivatives created by OCA from our books as a matter of course. For this project, these files will be stored on Illinois' Grainger Library's EMC CX300 Storage Area Network (SAN) array (typical DAE: 15 disks x 300 GB each) or possibly on an equivalent SAN located in Illinois' Main Library (current SAN capacity across the Library is about 15 TB), with SAN-stored files backed up to tape on a routine and ongoing basis (we are currently evaluating tape-alternative near-line storage options).
To facilitate eventual long-term archiving and proactive preservation of raw page scans and all associated metadata, METS files are generated for each scanned book object, including both book-level descriptive metadata in MODS format and an enumeration of page scans with technical and PREMIS metadata elements. (The METS format used here is adapted from one we developed as part of our participation in the Library of Congress National Digital Information Infrastructure Preservation Program [NDIIPP] activity.) Per page scan records embedded in METS files include checksums in 3 different hashes for each digital page scan file. For those page scans having page-level/emblem-level metadata, these metadata will be included in the METS files created for each book. All metadata (including book-level MARC and METS with embedded MODS files, as well as project-specific book and emblem-level metadata) and all page scans associated with each emblem book digitized for this project will be maintained both online and copied onto gold archival DVDs, bar-coded and stored in environmentally appropriate conditions at the Library's off-campus remote storage facility.
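Recording several checksums per page scan can be sketched as follows. The source does not say which three hash algorithms are used, so the choice of MD5, SHA-1, and SHA-256 here is an assumption, as is the function name. The file is read once in chunks so that large image files need not fit in memory.

```python
import hashlib


def file_checksums(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Compute several checksums of one file in a single pass.

    The default algorithm names are an assumption; any names accepted
    by hashlib.new() can be supplied instead.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Recomputing these values later and comparing them with the values stored in the METS file is the usual way to detect silent corruption of an archived scan.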
At this time the Illinois Library does not yet have in place OAIS-conformant Trusted Digital Repository (TDR) technical architectures and administrative policies. These are currently being developed in collaboration with CITES, our campus academic networking office, and our Institute for Computing in Humanities, Arts, and Social Science (I-CHASS), a collaboration of the Illinois Informatics Institute and the National Center for Supercomputing Applications (NCSA). Though we cannot at this time promise a fully vetted TDR before completion of the proposed project, we do anticipate having, at least in its earliest form, a sustainable OAIS-based digital preservation implementation before the end of the proposed project.
The HAB stores its digital images redundantly (RAID 5) as uncompressed TIFF, an archive format which has been widely adopted and can be read by virtually all image software. Each digital image averages 23 MB. At present its archival system (Windows 2003 Server, DELL PE 2850; RAID arrays: EonStor A24U, 2*easyRAIDQ16+, 2*easyRAIDS16; tape library: Adic Scalar i500) contains 20 TB of data, which are stored on the RAID arrays. The calculated increase is 7 TB each year. Backups are written to magnetic tape (LTO3) in an automated process using the tape library. All digital copies are catalogued together with pertinent metadata in the electronic catalogue of the library. Backups are stored at different locations to prevent losses, e.g. by fire.
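The figures above can be turned into a rough capacity projection. This is only a back-of-the-envelope sketch: it assumes linear growth at the stated 7 TB per year, decimal units (1 TB = 1,000,000 MB), and that all growth consists of images of the stated average size. The function names are illustrative.

```python
MB_PER_TB = 1_000_000  # decimal units, an assumption


def projected_storage_tb(years, current_tb=20, growth_tb_per_year=7):
    """Projected archive size in TB after the given number of years."""
    return current_tb + growth_tb_per_year * years


def images_per_year(growth_tb_per_year=7, avg_image_mb=23):
    """Whole number of average-sized images that fit in one year's growth."""
    return (growth_tb_per_year * MB_PER_TB) // avg_image_mb


if __name__ == "__main__":
    print(projected_storage_tb(3))
    print(images_per_year())
```

Under these assumptions, the yearly growth corresponds to roughly three hundred thousand new page images.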
Webpage design based in part on designs by MyWebResource and Bartosz Brzezinski.