Latin American Digital Initiatives

The Latin American Digital Initiatives (LADI) repository is the Benson Latin American Collection’s primary access portal for collections created or acquired through post-custodial partnerships. I contributed in various capacities to both versions of the LADI site, as well as to the creation of the digital collections it contains.

LADI first launched in 2015 and was built to provide digital access to collections held by three project partners in Central America. These collections were digitized and described on-site by partner teams in El Salvador, Nicaragua, and Guatemala, with technical assistance provided by Benson archivists. I joined the project as a graduate student in 2015 and contributed to quality control (QC) of the collections’ images and metadata. A snapshot of the original version of LADI is available via the Internet Archive’s Wayback Machine, although the collection images themselves are not included.

Screenshot from the first version of the LADI repository, welcoming users to the site and a sample image of a Salvadoran resistance poster, showing workers raising hammers and sickle
The welcome message greeting users to the first version of LADI, launched in 2015

The first phase of the project, funded by a Mellon Foundation planning grant, was followed by a second phase funded by another Mellon grant beginning in 2017. This second phase added collections from Mexico, Colombia, and Brazil, trilingual metadata functionality, and enhanced searching and browsing. I was involved in this phase from the outset, beginning with the design of the digitization workflows for each new collection.

LADI Collections

The 2017 Mellon grant project digitized the following collections and added them to LADI:

These collections joined those of the three project partners from LADI’s first phase:

The collections contributed by these six partners span the 16th through the 21st centuries, and cover a wide variety of potential research topics, with particular focus on Afro-Latinx and Indigenous rights, environmental justice, and Cold War–era internal armed conflicts.

Digitizing partner collections

Creating digital surrogates for each of these collections required months of research, planning, and close communication with each partner. While we planned each project with certain archival standards in mind, our guiding principles emphasize flexibility and respect for our partners’ needs over a rigid one-size-fits-all approach.

In order to tailor our metadata templates, tools, and workflows to the needs of each collection, our team members traveled to partner sites in Mexico, Colombia, and Brazil to review the materials and learn about our partners’ priorities. The information we gathered on these scoping trips allowed us to return with a more concrete understanding of how a collection could be scanned, how to structure the digital surrogate, and what we might expect to accomplish in the time frame set by our grant. As the lead on the Fondo Real de Cholula project, I was responsible for the scoping trip to Puebla, Mexico.

In July 2017 I visited the Archivo Judicial del Estado de Puebla to review the Fondo Real de Cholula. Photo by Kelly McDonough

Following each scoping trip, our team divided the work of project planning and documentation in preparation for the project launch itself. I was primarily responsible for selecting digitization equipment and designing workflows, while my colleague Itza Carbajal led our metadata planning. Our supervisor Theresa Polk coordinated with project partners and stakeholders at UT.

I chose the digitization equipment for each project using a number of criteria, beginning with the most critical:

1. The ability to digitize the collection in question safely

The Fondo Real de Cholula collection is made up of extremely fragile bound documents called expedientes, the earliest of which dates to 1571. As in the Libros de Hijuelas project in Michoacán, I chose a horizontally mounted DSLR camera for this collection, an approach that could digitize each expediente without touching it. The collections held by PCN and EAACONE, meanwhile, are modern and much less fragile, and could be safely digitized with flatbed scanners.

2. Digitization quality

For the PCN and EAACONE projects, evaluating the quality of a potential scanner was a relatively simple matter of reviewing its maximum optical resolution. Knowing that both partners’ collections included photographs, I ruled out any scanner with a maximum optical resolution below 600 ppi. I also looked for large “book” scanners, whose scanning area runs right up to the edge of the device so that bound items can be scanned more easily.

With a mounted camera approach, on the other hand, the resolution of the scans depends on the camera sensor’s resolution and the distance from the camera to the table, as determined by the height of the tripod or mount. The optimal height for the mount is itself determined by the focal length of the camera lens and the size of the materials to be scanned: larger objects require a taller tripod (which captures less fine detail) or a wide-angle lens (which distorts the images). Following my scoping trip to Puebla, I knew the expedientes in the collection were slightly smaller than the Libros de Hijuelas we had digitized the year before, so I used the 20.2-megapixel cameras from that project as a baseline.
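To make this trade-off concrete, the sketch below estimates the lens-to-document distance needed to frame a page and the resulting resolution on the page itself. It uses a simple thin-lens approximation that ignores lens distortion. The function names are hypothetical. The example values are illustrative assumptions, not the project’s documented setup: a full-frame sensor about 35.8 mm wide with 5472 pixels across it (roughly a 20-megapixel camera) and a 50 mm lens. The distance is measured from the lens, not from the table to the top of the tripod.

```python
MM_PER_INCH = 25.4

def field_width_mm(sensor_width_mm, focal_length_mm, distance_mm):
    """Width of the area captured at the document plane (thin-lens model).

    distance_mm is measured from the lens to the document.
    """
    if distance_mm <= focal_length_mm:
        raise ValueError("document must be farther than one focal length")
    magnification = focal_length_mm / (distance_mm - focal_length_mm)
    return sensor_width_mm / magnification

def distance_to_fit_mm(sensor_width_mm, focal_length_mm, object_width_mm):
    """Lens-to-document distance at which object_width_mm just fills the frame."""
    return focal_length_mm * (1 + object_width_mm / sensor_width_mm)

def effective_ppi(pixels_across, field_mm):
    """Pixels per inch recorded on the document itself."""
    return pixels_across / (field_mm / MM_PER_INCH)

# Illustrative values only: roughly a 20 MP full-frame sensor and a 50 mm lens.
SENSOR_W_MM, PIXELS_W, FOCAL_MM = 35.8, 5472, 50
for doc_width_mm in (300, 450):  # hypothetical page widths along the frame's long edge
    u = distance_to_fit_mm(SENSOR_W_MM, FOCAL_MM, doc_width_mm)
    field = field_width_mm(SENSOR_W_MM, FOCAL_MM, u)
    print(f"{doc_width_mm} mm page: mount {u:.0f} mm, {effective_ppi(PIXELS_W, field):.0f} ppi")
```

Under these assumptions, a page 300 mm across is recorded at about 463 ppi from roughly 469 mm away. A 450 mm page needs about 678 mm of distance and drops to about 309 ppi. This is why larger objects cost fine detail unless sensor resolution rises with them.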

Images captured with mounted cameras are more affected than flatbed scans by ambient light and by the distance between the documents and the camera sensor. This meant that the quality of the Fondo Real de Cholula scans would also depend on auxiliary equipment such as the tripod and lights. I looked for sturdy tripods that would minimize shutter shock when deployed horizontally, and high-wattage 5500K LED bulbs to minimize interference from ambient lighting.

3. Digital preservation needs and compatibility with target software

We bake digital preservation measures into our project workflows, first and foremost by selecting preservation-quality file formats. For projects using flatbed scanners, this is a simple matter of ruling out any scanner that cannot output TIFFs. For mounted camera digitization projects, our workflows produce both TIFF and raw image files, each of which is useful for preservation purposes. I evaluated several manufacturers’ proprietary raw formats to make sure they could be converted to the open Digital Negative (DNG) format without a loss of data.
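A simple check on preservation formats is to confirm that every file delivered as TIFF actually begins with a TIFF header. The minimal sketch below reads the first four bytes of each file. The function names are hypothetical. DNG and some proprietary raw formats use the TIFF container, so passing this header check does not prove that a raw-to-DNG conversion was lossless. That still needs a dedicated validation step.

```python
from pathlib import Path

# Classic TIFF (42) and BigTIFF (43) headers, little- and big-endian byte order.
TIFF_SIGNATURES = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")

def is_tiff_container(path):
    """Return True if the file starts with a TIFF or BigTIFF byte-order header."""
    with open(path, "rb") as f:
        return f.read(4) in TIFF_SIGNATURES

def find_non_tiff(folder, suffixes=(".tif", ".tiff", ".dng")):
    """List files with a TIFF-family extension whose header is not TIFF."""
    bad = []
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file() and p.suffix.lower() in suffixes and not is_tiff_container(p):
            bad.append(p)
    return bad
```

Running `find_non_tiff` on a delivery folder lists, for example, a JPEG that was renamed with a `.tif` extension.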

I also approached this phase of the project with a good idea of the software we would ask the team to use. The Libros de Hijuelas project showed me the degree to which software could limit or expand the scope of a workflow, and as a result I looked for equipment that could work with my target software. For the flatbed scanner projects, I liked VueScan, which offers filename templating, image editing tools, near-universal scanner compatibility, and Spanish and Portuguese interface options.
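Filename templating keeps page images sortable and traceable. The sketch below generates zero-padded names for one item’s pages. The `collection_item_page` pattern, the `frc` collection code, and the `scan_filenames` function are hypothetical illustrations. They are not VueScan’s template syntax or the naming scheme LADI actually used.

```python
def scan_filenames(collection, item, pages, start=1, ext="tif"):
    """Generate zero-padded, sortable filenames for one item's page scans.

    The collection_item_page pattern is a hypothetical example.
    """
    return [
        f"{collection}_{item:04d}_{page:03d}.{ext}"
        for page in range(start, start + pages)
    ]

print(scan_filenames("frc", 12, 3))
# ['frc_0012_001.tif', 'frc_0012_002.tif', 'frc_0012_003.tif']
```

With zero-padding, a plain alphabetical sort places page 10 after page 9. This matters once files move between tools and operating systems.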

For the Fondo Real de Cholula, I planned on using Adobe Lightroom, as I had with the Libros de Hijuelas project, but Adobe’s move towards the subscription-based Creative Cloud suite presented a serious challenge. I couldn’t guarantee our partners would have the on-site internet access needed to maintain such a subscription, and I wanted to be sure they actually “owned” the software, so they could continue to use it after our project ended. Put bluntly, the Creative Cloud model is fundamentally incompatible with our values and goals. Unfortunately, older license-based versions of Lightroom do not support tethered capture with newer camera models. In the end, I decided that ongoing access to the software justified purchasing a slightly lower-resolution camera, and I looked only at cameras that were compatible with Lightroom 5 or 6.

4. Cost, portability, and availability

After narrowing possible options using the above criteria, I looked for the best in-stock options that fit within our budget and could be packed in the suitcase we would be bringing with us to our partners. For the Fondo Real de Cholula, I found a reasonably inexpensive tripod and telescopic light stands that allowed me to spend as much of our budget as possible on the camera itself. I chose a Canon EOS 6D, the same camera we had used successfully in the Libros de Hijuelas project, since the newly released 6D Mark II required Adobe CC for tethered shooting.

Finding flatbed scanners was a challenge, as there are very few large-format (A3) scanners available to consumers: manufacturers seem to have shifted their focus to portable scanning wands and specialty photo scanners in recent years. I chose the Plustek OpticBook A300, which plays nicely with bound items and fit within our budget.
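The four criteria work as an ordered filter. The first three rule candidates out, and only then do price and availability decide among the remaining options. The minimal sketch below applies that logic to flatbed scanners. The candidate records are hypothetical (the scanner names and specs are invented for illustration), and portability is left out.

```python
from dataclasses import dataclass

@dataclass
class Scanner:
    name: str
    max_optical_ppi: int
    edge_to_edge: bool   # scan area runs to the edge, easing bound items
    outputs_tiff: bool
    price_usd: float
    in_stock: bool

def shortlist(candidates, budget_usd, min_ppi=600):
    """Filter by the criteria in priority order, then rank the survivors.

    Criterion 1 (safe handling) is assumed met: flatbeds suit these modern materials.
    """
    ok = [c for c in candidates if c.max_optical_ppi >= min_ppi and c.edge_to_edge]  # 2. quality
    ok = [c for c in ok if c.outputs_tiff]                                            # 3. preservation
    ok = [c for c in ok if c.in_stock and c.price_usd <= budget_usd]                  # 4. cost, availability
    # Prefer higher resolution, then lower price.
    return sorted(ok, key=lambda c: (-c.max_optical_ppi, c.price_usd))

candidates = [
    Scanner("Scanner A", 1200, True, True, 900.0, True),
    Scanner("Scanner B", 2400, False, True, 400.0, True),   # no book edge
    Scanner("Scanner C", 600, True, True, 700.0, True),
    Scanner("Scanner D", 1200, True, True, 1500.0, True),   # over budget
]
print([s.name for s in shortlist(candidates, budget_usd=1000)])
# ['Scanner A', 'Scanner C']
```

Filtering before ranking means a cheap or high-resolution scanner can never outweigh a failed safety, quality, or format requirement.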

With the equipment in hand, I set to work testing, refining, and documenting the digitization workflows that our partners would be following. I aimed for fast, easy-to-understand digitization processes that our partners would find agreeable, and I did my best to be as precise, thorough, and approachable as possible in how I wrote the guides. The final guides make frequent use of screenshots and photos of the actual equipment our partners would be using.

The Fondo Real de Cholula workflow, available here, is based on the Libros de Hijuelas workflow, described (in Spanish) here. The updated workflow streamlines the photography process and makes use of additional post-processing to produce higher quality images faster. I compared the two workflows in my 2018 presentation at the Texas Conference on Digital Libraries, and those slides are available here.

The flatbed scanner workflow, using VueScan, was written from scratch, though it drew on the Lightroom workflow used in Puebla and Michoacán, incorporating the color correction and filename templating lessons we learned over the course of those projects. These workflows produced much more consistent images and filenames than we had received from our partners in the previous grant phase. The PCN workflow guide is available in Spanish here, and the EAACONE workflow guide is available in English and Portuguese here.

To launch each of the three projects, a member of our team traveled to our partner’s location, together with a member of the Benson Special Collections team and the researcher who had put us in touch with the partner. We brought the digitization equipment we had purchased, as well as printed copies of the digitization workflow and metadata documentation. Over the course of a week, we met with the team who would be doing the work on site, walked them through the process, and answered their questions. A week is very little time to learn the ins and outs of the digitization and metadata processes we had devised, but the teams took to the work very well. I led the Fondo Real de Cholula workshop in Puebla, alongside my colleague Dylan Joy and Professor Kelly McDonough.

Teaching the members of the Fondo Real de Cholula digitization team to use Adobe Lightroom in August 2018. Photo by Dylan Joy

Building the LADI repository

With the projects launched, our attention turned to supporting the teams remotely and building the repository that would provide access to the collections the teams were digitizing. We redeveloped LADI from the ground up using the latest version of the Fedora/Islandora repository infrastructure, which at that time was the still-in-development Islandora ISLE.

We didn’t set out with a radically different vision for the new version of the site: we liked the existing structure, and wanted to preserve its clean, staid look out of respect for the serious nature of the collections themselves. But the jump from Fedora 3/Islandora 7 to Fedora 4/Islandora ISLE was a major one: the new repository would have to be based on the Resource Description Framework (RDF) rather than on XML, as the original was. This required a totally different approach to metadata and server infrastructure, and meant a steep learning curve for our metadata librarian Itza Carbajal and software developer Minnie Rangel. We also recognized the need to formalize our content model to support both aggregate metadata for multipage objects and page-level metadata.
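The sketch below illustrates the shift from XML records to RDF. It expresses a multipage object and its pages as N-Triples, with object-level and page-level metadata stored as separate resources linked by membership. It borrows the Portland Common Data Model (PCDM) and Dublin Core terms vocabularies as an illustrative assumption. The base URI, identifiers, and function names are hypothetical, and this is not LADI’s actual mapping.

```python
PCDM = "http://pcdm.org/models#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/ladi/"  # hypothetical URI base

def literal(text, lang=None):
    """N-Triples string literal with the required escapes and an optional language tag."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")

def triple(s, p, o):
    return f"<{s}> <{p}> {o} ."

def object_triples(object_id, title, page_titles, lang="es"):
    """Aggregate (object-level) triples plus one member resource per page."""
    parent = BASE + object_id
    lines = [
        triple(parent, RDF_TYPE, f"<{PCDM}Object>"),
        triple(parent, DCTERMS + "title", literal(title, lang)),
    ]
    for n, page_title in enumerate(page_titles, start=1):
        page = f"{parent}/page/{n}"
        lines += [
            triple(page, RDF_TYPE, f"<{PCDM}Object>"),
            triple(page, DCTERMS + "title", literal(page_title, lang)),
            triple(page, PCDM + "memberOf", f"<{parent}>"),
            triple(parent, PCDM + "hasMember", f"<{page}>"),
        ]
    return lines

print("\n".join(object_triples("frc-0012", "Expediente 12", ["Foja 1", "Foja 2"])))
```

Keeping pages as their own resources lets page-level description sit alongside the aggregate record without duplicating it. The language tag on each literal is also where multilingual labels would attach.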

Processed metadata from the Fondo Real de Cholula collection

The original version of LADI, launched in 2015, provided robust searching, faceting, metadata, and image viewing functionality. It was not truly multilingual, however: users could switch between an English and Spanish interface, but metadata field names and values could not be translated. For the first phase of the grant, featuring collections from Central America, this was an acceptable limitation, but the addition of the EAACONE collection from Brazil in the second phase pushed us to add a Portuguese interface and some metadata translation functionality.

Metadata translation was accomplished by designating certain controlled fields (such as subject heading, object type, language, and place of origin) as taxonomies. Metadata values in these fields would display to users as links rather than plain text, explicitly connecting all matching objects via the search interface. This approach also allowed us to add translations of each value, explicitly defining, for example, Appraisals, Avalúos, and Avaliações (in the English, Spanish, and Portuguese interfaces, respectively) as synonymous. Users could now see identical search results in every interface, and Spanish-language searches could turn up relevant Portuguese-language results. Likewise, users would now see an object’s subject headings in whatever language they had selected, making the collections far more approachable for those with less-than-perfect fluency in the language of the original documents.
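The minimal sketch below illustrates the idea. It assumes a taxonomy keyed by a language-neutral term ID; the ID scheme, record IDs, and function names are hypothetical. Each term carries one label per interface language, and display picks the label for the user’s language. A search in any language resolves to the same term and therefore returns the same results.

```python
# Hypothetical taxonomy: one term ID per concept, one label per interface language.
TAXONOMY = {
    "object_type/appraisals": {"en": "Appraisals", "es": "Avalúos", "pt": "Avaliações"},
}

def display_label(term_id, lang, fallback="en"):
    """Label shown in the user's interface language, falling back to English."""
    labels = TAXONOMY[term_id]
    return labels.get(lang, labels[fallback])

def term_for_label(label):
    """Resolve a label typed in any interface language to its shared term ID."""
    needle = label.casefold()
    for term_id, labels in TAXONOMY.items():
        if any(needle == text.casefold() for text in labels.values()):
            return term_id
    return None

def search(records, label):
    """IDs of records tagged with the term behind `label`, whatever its language."""
    term = term_for_label(label)
    return [r["id"] for r in records if term in r["terms"]]

records = [
    {"id": "obj-1", "terms": {"object_type/appraisals"}},
    {"id": "obj-2", "terms": set()},
]
print(display_label("object_type/appraisals", "pt"))  # Avaliações
print(search(records, "Avalúos") == search(records, "appraisals"))  # True
```

Records store the term ID rather than a label in any one language, so adding a new interface language only means adding labels, not editing records.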

Building the new repository gave us the opportunity to properly recognize the contributions of each of our project partners by building out a dedicated landing page for each one. We also shared project documentation such as digitization workflows on a special Resources page, to allow users to build on our work in their own projects.

Populating the repository

Adding collection materials to the repository was a months-long process. Our graduate assistants Alejandra Martinez and Elizabeth Peattie helped me comb through thousands of images to spot scanning or pagination issues, while Itza Carbajal refined and consolidated the metadata created by our partners to prepare it for ingest and user access. One of my first forays into Python programming was prompted by the need to systematically reorganize the Fondo Real de Cholula images prior to ingest into the repository.

An excerpt from a script written to reorganize Fondo Real de Cholula images into document-level folders. This rudimentary script was one of my first attempts at processing a collection using Python.
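The sketch below is a hypothetical illustration of this kind of reorganization, not the script excerpted above. It groups page images from a flat folder into one folder per document. It assumes filenames of the form `<collection>_<document>_<page>.tif` (for example `frc_0012_001.tif`); the pattern and function name are also assumptions. It defaults to a dry run so the planned moves can be reviewed before anything changes on disk.

```python
import re
import shutil
from pathlib import Path

# Hypothetical pattern: <collection>_<document>_<page>.tif, e.g. frc_0012_001.tif
PAGE_FILE = re.compile(r"^(?P<doc>[a-z]+_\d{4})_(?P<page>\d{3})\.tiff?$", re.IGNORECASE)

def group_into_document_folders(src, dest, dry_run=True):
    """Move page images from a flat folder into one folder per document.

    Returns the planned (or completed) moves plus any files that did not match.
    """
    src, dest = Path(src), Path(dest)
    moves, unmatched = [], []
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue
        m = PAGE_FILE.match(path.name)
        if m is None:
            unmatched.append(path.name)  # left in place for manual review
            continue
        target = dest / m.group("doc") / path.name
        moves.append((path, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
    return moves, unmatched
```

Reporting unmatched files instead of guessing where they belong keeps pagination problems visible to the people checking the images.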

Our software developer designed a batch ingest workflow using an SFTP server, ingest manifests, and PHP ingest scripts to load assets into the Fedora server and convert CSV metadata tables into RDF triples. I also developed Python scripts of my own to check the repository’s taxonomy translations and add new values as needed.
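The function below is a minimal sketch of this kind of translation check, not the scripts themselves. It scans a CSV metadata table and reports controlled values that are missing from the taxonomy or lack an English, Spanish, or Portuguese label. The column names, the `|` separator for multiple values, the taxonomy structure (keyed by source-language value), and the function name are all assumptions.

```python
import csv
import io

LANGS = ("en", "es", "pt")
# Hypothetical column names for the controlled (taxonomy) fields.
CONTROLLED_FIELDS = ("subject", "object_type", "language", "place_of_origin")

def check_translations(csv_text, taxonomy, delimiter="|"):
    """Report controlled values absent from the taxonomy or lacking a translation.

    taxonomy maps a source-language value to {lang: label}.
    """
    missing_terms, missing_labels = set(), {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in CONTROLLED_FIELDS:
            raw = row.get(field) or ""  # absent columns yield None
            for value in filter(None, (v.strip() for v in raw.split(delimiter))):
                labels = taxonomy.get(value)
                if labels is None:
                    missing_terms.add(value)
                    continue
                gaps = [lang for lang in LANGS if not labels.get(lang)]
                if gaps:
                    missing_labels[value] = gaps
    return missing_terms, missing_labels
```

For example, a row whose subject is `Avalúos|Tierras` and whose object type is `Expedientes` would report `Expedientes` as a missing term if the taxonomy lacks it. It would also list `Tierras` if that term has no Portuguese label.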

The collection processing and ingest stage of the project was both a marathon and a sprint: it came at the very end of our second grant phase, while we waited to find out if there would be a third phase. We had roughly six months to process thousands of objects (totaling tens of thousands of images) and ingest them via SFTP. Occasional server hiccups, metadata encoding issues, or misnamed files caused entire ingests to fail, setting us back hours or days at a time. At the same time, I was preparing to get married and missed a significant amount of work as a result.
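Some of these failures, such as misnamed files or mis-encoded metadata, can in principle be caught before upload. The sketch below shows a hypothetical preflight step of this kind. It flags files that break a naming pattern, gaps in page sequences, and metadata tables that are not valid UTF-8. The naming convention, the function name, and the choice of checks are assumptions.

```python
import re
from collections import defaultdict
from pathlib import Path

# Hypothetical naming convention: <collection>_<document>_<page>.tif
NAME = re.compile(r"^(?P<doc>[a-z]+_\d{4})_(?P<page>\d{3})\.tif$")

def preflight(batch_dir, metadata_csv):
    """Return a list of problems to fix before uploading a batch for ingest."""
    problems = []
    pages = defaultdict(list)
    for path in sorted(Path(batch_dir).glob("*.tif*")):
        m = NAME.match(path.name)
        if m is None:
            problems.append(f"misnamed file: {path.name}")
        else:
            pages[m.group("doc")].append(int(m.group("page")))
    for doc, nums in sorted(pages.items()):
        # Pages are expected to run 1..N with no gaps or duplicates.
        if sorted(nums) != list(range(1, max(nums) + 1)):
            problems.append(f"page gap in {doc}: have {sorted(nums)}")
    try:
        Path(metadata_csv).read_bytes().decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"metadata is not valid UTF-8: {err}")
    return problems
```

An empty list means the batch passed these checks. Any returned messages point to files or encodings to fix before starting a long ingest.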

Nevertheless, we documented our processes as thoroughly as possible, and the result is (we hope) a sustainable and scalable pipeline for adding more materials to the repository in the future. The Benson was awarded a third Mellon grant in 2020 to fund further digitization of partners’ collections. This third grant phase will add tens of thousands more images to LADI in the coming years.

We formally launched the new version of LADI in spring 2020. I wrote briefly about the new site in the UT Libraries Tex Libris blog, and highlighted a few collection items in Portal, the LLILAS Benson magazine. The English version of LADI can be accessed here.