Machine Reading the Archive aims to bring humanities researchers, archivists and computer scientists together to explore the challenges of working with archives in the digital age. Through a series of reading group sessions, practical workshops, technical demonstrations, field trips and a one-day end-of-programme workshop, we hope to seed new collaborations and encourage the exchange of ideas and practices across professions and disciplines.
Registration for Track 1 gives access to the programme's events and workshops. Track 2 is designed to allow graduate students, post-docs and staff members to develop a project of their own that applies digital methods and analytics to any kind of archived information. Participants may work on an archive or collection of their own, or propose working with an existing collection, online or offline. The programme will provide access to expert support for participants who want to develop digital skills such as text-mining, mapping, data-refining and automated text recognition.
Track 2 participants will benefit from three group sessions and three one-on-one mentoring sessions designed to help them get their digital archive project off the ground.
Examples of possible projects:
- An institutional paper archive containing letters, protocols, minutes and agendas stretching over a couple of years. Objectives could be to scope out methods for digitising a sample of the material and to extract information such as dates, names, subjects and problems or issues, in order to create a database of members, their involvement and their participation in meetings on subjects x and y.
- A corpus of published reports, books and manuscripts. The aim is to extract information that occurs across the set of publications. However this information is organised (e.g. as a set of tables, an appendix with certain information, or lists of medical or legal case files), the aim is to try out methods of gathering it and turning it into a structured dataset which can be analysed and visualised using digital methods.
- A collection of digital content in the Internet Archive. The collection is difficult to navigate because many documents have corrupted or machine-generated file names that don't adequately describe their contents. Objectives could include finding out roughly what the collection contains, exploring how to generate basic metadata for the collection items automatically, and seeing what tools are available to trace the provenance of individual collection items.
- A collection of digitised handwritten or printed texts. Using the Transkribus platform, the objectives could be to train an algorithm to recognise the text automatically by transcribing a subset of 50 or more pages from the full collection, and then to test the accuracy of the trained algorithm. (Projects working with Transkribus will also benefit from support from the Transkribus project team and a dedicated workshop in April – more information here.)
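For the last example, one common way to test the accuracy of automated text recognition is the character error rate: the number of character edits needed to turn the machine output into a manual transcription, divided by the length of the manual transcription. The short Python sketch below is only an illustration of that idea, not the method used by Transkribus or by the programme; the function names and the sample strings are made up for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance between the two texts, divided by the length of
    the reference (the manual transcription)."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical sample: a manual transcription and a machine output
# with one misread character ("tbe" instead of "the").
manual = "Minutes of the meeting"
automatic = "Minutes of tbe meeting"
print(f"CER: {character_error_rate(manual, automatic):.3f}")
```

A lower value means a closer match; comparing the figure across a handful of held-back pages gives a rough sense of whether the trained model is good enough for the rest of the collection.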
Machine Reading the Archive is open to PhD students and members of staff at the University of Cambridge, with the exception of the final workshop, which will be open for public registration.
We welcome informal enquiries from potential applicants. Please contact Anne Alexander (email@example.com) to discuss your proposal.
Read more about the programme online here
Sign up for Track 1 here
Apply for Track 2 here