Machine Reading the Archive aims to bring humanities researchers, archivists and computer scientists together to explore the challenges of working with archives in the digital age. Through a series of reading group sessions, practical workshops, technical demonstrations, field trips and a one-day end-of-programme workshop, we hope to seed new collaborations and encourage the exchange of ideas and practices across professions and disciplines.
The programme is born out of a recognition that the practice of making, curating and using archives has been changed by the adoption of digital technologies, at both an institutional and individual level. Archives and library special collections are developing new roles as platforms for different kinds of data, held in a variety of formats from XML to PDFs and TIFFs, rather than as physical containers for people, books and documents. Many researchers return from visits to the archive (or the archive’s website) having filled hard drives with collections of digital photographs of rare books, documents, manuscripts, maps, pictures and objects of scholarly interest whose fragility and immobility required the production of a digital copy. The digital archive thus seeds new private sub-collections on researchers’ laptops and tablets. At times these are a promising and overwhelmingly rich resource; at other times they remain invisible and inaccessible, all the while growing in scale and complexity over the trajectory of a scholarly life.
The primary aim of Machine Reading the Archive is to help participants develop a deeper understanding of the challenges and possibilities of working with archival data in the digital age, drawing on theory, methods and practice from the humanities, computer science and the archival profession. The programme provides a chance to develop skills to engage with existing digital archives in new ways, to turn a cluttered hard drive of archival photographs into a refined dataset, or to embark on text-mining to reveal new aspects of existing research or lay the groundwork for prospective projects.
In addition to providing participants the chance to learn practical skills and experiment with digital methods using their own or provided datasets, the framework of the course is designed to ignite reflection on the significance of the ways private and institutional digital archives are sorted, structured and accessed and to discuss how these insular knowledge infrastructures impact and influence writing, thinking and the development of research projects.
Joining the programme
Participants can follow the programme through two tracks. Track 1 is by participation and is designed for those whose interest is limited to particular sessions. Track 2 is by project and will require a larger commitment to the programme. We plan to offer Track 2 participants a series of group and individual mentoring / peer-to-peer learning sessions to support them as they build their own project developing and using archive data. Track 2 is directed both at researchers who have already collected archival materials in digital form and want to find out about new ways of analysing their data, and at researchers who are interested in conducting research in digital archives but want to improve their skills and gain a broader understanding of computational methods in archival research before they begin. Track 2 participants will also be invited to give a short presentation in the final course workshop reflecting on what they learnt over the course of the programme.
If you are looking for project ideas, and are interested in exploring the use of automated text recognition systems for either printed or handwritten texts, please see this briefing for our workshop in collaboration with the Transkribus project in April. Track 2 participants are encouraged to consider creating a training dataset and text recognition model using the Transkribus platform. We welcome informal inquiries about potential projects - please get in touch with Dr Anne Alexander (raa43 @ cam.ac.uk) if you have a project idea you would like to discuss.
Participants must be PhD students or staff at the University and Colleges of Cambridge, with the exception of the final workshop, which will be open for public booking.
Pre-requisites and time commitment
There are no formal pre-requisites, but Track 1 participants are asked to complete the course readings in advance of the relevant sessions. Track 2 participants are asked to confirm their availability for the 2 group sessions and 3 individual mentoring sessions while working on their project, in addition to attending the final programme workshop on 15 June. Graduate students at Cambridge who would benefit from a more in-depth training course in working with literary corpora using Python are encouraged to apply to the Literary Critical Coding course directed by Dr Ewan Jones (Faculty of English) which will be running in parallel with Machine Reading the Archive (more details below).
Applicants for Track 1 should complete the online Programme Registration form for Track 1.
Applicants for Track 2 should complete the Track 1 form AND the Track 2 application form.
The deadline for applications is 6 February and applicants will be notified if they have been successful in gaining a place by 13 February.
Provisional Programme (full details and information on how to register for individual sessions will be sent out to registered participants in February)
Further sessions will be added to this programme in April and May, including dates for optional field trips.
| Date | Time | Session |
|---|---|---|
| 20 Feb 2017 | 12.30 - 2.30pm | Group meeting for Track 2 participants |
| 21 Feb 2017 | 12 - 1.30pm | Reading group session |
| 28 Feb 2017 | 12 - 1.30pm | Reading group session |
| 14 Mar 2017 | 12 - 1.30pm | Reading group session |
| 20 Mar 2017 | 11.30 - 4pm | Born-Digital archives workshop (optional) |
| 22 Mar 2017 | 1.30 - 3pm | Under the hood of the digital collection |
| 25 Apr 2017 | 11.30 - 3.30pm | Automated Text Recognition workshop |
| 27 Apr 2017 | 1 - 3pm | Group meeting for Track 2 participants |
| 2 May 2017 | TBC | Network Analysis methods for correspondence |
| 6 Jun 2017 | 1 - 3pm | Group meeting for Track 2 participants |
| 15 Jun 2017 | 11.30 - 4pm | Final programme workshop |
Literary Critical Coding
During Lent and Easter Terms, the Faculty of English will be running a course entitled Literary Critical Coding, for which applications are now welcomed. These weekly sessions introduce a range of computational resources that can then be contrasted (or brought into tension) with more traditional readerly practices. Participants will learn how to identify and prepare relevant corpora and datasets; apply a variety of analytic tools to those corpora (topic models, semantic analysis); and visualise data in several ways. They will also learn the basics of coding (Python). There will also be the opportunity for broader discussion regarding the history, current scope and future prospects of the digital humanities. The course is free of charge. Interested students should send a brief (300 word) statement to Ewan Jones (email@example.com), outlining how such a course might relate to their current research. Both current PhD and MPhil students are encouraged to apply. Please note this course is only open to current graduate students at the University of Cambridge.