
Machine Reading the Archive - programme overview

A digital methods development programme organised by Cambridge Digital Humanities Network, Cambridge Big Data and the Cambridge Digital History Programme. Programme outline and registration information below.

Machine Reading the Archive aims to bring humanities researchers, archivists and computer scientists together to explore the challenges of working with archives in the digital age. Through a series of reading group sessions, practical workshops, technical demonstrations, field trips and a one-day end-of-programme workshop, we hope to seed new collaborations and encourage the exchange of ideas and practices across professions and disciplines.

The programme is born out of a recognition that the practice of making, curating and using archives has been changed by the adoption of digital technologies, at both an institutional and individual level. Archives and library special collections are developing new roles as platforms for different kinds of data, held in a variety of formats from XML to PDFs and TIFFs, rather than physical containers for people, books and documents. Many researchers return from visits to the archive (or the archive’s website) having filled hard drives with collections of digital photographs of rare books, documents, manuscripts, maps, pictures and objects of scholarly interest whose fragility and immobility required the production of a digital copy. The digital archive thus seeds new private sub-collections on researchers’ laptops and tablets, at times a promising and overwhelmingly rich resource and at other times remaining invisible and inaccessible, while growing in scale and complexity over the trajectory of a scholarly life.

The primary aim of Machine Reading the Archive is to help participants develop a deeper understanding of the challenges and possibilities of working with archival data in the digital age, drawing on theory, methods and practice from the humanities, computer science and the archival profession. The programme provides a chance to develop skills to engage with existing digital archives in new ways, to turn a cluttered hard drive of archival photographs into a refined dataset, or to embark on text-mining to reveal new aspects of existing research or lay the groundwork for prospective projects.

In addition to providing participants the chance to learn practical skills and experiment with digital methods using their own or provided datasets, the framework of the course is designed to ignite reflection on the significance of the ways private and institutional digital archives are sorted, structured and accessed and to discuss how these insular knowledge infrastructures impact and influence writing, thinking and the development of research projects.

Joining the programme

*Please note: Track 2 applications for our current programme are closed; however, you can still register for Track 1 in order to access priority booking for programme events.*

Participants can follow the programme through two tracks. Track 1 is by participation and is designed for those with occasional interests in particular sessions. Track 2 is by project and will require a larger commitment to the programme. We plan to offer Track 2 participants a series of group and individual mentoring / peer-to-peer learning sessions to support them as they build their own project developing and using archive data. Track 2 is directed both at researchers who have already collected archival materials in digital form and want to find out about new ways of analysing their data, and at researchers who are interested in conducting research in digital archives but want to improve their skills and gain a broader understanding of computational methods in archival research ahead of their research. Track 2 participants will also be invited to give a short presentation in the final course workshop reflecting on what they learnt over the course of the programme.

If you are looking for project ideas, and are interested in exploring the use of automated text recognition systems for either printed or handwritten texts, please see this briefing for our workshop in collaboration with the Transkribus project in April. Track 2 participants are encouraged to consider creating a training dataset and text recognition model using the Transkribus platform. We welcome informal inquiries about potential projects - please get in touch with Dr Anne Alexander (raa43 @ if you have a project idea you would like to discuss. 


Participants must be PhD students or staff at the University and Colleges of Cambridge, with the exception of the final workshop, which will be open for public booking.

Pre-requisites and time commitment

There are no formal pre-requisites, but Track 1 participants are asked to complete the course readings in advance of the relevant sessions. Track 2 participants are asked to confirm their availability for the 2 group sessions and 3 individual mentoring sessions while working on their project, in addition to attending the final programme workshop on 15 June. Graduate students at Cambridge who would benefit from a more in-depth training course in working with literary corpora using Python are encouraged to apply to the Literary Critical Coding course directed by Dr Ewan Jones (Faculty of English) which will be running in parallel with Machine Reading the Archive (more details below).

Application process

Applicants for Track 1 should complete the online Programme Registration form for Track 1.


Further sessions will be added to this programme in April and May, including dates for optional field trips.

| Date | Time | Session |
| --- | --- | --- |
| 20 Feb 2017 | 12.30 - 2.30pm | Group meeting for Track 2 participants |
| 21 Feb 2017 | 12 - 1.30pm | Reading group session |
| 28 Feb 2017 | 12 - 1.30pm | Reading group session |
| 14 Mar 2017 | 12 - 1.30pm | Reading group session |
| 20 Mar 2017 | 11.30 - 4pm | Born-Digital archives workshop (optional) |
| 22 Mar 2017 | 1.30 - 3pm | Under the hood of the digital collection |
| 25 April 2017 | 11.30 - 3.30pm | Automated Text Recognition workshop |
| 27 April 2017 | 1-3pm | Group meeting for Track 2 participants |
| 2 May 2017 | 1-3pm | Network Analysis in the Digital Archive |
| 15 May 2017 | 11.30am - 12.45pm | Field trip to Digital Content Unit, University Library |
| 22 May 2017 | TBC | Field trip to Cambridgeshire County Archives |
| 6 Jun 2017 | 1-3pm | Group meeting for Track 2 participants |
| 15 Jun 2017 | 11.30 - 4pm | Final programme workshop |

Literary Critical Coding

During Lent and Easter Terms, the Faculty of English will be running a course entitled Literary Critical Coding, for which applications are now welcomed. These weekly sessions introduce a range of computational resources that can then be contrasted (or brought into tension) with more traditional readerly practices. Participants will learn how to: identify and prepare relevant corpora and datasets; apply a variety of different analytic tools to those corpora (topic models, semantic analysis); visualise data in several ways; and learn the basics of coding (Python). There will also be the opportunity for broader discussion regarding the history, current scope and future prospects of the digital humanities. The course is free of charge. Interested students should send a brief (300 word) statement to Ewan Jones (, outlining how such a course might relate to their current research. Both current PhD and MPhil students are encouraged to apply. Please note this course is only open to current graduate students at the University of Cambridge.

We are a network of researchers at the University of Cambridge who are interested in how the use of digital tools is transforming scholarship in the humanities and social sciences. This transformation spans both the content and practice of humanities research, as the diffusion of digital technologies opens up new fields of study and generates research questions which breach traditional disciplinary boundaries.
