
Machine Reading the Archive - invitation to register interest

A digital methods development programme organised by Cambridge Digital Humanities Network, Cambridge Big Data and the Cambridge Digital History Programme. Register your interest below.

Online applications now open

Machine Reading the Archive aims to bring humanities researchers, archivists and computer scientists together to explore the challenges of working with archives in the digital age. Through a series of reading group sessions, practical workshops, technical demonstrations, field trips and a one-day end-of-programme workshop, we hope to seed new collaborations and encourage the exchange of ideas and practices across professions and disciplines.

The programme is born out of a recognition that the practice of making, curating and using archives has been changed by the adoption of digital technologies, at both an institutional and an individual level. Archives and library special collections are developing new roles as platforms for different kinds of data, held in a variety of formats ranging from XML to PDFs and TIFFs, rather than as physical containers for people, books and documents. Many researchers return from visits to the archive (or the archive's website) with hard drives full of digital photographs of rare books, documents, manuscripts, maps, pictures and objects of scholarly interest, whose fragility and immobility required the production of a digital copy. The digital archive thus seeds new private sub-collections on researchers' laptops and tablets, which grow in scale and complexity over the course of a scholarly life. At times these sub-collections are a promising and overwhelmingly rich resource; at other times they remain invisible and inaccessible.

The primary aim of Machine Reading the Archive is to help participants develop a deeper understanding of the challenges and possibilities of working with archival data in the digital age, drawing on theory, methods and practice from the humanities, computer science and the archival profession. The programme provides a chance to develop the skills to engage with existing digital archives in new ways: to turn a cluttered hard drive of archival photographs into a refined dataset, or to embark on text mining in order to reveal new aspects of existing research or lay the groundwork for prospective projects.

In addition to giving participants the chance to learn practical skills and experiment with digital methods using their own or provided datasets, the course is designed to prompt reflection on how private and institutional digital archives are sorted, structured and accessed, and to discuss how these insular knowledge infrastructures influence writing, thinking and the development of research projects.

Joining the programme

Participants can follow the programme through two tracks. Track 1 is by participation and is designed for those with a specific or occasional interest. Track 2 is by project and will require a larger commitment to the programme. We plan to offer Track 2 participants a series of group and individual mentoring and peer-to-peer learning sessions to support them as they build their own project developing and using archival data. Track 2 is aimed both at researchers who have already collected archival materials in digital form and want to find new ways of analysing their data, and at researchers who are interested in conducting research in digital archives but want to improve their skills and gain a broader understanding of computational methods in archival research before they begin. Track 2 participants will also be invited to give a short presentation at the final workshop, reflecting on what they learnt over the course of the programme.


Participation is restricted to PhD students and staff at the University and Colleges of Cambridge. The exception is the final workshop, which will be open for public booking.

Pre-requisites and time commitment

There are no formal pre-requisites, but Track 1 participants are asked to complete the course readings in advance of the relevant sessions. Track 2 participants are asked to confirm their availability for the two group sessions and three individual mentoring sessions while working on their project (indicative timings for the group sessions are in the schedule below; individual sessions should be arranged with your project mentor). Graduate students at Cambridge who would benefit from a more in-depth training course in working with literary corpora using Python are encouraged to apply to the Literary Critical Coding course directed by Dr Ewan Jones (Faculty of English), which will run in parallel with Machine Reading the Archive (more details below).

Please note that the timings and content of the schedule below are subject to change. Firm dates and instructions on how to apply for a place on the programme will be sent out on 16 January 2017. Expressing your interest in the programme does not commit you to attending, nor does it guarantee you a place.

Register your interest online here:

Indicative schedule

w/c 20 February: Group meeting for Track 2 participants

w/c 20 February: Reading group session 1

w/c 27 February: Reading group session 2

w/c 6 March: Reading group session 3

w/c 13 March: Introductory session (under the hood of the digital collection)

w/c 13 March: Born-digital archives workshop (optional, limited spaces available)

w/c 20 March: Introductory session (data-cleaning with OpenRefine)

w/c 24 April: Group meeting for Track 2 participants

w/c 24 April: Technical demonstration 1 (automated text recognition: from print to handwriting)

w/c 1 May: Technical demonstration 2 (topic modelling)

w/c 8 May: Technical demonstration 3 (network analysis)

w/c 15 May: Field trip 1

w/c 22 May: Field trip 2

w/c 5 June: Group meeting for Track 2 participants

w/c 12 June: End-of-programme workshop

Literary Critical Coding

During Lent and Easter Terms, the Faculty of English will be running a course entitled Literary Critical Coding, for which applications are now welcome. These weekly sessions introduce a range of computational resources that can then be contrasted (or brought into tension) with more traditional readerly practices. Participants will learn how to identify and prepare relevant corpora and datasets; apply a variety of analytic tools to those corpora (topic models, semantic analysis); visualise data in several ways; and write basic code in Python. There will also be the opportunity for broader discussion of the history, current scope and future prospects of the digital humanities. The course is free of charge. Interested students should send a brief (300-word) statement to Ewan Jones (, outlining how such a course might relate to their current research. Both current PhD and MPhil students are encouraged to apply. Please note that this course is only open to current graduate students at the University of Cambridge.


We are a network of researchers at the University of Cambridge who are interested in how the use of digital tools is transforming scholarship in the humanities and social sciences. This transformation spans both the content and practice of humanities research, as the diffusion of digital technologies opens up new fields of study and generates research questions which breach traditional disciplinary boundaries.

Latest news

New project aims to support text and data-mining research in Cambridge

Jan 29, 2018

Want to explore the possibilities of text and data-mining using the collections of Cambridge University Library and Cambridge University Press? Our new project may be able to help.

Machine Reading the Archive 2017/8 - registration now open

Sep 18, 2017

Register now to join our Machine Reading the Archive programme for 2017/8
