
Introduction to OCR: tools for turning PDFs into machine-readable data

Doing research in the digital age: Introductory digital methods training from Cambridge Digital Humanities
When: Jan 23, 2018, from 11:00 AM to 12:30 PM
Where: S3, Alison Richard Building

Led by Dr Gabe Recchia

Optical character recognition (OCR) refers to techniques for converting images containing printed or handwritten text into a format that can be searched and analysed computationally. Despite recent advances, the OCR tools available to researchers are not always as accurate as one might hope, and they cannot handle handwritten text without a significant investment of time and a large amount of source material written in the same hand. Nevertheless, several computational tools can be applied to images and PDFs to enable text mining and to make scanned documents more searchable. This workshop will introduce several such tools, along with practical techniques for using them, and will highlight OCR and related services offered by the Digital Content Unit at the Cambridge University Library.

This event is now fully booked.