The Discovery and Early Development of Insulin

Technical Information

The objective of the Discovery and Development of Insulin project was to digitize and provide web access to a collection of approximately 7,000 images.

The project implementation can be divided into four separate processes:

  • Scanning
  • Optical Character Recognition (OCR) / Transcription
  • Metadata creation
  • Database creation and web access

Scanning

The scanning was done by Preservation Services, University of Toronto Library, on the Eyelike digital camera system V 3.04. The master images have been captured to the following standard:

  • File Format: TIFF
  • Scanning Bit Depth: 8 bits per channel of colour information (24 bits per pixel)
  • Compression: Uncompressed
  • Digital Resolution: 600 dpi

Because the chosen archival standard generates large files (average file size 48 MB), the project team converted the images to JPGs for online delivery. The site provides users with four sizes of JPG: a thumbnail for quick reference and three larger sizes for examination and use. To balance onscreen quality against download size, the JPG images were created at a medium compression level.
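The large master files follow directly from the capture standard: an uncompressed 24-bit image needs three bytes for every pixel. The following is a minimal sketch of that arithmetic. The function name and the letter-size page are illustrative assumptions, not project figures, and TIFF header overhead is ignored.

```python
def uncompressed_size_bytes(width_in, height_in, dpi=600, bits_per_pixel=24):
    """Approximate pixel-data size of an uncompressed scan, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical example: a letter-size page (8.5 x 11 inches) at 600 dpi.
letter_bytes = uncompressed_size_bytes(8.5, 11)
print(f"{letter_bytes / 1_000_000:.1f} MB")
```

Read the other way, a 48 MB file at this standard holds about 16 million pixels, or roughly 44 square inches of scanned area (taking 1 MB as 1,000,000 bytes).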

Optical Character Recognition (OCR) and Transcription

The Insulin project team chose to perform OCR on the page images to allow for full-text searching and thus optimise the database for online access. OCR was done by Preservation Services using the Prime Recognition software package. For columnar text, Abbyy FineReader 5.0 Office was used to produce the OCR.

Materials that could not be processed by OCR software, such as manuscript letters and writings, notebooks, and many of the newspaper clippings, were transcribed by retyping the page text. Both the transcribed files and the OCR output were saved as ASCII text files, which are used for full-text indexing and searching.
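To illustrate how plain text files can support full-text searching, here is a minimal sketch of an inverted index. It is not the project's actual indexing software, which this page does not describe. The page identifiers and sample texts are hypothetical placeholders.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page IDs whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the sorted page IDs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical page texts standing in for the ASCII files.
pages = {
    "page_0001": "Notes on the pancreatic extract",
    "page_0002": "Letter regarding the extract and its clinical use",
}
index = build_index(pages)
print(search(index, "extract clinical"))
```

Because OCR output and retyped transcriptions share one plain-text format, a single index of this kind can cover both.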

Metadata

The structural metadata was captured in a relational database during the scanning process. The structural metadata tables record the pagination of each document, the correlation between filenames and page numbers, features of the document, and comments about the quality of the original material.
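As a rough sketch of how structural metadata of this kind might be laid out in a relational database, the example below uses SQLite. The table names, column names, and sample rows are all hypothetical. The project's actual schema and database software are not documented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    features    TEXT,   -- notable features of the document
    comments    TEXT    -- notes on the quality of the original
);
CREATE TABLE page (
    filename    TEXT PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES document(document_id),
    sequence    INTEGER NOT NULL,  -- physical order of the scan
    page_label  TEXT               -- page number as it appears, if any
);
""")

# Hypothetical sample rows for illustration only.
conn.execute("INSERT INTO document VALUES (1, 'handwritten', 'faded ink')")
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, ?)",
    [("doc1_0001.tif", 1, 1, "1"), ("doc1_0002.tif", 1, 2, "2")],
)


def filename_for(conn, document_id, page_label):
    """Look up the image file that holds a given page of a document."""
    row = conn.execute(
        "SELECT filename FROM page WHERE document_id = ? AND page_label = ?",
        (document_id, page_label),
    ).fetchone()
    return row[0] if row else None


print(filename_for(conn, 1, "2"))
```

Keeping the scan sequence separate from the page label allows for unnumbered or misnumbered pages without losing the physical order.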

The descriptive metadata was also saved in a relational database format. It contains fields such as title, author, extent, and subject, which facilitate the discovery and retrieval of information. During the 2017 migration of the Insulin site, this descriptive metadata was converted to MODS (Metadata Object Description Schema).
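A conversion of this kind can be sketched as a mapping from database fields to MODS elements. The example below is an assumption-laden illustration, not the migration script that was actually used. The record keys and sample values are hypothetical, and only four fields are mapped.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def record_to_mods(record):
    """Serialize a flat descriptive record (a dict) as a MODS XML string."""

    def el(parent, tag, text=None):
        # Create a namespaced child element, optionally with text content.
        node = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
        if text is not None:
            node.text = text
        return node

    mods = ET.Element(f"{{{MODS_NS}}}mods")
    el(el(mods, "titleInfo"), "title", record["title"])
    el(el(mods, "name"), "namePart", record["author"])
    el(el(mods, "physicalDescription"), "extent", record["extent"])
    for topic in record.get("subjects", []):
        el(el(mods, "subject"), "topic", topic)
    return ET.tostring(mods, encoding="unicode")


# Hypothetical record with placeholder values.
sample = {
    "title": "Sample notebook",
    "author": "Example, Author",
    "extent": "12 pages",
    "subjects": ["Insulin"],
}
xml_text = record_to_mods(sample)
print(xml_text)
```

A real conversion would likely carry more fields (dates, identifiers, rights statements) and validate the output against the MODS schema.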

Database creation and web access

This digital collection was initially built using ColdFusion and was migrated to Islandora 7 in 2017. A web-archived copy of the original ColdFusion site is available. In 2024 the digital collection was migrated from Islandora 7 to the University of Toronto's homegrown Collections U of T Platform. A web-archived copy of the Islandora site is also available.