What is OCR?

Definition

Optical character recognition (OCR) text is the text that OCR software has converted from the scanned image of the original newspaper page into electronic, searchable text. It is produced automatically and has not been manually reviewed or corrected. OCR makes it possible to search large quantities of full text, but it is not 100% accurate. Accuracy depends on a variety of factors, for example the condition of the original newspaper or microfilm, the quality of the paper, the size and style of the font, and the column layout.

How accurate is the OCR text in the articles?

We have scanned the original newspaper pages using high-quality scanners and an optical character recognition (OCR) process, which converts the printed text to electronic text. Both processes are intended to produce the most accurate results possible; however, it is inevitable that some errors slip through. The quality of the original newspaper affects the accuracy of the OCR process. A range of factors can reduce accuracy, including:

Highly complex layout
Radical differences in layout over time
Variable font sizes and character types (especially Gothic)
Narrow space between lines
Narrow gutter between columns
Missing or misprinted text
Poor quality or deteriorated inks
Poor quality or deteriorated papers
Irregular alignment of characters in hand-set press
Annotations by hand
Graphic devices and/or elements
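
For readers who want to put a number on "accuracy", one common measure is the character error rate: the number of single-character edits needed to turn the OCR text into the printed text, divided by the length of the printed text. The sketch below is only an illustration of that idea, not the method used to produce or check the text on this site; the function names and the sample sentence are made up for the example.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions separating the two strings."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR text
                current[j - 1] + 1,      # extra character in the OCR text
                previous[j - 1] + cost,  # match or wrong character
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edits needed per character of the reference (printed) text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: typical OCR confusions ("h" read as "li",
# "e" read as "c", "m" read as "rn").
printed = "The mail steamer arrived this morning."
ocr_text = "Tlie mail steamer arrivcd this rnorning."

errors = edit_distance(printed, ocr_text)
rate = character_error_rate(printed, ocr_text)
print(f"{errors} edits over {len(printed)} characters, error rate {rate:.1%}")
```

A measure like this needs a trusted transcription of the printed text to compare against, which is why it can only be calculated for pages that someone has checked by hand.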

What to do when you notice a mistake in the OCR text?

When you have selected an article on a page in the viewer, you can view the original OCR text. Because this text is generated automatically by the OCR process, it often contains errors. You can correct any errors line by line. A word processor such as Microsoft Word can help you find and fix errors.