What is OCR?
Definition
Optical character recognition (OCR) text is the electronic text that OCR software has produced from a scanned image of the original newspaper page. It is generated automatically and has not been manually reviewed or corrected. OCR makes it possible to search large quantities of full text, but it is not 100% accurate. Accuracy depends on a variety of factors, for example the condition of the original newspaper or microfilm, the quality of the paper, the size and style of the font, and the column layout.
How accurate is the OCR text in the articles?
We scanned the original newspaper pages with high-quality scanners and then ran an optical character recognition (OCR) process, which converts the printed text to electronic text. Both processes are set up to produce the most accurate results possible; however, some errors inevitably slip through. The quality of the original newspaper affects the accuracy of the OCR output. Factors that can reduce accuracy include:
Highly complex layout
Radical differences in layout over time
Variable font sizes and character types (especially Gothic)
Narrow space between lines
Narrow gutter between columns
Missing or misprinted text
Poor quality or deteriorated inks
Poor quality or deteriorated papers
Irregular alignment of characters in hand-set press
Annotations by hand
Graphic devices and/or elements
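One common way to put a number on OCR accuracy is to compare the OCR text with a hand-corrected version and count how many single-character edits separate them. The sketch below is only an illustration of that idea and is not the method used by this service; the function names and the sample strings are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_text: str, corrected_text: str) -> float:
    """Share of characters judged correct, measured against the corrected text."""
    if not corrected_text:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(ocr_text, corrected_text)
    return max(0.0, 1.0 - errors / len(corrected_text))


# Hypothetical sample: the OCR misread "h" as "b" and "m" as "rn".
print(character_accuracy("tbe rnayor", "the mayor"))
```

A score of 1.0 means the OCR text matches the corrected text exactly; lower scores mean more characters had to be changed.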
What to do when you notice a mistake in the OCR text?
When you select an article on a page in the viewer, you can view its original OCR text. Because the text is generated automatically by OCR, it often contains errors. You can correct these errors line by line. Microsoft Word can also be a useful tool for finding and fixing errors.
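As a rough illustration of how likely errors could be located before correcting them line by line, the sketch below flags every word that does not appear in a list of known words. It is a simplified stand-in for a spell checker, not a feature of the viewer; the function name, the sample lines and the tiny vocabulary are all hypothetical.

```python
import re


def flag_suspect_words(lines, vocabulary):
    """Return (line number, word) pairs for words missing from the vocabulary."""
    suspects = []
    for line_no, line in enumerate(lines, start=1):
        # Treat runs of letters and apostrophes as words; ignore digits and punctuation.
        for word in re.findall(r"[A-Za-z']+", line):
            if word.lower() not in vocabulary:
                suspects.append((line_no, word))
    return suspects


# Hypothetical OCR output and a deliberately tiny word list.
ocr_lines = ["The rnayor spoke", "at tbe hall"]
known_words = {"the", "mayor", "spoke", "at", "hall"}

for line_no, word in flag_suspect_words(ocr_lines, known_words):
    print(f"line {line_no}: check '{word}'")
```

Words flagged this way are only candidates: names and old spellings in a newspaper may be correct even when they are missing from the word list, so each one still needs checking against the page image.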