Infinite Monkeys of Babel – Crowdsourcing for the betterment of OCR language material

  • Wouter Van Hemel National Library of Finland Library Network Services PL 26 (Teollisuuskatu 23) FI-00014 University of Helsinki
  • Jussi-Pekka Hakkarainen National Library of Finland Library Network Services PL 26 (Teollisuuskatu 23) FI-00014 University of Helsinki

Abstract

The OCR editor is the National Library of Finland’s most recent foray into the budding phenomenon of crowdsourcing. Under the motto “many hands make light work”, users can swiftly correct the typical mistakes in OCR-scanned text of source materials, which are often of challenging visual quality, using nothing more than their browser. Improving the quality and availability of the digital text would make it easier to study the original sources directly. It would also indirectly benefit other tools that depend on accurate text, such as word-list generators and dictionaries.
Published
2015-06-17