16 March 2017: Tested a Trial Copy of ABBYY FineReader

From Noisebridge
Revision as of 22:17, 16 March 2017 by Plausible deniability

Experiments

  • To better judge what is possible with OCR, we are sampling both proprietary and open-source software
  • We installed ABBYY FineReader 12.1.x on the dorkroom Mac mini
  • We asked it to convert the images from our previous experiment with Tesseract
  • It produced a PDF containing only the first three images (a limit of the trial version), with the following results:
    • Positive
      • Pages were automatically oriented for English left-to-right, top-to-bottom (LRTB) reading order
      • Pages were automatically straightened
      • It produced an indexed, searchable PDF
      • It indexes scientific terminology
      • It recognizes images, tables and diagrams, and paginates them in the resulting file
    • Neutral
      • The resulting PDF contains not only the text, diagrams and images but also the entire original scanner image
    • Negative
      • Pages were not automatically cropped, so the scanner platen occupies most of each image
      • The one page FineReader did crop (page 3) was cropped incorrectly, removing all of its text content
    • Vis-à-vis Tesseract
      •
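To make the cropping problem concrete, here is a minimal sketch of one naive auto-crop rule. It is not how FineReader or Tesseract crops. It assumes the paper scans brighter than the surrounding platen area (for example, with the scanner lid open) and represents an image as rows of 8-bit grayscale values. A row or column counts as "page" when at least a given fraction of its pixels is brighter than a threshold. The names `page_bbox` and `crop`, the threshold of 128 and the 5% minimum fraction are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: rows of 8-bit grayscale values (0 = black, 255 = white).
Gray = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def _is_page(values: List[int], threshold: int, min_fraction: float) -> bool:
    """True if enough pixels in this row/column are brighter than the threshold."""
    bright = sum(1 for v in values if v > threshold)
    return bright >= min_fraction * len(values)


def page_bbox(img: Gray, threshold: int = 128, min_fraction: float = 0.05) -> Optional[Box]:
    """Bounding box of the bright (assumed paper) region, or None if there is none.

    Requiring a minimum fraction of bright pixels per row/column keeps isolated
    bright specks on the platen from stretching the box.
    """
    if not img or not img[0]:
        return None
    height, width = len(img), len(img[0])
    rows = [y for y in range(height) if _is_page(img[y], threshold, min_fraction)]
    cols = [x for x in range(width)
            if _is_page([img[y][x] for y in range(height)], threshold, min_fraction)]
    if not rows or not cols:
        return None
    # Take the outermost qualifying rows/columns so dark text inside the page is kept.
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop(img: Gray, box: Box) -> Gray:
    """Return the sub-image inside box."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]


if __name__ == "__main__":
    # Tiny synthetic scan: dark platen, a 3x2 bright "page" with one dark text pixel,
    # and a single bright speck on the platen at the bottom-left corner.
    scan = [[0] * 6 for _ in range(5)]
    scan[1][2:5] = [255, 255, 255]
    scan[2][2:5] = [255, 0, 255]
    scan[4][0] = 255
    box = page_bbox(scan, min_fraction=0.25)
    print(box)              # (2, 1, 5, 3)
    print(crop(scan, box))  # [[255, 255, 255], [255, 0, 255]]
```

Because the box is bounded by bright paper rather than by text, dark text inside the page stays inside the crop. The rule would still fail if the platen scans as bright as the paper (lid closed on a white backing), which is one plausible reason a real tool could mis-crop. For real scans, Pillow's `Image.open(path).convert("L")` can supply the grayscale pixel data.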