4 March 2017: A session with Tesseract

From Noisebridge
Revision as of 15:50, 5 March 2017 by Plausible deniability

Experiments

  • Installed tesseract-ocr via Homebrew on the Mac mini attached to the book scanner.
  • Took a book page image from the scanner (using scan.py, which still works) and ran it through Tesseract to see what it would produce.
  • We made three attempts:
    1. Original uncropped page image (img00001.jpg). Tesseract produced total gibberish (see out0.txt).
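
For anyone repeating the first attempt, the Tesseract run can be scripted roughly as below. This is only a sketch, not what we actually typed during the session: the helper names build_tesseract_command and run_ocr are made up for illustration, and it assumes the tesseract binary is on the PATH and that the file names match the ones noted above. It relies on the basic command-line form "tesseract <image> <output base>", where Tesseract adds the .txt extension itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base):
    """Return the argv list for: tesseract <image> <output_base>.

    Hypothetical helper; Tesseract appends ".txt" to output_base on its own.
    """
    return ["tesseract", str(image_path), str(output_base)]


def run_ocr(image_path, output_base):
    """Run Tesseract on one image and return the recognized text.

    Hypothetical helper; assumes the tesseract binary is installed and on PATH.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not on PATH")
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(build_tesseract_command(image_path, output_base), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling run_ocr("img00001.jpg", "out0") from the directory holding the scan would write out0.txt and return its contents, which makes it easy to compare the output of later attempts against the first one.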
