Thread: Can you OCR the images inside of .pdf files?
09-13-2014, 07:30 PM
#35
rkomar
It's possible to get more information out of tesseract, such as word bounding boxes and confidence values, by asking for HTML output (hOCR) rather than plain text. Tesseract is great, but it takes a fair bit of work from the user to tune it properly. It's definitely not a point'n'grunt solution.
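To give an idea of what the hOCR route buys you, here is a rough Python sketch that pulls each word's bounding box and confidence out of hOCR markup using only the standard library. It assumes the usual layout, where each word is a span with class `ocrx_word` and a `title` attribute holding `bbox` and `x_wconf`, and that there are no character-level spans nested inside the words. The sample markup, the names `HocrWords` and `parse_hocr`, and the confidence cutoff of 60 are all made up for illustration.

```python
import re
from html.parser import HTMLParser

# hOCR is typically produced with something like:
#   tesseract page.png out hocr
# (page.png and out are hypothetical file names). The sample below is
# invented for illustration, not real tesseract output.
SAMPLE = """
<span class='ocr_line' title='bbox 36 92 580 122'>
  <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 96'>Hello</span>
  <span class='ocrx_word' id='word_1_2' title='bbox 109 92 200 122; x_wconf 41'>wor1d</span>
</span>
"""


class HocrWords(HTMLParser):
    """Collect text, bounding box and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word currently being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            title = attrs.get("title") or ""
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+(?:\.\d+)?)", title)
            self._current = {
                "text": "",
                "bbox": tuple(map(int, bbox.groups())) if bbox else None,
                "conf": float(conf.group(1)) if conf else None,
            }

    def handle_data(self, data):
        # Only keep text that appears inside a word span.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Assumes no spans are nested inside a word span.
        if tag == "span" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.words.append(self._current)
            self._current = None


def parse_hocr(hocr_text):
    parser = HocrWords()
    parser.feed(hocr_text)
    parser.close()
    return parser.words


words = parse_hocr(SAMPLE)
# Arbitrary cutoff: flag words that tesseract was unsure about.
low = [w["text"] for w in words if w["conf"] is not None and w["conf"] < 60]
```

With the low-confidence words and their positions in hand, you can see which regions of a page need better preprocessing or different settings, which is where most of the tuning effort goes.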