05-13-2013, 04:00 PM #423
kundor
Junior Member
Posts: 5
Karma: 5998
Join Date: Oct 2011
Device: Kindle 3
Thanks for answering. On reflection, I guess I'm unlikely to search for a formula rather than text, so it doesn't really matter much. I also learned that because Tesseract's design is linear (it recognizes text one line at a time), it can't handle a lot of math notation (fractions, radicals, superscripts, subscripts, matrices, cases...), regardless of training data.

By the way, it takes about 12 hours to OCR this document, which seems kind of silly when it already has a hidden text layer. Since that layer includes the location of each word, it seems like it might be possible to keep track of which words go with each chunk while you're slicing up the pages. Have you considered doing that?
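A minimal sketch of that idea, assuming the hidden text layer has already been extracted as a list of words with bounding boxes in page-image coordinates (origin at the top-left), and that each chunk is a rectangle cut from the page. The names `Box`, `Word`, and `assign_words_to_chunks` are hypothetical and not part of any existing tool. Each word goes to the first chunk that contains its centre, and its box is shifted into that chunk's local coordinates, so no OCR pass is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Hypothetical rectangle in page-image coordinates (origin top-left).
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_point(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


@dataclass(frozen=True)
class Word:
    # One word from the (assumed already extracted) hidden text layer.
    text: str
    box: Box


def assign_words_to_chunks(words: list[Word], chunks: list[Box]) -> list[list[Word]]:
    """For each chunk, collect the words whose centre lies inside it,
    with boxes shifted so the chunk's top-left corner becomes (0, 0).
    Words keep the order they had in the text layer."""
    result: list[list[Word]] = [[] for _ in chunks]
    for word in words:
        cx = (word.box.x0 + word.box.x1) / 2
        cy = (word.box.y0 + word.box.y1) / 2
        for i, chunk in enumerate(chunks):
            if chunk.contains_point(cx, cy):
                local = Box(word.box.x0 - chunk.x0, word.box.y0 - chunk.y0,
                            word.box.x1 - chunk.x0, word.box.y1 - chunk.y0)
                result[i].append(Word(word.text, local))
                break  # first matching chunk wins; later chunks are skipped
    return result


# Usage: a page split into a top and a bottom chunk.
words = [
    Word("Hello", Box(10, 10, 50, 20)),
    Word("world", Box(60, 10, 100, 20)),
    Word("Footer", Box(10, 500, 60, 510)),
]
chunks = [Box(0, 0, 300, 250), Box(0, 250, 300, 600)]
out = assign_words_to_chunks(words, chunks)
```

Using the word's centre rather than requiring the whole box to fit means a word straddling a cut still lands in exactly one chunk. If the chunks are also rescaled for the output device, the local boxes would need the same scale factor applied.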