Quote:
Originally Posted by kundor
I'm using k2pdfopt to convert a large mathematical text. On the Tesseract download page, I noticed a file "tesseract-ocr-3.02.equ.tar.gz" which says it's a "Math / equation detection module for Tesseract 3.02." This sounds like it would help to OCR the math part correctly. The majority of the text is English. Is there some way to get the OCR engine to use this, in combination with the English training data?
I have no idea on this one. I suppose I need to do some homework on Tesseract and find out whether there is a way to use multiple training files. In the meantime, you might try using the math training file and just see what you get.