Quote:
Originally Posted by kundor
I'm using k2pdfopt to convert a large mathematical text. On the Tesseract download page, I noticed a file "tesseract-ocr-3.02.equ.tar.gz" which says it's a "Math / equation detection module for Tesseract 3.02." This sounds like it would help to OCR the math part correctly. The majority of the text is English. Is there some way to get the OCR engine to use this, in combination with the English training data?
Have you tried kindlepdfviewer yet? It reads DjVu and offers fit-to-document-width (or height) and fit-to-content-width (or height) in both portrait and landscape, plus two-point cropping.
It also supports reflow.
https://www.mobileread.com/forums/sho....php?p=2466450
You can also convert the DjVu to an image-only PDF, run it through k2pdfopt, and then OCR the resulting k2pdfopt PDF with ABBYY FineReader, Acrobat, etc. (in "text under image" mode).
For an average book, OCR should take about an hour in detailed mode or about half an hour in quick mode.
https://www.mobileread.com/forums/sho...&postcount=413