MobileRead Forums - View Single Post

willus · 06-10-2020, 11:31 PM

Quote:

Originally Posted by roger64

@willus

Thanks for your explanations and patience...

So I set up TESSDATA_PREFIX in /etc/environment and resumed testing. I thought I had succeeded, but...

Please look at the attached files: do you have any idea what went wrong? In the file "exemple", you'll find a copy of the terminal commands I used to process Parquin.pdf.

I can search the text in the _k2opt file, but I do not know how to select or extract text. Is this normal?

You ran OCR correctly with Tesseract, but a couple of things. First, you don't need to do OCR: the original document already has selectable text. Second, both documents you attached allow me to select the text with my PDF viewer (Sumatra PDF running on Windows 10).
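
If you want to check whether a PDF already has selectable text before bothering with OCR, something like the following rough sketch might help. It assumes poppler's pdftotext command-line tool is installed and on your PATH. The function names and the 20-character threshold are just illustrative choices of mine, and none of this is part of k2pdfopt.

```python
import shutil
import subprocess


def looks_like_text_layer(extracted: str, min_chars: int = 20) -> bool:
    """Return True if the extracted text contains at least min_chars
    non-whitespace characters (an arbitrary, illustrative threshold)."""
    visible = sum(1 for ch in extracted if not ch.isspace())
    return visible >= min_chars


def pdf_has_text_layer(pdf_path: str, pages: int = 3) -> bool:
    """Run pdftotext on the first few pages and test what comes back.

    Assumes the pdftotext tool from poppler is available; raises
    RuntimeError if it cannot be found.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler) not found on PATH")
    result = subprocess.run(
        # -f/-l select the first and last page; "-" sends the text to stdout.
        ["pdftotext", "-f", "1", "-l", str(pages), pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return looks_like_text_layer(result.stdout)


# Example use (path is the file from this thread):
#   pdf_has_text_layer("Parquin.pdf")
```

If this returns True for the original file, it most likely has a text layer already and OCR can be skipped. A scanned, image-only PDF would typically yield little or no text.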

Note that there's a bug in k2pdfopt in how it sizes the text selection for the French accented "a". This will be fixed in the next release, which I hope to get out reasonably soon.