Quote:
Originally Posted by roger64
@willus
Thanks for your explanations and patience...
So I set up TESSDATA_PREFIX in /etc/environment and resumed testing. I thought I had succeeded, but...
Please look at the attached files: do you have any idea what went wrong? In the file "exemple", you'll find a copy of the terminal commands I used to process Parquin.pdf.
I can search the text in the _k2opt file, but I do not know how to select or extract text. Is this normal?
|
You ran OCR correctly with Tesseract, but a couple of things. First, you don't need to do OCR: the original document already has selectable text. Second, both documents you attached allow me to select the text with my PDF viewer (Sumatra PDF running on Windows 10).
Note that there's a bug in how k2pdfopt sizes the selection box for the French accented "a". This will be resolved in the next release, which I hope to get out reasonably soon.
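If you want to check up front whether a PDF already has selectable text (and so doesn't need OCR), something like the rough sketch below could work. It is not part of k2pdfopt. It assumes the `pdftotext` tool from poppler-utils is installed, and the helper names and the 20-character threshold are made up for illustration.

```python
import shutil
import subprocess
import sys


def looks_like_text(extracted: str, min_chars: int = 20) -> bool:
    """Return True if the string has at least min_chars non-whitespace characters.

    The threshold is an arbitrary assumption; tune it for your documents.
    """
    return sum(1 for ch in extracted if not ch.isspace()) >= min_chars


def pdf_has_text_layer(pdf_path: str, last_page: int = 3) -> bool:
    """Extract text from the first few pages with pdftotext and test it.

    Assumes pdftotext (poppler-utils) is on PATH; "-" sends output to stdout.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) was not found on PATH")
    result = subprocess.run(
        ["pdftotext", "-l", str(last_page), pdf_path, "-"],
        capture_output=True,
        text=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return looks_like_text(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_text.py Parquin.pdf
    if pdf_has_text_layer(sys.argv[1]):
        print("Text layer found: OCR is probably unnecessary.")
    else:
        print("No usable text found: OCR may be needed.")
```

A scanned PDF with no text layer will typically produce little or nothing here, in which case OCR is the way to go.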