Quote:
Originally Posted by MarjaE
For the first case, you need to specify this option:
-ocrcol 2
E.g. k2pdfopt -mode copy -ocrcol 2 -ocr t myfile.pdf
That will get k2pdfopt to correctly OCR a 2-column document where you only want OCR applied. I've attached an OCR of page 100.
Not sure what's wrong with the Antonov documents. I tried OCR-ing in Russian and it did work, though the result had a number of mistakes.
k2pdfopt -mode copy -ocrlang rus -ocr t myfile.pdf
You can add the -p option to quickly test just one page of conversion, e.g.
k2pdfopt -mode copy -ocrlang rus -ocr t -p 95 myfile.pdf
I've attached this conversion as well (physical page 95).
You might seriously consider getting Office 365 if you do this kind of thing a lot. I loaded one of the Russian volumes into MS Word and it did a remarkably good job converting it to Russian text. I've attached a screenshot of page 90 loaded into MS Word, with some text selected, along with a graphic of the same page directly from the original PDF file.
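If you end up running the k2pdfopt commands above on a lot of files, a small script can assemble them for you. This is only a rough sketch: the function name build_k2pdfopt_cmd and its parameters are made up for illustration, and it uses nothing beyond the options already quoted in this post (-mode copy, -ocrcol, -ocrlang, -ocr t, -p). It only builds and prints the command line; it does not run k2pdfopt.

```python
import shlex


def build_k2pdfopt_cmd(pdf, lang=None, ocr_columns=None, page=None):
    """Assemble a k2pdfopt command line (hypothetical helper).

    Only the options mentioned in this post are used; every other
    k2pdfopt setting is left at its default.
    """
    cmd = ["k2pdfopt", "-mode", "copy"]
    if ocr_columns is not None:
        # e.g. 2 for a two-column document
        cmd += ["-ocrcol", str(ocr_columns)]
    if lang is not None:
        # e.g. "rus" for Russian
        cmd += ["-ocrlang", lang]
    cmd += ["-ocr", "t"]
    if page is not None:
        # convert a single page as a quick test
        cmd += ["-p", str(page)]
    cmd.append(pdf)
    return cmd


if __name__ == "__main__":
    # Print the single-page Russian test command so it can be pasted into a shell.
    print(shlex.join(build_k2pdfopt_cmd("myfile.pdf", lang="rus", page=95)))
```

Calling build_k2pdfopt_cmd("myfile.pdf", ocr_columns=2) gives the two-column command instead. The command comes back as a list so it could be handed to subprocess.run if you want the script to launch k2pdfopt itself.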