MobileRead Forums - View Single Post - k2pdfopt: optimizes PDFs for viewing on e-readers

willus · 08-17-2013, 09:25 PM

Quote:

Originally Posted by MaxStirner

Sorry to bother you again, willus, but maybe you remember my question about multi-language support. Yesterday I was perusing the Tesseract Google group for no specific reason and stumbled across this post:
https://groups.google.com/forum/#!ms...I/QMMHDV_GWRIJ
Don't know if this is of any help to you but just in case..

Tesseract's dual-language OCR does seem to work in k2pdfopt v1.66, though it performed poorly in my test case, which mixed English and Chinese. I used this command:

k2pdfopt -ocr dual_english_chinese.pdf -mode copy -ocrlang language

where I substituted different values for language: eng, chi_tra, chi_tra+eng, and eng+chi_tra. See the attached files. The best results, by far, came from chi_tra alone, which sort of defeats the purpose of dual-language OCR(!). Still, each result was different, so I am assuming that the mechanism for passing lang1+lang2 to Tesseract is working and that this was just a particularly poor case for Tesseract. Maybe mixed European languages will work better?
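
For anyone who wants to repeat the comparison, here is a minimal sketch that prints the four command lines, one per -ocrlang value. It uses only the flags from the command above. The build_cmd helper and the input variable are names made up for this illustration, and the script only prints the commands rather than running k2pdfopt.

```shell
#!/bin/sh
# Dry run: print one k2pdfopt command per -ocrlang value tried in the post.
# Only flags that appear in the post are used; nothing is executed here.
input="dual_english_chinese.pdf"

# build_cmd is a hypothetical helper for this sketch.
# $1 = value for -ocrlang: a single language code, or codes joined with "+"
build_cmd() {
    printf 'k2pdfopt -ocr %s -mode copy -ocrlang %s\n' "$input" "$1"
}

for lang in eng chi_tra chi_tra+eng eng+chi_tra; do
    build_cmd "$lang"
done
```

Copy each printed line into a terminal to run it. Successive runs on the same input may write to the same output file, so it is probably worth renaming each result before starting the next run.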