Quote:
Originally Posted by MaxStirner
Sorry to bother you again, Wilus, but maybe you remember my question about multilanguage support. Yesterday I was perusing the Tesseract Google group for no specific reason and stumbled across this post:
https://groups.google.com/forum/#!ms...I/QMMHDV_GWRIJ
Don't know if this is of any help to you but just in case..
Tesseract's dual language OCR actually seems to work in k2pdfopt v1.66, though not very well at all in my test case, where I mixed English and Chinese. I used this command:
k2pdfopt -ocr dual_english_chinese.pdf -mode copy -ocrlang language
where I substituted different values for language: eng, chi_tra, chi_tra+eng, and eng+chi_tra. See the attached files. The best results, by far, came from using chi_tra alone, which sort of defeats the purpose of dual language OCR(!), but each result was different, so I am assuming that the actual mechanism of passing lang1+lang2 to Tesseract is working and that this was just a particularly poor case for Tesseract. Maybe mixed European languages will work better?
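If anyone wants to repeat the comparison on their own file, here is a rough sketch in Python that just builds the four command lines from the test above. The flags are exactly the ones in my command; the helper names (build_command, build_all) are made up for illustration, and the script only prints the commands rather than running k2pdfopt.

```python
import shlex

# The four -ocrlang values tried in the test described above.
LANGUAGES = ["eng", "chi_tra", "chi_tra+eng", "eng+chi_tra"]


def build_command(pdf_path, language):
    """Return the argument list for one k2pdfopt OCR run.

    Uses only the flags from the post: -ocr, -mode copy, -ocrlang.
    (Hypothetical helper, not part of k2pdfopt.)
    """
    return ["k2pdfopt", "-ocr", pdf_path, "-mode", "copy", "-ocrlang", language]


def build_all(pdf_path, languages=LANGUAGES):
    """Return one argument list per language value, in the given order."""
    return [build_command(pdf_path, lang) for lang in languages]


if __name__ == "__main__":
    # Print each command so it can be pasted into a shell and run by hand.
    for argv in build_all("dual_english_chinese.pdf"):
        print(shlex.join(argv))
```

To test other combinations (say, two European languages), swap in a different list of values for LANGUAGES; the lang1+lang2 string is passed through unchanged.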