Quote:
Originally Posted by MarjaE
Thanks.
I finally got Tesseract working in English, but still can't get it working in the other languages I've downloaded. -ocr lan[guage] ignores the lan[guage] and does English. -ocrlang lan[guage] skips ocr.
I just tried this (in Windows 10):
k2pdfopt -ocrlang chi_tra -ocr t mydoc.pdf
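It seems to work: k2pdfopt tells me it selected Chinese. Note that -ocrlang is used together with -ocr, as above; -ocrlang sets the language, while the t after -ocr selects the Tesseract engine rather than a language. Are you sure the training files for your other languages are in place? Here is what my Tesseract data folder looks like: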
Code:
DATE      TIME          SIZE  FILE
08/23/11  02:13p         139  rus.cube.fold
08/23/11  02:13p         317  rus.cube.params
08/23/11  02:13p     912,800  rus.cube.nn
08/23/11  02:13p         278  rus.cube.lm
08/23/11  02:13p   7,064,074  rus.cube.word-freq
08/23/11  02:13p  15,241,687  rus.cube.size
10/08/12  03:42p  15,636,141  rus.traineddata
10/16/12  01:00p  39,973,777  chi_sim.traineddata
10/16/12  01:00p  54,349,418  chi_tra.traineddata
10/17/12  07:55a         254  eng.cube.params
10/17/12  07:55a     857,304  eng.cube.nn
10/17/12  07:55a     171,918  eng.cube.bigrams
10/17/12  07:55a         181  eng.cube.lm
10/17/12  07:55a         996  eng.tesseract_cube.nn
10/17/12  07:55a   2,444,187  eng.cube.word-freq
10/17/12  07:55a  13,020,078  eng.cube.size
10/17/12  07:55a          38  eng.cube.fold
09/01/13  12:25p  21,876,572  eng.traineddata
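If you want to double-check your own setup, here is a rough Python sketch (my own, not part of k2pdfopt). It lists which language codes have a matching .traineddata file in a tessdata folder and which of the codes you plan to pass to -ocrlang are missing. It assumes Tesseract loads a language from a file named <code>.traineddata (like chi_tra.traineddata in the listing above). The fallback Windows path is only a hypothetical example. The script checks TESSDATA_PREFIX both ways because, as far as I know, Tesseract 3.x expects that variable to name the folder containing tessdata, while 4.x and later expect the tessdata folder itself.

```python
# Sketch only: reports which Tesseract language files are present in a
# tessdata folder. Assumes Tesseract loads language "xyz" from a file named
# xyz.traineddata; the fallback path below is a hypothetical example.
import os
from pathlib import Path

SUFFIX = ".traineddata"


def find_tessdata_dir(fallback=r"C:\Program Files\Tesseract-OCR\tessdata"):
    """Guess the tessdata folder from TESSDATA_PREFIX, else use `fallback`.

    Assumption: Tesseract 3.x treats TESSDATA_PREFIX as the folder that
    *contains* tessdata, while 4.x and later treat it as the tessdata
    folder itself, so both interpretations are tried.
    """
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        for candidate in (Path(prefix) / "tessdata", Path(prefix)):
            if candidate.is_dir():
                return candidate
    return Path(fallback)


def installed_languages(tessdata_dir):
    """Sorted language codes (e.g. 'eng', 'chi_tra') with a .traineddata file."""
    folder = Path(tessdata_dir)
    if not folder.is_dir():
        return []
    return sorted(
        p.name[: -len(SUFFIX)]
        for p in folder.iterdir()
        if p.is_file() and p.name.endswith(SUFFIX)
    )


def missing_languages(tessdata_dir, wanted):
    """Codes from `wanted` that have no matching .traineddata file."""
    have = set(installed_languages(tessdata_dir))
    return [code for code in wanted if code not in have]


if __name__ == "__main__":
    folder = find_tessdata_dir()
    print("tessdata folder:", folder)
    print("installed:", installed_languages(folder) or "none found")
    # Codes you intend to pass to k2pdfopt's -ocrlang option.
    print("missing:", missing_languages(folder, ["eng", "rus", "chi_sim", "chi_tra"]))
```

Run it with Python 3. If a code you want shows up under "missing", the corresponding .traineddata file most likely needs to be copied into the reported folder before -ocrlang can use it.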