MobileRead Forums - View Single Post - k2pdfopt: optimizes PDFs for viewing on e-readers

willus · 01-16-2018, 10:16 PM

Quote:

Originally Posted by MarjaE

Thanks.

I finally got Tesseract working in English, but still can't get it working in the other languages I've downloaded. -ocr lan[guage] ignores the lan[guage] and does English. -ocrlang lan[guage] skips ocr.

I just tried this (in Windows 10):

k2pdfopt -ocrlang chi_tra -ocr t mydoc.pdf

It seems to work: k2pdfopt tells me it selected Chinese. Are you sure you have the training files for the other languages in place? Here is what my Tesseract data folder looks like:

Code:

DATE      TIME                    SIZE FILE
08/23/11  02:13p                   139 rus.cube.fold
08/23/11  02:13p                   317 rus.cube.params
08/23/11  02:13p               912,800 rus.cube.nn
08/23/11  02:13p                   278 rus.cube.lm
08/23/11  02:13p             7,064,074 rus.cube.word-freq
08/23/11  02:13p            15,241,687 rus.cube.size
10/08/12  03:42p            15,636,141 rus.traineddata
10/16/12  01:00p            39,973,777 chi_sim.traineddata
10/16/12  01:00p            54,349,418 chi_tra.traineddata
10/17/12  07:55a                   254 eng.cube.params
10/17/12  07:55a               857,304 eng.cube.nn
10/17/12  07:55a               171,918 eng.cube.bigrams
10/17/12  07:55a                   181 eng.cube.lm
10/17/12  07:55a                   996 eng.tesseract_cube.nn
10/17/12  07:55a             2,444,187 eng.cube.word-freq
10/17/12  07:55a            13,020,078 eng.cube.size
10/17/12  07:55a                    38 eng.cube.fold
09/01/13  12:25p            21,876,572 eng.traineddata
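
If you want a quick way to confirm the training files are where you think they are, here is a minimal Python sketch. It only checks that a file named after the language code with a .traineddata extension (like the ones in my listing above) exists in a given folder. The function name is made up for illustration, and reading the folder from the TESSDATA_PREFIX environment variable is an assumption about your setup, so point it at whatever folder you actually use.

```python
import os
from pathlib import Path


def missing_traineddata(data_dir, languages):
    """Return the language codes with no <code>.traineddata file in data_dir."""
    folder = Path(data_dir)
    return [lang for lang in languages
            if not (folder / f"{lang}.traineddata").is_file()]


if __name__ == "__main__":
    # Assumption: TESSDATA_PREFIX names the data folder directly.
    # Depending on your setup it may name the parent folder instead.
    data_dir = os.environ.get("TESSDATA_PREFIX", ".")
    wanted = ["eng", "rus", "chi_sim", "chi_tra"]
    for lang in missing_traineddata(data_dir, wanted):
        print(f"missing: {lang}.traineddata in {data_dir}")
```

If it prints nothing, every listed language has a training file in that folder, and the problem is more likely in how the folder is being located than in the files themselves.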