MobileRead Forums - View Single Post - k2pdfopt: optimizes PDFs for viewing on e-readers

polarisrising · 11-04-2018, 01:15 PM

I'm having some trouble converting a PDF and was hoping to get some advice. My goal is to turn a PDF with a mix of 2-column and 1-column text blocks into a single-column .epub. My plan was to first run the PDF through k2pdfopt to get correct OCR in a single column, then run the result through calibre.

I'm running k2pdfopt from the terminal on Arch Linux, and I have Tesseract set up correctly. Here are my arguments:

Code:

-m 0.1in,0.8in,0.1in,0.2in -ocr t -ocrhmax .4 -ocrvis t -n- -wrap- -ws -.5 inmemoriarichar00kirk.pdf
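For context, here is a rough sketch of the two-step pipeline I'm attempting, written as a small bash script. It is only a sketch under a few assumptions: that k2pdfopt writes its result next to the input with its default `_k2opt` suffix, that calibre's `ebook-convert` command is on the PATH, and that the function names (`k2opt_output_name`, `epub_output_name`, `pdf_to_epub`) are just made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch of the two-step pipeline: k2pdfopt (OCR + reflow), then calibre.

# Name k2pdfopt is ASSUMED to write by default: <name>_k2opt.pdf
k2opt_output_name() {
    local input="$1"
    printf '%s_k2opt.pdf\n' "${input%.pdf}"
}

# Name of the final EPUB (hypothetical naming choice for this sketch).
epub_output_name() {
    local input="$1"
    printf '%s.epub\n' "${input%.pdf}"
}

# Run both steps for one input PDF (hypothetical helper).
pdf_to_epub() {
    local input="$1"
    local reflowed epub
    reflowed="$(k2opt_output_name "$input")"
    epub="$(epub_output_name "$input")"

    # Step 1: the same arguments as in the post, unchanged.
    # k2pdfopt may show its interactive menu first; press Enter to start.
    k2pdfopt -m 0.1in,0.8in,0.1in,0.2in -ocr t -ocrhmax .4 -ocrvis t \
             -n- -wrap- -ws -.5 "$input" || return 1

    # Step 2: calibre's command-line converter, input file then output file.
    ebook-convert "$reflowed" "$epub"
}
```

With the functions loaded, the whole run would be `pdf_to_epub inmemoriarichar00kirk.pdf`, which should leave `inmemoriarichar00kirk.epub` beside the original if both tools succeed.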

Attached are the original PDF and the output that I'm getting.

Basically, the OCR font looks very squished and distorted, and when I run the output through calibre, it treats the word gaps as new <p> elements.

Thanks!