I'm having some trouble converting a pdf and I was hoping I could get some advice. My goal is to turn a pdf with varying 2-column and 1-column text blocks, into a single column .epub. My thought process was to first run the pdf through k2pdfopt to generate the ocr correctly, in a single column, then run it through calibre.
I'm using k2pdfopt in terminal, on Arch Linux and I have Tesseract setup correctly. Here are my arguments:
Code:
-m 0.1in,0.8in,0.1in,0.2in -ocr t -ocrhmax .4 -ocrvis t -n- -wrap- -ws -.5 inmemoriarichar00kirk.pdf
Attached is the original pdf and the output that I'm getting.
Basically, the ocr font looks very squished and distorted, and when I run it through calibre, it's treating the work gaps as new <p>.
Thanks!