Originally Posted by gadd
i am having difficulty to convert a scanned two page in a row file into k2opt format. i have tried -ocr command but the result was ineffective. which command lines would you suggest to get a solid version of this pdf for kindle device?
p.s. i attach those files to the post.
This is a tough one because the scan quality isn't great. Try the two commands below, one after the other. (I've renamed your file to be more manageable on the command line):
k2pdfopt -mode copy -n -grid 2x1x1.5 -w 1t -h 1t century20.pdf -o temp.pdf
k2pdfopt -dev kpw -as temp.pdf -m 0.4,0.2,0.4,0.2 -de 1.5 -gtr .015 -o century20_k2pdfopt.pdf
The first command splits the two-page scans into a temp file with one book page per PDF page so that each page can be auto-straightened. The second command processes the new temp file to create the final output. If you want to turn on OCR, Tesseract does a decent job of OCR-ing the temp file if you have it installed. Just add -ocr to the second command. Here's what some of the other options do (a complete list of command-line options is here):
-dev kpw sets the output for the Kindle Paperwhite. You can leave that off if you have an older Kindle.
-as will auto-straighten each page, which will help k2pdfopt break up the rows correctly.
-m 0.4,0.2,0.4,0.2 will ignore 0.4 inches on the sides and 0.2 inches on the top and bottom of the temp file. This will chop off some of the unwanted marks in the margins that are keeping k2pdfopt from re-flowing the document correctly.
-de 1.5 sets the defect size to 1.5 points so that little marks that are up to 1.5 points in size will be ignored (the scan quality is poor and you have lots of these).
-gtr .015 will make k2pdfopt a little more aggressive than normal in breaking lines of text since some of them are pretty close together.
Of course, you can try adjusting any or all of the options above. I only had four pages of your book to work with, so they might not be tuned quite correctly. I wasn't able to get a perfect result, but it's an improvement over what you got, I think.
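If you end up re-running the two passes a lot while you tweak things, a small shell wrapper might save some typing. This is only a rough sketch: the k2pdfopt options are exactly the ones from the two commands above, but the variable names, the run helper and the RUN switch are my own inventions for illustration and are not part of k2pdfopt. By default it only prints the commands so you can check them first.

```shell
#!/bin/sh
# Two-pass k2pdfopt run using the same options as the two commands in this post.
# The file names are the ones used in this thread; change them to suit.
INPUT="century20.pdf"
TEMP="temp.pdf"
OUTPUT="century20_k2pdfopt.pdf"

# Hypothetical helper (not a k2pdfopt feature): prints the command unless
# RUN=1 is set in the environment, in which case the command is executed.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

two_pass() {
    # Pass 1: split the two-page scans into a one-page-per-page temp file.
    run k2pdfopt -mode copy -n -grid 2x1x1.5 -w 1t -h 1t "$INPUT" -o "$TEMP" || return 1
    # Pass 2: straighten, trim margins, ignore small defects, and re-flow.
    run k2pdfopt -dev kpw -as "$TEMP" -m 0.4,0.2,0.4,0.2 -de 1.5 -gtr .015 -o "$OUTPUT"
}

two_pass
```

Save it under any name you like (say k2two.sh), run "sh k2two.sh" to see the two commands, and "RUN=1 sh k2two.sh" to actually run them. To try different margins or a different -gtr value, edit the second k2pdfopt line and re-run.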