MobileRead Forums - View Single Post - Optimize PDFs from archive.org for E-Ink devices

willus · 02-28-2020, 10:48 PM

Quote:

Originally Posted by ctop

Wow, this looks really great, exactly what I had in mind! Awesome! One question though, the file you created has the page breaks at different places than the original, which is astonishing. What is the reason for this?

And one more question, since I like to highlight things in my PDFs, is the text layer the same as before, or does k2pdfopt do its own OCR?

All the best,

Ctop

In "fitwidth" mode, k2pdfopt by default concatenates the source pages as it fits them into the converted PDF, disregarding page breaks in the source document. You can add the -bp option to force a page break in the converted document wherever there is a page break in the source. If you prefer a 1-to-1 correlation between source pages and converted pages, there are other options better suited to that. The k2pdfopt options are documented here.
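For anyone scripting this, here is a minimal sketch of how the command line might be assembled. It assumes that k2pdfopt is on your PATH and that "fitwidth" mode is selected with -mode fw. The function name and the book.pdf file name are made up for illustration. It only prints the command rather than running it, so check the flags against the k2pdfopt documentation before relying on it.

```python
import shlex


def build_k2pdfopt_command(source_pdf, keep_page_breaks=True, executable="k2pdfopt"):
    """Build a k2pdfopt argument list for a fit-width conversion.

    Hypothetical helper for illustration only; not part of k2pdfopt.
    """
    # Assumption: "-mode fw" is how fitwidth mode is selected.
    command = [executable, "-mode", "fw"]
    if keep_page_breaks:
        # -bp forces an output page break wherever the source has one.
        command.append("-bp")
    command.append(source_pdf)
    return command


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(shlex.join(build_k2pdfopt_command("book.pdf")))
```

Leaving out -bp (keep_page_breaks=False) gives the default concatenating behavior described above.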

By default, k2pdfopt keeps the OCR layer from the source PDF, but it can also do its own OCR.