MobileRead Forums - View Single Post - k2pdfopt: optimizes PDFs for viewing on e-readers

willus · 11-22-2018, 08:35 AM

Quote:

Originally Posted by gg4u

Oh, thank you, Willus.

Keeping only eng.file will free up some disk space.

Could you suggest how to make the best use of k2pdfopt?

I'd like to reflow a PDF of scanned images into an EPUB containing figures and chapters.

k2pdfopt seems to detect where the images are. I processed the original PDF into an OCRed version, but the characters are blurred.

For comparison, I tried using Ghostscript and Tesseract:
from PDF to TIFF, then from TIFF to TXT.

Here the results were quite good, but I lose all the figures and the chapter markup.

As the final result for the written text, I would like EPUB or MOBI (sharp rendering of characters), not PDF, but still with the figures and a TOC.

Is there maybe another format besides TXT that Tesseract can export to and that will keep the images (RTF)?

I could eventually mark the TOC manually. Which markup is correct?

What steps should I take to convert a PDF into an EPUB containing images and markup?

I also shared this thread https://www.mobileread.com/forums/sh...d.php?t=312652

Can I also ask how you approached the problem of detecting figures in a PDF? I'm interested in the problem solving.

I've collected my ideas on converting PDFs on a web page that I've had up for a while. No magic solution, other than maybe trying MS Word if you have access to it. I don't really detect figures; I just look for places where I don't see the gaps that would occur between normal rows of text. If I don't find a gap over more than a given span (~1.5 inches), I consider that region a figure. Very simplistic.
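
For anyone who wants to experiment with the gap idea, here is a rough Python sketch of it. This is not k2pdfopt's actual code. The function name, the parameters, and the input format (a list with the number of dark pixels in each pixel row of the page) are all assumptions made for illustration.

```python
def classify_regions(row_ink, dpi, max_text_span_in=1.5, blank_threshold=0):
    """Split a page into vertical regions separated by blank rows.

    row_ink: hypothetical input, one dark-pixel count per pixel row.
    dpi: resolution of the page image, used to convert inches to rows.
    max_text_span_in: tallest gap-free region still treated as text
        (the ~1.5 inch figure mentioned above).
    Returns a list of (start_row, end_row, label) tuples, end exclusive.
    """
    limit_rows = max_text_span_in * dpi
    regions = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink > blank_threshold:
            # Row has content: open a region if none is open.
            if start is None:
                start = y
        elif start is not None:
            # Blank row: this is a gap, so close the open region.
            regions.append((start, y))
            start = None
    if start is not None:
        # Page ended while a region was still open.
        regions.append((start, len(row_ink)))

    # A region with no gap for longer than the limit is called a figure.
    return [
        (s, e, "figure" if (e - s) > limit_rows else "text")
        for s, e in regions
    ]


# Toy page at 10 dpi, so the 1.5 inch limit is 15 rows.
toy_page = [0] * 2 + [5] * 3 + [0] * 2 + [5] * 20 + [0] * 1 + [5] * 3
toy_result = classify_regions(toy_page, dpi=10)
```

On the toy page, the two 3-row blocks come out as text and the 20-row block with no gap comes out as a figure. A real page would also need some tolerance for noise and for tall, dense text blocks, which this sketch ignores.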