Quote:
Originally Posted by gg4u
Oh, thank you Willus,
keeping only the eng file will free up some space on disk.
Could you suggest how to make the best use of k2pdfopt?
I'd like to reflow a PDF of scanned images into an EPUB containing figures and chapters.
k2pdfopt seems to detect where the images are. I processed the original PDF into an OCRed version, but the characters are blurred.
I tried a comparison using Ghostscript and Tesseract:
from PDF to TIFF, then from TIFF to TXT.
Here the results were quite good, but I lose all the figures and the markup for chapters.
As the final result for the written text, I would like an EPUB or MOBI (sharp rendering of characters), not a PDF, but still with the figures and a TOC.
Is there maybe another format besides TXT that Tesseract can export to and that will keep the images (RTF)?
I could eventually mark the TOC manually. Which markup is correct?
What steps should I take to convert a PDF into an EPUB containing images and markup?
I also shared this thread https://www.mobileread.com/forums/sh...d.php?t=312652
Can I also ask how you approached the problem of detecting figures in a PDF? I'm interested in the problem solving.
I've collected my ideas on converting PDFs on a web page that I've had up for a while. No magic solution, other than maybe trying MS Word if you have access to it. I don't really detect figures; I just look for places where I don't see the gaps that would occur between normal rows of text. If I don't find a gap for more than a given span (~1.5 inches), I consider that a figure. Very simplistic.
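
To make the gap idea concrete, here is a rough Python sketch of that kind of heuristic. It is not k2pdfopt's actual code: the function name, the per-row "ink count" input, and the `blank_threshold` parameter are all assumptions made for illustration. Only the ~1.5 inch span comes from the description above.

```python
from typing import List, Tuple


def find_figure_bands(
    row_ink: List[int],
    dpi: int = 300,
    max_text_span_in: float = 1.5,  # span from the post; everything else is assumed
    blank_threshold: int = 0,
) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each suspected figure.

    row_ink[y] is the number of dark pixels in pixel row y of the page
    (hypothetical input; how it is computed is up to the caller).
    A row counts as a gap when its ink count is <= blank_threshold.
    Any run of rows with no gap that is taller than max_text_span_in
    inches is reported as a figure; shorter runs are assumed to be text.
    """
    limit = max_text_span_in * dpi  # tallest gap-free run still treated as text
    bands: List[Tuple[int, int]] = []
    start = None  # first row of the current gap-free run, if any

    for y, ink in enumerate(row_ink):
        if ink > blank_threshold:
            if start is None:
                start = y  # a new gap-free run begins here
        elif start is not None:
            # A gap ends the run; keep it only if it is too tall for text.
            if y - start > limit:
                bands.append((start, y))
            start = None

    # Handle a run that reaches the bottom of the page.
    if start is not None and len(row_ink) - start > limit:
        bands.append((start, len(row_ink)))
    return bands
```

With a real scan you would first reduce each page to one ink count per pixel row, then pass that list in along with the scan's DPI. A dense table or a tightly set block of text would also trip this check, which is part of why the approach is simplistic.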