11-22-2018, 08:35 AM #1623
willus
Fuzzball, the purple cat
Posts: 1,303
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by gg4u
Oh, thank you, Willus.

Keeping only the eng file will free up some disk space.

Could you suggest how to make the best use of k2pdfopt?

I'd like to reflow a PDF of scanned images into an EPUB that contains figures and chapters.

k2pdfopt seems to detect where the images are. I processed the original PDF into an OCRed version, but the characters are blurred.

For comparison, I tried Ghostscript and Tesseract: PDF to TIFF, then TIFF to TXT.

The results were quite good, but I lose all the figures and the chapter markup.
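For reference, the Ghostscript and Tesseract route described above might look roughly like this when scripted in Python. This is a minimal sketch, not gg4u's actual commands: the 300 dpi resolution, the `tiffgray` output device, the `page_%03d.tif` naming and the helper function names are all assumptions. It needs `gs` (on Windows usually `gswin64c`) and `tesseract` with the English language data on the PATH, and, as noted, the resulting TXT files carry no figures or chapter markup.

```python
import subprocess
from pathlib import Path
from typing import List


def gs_to_tiff_cmd(pdf: str, out_dir: str, dpi: int = 300) -> List[str]:
    """Ghostscript command that renders every PDF page to an 8-bit grayscale TIFF."""
    # Ghostscript replaces %03d with the page number (001, 002, ...).
    pattern = str(Path(out_dir) / "page_%03d.tif")
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=tiffgray", f"-r{dpi}",
        f"-sOutputFile={pattern}", pdf,
    ]


def tesseract_cmd(tiff: str, lang: str = "eng") -> List[str]:
    """Tesseract command; the output base has no extension, so it writes <base>.txt."""
    out_base = str(Path(tiff).with_suffix(""))
    return ["tesseract", tiff, out_base, "-l", lang]


def pdf_to_text(pdf: str, out_dir: str, dpi: int = 300, lang: str = "eng") -> None:
    """Hypothetical driver: render the pages, then OCR each TIFF into a .txt file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(gs_to_tiff_cmd(pdf, out_dir, dpi), check=True)
    for tiff in sorted(Path(out_dir).glob("page_*.tif")):
        subprocess.run(tesseract_cmd(str(tiff), lang), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so no tools are required.
    print(gs_to_tiff_cmd("book.pdf", "out"))
    print(tesseract_cmd("out/page_001.tif"))
```

Keeping the command builders separate from `subprocess.run` makes it easy to inspect or tweak the flags (for example a higher `-r` value for small print) before committing to a long OCR run.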

For the final text I would like EPUB or MOBI (for sharp rendering of the characters) rather than PDF, but still with the figures and a TOC.

Is there another format besides TXT that Tesseract can export to and that will keep the images (RTF, maybe)?

I could mark up the TOC manually if needed. What is the correct markup?

What steps should I take to convert a PDF into an EPUB containing images and markup?

I also shared this thread: https://www.mobileread.com/forums/sh...d.php?t=312652

Can I also ask how you approached the problem of detecting figures in a PDF? I'm interested in the problem-solving side.
I've collected my ideas on converting PDFs on a web page that I've had up for a while. There's no magic solution, other than maybe trying MS Word if you have access to it. I don't really detect figures; I just look for places where I don't see the gaps that would occur between normal rows of text. If I don't find a gap over more than a given span (~1.5 inches), I consider that a figure. Very simplistic.
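The gap rule above can be illustrated with a short row-scanning sketch. This is my own minimal illustration of the idea as described, not k2pdfopt's actual code. It assumes the page is already binarized and given as rows of 0/1 pixels (1 = dark), treats a row as a gap when its dark-pixel count is at most `blank_threshold` (a hypothetical knob), and converts the ~1.5 inch span into rows using the scan DPI.

```python
from typing import List, Sequence, Tuple


def find_figure_regions(
    rows: Sequence[Sequence[int]],
    dpi: int,
    max_gapless_inches: float = 1.5,
    blank_threshold: int = 0,
) -> List[Tuple[int, int]]:
    """Return half-open (start_row, end_row) ranges flagged as figures.

    A row is a gap if it has at most `blank_threshold` dark pixels. Any run
    of consecutive non-gap rows taller than max_gapless_inches * dpi is
    reported as a figure; shorter runs are assumed to be lines of text.
    """
    max_run = int(round(max_gapless_inches * dpi))
    figures: List[Tuple[int, int]] = []
    run_start = None
    for y, row in enumerate(rows):
        is_gap = sum(row) <= blank_threshold
        if not is_gap and run_start is None:
            run_start = y  # a block of inked rows begins
        elif is_gap and run_start is not None:
            if y - run_start > max_run:
                figures.append((run_start, y))
            run_start = None
    # Close a block that runs to the bottom of the page.
    if run_start is not None and len(rows) - run_start > max_run:
        figures.append((run_start, len(rows)))
    return figures


if __name__ == "__main__":
    dpi = 10  # tiny DPI keeps the example small: 1.5 in -> 15 rows
    text_row = [1] * 8
    blank_row = [0] * 8
    page = [text_row, text_row, blank_row] * 3  # three 2-row "text lines"
    page += [blank_row] * 2
    page += [text_row] * 20  # 20 rows with no gap: taller than 15 -> figure
    page += [blank_row] * 3
    print(find_figure_regions(page, dpi))  # [(11, 31)]
```

On a real scan, speckle noise can keep a row from ever being completely blank, so `blank_threshold` would likely need to be a small fraction of the page width rather than zero.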