10-07-2018, 07:58 PM   #92
shalym
Quote:
Originally Posted by sealbeater
Sorry for taking so long to respond.

I found your PDF samples very interesting. I've never before seen a PDF with both images and text in the wild. Interestingly, my normal go-to, "pdfimages", didn't work on any of them. It was only when I extracted to XML using pdftohtml that I realized any of them had images at all.


Anyway, here's my point. If I have the images, why would I bother to OCR or convert them to text? I have the images. From what I understand, EPUB is just compressed HTML. Why couldn't I just strip out the images, reference them in HTML, and compress the result?
You could...but then you couldn't change the font, or the font size, or use any of the other functions of EPUB. In other words, you may as well just leave it in PDF format.
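
For what it's worth, the image-only approach described in the quote can be sketched roughly like this in Python. This is only a minimal sketch under some assumptions: the page images are already extracted as PNG or JPEG files with simple file names, the function name `build_image_epub` and the `OEBPS/` layout are just illustrative choices, and the output has not been checked with a validator. Each page ends up as an XHTML file containing a single `<img>` tag, which is exactly why the font and font-size controls have nothing to act on.

```python
import uuid
import zipfile
from datetime import datetime, timezone
from pathlib import Path
from xml.sax.saxutils import escape

# Only these image types are handled in this sketch; anything else raises KeyError.
MEDIA_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
<rootfiles>
<rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
</rootfiles>
</container>
"""

PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Page {n}</title></head>
<body><img src="images/{name}" alt="Page {n}"/></body>
</html>
"""

NAV = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body><nav epub:type="toc"><ol>
{links}
</ol></nav></body>
</html>
"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:identifier id="uid">urn:uuid:{uid}</dc:identifier>
<dc:title>{title}</dc:title>
<dc:language>en</dc:language>
<meta property="dcterms:modified">{modified}</meta>
</metadata>
<manifest>
<item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{manifest}
</manifest>
<spine>
{spine}
</spine>
</package>
"""


def build_image_epub(image_paths, out_path, title="Scanned pages"):
    """Wrap each page image in its own XHTML page and zip everything as an EPUB."""
    manifest, spine, links = [], [], []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry has to be the first file and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        for n, img in enumerate(map(Path, image_paths), start=1):
            name = escape(img.name, {'"': "&quot;"})
            zf.write(img, f"OEBPS/images/{img.name}")
            zf.writestr(f"OEBPS/page{n}.xhtml", PAGE.format(n=n, name=name))
            media = MEDIA_TYPES[img.suffix.lower()]
            manifest.append(f'<item id="img{n}" href="images/{name}" media-type="{media}"/>')
            manifest.append(f'<item id="page{n}" href="page{n}.xhtml" media-type="application/xhtml+xml"/>')
            spine.append(f'<itemref idref="page{n}"/>')
            links.append(f'<li><a href="page{n}.xhtml">Page {n}</a></li>')
        zf.writestr("OEBPS/nav.xhtml", NAV.format(title=escape(title), links="\n".join(links)))
        modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        zf.writestr("OEBPS/content.opf", OPF.format(
            uid=uuid.uuid4(), title=escape(title), modified=modified,
            manifest="\n".join(manifest), spine="\n".join(spine)))
    return out_path
```

Calling it with a list of page images and an output path such as `book.epub` gives a file a reader should open, but every page is a picture of text, so it behaves much like the original PDF.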

Shari