MobileRead Forums - View Single Post

shalym · 10-07-2018, 07:58 PM

Quote:

Originally Posted by sealbeater

Sorry for taking so long to respond.

I found your PDF samples very interesting. I've never before seen a PDF with both images and text in the wild. Interestingly, my usual go-to, "pdfimages", didn't work on any of them. It was only when I extracted to XML using pdftohtml that I realized any of them had images at all.

Anyway, here's my point. If I have the images, why would I bother to OCR them or convert them to text? I have the images. From what I understand, EPUB is just compressed HTML. Why couldn't I just extract the images, reference them in HTML, and compress the result?

You could... but then you couldn't change the font or the font size, or use any of the other features of EPUB, because every page would still be a picture of text rather than text. In other words, you may as well just leave it in PDF format.
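For anyone curious what the "images referenced from HTML, then zipped" approach would look like, here is a rough Python sketch using only the standard library. The function name build_image_epub, the OEBPS/ layout, and the page_N.xhtml naming are assumptions made up for illustration. It also assumes PNG or JPEG files with unique, plain file names, and the output has not been checked with a validator such as epubcheck.

```python
import uuid
import zipfile
from datetime import datetime, timezone
from pathlib import Path
from xml.sax.saxutils import escape

MEDIA_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# One page per image: the whole "content" of the page is a single <img>.
PAGE_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Page {n}</title></head>
  <body><img src="images/{name}" alt="Page {n}"/></body>
</html>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body><nav epub:type="toc"><ol>
{items}
  </ol></nav></body>
</html>
"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:{book_id}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{manifest}
  </manifest>
  <spine>
{spine}
  </spine>
</package>
"""


def build_image_epub(image_paths, out_path, title="Scanned pages"):
    """Hypothetical helper: wrap each image in its own XHTML page and zip it all up."""
    manifest, spine, nav_items = [], [], []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry has to come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        for n, img in enumerate((Path(p) for p in image_paths), start=1):
            page = f"page_{n}.xhtml"
            media = MEDIA_TYPES[img.suffix.lower()]  # KeyError for other formats
            zf.write(img, f"OEBPS/images/{img.name}")
            zf.writestr(f"OEBPS/{page}", PAGE_XHTML.format(n=n, name=img.name))
            manifest.append(f'    <item id="img{n}" href="images/{img.name}" media-type="{media}"/>')
            manifest.append(f'    <item id="page{n}" href="{page}" media-type="application/xhtml+xml"/>')
            spine.append(f'    <itemref idref="page{n}"/>')
            nav_items.append(f'    <li><a href="{page}">Page {n}</a></li>')
        zf.writestr("OEBPS/nav.xhtml", NAV_XHTML.format(items="\n".join(nav_items)))
        zf.writestr("OEBPS/content.opf", OPF.format(
            book_id=uuid.uuid4(),
            title=escape(title),
            modified=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            manifest="\n".join(manifest),
            spine="\n".join(spine),
        ))
```

Calling it would look something like build_image_epub(["page1.png", "page2.png"], "book.epub"). Notice that each page body is nothing but an img element, so a reader's font and font-size settings have nothing to act on, which is the problem described above.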

Shari