A friend asked me to clean up his 26 MEGABYTE epub!!! I said alright, figuring it was just a bunch of enormous images that could be squished down to a reasonable size, plus some nasty bloat like:
<div class="paragraph"><p class="p"><span><font class="normal">Blah Blah <span class="italic">BLAH</span><span class="normaltext">!</span> And furthermore <span class="boldfont">Blahdy Blah</span><span class="normaltext">...</span></font></span></p></div>
which I could easily turn into:
<p>Blah Blah <em>BLAH</em>! And furthermore <strong>Blahdy Blah...</strong></p>
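For what it's worth, that kind of cleanup is mostly mechanical. Here is a rough Python sketch of it, assuming the class names from the sample above (`italic`, `boldfont`) really mean italic and bold, and that the styled spans are never nested inside each other. The `SEMANTIC` mapping and the `clean_markup` name are made up for illustration. Regexes are fragile on real-world markup, so treat this as a starting point rather than a finished tool.

```python
import re

# Hypothetical mapping from the presentational classes in the sample
# to semantic tags; adjust to whatever the actual stylesheet uses.
SEMANTIC = {"italic": "em", "boldfont": "strong"}


def clean_markup(html: str) -> str:
    # 1. Convert styled spans into semantic tags (assumes no nested spans inside).
    for cls, tag in SEMANTIC.items():
        html = re.sub(
            rf'<span class="{cls}">(.*?)</span>',
            rf"<{tag}>\1</{tag}>",
            html,
        )
    # 2. Drop every remaining span, font, and div tag but keep the text inside.
    html = re.sub(r"</?(?:span|font|div)\b[^>]*>", "", html)
    # 3. Strip attributes from paragraph tags.
    html = re.sub(r"<p\b[^>]*>", "<p>", html)
    return html
```

Run on the bloated sample, this leaves the trailing "..." outside the `<strong>` element, since it had its own `normaltext` span in the source; the hand-cleaned version above tucks it inside.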
But nooooo......what do I find......
307 .png images of each page of a book!!!
I would certainly HOPE that a "reputable" company would not create such a mess....or maybe it was someone's miserable attempt at a fixed layout?? In any case, I told him to get his money back and buy a different copy from a different bookseller...maybe one that is a reasonable <1 MB file.
Soooo.....can we get Sigil to add a button/function to OCR all the selected images and put them in proper html/epub-ese??? Pretty Please???
[/sarcasm]