11-04-2014, 02:39 AM   #1
sherman
Some ebook creators just can't be helped...

So I'm not really sure if this is the most appropriate place to put this. Anyway...

So, I got a book from Amazon, imported it into calibre, then converted it to EPUB, as you do. The first sign that something was amiss was that calibre took over two and a half minutes to convert the book. Normal conversions only take around 10 seconds or so. Naturally I wanted to see what the problem was, so I opened the book in the Editor.

I let loose a string of expletives.

Then I let loose a few more for good measure.

I then went and opened the original AZW3 in the ebook editor.

The editor reports that the size of the HTML file is 6.5 megabytes. This is not the Bible, mind you, just a run-of-the-mill novel.

So, what does the body of the text look like? Here is a very small sample:
Code:
<p class="MsoNormal" style="text-align:justify;text-indent:.25in"><span style="font-size:0.92rem">&quot;<span style="letter-spacing:-.05pt">W</span>h<span style="letter-spacing:-.05pt">a</span>t<span style="letter-spacing:1.4pt"> </span><span style="letter-spacing:-.1pt">s</span><span style="letter-spacing:.05pt">a</span>y<span style="letter-spacing:1.25pt"> </span><span style="letter-spacing:-.3pt">y</span><span style="letter-spacing:.05pt">o</span>u<span style="letter-spacing:.05pt">?</span>&quot;<span style="letter-spacing:1.5pt"> </span>
That's just a small portion of a paragraph.

This happens throughout the entire book.


Sorry about the rant, I just felt I had to get it out.


EDIT: This bit of regex solved the problem...

Code:
<span style="letter-spacing[^>]+>([^<]+)</span>
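For anyone who wants to try the same cleanup outside the editor, here is a minimal sketch in Python. It assumes the replacement is simply the first capture group (`\1`), which the post above does not actually state, and the function name `strip_letter_spacing` is made up for illustration. Python's `re` module stands in for the editor's search here, so treat it as a rough equivalent rather than exactly what was run.

```python
import re

# The pattern from the post: a span whose style starts with letter-spacing
# and whose content is plain text (no nested tags).
LETTER_SPACING_SPAN = re.compile(
    r'<span style="letter-spacing[^>]+>([^<]+)</span>'
)


def strip_letter_spacing(html: str) -> str:
    """Replace each matching span with just its text content.

    Assumption: the replacement is the first capture group (\\1).
    """
    return LETTER_SPACING_SPAN.sub(r"\1", html)


# A shortened piece of the sample paragraph quoted in the post.
sample = (
    '&quot;<span style="letter-spacing:-.05pt">W</span>h'
    '<span style="letter-spacing:-.05pt">a</span>t'
    '<span style="letter-spacing:1.4pt"> </span>'
    '<span style="letter-spacing:-.1pt">s</span>'
    '<span style="letter-spacing:.05pt">a</span>y&quot;'
)

print(strip_letter_spacing(sample))
```

Because the capture group is `[^<]+`, a letter-spacing span that wraps another tag is left alone, so it is worth searching for any leftovers afterwards.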

Last edited by sherman; 11-04-2014 at 02:46 AM.