hi all,
wasn't sure where this sort of question should be posted, so I'm posting here...
I bought an ebook from amazon and noticed it had pretty bad formatting errors that were making the book very difficult to read on my kindle, so I thought I'd try to fix it myself.
but when I used Calibre to convert the azw3 to ePub, I saw this horrible coding where almost every single word has its own class/span/etc!!!! I can't make heads or tails of it, let alone remove it and fix the weird indentation and margin problems. (I think I can fix the pagination problems, but those are minor compared to what's shown below.)
here's an excerpt of the code. the entire book is like this:
<p class="block_21">“How<span class="text_14"> </span>can<span class="text_14"> </span>I<span class="text_14"> </span>persuade<span class="text_14"> </span>you<span class="text_14"> </span>that<span class="text_14"> </span>I<span class="text_14"> </span>mean<span class="text_14"> </span>you<span class="text_14"> </span>no<span class="text_14"> </span>harm?”<span class="text_14"> </span>he<span class="text_14"> </span>asked.<span class="text_14"> </span>“I<span class="text_14"> </span>swear to you that I will do nothing to you.”</p>
<p class="block_22">“Will<span class="text_18"> </span>you<span class="text_18"> </span>swear<span class="text_18"> </span>by<span class="text_18"> </span>the<span class="text_18"> </span>Blessed<span class="text_18"> </span>Virgin<span class="text_18"> </span>Mary?”<span class="text_18"> </span>she<span class="text_18"> </span>asked<span class="text_18"> </span>disbelievingly. “I swear it.”</p>
Each paragraph has its own block_# class, with its own class="text_#" span attached to almost every single word in the paragraph. (and the block_# and text_# pairing is different from paragraph to paragraph...)
I took a peek at the original azw3 file, and it is just as bad. So the azw3-to-ePub conversion didn't do this. it's the horrible amazon encoding...
is there any way to clean up a mess like the one above, stripping most of the junk down to something resembling sane text, so that I can then fix the incorrect pagination, margins, and line breaks? if I have to delete these things by hand, it would be faster to retype the entire book from scratch. I have the paperback copy of the book as well as the eBook version, so I have a reference for what it's SUPPOSED to look like.
I'm hoping there's a way to export to some other format that can be converted back into a simpler ePub with the excessive class/span markup stripped... (I'm a novice: I can edit an existing ePub file, but wouldn't know where to start if I had to build one from scratch.)
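for what it's worth, the junk in the excerpt looks regular enough that a pattern match might catch it. here is a minimal sketch in Python of what I have in mind. it assumes the spans only ever wrap whitespace and that the class names always look like text_N; I have only checked that against the excerpt above, not the whole book, and the function name is just something I made up:

```python
import re

# Matches a span whose class is "text_<number>" and whose content is only whitespace.
# Assumption: these spans never wrap real text or nested tags.
SPACE_SPAN = re.compile(r'<span class="text_\d+">\s+</span>')

def strip_space_spans(html: str) -> str:
    """Replace each whitespace-only text_N span with one plain space."""
    return SPACE_SPAN.sub(" ", html)

sample = (
    '<p class="block_21">“How<span class="text_14"> </span>can'
    '<span class="text_14"> </span>I</p>'
)
print(strip_space_spans(sample))
```

this leaves the block_# classes on the paragraphs alone, so the margins and indentation would still need fixing in the stylesheet afterwards. I would only try it on a copy of the file.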
I use Sigil for minor cleanup (typos, some formatting, etc.) and Calibre on macOS... I'm generally computer savvy, but far from an expert on stuff like this.
any help is appreciated...