Quote:
Originally Posted by theducks
IMHO what is also important is the ORDER in which you fix them. If you don't get it right, the next fix (or join) will be more difficult.
|
Yep! And this is why you should try to normalize/clean the code as much as possible FIRST.
For example, here is some hideous code right out of an InDesign EPUB:
Quote:
<p class="body-text" xml:lang="en-us"><span class="no-style-override-5">The point is, as we can readily see, the ability to</span> <span class="no-style-override-4">foresee</span> <span class="no-style-override-5">an event is not at all equivalent to</span> <span class="no-style-override-4">agreeing</span> <span class="no-style-override-5">to it. Yes, I can full well</span> <span class="no-style-override-4">predict</span> <span class="no-style-override-5">that if I move to the South Bronx, I’ll likely be victimized by street crime. But this is not at</span> <span class="no-style-override-4">all</span> <span class="no-style-override-5">the same thing as</span> <span class="no-style-override-4">acquiescing</span> <span class="no-style-override-5">in such nefarious activities. Yet, according to the “libertarian” argument we are considering, the two are indistinguishable.</span></p>
|
First thing I do is go through the code and strip it down to this:
Quote:
<p>The point is, as we can readily see, the ability to <i>foresee</i> an event is not at all equivalent to <i>agreeing</i> to it. Yes, I can full well <i>predict</i> that if I move to the South Bronx, I’ll likely be victimized by street crime. But this is not at <i>all</i> the same thing as <i>acquiescing</i> in such nefarious activities. Yet, according to the “libertarian” argument we are considering, the two are indistinguishable.</p>
|
Once the markup is this clean, later fixes become much easier.
Diap's Editing Toolbag is great for cleaning up code:
https://www.mobileread.com/forums/sho....php?p=2980740
It is also great for helping get rid of a ton of the useless classes (<span class="no-style-override-5">), or changing certain tags into other tags (<span class="no-style-override-4"> -> <i>).
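If you want to see the idea in script form, here is a minimal Python sketch of that span cleanup. It is not how Diap's Editing Toolbag works internally; the function name and the two mapping tables are made up for illustration, and the class names are simply the ones from the InDesign sample above. It also assumes the spans contain no nested tags, which holds for that sample but not for every book.

```python
import re

# Hypothetical per-book mapping: span classes that should become a real tag.
# These class names come from the InDesign sample; every book needs its own map.
SPAN_TO_TAG = {"no-style-override-4": "i"}

# Span classes that carry no meaning and should simply be unwrapped.
SPAN_TO_UNWRAP = {"no-style-override-5"}


def clean_paragraph(html):
    """Convert or unwrap known spans and strip the <p> attributes."""

    def replace_span(match):
        cls, text = match.group(1), match.group(2)
        if cls in SPAN_TO_TAG:
            tag = SPAN_TO_TAG[cls]
            return f"<{tag}>{text}</{tag}>"
        if cls in SPAN_TO_UNWRAP:
            return text
        return match.group(0)  # unknown class: leave it alone

    # Only matches spans whose content has no nested tags.
    html = re.sub(r'<span class="([^"]+)">([^<]*)</span>', replace_span, html)
    # Drop the exact attribute pair seen in the sample paragraph.
    html = html.replace('<p class="body-text" xml:lang="en-us">', "<p>")
    return html
```

Spans with a class that is in neither table are left untouched, so nothing gets silently thrown away while you are still working out what each class means.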
Each book is different, so you can't just have a big list of "Regexes to clean page numbers" that you can run on Book A + Book B + [...] + Book Z.
Calibre conversion code on top of this makes it even worse, because the calibre# classes mean something completely different in each EPUB:
- calibre2 in Book A might be the page numbers
- calibre2 in Book B might be italics
- [...]
- calibre2 in Book Z might be headings
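Since the meaning of each class has to be worked out per book, one way to start is to list every class with a couple of text samples before writing any regex. This is only a rough Python sketch: `class_inventory` is a made-up helper, and it uses a simple regex rather than a real HTML parser, so it only picks up text that directly follows the opening tag.

```python
import re
from collections import Counter, defaultdict

# Matches an opening tag with a class attribute, plus the text right after it.
TAG_WITH_CLASS = re.compile(r'<(\w+)[^>]*\sclass="([^"]+)"[^>]*>([^<]*)')


def class_inventory(html, samples_per_class=2):
    """Count (tag, class) pairs and keep a few text samples for each."""
    counts = Counter()
    samples = defaultdict(list)
    for match in TAG_WITH_CLASS.finditer(html):
        tag, classes, text = match.groups()
        for cls in classes.split():
            key = (tag, cls)
            counts[key] += 1
            snippet = text.strip()[:40]
            if snippet and len(samples[key]) < samples_per_class:
                samples[key].append(snippet)
    return counts, samples
```

If the samples for `calibre2` are all bare numbers, it is probably the page numbers in this book; if they are single emphasized words, probably italics. Either way the decision stays with you, not with a one-size-fits-all list.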
Quote:
Originally Posted by theducks
And all of the above in the same book (OCR of scan)
[...]
I remove all Page Header types (Section/Title or Author) with a page number first (this takes more than 1 template, as there are right/left side variations)
|
Headers/Footers in the actual text? Ouch. I haven't run across that one in quite a few years. What tools are being used to create that? I know FineReader does a pretty great job of ignoring Headers/Footers and never exporting them in the first place.
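For anyone stuck with a scan like that, the "more than 1 template" removal described in the quote might look roughly like the Python sketch below. Everything specific here is an assumption: the title, the author, the layout of the left and right running heads, and the idea that each header sits in its own `<p>` on its own line. The real templates have to be read off the actual book.

```python
import re

# Hypothetical running heads; replace with what the scanned book really uses.
TITLE = "Some Book Title"
AUTHOR = "Some Author"

# Assumed layout: left-hand pages read "12 Some Book Title",
# right-hand pages read "Some Author 13", each in its own paragraph.
HEADER_PATTERNS = [
    re.compile(rf"^<p>\s*\d+\s+{re.escape(TITLE)}\s*</p>\n?",
               re.MULTILINE | re.IGNORECASE),
    re.compile(rf"^<p>\s*{re.escape(AUTHOR)}\s+\d+\s*</p>\n?",
               re.MULTILINE | re.IGNORECASE),
]


def strip_running_heads(html):
    """Remove paragraphs that consist only of a running head and page number."""
    for pattern in HEADER_PATTERNS:
        html = pattern.sub("", html)
    return html
```

Because each pattern must match the whole paragraph, a normal sentence that merely mentions the title or author is left alone.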