Quote:
Originally Posted by DaveLessnau
EDIT: Tex2002ans kindly provided a lot of information in another thread:
|
Quote:
Originally Posted by JSWolf
KISS is the way to do it.
|
Agreed. I wouldn't use that complex stuff.
Overly convoluted CSS—like "Blitz" or "CSS Resets"—should also be avoided. (In ebooks, those cause WAY more harm than "good".)
Side Note: I took a look at the linked ebook, and don't see anything SUPER egregious though. These things are squeaky clean compared to the horrors I've seen.
Quote:
Originally Posted by DaveLessnau
But, I was hoping for a more "nutshell" version for converting between Standard Ebooks' formats to my everyday formats.
|
No. There isn't.
There's no working around it:
If you're doing an HTML->HTML conversion, you're going to have to:
- rip apart each book.
- figure out what's needed and what's not.
You can throw away most stuff ("Remove Unused CSS"), but beyond that point, you'll have to:
- "surgically" remove/adjust the HTML+CSS.
Why Can't I Just Push a Button And "Fix" All Books?
Like JSWolf said, who knows what the heck publishers might do inside THIS book or how they might handle THIS specific case.
Like a book might have any/all of the following:
- <span class="calibre123">italics</span>
- <span class="italics">italics</span>
- <span class="book-i">italics</span>
- <span class="i">italics</span>
- <em>italics</em>
- <i>italics</i>
- <i class="i">italics</i>
- <i class="italics">italics</i>
or who knows what else!
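A rough sketch of what a per-book mapping could look like in Python. The class list and the function name `normalize_italics` are hypothetical, made up for illustration; every book needs its own list, which is exactly why there is no one-button fix. It only handles simple, non-nested spans.

```python
import re

# Hypothetical: class names that mean "italic" in ONE particular book.
# Another book will need a different list.
ITALIC_CLASSES = {"calibre123", "italics", "book-i", "i"}

# Only matches spans with a single class attribute and no nested tags inside.
SPAN_RE = re.compile(r'<span class="([^"]+)">([^<]*)</span>')
I_RE = re.compile(r'<i class="([^"]+)">')


def normalize_italics(html, italic_classes=ITALIC_CLASSES):
    """Map known italic spans and classed <i> tags to a plain <i>."""

    def fix_span(match):
        if match.group(1) in italic_classes:
            return "<i>" + match.group(2) + "</i>"
        return match.group(0)  # unknown class: leave untouched

    def fix_i(match):
        if match.group(1) in italic_classes:
            return "<i>"
        return match.group(0)

    html = SPAN_RE.sub(fix_span, html)
    return I_RE.sub(fix_i, html)
```

Anything with nested tags or an unknown class is left alone on purpose, so it still has to be checked by hand.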
- - - - -
Side Note: For example, in this 2021 post, I described how I've seen InDesign books with:
Code:
<span class="CharOverride-4">italic</span>
where:
Code:
span.CharOverride-4 {
font-family:"Minion Pro Italic";
font-style:normal; /* <----- See here. Should say italic. */
font-weight:normal;
}
This is an "italic" font, but isn't properly marked as italics!!!
(Hitch explained that it's some InDesign cloud font nonsense.)
If you randomly stripped all fonts + did a conversion, all italics in that book would disappear.
Because, according to the computer, all that text is NORMAL.
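One way to catch this before stripping fonts is to scan the CSS for rules whose font name says "Italic" while font-style does not. A minimal sketch, assuming simple flat CSS rules; the function name `find_mismarked_italics` is hypothetical, and the matching is a heuristic that will miss unusual font names.

```python
import re

# Matches "selector { declarations }" for simple, non-nested rules.
RULE_RE = re.compile(r'([^{}]+)\{([^{}]*)\}')


def find_mismarked_italics(css):
    """Return selectors whose font-family mentions 'italic'
    but whose font-style is not italic/oblique."""
    suspects = []
    for selector, body in RULE_RE.findall(css):
        declarations = {}
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                declarations[prop.strip().lower()] = value.strip().lower()
        family = declarations.get("font-family", "")
        style = declarations.get("font-style", "normal")
        if "italic" in family and style not in ("italic", "oblique"):
            suspects.append(selector.strip())
    return suspects
```

Any selector this returns should get its font-style fixed to italic before the font-family is thrown away.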
- - - - -
Basic Cleanup Method That I Do
1. Toss away most of the junk. (Using "Remove Unused CSS" + the other detailed tricks described in that thread.)
2. Judiciously use:
- Reports
- Sigil: Tools > Reports > Style Classes in HTML Files
- Calibre: Tools > Reports > Style classes
- These show all the classes still left in the book.
- Diap's Editing Toolbag.
- This helps change:
- <span class="i"> -> <i>
- <p class="heading1"> -> <h1>
I scroll through the book, tackling each one, mapping it to:
- Basic, simple, HTML
- + a handful of CSS classes.
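For a single file outside Sigil/Calibre, the kind of class listing those reports give can be roughly approximated in Python. This is only a sketch, not how either tool works internally; the function name `class_report` is hypothetical.

```python
import re
from collections import Counter

# Captures the tag name and the value of its class attribute.
# Requires whitespace before "class" so attributes like data-class are ignored.
CLASS_ATTR_RE = re.compile(r'<(\w+)[^>]*?\sclass="([^"]*)"')


def class_report(html):
    """Count how often each (tag, class) pair appears in the HTML."""
    counts = Counter()
    for tag, classes in CLASS_ATTR_RE.findall(html):
        for name in classes.split():
            counts[(tag, name)] += 1
    return counts
```

Sorting the result with `counts.most_common()` shows which classes are worth mapping first.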
Quote:
Originally Posted by DaveLessnau
Also, JSWolf, how did you get rid of the <section> stuff?
|
Another trick I like to do is:
Search: <section>
Replace: <--- [COMPLETELY BLANK]
Yes, it "breaks the HTML", but then I "Mend & Prettify" and let Sigil/Calibre do the cleanup for me. :P
They'll disappear the closing </section> + reindent everything for you.
Quote:
Originally Posted by DaveLessnau
Did you use something like Diap's Editing Toolbag? Or, go through it manually and delete?
|
Big-picture stuff like <section>? You can do the Mend & Prettify trick.
If you are dealing with heavily nested <span>s, definitely use Diap's Editing Toolbag.
- - - - -
You can also do many rounds of:
Search: <span class="useless">
Replace: <span>
and then use Diap's Toolbag to wipe away all empty <span>s in one shot.
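The same two-step idea (strip the class, then drop the now-bare spans) can be sketched in Python. This is NOT how Diap's Toolbag works internally; it is a conservative illustration with hypothetical function names, and it only unwraps spans that contain plain text.

```python
import re

# A bare <span> that contains no other tags.
BARE_SPAN_RE = re.compile(r'<span>([^<]*)</span>')


def strip_class(html, class_name):
    """Step 1: turn <span class="NAME"> into a bare <span>."""
    return html.replace('<span class="%s">' % class_name, "<span>")


def unwrap_bare_spans(html):
    """Step 2: remove bare spans, repeating so nested bare spans
    are peeled away from the inside out."""
    previous = None
    while previous != html:
        previous = html
        html = BARE_SPAN_RE.sub(r"\1", html)
    return html
```

A bare span wrapping other tags (a classed span, an `<i>`, and so on) is left in place, so nothing gets mis-paired.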
- - - - -
Quote:
Originally Posted by Karellen
For your <section search...
Code:
\n\s\s<section data\-parent=".*?" id=".*?" epub:type=".*?">\n
and then replace with nothing
|
Or just a simple:
Search: <section [^>]+>
Replace: <--- [COMPLETELY BLANK]
That will wipe away all opening <section ...> tags (the ones with extra stuff in them). A bare <section> won't match, so use the plain search above for those.
What does that regex say, in plain English?
- "Look for a <section"
- "Then keep on going until you hit a closing '>'."
- "Replace with NOTHING."
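The regex above can be tried out in Python to see exactly what it does and does not touch. The first pattern is the one from the post; the second pattern (`ANY_SECTION_TAG`) is an added assumption, not from the thread, that also catches bare and closing tags in one pass instead of leaving them to Mend & Prettify.

```python
import re

# The pattern from the post: an opening <section ...> with attributes.
OPEN_WITH_ATTRS = re.compile(r'<section [^>]+>')

# Assumption (not from the post): opening or closing, with or without attributes.
ANY_SECTION_TAG = re.compile(r'</?section\b[^>]*>')


def strip_section_open_tags(html):
    """Remove only opening <section ...> tags that carry attributes."""
    return OPEN_WITH_ATTRS.sub("", html)


def strip_all_section_tags(html):
    """Remove every <section> and </section> tag, keeping the content."""
    return ANY_SECTION_TAG.sub("", html)
```

With the first function the closing `</section>` tags stay behind, which is the "broken HTML" that the cleanup step then repairs.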
Quote:
Originally Posted by DaveLessnau
for my purposes, I just need a simply formatted book that I can adjust as needed. I don't need thousands of lines of styling information across scores of classes. In this book, after I was done with it, I ended up with about a dozen:
[...]
And, it was that many only because of the footnotes and a couple of more-specific classes that I added in since they keep recurring in various books.
|

Fantastic.
And I've done 650+ ebooks, mostly Non-Fiction, and I have everything boiled down to:
- The same basic CSS file I've been using for YEARS.
- + rarely, if ever, minor book-specific tweaks
- A handful of extra classes.
- Like the last book I worked on was a compilation, and had a special "AuthorBio" in the beginning of each chapter.
I've yet to see a book that completely breaks the mold.
Maybe it exists out there, but I'm having a tough time imagining it.
Quote:
Originally Posted by DaveLessnau
|
I don't understand how you could've reached that conclusion from that thread... but okay.
Quote:
Originally Posted by DaveLessnau
Now, if the various readers would hide <aside> stuff as the reference material I find on it says they should, I'd be really happy. But, again, for fiction, that's a pretty minor complaint.
|
Yes, theoretically, EPUB3 readers would hide those <aside>s (some do, like AZARDI).
But in reality, almost all don't hide them.
That's why you shift+insert the footnotes at the end-of-file or end-of-book, so they're out of the way.
(Shoving them in the middle of the book is very similar to people shoving page numbers smack dab in the middle of their text!)
Quote:
Originally Posted by DNSB
Oddly, I've been adding header, section, role, type, etc. into epub3 ebooks I edit simply to get ebooks that are close to meeting the DAISY accessibility standards. While I don't miss accessibility features at this time, there are people out there using non-visual devices to consume ebooks and for them, they are rather important.
|

If you use the ARIA markup properly, there is no harm in the extra stuff.
The problem is, most people don't apply it properly. :P
See the fantastic article at:
and their warning:
Quote:
Warning: Many of these widgets were later incorporated into HTML5, and developers should prefer using the correct semantic HTML element over using ARIA, if such an element exists. For instance, native elements have built-in keyboard accessibility, roles and states. However, if you choose to use ARIA, you are responsible for mimicking the equivalent browser behavior in script.
|
But yes, precise usage of ARIA... like marking columns in your <table>s, good.