Quote:
Originally Posted by DNSB
Interesting. I just did a search on my ebook archive (ebooks before they are loaded into calibre for fixup). Out of 7914 epubs, 7401 had <hx> tags while the remaining 513 did not. I looked at 20 or so of the ones that didn't and they used <p> tags for the headers. Makes building the TOC (whether nav.xhtml or toc.ncx) more of a pain than it needs to be.
Most used at most 2 levels <h1> and <h2> while one coded by an anal retentive went all the way from <h1> to <h6>.
Some of them created by Vellum wrapped 4 or more levels of <div> around the <hx> tag. The funny part being that 2 or 3 of the <div> tags had a class that did not exist in the CSS.
That search ran most of the day while I was busily working from home. I suspect that having to unpack the epubs into a temp directory before searching did nothing for the search speed.
|
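On the search speed: an epub is a zip container, so the unpack-to-temp step can probably be skipped by reading each content file straight out of the archive. Below is a rough Python sketch of that kind of scan, not the search you actually ran. The names `epub_has_headings` and `count_archive` are made up for illustration, and it assumes the content documents end in .xhtml, .html or .htm.

```python
import re
import zipfile
from pathlib import Path

# Matches <h1>..<h6> with or without attributes, but not <hr>, <head> or <html>.
HEADING_RE = re.compile(rb"<h[1-6][\s>]", re.IGNORECASE)
# Assumption: content documents use one of these extensions.
CONTENT_SUFFIXES = (".xhtml", ".html", ".htm")


def epub_has_headings(epub_path):
    """Return True if any content document in the epub contains an <h1>..<h6> tag."""
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(CONTENT_SUFFIXES):
                # zf.read() decompresses in memory; nothing is written to disk.
                if HEADING_RE.search(zf.read(name)):
                    return True
    return False


def count_archive(root):
    """Walk a directory tree and count epubs with and without heading tags."""
    with_h = without_h = 0
    for path in Path(root).rglob("*.epub"):
        if epub_has_headings(path):
            with_h += 1
        else:
            without_h += 1
    return with_h, without_h
```

Calling `count_archive` on the archive folder would give the two totals. It stops reading a book at the first heading it finds, so books that do use <hx> tags should be quick.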
Vellum code can be pretty bad. When <div> and <span> tags carry classes that don't exist in the CSS, I get rid of those classes by removing all unused classes. Then I dump the <div>s and <span>s that are left with no class. If it turns out that an empty <div> is used for an <img>, I'll fix that. It does seem that more eBooks these days use h? tags for the chapter header, but there are still a lot of eBooks out there with not-so-nice code. The HTML in a lot of eBooks these days is not that bad. It's still the CSS that needs to be made better.
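For anyone who wants to script that cleanup instead of doing it in an editor, here is a minimal Python sketch of the two passes: drop class names the stylesheet never defines, then unwrap any <div> or <span> left with no attributes. It is not what my editor does internally. `css_class_names`, `Cleaner` and `clean_xhtml` are hypothetical names, and the sketch makes some assumptions: the CSS scan is a crude regex that errs on the side of keeping a class, tags that still have an id or other attribute are kept, and the <img> case is left for manual fixing. Python's `html.parser` lowercases tag and attribute names, so don't run this on files with inline SVG.

```python
import re
from html import escape
from html.parser import HTMLParser


def css_class_names(css_text):
    """Rough scan for .classname tokens; false positives only mean a class is kept."""
    return set(re.findall(r"\.(-?[A-Za-z_][\w-]*)", css_text))


class Cleaner(HTMLParser):
    def __init__(self, defined):
        super().__init__(convert_charrefs=False)
        self.defined = defined
        self.out = []
        self.dropped = []  # one flag per open div/span: was its start tag dropped?

    def _attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name == "class":
                # Keep only the class names that exist in the CSS.
                value = " ".join(c for c in (value or "").split() if c in self.defined)
                if not value:
                    continue
            kept.append((name, value))
        return kept

    def _render(self, tag, attrs, close=""):
        parts = "".join(
            f" {n}" if v is None else f' {n}="{escape(v, quote=True)}"'
            for n, v in attrs
        )
        return f"<{tag}{parts}{close}>"

    def handle_starttag(self, tag, attrs):
        attrs = self._attrs(attrs)
        if tag in ("div", "span"):
            drop = not attrs  # nothing left on the tag, so unwrap it
            self.dropped.append(drop)
            if drop:
                return
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, self._attrs(attrs), " /"))

    def handle_endtag(self, tag):
        if tag in ("div", "span") and self.dropped and self.dropped.pop():
            return  # matching start tag was dropped, so drop the end tag too
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def clean_xhtml(xhtml, css_text):
    cleaner = Cleaner(css_class_names(css_text))
    cleaner.feed(xhtml)
    cleaner.close()
    return "".join(cleaner.out)
```

Fed a Vellum-style header such as `<div class="ghost"><div class="wrap2"><h1 class="chapter-title ghost">One</h1></div></div>` with a stylesheet that only defines `.chapter-title`, this leaves just the <h1> with its one real class. Run it on a copy of the book first.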