06-13-2020, 04:13 AM   #50
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by DNSB
Interesting. I just did a search on my ebook archive (ebooks before they are loaded into calibre for fixup). Out of 7914 epubs, 7401 had <hx> tags while the remaining 513 did not. I looked at 20 or so of the ones that didn't and they used <p> tags for the headers. Makes building the TOC (whether nav.xhtml or toc.ncx) more of a pain than it needs to be.

Most used at most 2 levels (<h1> and <h2>), while one coded by an anal retentive went all the way from <h1> to <h6>.

Some of them created by Vellum wrapped 4 or more levels of <div> around the <hx> tag. The funny part being that 2 or 3 of the <div> tags had a class that did not exist in the CSS.

That search ran most of the day while I was busily working from home. I suspect that having to unpack the epubs into a temp directory before searching did nothing for the search speed.
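On the search speed: an EPUB is a zip archive, so a check like the one described above can read the content files straight out of the archive without unpacking to a temp directory first. A minimal sketch, not the search that was actually run. The function names, the list of file extensions, and the demo helper are all assumptions made for illustration.

```python
import io
import re
import zipfile

# Matches an opening <h1> ... <h6> tag (with or without attributes).
HEADING_RE = re.compile(rb"<h[1-6][\s>]", re.IGNORECASE)

# Assumed set of extensions for content documents inside the EPUB.
CONTENT_EXTS = (".xhtml", ".html", ".htm")


def epub_has_headings(epub):
    """Return True if any content document in the EPUB has an <h1>-<h6> tag.

    `epub` may be a path or a file-like object. Members are read directly
    from the zip archive, so nothing is written to disk.
    """
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith(CONTENT_EXTS):
                if HEADING_RE.search(zf.read(name)):
                    return True
    return False


def make_demo_epub(body):
    """Hypothetical helper: build a tiny in-memory archive for trying this out."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("OEBPS/ch1.xhtml", f"<html><body>{body}</body></html>")
    buf.seek(0)
    return buf
```

For an archive on disk, something like `sum(epub_has_headings(p) for p in pathlib.Path(folder).rglob("*.epub"))` would give the count of books that use heading tags.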
Vellum code can be pretty bad. What I do in the case of <div> and <span> with classes that don't exist is get rid of those classes by removing all unused classes. Then I'll dump any <div>s and <span>s that have no classes. If it turns out that an empty <div> is used for an <img>, I'll fix that. It does seem that more eBooks these days are using h? in the chapter header, but there are still a lot of eBooks out there with some not nice code. The HTML is not that bad in a lot of eBooks these days. It's still the CSS that needs to be made better.
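For anyone who wants to script that cleanup instead of doing it by hand in an editor, here is a rough sketch of the two steps: strip classes that the CSS never defines, then unwrap any <div> or <span> left with no attributes. It is an illustration only, not the exact procedure used above. Assumptions: the input is a well-formed, namespace-free XHTML fragment (real EPUB files carry the XHTML namespace, so tag names would need adjusting), class names are pulled from the CSS with a simple regex, and a <div> that directly holds an <img> is left alone for manual fixing. All function names are made up for the example.

```python
import re
import xml.etree.ElementTree as ET


def defined_classes(css):
    """Collect class names that appear in selectors (simple regex, an approximation)."""
    return set(re.findall(r"\.(-?[A-Za-z_][\w-]*)", css))


def unwrap(parent, child):
    """Replace `child` with its own contents inside `parent`, keeping all text."""
    idx = list(parent).index(child)
    kids = list(child)

    def attach_before(text):
        # Text goes to the parent's text or to the previous sibling's tail.
        if idx == 0:
            parent.text = (parent.text or "") + text
        else:
            prev = parent[idx - 1]
            prev.tail = (prev.tail or "") + text

    if child.text:
        attach_before(child.text)
    if child.tail:
        if kids:
            kids[-1].tail = (kids[-1].tail or "") + child.tail
        else:
            attach_before(child.tail)
    parent.remove(child)
    for offset, kid in enumerate(kids):
        parent.insert(idx + offset, kid)


def strip_wrappers(elem):
    """Unwrap attribute-less <div>/<span> children, deepest first."""
    for child in list(elem):
        strip_wrappers(child)
        is_wrapper = child.tag in ("div", "span") and not child.attrib
        if is_wrapper and child.find("img") is None:  # leave image holders alone
            unwrap(elem, child)


def clean(xhtml, css):
    """Drop classes missing from `css`, then remove the wrappers left bare."""
    known = defined_classes(css)
    root = ET.fromstring(xhtml)
    for elem in root.iter():
        if "class" in elem.attrib:
            kept = [c for c in elem.attrib["class"].split() if c in known]
            if kept:
                elem.set("class", " ".join(kept))
            else:
                del elem.attrib["class"]
    strip_wrappers(root)
    return ET.tostring(root, encoding="unicode")
```

The sketch only unwraps elements with no attributes at all, which is a bit more cautious than "no classes", since an id on a <div> may be a link target.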