Quote:
Originally Posted by Aleron Ives
That sounds like the cleaning process could easily become extremely time consuming if your goal is to reverse all the stupid formatting decisions made by the publisher...
Out of curiosity, can you be naughty and delete the CSS entirely and use HTML4 font tags, instead? 
That's essentially what I do. My goal is to open a cleaned book and have the font size, line height, and margins always be the same; I set them with the sliders in the Kobo's reading settings. Before I nuke the book's original CSS, I go through it, find all of the classes that set bold, italic, and bold italic, and add them to my CSS. My search string for that is:
Code:
bold|italic|font-weight: [56789]
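For anyone who would rather script that search than run it by hand, here is a rough Python sketch. It applies the same search string to the declarations of each rule in a stylesheet and lists the selectors that hit. The function name, the rule-splitting regex, and the sample class names (`.calibre3` and so on) are made up for illustration; the rule splitter is deliberately naive and will not cope with @media blocks or comments.

```python
import re

# The search string from the post, unchanged.
STYLE_PATTERN = re.compile(r"bold|italic|font-weight: [56789]")

# Very naive rule matcher: "selector { declarations }". It does not
# understand @media blocks, comments, or nested braces.
RULE_PATTERN = re.compile(r"([^{}]+)\{([^{}]*)\}")


def find_emphasis_rules(css_text):
    """Return (selector, declarations) pairs whose declarations match the search string."""
    hits = []
    for selector, declarations in RULE_PATTERN.findall(css_text):
        if STYLE_PATTERN.search(declarations):
            # Collapse whitespace so each hit prints on one line.
            hits.append((selector.strip(), " ".join(declarations.split())))
    return hits


# Hypothetical stylesheet fragment, not taken from any real book.
sample_css = """
.calibre3 { font-weight: bold; }
.calibre7 { font-style: italic; }
.calibre9 { font-weight: 600; font-style: italic; }
.calibre1 { margin: 0; text-indent: 2em; }
"""

for selector, declarations in find_emphasis_rules(sample_css):
    print(selector, "->", declarations)
```

Unlike a plain text search, this only looks inside the braces, so a class that is merely named "bold" without setting anything bold would not show up.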
The vast majority of these are books that have been converted to EPUB from the Kindle formats, and I have calibre "normalize" a lot of stuff in the process. I suspect that calibre fixes weird stuff like inline styles. Doing this, I rarely have to do anything to the HTML files. Occasionally a book uses div tags instead of p tags, and recently there was one that had a space entity wrapped in p tags between every paragraph (I guess to ensure that there was a blank line between paragraphs).
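The spacer-paragraph case is the kind of thing a regex replace can handle. A minimal Python sketch follows, assuming the "space entity" was `&nbsp;` or one of its numeric forms (the post does not say which) and that each spacer paragraph holds nothing else. The function name and sample markup are hypothetical.

```python
import re

# Assumption: the "space entity" is &nbsp; or its numeric equivalents;
# the original book's exact markup is not known.
SPACER_PARAGRAPH = re.compile(
    r"<p\b[^>]*>\s*(?:&nbsp;|&#160;|&#xa0;)\s*</p>\s*",
    re.IGNORECASE,
)


def strip_spacer_paragraphs(html_text):
    """Remove paragraphs whose only content is a single space entity."""
    return SPACER_PARAGRAPH.sub("", html_text)


# Hypothetical markup for illustration.
sample = (
    '<p class="calibre1">First paragraph.</p>\n'
    '<p class="calibre1">&nbsp;</p>\n'
    '<p class="calibre1">Second paragraph.</p>\n'
)
print(strip_spacer_paragraphs(sample))
```

Paragraphs that contain real text alongside a space entity are left alone, since the pattern only matches when the entity is the whole content.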
This is the CSS that I replace the book's with:
Code:
body {
  font-size: 100%;
  border: 0;
  margin: 0;
  padding: 0;
  width: auto;
}
body * {
  line-height: inherit;
}
p {
  font-size: 100%;
  margin: 0;
  padding: 0;
  border: 0;
  text-indent: 2em;
}
h1, h2, h3, h4 {
  text-align: center;
}
.bold {
  font-weight: bold;
}
.italic {
  font-style: italic;
}
.bold_italic {
  font-weight: bold;
  font-style: italic;
}
It's not pretty, since all text ends up the same size. Chapter headings and titles suffer most: most books use p tags instead of h tags for them, and they often sit right above the first paragraph with no space. But it's a small price to pay for how quick it is to clean a book. I probably download at least 7 free books a week through ereaderiq.com, so speed is important. (So is realizing a book is a dud and stopping after a few chapters.)