MobileRead Forums - View Single Post - how are nested+contradictory CSS dealt with ?

Thread: how are nested+contradictory CSS dealt with ?

10-21-2011, 11:15 AM

#11

Serpentine

Evangelist

Posts: 416

Karma: 1045911

Join Date: Sep 2011

Location: Cape Town, South Africa

Device: Kindle 3

Quote:

Originally Posted by cybmole

but I think your code misses the fact that there's both a chapter number and a chapter title within the dross, both of which should ideally be salvaged.

Yes and no. It disregards attributes, which are important if you do not want to regenerate the Table of Contents; however, it's usually a lot safer to regenerate, since Sigil does this pretty well using the text between the <h> tags.

On a more general note: if you're like me and just like extremely simple, near-plain-HTML books, the following is quite handy. It is rather dangerous, though, so read and understand it first.

JGsoft syntax:
(?<=</?(h\d|[uod]l|[uisbqpa]|hr|abbr|acronym|address|area|base|basefont|bdo|big|blockquote|body|button|caption|center|cite|code|col|colgroup|dd|del|dfn|dir|div|dt|em|fieldset|font|hr|ins|kbd|label|legend|li|map|object|param|pre|samp|script|select|small|span|strike|strong|sub|sup|table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|var))\s[^<>/]*(?=/?>)
replace : blank

perl compatible (i.e. Python) syntax:
(</?)(h\d|[uod]l|[uisbqpa]|hr|abbr|acronym|address|area|base|basefont|bdo|big|blockquote|body|button|caption|center|cite|code|col|colgroup|dd|del|dfn|dir|div|dt|em|fieldset|font|hr|ins|kbd|label|legend|li|map|object|param|pre|samp|script|select|small|span|strike|strong|sub|sup|table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|var)(\s[^<>/]*)(/?>)
replace : \1\2\4

This will strip all attributes from the listed html tags, e.g.:
<p class="calibre2"><span class="blarg">This is some text</span></p>
becomes:
<p><span>This is some text</span></p>
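
For anyone who would rather run this outside an editor, here is a minimal Python sketch of the same search and replace. It assumes the standard `re` module; the names `TAGS`, `ATTR_RE` and `strip_attributes` are made up for illustration, and the tag list is simply the one from the pattern above. Treat it as a sketch and try it on a copy of the book first.

```python
import re

# Tag names copied from the pattern above (the duplicate "hr" is harmless).
TAGS = (
    r"h\d|[uod]l|[uisbqpa]|hr|abbr|acronym|address|area|base|basefont|bdo|big|"
    r"blockquote|body|button|caption|center|cite|code|col|colgroup|dd|del|dfn|"
    r"dir|div|dt|em|fieldset|font|hr|ins|kbd|label|legend|li|map|object|param|"
    r"pre|samp|script|select|small|span|strike|strong|sub|sup|table|tbody|td|"
    r"textarea|tfoot|th|thead|title|tr|tt|var"
)

# Group 1: "<" or "</", group 2: tag name, group 3: attributes, group 4: ">" or "/>".
ATTR_RE = re.compile(r"(</?)(" + TAGS + r")(\s[^<>/]*)(/?>)")


def strip_attributes(html):
    """Drop group 3 (the attributes) and keep the rest of each matched tag."""
    return ATTR_RE.sub(r"\1\2\4", html)


sample = '<p class="calibre2"><span class="blarg">This is some text</span></p>'
print(strip_attributes(sample))
```

Because the attribute part of the pattern excludes "/", a tag whose attributes contain a slash, such as a link with a URL in its href, is not matched and keeps its attributes.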

You can then apply whatever styles you want directly to all elements, although you usually need two <p> styles: one indented and one flush. Also note that it will remove location markers from your <h1>, <h2>, ... headers, so only use this if you plan on regenerating the ToC. Tags whose attributes contain a "/" (a URL in an href, for example) are not matched at all and keep their attributes. You can remove tags from the 'or' list to leave them untouched; I have most likely forgotten a few header ones in there.

If you know the book you're working with also does not contain any 'useful' formatting in the spans, you can use something like </?span[^/>]*> (replaced with blank) to remove them all. But read the CSS first: spans are often used only to apply italic, bold or underline, in which case convert those to their html tags, like <i>, before removing the rest.
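
A rough Python sketch of that order of operations: convert the meaningful spans first, then remove whatever spans are left. The class name "italic" is purely hypothetical here; check the book's CSS to find out which class actually carries the italics. The function name `flatten_spans` is also made up, and the simple non-greedy match assumes the italic spans do not contain other spans.

```python
import re

# Hypothetical class name; look it up in the book's stylesheet first.
ITALIC_SPAN_RE = re.compile(r'<span class="italic">(.*?)</span>', re.DOTALL)

# The span-removal pattern from the post: any opening or closing span tag.
ANY_SPAN_RE = re.compile(r"</?span[^/>]*>")


def flatten_spans(html):
    """Turn italic spans into <i> tags, then delete all remaining span tags."""
    html = ITALIC_SPAN_RE.sub(r"<i>\1</i>", html)
    return ANY_SPAN_RE.sub("", html)


sample = ('<p><span class="italic">Emphasis</span> and '
          '<span class="calibre3">plain</span></p>')
print(flatten_spans(sample))
```

The same two steps can be repeated for bold and underline with their own class names before the final removal.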

All in all, it's usually easier to just use the HTMLZ with the CSS set to use tags from the get-go.
