Old 10-20-2011, 03:09 AM   #1
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
How are nested + contradictory CSS rules dealt with?

a line from a calibre conversion with an overkill of styles
Code:
<h1 class="calibre11" id="calibre_pb_34"><span class="calibre12 bold"><strong class="calibre13"><span class="calibre14 calibre15">16</span> KARMIC KILTER</strong></span></h1>

In the CSS, calibre11 through calibre15 have differing font-size settings. Which ones actually get applied to the text?

Or, to put it another way: how much of that code could be deleted with no side effects?

And do strong and font-weight: bold both do the same thing?
.calibre11 {
display: block;
font-size: 1.2em;
font-weight: bold;
line-height: 1.2;
margin-bottom: 0.67em;
margin-left: 0;
margin-right: 0;
margin-top: 0.67em;
text-align: right
}
.calibre12 {
font-size: 0.8em;
line-height: 1.2
}
.calibre13 {
font-weight: bolder;
line-height: 1.2
}
.calibre14 {
line-height: 1.2
}
.calibre15 {
color: #BDBDBD;
font-size: 1.2em;
line-height: 1.2
}

final challenge question - a regex which would remove all the unnecessary stuff from that line & from all similar <h1 lines ?

Last edited by cybmole; 10-20-2011 at 03:13 AM.
Old 10-20-2011, 01:38 PM   #2
jackie_w
Grand Sorcerer
Posts: 6,249
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
At the risk of woefully exposing my limited css knowledge, I believe the following to be true:

With nested html elements, the various css attributes are not all treated the same.
  • The font-size attribute is multiplicative, at least when using em units.
    Therefore, in your example, I would expect the text 'KARMIC KILTER' to render at 1.2 x 0.8em, i.e. 0.96em.
    Similarly, I would expect the text '16' to render at 1.2 x 0.8 x 1.2em, i.e. 1.152em.
  • I believe nested margins are additive, not multiplicative.
  • Nested attributes like color and font-family are neither: the innermost setting simply wins.
    Using your example, if .calibre11 had color: red and .calibre15 had color: #BDBDBD,
    'KARMIC KILTER' would be in red and '16' would be in #BDBDBD (some shade of grey).
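The em arithmetic in the first bullet can be checked with a few lines of Python. This is only a sketch of the multiplication, not of a real rendering engine: the factors are copied from the CSS in the first post, the calibre12 value is taken as 0.8em (following the arithmetic in this reply), and `effective_em` is a hypothetical helper name.

```python
# font-size factors in em, from the CSS in the first post.
# None means the rule sets no font-size, so the parent's size carries through.
FONT_SIZE_EM = {
    "calibre11": 1.2,
    "calibre12": 0.8,   # taken as 0.8em, following the arithmetic in this reply
    "calibre13": None,
    "calibre14": None,
    "calibre15": 1.2,
}

def effective_em(class_chain):
    """Multiply the em factors from the outermost class to the innermost one."""
    size = 1.0
    for cls in class_chain:
        factor = FONT_SIZE_EM.get(cls)
        if factor is not None:
            size *= factor
    return size

# 'KARMIC KILTER' sits inside h1.calibre11 > span.calibre12 > strong.calibre13
title = effective_em(["calibre11", "calibre12", "calibre13"])
# '16' sits one span deeper, in span.calibre14.calibre15
number = effective_em(["calibre11", "calibre12", "calibre13", "calibre14", "calibre15"])

print(f"KARMIC KILTER: {title:.3f}em")  # → KARMIC KILTER: 0.960em
print(f"16: {number:.3f}em")            # → 16: 1.152em
```

The result is relative to whatever size the h1's parent has, so the absolute size still depends on the reader's base font.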
Quote:
do strong and font-weight: bold both do the same thing?
They appear to do the same thing on an ereader, but whether they necessarily do in a web browser, I don't know.
<strong> and <b> seem to do the same thing.

Quote:
how much of that code could be deleted with no side effects
In your example I don't think <span class="calibre14"> is doing anything useful. I think line-height only works on block elements e.g. <body>, <h?>, <p>, <div>, <blockquote>.

<span>, <i>, <em>, <b>, <strong> are inline elements, i.e. they work on a selected portion of a block element.

Time to read up on html and css, I think, cybmole. It will help a lot. You might find this site useful; it lets you play around typing in sample html/css and immediately viewing the results.

Re: the Regex, I'll leave that for someone else

Last edited by jackie_w; 10-20-2011 at 09:46 PM. Reason: correction re em units
Old 10-20-2011, 02:00 PM   #3
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
I did a little experimentation in Sigil, plus a little googling.
I got the impression that with most attributes the outer one takes precedence, so font size is taken from calibre11 only.

It could well be, though, that different programs/readers interpret this differently.

I would strip or comment out colors anyway for an e-reader, as they are not supported.

thanks for the tutorial link.
Old 10-20-2011, 11:33 PM   #4
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Assuming you want to catch all h1/h2 etc.:

find : <(h\d)[^<>]*>(.+)</\1>
replace : <\1>\2</\1>

If you are cleaning up books or something, it's often best to work from an HTMLZ export. In the format options you can select how CSS is handled; the 'tag' option will not add any extra stuff, though I can't remember if it preserves bold/italic etc. formatting, i.e. placing <i> tags.
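To see what the find/replace pair does, here is a small Python sketch that applies it to the heading from the first post. Python's `re` module is used as a stand-in for the editor's regex engine, which is an assumption; the variable names are illustrative only.

```python
import re

# The find pattern from the post: group 1 is the heading tag name (h1, h2, ...),
# group 2 is everything between the opening and closing heading tags.
HEADING = re.compile(r"<(h\d)[^<>]*>(.+)</\1>")

line = ('<h1 class="calibre11" id="calibre_pb_34"><span class="calibre12 bold">'
        '<strong class="calibre13"><span class="calibre14 calibre15">16</span>'
        ' KARMIC KILTER</strong></span></h1>')

# The replace pattern from the post: rebuild the heading without its attributes.
cleaned = HEADING.sub(r"<\1>\2</\1>", line)
print(cleaned)
```

Run this way, only the attributes on the heading tag itself (class and id) are dropped; the nested span and strong tags inside group 2 come through unchanged. Because `.` does not match a newline by default, the heading has to sit on a single line for the pattern to match.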
Old 10-21-2011, 01:16 AM   #5
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by Serpentine View Post
Assuming you want to catch all h1/h2 etc.:

find : <(h\d)[^<>]*>(.+)</\1>
replace : <\1>\2</\1>

If you are cleaning up books or something, it's often best to work from an HTMLZ export. In the format options you can select how CSS is handled; the 'tag' option will not add any extra stuff, though I can't remember if it preserves bold/italic etc. formatting, i.e. placing <i> tags.
impressive, I wimped out & did multiple simpler reductions.

but I think your code misses the fact that there's both a chapter number and a chapter title within the dross, both of which should ideally be salvaged.
Old 10-21-2011, 01:21 AM   #6
DoctorOhh
US Navy, Retired
Posts: 9,890
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by cybmole View Post
final challenge question - a regex which would remove all the unnecessary stuff from that line & from all similar <h1 lines ?
I do simple search & replace in Sigil, no complex regex usually needed.
Old 10-21-2011, 02:19 AM   #7
Agama
Guru
Posts: 776
Karma: 2751519
Join Date: Jul 2010
Location: UK
Device: PW2, Nexus7
Quote:
Originally Posted by cybmole View Post
I got the impression that with most attributes the outer one takes preference . so font size is taken from calibre 11 only.
As jackie_w said, font sizes in em are multiplicative. I have attached an example to show this in action. Change the font-size in style "calibre" (e.g. to font-size:1.5em), and you will see that all the other styles get multiplied. Similarly, you can see that the <span> within the <p> is multiplying the "calibre4" font-sizes and the "calibre" size.

(Save/Rename the attached css.txt file as css.html to try it out).
Attachments: css.gif (screenshot of the results), css.txt (540 bytes)

Last edited by Agama; 10-21-2011 at 02:29 AM. Reason: Added GIF screenshot to show results
Old 10-21-2011, 06:42 AM   #8
Dopedangel
Wizard
Posts: 1,791
Karma: 30548723
Join Date: Dec 2006
Location: Singapore
Device: Boyue
What kind of ebook needs so many styles? I mostly read fiction, so I just convert to text with textile, then convert to epub from that. You can add your custom styles for headings, paragraphs and italics while converting from text to epub, if you are so inclined.
Old 10-21-2011, 07:38 AM   #9
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by dwanthny View Post
I do simple search & replace in Sigil, no complex regex usually needed.
so do I, I take stuff out a bit at a time & check for unexpected damage; but sometimes I like to ponder how/if things can be done in one pass.

I see going to text & back to epub as an absolute last resort.

if the source is epub I prefer to retain the original styling, except for tweaking font sizes & centering headers. Sometimes that leads to wanting to clean up the code.

I prefer to have everything in the 0.8 - 1.2 em font size range. anything smaller is hard to read & anything larger looks like overkill on the actual e-reader.
Old 10-21-2011, 08:41 AM   #10
theducks
Well trained by Cats
Posts: 30,913
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Nested spans (or divs) are problematic to replace/remove with regex (unless I have been missing a trick), in that you must match the correct closing tag, not the first occurrence of a closing tag.
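One way around the closing-tag problem is to use a parser instead of a regex, so that each closing tag is paired with its own opening tag. Below is a minimal sketch using Python's standard `html.parser`; it is not something anyone in this thread used, and `SpanUnwrapper` and `unwrap_spans` are hypothetical names. It drops only spans whose classes are all in a given set and keeps their contents.

```python
from html.parser import HTMLParser

class SpanUnwrapper(HTMLParser):
    """Drop <span> tags whose classes are all in `drop`, keeping their contents."""

    def __init__(self, drop):
        super().__init__(convert_charrefs=False)
        self.drop = set(drop)
        self.out = []
        self.stack = []  # one flag per open <span>: True if its tags were dropped

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = set((dict(attrs).get("class") or "").split())
            dropped = bool(classes) and classes <= self.drop
            self.stack.append(dropped)
            if dropped:
                return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through untouched.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The stack pairs this </span> with the <span> that opened it.
        if tag == "span" and self.stack and self.stack.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

def unwrap_spans(html, drop):
    parser = SpanUnwrapper(drop)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

sample = ('<h1><span class="calibre12 bold"><strong>'
          '<span class="calibre14">16</span> KARMIC</strong></span></h1>')
print(unwrap_spans(sample, {"calibre14"}))
# → <h1><span class="calibre12 bold"><strong>16 KARMIC</strong></span></h1>
```

The sketch ignores comments, doctype declarations and processing instructions, so it is suited to fragments like a heading line rather than whole files.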

BTW, a simple "what if I took it out?" test:
/* just comment out the selector in the style sheet */
and see what gets ugly. You can always remove the comment marks.

Unlike CSS errors, undefined styles do not seem to cause problems with ADE on my PEz

Last edited by theducks; 10-21-2011 at 08:44 AM.
Old 10-21-2011, 11:15 AM   #11
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Quote:
Originally Posted by cybmole View Post
but I think your code misses the fact that there's both a chapter number and a chapter title within the dross, both of which should ideally be salvaged.
Yes and no, it disregards attributes which are important if you do not want to regenerate a Table of Contents; however it's usually a lot safer to regenerate, since Sigil does this pretty well, using the text between the <h> tags.

On a more general note: if you're like me and you just like extremely simple, near-plain-html books, the following is quite handy. It is rather dangerous, so read and understand it first.

JGsoft syntax:
(?<=</?(h\d|[uod]l|[uisbqpa]|hr|abbr|acronym|address|area|base|basefont|bdo|big|blockquote|body|button|caption|center|cite|code|col|colgroup|dd|del|dfn|dir|div|dt|em|fieldset|font|hr|ins|kbd|label|legend|li|map|object|param|pre|samp|script|select|small|span|strike|strong|sub|sup|table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|var))\s[^<>/]*(?=/?>)
replace : blank

Perl-compatible (e.g. Python) syntax:
(</?)(h\d|[uod]l|[uisbqpa]|hr|abbr|acronym|address|area|base|basefont|bdo|big|blockquote|body|button|caption|center|cite|code|col|colgroup|dd|del|dfn|dir|div|dt|em|fieldset|font|hr|ins|kbd|label|legend|li|map|object|param|pre|samp|script|select|small|span|strike|strong|sub|sup|table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|var)(\s[^<>/]*)(/?>)
replace : \1\2\4

This will strip all attributes from the html tags - i.e :
<p class="calibre2"><span class="blarg">This is some text</span></p>
becomes:
<p><span>This is some text</span></p>

You can then apply whatever styles you want directly to all elements - however you usually need two <p> styles - one indented and one flush. Also note that it will remove location markers from your <h1/2..x> headers, so only use this if you plan on regenerating the ToC. You can remove tags from the 'or' list to avoid them entirely - I most likely have forgotten a few header ones in there.
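For anyone who wants to try the Perl-compatible pattern outside an editor first, here is a minimal Python sketch. The tag list is deliberately shortened for readability (an assumption for illustration, not the full list from the post), and `strip_attributes` is a hypothetical helper name.

```python
import re

# Shortened tag list for illustration; the pattern in the post lists many more tags.
TAGS = r"h\d|[uod]l|[uisbqpa]|div|em|span|strong"

# Group 1: "<" or "</", group 2: tag name, group 3: the attributes, group 4: ">" or "/>"
ATTRS = re.compile(r"(</?)(" + TAGS + r")(\s[^<>/]*)(/?>)")

def strip_attributes(html):
    """Remove all attributes from the listed tags, keeping the tags themselves."""
    return ATTRS.sub(r"\1\2\4", html)

print(strip_attributes('<p class="calibre2"><span class="blarg">This is some text</span></p>'))
# → <p><span>This is some text</span></p>
```

One thing worth knowing before running it on a whole book: because the attribute part is `[^<>/]*`, a tag whose attributes contain a slash (a link with a URL in its href, say) does not match and is left untouched.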

If you know the book you're working with also does not contain any 'useful' formatting in the spans, you can use something like : </?span[^/>]*> to remove them all. But read the CSS first, often they are used only to apply italic/bold/underlines - in which case convert those first to their html tags like <i>.
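As a quick illustration of the span-removal pattern, this Python sketch applies it to the heading from the first post. The helper name `remove_spans` is hypothetical, and Python's `re` module stands in for the editor's regex engine.

```python
import re

# The pattern from the post: any opening or closing span tag, with or without attributes.
SPAN_TAG = re.compile(r"</?span[^/>]*>")

def remove_spans(html):
    """Delete every span tag but keep the text inside it."""
    return SPAN_TAG.sub("", html)

line = ('<h1 class="calibre11" id="calibre_pb_34"><span class="calibre12 bold">'
        '<strong class="calibre13"><span class="calibre14 calibre15">16</span>'
        ' KARMIC KILTER</strong></span></h1>')

print(remove_spans(line))
# → <h1 class="calibre11" id="calibre_pb_34"><strong class="calibre13">16 KARMIC KILTER</strong></h1>
```

Since every span goes, both opening and closing, there is no tag-pairing problem here; the cost is that any styling carried only by a span class is lost, which is why reading the CSS first matters.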

All in all it's usually easier to just use the HTMLZ with the CSS set to use tags from the get-go
Old 10-21-2011, 11:39 AM   #12
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
Quote:
Originally Posted by Serpentine View Post

If you know the book you're working with also does not contain any 'useful' formatting in the spans,
but if I've not already read the book, I don't know that. so I limit the amount of advance tweaking that I do, to improving header layouts, paragraph spacings & indents.

I used to always regenerate TOC, but with latest sigil I sometimes leave it mostly as-is, just edit as needed.

I find that many novels slip in occasional format changes, e.g. for signs, diary/letter extracts, newspaper clippings ... to make them stand out from the general text, & I quite like those.

If, when reading, I see anything else that bugs me, I make a note of it using the reader notes facility then do an additional post-read edit. I do that also for annoying typos.
Old 10-21-2011, 11:41 AM   #13
JSWolf
Resident Curmudgeon
Posts: 79,220
Karma: 145488788
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Dopedangel View Post
What kind of ebook needs so many styles. I mostly read fiction so I just convert to text with textile. Then convert to epub from that you can add your custom styles for headings paragraphs italics while converting from text to epub if you are so inclined.
Maybe one built from an MS Word document.
Old 10-21-2011, 11:55 AM   #14
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Quote:
Originally Posted by cybmole View Post
but if I've not already read the book, I don't know that. so I limit the amount of advance tweaking that I do, to improving header layouts, paragraph spacings & indents.
Yeah, I glance through the file first, looking for problems. If there are any issues I just go and clean it up beforehand. You can usually make a good guesstimate of the classes from just reading the CSS and looking for font styling.

Quote:
Originally Posted by cybmole View Post
If, when reading, I see anything else that bugs me, I make a note of it using the reader notes facility then do an additional post-read edit. I do that also for annoying typos.
Yeah, I've just finished making 84 edits on an obviously pdf converted retail book - sent the changes back to the publisher :/.

Scifi conversions nearly always have their letters/signs etc messed up during conversion, especially when print books use a teletype style for them. So I spend a lot of time putting in <tt>'s. Yay.