Old 04-10-2011, 11:18 PM   #1
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Adding pynliner to the conversion pipeline for some functions?

There are some functions I've been thinking about adding which would benefit from knowing what the styles for a given tag are. One use is generating more accurate Markdown and Textile output by converting italic/bold/etc. to the specific tags those converters expect. I was also thinking about a function which could swap between pseudo and real small-caps variants. A number of the existing heuristic functions could also be improved if the style were readily available at that stage in the conversion pipeline.

It seems to me that the easiest way to do this would be to insert all the styles directly into the document, although I'm open to other ideas to go about this.

There is already a python library for taking styles and placing them inline, pynliner:
http://packages.python.org/pynliner/

If that's a good approach, how do I go about retrieving the CSS file for a given flow?

Note I'm not saying that this should be a default part of the conversion pipeline, just that it can be used when the user enables a feature that would get some advantage from it.
Old 04-10-2011, 11:21 PM   #2
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Why don't you run these functions at a stage of the pipeline after CSS flattening? Then getting the style information for any element is trivial.
Old 04-10-2011, 11:56 PM   #3
ldolse
I guess partly because I'm used to thinking about doing things during pre-processing. I can look at doing it after CSS flattening. So you're saying create something that's called from plumber, after CSSFlattener?

I understand the logic that a flattened CSS will have all the potential style sources merged together, which definitely helps, but are there existing functions that make it trivial to grab the styles of a given element at that point? Not sure where to look in the code.
Old 04-11-2011, 12:11 AM   #4
kovidgoyal
Look at the end of workaround_ade_quirks in epub.output or the RemoveFakeMargins transform in page_margin.py.

These are all examples of using the flattened CSS. Basically, what CSS flattening does is put all the CSS into class-selector-based rules. So all you have to do is look at the class attribute of the element and then look up the CSS rule whose selector is .classname
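
A minimal sketch of that lookup, for illustration only. It is not calibre's code: the stylesheet text, the class names (calibre1, calibre2) and the helpers parse_flat_css and style_for are all made up here, and it assumes every rule is a plain single-class selector, which is what the description above implies.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical flattened stylesheet: every rule is a single class selector.
FLAT_CSS = """
.calibre1 { font-style: italic; text-indent: 1.5em }
.calibre2 { font-weight: bold }
"""

# Matches '.classname { declarations }' and nothing more elaborate.
RULE_RE = re.compile(r'\.([\w-]+)\s*\{([^}]*)\}')


def parse_flat_css(css_text):
    """Map class name -> {property: value} for '.classname { ... }' rules."""
    rules = {}
    for name, body in RULE_RE.findall(css_text):
        props = {}
        for decl in body.split(';'):
            if ':' in decl:
                prop, value = decl.split(':', 1)
                props[prop.strip().lower()] = value.strip()
        rules[name] = props
    return rules


def style_for(elem, rules):
    """Merge the declarations of every class listed on the element."""
    style = {}
    for cls in elem.get('class', '').split():
        style.update(rules.get(cls, {}))
    return style


rules = parse_flat_css(FLAT_CSS)
para = ET.fromstring('<p class="calibre1">Some text</p>')
print(style_for(para, rules))
# {'font-style': 'italic', 'text-indent': '1.5em'}
```

An element with no class attribute simply gets an empty dict back, so callers can treat "no style" and "no class" the same way.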
Old 04-11-2011, 10:23 AM   #5
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Textile and Markdown changes are in the works (Perkin is doing the bulk of the work) to handle CSS.

Look at the htmlz module in ebooks. oeb2html has a number of classes for handling CSS in different ways. Inlining the styles is one of them.

Perkin and I are planning on using the oeb2html transformed content in Textile and Markdown output. One major reason for using oeb2html is that it rewrites links to be relative to a single document, which is a necessary step (not currently handled) for internal links to work.

The current parsers for Textile and Markdown are not able to use the stylizer class, so Perkin is thinking of using the inline style transform to have all style information accessible. He hasn't committed 100% yet.
Old 04-11-2011, 12:09 PM   #6
Perkin
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
From a couple of tests I've been able to do, I was going to copy oeb2html and adapt its output so that instead of actual HTML it outputs Textile code (and just as easily Markdown). With the 'inline' option of the conversion, all the tags have the info needed for the extra output, such as text-indent and text-align, that isn't part of the normal tags.
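
A rough sketch of the idea, assuming each tag already carries an inline style attribute. The helper names parse_inline_style and textile_span are invented for this example and are not part of oeb2html; only bold and italic are handled.

```python
def parse_inline_style(style_attr):
    """Turn 'font-weight: bold; font-style: italic' into a dict."""
    props = {}
    for decl in style_attr.split(';'):
        if ':' in decl:
            prop, value = decl.split(':', 1)
            props[prop.strip().lower()] = value.strip().lower()
    return props


def textile_span(text, style_attr):
    """Wrap text in Textile markers chosen from its inline style."""
    props = parse_inline_style(style_attr)
    if props.get('font-style') == 'italic':
        text = '_%s_' % text          # Textile emphasis
    if props.get('font-weight') in ('bold', 'bolder', '700', '800', '900'):
        text = '*%s*' % text          # Textile strong
    return text


print(textile_span('hello', 'font-weight: bold; font-style: italic'))
# *_hello_*
```

The same dict could drive a Markdown writer by swapping the markers, which is why keeping the style parsing separate from the output step seems worthwhile.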

I was also thinking that...
Some ballpark figures will have to be used, such as text-indent:0.8em; would be converted to 1em, which would be represented by a '(' in the paragraph/heading start tag. Probably round anything >.5 and <1.5 to 1em.
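
Something like the following is what I have in mind, as a sketch only. The function names are placeholders, only em units are handled, and treating exactly .5 as rounding up is an assumption on my part.

```python
import math
import re

# Accepts values such as '0.8em', '.5em' or '2em'; other units are ignored.
EM_RE = re.compile(r'^\s*(-?\d*\.?\d+)em\s*$')


def indent_to_textile(text_indent):
    """Return one '(' per whole em of indent, rounding to the nearest em."""
    m = EM_RE.match(text_indent)
    if m is None:
        return ''                      # not an em value: no marker
    ems = float(m.group(1))
    if ems <= 0:
        return ''                      # negative or zero indents are dropped
    return '(' * int(math.floor(ems + 0.5))


def textile_paragraph(text, text_indent):
    """Build a Textile paragraph start tag with the indent markers."""
    return 'p%s. %s' % (indent_to_textile(text_indent), text)


print(textile_paragraph('Hello', '0.8em'))
# p(. Hello
```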

I'll do it.
But it may take me longer than some of you, as I've only just started to learn/use python.

user_none and I have had a brief discussion on small-caps, and I was just going to add a tag for it in Textile (probably '&') and then allow conversion to and from Textile by converting to a span tag with a class which had font-variant: small-caps;
I believe the EPUB spec doesn't (yet) officially recognise small-caps, so this may not render accurately on all readers, but as the output would just be normal text I thought that would be acceptable.
Old 04-11-2011, 12:49 PM   #7
user_none
Quote:
Originally Posted by Perkin View Post
From a couple of tests I've been able to do, I was going to copy oeb2html and adapt its output so that instead of actual HTML it outputs Textile code (and just as easily Markdown). With the 'inline' option of the conversion, all the tags have the info needed for the extra output, such as text-indent and text-align, that isn't part of the normal tags.
I'll put together a simpler skeleton class later today for you. If you're not going to use lxml or the sgml parser we have a lot more flexibility. I'll show you how to use the stylizer class in the skeleton class to access a specific style easily as you need to.

Quote:
Originally Posted by Perkin View Post
I was also thinking that...
Some ballpark figures will have to be used, such as text-indent:0.8em; would be converted to 1em, which would be represented by a '(' in the paragraph/heading start tag. Probably round anything >.5 and <1.5 to 1em.
Look at txtml. Rounded figures are used in it. You should use the same rounding to keep the outputs consistent. txtml also uses a similar class design to oeb2html and handles top / bottom margins for soft scene breaks.
Old 04-11-2011, 01:25 PM   #8
ldolse
This all sounds good. One of the use cases I was thinking about with Textile/Markdown conversion was for cleaning up garbage html before running through heuristics. One case in particular was for cleaning up files which abuse <br> instead of using proper paragraph tags.

I think the main hurdle is that in heuristics I would just have a flow to pass over to this new function; I'd still need a way to pass the CSS file over to the Textile/Markdown conversion function.

Regarding small-caps, I wasn't thinking about that specifically for Markdown/Textile; I was thinking about it more for conforming to the EPUB spec, as Perkin is correct that the spec doesn't support that font-variant statement. So there are really two options: one is to create a pseudo small-caps effect, the other is to embed a dedicated small-caps font. I'm thinking of giving the user the option to convert from one to the other during conversion. I think for this sort of function Kovid's original recommended approach is still the best option.
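
For the pseudo small-caps direction, here is a rough sketch of what I mean. The function name, the 0.8em scale and the ASCII-only letter matching are all assumptions for illustration, not anything that exists in calibre.

```python
import re
from html import escape

# Runs of ASCII lowercase letters; accented letters are left untouched here.
LOWER_RUN = re.compile(r'[a-z]+')


def pseudo_small_caps(text, scale='0.8em'):
    """Uppercase each lowercase run and shrink it with an inline span."""
    parts = []
    pos = 0
    for m in LOWER_RUN.finditer(text):
        parts.append(escape(text[pos:m.start()]))   # text kept at full size
        parts.append('<span style="font-size: %s">%s</span>'
                     % (scale, m.group().upper()))
        pos = m.end()
    parts.append(escape(text[pos:]))
    return ''.join(parts)


print(pseudo_small_caps('Chapter One'))
# C<span style="font-size: 0.8em">HAPTER</span> O<span style="font-size: 0.8em">NE</span>
```

Going the other way (pseudo to real) would need the element's style to spot the shrunken uppercase spans, which is where the flattened CSS lookup comes in.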

All times are GMT -4.