MobileRead Forums > E-Book Software > Calibre > Development

06-16-2015, 05:50 PM   #1
jackie_w
Grand Sorcerer
Posts: 6,251
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Question for lxml experts, please

My lxml expertise is currently somewhat lacking. Is there a known technique, or sample calibre code I can look at, which can reliably identify matching start & end HTML tags?
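The usual technique a parser uses to pair start and end tags is a stack: push on every start tag, pop on every end tag, and the popped entry is the matching start. Below is a minimal sketch of that idea using Python's standard-library `html.parser.HTMLParser` (not what calibre itself uses). `TagPairer` is a hypothetical name for illustration, and the sketch assumes well-formed markup in which every element is explicitly closed.

```python
from html.parser import HTMLParser


class TagPairer(HTMLParser):
    """Hypothetical helper: pair each start tag with its end tag using a stack.

    Assumes every element is explicitly closed (XHTML-style); a bare <br>
    would stay on the stack and trigger a mismatch error later.
    """

    def __init__(self):
        super().__init__()
        self.stack = []  # open elements: (name, attrs, (line, col) of start tag)
        self.pairs = []  # closed elements: (name, attrs, start_pos, end_pos)

    def handle_starttag(self, tag, attrs):
        # Remember where this element opened; its end tag will pop it.
        self.stack.append((tag, dict(attrs), self.getpos()))

    def handle_endtag(self, tag):
        if not self.stack:
            raise ValueError('unexpected </%s>' % tag)
        name, attrs, start = self.stack.pop()
        if name != tag:
            raise ValueError('</%s> does not close <%s>' % (tag, name))
        self.pairs.append((name, attrs, start, self.getpos()))


markup = '<p><span class="bold">a <span class="italic">b</span> c</span></p>'
pairer = TagPairer()
pairer.feed(markup)
pairer.close()
for name, attrs, start, end in pairer.pairs:
    print(name, attrs.get('class'), start, end)
```

Inner elements close first, so the italic span is reported before the bold span that contains it. That nesting information is exactly what a flat regex cannot keep track of.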

My aims are two-fold:
  1. to create something which will automatically find occurrences of <span class="italic">...</span> and <span class="bold">...</span> and replace them with 'naked' <i>...</i> and <b>...</b> tags.
  2. to use this as a practical learning exercise to improve my parsing knowledge.

P.S. I know a regex can easily convert the non-nested occurrences, but if possible I'd like to create something that can also reliably handle the nested ones.
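To see why nesting defeats a simple regex, here is a minimal sketch, assuming a non-greedy pattern of the kind typically used for the non-nested case. The lazy `.*?` stops at the first `</span>` it meets, which belongs to the inner span, so the new `</b>` lands in the wrong place and the outer `</span>` is left behind:

```python
import re

markup = '<span class="bold">a <span class="italic">b</span> c</span>'

# Non-greedy match: runs from the bold span's start tag to the FIRST </span>
pattern = re.compile(r'<span class="bold">(.*?)</span>')
result = pattern.sub(r'<b>\1</b>', markup)
print(result)  # <b>a <span class="italic">b</b> c</span>
```

Switching to a greedy `.*` fixes this particular string, but it then runs from the first bold span to the last `</span>` in the text, swallowing any sibling spans in between.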
06-16-2015, 06:08 PM   #2
kovidgoyal
creator of calibre
Posts: 45,335
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Code:
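from calibre.ebooks.oeb.base import XHTML

# 'root' is assumed to be the lxml root element of an already-parsed XHTML file
for tag in root.iterdescendants('*'):
    # lxml keeps the namespaced name in .tag, e.g. '{http://www.w3.org/1999/xhtml}span'
    if tag.tag.endswith('}span'):
        cls = tag.get('class')
        if cls in {'bold', 'italic'}:
            # XHTML('b') builds the namespaced name '{http://www.w3.org/1999/xhtml}b'
            tag.tag = XHTML('b' if cls == 'bold' else 'i')
            del tag.attrib['class']  # drop the class to get a 'naked' <b>/<i>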
06-16-2015, 06:20 PM   #3
jackie_w
Grand Sorcerer
Posts: 6,251
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Is that it?!! Consider me informed ... and humbled

As ever, thank you for your help
06-16-2015, 10:18 PM   #4
kovidgoyal
creator of calibre
Posts: 45,335
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You're welcome

These kinds of things are much easier to do with a fully parsed representation of the HTML, such as the element tree lxml builds, than with regexps.
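As a self-contained illustration of the parsed-tree approach, here is a sketch using Python's standard-library `xml.etree.ElementTree`, whose element API (`.tag`, `.attrib`, `.iter()`, `{namespace}name` tags) lxml mirrors; inside calibre you would use lxml and `XHTML()` as in the code earlier in the thread. The `xhtml` and `unwrap_style_spans` helpers are hypothetical names and the sample document is made up. Because the parser has already paired every start tag with its end tag, renaming an element changes both at once, so nested spans need no special handling:

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'
ET.register_namespace('', XHTML_NS)  # serialize XHTML as the default namespace


def xhtml(name):
    """Hypothetical helper: build a namespaced tag such as '{...xhtml}span'."""
    return '{%s}%s' % (XHTML_NS, name)


CLASS_TO_TAG = {'bold': 'b', 'italic': 'i'}


def unwrap_style_spans(root):
    """Rename <span class="bold|italic"> to bare <b>/<i> in place."""
    for elem in root.iter(xhtml('span')):
        new_name = CLASS_TO_TAG.get(elem.get('class'))
        if new_name is not None:
            elem.tag = xhtml(new_name)  # start and end tag change together
            del elem.attrib['class']    # leave a 'naked' <b>/<i>
    return root


doc = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><p>'
       '<span class="bold">a <span class="italic">b</span> c</span>'
       '</p></body></html>')
root = unwrap_style_spans(ET.fromstring(doc))
p = root.find('.//' + xhtml('p'))
print(ET.tostring(p, encoding='unicode'))
# <p xmlns="http://www.w3.org/1999/xhtml"><b>a <i>b</i> c</b></p>
```

Like the calibre snippet, this only matches a class attribute that is exactly `bold` or `italic`; a span carrying several classes would need extra handling.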

Similar Threads
Thread Thread Starter Forum Replies Last Post
Question for Android Experts friktion Kindle Fire 14 11-22-2011 07:34 PM
Simple Question for the CSS experts out there... Japes Calibre 2 06-23-2011 11:12 PM
Question for the CSS experts crutledge Sigil 8 06-10-2011 04:13 PM
newbie with a question for the experts swimr29 Amazon Kindle 7 11-09-2009 11:34 AM
Question to Experts srinivasvaradar Sony Reader 3 09-30-2007 04:05 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.