#1
Grand Sorcerer
Posts: 6,251
Karma: 16539642
Join Date: Sep 2009
Location: UK
Device: ClaraHD, Forma, Libra2, Clara2E, LibraCol, PBTouchHD3
Question for lxml experts, please
My lxml expertise is currently somewhat lacking. Is there a known technique, or sample calibre code I can look at, which can reliably identify matching start & end HTML tags?
My aims are two-fold:
P.S. I know regex can easily convert non-nested occurrences, but if possible I'd like to create something that also reliably handles nested ones.
#2
creator of calibre
Posts: 45,335
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
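Code:
    from calibre.ebooks.oeb.base import XHTML

    # root is assumed to be the parsed lxml tree of the HTML file
    for tag in root.iterdescendants('*'):
        if tag.tag.endswith('}span'):
            cls = tag.get('class')
            if cls in {'bold', 'italic'}:
                # Renaming the element changes its start and end tags together
                tag.tag = XHTML('b' if cls == 'bold' else 'i')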
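For anyone wanting to try the same idea outside calibre, here is a minimal standalone sketch. It assumes only that the markup is well-formed XHTML; the sample string, the `CLASS_TO_TAG` mapping and the `spans_to_inline_tags` helper are hypothetical names made up for illustration, and dropping the `class` attribute is an extra step that is not part of the reply above. If lxml is not installed it falls back to the standard library's ElementTree, which offers the same calls used here.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: stdlib ElementTree is an acceptable stand-in for this sketch
    import xml.etree.ElementTree as etree

XHTML_NS = 'http://www.w3.org/1999/xhtml'
CLASS_TO_TAG = {'bold': 'b', 'italic': 'i'}  # hypothetical mapping


def spans_to_inline_tags(root):
    """Rename <span class="bold|italic"> to <b>/<i> in place; return the count."""
    changed = 0
    # Build the list first so renaming cannot disturb the iteration.
    for el in list(root.iter('{%s}span' % XHTML_NS)):
        new_name = CLASS_TO_TAG.get(el.get('class'))
        if new_name is None:
            continue  # spans with other (or no) classes are left alone
        # One element owns both its start and end tag, so nesting is safe.
        el.tag = '{%s}%s' % (XHTML_NS, new_name)
        del el.attrib['class']  # extra step, not in the reply above
        changed += 1
    return changed


SAMPLE = (
    '<p xmlns="http://www.w3.org/1999/xhtml">'
    '<span class="bold">outer <span class="italic">inner</span> tail</span>'
    '<span class="other">kept</span></p>'
)

root = etree.fromstring(SAMPLE)
count = spans_to_inline_tags(root)
local_names = [el.tag.split('}')[1] for el in root.iter()]
print(count, local_names)
```

The nested italic span inside the bold one is converted without any special handling, because the tree already knows which end tag belongs to which start tag.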
#3
Grand Sorcerer
Is that it?!! Consider me informed ... and humbled.
As ever, thank you for your help.
#4
creator of calibre
You're welcome.
These kinds of things are much easier to do with a fully parsed representation of the HTML, such as the tree lxml builds, rather than with regexps.
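To illustrate the point about regexps, the following sketch compares a naive regex substitution with a tree edit on a nested span. It uses only the standard library's `xml.etree.ElementTree` as a stand-in for lxml, and the sample markup and the `is_well_formed` helper are assumptions made up for this example.

```python
import re
import xml.etree.ElementTree as ET

NESTED = '<p><span class="bold">a <span class="italic">b</span> c</span></p>'

# Naive regex: the lazy match stops at the FIRST </span>,
# which actually closes the inner italic span.
naive = re.sub(r'<span class="bold">(.*?)</span>', r'<b>\1</b>', NESTED)
# naive is now '<p><b>a <span class="italic">b</b> c</span></p>'


def is_well_formed(markup):
    """Return True if the markup parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Tree edit: renaming the element moves its start and end tags together.
root = ET.fromstring(NESTED)
for el in list(root.iter('span')):
    if el.get('class') == 'bold':
        el.tag = 'b'
fixed = ET.tostring(root, encoding='unicode')

print(is_well_formed(naive), is_well_formed(fixed))
```

The regex result pairs `<b>` with the wrong closing tag and no longer parses, while the tree edit stays well-formed.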