MobileRead Forums - Question for lxml experts, please

jackie_w · 06-16-2015, 05:50 PM

My lxml expertise is currently somewhat lacking. Is there a known technique, or sample calibre code I can look at, which can reliably identify matching start & end HTML tags?
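For illustration, here is a minimal sketch of the usual answer. It assumes the markup is well-formed, as XHTML inside an EPUB normally is. Once lxml parses the markup into a tree, each start tag and its matching end tag become a single element, so nothing has to be paired by hand. The `span_extents` helper and the sample string are hypothetical. The code uses only ElementTree-style calls that `lxml.etree` shares with the standard library's `xml.etree.ElementTree`, and it falls back to the latter if lxml is not installed.

```python
try:
    from lxml import etree  # lxml implements the ElementTree API
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; same calls used below

# Hypothetical sample: an italic span with a bold span nested inside it.
SAMPLE = '<p>a <span class="italic">b <span class="bold">c</span> d</span> e</p>'


def span_extents(markup):
    """Return (class, enclosed text) for every <span>, in document order."""
    root = etree.fromstring(markup)  # assumes well-formed, un-namespaced markup
    result = []
    # iter() walks the tree depth-first, so an outer span comes before its children.
    for span in root.iter("span"):
        # itertext() yields everything between this element's start tag and
        # its matching end tag (including nested elements), but not its own tail.
        result.append((span.get("class"), "".join(span.itertext())))
    return result


for cls, text in span_extents(SAMPLE):
    print(cls, repr(text))
# italic 'b c d'
# bold 'c'
```

The outer span's text includes the nested bold span. The parser has already worked out where each end tag belongs, and that nesting is exactly what a regex struggles to track.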

My aims are two-fold:

to create something which will automatically find occurrences of <span class="italic">...</span> and <span class="bold">...</span> and replace them with 'naked' <i>...</i> and <b>...</b> tags
to use this as a practical learning exercise to improve my parsing knowledge

P.S. I know regex can easily be used to convert non-nested occurrences, but if possible I'd like to create something that can also reliably handle the nested ones.
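A sketch of the conversion itself, under the same assumptions: the markup is well-formed and un-namespaced, and `naked_tags` and `CLASS_TO_TAG` are hypothetical names. Renaming an element's tag in the tree also renames its end tag on output, so nested spans come out correctly without any extra bookkeeping.

```python
try:
    from lxml import etree  # lxml implements the ElementTree API
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; same calls used below

# Hypothetical mapping from a span's class value to the 'naked' tag to use.
CLASS_TO_TAG = {"italic": "i", "bold": "b"}


def naked_tags(markup):
    """Rename <span class="italic"> to <i> and <span class="bold"> to <b>."""
    root = etree.fromstring(markup)
    # Take a snapshot list: we only change tags and attributes, not the tree shape.
    for span in list(root.iter("span")):
        new_tag = CLASS_TO_TAG.get(span.get("class"))
        # Skip unmapped classes, and spans carrying any other attribute,
        # so that no id or extra styling is silently dropped.
        if new_tag is None or len(span.attrib) != 1:
            continue
        span.tag = new_tag  # the matching end tag follows automatically
        del span.attrib["class"]
    return etree.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(naked_tags('<p>a <span class="italic">b <span class="bold">c</span> d</span> e</p>'))
    # <p>a <i>b <b>c</b> d</i> e</p>
```

The sketch only matches an exact class value, so something like `class="italic big"` is left untouched. For namespaced XHTML, as in calibre's EPUB files, the tag names would need the `{http://www.w3.org/1999/xhtml}` prefix (e.g. `root.iter("{http://www.w3.org/1999/xhtml}span")`). For non-well-formed HTML, `lxml.html.fromstring` is lxml's more forgiving parser. Neither adaptation is shown here.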