06-16-2015, 05:50 PM   #1
jackie_w
Question for lxml experts, please

My lxml expertise is currently somewhat lacking. Is there a known technique, or sample calibre code I can look at, which can reliably identify matching start & end HTML tags?

My aims are two-fold:
  1. to create something which will automatically find occurrences of <span class="italic">...</span> and <span class="bold">...</span> and replace them with 'naked' <i>...</i> and <b>...</b> tags.
  2. to use this as a practical learning exercise to improve my parsing knowledge

P.S. I know Regex can easily be used to convert non-nested occurrences but if possible I'd like to create something which can also reliably handle the nested ones.
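
For illustration, here is a minimal sketch of the tree-based approach, not calibre's own code. Once the markup is parsed, each element is a single node in the tree, so a start tag and its end tag are never matched by hand: renaming the node changes both, and nested cases need no special handling. The sketch assumes the input is well-formed XHTML (as in an EPUB), that only spans whose class attribute is exactly "italic" or "bold" should be converted, and that any other attributes on the span should be kept. The names `CLASS_TO_TAG` and `convert_spans` are hypothetical, and the fallback import is only there so the example also runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library the question is about
except ImportError:
    # Fallback for this sketch only: the standard library offers a compatible API.
    import xml.etree.ElementTree as etree

# Hypothetical mapping from class name to replacement tag.
CLASS_TO_TAG = {"italic": "i", "bold": "b"}


def convert_spans(root):
    """Rename matching <span> elements in place and return how many were changed."""
    changed = 0
    # Take a snapshot of the elements so the tree is not modified while iterating.
    for elem in list(root.iter()):
        # Comments and processing instructions do not have a string tag; skip them.
        if not isinstance(elem.tag, str):
            continue
        # Split an optional namespace: "{uri}span" -> ("{uri", "}", "span").
        ns, _, local = elem.tag.rpartition("}")
        if local != "span":
            continue
        # Assumption: only an exact, single class value is converted.
        new_local = CLASS_TO_TAG.get(elem.get("class", "").strip())
        if new_local is None:
            continue
        # Renaming the node changes the start and the end tag together.
        elem.tag = ns + "}" + new_local if ns else new_local
        del elem.attrib["class"]
        changed += 1
    return changed


sample = (
    '<p>a <span class="bold">b <span class="italic">c</span> d</span> '
    'e <span class="other">f</span></p>'
)
root = etree.fromstring(sample)
print(convert_spans(root))                        # 2
print(etree.tostring(root, encoding="unicode"))
# <p>a <b>b <i>c</i> d</b> e <span class="other">f</span></p>
```

Spans with several classes (for example class="italic small") are deliberately left alone here; deciding what to do with them depends on the book's stylesheet.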