MobileRead Forums: Calibre and parsers, info please

jackie_w · 07-21-2011, 09:40 AM

Please could someone guide me in the right direction.

I'm still feeling my way with Python and object-oriented stuff in general. To date, when I have been analysing EPUB OPF files and occasionally HTML files, I have achieved what I needed using regex. However, on poking around the calibre source I see parsers being used, namely BeautifulSoup and lxml etree.

I haven't used a parser before, but it looks like something I ought to explore. What I would like to know is, under what circumstances might I choose to use BeautifulSoup rather than lxml etree, and vice versa?
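
To make the question concrete, here is a minimal sketch of the kind of OPF analysis described above, done once with a parser and once with regex. It is only an illustration under stated assumptions: the OPF text is a made-up sample, the function names are hypothetical, and it uses the standard library's xml.etree.ElementTree (whose basic calls lxml etree largely mirrors) so that it runs without extra installs. It is not taken from the calibre source.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down OPF invented for this illustration.
OPF = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Book</dc:title>
    <dc:creator>A. N. Author</dc:creator>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="application/xhtml+xml"/>
    <item href="style.css" id="css" media-type="text/css"/>
  </manifest>
</package>"""

# Namespace prefixes used in the paths below (the prefixes are our own choice).
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def title_and_hrefs(opf_text):
    """Parser approach: walk the element tree, ignoring attribute order."""
    # Parse from bytes so the encoding declaration in the XML is honoured.
    root = ET.fromstring(opf_text.encode("utf-8"))
    title = root.findtext("opf:metadata/dc:title", namespaces=NS)
    hrefs = [item.get("href") for item in root.findall("opf:manifest/opf:item", NS)]
    return title, hrefs


def hrefs_by_regex(opf_text):
    """Regex approach: only matches items written as id="..." href="..."."""
    return re.findall(r'<item id="[^"]*" href="([^"]*)"', opf_text)


if __name__ == "__main__":
    print(title_and_hrefs(OPF))
    print(hrefs_by_regex(OPF))
```

In this sample the second manifest item lists href before id, so the regex version misses it while the parser version still finds both items. That sort of fragility is one common reason for moving from regex to a parser.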
