Quote:
Originally Posted by gambarini
Yes, it is a powerful solution... but not simple.
Agreed. Once you understand BeautifulSoup and the pre- and post-processing of HTML, you can do almost anything with a page. During the Olympics I used it to parse a Flash-based slideshow of photos. The Flash code on page 1 included a URL that pointed to XML data elsewhere on the web. BeautifulSoup let me extract the address of that data from the scripting on page 1. The XML data had pointers to the photo images, along with a title and comments for each photo for the Flash code to use. BeautifulSoup then let me parse that XML and build a custom virtual page on which each photo was labeled and had its comment. That custom page, despite not really existing anywhere, was passed to Calibre's recipe handler to build the EPUB.
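To give a feel for the slideshow trick, here is a rough sketch of the two steps, not the actual recipe code. Everything specific in it is made up for illustration: the `dataUrl` variable in the script, the example.com addresses, the `<photo>`/`<url>`/`<title>`/`<comment>` element names, and the helper names `find_data_url` and `build_virtual_page`. It uses the modern `bs4` package with the built-in `html.parser` (also for the XML, which is good enough for simple data like this), and it works on inline strings instead of downloading anything.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical "page 1": the slideshow script carries the address of the XML data.
PAGE_ONE = """
<html><body>
<script type="text/javascript">
  var player = loadSlideshow({dataUrl: "http://example.com/slideshow/data.xml"});
</script>
</body></html>
"""

# Hypothetical XML that would be fetched from that address.
SLIDESHOW_XML = """
<slideshow>
  <photo><url>http://example.com/img/1.jpg</url><title>Opening ceremony</title><comment>Fireworks over the stadium.</comment></photo>
  <photo><url>http://example.com/img/2.jpg</url><title>Final lap</title><comment>A close finish.</comment></photo>
</slideshow>
"""


def find_data_url(page_html):
    """Return the XML address buried in a <script> block, or None."""
    soup = BeautifulSoup(page_html, "html.parser")
    for script in soup.find_all("script"):
        # .string is the raw script text; it is None for empty scripts.
        match = re.search(r'dataUrl:\s*"([^"]+)"', script.string or "")
        if match:
            return match.group(1)
    return None


def build_virtual_page(xml_text):
    """Build a new HTML page with one labeled, commented photo per entry."""
    data = BeautifulSoup(xml_text, "html.parser")
    page = BeautifulSoup("<html><body></body></html>", "html.parser")
    for photo in data.find_all("photo"):
        block = page.new_tag("div")
        heading = page.new_tag("h2")
        heading.string = photo.find("title").get_text(strip=True)
        image = page.new_tag("img", src=photo.find("url").get_text(strip=True))
        note = page.new_tag("p")
        note.string = photo.find("comment").get_text(strip=True)
        for element in (heading, image, note):
            block.append(element)
        page.body.append(block)
    return page


data_url = find_data_url(PAGE_ONE)
virtual_page = build_virtual_page(SLIDESHOW_XML)
print(data_url)
print(virtual_page.prettify())
```

In a real recipe you would download the XML from `data_url` before building the page; how the finished page is then handed to the recipe machinery is left out here.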
Basically, BeautifulSoup will let you remove elements, swap or add elements, find elements, construct new pages, etc. IIRC, multipage recipes grab article text from subsequent pages and paste it into the first page before the first page gets processed by the recipe.
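A small sketch of those basic operations, again only as an illustration and not taken from any real recipe. The page snippets, the `ad` and `next` class names, the `article` id, and the helper name `clean_and_merge` are all assumptions. It removes an element, swaps one tag for another, finds the article container, and pastes the second page's text into the first, which is roughly the multipage idea as I remember it.

```python
from bs4 import BeautifulSoup

# Hypothetical first and second pages of a two-page article.
FIRST_PAGE = (
    '<html><body><div id="article"><p>Part one.</p>'
    '<div class="ad">Buy now</div><b>Bold</b></div>'
    '<a class="next" href="page2.html">Next</a></body></html>'
)
SECOND_PAGE = '<html><body><div id="article"><p>Part two.</p></div></body></html>'


def clean_and_merge(first_html, second_html):
    """Tidy the first page and append the second page's article text to it."""
    first = BeautifulSoup(first_html, "html.parser")
    second = BeautifulSoup(second_html, "html.parser")

    # Remove: drop every advertising block.
    for ad in first.find_all("div", class_="ad"):
        ad.decompose()

    # Swap: replace each <b> with a freshly built <strong>.
    for bold in first.find_all("b"):
        strong = first.new_tag("strong")
        strong.string = bold.get_text()
        bold.replace_with(strong)

    # Find + add: move the second page's article children into the first page.
    target = first.find("div", id="article")
    source = second.find("div", id="article")
    for child in list(source.children):  # copy the list; append() moves nodes
        target.append(child)

    # The "next page" link is no longer needed once the text is merged.
    link = first.find("a", class_="next")
    if link is not None:
        link.decompose()
    return first


merged = clean_and_merge(FIRST_PAGE, SECOND_PAGE)
print(merged.prettify())
```

Older recipes written against BeautifulSoup 3 spell some of these calls differently (for example `findAll` and `replaceWith`), so treat the names above as the bs4 versions.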