Quote:
Originally Posted by kovidgoyal
It should be doable by using the postprocess_html method, which allows you to perform arbitrary manipulations on the downloaded html just before it is saved.
So for each such image, you will need to figure out the corresponding text and add it in a <p> after the image.
The postprocess_html method is passed two parameters: a BeautifulSoup instance and a boolean indicating whether the HTML is the first page of the article. You can use the soup parameter to perform the manipulations. See the documentation of the BeautifulSoup package to understand how to use it.
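For anyone who does want to try it, here is a rough sketch of what such a method might look like. It is not taken from any existing recipe: the class name is made up, and it assumes the text for each image lives in the image's alt attribute, which may not be true for the site in question. It also assumes a recent calibre, where the soup is a bs4-style object with find_all, new_tag and insert_after.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Outside calibre: stand-in base class so the snippet can still be imported.
    BasicNewsRecipe = object


class CaptionedImagesRecipe(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Captioned images example'

    def postprocess_html(self, soup, first_fetch):
        # soup: parsed HTML of the downloaded page
        # first_fetch: True if this is the first page of the article
        for img in soup.find_all('img'):
            # Assumption: the caption text is stored in the alt attribute.
            caption = (img.get('alt') or '').strip()
            if not caption:
                continue  # nothing to add for this image
            p = soup.new_tag('p')
            p.string = caption
            img.insert_after(p)  # place the <p> right after the image
        return soup
```

If the text actually sits somewhere else (a title attribute, a neighbouring element), only the line that computes caption should need to change.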
Thank you for your help, but I think I'll pass on that. I know it's not that hard, but I don't think I should spend that much time on the recipe; I'd rather start reading instead.
Anyway, here's the recipe for Paul Thurrott's SuperSite for Windows: