Quote:
Originally Posted by a.peter
The good point is that the keep_only_tags member is a list of dictionaries, so you may add any other expressions you need to parse other pages. If I take a look at an article, e.g. http://www.pagina12.com.ar/diario/el...011-09-22.html, I see that the actual article is embedded in a <div class="nota top12"> tag.
|
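To picture what "a list of dictionaries" means, here is a minimal sketch. Each dict names a tag and, optionally, attribute values it must carry. The matcher below is a simplified stand-in I wrote with Python's html.parser to show the idea. It is not calibre's actual implementation, and the sample HTML is made up.

```python
from html.parser import HTMLParser

# Hypothetical keep_only_tags value: each dict names a tag and, optionally,
# the attribute values it must carry.
keep_only_tags = [
    dict(name='div', attrs={'class': 'nota top12'}),
]


def spec_matches(spec, tag, attrs):
    """Simplified stand-in for calibre's matching: same tag name and
    every listed attribute equal to the given value."""
    if spec.get('name') and spec['name'] != tag:
        return False
    found = dict(attrs)
    return all(found.get(k) == v for k, v in spec.get('attrs', {}).items())


class KeepFinder(HTMLParser):
    """Records the opening tags that any spec in keep_only_tags matches."""

    def __init__(self, specs):
        super().__init__()
        self.specs = specs
        self.kept = []

    def handle_starttag(self, tag, attrs):
        if any(spec_matches(s, tag, attrs) for s in self.specs):
            self.kept.append((tag, dict(attrs)))


finder = KeepFinder(keep_only_tags)
finder.feed('<div class="menu">nav</div><div class="nota top12"><p>text</p></div>')
print(finder.kept)  # [('div', {'class': 'nota top12'})]
```

Adding another page layout is then just a matter of appending another dict to the list.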
In fact, I'm using the print version for the articles:
http://www.pagina12.com.ar/imprimir/...011-09-22.html
The actual article is contained in this tag: <div id="cuerpo">.
But before it, the articles also need other content (title, subtitle, author), which sits in tags such as <h5>, <h1>, etc. These would be excluded by keep_only_tags, and if I try to include them as well, the page that has the comic strip would of course show these tags too.
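For reference, a keep_only_tags list covering the print version might look roughly like this. The tag choices are my assumptions based on the description above (I haven't checked them against the real page source), and in a real recipe the list would be a class attribute of the recipe class. As far as I know the same list is applied to every page the recipe downloads, which is where the conflict with the comic-strip page comes from.

```python
# Hypothetical keep_only_tags for the print version: the headline tags
# plus the body container. Tag names are assumptions from the description
# above; verify them against the actual page source.
keep_only_tags = [
    dict(name='h5'),                          # e.g. subtitle / kicker
    dict(name='h1'),                          # headline
    dict(name='div', attrs={'id': 'cuerpo'}), # article body
]
```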
I think the way to go would be what Starson suggests:
Quote:
Basically, you want a link to an html page with an img tag on it that holds your strip. If the site doesn't have a page like that (it should, otherwise how do you see it) you can build it yourself in the recipe.
|
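One way to "build it yourself" could be a helper that writes a tiny HTML page containing just the <img> tag and returns a file:// URL for it. This is only a sketch: build_strip_page is a hypothetical name, the image URL is a placeholder, and it assumes calibre will accept a local file:// URL as the 'url' of the comic's article entry returned from parse_index(), which is worth verifying before relying on it.

```python
import html
import os
import pathlib
import tempfile


def build_strip_page(image_url, title, directory=None):
    """Write a minimal HTML page that embeds one image and return a
    file:// URL pointing at it. Hypothetical helper, not a calibre API."""
    page = (
        '<html><head><title>{t}</title></head>'
        '<body><h1>{t}</h1><img src="{src}" alt="{t}"/></body></html>'
    ).format(t=html.escape(title), src=html.escape(image_url, quote=True))
    # mkstemp creates the file securely and returns an open descriptor.
    fd, path = tempfile.mkstemp(suffix='.html', dir=directory)
    with os.fdopen(fd, 'w', encoding='utf-8') as f:
        f.write(page)
    # as_uri() needs an absolute path, so normalise it first.
    return pathlib.Path(os.path.abspath(path)).as_uri()


# Placeholder image URL, not the real comic address.
page_url = build_strip_page('http://example.com/strip.gif', 'Comic strip')
print(page_url)
```

The recipe would then list page_url instead of the GIF address, so calibre gets an HTML page with the image in it.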
Building such a page would also avoid the symptoms you described:
Quote:
Calibre is expecting an HTML page as the URL. You passed the address of a GIF image to calibre, which was interpreted as an HTML page and produced the character garbage you've seen.
|
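A rough guard against that symptom could be to check whether a URL looks like an image before passing it on as an article URL, and wrap it in a page if so. The sketch below only guesses from the file extension with Python's mimetypes module, so it is a heuristic and ignores the Content-Type the server actually sends; looks_like_image is a hypothetical name and the URLs are placeholders.

```python
import mimetypes


def looks_like_image(url):
    """Guess from the URL's extension whether it points at an image
    rather than an HTML page. Heuristic only: it ignores the server's
    actual Content-Type header."""
    mime, _ = mimetypes.guess_type(url)
    return mime is not None and mime.startswith('image/')


print(looks_like_image('http://example.com/strip.gif'))   # True
print(looks_like_image('http://example.com/nota.html'))   # False
```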
But I don't know how to "build it myself". :-(
Maybe you know, pete? ;-)