Old 09-23-2011, 12:39 PM   #9
macpablus
Enthusiast
macpablus once ate a cherry pie in a record 7 seconds.
 
Posts: 25
Karma: 1896
Join Date: Aug 2011
Device: Kindle 3
Quote:
Originally Posted by a.peter View Post
The good point is that the keep_only_tags member is a list of dictionaries. You may add any other expressions you need to parse other pages. If I take a look at an article, e.g. http://www.pagina12.com.ar/diario/el...011-09-22.html, I see that the actual article is embedded in a <div class="nota top12"> tag.
In fact, I'm using the print version for the articles:

http://www.pagina12.com.ar/imprimir/...011-09-22.html

The actual article is contained in this tag: <div id="cuerpo">.

But before it there is more content the articles need (title, subtitle, author), in tags such as <h5>, <h1>, etc. These would be excluded by keep_only_tags, and if I try to include them as well, the page that has the comic strip would of course show those tags too.
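
To make the trade-off concrete, here is a rough sketch of what such a list might look like for the print version. The tag names come from the page as described above; the variable is shown at module level only so the snippet runs on its own. In a real recipe it would be a class attribute of the recipe class, and I have not tested it against the site.

```python
# Sketch only: in a calibre recipe this would be a class attribute
# of the BasicNewsRecipe subclass, not a module-level variable.
keep_only_tags = [
    # Headline elements that sit before the article body
    # (title, subtitle, author).
    dict(name=['h1', 'h5']),
    # The article body of the print version.
    dict(name='div', attrs={'id': 'cuerpo'}),
]

# Each entry is a plain dictionary describing one kind of tag to keep.
for spec in keep_only_tags:
    print(spec)
```

Because the same list is applied to every page the recipe downloads, the <h1> and <h5> entries would be kept on the comic strip page as well, which is exactly the problem.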

I think the way to go would be, as Starson suggests:
Quote:
Basically, you want a link to an html page with an img tag on it that holds your strip. If the site doesn't have a page like that (it should, otherwise how do you see it) you can build it yourself in the recipe.
This would avoid the symptoms you describe:

Quote:
Calibre expects the URL to point to an HTML page. You passed the address of a GIF image to calibre, which was interpreted as an HTML page and produced the character garbage you've seen.
But I don't know how to "build the HTML myself". :-(
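
For reference, a minimal guess at what "building the HTML yourself" could mean, using only the Python standard library: wrap the image URL in a tiny HTML page and save it to a temporary file. The function names `build_strip_page` and `write_strip_page` and the example URL are made up for illustration, and how the resulting file would be handed to calibre inside a recipe is not covered here.

```python
import html
import tempfile


def build_strip_page(img_url, title="Comic strip"):
    """Return a minimal HTML document whose body holds a single <img> tag.

    Hypothetical helper, not part of calibre.
    """
    safe_url = html.escape(img_url, quote=True)
    safe_title = html.escape(title, quote=True)
    return (
        "<html><head><title>{t}</title></head>"
        '<body><h1>{t}</h1><img src="{u}" alt="{t}"/></body></html>'
    ).format(t=safe_title, u=safe_url)


def write_strip_page(img_url, title="Comic strip"):
    """Write the wrapper page to a temporary .html file and return its path."""
    page = build_strip_page(img_url, title)
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as f:
        f.write(page)
        return f.name


if __name__ == "__main__":
    # Placeholder URL, not the real address of the strip.
    print(build_strip_page("http://example.com/strip.gif", "Daily strip"))
```

The idea is that the recipe would point at this wrapper page rather than at the GIF itself, so calibre gets the HTML it expects.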

Maybe you know, pete? ;-)