View Single Post
Old 06-22-2010, 03:22 PM   #2184
gambarini
Connoisseur
gambarini began at the beginning.
 
Posts: 98
Karma: 22
Join Date: Mar 2010
Device: IRiver Story, Ipod Touch, Android SmartPhone
With this feed i have tried two ways, and every one has is pro and cons...

With get.article i can obtain the correct link, but i can't find the title of the article.
With the parse_index ( index_to_soup) i can find the correct "title" but i don't get the link (in the soup there is a malformed "link" tag)
an example of index to soup
Spoiler:

Code:
<item>
<title><![CDATA[Berlusconi: "Siamo il Paese 
più ricco d'Europa"]]></title>
<description><![CDATA[ROMA<BR>Il Premier Silvio Berlusconi continua a confidare su un forte consenso popolare alla sua persona e al suo governo, a dispetto «di tutto il fango che ci buttano addosso». E inivita il centrodestra a «non farsi del male in casa», apprpoffitando semmai di una opposizione che descrive pressochè inesistente. «Nonostante tutto il fango che tentano di buttarci addosso - dice nel suo collegamento  ...(continua)]]></description>
<author><![CDATA[ ]]></author>
<category><![CDATA[POLITICA]]></category>
<pubdate><![CDATA[Sun, 20 Jun 2010 13:34:37 +0200]]></pubdate>
<link />http://www.lastampa.it/redazione/cmsSezioni/politica/201006articoli/56066girata.asp
<enclosure url="http://www.lastampa.it/redazione/cmssezioni/politica/201006images/berlusconi01g.jpg" type="image/jpeg">
<image>
<url>http://www.lastampa.it/redazione/cmssezioni/politica/201006images/berlusconi01g.jpg</url>
<title></title>
<link />
<width></width>
<height></height>
</image>

So is there the possibility to use both solutions together?
Or is there the possibility to extract the link near the malformet tag <link /> ???


p.s.

probably the bug is related to the feed
Spoiler:

Code:
Parsing index.html ...
Initial parse failed:
Traceback (most recent call last):
  File "site-packages\calibre\ebooks\oeb\base.py", line 813, in first_pass
  File "lxml.etree.pyx", line 2538, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48266)
  File "parser.pxi", line 1536, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:71653)
  File "parser.pxi", line 1408, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:70449)
  File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67144)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084)
XMLSyntaxError: Opening and ending tag mismatch: img line 29 and p, line 29, column 27

Last edited by gambarini; 06-22-2010 at 03:36 PM.
gambarini is offline