Quote:
Originally Posted by gambarini
With this feed I have tried two ways, and each has its pros and cons...
With get.article I can obtain the correct link, but I can't find the title of the article.
With parse_index (index_to_soup) I can find the correct "title", but I don't get the link (in the soup there is a malformed "link" tag).
an example of index to soup
So is it possible to use both solutions together?
Or is it possible to extract the link next to the malformed <link /> tag?
P.S. The bug is probably related to the feed.
Spoiler:
Code:
Parsing index.html ...
Initial parse failed:
Traceback (most recent call last):
File "site-packages\calibre\ebooks\oeb\base.py", line 813, in first_pass
File "lxml.etree.pyx", line 2538, in lxml.etree.fromstring (src/lxml/lxml.etree.c:48266)
File "parser.pxi", line 1536, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:71653)
File "parser.pxi", line 1408, in lxml.etree._parseDoc (src/lxml/lxml.etree.c:70449)
File "parser.pxi", line 898, in lxml.etree._BaseParser._parseUnicodeDoc (src/lxml/lxml.etree.c:67144)
File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:63820)
File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:64741)
File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64084)
XMLSyntaxError: Opening and ending tag mismatch: img line 29 and p, line 29, column 27
OK, this is my solution: I don't use the feed; instead I try to obtain the links directly from the HTML of the site.
So this is the code (beta version).
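For anyone who just wants the general idea, here is a minimal sketch (not the actual recipe) of pulling title/link pairs straight out of a page's HTML instead of relying on the feed. It uses only the Python standard library so it can be tried outside calibre. The sample markup, the `LinkCollector` class and the `extract_articles` helper are all hypothetical names made up for illustration, and the "links inside `<h2>` headings" rule is an assumption; the real site will need its own selectors. In a recipe, the same logic would probably live in `parse_index`, working on the result of `index_to_soup`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect title/url pairs from <a href="..."> tags found inside <h2> headings.

    Hypothetical helper: the <h2> rule is an assumption about the page layout.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_heading = False
        self.current_href = None   # set while we are inside a wanted <a>
        self.current_text = []
        self.articles = []

    def handle_starttag(self, tag, attrs):
        if tag == 'h2':
            self.in_heading = True
        elif tag == 'a' and self.in_heading:
            href = dict(attrs).get('href')
            if href:
                # Turn relative links into absolute ones.
                self.current_href = urljoin(self.base_url, href)
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current_href is not None:
            # Collapse runs of whitespace in the link text.
            title = ' '.join(''.join(self.current_text).split())
            if title:
                self.articles.append({'title': title, 'url': self.current_href})
            self.current_href = None
        elif tag == 'h2':
            self.in_heading = False


def extract_articles(html, base_url):
    """Return a list of {'title': ..., 'url': ...} dicts found in the HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.articles


# Made-up sample page; note the unclosed <img> inside <p>, similar to the
# markup the XML parser complains about in the log above.
SAMPLE_HTML = """
<div class="news">
  <h2><a href="/news/first-story.html">First story</a></h2>
  <p><img src="a.png"></p>
  <h2><a href="/news/second-story.html">Second  story</a></h2>
  <a href="/about.html">About</a>
</div>
"""

if __name__ == '__main__':
    for art in extract_articles(SAMPLE_HTML, 'http://example.com/'):
        print(art['title'], '->', art['url'])
```

The dicts use the `title` and `url` keys because that is the shape of the article entries `parse_index` is expected to return, so the list could be wrapped as `[('Feed title', articles)]`. The "About" link is skipped only because it sits outside an `<h2>`; that filter is the part most likely to need changing for a real site.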