MobileRead Forums - View Single Post - Can't extract article title in parse index

hiperlink · 01-04-2011, 04:52 PM

I don't understand this.

I mean the debug.log (for the previous version of the scraped site): https://gist.github.com/749781

Here is one section with its articles, as shown in the log:

Quote:

Found section: Publicisztika
Found article: RAJNAI ATTILA : Foltok a mundéron at http://www.es.hu/2010-12-15_foltok-a-munderon
Found article: BODOKY TAMÁS : A Grupo Milton spanyol módszere at http://www.es.hu/2010-12-15_a-grupo-...anyol-modszere
Found article: TIMOTHY GARTON ASH: Követségi táviratok: titokparádé at http://www.es.hu/2010-12-15_kovetseg...ok-titokparade
Found article: Kovács Zoltán: Még mi kéne? at http://www.es.hu/2010-12-15_meg-mi-kene
Found article: UNGVÁRY RUDOLF: Nem magyar magyarként at http://www.es.hu/2010-12-15_nem-magyar-magyarkent
Found article: LOSONCZ MIKLÓS: Leminősítés at http://www.es.hu/2010-12-15_leminosites-
Found article: MEGYESI GUSZTÁV: Nullfaktor at http://www.es.hu/2010-12-15_nullfaktor

And later in the log:

Quote:

Could not fetch link http://www.es.hu/2010-12-15_kovetseg...ok-titokparade
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/web/fetch/simple.py", line 428, in process_links
    soup = self.get_soup(dsrc)
  File "/usr/lib/calibre/calibre/web/fetch/simple.py", line 189, in get_soup
    return self.preprocess_html_ext(soup)
  File "/tmp/calibre_0.7.34_tmp_WzGqsn/calibre_0.7.34_8gsQ4J_recipes/recipe0.py", line 122, in preprocess_html
    url = links['href']
  File "/usr/lib/calibre/calibre/ebooks/BeautifulSoup.py", line 518, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'href'

http://www.es.hu/2010-12-15_kovetseg...ok-titokparade saved to
Downloading
Fetching http://www.es.hu/2010-12-15_leminosites-
Failed to download article: TIMOTHY GARTON ASH: Követségi táviratok: titokparádé from http://www.es.hu/2010-12-15_kovetseg...ok-titokparade
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/utils/threadpool.py", line 95, in run
    (request, request.callable(*request.args, **request.kwds))
  File "/usr/lib/calibre/calibre/web/feeds/news.py", line 838, in fetch_article
    return self._fetch_article(url, dir, f, a, num_of_feeds)
  File "/usr/lib/calibre/calibre/web/feeds/news.py", line 834, in _fetch_article
    raise Exception(_('Could not fetch article. Run with -vv to see the reason'))
Exception: Could not fetch article. Run with -vv to see the reason

Does this mean that parse_index finds the article href correctly, but the download then fails in preprocess_html, because that function contains url = links['href'] and one of the tags it looks at has no href attribute?
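
To check my understanding of the KeyError: 'href', here is a minimal sketch of the idea of skipping anchors that have no href. It uses only the standard library's html.parser instead of calibre's BeautifulSoup, so it runs outside a recipe. AnchorCollector, collect_hrefs and the SAMPLE fragment are made-up names and data for illustration, not taken from the recipe or from es.hu. In the recipe itself the analogous change would presumably be links.get('href') instead of links['href'], but I have not verified that against the actual recipe code.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the attribute dict of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            self.anchors.append(dict(attrs))


def collect_hrefs(html):
    """Return the href of every <a> tag that has one, skipping the rest."""
    parser = AnchorCollector()
    parser.feed(html)
    hrefs = []
    for attrs in parser.anchors:
        # .get() returns None for a missing key, where attrs['href']
        # would raise KeyError as in the traceback above
        href = attrs.get("href")
        if href:
            hrefs.append(href)
    return hrefs


# Hypothetical page fragment: the first <a> is a named anchor with no href.
SAMPLE = (
    '<p><a name="top"></a>'
    '<a href="http://www.es.hu/2010-12-15_nullfaktor">Nullfaktor</a></p>'
)

print(collect_hrefs(SAMPLE))
```

If an article page contains an anchor like the first one in SAMPLE, indexing it with ['href'] would fail while the article link found by parse_index is still fine, which would match what the log shows.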