01-04-2011, 03:45 AM   #16
hiperlink
Thanks for your answer, Kovid!

But what if I want to get the article? Why can't my recipe download it?
01-04-2011, 11:20 AM   #17
kovidgoyal
creator of calibre
You have to look and see why the link element has no href on the website, and figure out an alternative.
01-04-2011, 03:52 PM   #18
hiperlink

I don't understand.

Here is the debug.log (for the previous version of the scraped site): https://gist.github.com/749781

Here is one section with its articles, as shown in the log:

Quote:
Found section: Publicisztika
Found article: RAJNAI ATTILA : Foltok a mundéron at http://www.es.hu/2010-12-15_foltok-a-munderon
Found article: BODOKY TAMÁS : A Grupo Milton spanyol módszere at http://www.es.hu/2010-12-15_a-grupo-...anyol-modszere
Found article: TIMOTHY GARTON ASH: Követségi táviratok: titokparádé at http://www.es.hu/2010-12-15_kovetseg...ok-titokparade
Found article: Kovács Zoltán: Még mi kéne? at http://www.es.hu/2010-12-15_meg-mi-kene
Found article: UNGVÁRY RUDOLF: Nem magyar magyarként at http://www.es.hu/2010-12-15_nem-magyar-magyarkent
Found article: LOSONCZ MIKLÓS: Leminősítés at http://www.es.hu/2010-12-15_leminosites-
Found article: MEGYESI GUSZTÁV: Nullfaktor at http://www.es.hu/2010-12-15_nullfaktor
And later in the log:

Quote:
Could not fetch link http://www.es.hu/2010-12-15_kovetseg...ok-titokparade
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/web/fetch/simple.py", line 428, in process_links
    soup = self.get_soup(dsrc)
  File "/usr/lib/calibre/calibre/web/fetch/simple.py", line 189, in get_soup
    return self.preprocess_html_ext(soup)
  File "/tmp/calibre_0.7.34_tmp_WzGqsn/calibre_0.7.34_8gsQ4J_recipes/recipe0.py", line 122, in preprocess_html
    url = links['href']
  File "/usr/lib/calibre/calibre/ebooks/BeautifulSoup.py", line 518, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'href'

http://www.es.hu/2010-12-15_kovetseg...ok-titokparade saved to
Downloading
Fetching http://www.es.hu/2010-12-15_leminosites-
Failed to download article: TIMOTHY GARTON ASH: Követségi táviratok: titokparádé from http://www.es.hu/2010-12-15_kovetseg...ok-titokparade
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/utils/threadpool.py", line 95, in run
    (request, request.callable(*request.args, **request.kwds))
  File "/usr/lib/calibre/calibre/web/feeds/news.py", line 838, in fetch_article
    return self._fetch_article(url, dir, f, a, num_of_feeds)
  File "/usr/lib/calibre/calibre/web/feeds/news.py", line 834, in _fetch_article
    raise Exception(_('Could not fetch article. Run with -vv to see the reason'))
Exception: Nem lehet a cikket letölteni. Futtassa a -vv paraméterrel a hibaüzenetek megjelenítéséhez
Does this mean that I get the article href in the parse_index part, but the download then fails in preprocess_html (because that function contains url = links['href'])?
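
The KeyError in the traceback says that links['href'] was evaluated on an element that has no href attribute at all. The following is a minimal sketch of the difference between plain indexing and a guarded lookup. It uses Python's standard html.parser as a stand-in for calibre's bundled BeautifulSoup, and the sample markup and the names LinkCollector and hrefs_of are made up for illustration; none of them come from the recipe.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the attributes of every <link> element as a dict."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "link":
            self.links.append(dict(attrs))


def hrefs_of(html):
    """Return the href of each <link>, or None where it is missing."""
    collector = LinkCollector()
    collector.feed(html)
    # dict.get returns None instead of raising KeyError for a missing key
    return [attrs.get("href") for attrs in collector.links]


# Hypothetical markup: the second <link> carries no href, as on the failing pages
sample = '<head><link rel="stylesheet" href="/style.css"><link rel="alternate"></head>'
result = hrefs_of(sample)
print(result)
```

Indexing the second element with ['href'] would raise the same KeyError as in the log, while the guarded lookup just reports that nothing is there. In a recipe the same idea would presumably be to skip elements whose href lookup comes back empty rather than indexing every element directly.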
01-18-2011, 08:31 AM   #19
hiperlink
Still can't get some articles

Hi all,

With my updated recipe (which still needs refactoring) at https://gist.github.com/749788, I still can't download some of the articles that parse_index recognized as valid feed items, even though I can open them in my browser. Could someone tell me why?

Here is the debug.log:
https://gist.github.com/749781

The important part is:

Code:
Could not fetch link http://www.es.hu/2011-01-16_van-e-sajtoszabadsag-magyarorszagon
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/web/fetch/simple.py", line 428, in process_links
    soup = self.get_soup(dsrc)
  File "/usr/lib/calibre/calibre/web/fetch/simple.py", line 189, in get_soup
    return self.preprocess_html_ext(soup)
  File "/tmp/calibre_0.7.40_tmp_fNd0OI/calibre_0.7.40_CGdmix_recipes/recipe0.py", line 144, in preprocess_html
    url = links['href']
  File "/usr/lib/calibre/calibre/ebooks/BeautifulSoup.py", line 518, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'href'

http://www.es.hu/2011-01-16_van-e-sajtoszabadsag-magyarorszagon saved to 
Downloading
Fetching http://www.es.hu/2011-01-16_esse-delendam
Failed to download article: KOLTAY ANDRÁS  Van-e sajtószabadság Magyarországon? from http://www.es.hu/2011-01-16_van-e-sajtoszabadsag-magyarorszagon
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/utils/threadpool.py", line 95, in run
    (request, request.callable(*request.args, **request.kwds))
  File "/usr/lib/calibre/calibre/web/feeds/news.py", line 846, in fetch_article
    return self._fetch_article(url, dir, f, a, num_of_feeds)
  File "/usr/lib/calibre/calibre/web/feeds/news.py", line 842, in _fetch_article
    raise Exception(_('Could not fetch article. Run with -vv to see the reason'))
Exception: Nem lehet a cikket letölteni. Futtassa a -vv paraméterrel a hibaüzenetek megjelenítéséhez
01-18-2011, 11:00 AM   #20
Starson17
Quote:
Originally Posted by hiperlink
I still can't get some of the articles which were recognized by parse_index as valid feed items (and can access them via my browser). Could someone tell me why?
I can't, but I can steer you toward some debugging. When calibre requests an article, the request differs from the one a browser makes. The trick is to make the browser look like calibre, or vice versa. It can be a cookie issue or a header issue (referer, etc.). Use LiveHTTP Headers or TamperData in Firefox to inspect and control what the browser sends. Use the browser and header commands in the recipe to see and modify the headers, cookies and referer that your recipe sends. When the two requests are the same, you will get the same results.
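
One way to act on this advice is to line up the headers the browser sends against the headers the script sends and see what differs. This is a rough sketch using only the standard library; the header values are invented placeholders, header_diff and build_request are hypothetical helpers, and nothing is actually sent over the network.

```python
from urllib.request import Request


def build_request(url, extra_headers):
    """Build (but do not send) a request carrying the given headers."""
    return Request(url, headers=extra_headers)


def header_diff(browser_headers, script_headers):
    """Return {name: (browser_value, script_value)} for headers that differ.

    Header names are compared case-insensitively; a missing header
    shows up as None on the side that lacks it.
    """
    b = {k.lower(): v for k, v in browser_headers.items()}
    s = {k.lower(): v for k, v in script_headers.items()}
    return {
        name: (b.get(name), s.get(name))
        for name in sorted(set(b) | set(s))
        if b.get(name) != s.get(name)
    }


# Placeholder values, as they might be copied from a browser header capture
browser = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/3.6",
    "Referer": "http://www.es.hu/",
    "Cookie": "session=example",
}

req = build_request(
    "http://www.es.hu/2011-01-16_esse-delendam",
    {"User-Agent": "my-recipe/0.1"},
)
sent = dict(req.header_items())
diff = header_diff(browser, sent)

for name, (from_browser, from_script) in diff.items():
    print(name, "| browser:", from_browser, "| script:", from_script)
```

Each header that shows up in the diff is a candidate for the failure. Inside a recipe, the usual place to change what gets sent is the browser object returned by get_browser; I believe its addheaders list can carry extra (name, value) pairs such as a Referer, but check that against your calibre version.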