01-04-2011, 03:45 AM | #16 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
Thanks for your answer, Kovid!
But what if I want to get the article? Why can't my recipe download it? |
01-04-2011, 11:20 AM | #17 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You have to look at the website and see why the link element has no href, then figure out an alternative.
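A minimal sketch of one way to check this, assuming the page contains anchors without an href (for example named anchors). It uses only Python's standard `html.parser`, not calibre's bundled BeautifulSoup; the `AnchorAudit` class, the `audit_links` helper and the sample markup are hypothetical names made up for illustration. In a recipe, the analogous guard on a BeautifulSoup tag would presumably be `tag.get('href')`, which returns None when the attribute is missing, where `tag['href']` raises a KeyError.

```python
from html.parser import HTMLParser


class AnchorAudit(HTMLParser):
    """Collect anchors with and without an href attribute."""

    def __init__(self):
        super().__init__()
        self.with_href = []    # href values that are safe to follow
        self.without_href = 0  # number of <a> tags lacking a usable href

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # dict.get returns None for a missing key instead of raising KeyError
        href = dict(attrs).get("href")
        if href:
            self.with_href.append(href)
        else:
            self.without_href += 1


def audit_links(html):
    """Return (list of hrefs, count of anchors without href) for the markup."""
    parser = AnchorAudit()
    parser.feed(html)
    parser.close()
    return parser.with_href, parser.without_href


# Hypothetical markup: one named anchor (no href) and one real link.
sample = (
    '<div><a name="top">Top</a>'
    '<a href="/2011-01-16_esse-delendam">Read</a></div>'
)
links, missing = audit_links(sample)
print(links, missing)
```

Running this kind of audit on the saved page source shows how many anchors would trip a plain `['href']` lookup, which helps decide whether to skip them or to find the URL elsewhere in the markup.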
|
01-04-2011, 03:52 PM | #18 | ||
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
I can't get it.
I mean, in the debug.log (for the previous version of the scraped site): https://gist.github.com/749781
Here is one section with its articles, as shown in the log: Quote:
Quote:
|
||
01-18-2011, 08:31 AM | #19 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
Still can't get some articles
Hi All,
With my updated recipe (which still needs refactoring) at https://gist.github.com/749788, I still can't get some of the articles, even though parse_index recognized them as valid feed items and I can open them in my browser. Could someone tell me why?
Here is the debug.log: https://gist.github.com/749781
The important part is:
Code:
Could not fetch link http://www.es.hu/2011-01-16_van-e-sajtoszabadsag-magyarorszagon
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/web/fetch/simple.py", line 428, in process_links
    soup = self.get_soup(dsrc)
  File "/usr/lib/calibre/calibre/web/fetch/simple.py", line 189, in get_soup
    return self.preprocess_html_ext(soup)
  File "/tmp/calibre_0.7.40_tmp_fNd0OI/calibre_0.7.40_CGdmix_recipes/recipe0.py", line 144, in preprocess_html
    url = links['href']
  File "/usr/lib/calibre/calibre/ebooks/BeautifulSoup.py", line 518, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'href'
http://www.es.hu/2011-01-16_van-e-sajtoszabadsag-magyarorszagon saved to
Downloading
Fetching http://www.es.hu/2011-01-16_esse-delendam
Failed to download article: KOLTAY ANDRÁS Van-e sajtószabadság Magyarországon? from http://www.es.hu/2011-01-16_van-e-sajtoszabadsag-magyarorszagon
Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/utils/threadpool.py", line 95, in run
    (request, request.callable(*request.args, **request.kwds))
  File "/usr/lib/calibre/calibre/web/feeds/news.py", line 846, in fetch_article
    return self._fetch_article(url, dir, f, a, num_of_feeds)
  File "/usr/lib/calibre/calibre/web/feeds/news.py", line 842, in _fetch_article
    raise Exception(_('Could not fetch article. Run with -vv to see the reason'))
Exception: Could not fetch article. Run with -vv to see the reason |
01-18-2011, 11:00 AM | #20 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I can't, but I can steer you to some debugging. When Calibre asks for an article, its request differs from the one a browser makes. The trick is to make the browser look like Calibre, or vice versa. It can be a cookie issue or a header issue (referer, etc.). Use LiveHTTP Headers or TamperData in Firefox to inspect and control what the browser sends. Use the browser and header commands in the recipe to see and modify the headers, cookies and referer that the recipe sends. When the two requests are the same, you will get the same results.
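The comparison described above can be sketched roughly as follows, assuming the headers from both sides have already been copied into plain dictionaries. This is standard-library Python only, not calibre's own browser object; `diff_headers` is a hypothetical helper and all header values are invented for illustration. Nothing is sent over the network.

```python
import urllib.request


def diff_headers(browser_headers, recipe_headers):
    """Return {header name: (browser value, recipe value)} for every mismatch.

    Header names are compared case-insensitively, since HTTP header
    names are not case-sensitive.
    """
    b = {k.lower(): v for k, v in browser_headers.items()}
    r = {k.lower(): v for k, v in recipe_headers.items()}
    return {
        name: (b.get(name), r.get(name))
        for name in sorted(set(b) | set(r))
        if b.get(name) != r.get(name)
    }


# Hypothetical values, as they might be copied from a browser header tool.
browser = {
    "User-Agent": "Mozilla/5.0 (X11; Linux) Firefox/3.6",
    "Referer": "http://www.es.hu/",
    "Accept": "text/html",
}
# Hypothetical values for what the recipe's request sends.
recipe = {
    "user-agent": "Mozilla/5.0 (X11; Linux) Firefox/3.6",
    "Accept": "text/html",
}

mismatches = diff_headers(browser, recipe)
print(mismatches)

# Build (but do not send) a request that carries the browser's header values.
req = urllib.request.Request(
    "http://www.es.hu/2011-01-16_van-e-sajtoszabadsag-magyarorszagon",
    headers=browser,
)
print(req.get_header("Referer"))
```

Any header that shows up in the mismatch dictionary is a candidate to add or change on the recipe side until the two requests match.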
|