01-08-2010, 06:48 AM   #1097
wdrwc
can't fetch urls from feed in ebook-convert

I am trying to write a recipe for gazeta.pl, and I am testing it on one of their feeds:
http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml

I wrote a very simple custom recipe that should use the printable version of the articles. However, when I test the recipe with ebook-convert, the articles are not downloaded. ebook-convert reports that it cannot fetch them, even though the URLs generated by print_version() open without any problem in a browser.

Here is part of the output from running ebook-convert with -vv:
Code:
Downloading
Fetching http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
Could not fetch link http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
Traceback (most recent call last):
  File "site-packages\calibre\web\fetch\simple.py", line 401, in process_links
  File "site-packages\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found

http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html saved to 
Downloading
Fetching http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html
Failed to download article: Korzystasz z Windows i Adobe Readera? Szykuj się na łatanie... from http://technologie.gazeta.pl/technologie/1,82008,7432357,Korzystasz_z_Windows_i_Adobe_Readera__Szykuj_sie_na.html
Traceback (most recent call last):
  File "site-packages\calibre\utils\threadpool.py", line 95, in run
  File "site-packages\calibre\web\feeds\news.py", line 703, in fetch_article
  File "site-packages\calibre\web\feeds\news.py", line 699, in _fetch_article
Exception: Could not fetch article. Run with -vv to see the reason



2% Article download failed: u'Korzystasz z Windows i Adobe Readera? Szykuj si\u0119 na \u0142atanie...'
Could not fetch link http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html
Traceback (most recent call last):
  File "site-packages\calibre\web\fetch\simple.py", line 401, in process_links
  File "site-packages\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found

http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html saved to
And here is the recipe:
Code:
#!/usr/bin/env  python
'''
technologie.gazeta.pl
'''
from calibre.web.feeds.news import BasicNewsRecipe
class TechnologieGazeta(BasicNewsRecipe):
    title          = u'TechnologieGazeta'
    description    = 'Wiadomości z technologie.gazeta.pl'
    language       = 'pl'
    encoding = 'iso-8859-2'
    no_stylesheets = True
    remove_javascript = True
    max_articles_per_feed = 50
    simultaneous_downloads = 1

    feeds          = [
                      ('Wiadomosci Technologie gazeta.pl', 'http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml'),
                    ]

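    def print_version(self, url):
        # Split off the last path component, e.g. '1,82008,7432357,Title.html'.
        start, sep, rest = url.rpartition('/')
        # Drop the trailing title slug, keeping the numeric ids '1,82008,7432357'.
        numbers, sep, tytul = rest.rpartition(',')
        # Swap the leading '1,' for '2029020,' to build the print-version id.
        printversion = numbers.replace('1,', '2029020,', 1)
        print(numbers, '  ', printversion)  # debug output
        return start + '/' + printversion + '.html'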
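
The URL rewriting itself can be checked outside calibre. Below is a minimal standalone sketch of the same logic; the function name to_print_url is made up for illustration and is not part of calibre. Given the article URL from the log above, it produces exactly the address that ebook-convert reports as "Not Found", so the string handling appears to do what I intended and the failure seems to happen at fetch time:

```python
def to_print_url(url):
    """Rewrite an article URL the same way print_version() in the recipe does.

    Hypothetical helper for testing only; it is not a calibre function.
    """
    # 'http://.../technologie' and '1,82008,7432357,Title.html'
    start, _, rest = url.rpartition('/')
    # Keep the numeric ids, discard the trailing title slug.
    numbers, _, _title = rest.rpartition(',')
    # Replace the leading '1,' with the print-version prefix '2029020,'.
    print_id = numbers.replace('1,', '2029020,', 1)
    return start + '/' + print_id + '.html'


# Article URL taken from the ebook-convert log.
article = ('http://technologie.gazeta.pl/technologie/'
           '1,82008,7432357,'
           'Korzystasz_z_Windows_i_Adobe_Readera__Szykuj_sie_na.html')

print(to_print_url(article))
# prints http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
```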
I would appreciate any help or suggestion.

Thanks,
wdrwc