I am trying to write a recipe for gazeta.pl. I am testing it on one of their feeds:
http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml
I prepared a very simple custom recipe that should use the printable version of the articles. However, when I test the recipe with ebook-convert, the articles are not downloaded. ebook-convert reports that it cannot fetch the articles, but the URLs generated by print_version() open without any problem in the browser.
Here is part of the output from running ebook-convert -vv:
Code:
Downloading
Fetching http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
Could not fetch link http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html
Traceback (most recent call last):
File "site-packages\calibre\web\fetch\simple.py", line 401, in process_links
File "site-packages\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found
http://technologie.gazeta.pl/technologie/2029020,82008,7432357.html saved to
Downloading
Fetching http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html
Failed to download article: Korzystasz z Windows i Adobe Readera? Szykuj się na łatanie... from http://technologie.gazeta.pl/technologie/1,82008,7432357,Korzystasz_z_Windows_i_Adobe_Readera__Szykuj_sie_na.html
Traceback (most recent call last):
File "site-packages\calibre\utils\threadpool.py", line 95, in run
File "site-packages\calibre\web\feeds\news.py", line 703, in fetch_article
File "site-packages\calibre\web\feeds\news.py", line 699, in _fetch_article
Exception: Could not fetch article. Run with -vv to see the reason
2% Article download failed: u'Korzystasz z Windows i Adobe Readera? Szykuj si\u0119 na \u0142atanie...'
Could not fetch link http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html
Traceback (most recent call last):
File "site-packages\calibre\web\fetch\simple.py", line 401, in process_links
File "site-packages\calibre\web\fetch\simple.py", line 208, in fetch_url
FetchError: Not Found
http://technologie.gazeta.pl/technologie/2029020,82008,7432282.html saved to
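Since the same URLs open fine in a browser, one thing worth ruling out is whether the server answers differently depending on the request headers. This is only a guess, not something the log confirms. A minimal Python 3 sketch for comparing status codes outside calibre, where build_request and http_status are hypothetical helper names and the User-Agent string is just an example:

```python
import urllib.error
import urllib.request


def build_request(url, user_agent=None):
    """Build a GET request, optionally with a browser-like User-Agent."""
    headers = {}
    if user_agent:
        headers['User-Agent'] = user_agent
    return urllib.request.Request(url, headers=headers)


def http_status(url, user_agent=None, timeout=10):
    """Return the HTTP status code the server sends for this URL."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; report their code
        return err.code
```

Calling http_status(url) and then http_status(url, 'Mozilla/5.0') for one of the failing URLs would show whether the header makes any difference.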
And here is the recipe:
Code:
#!/usr/bin/env python
'''
technologie.gazeta.pl
'''
from calibre.web.feeds.news import BasicNewsRecipe


class TechnologieGazeta(BasicNewsRecipe):
    title = u'TechnologieGazeta'
    description = 'Wiadomości z technologie.gazeta.pl'
    language = 'pl'
    encoding = 'iso-8859-2'
    no_stylesheets = True
    remove_javascript = True
    max_articles_per_feed = 50
    simultaneous_downloads = 1

    feeds = [
        ('Wiadomosci Technologie gazeta.pl', 'http://serwisy.gazeta.pl/pub/rss/fb-technologie.xml'),
    ]

    def print_version(self, url):
        # Split off the last path segment, e.g. '1,82008,7432357,Title.html'
        start, sep, rest = url.rpartition('/')
        # Drop the title slug, keeping only the numeric IDs
        numbers, sep, tytul = rest.rpartition(',')
        # Replace the leading '1,' with the prefix used by the print view
        printversion = numbers.replace('1,', '2029020,', 1)
        print(numbers, ' ', printversion)  # debug output
        return start + '/' + printversion + '.html'
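As a sanity check, the rewriting done in print_version() can be exercised outside calibre. The sketch below is standalone; the function name to_print_url and the print_prefix parameter are made up for illustration, and the sample URL is the article URL from the log above.

```python
def to_print_url(url, print_prefix='2029020'):
    """Mirror the recipe's print_version() logic as a plain function."""
    # Split off the last path segment, e.g. '1,82008,7432357,Title.html'
    start, _, rest = url.rpartition('/')
    # Drop the title slug, keeping only the numeric IDs
    numbers, _, _title = rest.rpartition(',')
    # Replace the first '1,' with the print-view prefix
    print_ids = numbers.replace('1,', print_prefix + ',', 1)
    return start + '/' + print_ids + '.html'


article_url = ('http://technologie.gazeta.pl/technologie/'
               '1,82008,7432357,'
               'Korzystasz_z_Windows_i_Adobe_Readera__Szykuj_sie_na.html')
print_url = to_print_url(article_url)
print(print_url)
```

For this article the result is the same URL that ebook-convert reports as "Not Found", so the string manipulation itself produces what I intended.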
I would appreciate any help or suggestions.
Thanks,
wdrwc