|
|
#271 |
|
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for Serbian newspaper Borba:
|
|
|
|
|
#272 | |
|
creator of calibre
Posts: 45,615
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
Code:
def preprocess_html(soup):
    for a in soup.findAll('a', href=True): a['href'] = ''
    return soup
|
|
|
|
|
|
#273 |
|
Connoisseur
Posts: 70
Karma: 20
Join Date: Jan 2009
Location: Athens, Greece
Device: Cybook Gen3
|
kiklop74, thank you very much for your time. Unfortunately, I have to report that the Al Jazeera recipe does not work. I tried different IPs but the problem remains. Also, Mobireader can download the feed correctly, so something else must be responsible...
|
|
|
|
|
#274 |
|
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Then this is a problem beyond my knowledge.
|
|
|
|
|
#275 |
|
creator of calibre
Posts: 45,615
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
What error are you getting?
|
|
|
|
|
#276 | |
|
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2009
Device: Sony Reader
|
Quote:
I suppose what I'm looking for is a way of filtering only when processing the feed link page.
|
|
|
|
|
#277 |
|
creator of calibre
Posts: 45,615
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Try using postprocess_html instead. IIRC it's only called for downloaded html.
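A minimal sketch of that suggestion, assuming the link-blanking loop from the quoted preprocess_html is simply moved into postprocess_html. The class name and title are hypothetical, the stand-in base class is there only so the snippet runs outside calibre, and the (soup, first_fetch) signature follows the current calibre recipe documentation, so check it against your installed version:

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in (assumption) so the sketch can be run outside calibre.
    class BasicNewsRecipe(object):
        pass


class BlankLinksRecipe(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Blank links example'         # hypothetical title

    def postprocess_html(self, soup, first_fetch):
        # Called on each downloaded page after calibre has already
        # parsed it for links and images.
        for a in soup.findAll('a', href=True):
            a['href'] = ''
        return soup
```

Inside calibre the real BasicNewsRecipe is imported and the stand-in is never used.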
|
|
|
|
|
#278 |
|
Connoisseur
Posts: 70
Karma: 20
Join Date: Jan 2009
Location: Athens, Greece
Device: Cybook Gen3
|
I am not getting an error. Everything finishes successfully, but I only get the first page with the date and the title, and no articles. I guess the problem is with the site, not the software. Thank you very much for your time!
|
|
|
|
|
#279 |
|
creator of calibre
Posts: 45,615
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Do other recipes work?
|
|
|
|
|
#280 |
|
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
There is a real problem with Al Jazeera. I tried to download the news from my machine at work (different IP). It worked twice. Then the third time (and every subsequent time) I got this:
Code:
Job: Fetch news from Al Jazeera in English
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
ERROR: Failed to download article: Pakistan Taliban chief calls truce from http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG:
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
ERROR: Failed to download article: Ex-Guantanamo inmate returns to UK from http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG:
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
ERROR: Failed to download article: Amnesty urges Israel arms embargo from http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG:
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
ERROR: Failed to download article: Three arrested over Cairo bombing from http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
DEBUG:
INFO: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
ERROR: Failed to download article: US offers more cash to ailing banks from http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG:
ERROR: Failed to download article: Bomb blast at Basque party office from http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG:
ERROR: Failed to download article: Arrests over Greek prison escape from http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG:
WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
ERROR: Failed to download article: Aid workers killed in Darfur from http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG:
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to
ERROR: Failed to download article: Israeli coalition talks continue from http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to
DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason
DEBUG:
WARNING: Failed to download the following articles:
WARNING: Pakistan Taliban chief calls truce from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to
WARNING: Ex-Guantanamo inmate returns to UK from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to
WARNING: Amnesty urges Israel arms embargo from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to
WARNING: Three arrested over Cairo bombing from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to
WARNING: US offers more cash to ailing banks from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to
WARNING: Bomb blast at Basque party office from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to
WARNING: Arrests over Greek prison escape from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to
WARNING: Aid workers killed in Darfur from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to
WARNING: Israeli coalition talks continue from AL JAZEERA ENGLISH (AJE)
DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
DEBUG:
INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to
Generating epub...
[INFO] Parsing temp\calibre_0.4.139_xpvlfy_feeds2epub\index.html
[INFO] Rationalizing fonts...
[DEBUG] Done rationalizing
[DEBUG] Processing HTMLFile:0:a:c:\docume~1\darko\locals~1\temp\calibre_0.4.139_xpvlfy_feeds2epub\feed_0\index.html...
[INFO] Parsing calibre_0.4.139_xpvlfy_feeds2epub\feed_0\index.html
[INFO] Rationalizing fonts...
[DEBUG] Done rationalizing
[DEBUG] Saving stylesheets...
[INFO] Splitting calibre_title_page.html (0 KB)
[INFO] Looking for large trees...
[INFO] No large trees found
[INFO] Splitting index.xhtml (0 KB)
[INFO] Looking for large trees...
[INFO] No large trees found
[INFO] Splitting index_cr_1.xhtml (0 KB)
[INFO] Looking for large trees...
[INFO] No large trees found
[INFO] Checking files for bad links...
[INFO] Output written to c:\docume~1\darko\locals~1\temp\calibre_0.4.139_uy5kpn_feeds2epub.epub
|
|
|
|
#281 |
|
creator of calibre
Posts: 45,615
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Hmm perhaps they have some anti-scraper tech in place or their server can't handle loads very well. Have you tried increasing delay and setting max_connections to 1?
|
|
|
|
|
#282 |
|
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Those results were obtained with these settings:
Code:
simultaneous_downloads = 1
delay = 4
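For context, a minimal sketch of where those two settings sit in a recipe, assuming they are plain class attributes as in other recipes. The class name and title are hypothetical, the feed list is left out because the thread does not give the feed URL, and the stand-in base class only lets the snippet run outside calibre:

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in (assumption) so the sketch can be run outside calibre.
    class BasicNewsRecipe(object):
        pass


class AlJazeeraThrottled(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Al Jazeera in English (throttled)'  # hypothetical title
    simultaneous_downloads = 1  # fetch one article at a time
    delay = 4                   # seconds to wait between consecutive downloads
    # feeds = [...] is omitted on purpose: the feed URL is not in this thread
```

Since the server still returned Internal Server Error with these values, throttling alone does not seem to be enough here.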
|
|
|
|
|
#283 |
|
Addict
Posts: 309
Karma: 1008082
Join Date: Feb 2009
Location: NYC
Device: Kindle PW, K4 Touch, iPad2, Samsung Galaxy S II
|
Wired.com not working
Hi, I get an error with the RSS feed for "Wired". The error message is very long, but it starts with the following:
Code:
Job: **Fetch news from Wired.com**
**tuple**: ('XMLSyntaxError', u"Failed to parse QName 'http:', line 1, column 143")
**Traceback**:
Traceback (most recent call last):
File "/home/kovid/work/calibre/src/calibre/parallel.py", line 957, in worker
File "/home/kovid/work/calibre/src/calibre/parallel.py", line 915, in work
File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/from_feeds.py", line 69, in main
File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/from_feeds.py", line 59, in convert
File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/writer.py", line 603, in oeb2mobi
......................
|
|
|
|
|
#284 |
|
Wearer of Pants
Posts: 1,050
Karma: 7634
Join Date: Jan 2008
Location: Norman, OK
Device: Amazon Kindle DX / iPhone
|
New York Review of Books (Subscription)
Code:
#!/usr/bin/env python
__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'
'''
nybooks.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from lxml import html
from calibre.constants import preferred_encoding

class NewYorkReviewOfBooks(BasicNewsRecipe):

    title = u'New York Review of Books'
    description = u'Book reviews'
    language = _('English')
    __author__ = 'Kovid Goyal'
    needs_subscription = True
    remove_tags_before = {'id':'container'}
    remove_tags = [{'class':['noprint', 'ad', 'footer']}, {'id':'right-content'}]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        # Log in only when both credentials were supplied
        if self.username is not None and self.password is not None:
            br.open('http://www.nybooks.com/register/')
            br.select_form(name='login')
            br['email'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        root = html.fromstring(self.browser.open('http://www.nybooks.com/current-issue').read())
        date = root.xpath('//h4[@class = "date"]')[0]
        self.timefmt = ' ['+date.text.encode(preferred_encoding)+']'
        articles = []
        # Walk the siblings after the date heading until the next h4
        for tag in date.itersiblings():
            if tag.tag == 'h4': break
            if tag.tag == 'p':
                if tag.get('class') == 'indented':
                    # Indented paragraphs extend the previous article's description
                    articles[-1]['description'] += html.tostring(tag)
                else:
                    href = tag.xpath('descendant::a[@href]')[0].get('href')
                    article = {
                        'title': u''.join(tag.xpath('descendant::text()')),
                        'date' : '',
                        'url' : 'http://www.nybooks.com'+href,
                        'description': '',
                    }
                    articles.append(article)
        return [('Current Issue', articles)]
|
|
|