#271 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for Serbian newspaper Borba:
|
#272 | |
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
Code:
def preprocess_html(self, soup):
    for a in soup.findAll('a', href=True):
        a['href'] = ''
    return soup
|
#273 |
Connoisseur
Posts: 68
Karma: 20
Join Date: Jan 2009
Location: Athens, Greece
Device: Cybook Gen3
|
kiklop74, thank you very much for your time. Unfortunately, I have to report that the Al Jazeera recipe does not work. I tried different IPs, but the problem remains. Mobireader can download the feed correctly, so something else must be responsible...
|
#274 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Then this is a problem beyond my knowledge.
|
#275 |
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
What error are you getting?
|
#276 | |
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2009
Device: Sony Reader
|
Quote:
I suppose what I'm looking for is a way of filtering only when processing the feed link page.
|
#277 |
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Try using postprocess_html instead. IIRC it's only called for downloaded html.
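A minimal sketch of what that suggestion could look like in a recipe, assuming the `postprocess_html(self, soup, first_fetch)` hook of `BasicNewsRecipe`. The class name and title are hypothetical, and the fallback base class exists only so the snippet can be read and run outside calibre:

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch runs outside calibre; this is NOT the real class.
    class BasicNewsRecipe(object):
        pass


class LinkStrippingRecipe(BasicNewsRecipe):  # hypothetical recipe name
    title = u'Example feed'  # placeholder title

    def postprocess_html(self, soup, first_fetch):
        # Same link-blanking loop as the quoted preprocess_html,
        # moved to the hook that runs on downloaded article pages.
        for a in soup.findAll('a', href=True):
            a['href'] = ''
        return soup
```

Whether this hook really skips the feed link page should be checked against the calibre version in use; the post above only recalls that behaviour from memory.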
|
#278 |
Connoisseur
Posts: 68
Karma: 20
Join Date: Jan 2009
Location: Athens, Greece
Device: Cybook Gen3
|
I am not getting an error. Everything finishes successfully, but I only get the first page with the date and the title, and no articles. I guess the problem is with the site, not the software. Thank you very much for your time!
|
#279 |
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Do other recipes work?
|
#280 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
There is a real problem with Al Jazeera. I tried to download the news from my machine at work (different IP). It worked twice. Then the third time (and every subsequent time) I got this:
Code:
Job: Fetch news from Al Jazeera in English INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html ERROR: Failed to download article: Pakistan Taliban chief calls truce from http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. 
Run with --debug to see the reason DEBUG: WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html ERROR: Failed to download article: Ex-Guantanamo inmate returns to UK from http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. 
Run with --debug to see the reason DEBUG: WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html ERROR: Failed to download article: Amnesty urges Israel arms embargo from http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. 
Run with --debug to see the reason DEBUG: WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html ERROR: Failed to download article: Three arrested over Cairo bombing from http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. 
Run with --debug to see the reason DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error DEBUG: INFO: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html ERROR: Failed to download article: US offers more cash to ailing banks from http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html DEBUG: INFO: Downloading DEBUG: Fetching 
http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. Run with --debug to see the reason DEBUG: ERROR: Failed to download article: Bomb blast at Basque party office from http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. 
Run with --debug to see the reason DEBUG: ERROR: Failed to download article: Arrests over Greek prison escape from http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. 
Run with --debug to see the reason DEBUG: WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html ERROR: Failed to download article: Aid workers killed in Darfur from http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. 
Run with --debug to see the reason DEBUG: WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to ERROR: Failed to download article: Israeli coalition talks continue from http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to DEBUG: Traceback (most recent call last): File "calibre\utils\threadpool.pyo", line 95, in run File "calibre\web\feeds\news.pyo", line 667, in fetch_article File "calibre\web\feeds\news.pyo", line 663, in _fetch_article Exception: Could not fetch article. 
Run with --debug to see the reason DEBUG: WARNING: Failed to download the following articles: WARNING: Pakistan Taliban chief calls truce from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to WARNING: Ex-Guantanamo inmate returns to UK from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to WARNING: Amnesty urges Israel arms embargo from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html WARNING: Could not fetch link 
http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to WARNING: Three arrested over Cairo bombing from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to WARNING: US offers more cash to ailing banks from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: 
http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to WARNING: Bomb blast at Basque party office from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to WARNING: Arrests over Greek prison escape from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to WARNING: Aid workers killed in Darfur from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html DEBUG: 
Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to WARNING: Israeli coalition talks continue from AL JAZEERA ENGLISH (AJE) DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html DEBUG: INFO: Downloading DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html DEBUG: Error: Internal Server Error Traceback (most recent call last): File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url FetchError: Internal Server Error INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to Generating epub... [INFO] Parsing temp\calibre_0.4.139_xpvlfy_feeds2epub\index.html [INFO] Rationalizing fonts... [DEBUG] Done rationalizing [DEBUG] Processing HTMLFile:0:a:c:\docume~1\darko\locals~1\temp\calibre_0.4.139_xpvlfy_feeds2epub\feed_0\index.html... [INFO] Parsing calibre_0.4.139_xpvlfy_feeds2epub\feed_0\index.html [INFO] Rationalizing fonts... [DEBUG] Done rationalizing [DEBUG] Saving stylesheets... [INFO] Splitting calibre_title_page.html (0 KB) [INFO] Looking for large trees... [INFO] No large trees found [INFO] Splitting index.xhtml (0 KB) [INFO] Looking for large trees... [INFO] No large trees found [INFO] Splitting index_cr_1.xhtml (0 KB) [INFO] Looking for large trees... [INFO] No large trees found [INFO] Checking files for bad links... 
[INFO] Output written to c:\docume~1\darko\locals~1\temp\calibre_0.4.139_uy5kpn_feeds2epub.epub |
#281 |
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Hmm, perhaps they have some anti-scraper tech in place, or their server can't handle load very well. Have you tried increasing delay and setting max_connections to 1?
|
#282 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Those results were obtained with these settings:
Code:
simultaneous_downloads = 1
delay = 4
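For context, here is a rough sketch of where those two settings sit in a recipe. The class name, title, and feed URL are placeholders rather than the actual Al Jazeera recipe, and the fallback base class is only there so the snippet runs outside calibre:

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch runs outside calibre; this is NOT the real class.
    class BasicNewsRecipe(object):
        pass


class ThrottledRecipe(BasicNewsRecipe):  # hypothetical recipe name
    title = u'Example feed'  # placeholder title

    # Fetch one article at a time instead of in parallel.
    simultaneous_downloads = 1
    # Wait this many seconds between consecutive downloads.
    delay = 4

    # Placeholder feed list; substitute the real feed title and URL.
    feeds = [(u'Example', u'http://example.com/rss')]
```

Both are class-level attributes, so they apply to every article the recipe fetches. As the log above shows, the server still returned errors with these values, so throttling alone may not be enough here.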
#283 |
Addict
Posts: 309
Karma: 1008082
Join Date: Feb 2009
Location: NYC
Device: Kindle PW, K4 Touch, iPad2, Samsung Galaxy S II
|
Wired.com not working
Hi, I get an error with the RSS feed for "Wired". The error message is very long, but it starts with the following:
Code:
Job: **Fetch news from Wired.com**
**tuple**: ('XMLSyntaxError', u"Failed to parse QName 'http:', line 1, column 143")
**Traceback**:
Traceback (most recent call last):
  File "/home/kovid/work/calibre/src/calibre/parallel.py", line 957, in worker
  File "/home/kovid/work/calibre/src/calibre/parallel.py", line 915, in work
  File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/from_feeds.py", line 69, in main
  File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/from_feeds.py", line 59, in convert
  File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/writer.py", line 603, in oeb2mobi
......................
#284 |
Wearer of Pants
Posts: 1,050
Karma: 7634
Join Date: Jan 2008
Location: Norman, OK
Device: Amazon Kindle DX / iPhone
|
New York Review of Books (Subscription)
Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
nybooks.com
'''

from calibre.web.feeds.news import BasicNewsRecipe
from lxml import html
from calibre.constants import preferred_encoding

class NewYorkReviewOfBooks(BasicNewsRecipe):

    title = u'New York Review of Books'
    description = u'Book reviews'
    language = _('English')
    __author__ = 'Kovid Goyal'
    needs_subscription = True
    remove_tags_before = {'id':'container'}
    remove_tags = [{'class':['noprint', 'ad', 'footer']}, {'id':'right-content'}]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://www.nybooks.com/register/')
            br.select_form(name='login')
            br['email'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        root = html.fromstring(self.browser.open('http://www.nybooks.com/current-issue').read())
        date = root.xpath('//h4[@class = "date"]')[0]
        self.timefmt = ' ['+date.text.encode(preferred_encoding)+']'
        articles = []
        for tag in date.itersiblings():
            if tag.tag == 'h4': break
            if tag.tag == 'p':
                if tag.get('class') == 'indented':
                    articles[-1]['description'] += html.tostring(tag)
                else:
                    href = tag.xpath('descendant::a[@href]')[0].get('href')
                    article = {
                        'title': u''.join(tag.xpath('descendant::text()')),
                        'date' : '',
                        'url' : 'http://www.nybooks.com'+href,
                        'description': '',
                    }
                    articles.append(article)

        return [('Current Issue', articles)]
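To make the shape of the `parse_index` return value easier to see, here is a small standalone sketch that builds the same structure without touching the network. The helper names `make_article` and `build_index` and the example path are hypothetical; only the dictionary keys and the `('Current Issue', articles)` tuple come from the recipe above:

```python
def make_article(title, href, base='http://www.nybooks.com'):
    # Hypothetical helper: one article dict with the same keys the recipe uses.
    return {
        'title': title,
        'date': '',
        'url': base + href,
        'description': '',
    }


def build_index(entries):
    # entries is a list of (title, relative href) pairs.
    # The result is a list of (section title, list of article dicts) tuples,
    # matching what the recipe's parse_index returns.
    articles = [make_article(title, href) for title, href in entries]
    return [('Current Issue', articles)]


index = build_index([(u'Example review', '/articles/example')])
print(index[0][0], index[0][1][0]['url'])
```

In the recipe itself, the description of each article is filled in afterwards from any following `indented` paragraph.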