Old 02-22-2009, 07:01 AM   #271
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New recipe for Serbian newspaper Borba:
Attached Files
File Type: zip borba.zip (2.9 KB, 350 views)
Old 02-22-2009, 09:19 AM   #272
kovidgoyal
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by howsey View Post
Thanks for that. I've now got it working reasonably well. The next issue is that the article contains hyperlinks. The default processing seems to replace these with the element text and then include the URL in brackets afterwards. Is there a way to stop the URL from coming out? My initial thought was to try the pre/post processing functions, but this appears to filter out way too early.
Code:
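# Method of the recipe class, so it takes `self` as its first argument.
def preprocess_html(self, soup):
    # Blank the href of every link that has one.
    for a in soup.findAll('a', href=True):
        a['href'] = ''
    return soup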
Old 02-22-2009, 09:25 AM   #273
crAss
Connoisseur
Posts: 68
Karma: 20
Join Date: Jan 2009
Location: Athens, Greece
Device: Cybook Gen3
kiklop74, thank you very much for your time. Unfortunately, I have to report that the Al Jazeera recipe does not work. I tried different IPs, but the problem remains. Also, mobireader can download the feed correctly, so something else must be responsible...
Old 02-22-2009, 10:54 AM   #274
kiklop74
Guru
Then this is a problem beyond my knowledge.
Old 02-22-2009, 11:03 AM   #275
kovidgoyal
creator of calibre
Quote:
Originally Posted by crAss View Post
kiklop74, thank you very much for your time. Unfortunately, I have to report that the Al Jazeera recipe does not work. I tried different IPs, but the problem remains. Also, mobireader can download the feed correctly, so something else must be responsible...
What error are you getting?
Old 02-22-2009, 02:12 PM   #276
howsey
Junior Member
Posts: 6
Karma: 10
Join Date: Feb 2009
Device: Sony Reader
Quote:
Originally Posted by kovidgoyal View Post
Code:
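# Method of the recipe class, so it takes `self` as its first argument.
def preprocess_html(self, soup):
    # Blank the href of every link that has one.
    for a in soup.findAll('a', href=True):
        a['href'] = ''
    return soup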
When I try this, it strips everything out, i.e. I just end up with a book containing a cover page, a summary page, and then two pages (one for each feed). Each feed page is empty apart from the title.

I suppose what I'm looking for is a way of filtering only when processing the feed link page.
Old 02-22-2009, 04:45 PM   #277
kovidgoyal
creator of calibre
Try using postprocess_html instead. IIRC it's only called for downloaded html.
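
For illustration, here is a minimal sketch of that suggestion: the same link-blanking loop moved into postprocess_html. The recipe name LinkFreeRecipe, the helper blank_link_targets, and the stand-in base class are assumptions made for this sketch. The (self, soup, first_fetch) signature follows the current calibre documentation and may differ in the 0.4.x release discussed here. It has not been checked against howsey's feed.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in base class so the sketch can be run outside calibre.
    class BasicNewsRecipe(object):
        pass


def blank_link_targets(soup):
    """Empty the href of every <a> tag that has one and return the same soup."""
    for a in soup.findAll('a', href=True):
        a['href'] = ''
    return soup


class LinkFreeRecipe(BasicNewsRecipe):  # hypothetical recipe name
    title = 'Example feed'  # placeholder title

    def postprocess_html(self, soup, first_fetch):
        # first_fetch is unused here; it is part of the documented hook signature.
        return blank_link_targets(soup)
```

Keeping the loop in a plain function makes it easy to try the same code from either hook and compare the resulting books.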
Old 02-23-2009, 01:12 PM   #278
crAss
Connoisseur
Quote:
Originally Posted by kovidgoyal View Post
What error are you getting?
I am not getting an error. Everything finishes successfully, but I only get the first page with the date and the title, and no articles. I guess the problem is with the site, not the software. Thank you very much for your time!
Old 02-23-2009, 01:15 PM   #279
kovidgoyal
creator of calibre
Do other recipes work?
Old 02-23-2009, 01:23 PM   #280
kiklop74
Guru
There is a real problem with Al Jazeera. I tried to download the news from my machine at work (different IP). It worked twice. Then the third time (and every subsequent time) I got this:

Code:
Job: Fetch news from Al Jazeera in English
INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html

WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html

DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to 

INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html

ERROR: Failed to download article: Pakistan Taliban chief calls truce from http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html


DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: 


WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html

DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to 

INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html

ERROR: Failed to download article: Ex-Guantanamo inmate returns to UK from http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html


DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: 


WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html

DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to 

INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html

ERROR: Failed to download article: Amnesty urges Israel arms embargo from http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html


DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: 


WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html

DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to 

INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html

ERROR: Failed to download article: Three arrested over Cairo bombing from http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html


WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
DEBUG: 


INFO: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to 

INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html

WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html

DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to 

INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html

WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html

DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to 

INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html

ERROR: Failed to download article: US offers more cash to ailing banks from http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html


DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: 


ERROR: Failed to download article: Bomb blast at Basque party office from http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html


DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: 


ERROR: Failed to download article: Arrests over Greek prison escape from http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html


DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: 


WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html

DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to 

INFO: Downloading

DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html

ERROR: Failed to download article: Aid workers killed in Darfur from http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html


DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: 


WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html

DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to 

ERROR: Failed to download article: Israeli coalition talks continue from http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html


DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to 


DEBUG: Traceback (most recent call last):
  File "calibre\utils\threadpool.pyo", line 95, in run
  File "calibre\web\feeds\news.pyo", line 667, in fetch_article
  File "calibre\web\feeds\news.pyo", line 663, in _fetch_article
Exception: Could not fetch article. Run with --debug to see the reason


DEBUG: 


WARNING: Failed to download the following articles:

WARNING: Pakistan Taliban chief calls truce from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
WARNING: Could not fetch link http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/asia/2009/02/2009223165133105227.html saved to 


WARNING: Ex-Guantanamo inmate returns to UK from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223154524900913.html saved to 


WARNING: Amnesty urges Israel arms embargo from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922313453545186.html saved to 


WARNING: Three arrested over Cairo bombing from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/200922381126139132.html saved to 


WARNING: US offers more cash to ailing banks from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
WARNING: Could not fetch link http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/americas/2009/02/2009223152727975795.html saved to 


WARNING: Bomb blast at Basque party office from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223142314336958.html saved to 


WARNING: Arrests over Greek prison escape from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
WARNING: Could not fetch link http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/europe/2009/02/2009223135835227829.html saved to 


WARNING: Aid workers killed in Darfur from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
WARNING: Could not fetch link http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/africa/2009/02/2009223125912659355.html saved to 


WARNING: Israeli coalition talks continue from AL JAZEERA ENGLISH (AJE)

DEBUG: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html

DEBUG: INFO: Downloading
DEBUG: Fetching http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
WARNING: Could not fetch link http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html
DEBUG: Error: Internal Server Error
Traceback (most recent call last):
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 390, in process_links
  File "C:\Program Files\calibre\library.zip\calibre\web\fetch\simple.py", line 191, in fetch_url
FetchError: Internal Server Error
INFO: http://english.aljazeera.net//news/middleeast/2009/02/20092238327515157.html saved to 


Generating epub...
[INFO] 	Parsing temp\calibre_0.4.139_xpvlfy_feeds2epub\index.html
[INFO] 		Rationalizing fonts...
[DEBUG] 		Done rationalizing
[DEBUG] Processing HTMLFile:0:a:c:\docume~1\darko\locals~1\temp\calibre_0.4.139_xpvlfy_feeds2epub\feed_0\index.html...
[INFO] 	Parsing calibre_0.4.139_xpvlfy_feeds2epub\feed_0\index.html
[INFO] 		Rationalizing fonts...
[DEBUG] 		Done rationalizing
[DEBUG] Saving stylesheets...
[INFO] 	Splitting calibre_title_page.html (0 KB)
[INFO] 	Looking for large trees...
[INFO] 	No large trees found
[INFO] 	Splitting index.xhtml (0 KB)
[INFO] 	Looking for large trees...
[INFO] 	No large trees found
[INFO] 	Splitting index_cr_1.xhtml (0 KB)
[INFO] 	Looking for large trees...
[INFO] 	No large trees found
[INFO] 	Checking files for bad links...
[INFO] Output written to c:\docume~1\darko\locals~1\temp\calibre_0.4.139_uy5kpn_feeds2epub.epub
Old 02-23-2009, 01:51 PM   #281
kovidgoyal
creator of calibre
Posts: 45,398
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Hmm, perhaps they have some anti-scraping measures in place, or their server can't handle load very well. Have you tried increasing delay and setting simultaneous_downloads to 1?
Old 02-23-2009, 02:40 PM   #282
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
Those results were obtained with these settings:

Code:
    simultaneous_downloads = 1
    delay                  = 4
I already suspected on the previous page that they have anti-scraping protection.
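For anyone unsure what those two settings amount to, here is a rough sketch in plain Python. It is not calibre's actual implementation: it just fetches one URL at a time and pauses between requests. `fetch_serially`, `fetch`, and the `sleep` parameter are made-up names for illustration; only the idea of a single download at a time with a 4 second delay comes from the settings above.

```python
import time


def fetch_serially(urls, fetch, delay=4.0, sleep=time.sleep):
    """Fetch URLs one at a time, pausing `delay` seconds between requests.

    `fetch` is a hypothetical callable (url -> content) that raises on
    failure. Returns a (results, failures) pair of dicts keyed by URL.
    """
    results, failures = {}, {}
    for i, url in enumerate(urls):
        if i:
            sleep(delay)  # pause only between consecutive requests
        try:
            results[url] = fetch(url)
        except Exception as err:
            # Record the failure and carry on with the next URL
            failures[url] = str(err)
    return results, failures
```

If the server still answers with Internal Server Error at this pace, throttling alone is probably not the cause.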
Old 02-23-2009, 03:31 PM   #283
shinew
Addict
Posts: 309
Karma: 1008082
Join Date: Feb 2009
Location: NYC
Device: Kindle PW, K4 Touch, iPad2, Samsung Galaxy S II
Wired.com not working

Hi, I get an error with the RSS feed for "Wired". The error message is very long, but it starts with the following:
Code:
Job: **Fetch news from Wired.com**
**tuple**: ('XMLSyntaxError', u"Failed to parse QName 'http:', line 1, column 143")
**Traceback**:
Traceback (most recent call last):
File "/home/kovid/work/calibre/src/calibre/parallel.py", line 957, in worker
File "/home/kovid/work/calibre/src/calibre/parallel.py", line 915, in work
File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/from_feeds.py", line 69, in main
File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/from_feeds.py", line 59, in convert
File "/home/kovid/work/calibre/src/calibre/ebooks/mobi/writer.py", line 603, in oeb2mobi
......................
Does that mean it needs a new custom recipe? If so, can someone provide one? Thank you!
Old 02-23-2009, 03:52 PM   #284
Gideon
Wearer of Pants
Posts: 1,050
Karma: 7634
Join Date: Jan 2008
Location: Norman, OK
Device: Amazon Kindle DX / iPhone
New York Review of Books (Subscription)

Code:
#!/usr/bin/env  python
__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal kovid@kovidgoyal.net'
__docformat__ = 'restructuredtext en'

'''
nybooks.com
'''

from calibre.web.feeds.news import BasicNewsRecipe
from lxml import html
from calibre.constants import preferred_encoding

class NewYorkReviewOfBooks(BasicNewsRecipe):
    
    title = u'New York Review of Books'
    description = u'Book reviews'
    language = _('English')
    __author__ = 'Kovid Goyal' 
    needs_subscription = True
    remove_tags_before = {'id':'container'}
    remove_tags = [{'class':['noprint', 'ad', 'footer']}, {'id':'right-content'}]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        if self.username is not None and self.password is not None:
            br.open('http://www.nybooks.com/register/')
            br.select_form(name='login')
            br['email']   = self.username
            br['password'] = self.password
            br.submit()
        return br
    
    def parse_index(self):
        # Fetch the table of contents for the current issue
        root = html.fromstring(self.browser.open('http://www.nybooks.com/current-issue').read())
        # The issue date heading; the article entries follow it as siblings
        date = root.xpath('//h4[@class = "date"]')[0]
        self.timefmt = ' ['+date.text.encode(preferred_encoding)+']'
        articles = []
        for tag in date.itersiblings():
            # The next <h4> marks the end of the article list
            if tag.tag == 'h4': break
            if tag.tag == 'p':
                if tag.get('class') == 'indented':
                    # Indented paragraphs are appended to the previous article's description
                    articles[-1]['description'] += html.tostring(tag)
                else:
                    href = tag.xpath('descendant::a[@href]')[0].get('href')
                    article = {
                               'title': u''.join(tag.xpath('descendant::text()')),
                               'date' : '',
                               'url'  : 'http://www.nybooks.com'+href,
                               'description': '',
                               }
                    articles.append(article)

        # A single section holding every article in the issue
        return [('Current Issue', articles)]
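The sibling walk in parse_index can be tried without lxml or a network connection. The sketch below is an illustration only, not part of the recipe: `group_articles` and the (tag, class, text, href) tuples are hypothetical stand-ins for the elements that date.itersiblings() yields, and the sample rows are invented. Unlike the recipe, it skips an indented paragraph that appears before any article instead of raising IndexError.

```python
def group_articles(siblings, base='http://www.nybooks.com'):
    """Group paragraph rows into article dicts the way parse_index does.

    `siblings` is a list of (tag, css_class, text, href) tuples, a
    hypothetical stand-in for the lxml elements after the date heading.
    """
    articles = []
    for tag, css_class, text, href in siblings:
        if tag == 'h4':
            break  # the next heading ends the article list
        if tag != 'p':
            continue  # only paragraphs carry articles or descriptions
        if css_class == 'indented':
            if articles:  # guard added here; the recipe assumes an article exists
                articles[-1]['description'] += text
        else:
            articles.append({'title': text, 'date': '',
                             'url': base + href, 'description': ''})
    return [('Current Issue', articles)]


# Invented sample rows, for illustration only
sample = [
    ('p', None, 'First Review', '/articles/1'),
    ('p', 'indented', 'A short blurb.', None),
    ('div', None, 'ignored', None),
    ('p', None, 'Second Review', '/articles/2'),
    ('h4', None, 'Next section', None),
    ('p', None, 'Not reached', '/articles/3'),
]
index = group_articles(sample)
```

The return value has the same shape as the recipe's: a list of (section title, list of article dicts) pairs.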
Old 02-23-2009, 04:27 PM   #285
mgorokhov
Junior Member
Posts: 4
Karma: 10
Join Date: Feb 2009
Device: prs-505
Hi! I am new here. Is it possible to get a recipe for the Russian newspaper "Komsomolskaya Pravda" (www.kp.ru)? It is in Russian.
Thanks!
All times are GMT -4.