#1 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet
|
Al Jazeera in English recipe needs an update?
It has worked fine for a long time, but now it suddenly downloads only a 0.2 MB file which, when opened, contains only what I think is a cover page.
|
#2 |
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite
|
Works for me. I'm on Calibre 1.18.0 on Windows and get about 15 articles when running Al Jazeera English.
|
#3 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet
|
I'm on 1.18.0 as well. I just tried opening it in Calibre (instead of on my device).
All I get is one page:
Failed feed: AL JAZEERA ENGLISH (AJE)
HTTP Error 404: Not Found
I wonder if we are using the same recipe... All I can find is one, though. And the 404 error means the server is reachable but can't find the requested content, which to me suggests a broken recipe.
|
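One way to check the reasoning about the 404 outside of Calibre is to request the feed URL directly and see which kind of failure comes back. The following is only a rough sketch using Python's standard library; `diagnose_feed` and its `fetch` parameter are hypothetical names made up for this illustration and are not part of Calibre.

```python
import urllib.error
import urllib.request


def diagnose_feed(url, fetch=urllib.request.urlopen):
    """Return a short diagnosis string for a feed URL.

    `fetch` is a hypothetical injection point (defaults to urlopen) so the
    logic can be exercised without touching the network.
    """
    try:
        response = fetch(url)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status code.
        # HTTPError must be caught before URLError, its parent class.
        if err.code == 404:
            return "404: server reachable, feed path not found"
        return "HTTP error %d from server" % err.code
    except urllib.error.URLError as err:
        # No HTTP response at all: DNS failure, refused connection, etc.
        return "could not reach server: %s" % err.reason
    # Fall back to 200 if the response object has no status attribute.
    status = getattr(response, "status", 200)
    return "OK (HTTP %d)" % status


# Example call (needs network access), using the feed URL from the recipe:
# print(diagnose_feed(
#     "http://english.aljazeera.net/Services/Rss/?PostingId=2007731105943979989"))
```

A 404 result here would point at the feed URL the recipe is actually using, while an "OK" result would suggest the problem lies in the local Calibre setup rather than on the server.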
#4 |
Connoisseur
Posts: 63
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
|
This is working for me.
I see I have been using a customised copy, although I think it was probably only the timefmt statement which I added or changed. I do not remember making any change which would affect content extraction.
Code:

    __license__   = 'GPL v3'
    __copyright__ = '2009-2010, Darko Miletic <darko.miletic at gmail.com>'
    '''
    english.aljazeera.net
    '''
    from calibre.web.feeds.news import BasicNewsRecipe

    class AlJazeera(BasicNewsRecipe):
        title                  = 'Al Jazeera in English'
        __author__             = 'Darko Miletic oneillpt update'
        description            = 'News from Middle East'
        language               = 'en'
        timefmt                = ' [%a, %d %b, %Y %H:%M]'
        publisher              = 'Al Jazeera'
        category               = 'news, politics, middle east'
        delay                  = 1
        oldest_article         = 2
        max_articles_per_feed  = 100
        no_stylesheets         = True
        encoding               = 'iso-8859-1'
        use_embedded_content   = False
        #ignore_duplicate_articles = {'url'}
        ignore_duplicate_articles = {'title', 'url'}
        cover_url = u'http://www.aljazeera.com/Media/ver2/Images/1pximage.png'
        extra_css              = """
            body{font-family: Arial,sans-serif}
            #ctl00_cphBody_dvSummary{font-weight: bold}
            #dvArticleDate{font-size: small; color: #999999}
        """

        conversion_options = {
              'comment'   : description
            , 'tags'      : category
            , 'publisher' : publisher
            , 'language'  : language
        }

        keep_only_tags = [
              dict(attrs={'id':['DetailedTitle','ctl00_cphBody_dvSummary','dvArticleDate']})
            , dict(name='td',attrs={'class':'DetailedSummary'})
        ]

        remove_tags = [
              dict(name=['object','link','table','meta','base','iframe','embed'])
            , dict(name='td', attrs={'class':['MostActiveDescHeader','MostActiveDescBody']})
        ]

        feeds = [(u'AL JAZEERA ENGLISH (AJE)', u'http://english.aljazeera.net/Services/Rss/?PostingId=2007731105943979989' )]

        def get_article_url(self, article):
            artlurl = article.get('link', None)
            return artlurl.replace('http://english.aljazeera.net//','http://english.aljazeera.net/')

        def preprocess_html(self, soup):
            for item in soup.findAll(style=True):
                del item['style']
            for item in soup.findAll(face=True):
                del item['face']
            td = soup.find('td',attrs={'class':'DetailedSummary'})
            if td:
                td.name = 'div'
            spn = soup.find('span',attrs={'id':'DetailedTitle'})
            if spn:
                spn.name='h1'
            for itm in soup.findAll('span', attrs={'id':['dvArticleDate','ctl00_cphBody_lblDate']}):
                itm.name = 'div'
            for alink in soup.findAll('a'):
                if alink.string is not None:
                    tstr = alink.string
                    alink.replaceWith(tstr)
            return soup

|
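For anyone wondering what the customised timefmt line changes: it is an strftime-style pattern, and as far as I understand Calibre appends the formatted date to the periodical title. The snippet below is only an illustration of the pattern itself using the standard library; `title_with_timestamp` is a made-up helper, not a Calibre function, and the date is an arbitrary example.

```python
from datetime import datetime

# Pattern copied from the recipe; the leading space separates it from the title.
TIMEFMT = ' [%a, %d %b, %Y %H:%M]'


def title_with_timestamp(title, when, timefmt=TIMEFMT):
    """Hypothetical helper: append a formatted timestamp to a title."""
    return title + when.strftime(timefmt)


# Arbitrary example date; day and month names depend on the active locale.
example = title_with_timestamp('Al Jazeera in English',
                               datetime(2014, 1, 9, 10, 28))
print(example)
```

Since the pattern only affects how the date is displayed, a change to timefmt should have no bearing on which articles are fetched.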
#5 |
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet
|
Hmmmm.... I am stumped then.
Update: I fixed it. I removed it (unchecked 'schedule for download'), restarted Calibre, and added it back again. Now it works...
Last edited by NSILMike; 01-09-2014 at 10:28 AM.
|