#1
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet

Al Jazeera in English recipe needs an update?
It has worked fine for a long time, but now it suddenly downloads only a 0.2 MB file that, when opened, contains only what I think is a cover page.

#2
Enthusiast
Posts: 43
Karma: 136
Join Date: Mar 2011
Device: Kindle Paperwhite

Works for me. I'm on Calibre 1.18.0 on Windows and get about 15 articles when running the Al Jazeera English recipe.

#3
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet

I'm on 1.18.0 as well. I just tried opening it in Calibre (instead of on my device).
All I get is one page:
Failed feed: AL JAZEERA ENGLISH (AJE)
HTTP Error 404: Not Found
I wonder if we are using the same recipe, though I can only find one. A 404 error means the server is reachable but cannot find the specific page requested, which suggests to me that the recipe is broken (for example, that it points at a feed URL that no longer exists).
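To tell a dead feed URL apart from a recipe parsing problem, one can request the feed directly. A minimal Python 3 sketch using only the standard library; `feed_status` is a hypothetical helper name (not part of calibre), and the URL is the feed listed in the recipe posted in this thread, which may no longer be live:

```python
import urllib.error
import urllib.request

# Feed URL copied from the recipe's `feeds` list; it may no longer be live.
FEED_URL = 'http://english.aljazeera.net/Services/Rss/?PostingId=2007731105943979989'

def feed_status(url, timeout=10):
    """Return the HTTP status code for `url`, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 Not Found
        return err.code
    except OSError:
        # DNS failure, refused connection or timeout: the server was never reached
        return None
```

If `feed_status(FEED_URL)` returns `404`, the problem is the feed address itself rather than the article-parsing code; `None` means the host could not be reached at all.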

#4
Connoisseur
Posts: 63
Karma: 46
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis

This is working for me.
I see I have been using a customised copy, although I think the only change I made was adding or changing the timefmt statement. I do not remember making any change that would affect content extraction.
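`timefmt` is only a strftime-style format string for the date calibre attaches to the downloaded issue, so changing it should not touch article extraction. A quick standard-library check of what this recipe's format string renders; the date is arbitrary, `render_timefmt` is a hypothetical helper, and `%a`/`%b` assume the default C/English locale:

```python
from datetime import datetime

# Format string copied from the recipe's timefmt attribute
TIMEFMT = ' [%a, %d %b, %Y %H:%M]'

def render_timefmt(when, fmt=TIMEFMT):
    """Render a datetime with the recipe's strftime-style date format."""
    return when.strftime(fmt)

print(render_timefmt(datetime(2014, 1, 9, 11, 28)))  # → " [Thu, 09 Jan, 2014 11:28]"
```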
Code:
__license__   = 'GPL v3'
__copyright__ = '2009-2010, Darko Miletic <darko.miletic at gmail.com>'
'''
english.aljazeera.net
'''
from calibre.web.feeds.news import BasicNewsRecipe

class AlJazeera(BasicNewsRecipe):
    title                 = 'Al Jazeera in English'
    __author__            = 'Darko Miletic oneillpt update'
    description           = 'News from Middle East'
    language              = 'en'
    # strftime format for the date appended to the issue title
    timefmt               = ' [%a, %d %b, %Y %H:%M]'
    publisher             = 'Al Jazeera'
    category              = 'news, politics, middle east'
    delay                 = 1
    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = True
    encoding              = 'iso-8859-1'
    use_embedded_content  = False
    #ignore_duplicate_articles = {'url'}
    ignore_duplicate_articles = {'title', 'url'}
    cover_url             = u'http://www.aljazeera.com/Media/ver2/Images/1pximage.png'
    extra_css             = """
                            body{font-family: Arial,sans-serif}
                            #ctl00_cphBody_dvSummary{font-weight: bold}
                            #dvArticleDate{font-size: small; color: #999999}
                            """

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    # Keep only the article title, summary, date and body cell...
    keep_only_tags = [
                         dict(attrs={'id':['DetailedTitle','ctl00_cphBody_dvSummary','dvArticleDate']})
                        ,dict(name='td',attrs={'class':'DetailedSummary'})
                     ]

    # ...and drop embedded media, layout tables and the "most active" boxes
    remove_tags = [
                     dict(name=['object','link','table','meta','base','iframe','embed'])
                    ,dict(name='td', attrs={'class':['MostActiveDescHeader','MostActiveDescBody']})
                  ]

    feeds = [(u'AL JAZEERA ENGLISH (AJE)', u'http://english.aljazeera.net/Services/Rss/?PostingId=2007731105943979989' )]

    def get_article_url(self, article):
        artlurl = article.get('link', None)
        if not artlurl:
            # No link in this feed item: return it unchanged instead of crashing
            return artlurl
        # Repair the doubled slash that can follow the host name in feed links
        return artlurl.replace('http://english.aljazeera.net//','http://english.aljazeera.net/')

    def preprocess_html(self, soup):
        # Strip inline styles and font faces
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        # Turn the table cell holding the article body into a plain div
        td = soup.find('td',attrs={'class':'DetailedSummary'})
        if td:
            td.name = 'div'
        # Promote the title span to a heading
        spn = soup.find('span',attrs={'id':'DetailedTitle'})
        if spn:
            spn.name='h1'
        for itm in soup.findAll('span', attrs={'id':['dvArticleDate','ctl00_cphBody_lblDate']}):
            itm.name = 'div'
        # Replace text-only links with their plain text
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup
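The recipe's `get_article_url` only repairs one exact pattern, a doubled slash right after `http://english.aljazeera.net`. A more general variant could collapse repeated slashes anywhere in the URL path with `urllib.parse`; this is a hypothetical standalone sketch (`normalize_article_url` is not part of the recipe or of calibre) that can be tried outside calibre:

```python
import re
from urllib.parse import urlsplit, urlunsplit

def normalize_article_url(url):
    """Collapse runs of '/' in the path of `url`; scheme, host and query are untouched."""
    if not url:
        # Pass a missing link through unchanged
        return url
    parts = urlsplit(url)
    clean_path = re.sub(r'/{2,}', '/', parts.path)
    return urlunsplit(parts._replace(path=clean_path))

print(normalize_article_url('http://english.aljazeera.net//news/story.html'))
# → http://english.aljazeera.net/news/story.html
```

Collapsing slashes in the path is usually harmless for ordinary article pages, but whether it is safe for every URL on this site is an assumption.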

#5
Guru
Posts: 735
Karma: 35936
Join Date: Apr 2011
Location: Shrewsbury, MA
Device: Lenovo Android Tablet

Hmm... I'm stumped, then.
Update: I fixed it. I removed the recipe (unchecked 'Schedule for download'), restarted Calibre, and added it back. Now it works.

Last edited by NSILMike; 01-09-2014 at 11:28 AM.

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Today's Zaman (english) recipe update | swerling | Recipes | 0 | 01-02-2013 12:33 PM |
| English APA.AZ Recipe | BetterRed | Recipes | 2 | 11-26-2012 03:58 PM |
| English Pravda recipe | Raskospoon | Recipes | 2 | 11-02-2012 06:23 AM |
| Al Jazeera in english | teraflame | Recipes | 4 | 07-04-2012 09:16 AM |
| Recipe for Skylife (English) | thomass | Recipes | 0 | 11-28-2011 10:49 PM |