02-18-2009, 03:47 AM | #1 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
the feed only contains a link ...
Hi Kovid,
I'm writing a short recipe for an Italian newspaper, http://www.repubblica.it . Using the simplified interface, I see that the feed (http://www.repubblica.it/rss/homepage/rss2.0.xml) contains only links to the articles. The manual mentions this as a typical problem ("Now we will look at a news source that does not provide full content feeds. In such feeds, the full article is a webpage and the feed only contains a link to the webpage with a short summary of the article.") but then it doesn't explain how to solve it, explaining instead how to obtain the printed version. Maybe it's a dumb question, but Python is really foreign to me, so could you explain the basic steps? Thanks! alessandro |
02-18-2009, 04:02 AM | #2 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
On this page you have all the feeds from that newspaper
http://www.repubblica.it/servizi/rss...tml?ref=hpfoot |
02-18-2009, 04:22 AM | #3 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
Yes, that is exactly where I took them from.
When you enter any of them in the simplified calibre feed interface and download the newspaper, what you get is just a brief list of headlines, each pointing to the corresponding article. What I need is a hint on how to modify the recipe, in the advanced interface, to replace those headlines with the articles they link to. alessandro |
02-18-2009, 06:11 AM | #4 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Show us your code.
|
02-18-2009, 07:22 AM | #5 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
the code? Nothing to see there, in fact it's just the default produced by calibre:
Code:
class AdvancedUserRecipe1234959710(BasicNewsRecipe):
    title = u'la Repubblica'
    oldest_article = 1
    max_articles_per_feed = 100

    feeds = [(u'Repubblica homepage', u'http://www.repubblica.it/rss/homepage/rss2.0.xml'),
             (u'Repubblica Scienze', u'http://www.repubblica.it/rss/scienze/rss2.0.xml'),
             (u'Repubblica Tecnologia', u'http://www.repubblica.it/rss/tecnologia/rss2.0.xml'),
             (u'Repubblica Esteri', u'http://www.repubblica.it/rss/esteri/rss2.0.xml')]
alessandro |
02-18-2009, 08:20 AM | #6 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
I tried this recipe and it gets complete articles. You just need to filter out the trash.
|
02-18-2009, 08:29 AM | #7 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Here is an extended recipe that filters out the extra trash:
Code:
class AdvancedUserRecipe1234959710(BasicNewsRecipe):
    title = u'la Repubblica'
    oldest_article = 1
    max_articles_per_feed = 100
    remove_javascript = True
    no_stylesheets = True

    keep_only_tags = [dict(name='div', attrs={'class':'articolo'})]

    remove_tags = [
        dict(name=['object','link'])
        ,dict(name='span',attrs={'class':'linkindice'})
        ,dict(name='div',attrs={'class':'bottom-mobile'})
        ,dict(name='div',attrs={'id':['rssdiv','blocco']})
    ]

    feeds = [(u'Repubblica homepage', u'http://www.repubblica.it/rss/homepage/rss2.0.xml'),
             (u'Repubblica Scienze', u'http://www.repubblica.it/rss/scienze/rss2.0.xml'),
             (u'Repubblica Tecnologia', u'http://www.repubblica.it/rss/tecnologia/rss2.0.xml'),
             (u'Repubblica Esteri', u'http://www.repubblica.it/rss/esteri/rss2.0.xml')] |
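For anyone else new to Python, here is a rough sketch of how the remove_tags entries in the recipe can be read: each dict describes a tag name and, optionally, attribute values, and an element is dropped if it matches any entry. This is a simplified stand-alone model for illustration only, not calibre's actual implementation (calibre hands these dicts to its HTML parser, and the details of class matching may differ). The helper names matches and should_remove are made up for this example.

```python
def matches(spec, name, attrs):
    """Return True if an element (tag name + attribute dict) fits a spec.

    Simplified assumption: the 'name' entry and each attribute value in the
    spec may be either a single string or a list of acceptable strings.
    """
    def as_list(value):
        return value if isinstance(value, (list, tuple)) else [value]

    # The tag name must be one of the names listed in the spec, if any.
    if 'name' in spec and name not in as_list(spec['name']):
        return False
    # Every attribute named in the spec must have one of the wanted values.
    for key, wanted in spec.get('attrs', {}).items():
        if attrs.get(key) not in as_list(wanted):
            return False
    return True


# Same entries as in the recipe above.
remove_tags = [
    dict(name=['object', 'link']),
    dict(name='span', attrs={'class': 'linkindice'}),
    dict(name='div', attrs={'class': 'bottom-mobile'}),
    dict(name='div', attrs={'id': ['rssdiv', 'blocco']}),
]


def should_remove(name, attrs):
    """An element is removed if it matches any entry in remove_tags."""
    return any(matches(spec, name, attrs) for spec in remove_tags)


if __name__ == '__main__':
    print(should_remove('div', {'id': 'blocco'}))      # a div the recipe strips
    print(should_remove('div', {'class': 'articolo'}))  # the article body itself
```

keep_only_tags reads the same way, except that matching elements are the ones kept and everything else on the page is discarded, which is why the single 'articolo' div is enough to isolate the article text.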
02-18-2009, 08:43 AM | #8 |
Addict
Posts: 223
Karma: 356
Join Date: Aug 2007
Device: Rocket; Hiebook; N700; Sony 505; Kindle DX ...
|
thanks a lot, it works fine now!
alessandro |