#1336 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for TidBITS:
|
#1337 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for Gizmodo:
|
#1338 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for New Straits Times from Malaysia:
|
#1339 | ||
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2010
Device: Sony PRS-505, Asus eeePC 1000H
|
I'm still having trouble getting a recipe for
http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml cleared of unnecessary clutter; I'm still getting artifacts. The modified basic news recipe works in principle and removes much of the clutter, but it still includes, among other things, a "ghost" of an ad: Quote:
I want to get a "filtered" recipe going that scans several RSS feeds and filters out all articles that don't contain certain keywords, so that only news items that do contain those keywords are included in the created e-book, thus creating an instant press review on a certain theme, person, event, etc. Kovidgoyal has confirmed that this is possible with calibre: Quote:
|
||
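The keyword filtering described above can be sketched in plain Python. This is only an illustration under assumptions: it works on article dicts with the keys a recipe's `parse_index` returns (`title`, `url`, `description`, `date`), the helper names `matches_keywords` and `filter_articles` are made up for this example, and the sample articles are hypothetical.

```python
def matches_keywords(article, keywords):
    """Return True if any keyword occurs in the article's title or description."""
    text = ' '.join([article.get('title', ''), article.get('description', '')]).lower()
    return any(keyword.lower() in text for keyword in keywords)


def filter_articles(articles, keywords):
    """Keep only the articles that mention at least one keyword."""
    return [a for a in articles if matches_keywords(a, keywords)]


# Hypothetical sample data, for illustration only.
sample = [
    {'title': 'Bundeswehr extends mission', 'url': 'http://example.com/1',
     'description': '', 'date': ''},
    {'title': 'Football results', 'url': 'http://example.com/2',
     'description': 'Weekend round-up', 'date': ''},
]

kept = filter_articles(sample, ['bundeswehr'])
print([a['title'] for a in kept])
```

Inside a recipe, the same check could be applied to each section's article list before it is added to the feeds, so that sections without any matching article are simply left out.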
#1340 | |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Quote:
Code:
remove_attributes = ['width', 'height']
remove_tags_before = dict(name='h1')
remove_tags_after = dict(name='div', attrs={'class': 'ynw-article-body mod'})
remove_tags = [
    dict(attrs={'id': ['ynw-image-video-inset', 'ynw-more-news']}),
    dict(attrs={'class': ['ynw-utility']}),
]
|
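For context, here is a rough sketch of where those settings would sit in a complete recipe. The class name, title and `language` value are assumptions made for this example; only the feed URL and the clean-up settings come from the posts above. The `try`/`except` around the import is there only so the sketch can be loaded outside calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class YahooBundeswehr(BasicNewsRecipe):  # class name is an assumption
    title = 'Yahoo Nachrichten: Bundeswehr'  # title is an assumption
    language = 'de'  # assumed, since the feed is German
    no_stylesheets = True

    # Feed URL taken from the question above.
    feeds = [('Bundeswehr', 'http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml')]

    # Clean-up settings from the answer above.
    remove_attributes = ['width', 'height']
    remove_tags_before = dict(name='h1')
    remove_tags_after = dict(name='div', attrs={'class': 'ynw-article-body mod'})
    remove_tags = [
        dict(attrs={'id': ['ynw-image-video-inset', 'ynw-more-news']}),
        dict(attrs={'class': ['ynw-utility']}),
    ]
```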
#1341 |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
New recipe for Read It Later website:
|
#1342 |
Member
Posts: 12
Karma: 42
Join Date: Jan 2010
Device: Kindle
|
In trying to create a custom recipe I got as far as posting the feeds and getting the print version, but I'm having trouble cleaning up the extra links at the bottom of each article. At the end of the article the HTML file looks like:
</p>
</div>
<div class="print-logo"></div>
<hr class="calibre3"/>
<div class="print-logo"></div>
<div class="print-logo">
<p class="calibre5"><a href="https://www.neodata.com/ITPS2.cgi?OrderType=Reply+Only&ItemCode=WSTD&amp;iResponse=WSTD.NEW">Subscribe now to The Weekly Standard!</a></p>
<p class="calibre5"><b class="calibre6">Get more from The Weekly Standard:</b> <a href="/feeds">Follow WeeklyStandard.com on RSS</a> and <a href="/newsletter/requestform.asp">sign-up for our free Newsletter.</a></p>
<p class="calibre5"><a href="/tws/advertising/default.asp">Contact our advertising team</a> for advertising and sponsorship on WeeklyStandard.com or in <b class="calibre6">The Weekly Standard.</b></p>
<p class="calibre5">Copyright 2010 Weekly Standard LLC.</p>
</div>
<hr class="calibre3"/>
<div class="print-logo"><strong class="calibre6">Source URL:</strong> <a href="http://www.theweeklystandard.com/blogs/obama-halts-nasas-constellation-program">http://www.theweeklystandard.com/blogs/obama-halts-nasas-constellation-program</a></div>
<div class="print-logo"></div>
<div class="navbar1">
<hr class="calibre3"/>
<p class="calibre7"> This article was downloaded by <b class="calibre6">calibre</b> from <a href="http://www.theweeklystandard.com/blogs/obama-halts-nasas-constellation-program">http://www.theweeklystandard.com/blogs/obama-halts-nasas-constellation-program</a> </p>
<br class="print-logo"/><br class="print-logo"/>
 | <a href="../index.html#article_0">Section menu</a> |
</div></body>
</html>
Can someone tell me how best to eliminate this?
Thanks, Denny |
#1343 | |
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
|
Just add this to your recipe
Quote:
|
|
#1344 |
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2008
Device: none
|
Hi everybody! I am new here, but I have been browsing this wonderful site for 2 days.
May I request help with a recipe for http://www.islamqa.com/en/rss.xml? Thank you once again. |
#1345 |
Junior Member
Posts: 9
Karma: 10
Join Date: Jan 2010
Device: Sony PRS-505
|
Hi,
I'm trying to make a recipe for http://szmobil.sueddeutsche.de/. This is the code so far (with which I get "IndexError: list index out of range", Error Code: 1). Am I on the right track with this? Can somebody please tell me what is wrong?
...
    def parse_index(self):
        feeds = []
        for title, url in [
            ('Politik', 'http://szmobil.sueddeutsche.de/show.php?section=Politik'),
            ('Seite Drei', 'http://szmobil.sueddeutsche.de/show.php?section=Seite+drei'),
            ('Meinungsseite', 'http://szmobil.sueddeutsche.de/show.php?section=Meinungsseite'),
            ('Panorama', 'http://szmobil.sueddeutsche.de/show.php?section=Panorama'),
            ('Feuilleton', 'http://szmobil.sueddeutsche.de/show.php?section=Feuilleton'),
            ('Medien', 'http://szmobil.sueddeutsche.de/show.php?section=Medien'),
            ('Wissen', 'http://szmobil.sueddeutsche.de/show.php?section=Wissen'),
            ('Wirtschaft', u'http://szmobil.sueddeutsche.de/show.php?section=Wirtschaft'),
            ('Sport', u'http://szmobil.sueddeutsche.de/show.php?section=Sport'),
            ('Muenchen-Bayern', u'http://szmobil.sueddeutsche.de/show.php?section=M%FCnchen%2FBayern'),
        ]:
            articles = self.nz_parse_section(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def nz_parse_section(self, url):
        soup = self.index_to_soup(url)
        # div = soup.find(attrs={'class': 'col-300 categoryList'})
        # date = div.find(attrs={'class': 'link-list-heading'})
        current_articles = []
        # for tag in date.findAllNext(attrs = {'class': ['linkList', 'link-list-heading']}):
        #     if tag.get('class') == 'link-list-heading':
        #         break
        for li in soup.findAll('li'):
            a = li.find('a', href=True)
            if a is None:
                continue
            title = self.tag_to_string(a)
            url = a.get('href', False)
            if not url or not title:
                continue
            # if url.startswith('/'):
            #     url = 'http://www.nzherald.co.nz'+url
            self.log('\t\tFound article:', title)
            self.log('\t\t\t', url)
            current_articles.append({'title': title, 'url': url, 'description': '', 'date': ''})
        return current_articles
|
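One thing that may be worth checking here, offered as a guess since the markup of the section pages isn't shown in the thread: with the URL fix-up commented out, any relative `href` found on the page is passed on as it is. Below is a minimal sketch of resolving links against the site with the standard library's `urljoin` (Python 3 import path). The helper name `absolute_url` and the sample hrefs are hypothetical, and nothing here is confirmed to be the cause of the IndexError.

```python
from urllib.parse import urljoin

# Base address taken from the post above.
BASE = 'http://szmobil.sueddeutsche.de/'


def absolute_url(href, base=BASE):
    """Resolve a possibly relative href against the site's base URL."""
    return urljoin(base, href)


# Hypothetical hrefs, for illustration only.
print(absolute_url('show.php?id=123'))
print(absolute_url('/show.php?section=Sport'))
print(absolute_url('http://example.com/page'))
```

In the recipe this would replace the commented-out `startswith('/')` branch, so that each article dict always carries a full URL.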
#1346 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Updated recipe for The New Republic
Fix for a stylesheet issue; add feed for The Book
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class The_New_Republic(BasicNewsRecipe):
    title = 'The New Republic'
    __author__ = 'cix3'
    language = 'en'
    description = 'Intelligent, stimulating and rigorous examination of American politics, foreign policy and culture'
    timefmt = ' [%b %d, %Y]'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True

    remove_tags = [
        dict(name='div', attrs={'class': ['print-logo', 'print-site_name', 'img-left', 'print-source_url']}),
        dict(name='hr', attrs={'class': 'print-hr'}),
        dict(name='img'),
    ]

    feeds = [
        ('Politics', 'http://www.tnr.com/rss/articles/Politics'),
        ('Books and Arts', 'http://www.tnr.com/rss/articles/Books-and-Arts'),
        ('Economy', 'http://www.tnr.com/rss/articles/Economy'),
        ('Environment and Energy', 'http://www.tnr.com/rss/articles/Environment-%2526-Energy'),
        ('Health Care', 'http://www.tnr.com/rss/articles/Health-Care'),
        ('Metro Policy', 'http://www.tnr.com/rss/articles/Metro-Policy'),
        ('World', 'http://www.tnr.com/rss/articles/World'),
        ('Film', 'http://www.tnr.com/rss/articles/Film'),
        ('Books', 'http://www.tnr.com/rss/articles/books'),
        ('The Book', 'http://www.tnr.com/rss/book'),
        ('Jonathan Chait', 'http://www.tnr.com/rss/blogs/Jonathan-Chait'),
        ('The Plank', 'http://www.tnr.com/rss/blogs/The-Plank'),
        ('The Treatment', 'http://www.tnr.com/rss/blogs/The-Treatment'),
        ('The Spine', 'http://www.tnr.com/rss/blogs/The-Spine'),
        ('The Vine', 'http://www.tnr.com/rss/blogs/The-Vine'),
        ('The Avenue', 'http://www.tnr.com/rss/blogs/The-Avenue'),
        ('William Galston', 'http://www.tnr.com/rss/blogs/William-Galston'),
        ('Simon Johnson', 'http://www.tnr.com/rss/blogs/Simon-Johnson'),
        ('Ed Kilgore', 'http://www.tnr.com/rss/blogs/Ed-Kilgore'),
        ('Damon Linker', 'http://www.tnr.com/rss/blogs/Damon-Linker'),
        ('John McWhorter', 'http://www.tnr.com/rss/blogs/John-McWhorter'),
    ]

    def print_version(self, url):
        return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')
|
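To see what the `print_version` rewrite in this recipe does, the same string replacement can be tried on its own. This is just a standalone sketch of that one line; the article URL below is hypothetical.

```python
def print_version(url):
    """Map an article URL to its print-friendly counterpart, as the recipe does."""
    return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')


# Hypothetical article URL, for illustration only.
article = 'http://www.tnr.com/article/politics/example-story'
print(print_version(article))
# prints http://www.tnr.com/print/article/politics/example-story
```

A URL that does not start with `http://www.tnr.com/` comes back unchanged, so links to other hosts are fetched in their normal form.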
#1347 |
Member
Posts: 23
Karma: 12
Join Date: Jan 2010
Location: Edinburgh, UK
Device: SONY PRS600, Apple iPhone 3G
|
Hi,
I haven't tried anything yet, but one thing I noticed is the comma after your last feed entry, which means that the array is expecting another entry and finds a blank instead. Code:
('Seite Drei', 'http://szmobil.sueddeutsche.de/show.php?section=Seite+drei'),
('Meinungsseite', 'http://szmobil.sueddeutsche.de/show.php?section=Meinungsseite'),
('Panorama', 'http://szmobil.sueddeutsche.de/show.php?section=Panorama'),
('Feuilleton', 'http://szmobil.sueddeutsche.de/show.php?section=Feuilleton'),
('Medien', 'http://szmobil.sueddeutsche.de/show.php?section=Medien'),
('Wissen', 'http://szmobil.sueddeutsche.de/show.php?section=Wissen'),
('Wirtschaft', u'http://szmobil.sueddeutsche.de/show.php?section=Wirtschaft'),
('Sport', u'http://szmobil.sueddeutsche.de/show.php?section=Sport'),
('Muenchen-Bayern', u'http://szmobil.sueddeutsche.de/show.php?section=M%FCnchen%2FBayern'),
Last edited by lorenzov; 02-03-2010 at 08:42 PM. |
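A quick check that may help narrow this down: as far as Python itself is concerned, a trailing comma after the last element of a list literal is legal and does not add an empty entry, so the IndexError may well come from somewhere else. A minimal sketch using two of the entries from the post (the variable name `sections` is made up for this example):

```python
# Note the comma after the last tuple: Python accepts it and ignores it.
sections = [
    ('Sport', u'http://szmobil.sueddeutsche.de/show.php?section=Sport'),
    ('Muenchen-Bayern', u'http://szmobil.sueddeutsche.de/show.php?section=M%FCnchen%2FBayern'),
]

print(len(sections))  # prints 2
for title, url in sections:
    print(title, url)
```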
#1348 |
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
|
May I request a recipe for http://sethgodin.typepad.com/?
|
#1349 |
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
|
Seth Godin's Blog
Very basic recipe. Feel free to enhance in any way...
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class SethGodin(BasicNewsRecipe):
    title = 'Seth Godins Blog'
    __author__ = 'cix3'
    language = 'en'
    description = 'Seth Godin - riffs on marketing, respect, and the ways ideas spread.'
    timefmt = ' [%b %d, %Y]'
    oldest_article = 30
    max_articles_per_feed = 100
    no_stylesheets = True

    # Drop embedded scripts from the downloaded pages.
    remove_tags = [dict(name='script')]

    feeds = [('SethGodin', 'http://feeds.feedburner.com/typepad/sethsmainblog')]
|
#1350 |
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2010
Device: Kindle
|
Digital Spy
Hi Guys,
I love the Digital Spy site, as it is a great source for anything to do with entertainment. It would be great to get it on my Kindle, but I don't have a clue how. I have looked at the recipe section and it may as well be written in Russian. The site is www.digitalspy.co.uk. Any help and direction on how to get this as a newspaper download would be appreciated. Regards, Martyn |