Old 02-03-2010, 06:35 AM   #1336
kiklop74
Guru
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New recipe for TidBITS:
Attached Files
File Type: zip tidbits.zip (1.8 KB, 205 views)
Old 02-03-2010, 06:52 AM   #1337
kiklop74
New recipe for Gizmodo:
Attached Files
File Type: zip gizmodo.zip (2.2 KB, 209 views)
Old 02-03-2010, 07:16 AM   #1338
kiklop74
New recipe for New Straits Times from Malaysia:
Attached Files
File Type: zip newstraitstimes.zip (1.4 KB, 197 views)
Old 02-03-2010, 08:04 AM   #1339
TBR
Junior Member
Posts: 3
Karma: 10
Join Date: Jan 2010
Device: Sony PRS-505, Asus eeePC 1000H
I'm still having trouble getting a recipe for

http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml

cleared of unnecessary clutter; I am still getting artifacts.
The modified basic news recipe works in principle and removes much of the clutter but still includes, among other things, a "ghost" of an ad:
Quote:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1264591440(BasicNewsRecipe):
    title = u'Bundeswehr'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_tags_after = dict(name='div', attrs={'id':'content'})
    remove_tags_before = dict(name='div', attrs={'id':'content'})
    feeds = [(u'Bundeswehr in AFP und AP', u'http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml')]
Could anyone jump in with advice?

I want to build a "filtered" recipe that scans several RSS feeds and drops every article that doesn't contain certain keywords, so that only matching news items end up in the generated e-book. The result would be an instant press review on a given theme, person, event, etc. Kovidgoyal has confirmed that this is possible with calibre:
Quote:
Originally Posted by kovidgoyal View Post
If you've seen http://bazaar.launchpad.net/~kovid/c.../feeds/news.py

there's not much more I can tell you. Basically, you can completely customize the news download process by overriding the methods of that class. So if you want to create a composite recipe you would create a parse_index method that will list all the current articles in your various news sources. Then you would override postprocess_html to check for the required keywords and, if absent, return None
but I'm afraid that this is currently beyond my programming/scripting skills. As this would be a rather extensive recipe, I'm hesitant to simply request it in this forum, but could someone post a recipe with a keyword filter so I can learn from the example?
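
A minimal sketch of the keyword check described in the quote above, written as plain Python so it can be tried outside calibre. It assumes each article is a dict with 'title' and 'description' keys, the same shape the parse_index code later in this thread builds. The helper name filter_by_keywords and the sample data are made up for illustration, and calling it from a parse_index override is an assumption, not tested calibre code.

```python
def filter_by_keywords(articles, keywords):
    """Return only the articles whose title or description mentions a keyword.

    Hypothetical helper, not part of calibre. Matching is a case-insensitive
    substring test: crude, but easy to reason about.
    """
    wanted = [k.lower() for k in keywords]
    kept = []
    for article in articles:
        text = (article.get('title', '') + ' ' + article.get('description', '')).lower()
        if any(k in text for k in wanted):
            kept.append(article)
    return kept


# Made-up sample data in the shape parse_index uses for each article.
sample = [
    {'title': 'Bundeswehr in Afghanistan', 'url': 'http://example.com/1', 'description': '', 'date': ''},
    {'title': 'Wetter am Wochenende', 'url': 'http://example.com/2', 'description': 'Regen', 'date': ''},
]
matches = filter_by_keywords(sample, ['bundeswehr'])
```

This only looks at the title and description that the index already provides. Filtering on the full article text would need the per-article hook mentioned in the quote.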
Old 02-03-2010, 09:32 AM   #1340
kiklop74
Quote:
Originally Posted by TBR View Post
I'm still having trouble getting a recipe for

http://p.yimg.com/bw/rss/nachrichten/bundeswehr.xml

cleared of unnecessary clutter; I am still getting artifacts.
The modified basic news recipe works in principle and removes much of the clutter but still includes, among other things, a "ghost" of an ad:

Could anyone jump in with advice?
This is what you should put in your recipe for complete cleanup:

Code:
    remove_attributes  = ['width','height']
    remove_tags_before = dict(name='h1')
    remove_tags_after  = dict(name='div',attrs={'class':'ynw-article-body mod'})
    remove_tags        = [
                            dict(attrs={'id':['ynw-image-video-inset','ynw-more-news']})
                           ,dict(attrs={'class':['ynw-utility']})
                         ]
Old 02-03-2010, 10:04 AM   #1341
kiklop74
New recipe for Read It Later website:
Attached Files
File Type: zip readitlater.zip (1.7 KB, 230 views)
Old 02-03-2010, 02:21 PM   #1342
Denny_
Member
Posts: 12
Karma: 42
Join Date: Jan 2010
Device: Kindle
While trying to create a custom recipe, I got as far as adding the feeds and getting the print version, but I'm having trouble cleaning up the extra links at the bottom of each article. At the end of the article the HTML file looks like:

</p>
</div>
<div class="print-logo"></div>
<hr class="calibre3"/>
<div class="print-logo"></div>
<div class="print-logo">
<p class="calibre5"><a href="https://www.neodata.com/ITPS2.cgi?OrderType=Reply+Only&amp;ItemCode=WSTD&amp;iResponse=WSTD.NEW">Subscribe now to The Weekly Standard!</a></p>
<p class="calibre5"><b class="calibre6">Get more from The Weekly Standard:</b> <a href="/feeds">Follow WeeklyStandard.com on RSS</a> and <a href="/newsletter/requestform.asp">sign-up for our free Newsletter.</a></p>
<p class="calibre5"><a href="/tws/advertising/default.asp">Contact our advertising team</a> for advertising and sponsorship on WeeklyStandard.com or in <b class="calibre6">The Weekly Standard.</b></p>
<p class="calibre5">Copyright 2010 Weekly Standard LLC.</p>
</div>
<hr class="calibre3"/>
<div class="print-logo"><strong class="calibre6">Source URL:</strong> <a href="http://www.theweeklystandard.com/blogs/obama-halts-nasas-constellation-program">http://www.theweeklystandard.com/blogs/obama-halts-nasas-constellation-program</a></div>
<div class="print-logo"></div>
<div class="navbar1">
<hr class="calibre3"/>
<p class="calibre7">
This article was downloaded by <b class="calibre6">calibre</b> from <a href="http://www.theweeklystandard.com/blogs/obama-halts-nasas-constellation-program">http://www.theweeklystandard.com/blogs/obama-halts-nasas-constellation-program</a>
</p>
<br class="print-logo"/><br class="print-logo"/>
| <a href="../index.html#article_0">Section menu</a>
|
</div></body>
</html>

Can someone tell me how best to eliminate this?

Thanks,

Denny
Old 02-03-2010, 02:31 PM   #1343
kiklop74
Just add this to your recipe

Quote:
keep_only_tags = [dict(attrs={'class':['print-title','print-subtitle','print-author','print-date-issue','print-content']})]
Old 02-03-2010, 03:11 PM   #1344
gafleh
Junior Member
Posts: 4
Karma: 10
Join Date: Dec 2008
Device: none
Hi everybody! I am new here, but I have spent the last two days browsing this wonderful site.

May I request help with a recipe for
http://www.islamqa.com/en/rss.xml

Thank you once again.
Old 02-03-2010, 04:50 PM   #1345
exdream
Junior Member
Posts: 9
Karma: 10
Join Date: Jan 2010
Device: Sony PRS-505
Hi

I'm trying to make a recipe for http://szmobil.sueddeutsche.de/. This is the code so far; running it gives "IndexError: list index out of range" (Error Code: 1). Am I on the right track? Can somebody please tell me what is wrong?
...

    def parse_index(self):
        feeds = []
        for title, url in [('Politik', 'http://szmobil.sueddeutsche.de/show.php?section=Politik'),
                           ('Seite Drei', 'http://szmobil.sueddeutsche.de/show.php?section=Seite+drei'),
                           ('Meinungsseite', 'http://szmobil.sueddeutsche.de/show.php?section=Meinungsseite'),
                           ('Panorama', 'http://szmobil.sueddeutsche.de/show.php?section=Panorama'),
                           ('Feuilleton', 'http://szmobil.sueddeutsche.de/show.php?section=Feuilleton'),
                           ('Medien', 'http://szmobil.sueddeutsche.de/show.php?section=Medien'),
                           ('Wissen', 'http://szmobil.sueddeutsche.de/show.php?section=Wissen'),
                           ('Wirtschaft', u'http://szmobil.sueddeutsche.de/show.php?section=Wirtschaft'),
                           ('Sport', u'http://szmobil.sueddeutsche.de/show.php?section=Sport'),
                           ('Muenchen-Bayern', u'http://szmobil.sueddeutsche.de/show.php?section=M%FCnchen%2FBayern'),
                          ]:
            articles = self.nz_parse_section(url)
            if articles:
                feeds.append((title, articles))
        return feeds

    def nz_parse_section(self, url):
        soup = self.index_to_soup(url)
        # div = soup.find(attrs={'class': 'col-300 categoryList'})
        # date = div.find(attrs={'class': 'link-list-heading'})

        current_articles = []
        # for tag in date.findAllNext(attrs = {'class': ['linkList', 'link-list-heading']}):
        #     if tag.get('class') == 'link-list-heading':
        #         break
        for li in soup.findAll('li'):
            a = li.find('a', href = True)
            if a is None:
                continue
            title = self.tag_to_string(a)
            url = a.get('href', False)
            if not url or not title:
                continue
            # if url.startswith('/'):
            #     url = 'http://www.nzherald.co.nz'+url
            self.log('\t\tFound article:', title)
            self.log('\t\t\t', url)
            current_articles.append({'title': title, 'url': url, 'description':'', 'date':''})

        return current_articles
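
One thing that may be worth ruling out, given the commented-out startswith('/') check: if the hrefs on the section pages are relative, the article URLs returned by nz_parse_section would not be fetchable as they stand. This is an assumption about the site, not a confirmed cause of the IndexError. A sketch using the standard library's urljoin follows. The helper name absolutize is hypothetical, and on the Python 2 that calibre used in 2010 the same function lived in the urlparse module.

```python
from urllib.parse import urljoin

# Site root taken from the recipe above.
BASE = 'http://szmobil.sueddeutsche.de/'

def absolutize(href, base=BASE):
    """Resolve a possibly relative href against the site root.

    Hypothetical helper: absolute URLs pass through unchanged.
    """
    return urljoin(base, href)

relative = absolutize('show.php?section=Politik')
absolute = absolutize('http://szmobil.sueddeutsche.de/show.php?section=Sport')
```

Calling this on the href before appending the article dict would make the commented-out prefix check unnecessary.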
Old 02-03-2010, 06:54 PM   #1346
cix3
Member
Posts: 14
Karma: 10
Join Date: Aug 2009
Device: Kindle 2
Updated recipe for The New Republic

Fix for a stylesheet issue; add feed for The Book

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class The_New_Republic(BasicNewsRecipe):
    title = 'The New Republic'
    __author__ = 'cix3'
    language = 'en'
    description = 'Intelligent, stimulating and rigorous examination of American politics, foreign policy and culture'
    timefmt = ' [%b %d, %Y]'

    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True

    remove_tags = [
            dict(name='div', attrs={'class':['print-logo', 'print-site_name', 'img-left', 'print-source_url']}),
            dict(name='hr', attrs={'class':'print-hr'}), dict(name='img')
            ]

    feeds = [
        ('Politics', 'http://www.tnr.com/rss/articles/Politics'),
        ('Books and Arts', 'http://www.tnr.com/rss/articles/Books-and-Arts'),
        ('Economy', 'http://www.tnr.com/rss/articles/Economy'),
        ('Environment and Energy', 'http://www.tnr.com/rss/articles/Environment-%2526-Energy'),
        ('Health Care', 'http://www.tnr.com/rss/articles/Health-Care'),
        ('Metro Policy', 'http://www.tnr.com/rss/articles/Metro-Policy'),
        ('World', 'http://www.tnr.com/rss/articles/World'),
        ('Film', 'http://www.tnr.com/rss/articles/Film'),
        ('Books', 'http://www.tnr.com/rss/articles/books'),
        ('The Book', 'http://www.tnr.com/rss/book'),
        ('Jonathan Chait', 'http://www.tnr.com/rss/blogs/Jonathan-Chait'),
        ('The Plank', 'http://www.tnr.com/rss/blogs/The-Plank'),
        ('The Treatment', 'http://www.tnr.com/rss/blogs/The-Treatment'),
        ('The Spine', 'http://www.tnr.com/rss/blogs/The-Spine'),
        ('The Vine', 'http://www.tnr.com/rss/blogs/The-Vine'),
        ('The Avenue', 'http://www.tnr.com/rss/blogs/The-Avenue'),
        ('William Galston', 'http://www.tnr.com/rss/blogs/William-Galston'),
        ('Simon Johnson', 'http://www.tnr.com/rss/blogs/Simon-Johnson'),
        ('Ed Kilgore', 'http://www.tnr.com/rss/blogs/Ed-Kilgore'),
        ('Damon Linker', 'http://www.tnr.com/rss/blogs/Damon-Linker'),
        ('John McWhorter', 'http://www.tnr.com/rss/blogs/John-McWhorter')
            ]

    def print_version(self, url):
        return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')
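
To see what the print_version hook above does to a feed URL, the same string replacement can be tried as a standalone function. The article path below is invented for illustration.

```python
def print_version(url):
    # Same replacement as in the recipe: insert 'print/' right after the site root.
    return url.replace('http://www.tnr.com/', 'http://www.tnr.com/print/')

example = print_version('http://www.tnr.com/article/politics/some-story')
other_site = print_version('http://example.com/article')
```

A URL that does not start with the tnr.com root comes back unchanged, so feed links that point at another host would not get a print version.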
Old 02-03-2010, 08:38 PM   #1347
lorenzov
Member
Posts: 23
Karma: 12
Join Date: Jan 2010
Location: Edinburgh, UK
Device: SONY PRS600, Apple iPhone 3G
Hi,
I haven't tried anything yet, but one thing I noticed is the comma after your last feed entry, which might mean the list is expecting another entry and finds a blank instead.


Code:
('Seite Drei', 'http://szmobil.sueddeutsche.de/show.php?section=Seite+drei'),
('Meinungsseite', 'http://szmobil.sueddeutsche.de/show.php?section=Meinungsseite'),
('Panorama', 'http://szmobil.sueddeutsche.de/show.php?section=Panorama'),
('Feuilleton', 'http://szmobil.sueddeutsche.de/show.php?section=Feuilleton'),
('Medien', 'http://szmobil.sueddeutsche.de/show.php?section=Medien'),
('Wissen', 'http://szmobil.sueddeutsche.de/show.php?section=Wissen'),
('Wirtschaft', u'http://szmobil.sueddeutsche.de/show.php?section=Wirtschaft'),
('Sport', u'http://szmobil.sueddeutsche.de/show.php?section=Sport'),
('Muenchen-Bayern', u'http://szmobil.sueddeutsche.de/show.php?section=M%FCnchen%2FBayern'),
It might point you in the right direction for tonight!
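
Before removing that comma, it may be worth checking how Python treats it: a trailing comma inside a list literal is legal and does not create an empty element, so it is unlikely to be what raises the IndexError. A quick check with the last two entries from the list:

```python
# The last two section entries from the recipe, with the trailing comma kept.
sections = [
    ('Sport', 'http://szmobil.sueddeutsche.de/show.php?section=Sport'),
    ('Muenchen-Bayern', 'http://szmobil.sueddeutsche.de/show.php?section=M%FCnchen%2FBayern'),
]

# The trailing comma adds no extra (blank) element.
count = len(sections)
last_title = sections[-1][0]
```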

Last edited by lorenzov; 02-03-2010 at 08:42 PM.
Old 02-03-2010, 09:05 PM   #1348
bhandarisaurabh
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
May I request a recipe for http://sethgodin.typepad.com/?
Old 02-03-2010, 10:43 PM   #1349
cix3
Seth Godin's Blog

Very basic recipe. Feel free to enhance in any way...

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class SethGodin(BasicNewsRecipe):
    title = 'Seth Godins Blog'
    __author__ = 'cix3'
    language = 'en'
    description = 'Seth Godin - riffs on marketing, respect, and the ways ideas spread.'
    timefmt = ' [%b %d, %Y]'

    oldest_article = 30
    max_articles_per_feed = 100
    no_stylesheets = True

    remove_tags = [dict(name='script')]
    feeds = [('SethGodin', 'http://feeds.feedburner.com/typepad/sethsmainblog')]
Old 02-04-2010, 07:32 AM   #1350
MartynM
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2010
Device: Kindle
Digital Spy

Hi Guys,

I love the Digital Spy site as it is a great source for anything to do with entertainment. It would be great to get it on my Kindle but I don't have a clue. I have looked at the recipe section and it may as well be written in Russian.

The site is www.digitalspy.co.uk. Any help and direction on how to get this as a newspaper download would be appreciated.

Regards,

Martyn