Custom recipes (archive, read-only) - Page 153

jasonfedelem · 07-09-2010, 09:51 AM

Quote:

Originally Posted by jasonfedelem

That's weird. I'm looking under Fetch News under both "Austin" and "The Austin" but it doesn't show up...

Is there some extended library of recipes? I'm running v0.7.7 and have 237 recipes listed under "English".

Never mind! I'm blind.... it was listed under just "Statesman".

I see that you are the author of it. Have you considered changing the listed name to "Austin American-Statesman"? I think that's what most people would look for...

einstuerzende · 07-09-2010, 10:34 AM

Quote:

Originally Posted by rty

Either I am getting senile today, or cn.wsj.com has some kind of protection in place that blocks Calibre's basic method of scraping from the RSS page. We'll have to wait for experts to help.

Yeah, I think the dates on the RSS feed were wonky. Nothing got picked up the first time I put it through, only once it was allowed to pull older articles did it get anything. But even then it was pretty ugly (my python kung-fu is hella weak).

Experts?

bhandarisaurabh · 07-09-2010, 09:25 PM

CAN SOMEONE MAKE A RECIPE FOR INDIA KNOWLEDGE@WHARTON
http://knowledge.wharton.upenn.edu/india/rss/

AND FOR THE FINANCIAL EXPRESS PRINT EDITION, USING THIS LINK INSTEAD OF FEEDS
http://www.financialexpress.com/print/

rty · 07-10-2010, 02:01 AM

Quote:

Originally Posted by einstuerzende

Yeah, I think the dates on the RSS feed were wonky.

Bingo! You got it! The wonky dates!

Just insert this magic line in your recipe and it should work!

timefmt = ' [%Y %b %d]'

I'll make the recipe for you.
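For anyone wondering what that line does: timefmt is a strftime-style format string, and as far as I understand it calibre uses it to format the date shown with the title on the first page. The sketch below only illustrates the formatting with the standard library. The title 'WSJ Chinese' and the helper name render_title are made up for the example and are not part of calibre.

```python
import time

# The format string from the post above: year, abbreviated month, day.
TIMEFMT = ' [%Y %b %d]'


def render_title(title, when):
    """Append the formatted date to a title (hypothetical helper)."""
    return title + time.strftime(TIMEFMT, when)


# A fixed date, so the result does not depend on when this is run.
sample_day = time.strptime('2010-07-10', '%Y-%m-%d')
print(render_title('WSJ Chinese', sample_day))
```

Note that %b is locale dependent, so the month abbreviation can differ on a machine that has set a non-English locale.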

capidamonte · 07-10-2010, 03:19 AM

Could I request a recipe for the Calibre User Manual?

Perhaps it could test for changes so the server doesn't get bombed...

Thanks!

DoctorOhh · 07-10-2010, 04:08 AM

Quote:

Originally Posted by capidamonte

Could I request a recipe for the Calibre User Manual?

Perhaps it could test for changes so the server doesn't get bombed...

Thanks!

This isn't a recipe but from the online user manual is the following.

Quote:

An e-book version of this User Manual is available in EPUB format. Because the User Manual uses advanced formatting, it is only suitable for use with the calibre e-book viewer.

capidamonte · 07-10-2010, 04:41 AM

Yeah, I just went looking for that, and couldn't see it. I'm sure it was there, b/c you found it, but for me it was invisible. Even though I'd seen it before.

Assuming it's updated when the webpage is updated, maybe I could write a script to download it and import it into Calibre using the command-line tools.

Still, it'd be elegant if Calibre provided its own updates via its News system, wouldn't it?
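A rough sketch of that script idea, under stated assumptions: MANUAL_URL is a placeholder (the real address of the EPUB is not given in this thread), and the server is assumed to send an ETag header, which it may not. The conditional request means an unchanged file costs the server only a 304 response. calibredb add is calibre's command-line import; the function here only builds the command and does not run it.

```python
import urllib.error
import urllib.request

# Placeholder address, NOT the real location of the manual.
MANUAL_URL = 'http://example.invalid/calibre-manual.epub'


def build_request(url, etag=None):
    """Build a GET request that asks for the file only if it changed."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header('If-None-Match', etag)
    return req


def fetch_if_changed(url, etag=None):
    """Return (data, new_etag); data is None when the server answers 304."""
    try:
        with urllib.request.urlopen(build_request(url, etag)) as resp:
            return resp.read(), resp.headers.get('ETag')
    except urllib.error.HTTPError as err:
        if err.code == 304:  # not modified since the last download
            return None, etag
        raise


def import_command(epub_path):
    """Command line that would add the downloaded file to the library."""
    return ['calibredb', 'add', epub_path]
```

The caller would store the returned ETag between runs (a small text file is enough) and pass import_command's result to subprocess only when new data came back.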

Dereks · 07-10-2010, 06:42 AM

I saw there was a discussion about problems with the Google Reader recipe a while ago. Was it solved?

rty · 07-10-2010, 07:07 AM

Quote:

Originally Posted by einstuerzende

rty,

I've been fumbling around with making a recipe for cn.wsj.com without an awful lot of success. If you have time and are taking any requests, I'd appreciate whatever help you could give. I'm trying to get the Traditional character edition, which I think means throwing "big5" in front of everything (ex: http://cn.wsj.com/big5/20100708/FRX003561.asp)

http://chinese.wsj.com/gb/rss01.xml

Would you like to take a look at the recipe code below? It pulls all the correct articles but for some reason, the 'remove_tags_after' doesn't work on this particular site. Basically you want to remove everything after the Division with id='toolbar_tb'

Spoiler:
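Until someone works out why remove_tags_after fails here, the intended effect can be shown outside calibre. This is a standalone sketch using only Python's html.parser, not calibre's own code: it copies the page up to and including the element with id 'toolbar_tb' and drops everything after it. The class and function names are invented for the example. Comments in the page are dropped, and ancestor tags left open at the cut are not closed, so a lenient parser is assumed downstream. Inside a recipe the same idea would more likely live in preprocess_html and work on the soup.

```python
from html.parser import HTMLParser


class TruncateAfterId(HTMLParser):
    """Re-emit HTML up to and including the element with the given id."""

    def __init__(self, marker_id):
        HTMLParser.__init__(self, convert_charrefs=False)
        self.marker_id = marker_id
        self.out = []
        self.marker_tag = None  # tag name of the marker element, once seen
        self.depth = 0          # open tags of that name inside the marker
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if self.marker_tag is None:
            if dict(attrs).get('id') == self.marker_id:
                self.marker_tag = tag
                self.depth = 1
        elif tag == self.marker_tag:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done:
            return
        self.out.append('</%s>' % tag)
        if tag == self.marker_tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # marker closed: ignore the rest

    def handle_data(self, data):
        if not self.done:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.done:
            self.out.append('&%s;' % name)

    def handle_charref(self, name):
        if not self.done:
            self.out.append('&#%s;' % name)


def truncate_after(html, marker_id):
    parser = TruncateAfterId(marker_id)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)
```

If the marker id never appears, the page comes back unchanged apart from dropped comments, which makes a wrong id easy to spot.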

bikecd · 07-10-2010, 07:12 AM

Hello!

Great program!!!! You all do great work! I do have one issue, however. I tried to create my own recipe for El Pais (a Spanish newspaper), since the recipes provided give the print version of the articles on the webpage. I tried to create a recipe to get the print version of the articles in the PRINT EDITION each morning, but to no avail. I failed miserably! Any help in creating a recipe for El Pais that gets only the articles in the daily PRINT EDITION??

THANKS!!!

sibermage · 07-10-2010, 07:48 AM

Quote:

Originally Posted by rty

Here it is: Recipe for SINGTAO DAILY CANADA

Language: Chinese (Traditional)
Tested OK on B&N Nook e-reader.

Updated: Recipe updated to remove the hidden/bogus tab character that prevented the recipe from being imported into Calibre.

Thanks RTY. That worked.

rty · 07-10-2010, 11:14 AM

Quote:

Originally Posted by jasonfedelem

I see that you are the author of it. Have you considered changing the listed name to "Austin American-Statesman"? I think that's what most people would look for...

Nah, if you go to the website, you can tell that the publisher doesn't seem to agree.

But here's another recipe you asked for: Waco Tribune.

mikegps1 · 07-10-2010, 11:18 AM

Since my last post a couple of days ago, I've tried to update the Times Online recipe as the paper now requires a subscription for access to newsfeeds.

My first attempt is below. I get errors in lines 41 and 43. Can anyone help, please?

BTW version 0.7.8 is great; calibre gets better all the time.

************************************************** ******
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2008-2009, Darko Miletic <darko.miletic at gmail.com>'
'''
timesonline.co.uk
'''
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag

class Timesonline(BasicNewsRecipe):
    title = 'The Times Online'
    __author__ = 'Darko Miletic and Sujata Raman'
    description = 'UK news'
    publisher = 'timesonline.co.uk'
    category = 'news, politics, UK'
    oldest_article = 2
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    simultaneous_downloads = 1
    encoding = 'ISO-8859-1'
    remove_javascript = True
    language = 'en_GB'
    recursions = 9
    # Makes calibre ask for the username and password used in get_browser
    needs_subscription = True
    # The URL must be a quoted string. It was shortened by the forum
    # software ("..."), so the full login address still has to be filled in.
    LOGIN = 'http://www.timesplus.co.uk/tto/news/...lightbox=false'

    keep_only_tags = [
        dict(name='div', attrs={'id': ['region-column1and2-layout2']}),
        {'class': ['subheading']},
        dict(name='div', attrs={'id': ['dynamic-image-holder']}),
        dict(name='div', attrs={'class': ['article-author']}),
        dict(name='div', attrs={'id': ['related-article-links']}),
    ]

    remove_tags = [
        dict(name=['embed', 'object', 'form', 'iframe']),
        dict(name='span', attrs={'class': 'float-left padding-left-8 padding-top-2'}),
        dict(name='div', attrs={'id': ['region-footer', 'region-column2-layout2', 'grid-column4', 'login-status', 'comment-sort-order']}),
        dict(name='div', attrs={'class': ['debate-quote-container', 'clear', 'your-comment', 'float-left related-attachements-container', 'float-left padding-bottom-5 padding-top-8', 'puff-top']}),
        dict(name='span', attrs={'id': ['comment-count']}),
        dict(name='ul', attrs={'id': 'read-all-comments'}),
        dict(name='a', attrs={'class': 'reg-bold'}),
    ]

    extra_css = '''
        .small{font-family :Arial,Helvetica,sans-serif; font-size:x-small;}
        .byline{font-family :Arial,Helvetica,sans-serif; font-size:x-small; background:#F8F1D8;}
        .color-666{font-family :Arial,Helvetica,sans-serif; font-size:x-small; color:#666666; }
        h1{font-family:Georgia,Times New Roman,Times,serif;font-size:large; }
        .color-999 {color:#999999;}
        .x-small {font-size:x-small;}
        #related-article-links{font-family :Arial,Helvetica,sans-serif; font-size:small;}
        h2{color:#333333;font-family :Georgia,Times New Roman,Times,serif; font-size:small;}
        p{font-family :Arial,Helvetica,sans-serif; font-size:small;}
    '''

    feeds = [
        (u'Top stories from Times Online', u'http://www.timesonline.co.uk/tol/feeds/rss/topstories.xml'),
        ('Latest Business News', 'http://www.timesonline.co.uk/tol/feeds/rss/business.xml'),
        ('Economics', 'http://www.timesonline.co.uk/tol/feeds/rss/economics.xml'),
        ('World News', 'http://www.timesonline.co.uk/tol/feeds/rss/worldnews.xml'),
        ('UK News', 'http://www.timesonline.co.uk/tol/feeds/rss/uknews.xml'),
        ('Travel News', 'http://www.timesonline.co.uk/tol/feeds/rss/travel.xml'),
        ('Sports News', 'http://www.timesonline.co.uk/tol/feeds/rss/sport.xml'),
        ('Film News', 'http://www.timesonline.co.uk/tol/feeds/rss/film.xml'),
        ('Tech news', 'http://www.timesonline.co.uk/tol/feeds/rss/tech.xml'),
        ('Literary Supplement', 'http://www.timesonline.co.uk/tol/feeds/rss/thetls.xml'),
    ]

    def get_cover_url(self):
        # Take the cover image from the newspapers index page, if present
        cover_url = None
        index = 'http://www.timesonline.co.uk/tol/newspapers/'
        soup = self.index_to_soup(index)
        link_item = soup.find(name='div', attrs={'class': "float-left margin-right-15"})
        if link_item:
            cover_url = link_item.img['src']
        return cover_url

    def get_article_url(self, article):
        return article.get('guid', None)

    def get_browser(self):
        # The base-class method needs the instance passed in explicitly
        br = BasicNewsRecipe.get_browser(self)
        if self.username is not None and self.password is not None:
            br.open(self.LOGIN)
            br.select_form(name='loginForm')
            br['username'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def preprocess_html(self, soup):
        # Declare language and charset in the page head
        soup.html['xml:lang'] = self.language
        soup.html['lang'] = self.language
        mlang = Tag(soup, 'meta', [("http-equiv", "Content-Language"), ("content", self.language)])
        mcharset = Tag(soup, 'meta', [("http-equiv", "Content-Type"), ("content", "text/html; charset=ISO-8859-1")])
        soup.head.insert(0, mlang)
        soup.head.insert(1, mcharset)
        return self.adeify_images(soup)

    def postprocess_html(self, soup, first):
        # Drop the pagination link text left over from multi-page articles
        for tag in soup.findAll(text=['Previous Page', 'Next Page']):
            tag.extract()
        return soup
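A general note on get_browser overrides like the one in this recipe: when the base class method is called through the class name, the instance has to be passed in by hand, otherwise Python raises a TypeError before any login code runs. The sketch below shows this with a stand-in class. FakeRecipe, Broken, Fixed and try_call are invented for the illustration and have nothing to do with calibre's real BasicNewsRecipe beyond the method name.

```python
class FakeRecipe(object):
    """Hypothetical stand-in for a recipe base class."""

    def get_browser(self):
        return {'logged_in': False}


class Broken(FakeRecipe):
    def get_browser(self):
        # No instance passed to the base-class method: TypeError.
        return FakeRecipe.get_browser()


class Fixed(FakeRecipe):
    def get_browser(self):
        # Passing self makes the call work; then do the login step.
        br = FakeRecipe.get_browser(self)
        br['logged_in'] = True
        return br


def try_call(recipe):
    """Return the browser, or the string 'TypeError' if the call fails."""
    try:
        return recipe.get_browser()
    except TypeError:
        return 'TypeError'


print(try_call(Broken()))
print(try_call(Fixed()))
```

Whether this matches the errors reported for lines 41 and 43 cannot be confirmed from the post, since the error text was not included.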

iLeaveYou · 07-11-2010, 06:43 AM

WOW!!!
0.7.8 is just great.
Now everybody could access a Romanian recipe.
I am asking again if somebody could do a recipe for this:
http://www.realitatea.net/rss.html
They probably have the best rss feeds for the best Romanian News.
Thank you.

Starson17 · 07-11-2010, 10:43 AM

Quote:

Originally Posted by Dereks

I saw there was a discussion about problems with the Google Reader recipe a while ago. Was it solved?

I solved it a few hours ago. It's still being tested - there's a dedicated thread with the fixed recipe, if you're interested.
