
MobileRead Forums > E-Book Software > Calibre > Recipes

07-11-2010, 12:49 PM #2296
CaptainJSK
Junior Member
CaptainJSK began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Kindle DX
I am new to using calibre (and new to ebooks in general). I am trying to set up RSS feeds using calibre, and so far the experience has been great! I am trying to set up some blogs that I read often. If I just use the basic setup, these work pretty well, but there are a few things I would like to be different: 1) I would like them to be displayed in reverse order (i.e., older entries first) so that I can catch up on things I've missed, and 2) I would like to see the comments embedded in the feed (or at the very least, a link to the comments that I could then open in the browser). Does anyone have any idea how to do this? Thanks!
07-11-2010, 04:51 PM #2297
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New recipe for El País (printed edition). This one scrapes the daily website instead of using feeds.
Attached Files
File Type: zip elpais_impreso.zip (2.4 KB, 593 views)
07-11-2010, 09:00 PM #2298
vietchovui
Zealot
vietchovui will become famous soon enough
Posts: 109
Karma: 556
Join Date: Nov 2009
Location: SaiGon VietNam
Device: PRS T1, Kobo Forma 8G, Kobo Libra H2O
I found a problem with the Associated Press recipe. Can anyone fix it, please?
07-11-2010, 11:11 PM #2299
kkurzendoerfer
Junior Member
kkurzendoerfer began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jul 2010
Device: Kindle
Charlottesville Daily Progress

Hi - I'm new to Calibre and am looking for a recipe that would download articles from the local paper, The Daily Progress: http://www2.dailyprogress.com/. Any help would be appreciated.

Thanks!
07-12-2010, 09:12 AM #2300
einstuerzende
Junior Member
einstuerzende began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Jul 2010
Device: Kindle
Quote:
Would you like to take a look at the recipe code below? It pulls all the correct articles but for some reason, the 'remove_tags_after' doesn't work on this particular site. Basically you want to remove everything after the Division with id='toolbar_tb'
Thanks, rty! Yeah, happy to take a look at it. I'll play around and see if I can't get it to behave. Is there a restriction on how many feeds are put into these? I can get the toolbar bit to drop off, I may put up a Trad. version with a couple more sections included.
07-12-2010, 09:28 AM #2301
rty
Zealot
rty got an A in P-Chem.
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by einstuerzende View Post
Thanks, rty! Yeah, happy to take a look at it. I'll play around and see if I can't get it to behave. Is there a restriction on how many feeds are put into these? I can get the toolbar bit to drop off, I may put up a Trad. version with a couple more sections included.
You're welcome. AFAIK, there's no restriction on the number of feeds, but remember: the more feeds you add, the longer the download takes. Mixing it with the traditional version may not be a good idea because it may use a different character encoding from the simplified one, but anyway, have fun experimenting.
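rty's point about encodings is easy to see with a few lines of plain Python (a minimal sketch, not code from the thread). The same two characters, 中文, encode to different bytes in Big5 (the encoding the traditional-character recipes in this thread declare) and GB2312 (a common simplified-character encoding, used here only for comparison), so a recipe's `encoding` attribute has to match what the pages actually use.

```python
# Minimal sketch: identical characters, different bytes per encoding.
# A recipe's `encoding` attribute must match the pages it downloads.
text = '中文'  # "Chinese (language)": present in both Big5 and GB2312

big5_bytes = text.encode('big5')
gb2312_bytes = text.encode('gb2312')

# The byte sequences differ even though the characters are the same.
print(big5_bytes != gb2312_bytes)

# Decoding with the matching codec round-trips cleanly.
print(big5_bytes.decode('big5') == text)

# Decoding Big5 bytes as GB2312 does not give the original text back;
# errors='replace' keeps it from raising on bytes GB2312 cannot map.
misread = big5_bytes.decode('gb2312', errors='replace')
print(misread == text)
```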
07-12-2010, 09:41 AM #2302
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by CaptainJSK View Post
1) I would like them to be displayed in reverse order (i.e., older entries first) so that I can catch up on things I've missed
Search this thread for "reverse" and look at my GoComics recipe. It does a reverse of date order for comic strips with:
current_articles.reverse()
It requires that you build the article feed yourself before reversing it.
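A minimal sketch of the "build it yourself, then reverse it" approach, written as plain Python so it runs outside calibre. It assumes the usual calibre convention that `parse_index()` returns a list of `(feed_title, articles)` pairs, each article being a dict with `title`, `url`, `date` and `description` keys; the helper name `build_feed` and the sample entries are hypothetical.

```python
# Sketch: build a parse_index()-style feed yourself, then reverse it.
# build_feed and the sample entries are hypothetical illustrations.

def build_feed(feed_title, entries):
    """Turn (title, url, date) tuples that arrive newest first into one
    (feed_title, articles) pair whose articles run oldest first."""
    current_articles = []
    for title, url, date in entries:
        current_articles.append({
            'title': title,
            'url': url,
            'date': date,
            'description': '',
        })
    # Same step as in the GoComics recipe: flip the list in place.
    current_articles.reverse()
    return [(feed_title, current_articles)]


if __name__ == '__main__':
    # Hypothetical blog entries, newest first as most feeds list them.
    entries = [
        ('Third post', 'http://example.com/post-3', 'Jul 12, 2010'),
        ('Second post', 'http://example.com/post-2', 'Jul 11, 2010'),
        ('First post', 'http://example.com/post-1', 'Jul 10, 2010'),
    ]
    for feed_title, articles in build_feed('My Blog', entries):
        for article in articles:
            print(feed_title, '|', article['date'], '|', article['title'])
```

Inside a recipe, `parse_index(self)` would collect the entries (for example from a page fetched with `self.index_to_soup(url)`) and return `build_feed(...)`. Later calibre releases also have a `reverse_article_order` recipe attribute that reverses each RSS feed without a custom `parse_index`; check whether your version supports it.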

Quote:
2) I would like to see the comments embedded in feed (or at the very least, a link to the comments that I could then open up in the browser). Does anyone have any idea on how to do this?
What "comments" are you referring to?
Where do you want them embedded - in the page giving the list of feeds, in the page giving the article list for each feed or at the article level?
07-12-2010, 11:54 AM #2303
CaptainJSK
Junior Member
CaptainJSK began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Kindle DX
Quote:
Originally Posted by Starson17 View Post
Search this thread for "reverse" and look at my GoComics recipe. It does a reverse of date order for comic strips with:
current_articles.reverse()
It requires that you build the article feed yourself before reversing it.
Thanks! I'll take a look at it and see if I can make it work.



Quote:
Originally Posted by Starson17 View Post
What "comments" are you referring to?
Where do you want them embedded - in the page giving the list of feeds, in the page giving the article list for each feed or at the article level?
I mean the comments to the blog posting. They are not a part of the direct rss feed but ideally I would like to be able to read them along with the article (so article level I suppose).

Thanks for your help!
07-12-2010, 12:05 PM #2304
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by CaptainJSK View Post
Thanks! I'll take a look at it and see if I can make it work.
Feel free to ask questions. The normal method a recipe uses to get articles from the feed automatically parses the feed. For more positive control, you can use the parse_index method. You may want to read the links here.
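To illustrate what "more positive control" means, here is a hedged sketch of the data a `parse_index` method hands back to calibre, built from links you scraped yourself rather than from an RSS feed. The `(section, title, url)` input format and the `group_into_feeds` name are assumptions for illustration only.

```python
# Sketch: group scraped (section, title, url) tuples into the
# [(feed_title, [article_dict, ...]), ...] shape parse_index() returns.
# group_into_feeds and its input format are hypothetical.

def group_into_feeds(links, max_per_feed=100):
    # Plain dicts keep insertion order (Python 3.7+), so sections
    # come out in the order they first appear on the page.
    feeds = {}
    for section, title, url in links:
        articles = feeds.setdefault(section, [])
        # Cap each section, much as max_articles_per_feed caps RSS feeds.
        if len(articles) < max_per_feed:
            articles.append({'title': title, 'url': url,
                             'date': '', 'description': ''})
    return list(feeds.items())


if __name__ == '__main__':
    scraped = [
        ('News', 'Story one', 'http://example.com/news/1'),
        ('Opinion', 'Editorial', 'http://example.com/opinion/1'),
        ('News', 'Story two', 'http://example.com/news/2'),
    ]
    for feed_title, articles in group_into_feeds(scraped):
        print(feed_title, [a['title'] for a in articles])
```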

Quote:
I mean the comments to the blog posting. They are not a part of the direct rss feed but ideally I would like to be able to read them along with the article (so article level I suppose).

Thanks for your help!
Hmmm. Blog comments are usually at the bottom of the page the RSS feed points to. Most of the time, the problem for a recipe is how to get rid of the comments, not how to add them.

Are you stripping them by accident, or are they not there? Some recipes use the print version of the articles, and they usually do not have comments, so if you are using the print version, you may need to switch.
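One quick way to answer that question is to save an article page (and its print version, if the recipe uses one) and check whether it contains a comments block at all. The sketch below assumes the blog marks comments with an `id` or `class` containing "comment", which is common in blog templates but not universal; `find_comment_blocks` is a hypothetical helper, not part of calibre.

```python
from html.parser import HTMLParser


class CommentBlockFinder(HTMLParser):
    """Collects start tags whose id or class mentions 'comment'
    (a naming heuristic, not a rule every blog follows)."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for key in ('id', 'class'):
            value = attrs.get(key) or ''  # valueless attributes give None
            if 'comment' in value.lower():
                self.matches.append((tag, key, value))
                break


def find_comment_blocks(html):
    finder = CommentBlockFinder()
    finder.feed(html)
    finder.close()
    return finder.matches


if __name__ == '__main__':
    page = '<div id="post"><p>Body</p></div><ol class="commentlist"><li>Hi</li></ol>'
    print(find_comment_blocks(page))
```

If the full page has a match but the print version does not, drop `print_version` from the recipe; if both have it, check that `keep_only_tags` includes that element and `remove_tags` does not strip it.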
07-12-2010, 01:02 PM #2305
einstuerzende
Junior Member
einstuerzende began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Jul 2010
Device: Kindle
Quote:
Originally Posted by rty View Post
Mixing it with traditional version may not be a good idea because it may use a different character encoding from the simplified one but anyway, have fun experimenting.
Oh, I didn't mix them at all; I'm just getting it to pull the Traditional feeds (I can't read Simplified), but I've had the same problem with it not cutting those tags.

Anybody have an idea? The only thing I can figure is that the combined style and div_id is screwing it up:

Code:
<div id="toolbar_tb" style='padding:4px 0px 25px 15px;background: url(../../pictures/format/dot3.gif) repeat-x bottom; margin-top:10px;'>
Would this trip up remove_tags_after toolbar_tb?
07-12-2010, 01:21 PM #2306
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by einstuerzende View Post
Anybody have an idea? The only thing I can figure is that the combined style and div_id is screwing it up:

Code:
<div id="toolbar_tb" style='padding:4px 0px 25px 15px;background: url(../../pictures/format/dot3.gif) repeat-x bottom; margin-top:10px;'>
Would this trip up remove_tags_after toolbar_tb?
No. It shouldn't be a problem. Are you sure that whatever you think is "after" is really "after" that tag? Try printing the soup in your preprocess_html and actually look. Sometimes the order of the tags as seen in Firefox (or whatever you use to check the tag order) is not the same as what your recipe actually sees when it runs. I've had a lot of trouble with remove_tags_before and remove_tags_after.
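Printing the soup from `preprocess_html` is the most direct check. As a complement, this stdlib-only sketch lists the tags that appear after a given `id` in a saved copy of the page, so you can compare that with what you expect `remove_tags_after = dict(name='div', attrs={'id':'toolbar_tb'})` to cut. It uses Python's `html.parser`, not the BeautifulSoup parser calibre recipes work with, so treat it as a rough guide; `tags_after` is a hypothetical helper, and its output also includes the marker tag's own children.

```python
from html.parser import HTMLParser


class TagOrderDumper(HTMLParser):
    """Records every start tag with its id and class, in document order."""

    def __init__(self):
        super().__init__()
        self.order = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        self.order.append((tag, attrs.get('id'), attrs.get('class')))


def tags_after(html, target_id):
    """Return the start tags that follow the first element with
    id == target_id, or None if no such element is found."""
    dumper = TagOrderDumper()
    dumper.feed(html)
    dumper.close()
    for i, (_tag, tag_id, _cls) in enumerate(dumper.order):
        if tag_id == target_id:
            return dumper.order[i + 1:]
    return None


if __name__ == '__main__':
    page = ('<div id="content"><p>Article text</p>'
            '<div id="toolbar_tb"></div><div class="related">More</div></div>')
    for entry in tags_after(page, 'toolbar_tb') or []:
        print(entry)
```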
07-12-2010, 02:54 PM #2307
Daffy6964
Junior Member
Daffy6964 began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Jan 2010
Location: Texas
Device: Sony PRS-300
The Daily Beast

A recipe request for "The Daily Beast" if possible.
http://www.thedailybeast.com/

Thank you very much.
07-12-2010, 04:58 PM #2308
einstuerzende
Junior Member
einstuerzende began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Jul 2010
Device: Kindle
Here's a first attempt at a recipe for Taiwan's Apple Daily News; rty and others, feel free to add comments, clean it up, etc.

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1277443634(BasicNewsRecipe):
    title = u'蘋果日報'  # Apple Daily
    oldest_article = 7
    max_articles_per_feed = 100

    feeds = [
        (u'\u982D\u689D', u'http://tw.nextmedia.com/rss/create/type/1077'),  # Headlines
        (u'\u8981\u805E', u'http://tw.nextmedia.com/rss/create/type/11'),  # Top news
        (u'\u653F\u6CBB', u'http://tw.nextmedia.com/rss/create/type/151'),  # Politics
        (u'\u793E\u6703', u'http://tw.nextmedia.com/rss/create/type/1066'),  # Society
        (u'\u751F\u6D3B', u'http://tw.nextmedia.com/rss/create/type/2724'),  # Life
        (u'\u5730\u65B9\u7D9C\u5408', u'http://tw.nextmedia.com/rss/create/type/1076'),  # Local news
        (u'\u6696\u6D41', u'http://tw.nextmedia.com/rss/create/type/9499'),  # Warm current
        (u'\u6295\u8A34', u'http://tw.nextmedia.com/rss/create/type/16287'),  # Complaints
        (u'\u8AD6\u58C7', u'http://tw.nextmedia.com/rss/create/type/824711'),  # Forum
    ]
    # @font-face takes a single family name; fallbacks go on the rules that use it.
    extra_css = '''
        @font-face {font-family: "DroidFont"; src: url(res:///system/fonts/DroidSansFallback.ttf); }
        body {margin-right: 8pt; font-family: 'DroidFont', serif;}
        h1 {font-family: 'DroidFont', serif;}
        .articledescription {font-family: 'DroidFont', serif;}
    '''
    __author__ = 'einstuerzende'
    __version__ = '1.0'
    language = 'zh-HANT'
    publisher = 'Next Media'
    description = 'Apple Daily (Taiwan)'
    category = 'News, Chinese'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'UTF-8'
    conversion_options = {'linearize_tables':True}
    masthead_url = 'http://tw.img.nextmedia.com/www/images/atnextheader_logo_appledaily.gif'
    keep_only_tags = [dict(name='div', attrs={'id':['article_left']})]
    remove_tags = [
        dict(name='div', attrs={'id':['articleTools','articleTools2','pagebar','articleIntroPhoto']}),
        dict(name='div', attrs={'class':'gotoFeedback'}),
        dict(name='span', attrs={'class':'zoom'}),
    ]


Quote:
Originally Posted by Starson17 View Post
No. It shouldn't be a problem. Are you sure that whatever you think is "after" is really "after" that tag? Try printing the soup in your preprocess_html and actually look. Sometimes the order of the tags as seen in Firefox (or whatever you use to check the tag order) is not the same as what your recipe actually sees when it runs. I've had a lot of trouble with remove_tags_before and remove_tags_after.
Thanks for the tip; I'll see tonight if I can get something more out of playing with the command line. If anybody can help with the Chinese WSJ recipe, I'm dying to know what I'm missing. I'll keep working on it, but even with a recipe this simple, it seems to be pulling in the whole page:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1278740771(BasicNewsRecipe):
    title = u'WSJ 華爾街日報'  # WSJ (Wall Street Journal)
    __author__ = 'x'
    oldest_article = 14
    max_articles_per_feed = 2
    timefmt = ' [%Y %b %d]'
    # Commented-out feeds end with commas so any of them can be re-enabled safely.
    feeds = [
        #(u'\u8981\u805E', u'http://chinese.wsj.com/big5/rss01.xml'),  # Top news
        (u'\u7279\u5BEB', u'http://chinese.wsj.com/big5/rss02.xml'),  # Features
        #(u'\u4E2D\u6E2F\u53F0', u'http://chinese.wsj.com/big5/rssbch.xml'),  # China, Hong Kong, Taiwan
        #(u'\u570B\u969B\u8CA1\u7D93', u'http://chinese.wsj.com/big5/rssglobal.xml'),  # International finance
        #(u'\u4E2D\u570B\u80A1\u5E02', u'http://chinese.wsj.com/big5/rsschinastock.xml'),  # China stock market
        #(u'\u9999\u6E2F\u80A1\u5E02', u'http://chinese.wsj.com/big5/rssHKstock.xml'),  # Hong Kong stock market
        #(u'\u5916\u532F\u5E02\u5834', u'http://chinese.wsj.com/big5/rssforex.xml'),  # Forex market
        #(u'\u5168\u7403\u91D1\u878D\u5E02\u5834', u'http://chinese.wsj.com/big5/rssmarkets.xml'),  # Global financial markets
        #(u'\u79D1\u6280', u'http://chinese.wsj.com/big5/rsstech.xml'),  # Technology
        #(u'\u80FD\u6E90\u8207\u6C7D\u8ECA', u'http://chinese.wsj.com/big5/rssautoene.xml'),  # Energy and autos
    ]
    language = 'zh-cn'
    publisher = 'Dow Jones & Company, Inc.'
    description = 'Wall Street Journal - Chinese edition'
    category = 'News, Business'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'big5'
    #conversion_options = {'linearize_tables':True}

    extra_css = '''
        @font-face {font-family: "DroidFont"; src: url(res:///system/fonts/DroidSansFallback.ttf); }
        body {margin-right: 8pt; font-family: 'DroidFont', serif;}
        .left_content {font-family: 'DroidFont', serif, sans-serif}
    '''

    keep_only_tags = [dict(name='div', attrs={'id':['headline','A']}),]

    def preprocess_html(self, soup):
        # Strip inline styles and fixed widths so the reader's own layout applies.
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(width=True):
            del item['width']
        return soup
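When a recipe seems to pull in the whole page, one thing worth ruling out is that the ids named in `keep_only_tags` (here `headline` and `A`) never appear in the pages calibre actually downloads. A minimal stdlib sketch, assuming you have saved an article page locally; `missing_ids` is a hypothetical helper, and since these pages are Big5, read the saved file with `encoding='big5'`.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Records every id attribute value seen in a page."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        tag_id = dict(attrs).get('id')
        if tag_id:
            self.ids.add(tag_id)


def missing_ids(html, wanted_ids):
    """Return the wanted ids (e.g. those used in keep_only_tags)
    that do not occur anywhere in the given HTML."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return [i for i in wanted_ids if i not in collector.ids]


if __name__ == '__main__':
    # Hypothetical usage with a locally saved article page:
    # with open('article.html', encoding='big5', errors='replace') as f:
    #     print(missing_ids(f.read(), ['headline', 'A']))
    print(missing_ids('<div id="headline">Title</div>', ['headline', 'A']))
```

An empty result means every id is present, so the cause lies elsewhere.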
07-12-2010, 11:19 PM #2309
einstuerzende
Junior Member
einstuerzende began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Jul 2010
Device: Kindle
Two more Taiwan papers, in case anyone's interested (drafts; questions, comments, corrections encouraged):

China Times:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1277443634(BasicNewsRecipe):
    title = u'中時電子報'  # China Times (online edition)
    oldest_article = 7
    max_articles_per_feed = 100

    feeds = [
        (u'焦點', u'http://rss.chinatimes.com/rss/focus-u.rss'),  # Focus
        (u'政治', u'http://rss.chinatimes.com/rss/Politic-u.rss'),  # Politics
        #(u'社會', u'http://rss.chinatimes.com/rss/social-u.rss'),  # Society
        (u'國際', u'http://rss.chinatimes.com/rss/international-u.rss'),  # International
        (u'兩岸', u'http://rss.chinatimes.com/rss/mainland-u.rss'),  # Cross-Strait
        #(u'地方', u'http://rss.chinatimes.com/rss/local-u.rss'),  # Local
        #(u'言論', u'http://rss.chinatimes.com/rss/comment-u.rss'),  # Opinion
        #(u'科技', u'http://rss.chinatimes.com/rss/technology-u.rss'),  # Technology
        #(u'運動', u'http://rss.chinatimes.com/rss/sport-u.rss'),  # Sports
        (u'藝文', u'http://rss.chinatimes.com/rss/philology-u.rss'),  # Arts and culture
        #(u'旺報', u'http://rss.chinatimes.com/rss/want-u.rss'),  # Want Daily
        (u'財經', u'http://rss.chinatimes.com/rss/finance-u.rss'),  # Finance
        (u'股市', u'http://rss.chinatimes.com/rss/stock-u.rss'),  # Stock market
    ]
    # @font-face takes a single family name; fallbacks go on the rules that use it.
    extra_css = '''
        @font-face {font-family: "DroidFont"; src: url(res:///system/fonts/DroidSansFallback.ttf); }
        body {margin-right: 8pt; font-family: 'DroidFont', serif;}
        h1 {font-family: 'DroidFont', serif;}
        .articledescription {font-family: 'DroidFont', serif;}
    '''
    __author__ = 'einstuerzende'
    __version__ = '1.0'
    language = 'zh-TW'
    publisher = 'China Times Group'
    description = 'China Times (Taiwan)'
    category = 'News, Chinese'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'big5'
    conversion_options = {'linearize_tables':True}
    masthead_url = 'http://www.fcuaa.org/gif/chinatimeslogo.gif'
    keep_only_tags = [dict(name='div', attrs={'class':['articlebox','articlebox clearfix']})]
    remove_tags = [dict(name='div', attrs={'class':['focus-news']})]


Liberty Times:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1277443634(BasicNewsRecipe):
    title = u'自由電子報'  # Liberty Times (online edition)
    oldest_article = 7
    max_articles_per_feed = 100

    feeds = [
        (u'焦點新聞', u'http://www.libertytimes.com.tw/rss/fo.xml'),  # Focus
        (u'政治新聞', u'http://www.libertytimes.com.tw/rss/p.xml'),  # Politics
        (u'生活新聞', u'http://www.libertytimes.com.tw/rss/life.xml'),  # Life
        (u'國際新聞', u'http://www.libertytimes.com.tw/rss/int.xml'),  # International
        (u'自由廣場', u'http://www.libertytimes.com.tw/rss/o.xml'),  # Liberty Square (opinion)
        #(u'社會新聞', u'http://www.libertytimes.com.tw/rss/so.xml'),  # Society
        #(u'體育新聞', u'http://www.libertytimes.com.tw/rss/sp.xml'),  # Sports
        (u'財經焦點', u'http://www.libertytimes.com.tw/rss/e.xml'),  # Business focus
        (u'證券理財', u'http://www.libertytimes.com.tw/rss/stock.xml'),  # Securities and investing
        #(u'影視焦點', u'http://www.libertytimes.com.tw/rss/show.xml'),  # Entertainment
        #(u'北部新聞', u'http://www.libertytimes.com.tw/rss/north.xml'),  # Northern Taiwan
        #(u'中部新聞', u'http://www.libertytimes.com.tw/rss/center.xml'),  # Central Taiwan
        #(u'南部新聞', u'http://www.libertytimes.com.tw/rss/south.xml'),  # Southern Taiwan
        #(u'大台北新聞', u'http://www.libertytimes.com.tw/rss/taipei.xml'),  # Greater Taipei
        (u'藝術文化', u'http://www.libertytimes.com.tw/rss/art.xml'),  # Arts and culture
    ]
    # @font-face takes a single family name; fallbacks go on the rules that use it.
    extra_css = '''
        @font-face {font-family: "DroidFont"; src: url(res:///system/fonts/DroidSansFallback.ttf); }
        body {margin-right: 8pt; font-family: 'DroidFont', serif;}
        h1 {font-family: 'DroidFont', serif;}
        .articledescription {font-family: 'DroidFont', serif;}
    '''
    __author__ = 'einstuerzende'
    __version__ = '1.0'
    language = 'zh-HANT'
    publisher = 'Liberty Times Group'
    description = 'Liberty Times (Taiwan)'
    category = 'News, Chinese'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'big5'
    conversion_options = {'linearize_tables':True}
    masthead_url = 'http://www.libertytimes.com.tw/2008/images/img_auto/005/logo_new.gif'
    keep_only_tags = [dict(name='td', attrs={'id':['newsContent']})]


I'm commenting out feeds I think might be of less interest, but including all that seem reasonable. I'll see if I can't get a United Daily News recipe soon (about 8 hojillion RSS feeds on that site).
07-13-2010, 12:09 AM #2310
alan_in_oz
Junior Member
alan_in_oz began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jul 2010
Device: Kindle
Hi all,

I've had my Kindle for 3 days and have been using Calibre for 2 days.

I haven't quite got my head around the recipe coding yet and was wondering if it is possible to fetch/convert the text-only version of the Sydney Morning Herald found at www.smh.com.au/text?

I find the current SMH recipe doesn't provide everything I would like to read, e.g. the Editorial, Letters and all the Columnists.

Any suggestions and help would be appreciated.
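Working from a text-only index page instead of a feed mostly comes down to collecting the article links on that page and handing them to calibre from `parse_index`. This is a stdlib-only sketch under assumptions: the SMH text page's real layout is not shown in this thread, so `collect_links`, its `keep` filter and any URL substrings you pass to it are hypothetical and need checking against the actual page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (link text, absolute URL) pairs from an index page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ' '.join(''.join(self._text).split())  # collapse whitespace
            if title:
                self.links.append((title, self._href))
            self._href = None


def collect_links(html, base_url, keep=None):
    """Return (title, url) pairs; if keep is given, only URLs containing
    one of those (hypothetical) substrings are returned."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    links = collector.links
    if keep:
        links = [(t, u) for t, u in links if any(k in u for k in keep)]
    return links
```

The resulting pairs can be turned into article dicts (`title`, `url`, `date`, `description`) and returned from `parse_index` as a list of `(feed_title, articles)` pairs.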
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.