Old 07-02-2010, 05:08 AM   #2221
rty
Zealot
rty got an A in P-Chem.
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by elsuave View Post
Did a recipe for Foreign Policy (http://www.foreignpolicy.com/) ever come out? It's been mentioned a couple of times in this thread, with an unsuccessful attempt here: https://www.mobileread.com/forums/sho...&postcount=616.

If not, would anybody like to try their hand at it? RSS feed is available here: http://www.foreignpolicy.com/node/feed
Here it is: a recipe for FOREIGN POLICY.
Attached Files
File Type: zip Foreign Policy.zip (639 Bytes, 211 views)
Old 07-02-2010, 08:14 AM   #2222
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
The Economist seems to have added a "related items" box to its print view. It is annoying because it usually sits right in the middle of an article. I've amended the recipe with the following remove_tags setting, which strips the related-items div. I suggest adding this to the main distribution recipe, as I can't imagine those links are useful there.

remove_tags = [dict(name=['script', 'noscript', 'title', 'iframe', 'cf_floatingcontent']),
               dict(attrs={'class':['dblClkTrk', 'ec-article-info']}),
               dict(name='div', attrs={'class':'related-items'})]
Old 07-02-2010, 08:55 AM   #2223
elsuave
Member
elsuave began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Jun 2010
Device: Nook
Quote:
Originally Posted by rty View Post
Here it is: a recipe for FOREIGN POLICY.
Thank you, rty! It works like a charm.
Old 07-02-2010, 09:43 AM   #2224
rty
Zealot
rty got an A in P-Chem.
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by ehsahog View Post
Hi all,


Would it be possible for someone with knowledge to write a recipe for akihabara news (http://en.akihabaranews.com/feed)

Lots of thanks in advance!
/Anders
The articles on this site don't seem long enough to be worth writing a recipe for.

Last edited by rty; 07-03-2010 at 06:17 AM.
Old 07-02-2010, 11:42 AM   #2225
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
New (updated) recipe for Foreign Policy:
Attached Files
File Type: zip foreign_policy.zip (922 Bytes, 215 views)

Last edited by kiklop74; 07-02-2010 at 01:04 PM.
Old 07-02-2010, 12:23 PM   #2226
elsuave
Member
elsuave began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Jun 2010
Device: Nook
Quote:
Originally Posted by kiklop74 View Post
New recipe for Foreign Policy:
Kiklop: You appear to have included an .epub example, and not the recipe itself.
Old 07-02-2010, 12:43 PM   #2227
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@bobbysteel: The related links are useful on readers that support web browsing.
Old 07-02-2010, 01:05 PM   #2228
kiklop74
Guru
kiklop74 can program the VCR without an owner's manual.
 
Posts: 800
Karma: 194644
Join Date: Dec 2007
Location: Argentina
Device: Kindle Voyage
I updated the post. Sorry for the mistake.
Old 07-02-2010, 02:15 PM   #2229
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Quote:
Originally Posted by kovidgoyal View Post
@bobbysteel: The related links are useful on readers that support web browsing.
Fair enough, but they're extremely annoying and flow poorly on non-browsing readers. I use a Nook mostly in the subway and can't stand seeing these links. It seems like you should offer both options, really. I'm happy to use my custom recipe, of course.

Thanks again for all the work you've put into this, Kovid!
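
For anyone who wants "both options" in a custom recipe, here is a minimal sketch of one way to do it, not how the built-in recipe works. The helper name build_remove_tags and its keep_related_links flag are made up for illustration; the tag specs themselves are the ones from my earlier post.

```python
# Tag specs shared by both variants (taken from the remove_tags list above).
BASE_REMOVE_TAGS = [
    dict(name=['script', 'noscript', 'title', 'iframe', 'cf_floatingcontent']),
    dict(attrs={'class': ['dblClkTrk', 'ec-article-info']}),
]

# The extra spec that strips the "related items" box.
RELATED_ITEMS = dict(name='div', attrs={'class': 'related-items'})


def build_remove_tags(keep_related_links):
    """Return a remove_tags list; hypothetical helper, not part of calibre."""
    tags = list(BASE_REMOVE_TAGS)  # copy so the shared base list is never mutated
    if not keep_related_links:
        tags.append(RELATED_ITEMS)
    return tags


# For a reader without a browser, drop the related links:
remove_tags = build_remove_tags(keep_related_links=False)
```

In a recipe, the last line would sit in the class body where remove_tags is normally assigned. Flipping the flag to True keeps the links for readers that can browse.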
Old 07-02-2010, 04:03 PM   #2230
schnortz
Junior Member
schnortz began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Jul 2010
Device: Nook
Help - Two Issues

I have not had luck finding a solution in my searches (I'm probably not using the correct terminology).

I have successfully modified the Cincinnati Enquirer recipe as the basis of my Appleton Post Crescent recipe as they both used similar web templates. However, I am not having luck with the following...

1. Remove the Additional Information box that comes up after a couple of paragraphs of each article. I have tried
Quote:
preprocess_regexps = [
(re.compile(r'<p></p><div*.</div>', re.IGNORECASE | re.DOTALL), lambda match : r''),
]
without success.

2. Remove any RSS feeds that start with the word "Photo" or "Photos:"

Any guidance that you can give would be very helpful.
Old 07-02-2010, 05:06 PM   #2231
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by schnortz View Post
I am not having luck with the following...
I enjoy answering these little puzzles, but it's a lot easier if you provide a link to the page that you are having trouble with, and a copy of the recipe you're using.

Here you are asking why this doesn't match something. Usually, that would be impossible without a link to the "something," but I do see an error in this.
Quote:
1. Remove the Additional Information box that comes up after a couple of paragraphs of each article. I have tried
Code:
preprocess_regexps = [
(re.compile(r'<p></p><div*.</div>', re.IGNORECASE | re.DOTALL), lambda match : r''),
]
without success.

I assume you wanted to delete everything in the <div> tag, but you reversed the "everything." It should be ".*" not "*."
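
To see the difference between "*." and ".*", here is a small sketch you can run outside calibre. The HTML string is made up for illustration, since I don't have the real page. It also shows a side effect worth knowing: with re.DOTALL, a greedy ".*" runs to the last </div> on the page, so a non-greedy ".*?" is probably closer to what you want.

```python
import re

# Hypothetical page fragment; the real article markup may differ.
html = (
    '<p>Intro paragraph.</p>'
    '<p></p><div class="info">Additional Information</div>'
    '<p>Rest of the article.</p>'
    '<div class="footer">Footer</div>'
)

flags = re.IGNORECASE | re.DOTALL

# "<div*." means "<di", then zero or more "v", then exactly one character.
original = re.compile(r'<p></p><div*.</div>', flags)
# "<div.*" means "<div", then as many characters as possible (greedy).
greedy = re.compile(r'<p></p><div.*</div>', flags)
# "<div.*?" means "<div", then as few characters as possible (non-greedy).
lazy = re.compile(r'<p></p><div.*?</div>', flags)

original_result = original.sub('', html)  # pattern never matches this fragment
greedy_result = greedy.sub('', html)      # removes everything up to the last </div>
lazy_result = lazy.sub('', html)          # removes only the first <div>...</div>

print(original_result == html)
print(greedy_result)
print(lazy_result)
```

Even the non-greedy form stops at the first </div>, so it will cut too early if the box contains nested divs. If the box has a class or id, a remove_tags entry is likely the more robust route.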

Quote:
2. Remove any RSS feeds that start with the word "Photo" or "Photos:"

Any guidance that you can give would be very helpful.
I suspect you want to remove any articles that start with those words, not "feeds" - correct? You control the list of feeds.
For articles, I used to think that filter_regexps would do that job, but I never got it to work. Maybe it only works on recursed links, not the main article link.
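
One possible approach for dropping articles by title is to filter the parsed feeds before anything is downloaded. The sketch below is untested against your recipe and runs standalone: Article and Feed are stand-in classes, the helper name drop_photo_articles is made up, and the sample titles are invented. The assumption is that each parsed feed object has an articles list whose items have a title.

```python
import re

# Matches titles that begin with the word "Photo" or "Photos" (any case).
PHOTO_TITLE = re.compile(r'^\s*photos?\b', re.IGNORECASE)


class Article:
    """Stand-in for a parsed article; only the title is needed here."""
    def __init__(self, title):
        self.title = title


class Feed:
    """Stand-in for a parsed feed holding a list of articles."""
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


def drop_photo_articles(feeds):
    """Remove articles whose title starts with "Photo" or "Photos"."""
    for feed in feeds:
        feed.articles = [a for a in feed.articles
                         if not PHOTO_TITLE.match(a.title)]
    return feeds


# Invented sample data for a quick check.
feeds = drop_photo_articles([
    Feed('Local News', [
        Article('Photos: County fair opens'),
        Article('City council approves budget'),
        Article('Photo: Sunset over the Fox River'),
        Article('Photography class enrolls 40 students'),
    ]),
])
kept_titles = [a.title for a in feeds[0].articles]
print(kept_titles)

# In a recipe, the hook would probably be an overridden parse_feeds
# (assumption about how you wire it in):
#
#     def parse_feeds(self):
#         feeds = BasicNewsRecipe.parse_feeds(self)
#         return drop_photo_articles(feeds)
```

The \b in the pattern keeps titles such as "Photography class ..." from being dropped, since only the whole word "Photo" or "Photos" at the start counts.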
Old 07-02-2010, 07:29 PM   #2232
schnortz
Junior Member
schnortz began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Jul 2010
Device: Nook
Starson17... thanks for responding.

The recipe I am using is the following (modified with your suggested change, even if it was unsuccessful). Hope I'm not violating etiquette by posting the code.

Quote:
#!/usr/bin/env python
import string, re

__license__ = 'GPL v3'
__copyright__ = '2009 Kovid Goyal <kovid at kovidgoyal.net>'

from calibre.web.feeds.news import BasicNewsRecipe

class AppletonPostCrescent(BasicNewsRecipe):
    title = u'Appleton Post Crescent'
    oldest_article = 2
    language = 'en'

    __author__ = 'Joseph Kitzmiller and Sujata Raman'
    max_articles_per_feed = 25
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    encoding = 'cp1252'
    cover_url = u'http://www.postcrescent.com/ic/assets/frontpage.pdf'
    publisher = 'Appleton Post Crescent, Gannett'
    category = 'news, Appleton, Fox Cities, Wisconsin'

    extra_css = '''
        h1{font-family:Arial,Helvetica,sans-serif; font-size:large; color:#0E5398; }
        h2{color:#666666;}
        .blog_title{color:#4E0000; font-family:Georgia,"Times New Roman",Times,serif; font-size:large;}
        .sidebar-photo{font-family:Arial,Helvetica,sans-serif; color:#333333; font-size:30%;}
        .blog_post{font-family:Arial,Helvetica,sans-serif; color:#222222; font-size:xx-small;}
        .article-bodytext{font-family:Arial,Helvetica,sans-serif; font-size:xx-small; color:#222222;font-weight:normal;}
        .ratingbyline{font-family:Arial,Helvetica,sans-serif; color:#333333; font-size:50%;}
        .author{font-family:Arial,Helvetica,sans-serif; color:#777777; font-size:50%;}
        .date{font-family:Arial,Helvetica,sans-serif; color:#777777; font-size:50%;}
        .padding{font-family:Arial,Helvetica,sans-serif; font-size:70%; color:#222222; font-weight:normal;}
        '''

    preprocess_regexps = [
        (re.compile(r'<p></p><div.*</div>', re.IGNORECASE | re.DOTALL), lambda match : r''),
    ]

    keep_only_tags = [dict(name='div', attrs={'class':['padding','sidebar-photo']})]

    remove_tags = [ dict(name=['object','link','table','embed','script', 'noscript'])
                   ,dict(name='div',attrs={'id':["pluckcomments","StoryChat"]})
                   ,dict(name='div',attrs={'class':['article-tools',"padding article-sidebar",'articleflex-container','poster-container','newslist','footer-container','sidebar-related','sub']})
                   ,dict(name='p',attrs={'class':['posted','tags']})]

    feeds = [(u'Breaking News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSbreaking.pbs&mime=xml'),
             (u'Latest Headlines', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlatest.pbs&mime=xml'),
             (u'Local News', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSlocal.pbs&mime=xml'),
             (u'Sports', u'http://www.postcrescent.com/apps/pbcs.dll/misc?URL=/templates/RSSsports.pbs&mime=xml'),
             (u'Buzz Blog', u'http://sitelife.postcrescent.com/ver1.0/Blog/BlogRss?plckBlogId=Blog:9a8980f0-f726-439c-8c4e-1dc0f788941e'),
             (u'Weekend Blog', u'http://sitelife.postcrescent.com/ver1.0/Blog/BlogRss?plckBlogId=Blog:9dbf4deb-0468-41b7-a0c7-3a777c03d64c')]


    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        for item in soup.findAll(face=True):
            del item['face']
        return soup
And as requested... here is a link to an article that has the "Additional Information box"... http://www.postcrescent.com/article/...AA&located=rss

And yes, I meant articles. Here is their Local News RSS Feed... http://www.postcrescent.com/apps/pbc...l.pbs&mime=xml As of now, there are a couple of "Photos: ..." articles in it.

Thanks in advance.
Old 07-03-2010, 04:15 AM   #2233
rty
Zealot
rty got an A in P-Chem.
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Recipe for BBC Chinese

Language: Chinese (Simplified)
PS. Tested OK on B&N Nook.
Attached Files
File Type: zip BBC Chinese.zip (928 Bytes, 194 views)

Last edited by rty; 07-03-2010 at 06:18 AM.
Old 07-03-2010, 06:14 AM   #2234
rty
Zealot
rty got an A in P-Chem.
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by sibermage View Post
I did a search but couldn't find anything regarding the Sing Tao news site.
Is it possible to obtain the news articles for http://news.singtao.ca/toronto/

Thanks.
Here it is: Recipe for SINGTAO DAILY CANADA

Language: Chinese (Traditional)
Tested OK on B&N Nook e-reader.

Updated: Recipe updated to remove the hidden/bogus tab character that prevented the recipe from being imported into Calibre.
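
If a recipe refuses to import, a quick check like the one below can help spot stray tabs before uploading. It is only a sketch: the function name lines_with_tabs and the sample source are made up for illustration, and it flags any tab anywhere on a line, not just the one that caused this particular problem.

```python
def lines_with_tabs(source):
    """Return the 1-based numbers of lines that contain a tab character."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        if '\t' in line:
            flagged.append(number)
    return flagged


# Invented recipe fragment: line 3 is indented with a tab instead of spaces.
sample = (
    "class Demo(object):\n"
    "    title = 'ok'\n"
    "\tlanguage = 'zh'\n"
)

print(lines_with_tabs(sample))
```

To check a real file, read its text and pass it to the function, then replace the tabs on the reported lines with spaces.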
Attached Files
File Type: zip Singtao Daily.zip (1.3 KB, 203 views)

Last edited by rty; 07-04-2010 at 10:47 AM.
Old 07-03-2010, 10:32 AM   #2235
rty
Zealot
rty got an A in P-Chem.
 
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Recipe for China Economic Net Magazine

Tested OK on B&N Nook.
Language: Chinese (Simplified)
Attached Thumbnails
P1030912.jpg (57.5 KB, 247 views)
Attached Files
File Type: zip China Econmic Net Magzazine.zip (971 Bytes, 196 views)

Last edited by rty; 07-03-2010 at 11:08 AM.