#2326
Zealot
Posts: 146
Karma: 189664
Join Date: Feb 2009
Device: Glo HD, Aura H20, PRS-T1
I would like a custom recipe to download print articles from thecolumbian.com. I tried to modify a recipe to add "?print" after each URL but failed. For any article you visit at thecolumbian.com, you simply append "?print" (without the quotation marks) to view the print edition. I would like a recipe for all of the RSS feeds on the site, if possible, using the print versions.
Example: take http://www.columbian.com/news/2010/j...fort-festival/ and just type ?print after the trailing slash to get the print edition: http://www.columbian.com/news/2010/j...estival/?print
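A sketch of the relevant hook, assuming a standard calibre recipe: BasicNewsRecipe subclasses can override print_version() to map each article URL to its print URL. It is written here as a plain function for illustration; in a real recipe it would be def print_version(self, url): on the recipe class.

```python
def print_version(url):
    # Map a thecolumbian.com article URL to its print edition
    # by appending the '?print' query string.
    if url.endswith('?print'):
        return url  # already the print edition
    return url + '?print'
```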
#2327
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Recipe for Technology Review: updated to remove the Flash (Macromedia) advertisement.

@Kovid: I have updated the recipe for Alternet as well, removing the "width" attribute so that it displays properly on reading devices. https://www.mobileread.com/forums/sho...postcount=2325

Last edited by rty; 07-17-2010 at 09:14 AM.
#2328
Junior Member
Posts: 5
Karma: 10
Join Date: Jul 2010
Location: Ankara, Turkey
Device: PRS-300
#2329
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
#2330
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2010
Device: iphone and stanza
Custom Recipe Request
I would like a recipe for The Tampa Tribune. I'm having a hard time following the instructions myself, so maybe one of you gurus can help me out. Thanks!
http://www.tampatrib.com/
#2331
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2010
Device: nook
Has anyone had a chance to look at relevantmagazine.com?
#2332
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
Quote:
Thanks in advance.
#2333
Junior Member
Posts: 5
Karma: 10
Join Date: Jul 2010
Device: Kindle DX
Hello!
I asked for this before; maybe I didn't ask nicely enough, or nobody was available (or able) to do it. Could somebody be so kind as to write a recipe for this: http://www.realitatea.net/rss.html ? They probably have the best RSS feeds for the best Romanian news. I would do it myself, but I was never any good at something this deep. Your support is greatly appreciated.
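No recipe was posted in reply here, but a minimal skeleton to start from might look like the following. The feed name and URL are placeholders (the real ones must be copied from http://www.realitatea.net/rss.html), and the import falls back to a stub so the class can be inspected outside calibre:

```python
# Minimal starting-point recipe for realitatea.net (a sketch, untested).
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    BasicNewsRecipe = object  # stub so this file can be examined outside calibre

class RealitateaRecipe(BasicNewsRecipe):
    title = u'Realitatea.net'
    language = 'ro'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    # Placeholder feed entry -- replace with the actual feed URLs
    # listed on http://www.realitatea.net/rss.html
    feeds = [(u'Stiri', u'http://www.realitatea.net/rss/placeholder.xml')]
```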
#2334
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Nook
UNCLE!!
OK, I have tried to figure out what the heck you guys are doing for other feeds and apply it to mine, but I ain't that smart!!
Here is my half-finished recipe, if someone would be so kind as to take a look and tell me how I can get this website minus all the crap!! I have the print pages, but I couldn't figure out how to do the find/replace to change two different parts of the URL. Thanks! Code:
class AdvancedUserRecipe1279635146(BasicNewsRecipe):
    title = u'EMS1'
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content = False
    no_stylesheets = True

    feeds = [(u'columnist', u'http://www.ems1.com/ems-rss-feeds/columnists.xml'),
             (u'topics', u'http://www.ems1.com/ems-rss-feeds/topics.xml'),
             (u'most popular', u'http://www.ems1.com/ems-rss-feeds/most-popular-articles.xml'),
             (u'EMS Tips', u'http://www.ems1.com/ems-rss-feeds/tips.xml'),
             (u'Daily news', u'http://www.ems1.com/ems-rss-feeds/news.xml')]

    def print_version(self, url):
        baseurl = url.rpartition('/?')[0]
        turl = baseurl.partition('/reviews/')[2]
        return 'http://www.ems1.com/print.asp?act=print&vid=' + turl
#2335
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Take one article, for example: http://www.ems1.com/fire-ems/articles/852270-EMT-with-firemans-key-accused-of-NY-sex-attacks/. The print version of this article is http://www.ems1.com/print.asp?act=print&vid=852270

Your base URL for the print version should therefore be 'http://www.ems1.com/print.asp?act=print&vid='. You need to append to this base URL the number found in the original article URL, i.e. 852270. To extract this number, split the URL using "/" and "-" as the delimiters.
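As a sketch of the extraction described above (not the exact code from the thread), the article id can be pulled out by splitting on "/", splitting each part again on "-", and keeping the first purely numeric token; the helper names here are illustrative:

```python
def extract_article_id(url):
    # For 'http://www.ems1.com/fire-ems/articles/852270-EMT-.../',
    # splitting on '/' yields a part '852270-EMT-...'; splitting that
    # on '-' makes the leading token the numeric article id.
    for part in url.split('/'):
        token = part.split('-')[0]
        if token.isdigit():
            return token
    return None

def print_url(url):
    # Append the extracted id to the print-version base URL.
    return 'http://www.ems1.com/print.asp?act=print&vid=' + extract_article_id(url)
```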
#2336
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2009
Device: Sony Reader PRS-700BC
Recipe for media.daum.net (Korean news portal)
I'm not sure if this thread is the right place to post my recipe, but here it is:
Code:
import re
from datetime import date, timedelta
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, NavigableString, Comment

class MediaDaumRecipe(BasicNewsRecipe):
    title = u'\uBBF8\uB514\uC5B4 \uB2E4\uC74C \uC624\uB298\uC758 \uC8FC\uC694 \uB274\uC2A4'
    language = 'ko'
    max_articles = 100
    timefmt = ''
    masthead_url = 'http://img-media.daum-img.net/2010ci/service_news.gif'
    cover_margins = (18, 18, 'grey99')
    no_stylesheets = True
    remove_tags_before = dict(id='GS_con')
    remove_tags_after = dict(id='GS_con')
    remove_tags = [dict(attrs={'class': ['bline', 'GS_vod']}),
                   dict(id=['GS_swf_poll', 'ad250']),
                   dict(name=['script', 'noscript', 'style', 'object'])]

    preprocess_regexps = [
        (re.compile(r'<\s+', re.DOTALL|re.IGNORECASE), lambda match: '< '),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*){3,}', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</div>', re.DOTALL|re.IGNORECASE), lambda match: '</div>'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</p>', re.DOTALL|re.IGNORECASE), lambda match: '</p>'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</td>', re.DOTALL|re.IGNORECASE), lambda match: '</td>'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</strong>', re.DOTALL|re.IGNORECASE), lambda match: '</strong>'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</b>', re.DOTALL|re.IGNORECASE), lambda match: '</b>'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</em>', re.DOTALL|re.IGNORECASE), lambda match: '</em>'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</i>', re.DOTALL|re.IGNORECASE), lambda match: '</i>'),
        (re.compile(u'\(\uB05D\)[ \t\r\n]*<br[^>]*>.*</div>', re.DOTALL|re.IGNORECASE), lambda match: '</div>'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*<div', re.DOTALL|re.IGNORECASE), lambda match: '<div'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*<p', re.DOTALL|re.IGNORECASE), lambda match: '<p'),
        (re.compile(r'(<br[^>]*>[ \t\r\n]*)*<table', re.DOTALL|re.IGNORECASE), lambda match: '<table'),
        (re.compile(r'<strong>(<br[^>]*>[ \t\r\n]*)*', re.DOTALL|re.IGNORECASE), lambda match: '<strong>'),
        (re.compile(r'<b>(<br[^>]*>[ \t\r\n]*)*', re.DOTALL|re.IGNORECASE), lambda match: '<b>'),
        (re.compile(r'<em>(<br[^>]*>[ \t\r\n]*)*', re.DOTALL|re.IGNORECASE), lambda match: '<em>'),
        (re.compile(r'<i>(<br[^>]*>[ \t\r\n]*)*', re.DOTALL|re.IGNORECASE), lambda match: '<i>'),
        (re.compile(u'(<br[^>]*>[ \t\r\n]*)*(\u25B6|\u25CF|\u261E|\u24D2|\(c\))*\[[^\]]*(\u24D2|\(c\)|\uAE30\uC0AC|\uC778\uAE30[^\]]*\uB274\uC2A4)[^\]]*\].*</div>', re.DOTALL|re.IGNORECASE), lambda match: '</div>'),
    ]

    def parse_index(self):
        today = date.today()
        articles = []
        articles = self.parse_list_page(articles, today)
        articles = self.parse_list_page(articles, today - timedelta(1))
        return [('\uBBF8\uB514\uC5B4 \uB2E4\uC74C \uC624\uB298\uC758 \uC8FC\uC694 \uB274\uC2A4', articles)]

    def parse_list_page(self, articles, date):
        if len(articles) >= self.max_articles:
            return articles
        for page in range(1, 10):
            soup = self.index_to_soup('http://media.daum.net/primary/total/list.html?cateid=100044&date=%(date)s&page=%(page)d'
                                      % {'date': date.strftime('%Y%m%d'), 'page': page})
            done = True
            for item in soup.findAll('dl'):
                dt = item.find('dt', {'class': 'tit'})
                dd = item.find('dd', {'class': 'txt'})
                if dt is None:
                    break
                a = dt.find('a', href=True)
                url = 'http://media.daum.net/primary/total/' + a['href']
                title = self.tag_to_string(dt)
                if dd is None:
                    description = ''
                else:
                    description = self.tag_to_string(dd)
                articles.append(dict(title=title, description=description, url=url, content=''))
                done = len(articles) >= self.max_articles
                if done:
                    break
            if done:
                break
        return articles

    def preprocess_html(self, soup):
        return self.strip_anchors(soup)

    def strip_anchors(self, soup):
        for para in soup.findAll(True):
            aTags = para.findAll('a')
            for a in aTags:
                if a.img is None:
                    a.replaceWith(a.renderContents().decode('utf-8', 'replace'))
        return soup

As a backup, I also uploaded this recipe to http://pastebin.com/mEptXLsN

Last edited by trustin; 07-22-2010 at 12:41 PM.
Reason: Fixed more bugs
#2337
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Code:
reverse_article_order = True
#2338
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Nook
uncle uncle
rty,
thanks for your help, but I'm still at a loss. I added the print-page lines and now get less output. I don't think I set the split up right (copied and pasted from the Tech Review recipe and altered). Code:
class AdvancedUserRecipe1279635146(BasicNewsRecipe):
    title = u'EMS1'
    oldest_article = 7
    max_articles_per_feed = 100
    use_embedded_content = False

    feeds = [(u'columnist', u'http://www.ems1.com/ems-rss-feeds/columnists.xml'),
             (u'topics', u'http://www.ems1.com/ems-rss-feeds/topics.xml'),
             (u'most popular', u'http://www.ems1.com/ems-rss-feeds/most-popular-articles.xml'),
             (u'EMS Tips', u'http://www.ems1.com/ems-rss-feeds/tips.xml'),
             (u'Daily news', u'http://www.ems1.com/ems-rss-feeds/news.xml')]

    def print_version(self, url):
        baseurl = 'http://www.ems1.com/print.asp?act=print&vid='
        split1 = string.split(url, "/")
        xxx = split1[4]
        split2 = string.split(xxx, "-")
        s = baseurl + split2[0]
        return s
#2339
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
You can fix that with: import string from calibre.web.feeds.news import BasicNewsRecipe Next, your xxx=split1 [4] is wrong. Worse, it sometimes should be xxx=split1[5] and other times should be xxx=split1[6] You need to test the result of the split2 to see if it's an integer. There's lots of ways to do it. I used a try/except and integer conversion. I also changed the split, so the import of string is not needed, but I left it in, in case you want to use it. Note that this only works if the number you need is in position 5 or 6. I didn't test all the recipe to see if it's ever in another location in the URL Try this: Spoiler:
Last edited by Starson17; 07-22-2010 at 04:21 PM.
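The spoiler's contents aren't reproduced in this archive, but based on the description above (split on "/", then check positions 5 and 6 for a leading integer via try/except), a sketch of such a print_version might look like this; it is written as a plain function for illustration, whereas the real recipe would define it as a method, def print_version(self, url):.

```python
def print_version(url):
    # Split the article URL on '/'; the article id is the leading
    # '-'-separated token of the part at position 5 or 6. Use an
    # integer conversion inside try/except to find which one it is.
    parts = url.split('/')
    for idx in (5, 6):
        if idx >= len(parts):
            continue
        candidate = parts[idx].split('-')[0]
        try:
            int(candidate)
        except ValueError:
            continue
        return 'http://www.ems1.com/print.asp?act=print&vid=' + candidate
    return url  # fall back to the original URL if no id is found
```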
#2340
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Nook
Thank you, Starson17. This works fine, and I have more to fix. I'm sure I will post with more questions.