Old 07-17-2010, 03:51 AM   #2326
tayseidel
Zealot
Posts: 146
Karma: 189664
Join Date: Feb 2009
Device: Glo HD, Aura H20, PRS-T1
I would like a custom recipe that downloads the print versions of articles from thecolumbian.com. I tried to modify a recipe to add ?print after each URL but failed. For any article you visit at thecolumbian.com, you simply append "?print" (without the quotation marks) to the URL and you can view the print edition. I would like a recipe covering all the RSS feeds on the site, if possible, using the print version.

Examples:

http://www.columbian.com/news/2010/j...fort-festival/

Just type ?print after the trailing slash and you get the print edition:


http://www.columbian.com/news/2010/j...estival/?print
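
A minimal sketch of the URL rewrite described above, assuming the print suffix always goes after a trailing slash and that any existing query string can be dropped (neither is confirmed for the site). The function name to_print_url and the article path are placeholders; inside a Calibre recipe the same logic would go in the print_version(self, url) method of a BasicNewsRecipe subclass.

```python
def to_print_url(url):
    """Return the ?print variant of an article URL."""
    # Drop any existing query string (assumption: it is not needed)
    base = url.split('?', 1)[0]
    # The print suffix goes after the trailing slash
    if not base.endswith('/'):
        base += '/'
    return base + '?print'


# Placeholder article path, not a real article
print(to_print_url('http://www.columbian.com/news/example-article/'))
```

Whether the RSS feeds hand out URLs in exactly this shape would still need checking against the real feeds.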
Old 07-17-2010, 04:44 AM   #2327
rty
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by koray View Post
Can somebody help me with writing a recipe for MIT Technology Review?

Cheers,

Koray
Sure... for a nice fella who knows how to ask nicely!

Recipe for Technology Review:

Updated to remove the Flash Macromedia advertisement.

@Kovid: I have updated the recipe for Alternet as well to remove the "Width" attribute so that it can display properly on reading devices. https://www.mobileread.com/forums/sho...postcount=2325
Attached Files
File Type: zip Technology Review.zip (817 Bytes, 228 views)

Last edited by rty; 07-17-2010 at 09:14 AM.
Old 07-17-2010, 05:03 AM   #2328
koray
Junior Member
Posts: 5
Karma: 10
Join Date: Jul 2010
Location: Ankara, Turkey
Device: PRS-300
Quote:
Originally Posted by rty View Post
Sure... for a nice fella who knows how to ask nicely!

Recipe for Technology Review:
A million thanks, rty! Works fabulously!

Cheers,

K.
Old 07-18-2010, 12:24 AM   #2329
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by rty View Post
Here it is:

Recipe for ALTERNET.ORG


P.S. The original print pages at alternet.org have a broken logo.

UPDATED to remove the predefined width display
thanks a million!!!
Old 07-18-2010, 04:50 PM   #2330
strick242
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2010
Device: iphone and stanza
Custom Recipe Request

I would like to have a recipe for The Tampa Tribune. I'm having a hard time following the instructions myself, so maybe one of you gurus can help me out. Thanks!
http://www.tampatrib.com/
Old 07-18-2010, 08:01 PM   #2331
tbrenske
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2010
Device: nook
Has anyone had a chance to look at relevantmagazine.com?
Old 07-18-2010, 09:29 PM   #2332
bhandarisaurabh
Enthusiast
Posts: 49
Karma: 10
Join Date: Aug 2009
Device: none
Quote:
Originally Posted by bhandarisaurabh View Post
CAN SOMEONE MAKE A RECIPE FOR WHARTON INDIA@ KNOWLEDGE
http://knowledge.wharton.upenn.edu/india/rss/

AND FINANCIAL EXPRESS PRINT EDITION WITHOUT USING FEEDS AND USING THE LINK
http://www.financialexpress.com/print/
I had posted this request earlier too; if someone can help, please do.
Thanks in advance.
Old 07-21-2010, 02:22 AM   #2333
iLeaveYou
Junior Member
Posts: 5
Karma: 10
Join Date: Jul 2010
Device: Kindle DX
Hello!!!
I asked for this before.
Maybe I didn't ask nicely enough, or nobody was available (or able) to do it.
Could somebody be so kind as to write a recipe for this: http://www.realitatea.net/rss.html ?
They probably have the best RSS feeds for Romanian news.
I would do it myself, but I have never been good at something this involved.
Your support is greatly appreciated.
Old 07-21-2010, 09:36 PM   #2334
mohmedic
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Nook
UNCLE!!

OK, I have tried to figure out what the heck you guys are doing for other feeds and apply it to mine, but I ain't that smart!!
Here is my half-finished recipe, if someone would be so kind as to take a look and tell me how I can get this website minus all the crap!! I have the print pages but couldn't figure out how to do the find-and-replace to change two different parts of the URL.
Thanks!

Code:
class AdvancedUserRecipe1279635146(BasicNewsRecipe):
    title          = u'EMS1'
    oldest_article = 7
    max_articles_per_feed = 100

    use_embedded_content = False
    no_stylesheets = True
   
  

    feeds          = [(u'columnist', u'http://www.ems1.com/ems-rss-feeds/columnists.xml'),
                          (u'topics', u'http://www.ems1.com/ems-rss-feeds/topics.xml'), 
                          (u'most popular', u'http://www.ems1.com/ems-rss-feeds/most-popular-articles.xml'), 
                          (u'EMS Tips', u'http://www.ems1.com/ems-rss-feeds/tips.xml'), 
                          (u'Daily news', u'http://www.ems1.com/ems-rss-feeds/news.xml')]
    
    def print_version(self, url):
        baseurl = url.rpartition('/?')[0]
        turl = baseurl.partition('/reviews/')[2]
        return 'http://www.ems1.com/print.asp?act=print&vid=' + turl
Old 07-22-2010, 01:03 AM   #2335
rty
Zealot
Posts: 108
Karma: 6066
Join Date: Apr 2010
Location: Singapore
Device: iPad Air, Kindle DXG, Kindle Paperwhite
Quote:
Originally Posted by mohmedic View Post
OK, I have tried to figure out what the heck you guys are doing for other feeds and apply it to mine, but I ain't that smart!!
Here is my half-finished recipe, if someone would be so kind as to take a look and tell me how I can get this website minus all the crap!! I have the print pages but couldn't figure out how to do the find-and-replace to change two different parts of the URL.
Thanks!
You can study the recipe I made for Technology Review above (post#2327). It's quite similar.

Take one article for example:
'http://www.ems1.com/fire-ems/articles/852270-EMT-with-firemans-key-accused-of-NY-sex-attacks/'.

The print version for this article is
'http://www.ems1.com/print.asp?act=print&vid=852270'

Your base URL for the print version should be 'http://www.ems1.com/print.asp?act=print&vid='. You need to append the number found in the original article URL, i.e. 852270, to this base URL. To extract this number, split the URL using "/" and "-" as the delimiters.
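
The two splits can be sketched as a standalone function, shown here only as an illustration. The names PRINT_BASE, article_id and print_url are made up for this sketch, and it assumes the article number is always the leading token of some path segment, which is true for the example URL but not verified for every feed.

```python
PRINT_BASE = 'http://www.ems1.com/print.asp?act=print&vid='


def article_id(url):
    """Return the leading number of the first path segment that starts with one."""
    for segment in url.split('/'):
        # '852270-EMT-with-...' -> '852270'
        head = segment.split('-')[0]
        if head.isdigit():
            return head
    return None


def print_url(url):
    """Build the print URL, or fall back to the original URL if no number is found."""
    number = article_id(url)
    return PRINT_BASE + number if number else url


print(print_url('http://www.ems1.com/fire-ems/articles/'
                '852270-EMT-with-firemans-key-accused-of-NY-sex-attacks/'))
```

Scanning every segment instead of picking a fixed index avoids depending on how deep the article sits in the URL path.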
Old 07-22-2010, 08:01 AM   #2336
trustin
Junior Member
Posts: 2
Karma: 10
Join Date: Jul 2009
Device: Sony Reader PRS-700BC
Recipe for media.daum.net (Korean news portal)

I'm not sure if this thread is the right place to post my recipe, but here it is:

Code:
import re
from datetime import date, timedelta

from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, NavigableString ,Comment

class MediaDaumRecipe(BasicNewsRecipe):
    title = u'\uBBF8\uB514\uC5B4 \uB2E4\uC74C \uC624\uB298\uC758 \uC8FC\uC694 \uB274\uC2A4'
    language  = 'ko'
    max_articles = 100

    timefmt = ''
    masthead_url = 'http://img-media.daum-img.net/2010ci/service_news.gif'
    cover_margins = (18,18,'grey99')
    no_stylesheets = True
    remove_tags_before = dict(id='GS_con')
    remove_tags_after  = dict(id='GS_con')
    remove_tags = [dict(attrs={'class':[
                            'bline',
                            'GS_vod',
                            ]}),
                   dict(id=[
                            'GS_swf_poll',
                            'ad250',
                            ]),
                   dict(name=['script', 'noscript', 'style', 'object'])]
    preprocess_regexps = [
       (re.compile(r'<\s+', re.DOTALL|re.IGNORECASE),
        lambda match: '&lt; '),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*){3,}', re.DOTALL|re.IGNORECASE),
        lambda match: ''),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</div>', re.DOTALL|re.IGNORECASE),
        lambda match: '</div>'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</p>', re.DOTALL|re.IGNORECASE),
        lambda match: '</p>'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</td>', re.DOTALL|re.IGNORECASE),
        lambda match: '</td>'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</strong>', re.DOTALL|re.IGNORECASE),
        lambda match: '</strong>'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</b>', re.DOTALL|re.IGNORECASE),
        lambda match: '</b>'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</em>', re.DOTALL|re.IGNORECASE),
        lambda match: '</em>'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*</i>', re.DOTALL|re.IGNORECASE),
        lambda match: '</i>'),
       (re.compile(u'\(\uB05D\)[ \t\r\n]*<br[^>]*>.*</div>', re.DOTALL|re.IGNORECASE),
        lambda match: '</div>'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*<div', re.DOTALL|re.IGNORECASE),
        lambda match: '<div'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*<p', re.DOTALL|re.IGNORECASE),
        lambda match: '<p'),
       (re.compile(r'(<br[^>]*>[ \t\r\n]*)*<table', re.DOTALL|re.IGNORECASE),
        lambda match: '<table'),
       (re.compile(r'<strong>(<br[^>]*>[ \t\r\n]*)*', re.DOTALL|re.IGNORECASE),
        lambda match: '<strong>'),
       (re.compile(r'<b>(<br[^>]*>[ \t\r\n]*)*', re.DOTALL|re.IGNORECASE),
        lambda match: '<b>'),
       (re.compile(r'<em>(<br[^>]*>[ \t\r\n]*)*', re.DOTALL|re.IGNORECASE),
        lambda match: '<em>'),
       (re.compile(r'<i>(<br[^>]*>[ \t\r\n]*)*', re.DOTALL|re.IGNORECASE),
        lambda match: '<i>'),
       (re.compile(u'(<br[^>]*>[ \t\r\n]*)*(\u25B6|\u25CF|\u261E|\u24D2|\(c\))*\[[^\]]*(\u24D2|\(c\)|\uAE30\uC0AC|\uC778\uAE30[^\]]*\uB274\uC2A4)[^\]]*\].*</div>', re.DOTALL|re.IGNORECASE),
        lambda match: '</div>'),
    ]

    def parse_index(self):
        today = date.today()
        articles = []
        # Collect today's list first, then yesterday's, up to max_articles
        articles = self.parse_list_page(articles, today)
        articles = self.parse_list_page(articles, today - timedelta(1))
        # Reuse the unicode title; a plain '\u....' literal is not decoded on Python 2
        return [(self.title, articles)]
        

    def parse_list_page(self, articles, date):
        if len(articles) >= self.max_articles:
            return articles

        for page in range(1, 10):
            soup = self.index_to_soup('http://media.daum.net/primary/total/list.html?cateid=100044&date=%(date)s&page=%(page)d' % {'date': date.strftime('%Y%m%d'), 'page': page})
            done = True
            for item in soup.findAll('dl'):
                dt = item.find('dt', { 'class': 'tit' })
                dd = item.find('dd', { 'class': 'txt' })
                if dt is None:
                    break
                a = dt.find('a', href=True)
                url = 'http://media.daum.net/primary/total/' + a['href']
                title = self.tag_to_string(dt)
                if dd is None:
                    description = ''
                else:
                    description = self.tag_to_string(dd)
                articles.append(dict(title=title, description=description, url=url, content=''))
                done = len(articles) >= self.max_articles                   
                if done:
                    break
            if done:
                break
        return articles


    def preprocess_html(self, soup):
        return self.strip_anchors(soup)

    def strip_anchors(self, soup):
        for para in soup.findAll(True):
            aTags = para.findAll('a')
            for a in aTags:
                if a.img is None:
                    a.replaceWith(a.renderContents().decode('utf-8','replace'))
        return soup
This recipe fetches the latest top stories from http://media.daum.net/, which is one of the most popular news portals in South Korea. This is my first attempt at writing a recipe and I'm not a Python user, so it might have some rough edges, but it works fine for me.

As a backup, I also uploaded this recipe to http://pastebin.com/mEptXLsN

Last edited by trustin; 07-22-2010 at 12:41 PM. Reason: Fixed more bugs
Old 07-22-2010, 10:43 AM   #2337
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Starson17 View Post
Quote:
Originally Posted by CaptainJSK View Post
1) I would like them to be displayed in reverse order (i.e., older entries first) so that I can catch up on things I've missed
Search this thread for "reverse" and look at my GoComics recipe. It does a reverse of date order for comic strips with:
current_articles.reverse()
It requires that you build the article feed yourself before reversing it.
This is an old question, but I only recently learned that there's another solution. There's a built-in option to reverse article order. I've never seen it used in a recipe or documented, but found it while perusing Calibre's code. It's:
Code:
reverse_article_order = True
Add this to a recipe and the article order is switched to oldest first. I've been using it in comics recipes.
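
For illustration, here is roughly where the option would sit in a recipe. The class name, title and feed URL are placeholders, and the try/except stand-in exists only so the snippet can be loaded outside Calibre; it is not part of Calibre.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the snippet loads outside Calibre (not part of Calibre)
    class BasicNewsRecipe(object):
        pass


class OldestFirstRecipe(BasicNewsRecipe):
    title = u'Oldest First Example'
    oldest_article = 7
    max_articles_per_feed = 100
    # Switch the article order in each feed to oldest first
    reverse_article_order = True
    # Placeholder feed; replace with a real feed URL
    feeds = [(u'Example feed', u'http://example.com/rss.xml')]
```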
Old 07-22-2010, 02:43 PM   #2338
mohmedic
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Nook
uncle uncle

rty,
Thanks for your help, but I am still at a loss. I added the print page lines and now get less. I don't think I set up the split right (copied and pasted from the Technology Review recipe and altered).
Code:
class AdvancedUserRecipe1279635146(BasicNewsRecipe):
    title          = u'EMS1'
    oldest_article = 7
    max_articles_per_feed = 100

    use_embedded_content = False
  
   
  

    feeds          = [(u'columnist', u'http://www.ems1.com/ems-rss-feeds/columnists.xml'),
                          (u'topics', u'http://www.ems1.com/ems-rss-feeds/topics.xml'), 
                          (u'most popular', u'http://www.ems1.com/ems-rss-feeds/most-popular-articles.xml'), 
                          (u'EMS Tips', u'http://www.ems1.com/ems-rss-feeds/tips.xml'), 
                          (u'Daily news', u'http://www.ems1.com/ems-rss-feeds/news.xml')]
    
    def print_version(self, url):
        baseurl='http://www.ems1.com/print.asp?act=print&vid=' 
        split1 = string.split(url,"/")
        xxx=split1 [4]
        split2= string.split(xxx,"-")  
        s =  baseurl + split2[0]
        return s
Old 07-22-2010, 04:01 PM   #2339
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by mohmedic View Post
rty,
Thanks for your help, but I am still at a loss. I added the print page lines and now get less. I don't think I set up the split right (copied and pasted from the Technology Review recipe and altered).
Spoiler:
Code:
class AdvancedUserRecipe1279635146(BasicNewsRecipe):
    title          = u'EMS1'
    oldest_article = 7
    max_articles_per_feed = 100

    use_embedded_content = False

    feeds          = [(u'columnist', u'http://www.ems1.com/ems-rss-feeds/columnists.xml'),
                          (u'topics', u'http://www.ems1.com/ems-rss-feeds/topics.xml'), 
                          (u'most popular', u'http://www.ems1.com/ems-rss-feeds/most-popular-articles.xml'), 
                          (u'EMS Tips', u'http://www.ems1.com/ems-rss-feeds/tips.xml'), 
                          (u'Daily news', u'http://www.ems1.com/ems-rss-feeds/news.xml')]
    
    def print_version(self, url):
        baseurl='http://www.ems1.com/print.asp?act=print&vid=' 
        split1 = string.split(url,"/")
        xxx=split1 [4]
        split2= string.split(xxx,"-")  
        s =  baseurl + split2[0]
        return s
You have a couple of problems. To start, you didn't import string.

You can fix that with:
import string
from calibre.web.feeds.news import BasicNewsRecipe
Next, your xxx=split1 [4] is wrong. Worse, it sometimes should be xxx=split1[5] and other times should be xxx=split1[6].

You need to test the result of split2 to see if it's an integer. There are lots of ways to do it. I used a try/except and integer conversion. I also changed the split, so the import of string is not needed, but I left it in, in case you want to use it. Note that this only works if the number you need is in position 5 or 6. I didn't test every feed in the recipe to see whether the number ever appears at another position in the URL.

Try this:
Spoiler:

Code:
import string
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1279635146(BasicNewsRecipe):
    title          = u'EMS1'
    oldest_article = 7
    max_articles_per_feed = 100

    use_embedded_content = False

    feeds          = [(u'columnist', u'http://www.ems1.com/ems-rss-feeds/columnists.xml'),
                          (u'topics', u'http://www.ems1.com/ems-rss-feeds/topics.xml'), 
                          (u'most popular', u'http://www.ems1.com/ems-rss-feeds/most-popular-articles.xml'), 
                          (u'EMS Tips', u'http://www.ems1.com/ems-rss-feeds/tips.xml'), 
                          (u'Daily news', u'http://www.ems1.com/ems-rss-feeds/news.xml')]
    
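    def print_version(self, url):
        # Print pages live at print.asp?act=print&vid=<article number>
        baseurl = 'http://www.ems1.com/print.asp?act=print&vid='
        split1 = url.split("/")
        # The article number leads path segment 6 or 5, depending on the URL
        for position in (6, 5):
            try:
                return baseurl + str(int(split1[position].split("-")[0]))
            except (IndexError, ValueError):
                # Segment missing or not a number: try the next position
                continue
        # No number found: fall back to the original article URL
        return url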
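
As one of the other ways to do it, a regular expression can pull out the number without depending on its position. This is only a sketch: BASEURL and the function are written standalone for testing, and the pattern assumes the number is always followed by a hyphen, as in the example URL from post #2335. In a recipe the body would go in print_version(self, url).

```python
import re

BASEURL = 'http://www.ems1.com/print.asp?act=print&vid='


def print_version(url):
    # First path segment that begins with digits followed by a hyphen
    match = re.search(r'/(\d+)-', url)
    if match is None:
        # No article number found: keep the original URL
        return url
    return BASEURL + match.group(1)


print(print_version('http://www.ems1.com/fire-ems/articles/'
                    '852270-EMT-with-firemans-key-accused-of-NY-sex-attacks/'))
```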

Last edited by Starson17; 07-22-2010 at 04:21 PM.
Old 07-22-2010, 05:10 PM   #2340
mohmedic
Junior Member
Posts: 3
Karma: 10
Join Date: Jul 2010
Device: Nook
Thank you, Starson17. This works fine, and I have more to fix. I will post with more questions, I am sure.