
MobileRead Forums > E-Book Software > Calibre > Recipes

11-01-2010, 09:36 AM   #1
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
New recipe all done, and an idea.

The idea is that some of the websites we use for recipes earn money from advertising. If we skip the article page and go straight to the print version, the site will suffer, because the page that carries the ads is never visited. In this recipe, and in all my future ones, I will download the article page before I go to the print version.

So this recipe is ready to be added as a built-in recipe.
Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1283848012(BasicNewsRecipe):
    description   = "This is a recipe for Calcalist.co.il. The recipe downloads the article page so as not to hurt the site's advertising income."
    cover_url      = 'http://ftp5.bizportal.co.il/web/giflib/news/calcalist.JPG'
    title          = u'Calcalist'
    # calibre expects an ISO 639 language code here, not a language name
    language       = 'he'
    __author__     = 'marbs'
    # Right-to-left layout for the article text and the feed/article descriptions
    extra_css      = ('img {max-width:100%;} body {direction: rtl;} title {direction: rtl;} '
                      '.article_description {direction: rtl;} a.article {direction: rtl;} '
                      '.calibre_feed_description {direction: rtl;}')
    simultaneous_downloads = 5
    remove_javascript     = True
    timefmt        = '[%a, %d %b, %Y]'
    oldest_article = 1
    max_articles_per_feed = 100
    remove_attributes = ['width']
    # keep_only_tags must be a list of tag specifications
    keep_only_tags = [dict(name='div', attrs={'id': 'articleContainer'})]
    # Drop empty paragraphs before parsing (the old remove_tags rule matched a
    # nonexistent 'text' attribute and never removed anything)
    preprocess_regexps = [
        (re.compile(r'<p>&nbsp;</p>', re.DOTALL | re.IGNORECASE), lambda match: '')
        ]


    feeds          = [(u'\u05d3\u05e3 \u05d4\u05d1\u05d9\u05ea', u'http://www.calcalist.co.il/integration/StoryRss8.xml'),                            
                           (u'24/7', u'http://www.calcalist.co.il/integration/StoryRss3674.xml'), 
                           (u'\u05d1\u05d0\u05d6\u05d6', u'http://www.calcalist.co.il/integration/StoryRss3674.xml'),                            
                           (u'\u05de\u05d1\u05d6\u05e7\u05d9\u05dd', u'http://www.calcalist.co.il/integration/StoryRss184.xml'), 
                           (u'\u05d4\u05e9\u05d5\u05e7', u'http://www.calcalist.co.il/integration/StoryRss2.xml'), 
                           (u'\u05d1\u05d0\u05e8\u05e5', u'http://www.calcalist.co.il/integration/StoryRss14.xml'), 
                           (u'\u05d4\u05db\u05e1\u05e3', u'http://www.calcalist.co.il/integration/StoryRss9.xml'), 
                           (u'\u05e0\u05d3\u05dc"\u05df', u'http://www.calcalist.co.il/integration/StoryRss7.xml'), 
                           (u'\u05e2\u05d5\u05dc\u05dd', u'http://www.calcalist.co.il/integration/StoryRss13.xml'), 
                           (u'\u05e4\u05e8\u05e1\u05d5\u05dd \u05d5\u05e9\u05d9\u05d5\u05d5\u05e7', u'http://www.calcalist.co.il/integration/StoryRss5.xml'), 
                           (u'\u05e4\u05e0\u05d0\u05d9', u'http://www.calcalist.co.il/integration/StoryRss3.xml'), 
                           (u'\u05d8\u05db\u05e0\u05d5\u05dc\u05d5\u05d2\u05d9', u'http://www.calcalist.co.il/integration/StoryRss4.xml'), 
                           (u'\u05e2\u05e1\u05e7\u05d9 \u05e1\u05e4\u05d5\u05e8\u05d8', u'http://www.calcalist.co.il/integration/StoryRss18.xml')]
       
    def print_version(self, url):
        # Load the regular article page first, as a page view for the site.
        # These are the two lines to delete if the download feels too slow.
        br = self.get_browser()
        br.open(url)
        self.log('Original URL:', url)
        # Take the text between the first and second '-' of the article URL
        # and append it to the print-preview URL.
        article_id = url.split('-')[1]
        print_url = ('http://www.calcalist.co.il/Ext/Comp/ArticleLayout/'
                     'CdaArticlePrintPreview/1,2506,L-' + article_id)
        self.log('Print URL:', print_url)
        return print_url
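The URL rewrite in `print_version` can be pulled out into a plain function and checked without running calibre. This is a minimal sketch: the sample article URL used below is hypothetical, and the function assumes, as the recipe does, that the article ID is the text between the first and second `-` of the URL.

```python
# Standalone sketch of the print_version URL rewrite (no calibre needed).
# ASSUMPTION: article URLs contain an "L-<id>" segment, as the recipe expects.

PRINT_PREFIX = ('http://www.calcalist.co.il/Ext/Comp/ArticleLayout/'
                'CdaArticlePrintPreview/1,2506,L-')


def calcalist_print_url(article_url):
    """Return the print-preview URL for an article URL.

    Mirrors url.split('-')[1] from the recipe, but raises a clear error
    instead of an IndexError when the URL has no '-' at all.
    """
    parts = article_url.split('-')
    if len(parts) < 2:
        raise ValueError('no "-" in article URL: %r' % article_url)
    return PRINT_PREFIX + parts[1]


# Hypothetical example URL, for illustration only
print(calcalist_print_url('http://www.calcalist.co.il/articles/0,7340,L-3519497,00.html'))
```

Because the original logic keeps only the segment before the second `-`, any article URL with extra dashes after the ID would still produce a valid-looking but possibly wrong print URL; a quick check like this helps spot that when the site changes its URL scheme.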
11-01-2010, 01:29 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
 
Posts: 43,856
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Note that downloading the article page will almost certainly not help. Most ad systems use JavaScript to fetch the ad once the page has loaded, and since the news download system does not execute JavaScript, the ad view is never registered.

Generally speaking, most web-based ad systems rely on the browser reporting back to the ad server. Since ebooks do not support JavaScript and are often viewed without an internet connection, a web-based ad system is unlikely to work for them.
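To make this concrete, here is a small illustration (not calibre code) of what a fetcher that does not run JavaScript "sees" of an ad slot: the ad `<script>` tags are just markup to be parsed. The page HTML and the `ads.example.com` URL are made up. In the typical setup described above, a browser would download and execute these scripts, and that execution is what reports the view to the ad server; merely parsing the HTML lists them but never runs them. (The recipe also sets `remove_javascript = True`, so such tags are stripped from the final ebook anyway.)

```python
# Illustration: parsing HTML finds ad scripts only as inert markup.
# ASSUMPTION: the sample page and ad-server URL below are hypothetical.
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Record external <script src=...> URLs and count inline scripts."""

    def __init__(self):
        super().__init__()
        self.script_srcs = []
        self.inline_scripts = 0

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            src = dict(attrs).get('src')
            if src:
                self.script_srcs.append(src)
            else:
                self.inline_scripts += 1


def find_scripts(html):
    """Return (external script URLs, number of inline scripts); nothing is executed."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.script_srcs, collector.inline_scripts


page = ('<html><body><p>Story text</p>'
        '<script src="http://ads.example.com/show.js"></script>'
        '<script>loadAd();</script></body></html>')
print(find_scripts(page))
```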
11-01-2010, 05:01 PM   #3
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
I see. I will leave it like this, and if anyone using the recipe feels it is too slow, they can remove the two lines of code.

In any case, it is a really good recipe, if I do say so myself.
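Rather than deleting the two lines, the article-page visit could be made switchable. The sketch below is calibre-free so it can run anywhere: `PrintVersionSketch`, its `visit_article_page` flag and the `open_url` callable are hypothetical names. In a real recipe, `open_url` would be the `open` method of the browser returned by `self.get_browser()`, and the flag would be a class attribute on the recipe itself.

```python
# Sketch: an on/off switch for the extra article-page request.
# ASSUMPTIONS: class and attribute names are hypothetical; open_url stands in
# for self.get_browser().open in an actual calibre recipe.

PRINT_PREFIX = ('http://www.calcalist.co.il/Ext/Comp/ArticleLayout/'
                'CdaArticlePrintPreview/1,2506,L-')


class PrintVersionSketch(object):
    # Set to False to skip the extra request and download faster.
    visit_article_page = True

    def __init__(self, open_url):
        self.open_url = open_url

    def print_version(self, url):
        if self.visit_article_page:
            # The extra request: load the normal article page first.
            self.open_url(url)
        # Same rewrite as the recipe: text between the first and second '-'.
        return PRINT_PREFIX + url.split('-')[1]


visited = []
recipe = PrintVersionSketch(visited.append)
print(recipe.print_version('http://www.calcalist.co.il/articles/0,7340,L-3519497,00.html'))
```

With a flag, a user who finds the download too slow only changes one attribute (for example in a subclass) instead of editing `print_version`.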
11-01-2010, 08:24 PM   #4
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
All my years I have stripped ads from the websites I view using ABP and so forth, and now you want to add them to an ebook recipe? I can see Amazon in the near future making the screensaver an ad. Again, when/if that happens I will install a jailbreak and remove it. If you really don't want the website to suffer, consider sending them a PayPal donation, but ads? Boo, hiss!
11-02-2010, 03:47 AM   #5
marbs
Zealot
marbs began at the beginning.
 
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
I agree with not wanting to see ads. But when the site goes to advertisers it says "we have 1 million visitors a month, so an ad will cost some amount". That is what keeps them in business. Now, I understand there are more sophisticated ways of measuring ads, but I am sure the number of times a page is viewed is a factor.

All I did was add two lines in print_version that open the original article before getting the print version. If it takes the recipe a couple more minutes to download, so be it. My computer can turn on by itself a few minutes earlier.

I want to support my news sites, and it makes no difference to the final file I get...



All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.