Old 03-03-2011, 05:36 PM   #1
alessandro_q
Member
Posts: 23
Karma: 90010
Join Date: Mar 2011
Device: Kindle 3
Download Only New Entries when Fetching News

I have calibre download 8 news feeds every morning to read over breakfast. It is not always clear which articles I have already read the previous day, as calibre seems to always download the entire feed (which also takes some time to do).

Is there a way to have calibre only download the new entries in each feed?

Cheers.
Old 03-03-2011, 06:03 PM   #2
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by alessandro_q
Is there a way to have calibre only download the new entries in each feed?

No, but most recipes have a line like this:

oldest_article = 3 #days

If you download these recipes daily, then changing the value to 1, via the built-in tool under "add custom news source", will minimize overlap.
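
For anyone following along, here is a minimal sketch of what that change looks like inside a custom recipe. The class name, title and feed URL are placeholders I made up, not a real recipe. The try/except is only there so the snippet also runs outside calibre; inside calibre the import succeeds.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the snippet can be executed outside calibre (assumption).
    BasicNewsRecipe = object


class DailyExample(BasicNewsRecipe):
    title = 'Daily Example'          # hypothetical recipe name
    oldest_article = 1               # days; lowered from 3 for a daily schedule
    max_articles_per_feed = 100
    # Placeholder feed URL, replace with the real one from your recipe.
    feeds = [('Articles', 'http://example.com/feed.xml')]
```

With a daily schedule this should cut down the overlap between issues, but it does not track what you have actually read.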
Old 03-03-2011, 06:07 PM   #3
alessandro_q
How do you get to the code of existing recipes?

Never mind, I've found "customize builtin recipe".

Does the number indicate a difference in date, or an actual 24-hour period? If it's the former, I might be better off leaving it at 2 to avoid missing any articles.

Last edited by alessandro_q; 03-03-2011 at 06:10 PM.
Old 03-03-2011, 06:47 PM   #4
DoctorOhh
Quote:
Originally Posted by alessandro_q
Does the number indicate a difference in date, or an actual 24-hour period? If it's the former, I might be better off leaving it at 2 to avoid missing any articles.

I'm not sure; try 2 and adjust if needed.

You might learn more here.
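
To make the two readings of the question concrete, here is a small sketch in plain Python. Both function names are made up for illustration, and nothing in this thread settles which reading calibre actually uses. It only shows that the two can disagree for an article posted a little more than 24 hours before the fetch.

```python
from datetime import datetime, timedelta


def within_elapsed_days(published, now, oldest_article):
    """Keep the article if it is at most oldest_article * 24 hours old."""
    return now - published <= timedelta(days=oldest_article)


def within_calendar_days(published, now, oldest_article):
    """Keep the article if its date is at most oldest_article calendar days back."""
    return (now.date() - published.date()).days <= oldest_article


now = datetime(2011, 3, 4, 7, 0)         # morning fetch at 07:00
published = datetime(2011, 3, 3, 6, 0)   # article posted 25 hours earlier

print(within_elapsed_days(published, now, 1))    # False: older than 24 hours
print(within_calendar_days(published, now, 1))   # True: only one date back
```

If the elapsed-time reading applies, a value of 1 would drop that article when the fetch runs an hour late, which is the argument for leaving the value at 2.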

Last edited by DoctorOhh; 03-03-2011 at 06:50 PM.
Old 03-03-2011, 09:54 PM   #5
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by alessandro_q
Is there a way to have calibre only download the new entries in each feed?

Yes.

See here.
There are other options that are less well developed.
Old 03-03-2011, 10:40 PM   #6
alessandro_q
Thanks Starson. Can you give me some advice on how to include the code into the existing code for a news source? For example, here is the code for Gizmodo:

Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
gizmodo.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Gizmodo(BasicNewsRecipe):
    title                 = 'Gizmodo'
    __author__            = 'Darko Miletic'
    description           = "Gizmodo, the gadget guide. So much in love with shiny new toys, it's unnatural."
    publisher             = 'gizmodo.com'
    category              = 'news, IT, Internet, gadgets'
    oldest_article        = 2
    max_articles_per_feed = 100
    no_stylesheets        = True
    encoding              = 'utf-8'
    use_embedded_content  = True
    language              = 'en'
    masthead_url          = 'http://cache.gawkerassets.com/assets/gizmodo.com/img/logo.png'

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    feeds = [(u'Articles', u'http://feeds.gawker.com/gizmodo/vip?format=xml')]

    remove_tags = [
            {'class': 'feedflare'},
    ]


    def preprocess_html(self, soup):
        return self.adeify_images(soup)
Old 03-04-2011, 08:45 AM   #7
Starson17
Quote:
Originally Posted by alessandro_q
Thanks Starson. Can you give me some advice on how to include the code into the existing code for a news source?

My advice would be not to do it. I wrote similar code and wasn't happy with it. If any download fails, you don't get those articles the next day. You have to get every issue and read them in order; having the most recent issue isn't enough. Ultimately, I decided I preferred keeping the ebook exactly like the feed, only having to successfully download one issue and just skipping over any articles I'd already read.

Have you tried the code I pointed you to?
Quote:
For example, here is the code for Gizmodo:

There's no need to post a copy of the code for builtin recipes.
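
A rough sketch of the drawback described above, in plain Python with no calibre involved. The helper filter_new, the history file and the example.com URLs are all hypothetical, and the linked code may differ in detail. The point is that the history is written while the feed is being filtered, so anything recorded as seen is skipped on the next run whether or not you ever got a usable e-book out of the first one.

```python
import hashlib
import os
import tempfile


def filter_new(articles, history_path):
    """Return the (url, text) pairs not recorded in history_path, then
    overwrite the history with every item seen in this run."""
    past = set()
    if os.path.exists(history_path):
        with open(history_path, encoding='utf-8') as f:
            past = {line.strip() for line in f}

    current = set()
    fresh = []
    for url, text in articles:
        key = url + ':' + hashlib.md5(text.encode('utf-8')).hexdigest()
        current.add(key)
        if key not in past:
            fresh.append((url, text))

    # The history is rewritten here, before any e-book has been built.
    with open(history_path, 'w', encoding='utf-8') as f:
        for key in sorted(current):
            f.write(key + '\n')
    return fresh


history = os.path.join(tempfile.mkdtemp(), 'seen.txt')
monday = [('http://example.com/a', 'first'), ('http://example.com/b', 'second')]
tuesday = monday + [('http://example.com/c', 'third')]

first_run = filter_new(monday, history)
second_run = filter_new(tuesday, history)
print(len(first_run), len(second_run))   # 2 1
```

If Monday's issue is lost or fails to convert after this step, articles a and b are never offered again, because Tuesday's run only returns c.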
Old 03-04-2011, 09:09 PM   #8
alessandro_q
I have not tried the code you pointed to. I meant to ask how to use the template. Here is my attempt:

Code:
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os, os.path, urllib
from hashlib import md5

class OnlyLatestRecipe(BasicNewsRecipe):
    title          = u'Gizmodo'
	__author__            = 'Darko Miletic'
    description           = "Gizmodo, the gadget guide. So much in love with shiny new toys, it's unnatural."
    publisher             = 'gizmodo.com'
    category              = 'news, IT, Internet, gadgets'
	
    oldest_article = 10000
    max_articles_per_feed = 10000
    no_stylesheets        = True
    encoding              = 'utf-8'
    use_embedded_content  = True
    language              = 'en'
    masthead_url          = 'http://cache.gawkerassets.com/assets/gizmodo.com/img/logo.png'
	
    feeds          = [(u'Articles', u'http://feeds.gawker.com/gizmodo/vip?format=xml')]

    def parse_feeds(self):
        recipe_dir = os.path.join(config_dir,'recipes')
        hash_dir = os.path.join(recipe_dir,'recipe_storage')
        feed_dir = os.path.join(hash_dir,self.title.encode('utf-8').replace('/',':'))
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir,mode=CONFIG_DIR_MODE)

        feeds = BasicNewsRecipe.parse_feeds(self)

        for feed in feeds:
            feed_hash = urllib.quote(feed.title.encode('utf-8'),safe='')
            feed_fn = os.path.join(feed_dir,feed_hash)

            past_items = set()
            if os.path.exists(feed_fn):
               with file(feed_fn) as f:
                   for h in f:
                       past_items.add(h.strip())
                       
            cur_items = set()
            for article in feed.articles[:]:
                item_hash = md5()
                if article.content: item_hash.update(article.content.encode('utf-8'))
                if article.summary: item_hash.update(article.summary.encode('utf-8'))
                item_hash = item_hash.hexdigest()
                if article.url:
                    item_hash = article.url + ':' + item_hash
                cur_items.add(item_hash)
                if item_hash in past_items:
                    feed.articles.remove(article)
            with file(feed_fn,'w') as f:
                for h in cur_items:
                    f.write(h+'\n')

        remove = [f for f in feeds if len(f) == 0 and
                self.remove_empty_feeds]
        for f in remove:
            feeds.remove(f)

        return feeds
		
	 conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    remove_tags = [
            {'class': 'feedflare'},
    ]


    def preprocess_html(self, soup):
        return self.adeify_images(soup)
Is there anything wrong with this code?
Old 03-05-2011, 10:03 AM   #9
Starson17
Quote:
Originally Posted by alessandro_q
Is there anything wrong with this code?

You have several indent errors, starting with the author. Just run it and it will report the errors. Don't use tabs; only use spaces.
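
If you want to find the offending lines before running the recipe, a small check like the following can help. This is just a sketch; find_tab_indents is a made-up helper, and the sample is a cut-down stand-in for the recipe above with a tab in front of the author line. Python's standard tabnanny module (python -m tabnanny yourfile.py) does a more thorough job on a saved file.

```python
def find_tab_indents(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip())]
        if '\t' in indent:
            bad.append(number)
    return bad


# Cut-down sample: line 3 is indented with a tab instead of four spaces.
sample = (
    "class OnlyLatestRecipe(object):\n"
    "    title = u'Gizmodo'\n"
    "\t__author__ = 'Darko Miletic'\n"
    "    language = 'en'\n"
)
print(find_tab_indents(sample))   # [3]
```

Replace each reported tab with four spaces so that every line in the class body lines up with the title line.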