#1
Member
Posts: 23
Karma: 90010
Join Date: Mar 2011
Device: Kindle 3
Download Only New Entries when Fetching News
I have calibre download 8 news feeds every morning to read over breakfast. It is not always clear which articles I already read the previous day, because calibre seems to download the entire feed every time (which also takes a while).
Is there a way to have calibre download only the new entries in each feed? Cheers.

#2
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
oldest_article = 3 #days

If you download these recipes daily, changing this value to 1 (via the built-in tool under Add a custom news source) will minimize overlap.
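A rough way to see why a smaller value reduces overlap: the sketch below assumes an article is included in a download whenever its age at that moment is at most `oldest_article` days, and that the recipe runs once a day at the same time. `times_downloaded` is a hypothetical helper written for illustration; it is not part of calibre.

```python
from datetime import datetime, timedelta


def times_downloaded(published, first_download, oldest_article, days=7):
    """Count how many of `days` consecutive daily downloads would include an
    article, ASSUMING an article is kept while its age is <= oldest_article days."""
    window = timedelta(days=oldest_article)
    count = 0
    for i in range(days):
        download = first_download + timedelta(days=i)
        age = download - published
        # Skip downloads made before the article existed or after it aged out.
        if timedelta(0) <= age <= window:
            count += 1
    return count


published = datetime(2011, 3, 3, 12, 0)       # article posted at noon
first_download = datetime(2011, 3, 4, 7, 0)   # next morning's fetch

print(times_downloaded(published, first_download, 3))  # 3
print(times_downloaded(published, first_download, 1))  # 1
```

Under these assumptions, an article posted at noon shows up in three consecutive morning downloads with a value of 3, but only once with a value of 1.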

#3
Member
Posts: 23
Karma: 90010
Join Date: Mar 2011
Device: Kindle 3
How do you get to the code of existing recipes?
Never mind, I've found "Customize builtin recipe". Does the number indicate a difference in calendar dates, or an actual 24-hour period? If it's the former, I might be better off leaving it at 2 to avoid missing any articles.
Last edited by alessandro_q; 03-03-2011 at 06:10 PM.
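The two readings of the question can be compared directly. The sketch below is a hypothetical model, not calibre's code: `rolling_keep` treats `oldest_article` as a number of 24-hour periods counted back from the moment of the fetch, while `calendar_keep` compares calendar dates only. My understanding is that calibre uses the rolling form (the setting is a float, so values such as 1.5 are accepted), but that is an assumption worth checking against the calibre documentation for `oldest_article`.

```python
from datetime import datetime, timedelta


def rolling_keep(published, fetched, oldest_article):
    # Hypothetical rolling reading: keep the article if it is no older than
    # oldest_article * 24 hours at the moment of the fetch.
    return fetched - published <= timedelta(days=oldest_article)


def calendar_keep(published, fetched, oldest_article):
    # Hypothetical calendar reading: keep it if its date is at most
    # oldest_article calendar days before the fetch date.
    return (fetched.date() - published.date()).days <= oldest_article


published = datetime(2011, 3, 2, 8, 0)   # posted Wednesday 08:00
fetched = datetime(2011, 3, 3, 9, 30)    # fetched Thursday 09:30

print(rolling_keep(published, fetched, 1))   # False: 25.5 hours old
print(calendar_keep(published, fetched, 1))  # True: only one calendar day earlier
```

Under the rolling model, a value of 1 can only miss an article when more than 24 hours pass between two fetches; a fractional value such as 1.5 would trade a little overlap for some slack in the fetch time.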

#4
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
You might learn more here.
Last edited by DoctorOhh; 03-03-2011 at 06:50 PM.

#6
Member
Posts: 23
Karma: 90010
Join Date: Mar 2011
Device: Kindle 3
Thanks, Starson. Can you give me some advice on how to incorporate that code into the existing code for a news source? For example, here is the code for Gizmodo:
Code:
__license__ = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
gizmodo.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class Gizmodo(BasicNewsRecipe):
    title = 'Gizmodo'
    __author__ = 'Darko Miletic'
    description = "Gizmodo, the gadget guide. So much in love with shiny new toys, it's unnatural."
    publisher = 'gizmodo.com'
    category = 'news, IT, Internet, gadgets'
    oldest_article = 2               # in days
    max_articles_per_feed = 100
    no_stylesheets = True
    encoding = 'utf-8'
    use_embedded_content = True
    language = 'en'
    masthead_url = 'http://cache.gawkerassets.com/assets/gizmodo.com/img/logo.png'

    conversion_options = {
        'comment'     : description
        , 'tags'      : category
        , 'publisher' : publisher
        , 'language'  : language
    }

    feeds = [(u'Articles', u'http://feeds.gawker.com/gizmodo/vip?format=xml')]

    remove_tags = [
        {'class': 'feedflare'},
    ]

    def preprocess_html(self, soup):
        return self.adeify_images(soup)
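In general terms, the template's `parse_feeds` method goes into the existing recipe's class body, next to the recipe's own attributes, with the template's extra imports at the top of the file; the overridden method first calls the base class to fetch the feeds and then filters the result. The sketch below shows only that structure, using a hypothetical stand-in base class (`StandInBasicNewsRecipe` is not part of calibre) and a hard-coded `seen` set in place of the stored hashes, so it runs without calibre installed.

```python
class StandInBasicNewsRecipe:
    """Hypothetical stand-in for calibre's BasicNewsRecipe, for illustration only."""
    title = 'Untitled'

    def parse_feeds(self):
        # Pretend every run returns the whole feed.
        return ['article-1', 'article-2', 'article-3']


class Gizmodo(StandInBasicNewsRecipe):
    # The recipe's own settings stay as they are...
    title = 'Gizmodo'
    # Hypothetical: items remembered from a previous run.
    seen = {'article-1', 'article-2'}

    # ...and the template's method is pasted into the same class body.
    def parse_feeds(self):
        feeds = StandInBasicNewsRecipe.parse_feeds(self)  # let the base class fetch first
        return [a for a in feeds if a not in self.seen]   # then drop what was seen before


print(Gizmodo().parse_feeds())  # ['article-3']
```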

#7
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Have you tried the code I pointed you to?

#8
Member
Posts: 23
Karma: 90010
Join Date: Mar 2011
Device: Kindle 3
I have not tried the code you pointed to. I meant to ask how to use the template. Here is my attempt:
Code:
# Base class for all calibre news recipes
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.constants import config_dir, CONFIG_DIR_MODE
import os
from hashlib import md5
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3

class OnlyLatestRecipe(BasicNewsRecipe):
    title = u'Gizmodo'
    __author__ = 'Darko Miletic'
    description = "Gizmodo, the gadget guide. So much in love with shiny new toys, it's unnatural."
    publisher = 'gizmodo.com'
    category = 'news, IT, Internet, gadgets'
    # Deliberately huge: the stored hashes, not the date window, decide what is new.
    oldest_article = 10000
    max_articles_per_feed = 10000
    no_stylesheets = True
    encoding = 'utf-8'
    use_embedded_content = True
    language = 'en'
    masthead_url = 'http://cache.gawkerassets.com/assets/gizmodo.com/img/logo.png'
    feeds = [(u'Articles', u'http://feeds.gawker.com/gizmodo/vip?format=xml')]

    conversion_options = {
        'comment'     : description
        , 'tags'      : category
        , 'publisher' : publisher
        , 'language'  : language
    }

    remove_tags = [
        {'class': 'feedflare'},
    ]

    def parse_feeds(self):
        # One storage folder per recipe under <calibre config dir>/recipes/recipe_storage/
        recipe_dir = os.path.join(config_dir, 'recipes')
        hash_dir = os.path.join(recipe_dir, 'recipe_storage')
        # '/' would act as a path separator and ':' is not allowed in Windows file names
        feed_dir = os.path.join(hash_dir, self.title.replace('/', '_'))
        if not os.path.isdir(feed_dir):
            os.makedirs(feed_dir, mode=CONFIG_DIR_MODE)

        # Let calibre download and parse the feeds as usual first.
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            feed_hash = quote(feed.title.encode('utf-8'), safe='')
            feed_fn = os.path.join(feed_dir, feed_hash)

            # Hashes of the articles seen on the previous run.
            past_items = set()
            if os.path.exists(feed_fn):
                with open(feed_fn) as f:
                    for h in f:
                        past_items.add(h.strip())

            cur_items = set()
            for article in feed.articles[:]:   # iterate over a copy; articles are removed from the original
                item_hash = md5()
                if article.content:
                    item_hash.update(article.content.encode('utf-8'))
                if article.summary:
                    item_hash.update(article.summary.encode('utf-8'))
                item_hash = item_hash.hexdigest()
                if article.url:
                    item_hash = article.url + ':' + item_hash
                cur_items.add(item_hash)
                if item_hash in past_items:
                    feed.articles.remove(article)

            # Remember only what is in the feed now, so the file does not grow forever.
            with open(feed_fn, 'w') as f:
                for h in cur_items:
                    f.write(h + '\n')

        # Drop feeds that ended up with no new articles.
        remove = [f for f in feeds if len(f) == 0 and self.remove_empty_feeds]
        for f in remove:
            feeds.remove(f)
        return feeds

    def preprocess_html(self, soup):
        return self.adeify_images(soup)
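Outside calibre, the core of that `parse_feeds` override can be tried on its own. The sketch below is a simplified, stdlib-only model of the same idea; the function names `item_key` and `only_new` and the plain `(url, content, summary)` tuples standing in for calibre's article objects are hypothetical. Each article is keyed by its URL plus an MD5 of its text, keys from the previous run are loaded from a file, only unseen articles are kept, and the file is then overwritten with the current keys.

```python
import os
import tempfile
from hashlib import md5


def item_key(url, content, summary):
    # URL plus an MD5 of the article text, so an article whose text changes
    # is treated as new (same idea as the recipe above).
    h = md5()
    if content:
        h.update(content.encode('utf-8'))
    if summary:
        h.update(summary.encode('utf-8'))
    return (url + ':' if url else '') + h.hexdigest()


def only_new(articles, store_path):
    """Return the URLs of articles not recorded in store_path, then overwrite
    the store with the keys of every article in the current feed."""
    past = set()
    if os.path.exists(store_path):
        with open(store_path) as f:
            past = {line.strip() for line in f}
    current, fresh = set(), []
    for url, content, summary in articles:
        key = item_key(url, content, summary)
        current.add(key)
        if key not in past:
            fresh.append(url)
    with open(store_path, 'w') as f:
        for key in current:
            f.write(key + '\n')
    return fresh


store = os.path.join(tempfile.mkdtemp(), 'Articles')
day1 = [('http://example.com/a', 'A text', None), ('http://example.com/b', 'B text', None)]
day2 = day1[1:] + [('http://example.com/c', 'C text', None)]
print(only_new(day1, store))  # ['http://example.com/a', 'http://example.com/b']
print(only_new(day2, store))  # ['http://example.com/c']
```

Because the store is rewritten with only the items currently in the feed, keys for articles that have dropped off the feed are forgotten, so the file stays small. This is presumably also why the recipe sets `oldest_article` and `max_articles_per_feed` so high: the seen-list, not the date window, decides what counts as new.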

#9
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T

Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Error for fetching news. | nick_martin | Calibre | 0 | 11-26-2010 01:52 AM |
| Fetching News has gone bad... | rogue_ronin | Calibre | 6 | 09-03-2010 08:41 AM |
| automating news fetching | zerozombie72 | Calibre | 6 | 02-16-2010 04:31 PM |
| Fetching News In Calibre | Rootman | Calibre | 2 | 11-11-2009 07:06 PM |
| Question about fetching the news | spoudaios | Sony Reader Dev Corner | 4 | 01-27-2008 05:01 PM |