Old 11-15-2010, 03:06 PM   #16
janvanmaar
Addict
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
@Starson17: You could just look at the date of the last run and exclude all articles with date older than that - I assume at least that this was meant by 'date comparison'.
Old 11-15-2010, 04:14 PM   #17
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by janvanmaar View Post
@Starson17: You could just look at the date of the last run and exclude all articles with date older than that - I assume at least that this was meant by 'date comparison'.
Yes, but to do that you need to know the date of the last run, and that requires storing that date somewhere during the previous run and fetching it during the current run to do the comparison. That's exactly what my comment was about - passing data from one recipe run to the next one.
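A minimal sketch of the date comparison being discussed, assuming the date of the previous run has already been stored somewhere and read back. The function name `articles_since` and the (title, published) pairs are made up for illustration; they are not part of calibre.

```python
from datetime import datetime

def articles_since(articles, last_run):
    """Keep only articles published after the previous run.

    `articles` is a list of (title, published) pairs; `last_run` is the
    stored datetime of the previous run, or None on the first run.
    """
    if last_run is None:
        return list(articles)  # first run: nothing to compare against
    return [a for a in articles if a[1] > last_run]

# Hypothetical data: one article older and one newer than the last run.
last_run = datetime(2010, 11, 14, 9, 0)
articles = [
    ("Old story", datetime(2010, 11, 13, 18, 30)),
    ("New story", datetime(2010, 11, 15, 7, 45)),
]
fresh = articles_since(articles, last_run)
```

The filter itself is trivial; the hard part remains getting `last_run` from one recipe run to the next.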
Old 11-15-2010, 04:23 PM   #18
Starson17
I went back and looked at my original post. I suspect he was originally commenting on the statement that in my tests I stored article URLs for comparison. The reason I did it that way and not only by date comparison (I actually did it both ways) was to also solve the problem of duplicate articles seen in many recipes where the same article is often listed in several feeds. The feed on Energy and the feed on Politics might have the same article listed about the new Energy Bill. I was thinking about how to solve both problems simultaneously.
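A rough sketch of the URL-history idea, assuming each feed is just a list of article URLs and the history is a set of URLs already seen. The names `drop_seen`, `feeds` and `history` are hypothetical, not calibre code. The same set handles both problems: articles from earlier runs and articles repeated across feeds in the current run.

```python
def drop_seen(feeds, seen_urls):
    """Return feeds with already-seen article URLs removed.

    `feeds` maps a feed title to a list of article URLs. `seen_urls` is a
    set holding the URL history; it is updated in place.
    """
    result = {}
    for title, urls in feeds.items():
        kept = []
        for url in urls:
            if url not in seen_urls:
                seen_urls.add(url)  # remember it for later feeds and runs
                kept.append(url)
        result[title] = kept
    return result

# Hypothetical data: one URL from a previous run, one shared by two feeds.
history = {"http://example.com/old"}
feeds = {
    "Energy": ["http://example.com/old", "http://example.com/energy-bill"],
    "Politics": ["http://example.com/energy-bill", "http://example.com/vote"],
}
deduped = drop_seen(feeds, history)
```

Here the Energy Bill article stays in the first feed that lists it and is dropped from the second.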
Old 11-16-2010, 04:18 AM   #19
oecherprinte
Zealot
Posts: 115
Karma: 20
Join Date: Jul 2010
Device: Kindle3 3G, Kindle Paperwhite 2
Quote:
Originally Posted by Starson17 View Post
Yes, but to do that you need to know the date of the last run, and that requires storing that date somewhere during the previous run and fetching it during the current run to do the comparison. That's exactly what my comment was about - passing data from one recipe run to the next one.
My idea was to look for the latest file that was generated for the feed from calibre. Then you could look up the date at which this file was generated which you can retrieve from the filesystem. I have not really looked at it, but this way you could avoid storing data in a separate history file for the feed. I am sure there is some python command to retrieve the date of a file.


Of course, the duplicate article problem cannot be solved that way.
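For reference, the file date mentioned above can be read with `os.path.getmtime` from the standard library. This is only a sketch: the temporary file stands in for the previously generated e-book, and where calibre actually puts that file is not covered here.

```python
import os
import tempfile
from datetime import datetime

def file_date(path):
    """Return the file's last-modification time as a datetime (local time)."""
    return datetime.fromtimestamp(os.path.getmtime(path))

# Demo on a temporary file standing in for the previously generated e-book.
with tempfile.NamedTemporaryFile(suffix=".mobi", delete=False) as f:
    demo_path = f.name
last_generated = file_date(demo_path)
os.remove(demo_path)
```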

Last edited by oecherprinte; 11-16-2010 at 04:22 AM.
Old 11-16-2010, 09:16 AM   #20
Starson17
Quote:
Originally Posted by oecherprinte View Post
My idea was to look for the latest file that was generated for the feed from calibre. Then you could look up the date at which this file was generated which you can retrieve from the filesystem. I have not really looked at it, but this way you could avoid storing data in a separate history file for the feed. I am sure there is some python command to retrieve the date of a file.
That will often work, but not always. If the file is being deleted when sent to the reader, there's no local file to get a date from. Another issue is that sometimes the articles in a feed have no date, so you can't compare article dates to the file date. Still another issue with dates is that the article date isn't necessarily tied to the date you parsed the feed. Articles with older dates are sometimes added to feeds.

Regardless of all that, I eventually decided I didn't like stripping articles, particularly since it had to be done at the individual recipe level. You'd either have two of every recipe, or some recipes would have this and some wouldn't. A feature like this should probably be implemented at a higher level. I was interested in how it could be done, but when I looked at it closely, it wasn't a feature I actually wanted.
Old 11-17-2010, 03:14 AM   #21
oecherprinte
Quote:
Originally Posted by Starson17 View Post
That will often work, but not always. If the file is being deleted when sent to the reader, there's no local file to get a date from. Another issue is that sometimes the articles in a feed have no date, so you can't compare article dates to the file date. Still another issue with dates is that the article date isn't necessarily tied to the date you parsed the feed. Articles with older dates are sometimes added to feeds.
Yup. It will also be a problem with different time zones even if the article dates are correct.
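A small illustration of the time zone problem, using made-up timestamps and fixed UTC offsets. Comparing the wall-clock values gives the wrong answer; comparing timezone-aware datetimes compares the actual instants.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets for the feed's zone and the local machine's zone.
berlin = timezone(timedelta(hours=1))
new_york = timezone(timedelta(hours=-5))

article = datetime(2010, 11, 17, 3, 0, tzinfo=berlin)      # 02:00 UTC on 11-17
last_run = datetime(2010, 11, 16, 22, 0, tzinfo=new_york)  # 03:00 UTC on 11-17

# Wall-clock comparison ignores the offsets and calls the article new.
naive_is_new = article.replace(tzinfo=None) > last_run.replace(tzinfo=None)
# Aware comparison sees that the article predates the last run.
aware_is_new = article > last_run
```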

Quote:
Originally Posted by Starson17 View Post
Regardless of all that, I eventually decided I didn't like stripping articles, particularly since it had to be done at the individual recipe level. You'd either have two of every recipe, or some recipes would have this and some wouldn't. A feature like this should probably be implemented at a higher level. I was interested in how it could be done, but when I looked at it closely, it wasn't a feature I actually wanted.
It would be nice if calibre itself supported this, so that you could just click a button in the GUI. I more or less solved the problem by creating a popup window in my recipe that prompts the user for the number of feeds to include. In my specific case (legal decisions), the articles are grouped into dated feeds, so I have to select the number of feeds.
Old 11-17-2010, 06:52 AM   #22
obiwan
Junior Member
Posts: 3
Karma: 10
Join Date: Nov 2010
Device: kindle
I have calibre running on my server to get news every day and it works great.

As others have pointed out, it would be great to receive only feeds with new content. Could this be solved simply by hashing the downloaded RSS, comparing the hash to a stored hash, and only downloading and sending the feed to the Kindle if a change is detected?

This sounds easy and would have great impact.
Old 11-17-2010, 11:25 AM   #23
Starson17
Quote:
Originally Posted by obiwan View Post
Could this be solved simply by hashing the downloaded RSS, comparing the hash to a stored hash, and only downloading and sending the feed to the Kindle if a change is detected?
No, you can't do it this way. Recipes vary, but most have a date or something else that changes and would change the hash. There are other ways to do it, such as date comparison and article URL history. The problem is mostly finding a developer who wants this enough to do the work.
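A sketch of why the raw hash fails, and of one variant that leans on the article URLs instead. The two feed strings are invented, and `feed_fingerprint` is a hypothetical helper, not something calibre provides.

```python
import hashlib

def feed_fingerprint(article_urls):
    """Hash only the article URLs, ignoring everything else in the feed."""
    digest = hashlib.md5()
    for url in sorted(set(article_urls)):
        digest.update(url.encode("utf-8"))
        digest.update(b"\n")  # separator so URLs cannot run together
    return digest.hexdigest()

# Same single article on both days; only the build date differs.
monday = "<rss><lastBuildDate>Mon</lastBuildDate><link>http://example.com/a</link></rss>"
tuesday = "<rss><lastBuildDate>Tue</lastBuildDate><link>http://example.com/a</link></rss>"

raw_changed = (hashlib.md5(monday.encode("utf-8")).hexdigest()
               != hashlib.md5(tuesday.encode("utf-8")).hexdigest())
urls_changed = (feed_fingerprint(["http://example.com/a"])
                != feed_fingerprint(["http://example.com/a"]))
```

The raw hash reports a change even though no article is new, while the URL-based fingerprint does not. This is really the article URL history approach in hashed form.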
Old 11-17-2010, 02:54 PM   #24
obiwan
Ah well... I'm a Java / C# developer. Out of curiosity I downloaded the sources and it seems Python is... well... different 8~(

Anyway, I see fetching is done in web/fetcher. Could you point me in the direction where this feature should be implemented?

I'm not promising anything, but I am a dev who really wants this feature :P
Old 11-17-2010, 03:39 PM   #25
Starson17
Quote:
Originally Posted by obiwan View Post
Ah well... I'm a Java / C# developer. Out of curiosity I downloaded the sources and it seems Python is... well... different 8~(

Anyway, I see fetching is done in web/fetcher. Could you point me in the direction where this feature should be implemented?

I'm not promising anything, but I am a dev who really wants this feature :P
calibre/web/feeds/news.py is where I'd probably start.
Perhaps calibre/web/feeds/feedparser.py as well. One of your first questions will be where to store info for whatever kind of comparison you want to do.

This is how I stored the last time a recipe of "recipe_name" ran and the URL of an article retrieved on that run:

Code:
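        import datetime

        # dynamic: the persistent store behind dynamic.pickle (see the edit note below)
        url = last_downloaded_article_url  # URL of an article fetched on this run
        now = datetime.datetime.now()
        dynamic['recipe_name']['last_time'] = now
        dynamic['recipe_name']['last_url'] = url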
You retrieve it with:
Code:
        last_time_this_recipe_ran = dynamic['recipe_name']['last_time']
Have fun.
Edit:
You may need to import pickle and open/load dynamic.pickle, which is where this sort of recipe-related history seems to be kept.
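A generic sketch of keeping this kind of history in a pickle file. It deliberately uses its own file, `recipe_history.pickle`, in a temporary directory. That name and the helpers `load_history` and `save_history` are assumptions for illustration; this is not how calibre reads or writes dynamic.pickle.

```python
import datetime
import os
import pickle
import tempfile

def load_history(path):
    """Return the stored history dict, or an empty dict if none exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)

def save_history(path, history):
    """Write the whole history dict back to disk."""
    with open(path, "wb") as f:
        pickle.dump(history, f)

# Hypothetical history file, kept separate from calibre's own data.
path = os.path.join(tempfile.mkdtemp(), "recipe_history.pickle")

history = load_history(path)
# setdefault creates the per-recipe dict on the first run.
history.setdefault("recipe_name", {})["last_time"] = datetime.datetime(2010, 11, 17, 15, 39)
save_history(path, history)

restored = load_history(path)
```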

Last edited by Starson17; 11-17-2010 at 04:09 PM.
Old 11-17-2010, 06:06 PM   #26
obiwan
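OK, this doesn't seem to be so hard, at least to some degree.

I added a unique_id method to the Article class:

        # needs "import hashlib" at the top of the module
        def unique_id(self):
            # Hash id, title and URL so the same article always gets the same ID
            md5 = hashlib.md5()
            md5.update(self.id)
            md5.update(self.title)
            md5.update(self.url)
            return md5.hexdigest()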
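A standalone variant of the same idea, in case some articles have no id or the title contains non-ASCII text. This is a sketch, not the Article class itself: `article_fingerprint` is a hypothetical free function, and treating a missing field as an empty string is an assumption.

```python
import hashlib

def article_fingerprint(article_id, title, url):
    """MD5 over id, title and URL; missing fields count as empty strings."""
    md5 = hashlib.md5()
    for part in (article_id, title, url):
        md5.update((part or "").encode("utf-8"))  # md5 needs bytes, not text
        md5.update(b"\0")  # field separator so "ab" + "c" differs from "a" + "bc"
    return md5.hexdigest()

# Hypothetical articles: same title, no id, two different URLs.
a = article_fingerprint(None, "Énergie", "http://example.com/a")
b = article_fingerprint(None, "Énergie", "http://example.com/a")
c = article_fingerprint(None, "Énergie", "http://example.com/b")
```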

Then, in news.py after line 920 I added:
        if a == 0:
            last_article = dynamic['recipe_' + self.title + 'last_article']
            if last_article is not None:
                # print(last_article)
                if last_article == article.unique_id():
                    print(" Nothing to do")
                    raise ValueError('No articles found, aborting')
            dynamic['recipe_' + self.title + 'last_article'] = article.unique_id()

The good: yay, if the latest article is the same one we downloaded last time, processing is aborted.

The bad: I see no other (easy) option than throwing an exception, which of course propagates to the UI. Any idea where / how to silently handle it?
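One general Python pattern for this, sketched outside calibre: raise a dedicated exception type and catch only that type one level up, so real errors still surface. The names `NoNewArticles`, `build_feed` and `run_recipe` are hypothetical, and where the matching catch would belong in calibre's code is not something this sketch answers.

```python
class NoNewArticles(Exception):
    """Hypothetical signal: the feed has nothing new since the last run."""

def build_feed(latest_id, last_seen_id):
    """Stand-in for the download step; aborts early when nothing is new."""
    if latest_id == last_seen_id:
        raise NoNewArticles("feed unchanged")
    return "downloaded up to %s" % latest_id

def run_recipe(latest_id, last_seen_id):
    """Caller catches only the dedicated exception and reports a skip."""
    try:
        return build_feed(latest_id, last_seen_id)
    except NoNewArticles:
        return None  # quiet skip; any other exception still propagates

skipped = run_recipe("abc", "abc")
fetched = run_recipe("def", "abc")
```

Using a dedicated class instead of ValueError keeps the "nothing to do" case distinguishable from genuine failures.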
All times are GMT -4.