03-12-2008, 09:56 PM | #196 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Profile Request
Hi guys.
I'd like to get my local newspaper onto the reader but couldn't do it. I used get_feeds to get articles from the RSS page, but without much luck. I'm only getting the first article, with tons of unnecessary pages (I tried different patterns but couldn't clean the text). If anybody has some time, please take a look at this feed (http://feeds.contracostatimes.com/mn...571/200819.xml). Thanks a lot in advance, David |
03-12-2008, 10:56 PM | #197 |
Member
Posts: 13
Karma: 10
Join Date: Dec 2007
Device: PRS-505
|
This is really great. I did have a problem however.
I first put the file on an SD card and was reading in "S" size mode. I pressed zoom, as I have limited vision and always need to zoom. After several seconds of the processing arrow image on the screen, instead of coming back to Newsweek the reader did a reset. I then 1) removed the file from the SD card and 2) moved the file from my library to the 505's main memory. The first time I opened the book, the 505 reset before getting to the menu. I tried a second time and it acted as it had on the SD card: I could navigate through the book on S, but the device reset when trying to redisplay after I pressed zoom. |
03-12-2008, 11:30 PM | #198 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yeah, not much I can do about that; it's a bug in SONY's reader software. What you can do is use the Sony Connect software to transfer the file to your reader, so that all three sizes will have been pre-calculated. And when I actually release the software you can set base-font-size to whatever you like, so that you don't have to resize.
|
03-14-2008, 01:01 AM | #199 | |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Quote:
But in any event it shows you how to clean up the file so that you get rid of the extra garbage, including the embedded "Advertisement" block. |
|
03-14-2008, 09:56 AM | #200 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Deputy-Dawg, thank you!
That's a lot of cleaning; I couldn't get even a small part of it. I have no idea what to do about getting only one article per section, but this is already very good. Thanks again for your help. David |
03-14-2008, 07:48 PM | #201 | |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Quote:
I tried another site with the same feeds (http://rss.mnginteractive.com/live/C...CN_1916854.xml), without much luck. Now it gets all the articles, but only the summaries. All three sites forward to the main newspaper server for articles, but only the first one works correctly. This is out of my league anyway. Dawg, thanks again for your help. Kovid, I moved from 0.4.38 to 0.4.42, and fetching news has become much faster. 30 minutes for NYTimes is down to a few minutes. Same thing for other sources. |
|
03-14-2008, 07:54 PM | #202 | |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Quote:
David, I think I have resolved the problem with capturing more than one article in a feed. The problem is that web2lrf sees pubdate as having a different format in the first article in the feed than in all of the other articles. What it sees as the pubdate in the first article is:
Fri, 14 Mar 2008 23:22:24 MDT
or
Fri, 14 2008 23:22:24 -000
while in all of the other articles it sees:
3/14/2008 01:37:26 AM GMT
There are a couple of solutions (workarounds), each of which has advantages and gotchas. The first, and easiest to implement, is to set use_pubdate = False, which tells the program to ignore the embedded pubdate and use the current machine time as the pubdate. This will permit capturing all of the articles in a feed, but you will have no record of when each was published. The second is to create a pubdate_fmt which matches the format of articles two and up. Now all of the articles captured will have their appropriate pubdates, with the penalty of not capturing the first article in the feed.
I have written a script and attached it to this message so that you can test and see the results of this rather odd situation. In C_Cost_2.py there are two lines of code you are interested in:
Code:
##pubdate_fmt = '%m/%d/%Y %I:%M:%S %p %Z'
use_pubdate = False
Code:
##pubdate_fmt = '%m/%d/%Y %I:%M:%S %p %Z'
##use_pubdate = False
Code:
pubdate_fmt = '%m/%d/%Y %I:%M:%S %p %Z'
##use_pubdate = False
I am not really convinced that there are two different pubdate formats in the feeds; I suspect we are looking at some other artifact that is confusing the matter for web2lrf. Hopefully Kovid will chime in and tell me what is wrong with my analysis and suggest a much more elegant fix. At least I hope so. In the meantime, here is a solution to your problem.
Last edited by Deputy-Dawg; 03-14-2008 at 07:56 PM. Reason: To mark code statements |
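The mismatch between the two date strings can be checked outside web2lrf. This is only a minimal sketch: it assumes pubdate_fmt is interpreted with Python's strptime directives (the format string suggests so, but nothing in this thread confirms it), and the helper `matches` is a hypothetical name, not part of web2lrf.

```python
from datetime import datetime

# Format taken from the profile: month/day/year, 12-hour clock, AM/PM, zone name.
PUBDATE_FMT = '%m/%d/%Y %I:%M:%S %p %Z'


def matches(fmt, text):
    """Return True if `text` parses under the strptime format `fmt`."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


# The two date strings reported for the feed.
first_article = 'Fri, 14 Mar 2008 23:22:24 MDT'
later_article = '3/14/2008 01:37:26 AM GMT'

print(matches(PUBDATE_FMT, first_article))  # False
print(matches(PUBDATE_FMT, later_article))  # True
```

If the assumption holds, a single format string can only ever match one of the two shapes, which would be consistent with the first article being lost when pubdate_fmt is set for the later ones.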
|
03-14-2008, 08:35 PM | #203 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I would normally, but I'm hip deep in refactoring web2lrf. Hopefully, the new improved version will just automatically parse the date correctly.
|
03-14-2008, 10:30 PM | #204 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Thank you
Deputy-Dawg, thanks a lot.
It's perfectly fine to download all articles from the feed. I added my favorite sections and got the newspaper on the reader in a few minutes. This is a great community. David |
03-14-2008, 10:40 PM | #205 |
Seeker
Posts: 53
Karma: 363
Join Date: Mar 2008
Location: Ontario, Canada
Device: Sony PRS-505
|
Has anybody written a tutorial for this web2lrf program? One that is geared towards those of us that are clumsy around console commands would be especially nice.
I haven't bothered much with RSS up until now, but what I would like to do is save text based websites with links and all into my PRS-505 so I can read them at my leisure. Is that something I can do with this program as well? |
03-15-2008, 12:16 PM | #206 | |
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
|
Quote:
I genuinely appreciate the fact that you are re-doing web2lrf. But I am darn curious as to how it can see two different formats for pubdate in articles in the same feed. This is especially so since I spent most of last evening studying the News feed and for the life of me I can see no difference between the first article and the second. On the other hand, I am just a neophyte in parsing RSS feeds. Is there a document that would give me a "road map"? |
|
03-15-2008, 12:48 PM | #207 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Looking at one of those feeds, it doesn't seem like the date formats are different. But the new infrastructure has code for auto-detecting date formats based on the RSS/ATOM specifications, so hopefully you won't need to specify a pubdate format anymore.
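For illustration only, here is one way such auto-detection could work. This is not the code in the new infrastructure; `parse_pubdate` and `FALLBACK_FORMATS` are hypothetical names. The sketch tries the RFC 822 date form that RSS 2.0 specifies, then the RFC 3339 form that Atom specifies, and finally a list of known odd formats such as the one reported in this thread.

```python
from datetime import datetime, timezone
from email.utils import mktime_tz, parsedate_tz

# Hypothetical list of non-standard formats seen in feeds.
FALLBACK_FORMATS = ['%m/%d/%Y %I:%M:%S %p %Z']


def parse_pubdate(text):
    """Return an aware UTC datetime for `text`, or None if nothing matches."""
    text = text.strip()

    # RSS 2.0 dates follow RFC 822, e.g. 'Fri, 14 Mar 2008 23:22:24 MDT'.
    parsed = parsedate_tz(text)
    if parsed is not None:
        return datetime.fromtimestamp(mktime_tz(parsed), tz=timezone.utc)

    # Atom dates follow RFC 3339, e.g. '2008-03-14T01:37:26Z'.
    try:
        iso = text[:-1] + '+00:00' if text.endswith('Z') else text
        result = datetime.fromisoformat(iso)
        if result.tzinfo is None:
            # Assumption: a date without an offset is treated as UTC.
            result = result.replace(tzinfo=timezone.utc)
        return result.astimezone(timezone.utc)
    except ValueError:
        pass

    for fmt in FALLBACK_FORMATS:
        try:
            # strptime returns a naive datetime here; treating it as UTC is
            # an assumption that only holds for 'GMT' strings.
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    return None


print(parse_pubdate('Fri, 14 Mar 2008 23:22:24 MDT'))
print(parse_pubdate('3/14/2008 01:37:26 AM GMT'))
```

Trying the standard forms first keeps the fallback list short, and returning None rather than raising lets a caller decide whether to skip the article or stamp it with the current time.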
|
03-15-2008, 07:51 PM | #208 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Here's The Atlantic with some more refinements to the news fetching code. Again, comments are welcome.
|
03-15-2008, 10:19 PM | #209 |
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U
|
Kovid, let me know if you need a tester for the new web2lrf.
On a related note, Economist was working in 0.4.38, not in 0.4.42. |
03-16-2008, 01:01 AM | #210 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Economist has been re-written for the new code. It works now. I'll post links to beta builds here once the new code is ready.
|
Tags |
libprs500, web2lrf |
|