Old 03-12-2008, 09:56 PM   #196
ddavtian
Addict
Posts: 271
Karma: 332
Join Date: Nov 2003
Location: San Francisco, USA
Device: Sage, Elipsa, Oasis, Galaxy Tab 8U, S22U

Hi guys.

I'd like to get my local newspaper onto the reader but haven't been able to. I used get_feeds to pull articles from the RSS page, without much luck: I only get the first article, padded with tons of unnecessary pages (I tried different patterns but couldn't clean up the text).

If anybody has some time, please take a look at this one feed (http://feeds.contracostatimes.com/mn...571/200819.xml).

Thanks a lot in advance,
David
Old 03-12-2008, 10:56 PM   #197
bobbyco57
Member
Posts: 13
Karma: 10
Join Date: Dec 2007
Device: PRS-505
This is really great. I did have a problem, however.

I first put the file on an SD card and was reading in "S" size mode. I pressed zoom, as I have limited vision and always need to zoom. After several seconds of the processing-arrow image on the screen, instead of coming back to Newsweek, the reader did a reset.

I then (1) removed the file from the SD card and (2) moved the file from my library to the 505's main memory. The first time I opened the book, the 505 reset before getting to the menu. I tried a second time and it behaved as it had on the SD card: I could navigate through the book in S, but the device reset when trying to redisplay after I pressed zoom.
Old 03-12-2008, 11:30 PM   #198
kovidgoyal
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Yeah, not much I can do about that; it's a bug in SONY's reader software. What you can do is use the Sony Connect software to transfer the file to your reader, so that all three sizes will have been pre-calculated. And when I actually release the software, you can set the base-font-size to whatever you like, so that you don't have to resize.
Old 03-14-2008, 01:01 AM   #199
Deputy-Dawg
Groupie
Posts: 153
Karma: 799
Join Date: Dec 2007
Device: sony prs505
Quote:
Originally Posted by ddavtian View Post
Hi guys.

I'd like to get my local newspaper onto the reader but haven't been able to. I used get_feeds to pull articles from the RSS page, without much luck: I only get the first article, padded with tons of unnecessary pages (I tried different patterns but couldn't clean up the text).

If anybody has some time, please take a look at this one feed (http://feeds.contracostatimes.com/mn...571/200819.xml).

Thanks a lot in advance,
David
The attached script will download the "Most Viewed" feed. I have thus far been unable to capture more than the lead article from the other feeds. There is some subtle difference in them that is eluding me.

But in any event it shows you how to clean up the file so that you get rid of the extra garbage, including the embedded "Advertisement" block.
Attached Files
File Type: zip C_Costa.py.zip (1.1 KB, 273 views)
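
The general idea of stripping page clutter before conversion can be sketched in plain Python. This is not the contents of C_Costa.py: the patterns, the sample HTML, and the helper name clean_html are all assumptions made up for illustration, and the real pages will need their own patterns.

```python
import re

# Hypothetical cleanup rules: each entry is (compiled pattern, replacement).
# The real markup on the newspaper's pages may differ; adjust the patterns.
CLEANUP_RULES = [
    # Drop <script> and <style> elements together with their contents.
    (re.compile(r'<(script|style)\b.*?</\1>', re.IGNORECASE | re.DOTALL), ''),
    # Drop a <div> whose only text is the word "Advertisement".
    (re.compile(r'<div[^>]*>\s*Advertisement\s*</div>', re.IGNORECASE), ''),
    # Collapse the runs of blank lines left behind by the removals.
    (re.compile(r'\n\s*\n+'), '\n'),
]

def clean_html(html):
    """Apply every cleanup rule in order and return the cleaned markup."""
    for pattern, replacement in CLEANUP_RULES:
        html = pattern.sub(replacement, html)
    return html

sample = """<html><body>
<script>track();</script>
<div class="ad">Advertisement</div>
<p>Story text.</p>
</body></html>"""

print(clean_html(sample))
```

Keeping the rules in one list makes it easy to add a pattern per annoyance and to test each one against a saved copy of a page.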
Old 03-14-2008, 09:56 AM   #200
ddavtian
Deputy-Dawg, thank you!

It's a lot of cleaning; I couldn't get even a small part of it. I have no idea what to do for only one article per section but this is already very good.

Thanks again for your help.
David
Old 03-14-2008, 07:48 PM   #201
ddavtian
Quote:
Originally Posted by ddavtian View Post
I have no idea what to do for only one article per section but this is already very good.
The only working feed for the Contra Costa Times comes from "http://extras.mnginteractive.com/live/xsl/memv/xml/571_most_viewed_rss.xml". When I try to get feeds from the newspaper's own site (http://feeds.contracostatimes.com/mn...571/200819.xml, for example), it brings back the first article only.

I tried another site with the same feeds (http://rss.mnginteractive.com/live/C...CN_1916854.xml), without much luck either: now it gets all the articles, but only the summaries.

All three sites forward to the main newspaper server for articles, but only the first one works correctly.
This is out of my league anyway.

Dawg, thanks again for your help.

Kovid, I moved from 0.4.38 to 0.4.42 and fetching news has become much faster: 30 minutes for NYTimes is down to a few minutes. Same thing for other sources.
Old 03-14-2008, 07:54 PM   #202
Deputy-Dawg
Quote:
Originally Posted by ddavtian View Post
Deputy-Dawg, thank you!

It's a lot of cleaning; I couldn't get even a small part of it. I have no idea what to do for only one article per section but this is already very good.

Thanks again for your help.
David

David,
I think I have resolved the problem with capturing more than one article in a feed. The problem is that web2lrf sees pubdate as having a different format in the first article in the feed than the format of pubdate in all of the other articles. What it sees as the pubdate in the first article is:

Fri, 14 Mar 2008 23:22:24 MDT or Fri, 14 2008 23:22:24 -000

While in all of the other articles it sees:

3/14/2008 01:37:26 AM GMT

There are a couple of solutions (workarounds), each of which has advantages and gotchas.

The first, and the easiest to implement, is to set use_pubdate = False, which tells the program to ignore the embedded pubdate and use the current machine time as the pubdate. This will permit capturing all of the articles in a feed, but you will have no record of when each one was published.

The second is to create a pubdate_fmt that matches the format of articles two and up. Now all of the articles captured will have their proper pubdates, with the penalty of not capturing the first article in the feed.

I have written a script and attached it to this message so you can test and see the results of this rather odd situation. In C_Costa_2.py there are two lines of code you are interested in:

Code:
    ##pubdate_fmt = '%m/%d/%Y %I:%M:%S %p %Z'
    use_pubdate = False
Configured as above, it will ignore the embedded pubdate and capture all of the articles in the feed(s).

Code:
    ##pubdate_fmt = '%m/%d/%Y %I:%M:%S %p %Z'
    ##use_pubdate = False
Configured this way it will only capture the first article in a feed.

Code:
    pubdate_fmt = '%m/%d/%Y %I:%M:%S %p %Z'
    ##use_pubdate = False
And configured this way it will capture all the articles except the first one in a feed.
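
The mismatch described above can be reproduced with Python's standard datetime.strptime, which raises ValueError when a string does not fit the given format. The sketch below is only an assumption about the mechanism, not web2lrf's actual code: parse_pubdate and keep_articles are hypothetical helpers, and dropping an article whose date fails to parse is a guess at why articles go missing.

```python
from datetime import datetime

# The format from C_Costa_2.py and the two date styles quoted above.
PUBDATE_FMT = '%m/%d/%Y %I:%M:%S %p %Z'
FIRST_ARTICLE = 'Fri, 14 Mar 2008 23:22:24 MDT'
OTHER_ARTICLES = '3/14/2008 01:37:26 AM GMT'

def parse_pubdate(text, fmt):
    """Return a datetime, or None when the text does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

def keep_articles(pubdates, fmt, use_pubdate=True):
    """Hypothetical filter: keep dates that parse, or everything if use_pubdate is off."""
    if not use_pubdate:
        return list(pubdates)
    return [d for d in pubdates if parse_pubdate(d, fmt) is not None]

feed = [FIRST_ARTICLE, OTHER_ARTICLES, OTHER_ARTICLES]
print(keep_articles(feed, PUBDATE_FMT))                     # first entry dropped
print(keep_articles(feed, PUBDATE_FMT, use_pubdate=False))  # everything kept
```

A single format string can only ever match one of the two styles, which is consistent with each setting losing either the first article or all of the others.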

I am not really convinced that there are two different pubdate formats in the feeds; I suspect we are looking at some other artifact that is confusing the matter for web2lrf. Hopefully Kovid will chime in and tell me what is wrong with my analysis and suggest a much more elegant fix. At least I hope so. In the meantime, here is a solution to your problem.
Attached Files
File Type: zip C_Costa_2.py.zip (1.2 KB, 270 views)

Last edited by Deputy-Dawg; 03-14-2008 at 07:56 PM. Reason: To mark code statements
Old 03-14-2008, 08:35 PM   #203
kovidgoyal
I would normally, but I'm hip deep in refactoring web2lrf. Hopefully, the new improved version will just automatically parse the date correctly.
Old 03-14-2008, 10:30 PM   #204
ddavtian
Thank you

Deputy-Dawg, thanks a lot.
It's perfectly fine to download all articles from the feed. I added my favorite sections and got the newspaper on the reader in a few minutes.

This is a great community.

David
Old 03-14-2008, 10:40 PM   #205
Rick C
Seeker
Posts: 53
Karma: 363
Join Date: Mar 2008
Location: Ontario, Canada
Device: Sony PRS-505
Has anybody written a tutorial for this web2lrf program? One geared towards those of us who are clumsy around console commands would be especially nice.
I haven't bothered much with RSS up until now, but what I would like to do is save text-based websites, links and all, onto my PRS-505 so I can read them at my leisure. Is that something I can do with this program as well?
Old 03-15-2008, 12:16 PM   #206
Deputy-Dawg
Quote:
Originally Posted by kovidgoyal View Post
I would normally, but I'm hip deep in refactoring web2lrf. Hopefully, the new improved version will just automatically parse the date correctly.
And it is difficult, at times, to remember that the task is to drain the swamp when you are up to your A** in alligators.

I genuinely appreciate the fact that you are re-doing web2lrf. But I am darn curious as to how it can see two different formats for pubdate in articles in the same feed. This is especially so since I spent most of last evening studying the News feed and, for the life of me, I can see no difference between the first article and the second. On the other hand, I am just a neophyte at parsing RSS feeds. Is there a document that would give me a "road map"?
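
As a starting point for that kind of inspection, the Python standard library can list what each item in a feed actually carries. The sketch below parses a small inline RSS 2.0 document (made-up sample data, not the newspaper's feed) and prints every item's title and raw pubDate so the strings can be compared side by side; list_pubdates is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up sample in RSS 2.0 layout: rss > channel > item > title/pubDate.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First story</title>
      <pubDate>Fri, 14 Mar 2008 23:22:24 MDT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <pubDate>Fri, 14 Mar 2008 22:10:05 MDT</pubDate>
    </item>
  </channel>
</rss>"""

def list_pubdates(rss_text):
    """Return (title, raw pubDate string) for every item in the feed."""
    root = ET.fromstring(rss_text)
    rows = []
    for item in root.iter('item'):
        title = item.findtext('title', default='')
        pubdate = item.findtext('pubDate', default='')
        rows.append((title.strip(), pubdate.strip()))
    return rows

for title, pubdate in list_pubdates(SAMPLE_RSS):
    print(f'{title!r}: {pubdate!r}')
```

Running the same function on a saved copy of the real feed would show whether the first item's pubDate truly differs from the rest or whether the difference is introduced later in processing.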
Old 03-15-2008, 12:48 PM   #207
kovidgoyal
Looking at one of those feeds, it doesn't seem like the date formats are different. But the new infrastructure has code for auto-detecting date formats based on the RSS/ATOM specifications, so hopefully you won't need to specify a pubdate format anymore.
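
One way such auto-detection could work, sketched with the standard library only: RSS 2.0 specifies RFC 822 style dates and Atom specifies RFC 3339 timestamps, so a parser can try each in turn. This is an assumption about the approach, not the new web2lrf code, and parse_feed_date is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_tz

def parse_feed_date(text):
    """Try RFC 822 (RSS) first, then RFC 3339 (Atom); return an aware datetime or None."""
    text = text.strip()
    # RFC 822 style, e.g. 'Fri, 14 Mar 2008 23:22:24 MDT'.
    parts = parsedate_tz(text)
    if parts is not None:
        offset = parts[9] or 0  # seconds east of UTC; None means unknown zone
        tz = timezone(timedelta(seconds=offset))
        return datetime(*parts[:6], tzinfo=tz)
    # RFC 3339 style, e.g. '2008-03-15T05:22:24Z'.
    try:
        return datetime.fromisoformat(text.replace('Z', '+00:00'))
    except ValueError:
        return None

print(parse_feed_date('Fri, 14 Mar 2008 23:22:24 MDT'))
print(parse_feed_date('2008-03-15T05:22:24Z'))
print(parse_feed_date('3/14/2008 01:37:26 AM GMT'))  # neither spec: None
```

Note that the "3/14/2008 01:37:26 AM GMT" style reported earlier in the thread follows neither specification, so a spec-based detector like this one would still need an explicit format for it.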
Old 03-15-2008, 07:51 PM   #208
kovidgoyal
Here's The Atlantic with some more refinements to the news fetching code. Again, comments are welcome.
Attached Files
File Type: lrf The Atlantic [March 2008].lrf (1.80 MB, 274 views)
Old 03-15-2008, 10:19 PM   #209
ddavtian
Kovid, let me know if you need a tester for the new web2lrf.

On a related note, the Economist was working in 0.4.38 but not in 0.4.42.
Old 03-16-2008, 01:01 AM   #210
kovidgoyal
The Economist has been rewritten for the new code. It works now. I'll post links to beta builds here once the new code is ready.

Tags
libprs500, web2lrf


All times are GMT -4.