MobileRead Forums > E-Book Readers > Sony Reader

Old 11-01-2006, 07:37 PM   #1
heavyB
Member
Posts: 23
Karma: 47
Join Date: Oct 2006
Device: Sony Reader/Treo 600
Full Newspaper - The Christian Science Monitor

I love rss2book, which geekraver put together, but I kept longing to read whole articles, not just the RSS feed excerpts. The idea of downloading a full newspaper and reading it over a cup of coffee is the ultimate Sunday morning for me. I've found a way to download one of my favorite papers in a highly readable form, with a table of contents, ready in 2 minutes.

After searching for RSS feeds that deliver full news articles (I found none), I found that one of my favorite newspapers offers a "text version" of its site. This is my roundabout way to get the whole content of the online version of The Christian Science Monitor onto my Sony Reader. Feel free to offer suggestions; I am by no means a programmer, just a hack. Of course, this is all for naught if everyone out there but me knows of sites that offer full news text in RSS.

I'm using Windows XP, Firefox (I'm using 2.0), HTMLdoc (see this post) and TextPad (free @ http://www.textpad.com )

Download the attached "getcs.bat" file and place it in the directory where HTMLdoc is installed (usually c:\program files\HTMLdoc).

Open Firefox and browse to: http://www.csmonitor.com/cgi-bin/red...pl?textEdition

Right-click the newly loaded page and select "View Page Info". This pops up a dialog box with tabs along the top; select the "Links" tab, which lists every link on the page. Drag the window a bit bigger so you can see what you've got going on in here. You'll notice where the category links end and the stories begin. The article URLs all contain a year, like this: http://www.csmonitor.com/2006/1102/p13s01-lign.htm . I select the first article link by clicking on it, then scroll to the end of the article link list and, while holding the [Shift] key, click the last article. You should now have all the article links selected. Right-click the selection and left-click "Copy".
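For anyone who would rather script the link selection than Shift-click it, here is a minimal Python sketch. It assumes every article URL follows the shape of the one example above (a four-digit year, a four-digit month/day, then a file name); the pattern and the name `article_links` are illustrative guesses, not anything published by the site.

```python
import re

# Assumed pattern, inferred from the single example URL in the post:
# http://www.csmonitor.com/2006/1102/p13s01-lign.htm
ARTICLE_RE = re.compile(
    r"^https?://www\.csmonitor\.com/\d{4}/\d{4}/[^/\s]+\.html?$"
)


def article_links(links):
    """Return the links that look like articles, in order, without duplicates."""
    seen = set()
    result = []
    for link in links:
        link = link.strip()  # pasted links often carry stray whitespace
        if ARTICLE_RE.match(link) and link not in seen:
            seen.add(link)
            result.append(link)
    return result
```

Feed it the full list copied from the "Links" tab (one URL per list item) and it keeps only the entries that look like stories, dropping category links.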

Open the "getcs.bat" file you downloaded (attached below) with TextPad. You'll see "[PASTE CSMONITOR LINKS HERE]" in the text. Delete this, leaving the space after the text "http://www.csmonitor.com/cgi-bin/redirect.pl?textEdition". This is where we paste the links from Firefox: right-click in TextPad and left-click "Paste".

Right-click again and select "Reformat". This is important: it strips the return characters from the Firefox link paste. (You may need to have word wrap on in TextPad to see what you're doing, which is fine; word wrap has no effect on the saved file.)
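The paste-and-reformat step boils down to putting all the links on one line, separated by spaces. A rough Python equivalent is sketched below; the function names are made up for illustration, and the only thing taken from the .bat file is the "[PASTE CSMONITOR LINKS HERE]" placeholder text.

```python
PLACEHOLDER = "[PASTE CSMONITOR LINKS HERE]"


def join_links(pasted):
    """Collapse a multi-line paste into one space-separated line.

    str.split() with no arguments splits on any whitespace, so both
    Windows (\r\n) and Unix (\n) line endings are handled.
    """
    return " ".join(pasted.split())


def fill_template(template, pasted):
    """Swap the placeholder in the batch file text for the joined links."""
    return template.replace(PLACEHOLDER, join_links(pasted))
```

This does the same job as TextPad's reformat command here: the links end up on a single line so the batch file still holds one command.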

Save the edited file by clicking "File" and then "Save" at the top left of TextPad. If you've ever seen the config file for rss2book, you'll see that I borrow heavily from geekraver for my HTMLdoc settings.

To run, double-click "getcs.bat". A quick warning regarding .bat files, by the way: malicious folks can put nasty things in these files, and you should never run one without viewing it first. You can see that in this .bat file the only program being run is HTMLdoc. It should create a "csmonitor.pdf" file in the same directory HTMLdoc is installed in.
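The contents of getcs.bat are not reproduced in this thread, so the following is only a sketch of the kind of command it presumably runs, written in Python rather than batch. It uses two HTMLdoc options, `--webpage` (treat the inputs as unstructured web pages) and `-f` (output file); the real file borrows more settings from rss2book, and those are not shown here. The function names are hypothetical.

```python
import subprocess


def build_command(urls, output="csmonitor.pdf"):
    """Build an htmldoc argument list for the given page URLs.

    Assumes the htmldoc executable is on PATH or in the current directory.
    """
    return ["htmldoc", "--webpage", "-f", output, *urls]


def make_pdf(urls, output="csmonitor.pdf"):
    """Run htmldoc; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(urls, output), check=True)
```

Printing `build_command(...)` before calling `make_pdf(...)` is the scripted version of the advice above: look at exactly what will run before you run it.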

That should do it. Excuse me if I rambled, was too simplistic, or didn't explain enough.

I've attached the getcs.bat file, a getcsSample.bat you can copy and run right away, and a sample csmonitor.pdf.
Attached Files
File Type: bat getcs.bat (308 Bytes, 328 views)
File Type: pdf csmonitor.pdf (959.3 KB, 2028 views)
File Type: bat getcsSample.bat (3.4 KB, 347 views)
Old 11-01-2006, 08:21 PM   #2
geekraver
Addict
Posts: 364
Karma: 1035291
Join Date: Jul 2006
Location: Redmond, WA
Device: iPad Mini,Kindle Paperwhite
Actually I've been thinking about doing something vaguely like this. I recently saw a set of scripts someone put together to pull Wikipedia content together (kind of like web spidering, starting with a small set of articles) and put it on an iPod. I thought that would be cool to do, and ultimately it should be generalized. One approach wrt RSS would be to flag each feed as being complete or partial and, for the partial ones, follow the links to the full text. This might require some additional configuration to know how to extract the signal from the noise of the full pages, but in many cases this could be done just by scanning for appropriate DIV tag class attributes.
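As a rough illustration of the "scan for DIV class attributes" idea, here is a small Python sketch built on the standard library's `html.parser`. It is not how rss2book works; the class and function names are invented for the example, and the class name you pass in would have to be found by inspecting each site's pages.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text found inside the first-level <div> with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # > 0 while we are inside the target div
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # nested div inside the target
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, target_class):
    """Return the text inside every div carrying target_class."""
    parser = DivTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Everything outside the matching div (navigation, ads, footers) is dropped, which is the "signal from the noise" step described above. Real pages are messier than this handles, so treat it as a starting point.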
Old 11-01-2006, 11:31 PM   #3
neilm2
Enthusiast
Posts: 35
Karma: 12
Join Date: Oct 2006
Device: Amazon Kindle, Sony Reader
You rock, Heavy B! Now I'm searching around for other newspapers that offer text-only versions that work with this .bat file.
Old 11-01-2006, 11:37 PM   #4
neilm2
BBC News works pretty well...
http://news.bbc.co.uk/2/low/default.stm
Old 11-02-2006, 12:47 AM   #5
heavyB
Thanks, neilm2, and nice find on that BBC link. They're few and far between. The New York Times has a nice print-only format, but only after loading the full story page (no index).

Geekraver, I fully agree with what you're saying here: basically a scraper app or scraping service with individual profiles for different web sites. Heh, an online scraping service wouldn't last long, but if we had a combo app that fetched updated web site profiles from an online service and worked with an app like your rss2book (scrape2book?), there wouldn't be much trouble with getting shut down by the ad mongers (the real reason sites don't serve full-text RSS or offer text-only services).
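To make the "individual profiles for different web sites" idea concrete, here is a hedged Python sketch of what one profile record and a lookup might look like. The two index URLs come from this thread; the field names, the `content_class` values, and the BBC article pattern are placeholders invented for illustration and would need to be worked out per site.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SiteProfile:
    index_url: str        # page that lists the day's articles
    article_pattern: str  # regex that article links are expected to match
    content_class: str    # class of the div holding the article text


# Hypothetical profile table; only the index URLs are taken from the thread.
PROFILES = {
    "www.csmonitor.com": SiteProfile(
        index_url="http://www.csmonitor.com/cgi-bin/redirect.pl?textEdition",
        article_pattern=r"/\d{4}/\d{4}/[^/]+\.html?$",
        content_class="PLACEHOLDER",
    ),
    "news.bbc.co.uk": SiteProfile(
        index_url="http://news.bbc.co.uk/2/low/default.stm",
        article_pattern=r"PLACEHOLDER",
        content_class="PLACEHOLDER",
    ),
}


def profile_for(url):
    """Return the profile for the URL's host, or None if the site is unknown."""
    return PROFILES.get(urlparse(url).hostname)
```

Keeping the per-site knowledge in plain data like this is what would let profiles be updated separately from the app itself.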

I'm not too shabby at web app dev (mostly CFML) and parsing, but I have little to no standalone app dev experience. This of course should be moved to the dev subcategory of the forum. I'd be interested in discussing it further.
Old 11-02-2006, 03:20 AM   #6
geekraver
Okay, I'm about to post an updated version of rss2book. It certainly doesn't do everything but it has enough added functionality that you can make a nice PDF of BBC news.
Old 11-02-2006, 11:21 AM   #7
neilm2
That's great, Geekraver! I'm looking forward to it.
Old 11-04-2006, 03:52 AM   #8
geekraver
It's not the full paper; just the main world news, but import the XML file below into rss2book release 7 and you're on your way!

Attached Files
File Type: xml Christian Science Monitor World.xml (467 Bytes, 591 views)