Old 11-01-2006, 07:37 PM   #1
heavyB
Member
Posts: 23
Karma: 47
Join Date: Oct 2006
Device: Sony Reader/Treo 600
Full Newspaper - The Christian Science Monitor

I love rss2book, which geekraver put together, but I kept longing to read whole articles, not just what the RSS feeds carry. Downloading a full newspaper and reading it over a cup of coffee is the ultimate Sunday morning for me. I've found a way to download one of my favorite papers from the web: highly readable, with a table of contents, and ready in 2 minutes.

After searching for RSS feeds that deliver full news articles (I found none), I discovered that one of my favorite newspapers offers a "text version" of its site. This is my roundabout way of getting the whole content of the online version of The Christian Science Monitor onto my Sony Reader. Feel free to offer suggestions; I am by no means a programmer, just a hack. Of course, this is all for naught if everyone out there but me knows of sites that publish full news text in RSS.

I'm using Windows XP, Firefox 2.0, HTMLdoc (see this post) and TextPad (free at http://www.textpad.com).

Download the attached "getcs.bat" file and place it in the directory where HTMLdoc is installed (usually c:\program files\HTMLdoc).

Open Firefox and browse to: http://www.csmonitor.com/cgi-bin/red...pl?textEdition

Right click on the newly loaded page and select "View Page Info". This pops up a dialog box with tabs along the top; select the "Links" tab, which lists every link on the page. Drag the window a bit bigger so you can see what you've got going on in there. You'll notice where the category links end and the stories begin. The article links all have a year in the URL, like this: http://www.csmonitor.com/2006/1102/p13s01-lign.htm . Select the first article link by clicking on it, then scroll to the end of the article link list and, while holding the [Shift] key, click the last article link. All of the article links should now be selected. Right click on the selection and left click on "Copy".
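If picking the article links out by eye gets tedious, the same selection can be sketched in a few lines of Python. This is only an illustration, not part of the attached files: the function name filter_article_links is made up, and it assumes every article URL has the /YYYY/MMDD/ shape of the example above while the category links do not.

```python
import re

# Assumption: article URLs look like the example in the post,
# http://www.csmonitor.com/2006/1102/p13s01-lign.htm
# (a four-digit year, a four-digit month/day, then a .htm or .html page).
ARTICLE_PATTERN = re.compile(
    r"^https?://www\.csmonitor\.com/\d{4}/\d{4}/\S+\.html?$"
)


def filter_article_links(links):
    """Return the links that look like articles, in order, without duplicates."""
    seen = set()
    articles = []
    for link in links:
        link = link.strip()
        if ARTICLE_PATTERN.match(link) and link not in seen:
            seen.add(link)
            articles.append(link)
    return articles


if __name__ == "__main__":
    copied = [
        "http://www.csmonitor.com/cgi-bin/redirect.pl?textEdition",
        "http://www.csmonitor.com/2006/1102/p13s01-lign.htm",
    ]
    for url in filter_article_links(copied):
        print(url)
```

With something like this you could copy every link from the "Links" tab and let the pattern drop the category links, rather than shift-clicking the right range.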

Open the "getcs.bat" file you downloaded from the link below in TextPad. You'll see "[PASTE CSMONITOR LINKS HERE]" in the text. Delete this, leaving the space after the text "http://www.csmonitor.com/cgi-bin/redirect.pl?textEdition". This is where the links from Firefox go: right click in TextPad and left click "Paste".

Right click again and select "Reformat". This is important: it strips the return characters from the pasted Firefox links, so that they all end up on one line. (You may need to turn word wrap on in TextPad to see what you're doing, which is fine; word wrap has no effect on the saved file.)
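For anyone without TextPad, here is a rough Python sketch of what the "Reformat" step achieves as described above: the pasted links, one per line, become a single line separated by spaces. The function name join_links is hypothetical, and this only mimics the effect described in this post, not TextPad's actual reformat logic.

```python
def join_links(pasted_text):
    """Collapse a paste of one-link-per-line text into one space-separated line.

    str.split() with no arguments splits on any run of whitespace,
    so Windows (\r\n) and Unix (\n) line endings and blank lines all vanish.
    """
    return " ".join(pasted_text.split())


if __name__ == "__main__":
    pasted = (
        "http://www.csmonitor.com/2006/1102/p13s01-lign.htm\r\n"
        "http://www.csmonitor.com/cgi-bin/redirect.pl?textEdition\r\n"
    )
    print(join_links(pasted))
```

The single-line result is what gets pasted into the .bat file in place of "[PASTE CSMONITOR LINKS HERE]".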

Save the edited file by clicking "File" and then "Save" at the top left of TextPad. If you've ever seen the config file for rss2book, you'll see that I borrow heavily from geekraver for my HTMLdoc settings.

To run it, double click "getcs.bat". A quick warning regarding .bat files, by the way: malicious folks can put nasty things in these files, and you should never run one without viewing it first. You can see that in this .bat file the only program being run is HTMLdoc. It should create a "csmonitor.pdf" file in the same directory HTMLdoc is installed in.
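To give an idea of the kind of command a file like this runs, here is a hedged Python sketch that builds an HTMLdoc command line from a list of links. It is not the contents of getcs.bat, whose settings are not reproduced in this post. The function name build_htmldoc_command is made up, and the options shown (--webpage, -t pdf, -f) are just a plain choice from HTMLdoc's command-line options, assuming the htmldoc program can be found on your PATH.

```python
import subprocess


def build_htmldoc_command(urls, output="csmonitor.pdf", exe="htmldoc"):
    """Build (but do not run) an HTMLdoc command that renders urls into one PDF.

    Assumed options, not taken from getcs.bat:
      --webpage  treat the inputs as unstructured web pages
      -t pdf     write PDF output
      -f NAME    write the result to the file NAME
    """
    if not urls:
        raise ValueError("at least one URL is required")
    return [exe, "--webpage", "-t", "pdf", "-f", output, *urls]


if __name__ == "__main__":
    cmd = build_htmldoc_command(
        ["http://www.csmonitor.com/2006/1102/p13s01-lign.htm"]
    )
    print(" ".join(cmd))
    # Uncomment to actually run it (requires HTMLdoc to be installed):
    # subprocess.run(cmd, check=True)
```

Printing the command before running it is in the same spirit as the warning above: you get to see exactly which program will be started and with which arguments.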

That should do it. Excuse me if I rambled, was too simplistic, or didn't explain enough.

I've attached the getcs.bat file, a getcsSample.bat you can copy and run right away, and a sample csmonitor.pdf.
Attached Files
File Type: bat getcs.bat (308 Bytes, 360 views)
File Type: pdf csmonitor.pdf (959.3 KB, 2076 views)
File Type: bat getcsSample.bat (3.4 KB, 389 views)