The Daily iLiadian

scotty1024 · 09-29-2006, 06:53 PM

With some issues we have iRex standing firmly on our throats holding us down.

But with other issues we can take matters into our own hands, sources or no sources, SDK or no SDK.

I'm attaching a sneak peek at what my new command line tool is presently capable of doing. Yes, it's pretty lame, but I'm just one breakthrough away from it looking seriously better!

I could have waited on that breakthrough, but I figured everyone could use a ray of hope about now, even if it's presently a lame looking ray of hope. For sure we aren't getting any rays, lame or otherwise, out of iRex.

So I call it "The Daily iLiadian". It pulls an RSS XML file, parses it into items. It then builds a directory for each item in the output directory and begins work on the RSS link for that item.
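
For anyone wanting to try the first step themselves, here is a minimal Python sketch of "parse the RSS into items". It is not the actual tool (its source isn't posted here); the function name `parse_rss_items`, the sample feed, and the `itemNNN` directory naming are all made up for illustration, and it assumes a plain RSS 2.0 feed without XML namespaces.

```python
import xml.etree.ElementTree as ET

def parse_rss_items(rss_xml):
    """Return a list of {title, link, description} dicts from RSS 2.0 text."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext() returns None for a missing child, hence the "or".
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items

# Hypothetical two-item feed, only for demonstration.
SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First</title><link>http://example.com/1</link>
<description>One</description></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

# One output directory name per item (the naming scheme is an assumption).
for number, entry in enumerate(parse_rss_items(SAMPLE), start=1):
    print("item%03d" % number, entry["title"], entry["link"])
```

RSS 1.0 (RDF) and Atom feeds use namespaced elements and would need different tag names.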

It pulls down the HTML from the RSS link and parses it. Each IMG tag is processed to copy down the image file and place it into the item directory with a re-written name (so there won't be any naming conflicts in the images). It then re-writes the IMG tag(s) to point to the new local copy and writes out the HTML as index.html in the item's directory.
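
The IMG re-writing described above could be sketched roughly like this in Python. This is an illustration, not the tool's real code: it uses a regular expression instead of a proper HTML parser (so it will miss unquoted src attributes and can be fooled by odd markup), the `imgNNN` naming scheme is an assumption, and the actual download of each image is left out.

```python
import re

# Matches the part of an IMG tag up to and including a quoted src value.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)

def localize_images(html):
    """Rewrite each IMG src to a unique local name.

    Returns (new_html, mapping), where mapping is original URL -> local
    file name. A caller would fetch each URL and save it under that name.
    """
    mapping = {}

    def replace(match):
        url = match.group(3)
        if url not in mapping:
            # Keep a short file extension if the URL ends with one.
            ext = re.search(r'\.(\w{1,5})(?:\?.*)?$', url)
            suffix = "." + ext.group(1) if ext else ""
            mapping[url] = "img%03d%s" % (len(mapping) + 1, suffix)
        quote = match.group(2)
        return match.group(1) + quote + mapping[url] + quote

    return IMG_SRC.sub(replace, html), mapping
```

Numbering the images instead of reusing their original file names is what avoids conflicts when two sites both serve something like logo.gif.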

After all items are processed a main index.html with the RSS contents and re-written links to the local items is written along with a manifest.xml to wrap it all up nicely for the iLiad.

I've still got some massaging to do on the HTML, so many of the items have HTML issues that keep them from showing properly. But the Boston Globe item is mostly viewable.

Enjoy and everyone have a happier weekend (I hope).

Riocaz · 09-30-2006, 05:30 AM

Scotty, that looks really quite good.

emkay · 09-30-2006, 06:17 AM

Great work Scotty,
I will try to check that out this weekend...

jæd · 09-30-2006, 08:26 AM

I've been thinking along the same ideas...

See the attached tarball... It's basically the BBC RSS feed parsed and then downloading the low-bandwidth version of each page. I'll play with it more later, but this is my proof-of-concept, daily paper version...

Oh, if you haven't paid your BBC license fee, please don't download this. And if the supplied HTML bricks your iLiad, then please contact iRex, and not me...!

http://208.254.38.124/pub/bbc_news.tar.gz

scotty1024 · 09-30-2006, 05:46 PM

@jaed

Good work! Yours is lightweight, but the iLiad does have a pretty hefty browser; it can handle more than mobile-sized articles...

Plus I like more color, err gray tones.

I did find a big issue, forgot the LINK tags... I also coalesced the images and links so they take up less space when processing the RSS for a paper site. So here is The Daily iLiadian version of the BBC. (and yes, I'm working on all those extra >'s...)

It's a bit larger but the BBC RSS articles have more pictures than the mobile feed version.

deadite66 · 09-30-2006, 06:22 PM

i had a go at converting the rss via the print version on bbc
http://ghostpilot.org/share/bbc_uk_rss.pdf

scotty1024 · 09-30-2006, 11:43 PM

Quote:

Originally Posted by deadite66

i had a go at converting the rss via the print version on bbc
http://ghostpilot.org/share/bbc_uk_rss.pdf

No pictures? No snazzy banner graphics? No RSS index page?

I guess I've gotten addicted to the sound bite and making a go/no-go on whether I want to read the article at all rather than fast-forward through it.

It's been fun seeing what others think an iLiad newspaper should look like, thank you for sharing!

I can see I'll need a few command line switches I hadn't been planning on...

I've nailed the extra >'s and I'm on to sucking up the "Page M of N" links that the NY Times, and other sites, use. I really dislike it when I decide to read an article and find it's the first page of 5 and I only have the first page.

I've also been manually building RSS files to allow me to grab up things like bus schedules and weather to bring along with me. This is turning into a very handy tool.

Tommy · 01-21-2007, 07:08 AM

Hi all,

it seems as if there is some interest in retrieving RSS feeds and reading them on the iLiad... It might be that I have the hack that makes this possible:
It's a Perl script I named getfeed.pl. In order to run it, it might be that one or two Perl packages need to be installed beforehand:
1. XML-XPath-1.13 (can be obtained from CPAN)
2. LWP::Simple (came along with my Perl distro [SUSE 10.1])

If you're a Penguin you can run it immediately from the console:

Code:

getfeed.pl -f http://feeds.feedburner.com/spaceheadlines -o myFeed.html

The above will fetch the given feed and produce an HTML file that you can copy to your iLiad and read.

Please note that this hack will only display the text of the feed, i.e. there won't be any images, nor will the "full article" be downloaded and incorporated!

So, essentially that's all... however, convenient usage looks different.

Therefore I made the script look for a config file (.getfeedrc) in the user's home directory and if present, read and parse it. Thus, setting up this config file accordingly will enable you to just enter

Code:

 getfeed.pl

and everything runs automatically.
I also attached my personal .getfeedrc. Having a look at it might help you set up your own.
The "syntax" is pretty shellish, i.e. a '#' introduces a comment, so everything to the right of it will be ignored.
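
For illustration only, shell-style comment handling can be sketched in a few lines of Python: the usual convention drops the '#' and everything after it on the line. This is not taken from getfeed.pl (which is Perl), and the sample entries below are hypothetical, not the real .getfeedrc syntax; have a look at the attached file for that.

```python
def strip_comments(text):
    """Return the non-empty lines of text with '#' comments removed."""
    kept = []
    for line in text.splitlines():
        # Keep only what precedes the first '#', then trim whitespace.
        line = line.split("#", 1)[0].strip()
        if line:
            kept.append(line)
    return kept

# Hypothetical config content, for demonstration only.
SAMPLE_RC = """# feeds to fetch
-f http://feeds.feedburner.com/spaceheadlines   # space news
-o myFeed.html

# -F tex
"""

for entry in strip_comments(SAMPLE_RC):
    print(entry)
```

One caveat of this naive approach: a feed URL that itself contains a '#' fragment would be cut off at that point.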

Those of you who know that LaTeX is not only the stuff from which medical gloves are made but also the most powerful typesetting program out there will find the ability to create LaTeX files that include a given style file quite handy:
If you enter

Code:

 getfeed.pl -F tex -C pdflatex -interaction=nonstopmode -o myFeeds.tex -S iliad.sty

on the command line (or uncomment the respective lines in the attached config file), a LaTeX file (myFeeds.tex) will be created and pdflatex will be called to create the PDF file (myFeeds.pdf). If you use the style file iliad.sty I attached to this post, you'll be able to read the PDF without further zooming or such.

I hope one or the other of you out there will find it useful :-)

Best regards,
Tommy

jæd · 01-21-2007, 07:16 AM

Quote:

Originally Posted by Tommy

it seems as if there is some interest in retrieving RSS feeds and reading them on the iLiad...

Sounds cool... Will check it out when I get a second... My current best stab at Rss->Feeds (and a perl tool to do the same with html) is here ...

Tommy · 01-21-2007, 02:21 PM

Quote:

Originally Posted by jæd

... My current best stab at Rss->Feeds (and a perl tool to do the same with html) is here ...

Pretty cool, your hack downloads the pages behind the feeds; mine no longer does. I removed this feature, as I failed to nicely de-htmlise the pages. I could strip off the tags, but I couldn't find a means to extract the "content of the article", and so all the nav-bars, ads etc. were still present. The output - especially the LaTeX - was just ugly.

But if there's someone interested in this feature, just let me know...

b_k · 01-21-2007, 03:01 PM

If you had the option to read a settings file, you could read in a RegEx defining the beginning and end of the content.

For example, the newsticker for heise.de would be easy, as they have <HEISETEXT> </HEISETEXT> around the article.
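
A rough Python sketch of that idea, assuming the begin and end patterns come from a settings file (here they are simply passed in as arguments; `extract_between` is a made-up name, not part of any of the scripts in this thread):

```python
import re

def extract_between(html, begin_pattern, end_pattern):
    """Return the text between the first begin match and the next end match.

    Returns None if either pattern is not found.
    """
    begin = re.search(begin_pattern, html, re.IGNORECASE)
    if begin is None:
        return None
    # Only look for the end pattern after the begin match.
    end = re.compile(end_pattern, re.IGNORECASE).search(html, begin.end())
    if end is None:
        return None
    return html[begin.end():end.start()].strip()

# Made-up page, for demonstration only.
page = ('<html><div id="nav">menu</div><HEISETEXT><p>Article body.</p>'
        '</HEISETEXT><div>ads</div></html>')
print(extract_between(page, r"<HEISETEXT>", r"</HEISETEXT>"))
```

Everything outside the two markers (nav-bars, ads) is dropped; what is left is still HTML and may need further cleaning.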

Tommy · 01-21-2007, 05:08 PM

Quote:

Originally Posted by b_k

the newsticker for heise.de would be easy, as they have <HEISETEXT> </HEISETEXT> around the article.

That's true... for heise, but tagesschau.de for example does not have anything easy to parse for (and in particular not <HEISETEXT> :-)

jæd · 01-21-2007, 05:16 PM

Quote:

Originally Posted by Tommy

Pretty cool, your hack downloads the pages behind the feeds; mine no longer does. I removed this feature, as I failed to nicely de-htmlise the pages. I could strip off the tags, but I couldn't find a means to extract the "content of the article", and so all the nav-bars, ads etc. were still present. The output - especially the LaTeX - was just ugly.

But if there's someone interested in this feature, just let me know...

Well... That's the reason why I avoided LaTeX as the intermediate file. I use htmldoc to produce a temporary PDF and then glue the PDFs together and add links to each page. It's an evolution of my Perl script which does the same thing, but outputs HTML.

When I get a moment I'll post up the php file that does this...

b_k · 01-22-2007, 10:45 AM

Quote:

Originally Posted by Tommy

That's true... for heise, but tagesschau.de for example does not have anything easy to parse for (and in particular not <HEISETEXT> :-)

Well, not clean text, but look at what is in a tagesschau.de HTML page between "<div class="contModule conttext article">" and "<div class="standDatum">Stand: DD.MM.YYYY HH:MM Uhr</div>"
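
Since the markers differ per site, one way to organise this is a small per-site table of literal begin/end strings. The following Python sketch is only an illustration: the host names `www.heise.de` and `www.tagesschau.de` are assumptions, the table and function names are invented, and the end marker for tagesschau.de uses just the opening `<div class="standDatum">` because the date that follows changes from page to page.

```python
from urllib.parse import urlparse

# Per-site (begin, end) markers; literal strings, not regular expressions.
MARKERS = {
    "www.heise.de": ("<HEISETEXT>", "</HEISETEXT>"),
    "www.tagesschau.de": ('<div class="contModule conttext article">',
                          '<div class="standDatum">'),
}

def article_body(url, html):
    """Return the part of html between the site's markers.

    Falls back to the whole page if the site is unknown or a marker
    is missing.
    """
    host = urlparse(url).hostname
    if host not in MARKERS:
        return html
    begin, end = MARKERS[host]
    start = html.find(begin)
    if start == -1:
        return html
    start += len(begin)
    stop = html.find(end, start)
    if stop == -1:
        return html
    return html[start:stop].strip()
```

Falling back to the full page keeps a feed readable when a site changes its layout, at the cost of getting the nav-bars back.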

Tommy · 01-22-2007, 04:05 PM

Well, yeah, sounds feasible... this will need some testing to see whether all those pages have some pattern (that users will have to specify per feed) that can be used to pull the news out of the HTML.
Thanks for the hint!!!
But then - depending on the number of feeds digested - the document will become somewhat large, and a 45 min ride on public transport might become too short a time to read it all.


