Old 09-29-2006, 06:53 PM   #1
scotty1024
Banned
scotty1024 is no ebook tyro.
 
Posts: 1,300
Karma: 1479
Join Date: Jul 2006
Location: Peoples Republic of Washington
Device: Reader / iPhone / Librie / Kindle
The Daily iLiadian

With some issues we have iRex standing firmly on our throats holding us down.

But with other issues we can take matters into our own hands, sources or no sources, SDK or no SDK.

I'm attaching a sneak peek at what my new command line tool is presently capable of doing. Yes, it's pretty lame, but I'm just one breakthrough away from it looking seriously better!

I could have waited on that breakthrough, but I figured everyone could use a ray of hope about now, even if it's presently a lame-looking ray of hope. For sure we aren't getting any rays, lame or otherwise, out of iRex.

So I call it "The Daily iLiadian". It pulls an RSS XML file and parses it into items. It then builds a directory for each item in the output directory and begins work on the RSS link for that item.
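
The first step described above could be sketched roughly like this in Python. This is not the actual tool; the function names and the directory naming scheme are made up for illustration, and it assumes a plain RSS 2.0 feed without namespaces.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_rss_items(rss_xml):
    """Return one dict per <item> element, holding its title and link."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


def item_dir(out_dir, index, title):
    """Build a per-item directory path, e.g. out/003_some_title (hypothetical naming)."""
    # Collapse anything that is not a letter or digit so the name is filesystem-safe.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_").lower()[:40]
    return Path(out_dir) / ("%03d_%s" % (index, slug or "item"))
```

Calling `mkdir(parents=True, exist_ok=True)` on each returned path would create the directories; the index prefix keeps two items with the same title from colliding.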

It pulls down the HTML from the RSS link and parses it. Each IMG tag is processed to copy down the image file and place it into the item directory with a re-written name (so there won't be any naming conflicts in the images). It then re-writes the IMG tag(s) to point to the new local copy and writes out the HTML as index.html in the item's directory.
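
A minimal sketch of the IMG re-writing idea, again in Python and again not the real tool. It uses a regular expression rather than a proper HTML parser, only handles quoted src attributes, and the `imgNNN` naming is an assumption. The actual download is left out; the function just returns what would need fetching.

```python
import os
import re
from urllib.parse import urljoin, urlparse

# Matches <img ... src="..."> with single or double quotes (unquoted src is not handled).
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc\s*=\s*)(["\'])(.*?)\2',
                     re.IGNORECASE | re.DOTALL)


def rewrite_img_tags(html, page_url):
    """Return (new_html, downloads); downloads maps local file name -> absolute URL."""
    local_names = {}   # absolute URL -> local file name
    downloads = {}     # local file name -> absolute URL

    def replace(match):
        absolute = urljoin(page_url, match.group(3))
        if absolute not in local_names:
            ext = os.path.splitext(urlparse(absolute).path)[1]
            # Sequential names avoid conflicts between images from different paths.
            name = "img%03d%s" % (len(local_names), ext)
            local_names[absolute] = name
            downloads[name] = absolute
        quote = match.group(2)
        return match.group(1) + quote + local_names[absolute] + quote

    return IMG_SRC.sub(replace, html), downloads
```

Each entry in `downloads` could then be fetched into the item directory, for instance with `urllib.request.urlretrieve`, before the re-written HTML is saved as index.html.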

After all items are processed a main index.html with the RSS contents and re-written links to the local items is written along with a manifest.xml to wrap it all up nicely for the iLiad.
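
For the last step, a rough Python sketch of writing the main index might look like the following. The page layout is invented for illustration, and the manifest.xml is left out because its format isn't described here.

```python
from html import escape


def build_main_index(feed_title, items):
    """Build the main index.html text.

    items is a list of (title, description, item_dir) tuples, where item_dir
    is the item's directory name relative to the output directory.
    """
    lines = [
        "<html><head><title>%s</title></head><body>" % escape(feed_title),
        "<h1>%s</h1>" % escape(feed_title),
    ]
    for title, description, item_dir in items:
        # Link to the local copy instead of the original article URL.
        lines.append('<h2><a href="%s/index.html">%s</a></h2>'
                     % (escape(item_dir, quote=True), escape(title)))
        lines.append("<p>%s</p>" % escape(description))
    lines.append("</body></html>")
    return "\n".join(lines)
```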

I've still got some massaging to do on the HTML, so many of the items have HTML issues that keep them from displaying properly. But the Boston Globe item is mostly viewable.

Enjoy and everyone have a happier weekend (I hope).
Attached Files
File Type: zip TheDailyiLiadian.zip (409.8 KB, 647 views)
Old 09-30-2006, 05:30 AM   #2
Riocaz
Fulfilled but not by iRex
Riocaz ought to be getting tired of karma fortunes by now.
 
Posts: 932
Karma: 286846
Join Date: May 2006
Location: London
Device: Far too many
Scotty, that looks really quite good.
Old 09-30-2006, 06:17 AM   #3
emkay
Zealot
emkay began at the beginning.
 
Posts: 103
Karma: 11
Join Date: Jul 2006
Great work Scotty,
I will try to check that out this weekend...
Old 09-30-2006, 08:26 AM   #4
jęd
Evangelist
jęd has a complete set of Star Wars action figures.
 
Posts: 458
Karma: 293
Join Date: May 2006
I've been thinking along the same ideas...

See the attached tarball... It's basically the BBC RSS feed parsed, followed by a download of the low-bandwidth version of each page. I'll play with it more later, but this is my proof-of-concept, daily paper version...

Oh, if you haven't paid your BBC license fee, please don't download this. And if the supplied HTML bricks your iLiad, then please contact iRex, and not me...!

http://208.254.38.124/pub/bbc_news.tar.gz
Old 09-30-2006, 05:46 PM   #5
scotty1024
Banned
scotty1024 is no ebook tyro.
 
Posts: 1,300
Karma: 1479
Join Date: Jul 2006
Location: Peoples Republic of Washington
Device: Reader / iPhone / Librie / Kindle
@jaed

Good work! Yours is lightweight, but the iLiad does have a pretty hefty browser; it can handle more than mobile-sized articles...

Plus I like more color, err gray tones.

I did find a big issue, forgot the LINK tags... I also coalesced the images and links so they take up less space when processing the RSS for a paper site. So here is The Daily iLiadian version of the BBC. (and yes, I'm working on all those extra >'s...)

It's a bit larger but the BBC RSS articles have more pictures than the mobile feed version.
Attached Files
File Type: zip bbc.zip (1,001.9 KB, 510 views)
Old 09-30-2006, 06:22 PM   #6
deadite66
Groupie
deadite66 began at the beginning.
 
Posts: 197
Karma: 16
Join Date: Apr 2006
Device: irex iliad, uk Kindle gen3
i had a go at converting the rss via the print version on bbc
http://ghostpilot.org/share/bbc_uk_rss.pdf
Old 09-30-2006, 11:43 PM   #7
scotty1024
Banned
scotty1024 is no ebook tyro.
 
Posts: 1,300
Karma: 1479
Join Date: Jul 2006
Location: Peoples Republic of Washington
Device: Reader / iPhone / Librie / Kindle
Quote:
Originally Posted by deadite66
i had a go at converting the rss via the print version on bbc
http://ghostpilot.org/share/bbc_uk_rss.pdf
No pictures? No snazzy banner graphics? No RSS index page?

I guess I've gotten addicted to the sound bite and making a go/no-go on whether I want to read the article at all rather than fast-forward through it.

It's been fun seeing what others think an iLiad newspaper should look like, thank you for sharing!

I can see I'll need a few command line switches I hadn't been planning on...

I've nailed the extra >'s and I'm on to sucking up the "Page M of N" links that the NY Times, and other sites, use. I really dislike it when I decide to read an article and find it's the first page of 5 and I only have the first page.

I've also been manually building RSS files to allow me to grab up things like bus schedules and weather to bring along with me. This is turning into a very handy tool.
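
For anyone wanting to try the same trick, here is a hedged Python sketch that builds a minimal RSS 2.0 file from a list of page titles and URLs. It isn't what I use; the function name and the placeholder description text are invented.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, pages):
    """Return RSS 2.0 text for a hand-picked list of (title, url) pages."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 expects title, link and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = pages[0][1] if pages else ""
    ET.SubElement(channel, "description").text = "Hand-built feed"
    for title, url in pages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")
```

Feeding it something like `[("Bus schedule", "http://example.com/bus"), ("Weather", "http://example.com/weather")]` (made-up URLs) gives a file that any RSS-driven grabber can process like a normal feed.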
Old 01-21-2007, 07:08 AM   #8
Tommy
Enthusiast
Tommy began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
RSS feeds on your iliad

Hi all,

it seems as if there is some interest to retrieve RSS feeds and read them on the iliad... It might be that I have the hack that makes this possible:
It's a Perl script I named getfeed.pl. In order to run it, it might be that one or two Perl packages need to be installed beforehand:
1. XML-XPath-1.13 (can be obtained from CPAN)
2. LWP::Simple (came along with my Perl distro [SUSE 10.1])

If you're a Penguin you can run it immediately from the console:

Code:
getfeed.pl -f http://feeds.feedburner.com/spaceheadlines -o myFeed.html
The above will fetch the given feed and produce an HTML file that you can copy to your iliad and read.

Please note, this hack, will only display the text of the feed, i.e. there won't be any images, neither will the "full article" be downloaded and incorporated!

So, essentially that's all... however, convenient usage looks different.
Therefore I made the script look for a config file (.getfeedrc) in the user's home directory and if present, read and parse it. Thus, setting up this config file accordingly will enable you to just enter
Code:
 getfeed.pl
and everything runs automatically.
I also attached my personal .getfeedrc. Having a look at it might help setting up your own one.
The "syntax" is pretty shellish, i.e. a '#' introduces a comment, so everything to the right of it will be ignored.
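
To illustrate the idea of a shell-like config file with '#' comments, here is a small Python sketch. It is not how getfeed.pl reads its file: the `key = value` layout and the key names in the example are assumptions, so check the attached .getfeedrc for the real format.

```python
def parse_rc(text):
    """Parse 'key = value' lines, dropping '#' comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        # Naive comment stripping: a literal '#' inside a value would be cut off too.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings
```

For example, `parse_rc("feed = http://example.com/rss  # space news")` yields a dict with a single `feed` entry.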

Those of you who know that LaTeX is not only the stuff from which medical gloves are made but also the most powerful typesetting program out there will find the possibility to create LaTeX files including a given style file quite handy:
If you enter
Code:
 getfeed.pl -F tex -C pdflatex -interaction=nonstopmode -S iliad.sty -o myFeeds.tex
on the command line (or uncomment the respective lines in the attached config file), a LaTeX file (myFeeds.tex) will be created and pdflatex will be called to create the PDF file (myFeeds.pdf). If you use the style file iliad.sty I attached to this post, you'll be able to read the PDF without further zooming or such.

I hope one or the other of you out there will find it useful :-)

Best regards,
Tommy
Attached Files
File Type: pl getfeed.pl (14.5 KB, 492 views)
File Type: zip .getfeedrc.zip (572 Bytes, 475 views)
File Type: zip iliad.sty.zip (392 Bytes, 450 views)
Old 01-21-2007, 07:16 AM   #9
jęd
Evangelist
jęd has a complete set of Star Wars action figures.
 
Posts: 458
Karma: 293
Join Date: May 2006
Quote:
Originally Posted by Tommy
it seems as if there is some interest to retrieve RSS feeds and read them on the iliad...
Sounds cool... Will check it out when I get a second... My current best stab at Rss->Feeds (and a perl tool to do the same with html) is here ...
Old 01-21-2007, 02:21 PM   #10
Tommy
Enthusiast
Tommy began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
Quote:
Originally Posted by jęd
... My current best stab at Rss->Feeds (and a perl tool to do the same with html) is here ...
Pretty cool, your hack downloads the pages behind the feeds; mine no longer does. I removed this feature, as I failed to nicely de-htmlise the pages. I could strip off the tags, but I couldn't find a means to extract the "content of the article", and so all the nav-bars, ads etc. were still present. The output - especially the LaTeX - was just ugly.
But if there's someone interested in this feature, just let me know...
Old 01-21-2007, 03:01 PM   #11
b_k
Übernerd
b_k is on a distinguished road
 
Posts: 238
Karma: 74
Join Date: Jun 2006
Location: Germany
Device: iRex iLiad
If you had the option to read a settings file, you could read in a RegEx defining the beginning and end of the content.

For example, the newsticker for heise.de would be easy, as they have <HEISETEXT> </HEISETEXT> around the article.
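
A sketch of what I mean, in Python rather than Perl. The function name is made up, and the two patterns would come from the settings file.

```python
import re


def extract_content(html, begin_pattern, end_pattern):
    """Return the text between the first begin/end match, or None if not found."""
    pattern = re.compile("(?:%s)(.*?)(?:%s)" % (begin_pattern, end_pattern),
                         re.DOTALL | re.IGNORECASE)
    match = pattern.search(html)
    return match.group(1).strip() if match else None
```

With `begin_pattern` set to `<HEISETEXT>` and `end_pattern` set to `</HEISETEXT>`, everything outside the article (nav-bars, ads) is dropped before the tags are stripped.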
Old 01-21-2007, 05:08 PM   #12
Tommy
Enthusiast
Tommy began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
Quote:
Originally Posted by b_k
the newsticker for heise.de would be easy, as they have <HEISETEXT> </HEISETEXT> around the article.
That's true... for heise, but tagesschau.de for example does not have something easy to parse for (and in particular not <HEISETEXT> :-)
Old 01-21-2007, 05:16 PM   #13
jęd
Evangelist
jęd has a complete set of Star Wars action figures.
 
Posts: 458
Karma: 293
Join Date: May 2006
Quote:
Originally Posted by Tommy
Pretty cool, your hack downloads the pages behind the feeds; mine no longer does. I removed this feature, as I failed to nicely de-htmlise the pages. I could strip off the tags, but I couldn't find a means to extract the "content of the article", and so all the nav-bars, ads etc. were still present. The output - especially the LaTeX - was just ugly.
But if there's someone interested in this feature, just let me know...
Well... That's the reason why I avoided LaTeX as the intermediate file. I use htmldoc to produce a temporary PDF and then glue the PDFs together and add links to each page. It's an evolution of my Perl script, which does the same thing but outputs HTML.

When I get a moment I'll post up the PHP file that does this...
Old 01-22-2007, 10:45 AM   #14
b_k
Übernerd
b_k is on a distinguished road
 
Posts: 238
Karma: 74
Join Date: Jun 2006
Location: Germany
Device: iRex iLiad
Quote:
Originally Posted by Tommy
That's true... for heise, but tagesschau.de for example does not have something easy to parse for (and in particular not <HEISETEXT> :-)
Well, not clean text, but look at what is in a tagesschau.de HTML page between "<div class="contModule conttext article">" and "<div class="standDatum">Stand: DD.MM.YYYY HH:MM Uhr</div>"
Old 01-22-2007, 04:05 PM   #15
Tommy
Enthusiast
Tommy began at the beginning.
 
Posts: 32
Karma: 10
Join Date: Oct 2006
Location: Germany
Device: Iliad, Sony 505
Well, yeah, sounds feasible... this will need some testing to see whether all those pages have some pattern (which users will have to specify per feed) that can be used to pull the news out of the HTML.
Thanks for the hint!!!
But then - depending on the number of feeds digested - the document will become somewhat large, and a 45 min ride on public transport might become too short a time to read it all.
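
The testing mentioned above could start from something like this Python sketch. It is only an illustration: the table layout and function names are invented, and the markers are simply the ones quoted in this thread, which may well change on the sites.

```python
# Per-feed start/end markers, as a user might specify them (taken from this thread).
FEED_MARKERS = {
    "heise": ("<HEISETEXT>", "</HEISETEXT>"),
    "tagesschau": ('<div class="contModule conttext article">',
                   '<div class="standDatum">'),
}


def cut_article(html, start_marker, end_marker):
    """Return the text between the two literal markers, or None if either is missing."""
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end].strip()


def pages_missing_pattern(feed_name, pages):
    """pages maps URL -> HTML; return the URLs where the feed's markers don't match."""
    start_marker, end_marker = FEED_MARKERS[feed_name]
    return [url for url, html in pages.items()
            if cut_article(html, start_marker, end_marker) is None]
```

Running `pages_missing_pattern` over a day's worth of downloaded pages would show quickly whether a feed's pattern holds for all of them.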