02-03-2010, 03:48 PM   #2
eksor
Connoisseur
Posts: 94
Karma: 999884
Join Date: Jun 2009
Device: prs700, i-mate JAMin, smartq v7, GeeksPhone Zero, iPad 3rd Gen
Quote:
Originally Posted by posativ
Hi,

I hope I've chosen the right subforum.
I am not really familiar with all these ebook standards...

I would like to write a little Python script that, for a given Wikipedia article, downloads all the Wikipedia entries it mentions and links to, down to a recursion depth of, let's say, 1 or 2.

My output would be a set of HTML files.

How can I convert them to, e.g., LRF, so that I can click on a link in one LRF file to get the related article in another LRF file?
I think that LRF files are self-contained: the whole bunch of images, HTML/XML files and so on is compressed into a single file (à la CHM), with no possibility of linking to external LRF files.

calibre (http://calibre-ebook.com/) relies on Python, I think, and already has the web2disk and ebook-convert command-line utilities, which should do what you want.
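
A rough sketch of how those two utilities could be chained from a Python script. The helper names (`web2disk_cmd`, `convert_cmd`, `run`) are made up for illustration, and the flags (`-d` for the base directory, `-r` for the recursion depth, `--delay` between fetches) are as I remember them from `web2disk --help`, so check them against your calibre version. The name of the HTML file that web2disk writes is also an assumption you would have to verify.

```python
import subprocess


def web2disk_cmd(url, base_dir, depth=1, delay=1.0):
    """Build a web2disk command line (flags assumed, see web2disk --help)."""
    return [
        "web2disk",
        "-d", str(base_dir),      # directory the pages are saved into
        "-r", str(depth),         # how many levels of links to follow
        "--delay", str(delay),    # seconds to wait between fetches
        url,
    ]


def convert_cmd(index_html, out_file):
    """Build an ebook-convert command line; the output format follows the extension."""
    return ["ebook-convert", str(index_html), str(out_file)]


def run(cmd):
    """Run one command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)
```

Usage would be something like `run(web2disk_cmd(article_url, "wiki", depth=2))` followed by `run(convert_cmd(saved_index_html, "article.lrf"))`, where `saved_index_html` is whatever file web2disk actually produced in the base directory. The result is one self-contained LRF, not several files linking to each other.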

The bad thing is that I tried this with mixed results; I blame Wikipedia's layout, not calibre (I would have trashed my prs700 without that marvelous software). To be fair to Wikipedia, I think that recursive downloading of articles is discouraged in the TOS or somewhere similar. I can understand that, since it could overload the servers.
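
If you do write the downloader yourself, here is a minimal sketch of a depth-limited crawl that pauses between requests to go easy on the servers. Everything in it is an assumption for illustration: the names (`LinkParser`, `article_links`, `fetch`, `crawl`), the rule that article links start with `/wiki/` and contain no colon in the title, and the User-Agent string. It only collects the HTML in memory; saving and converting is left out.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def article_links(base_url, html):
    """Return same-site /wiki/ links, skipping titles with ':' (File:, Help:, ...)."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        url = urldefrag(urljoin(base_url, href)).url  # absolute URL, no #fragment
        parts = urlparse(url)
        if parts.netloc != host or not parts.path.startswith("/wiki/"):
            continue
        if ":" in parts.path[len("/wiki/"):]:
            continue
        if url not in links:
            links.append(url)
    return links


def fetch(url):
    """Download one page as text (hypothetical User-Agent string)."""
    req = Request(url, headers={"User-Agent": "wiki-ebook-sketch/0.1 (personal use)"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=1, delay=1.0, fetcher=fetch):
    """Breadth-first crawl up to max_depth links away; returns {url: html}."""
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages:
            continue
        if pages and delay:
            time.sleep(delay)  # pause between requests
        html = fetcher(url)
        pages[url] = html
        if depth < max_depth:
            for link in article_links(url, html):
                if link not in pages:
                    queue.append((link, depth + 1))
    return pages
```

Keep `max_depth` at 1 or 2 as in the question: the number of pages grows very quickly with depth, which is exactly the server-load worry above. The `fetcher` argument is there so the crawl can be tried against canned pages before hitting the real site.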

If Plucker format is fine for you, you can try Plucker or sunrisexp; these two work very well. I was able to read the whole Solar System article (60-70 MB) on my PPC.

Regards.