Quote:
Originally Posted by posativ
Hi,
I hope I've chosen the right subforum.
I am not really familiar with all these ebook standards...
I would like to write a little Python script that downloads, starting from a given Wikipedia article, all the Wikipedia entries it links to, for a recursion depth of, let's say, 1 or 2.
The output would be a set of HTML files.
How can I convert them to e.g. LRF, so I can click on a link in the LRF to get the related article in another LRF-file?
I think LRF files are self-contained: all the images, HTML/XML files and so on are compressed into a single file (à la CHM), so there is no way for one LRF file to link to another.
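If that is right, one workaround is to avoid cross-file links altogether: merge the downloaded articles into one HTML file and turn the links between them into internal anchors before converting. A minimal Python sketch, assuming you already have each article's body HTML keyed by its URL title (e.g. `Solar_System`) and that links look like `href="/wiki/Title"` as on Wikipedia. `merge_articles` and `anchor_name` are hypothetical helpers, and I haven't checked how the reader handles the resulting anchors:

```python
import html
import re
from urllib.parse import unquote

# Matches href="/wiki/Title" or href="/wiki/Title#Section"; group 1 is the title.
WIKI_HREF = re.compile(r'href="/wiki/([^"#]+)(?:#[^"]*)?"')


def anchor_name(title):
    """Turn a URL title (possibly percent-encoded) into a simple anchor id."""
    return unquote(title).replace(" ", "_")


def merge_articles(articles):
    """Combine {url_title: body_html} into a single HTML document.

    Links to articles that are part of the bundle become internal anchors
    (#Title; any #Section part is dropped). Links to anything else are kept.
    """
    included = {anchor_name(t) for t in articles}

    def rewrite(match):
        target = anchor_name(match.group(1))
        if target in included:
            return f'href="#{html.escape(target)}"'
        return match.group(0)

    parts = ["<html><body>"]
    for title, body in articles.items():
        heading = html.escape(unquote(title).replace("_", " "))
        parts.append(f'<h1 id="{html.escape(anchor_name(title))}">{heading}</h1>')
        parts.append(WIKI_HREF.sub(rewrite, body))
    parts.append("</body></html>")
    return "\n".join(parts)
```

The merged file can then be converted as a single book, so every link that matters stays inside one LRF.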
calibre (http://calibre-ebook.com/) is written in Python and already has the web2disk and ebook-convert command-line utilities, which should do what you want.
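ebook-convert picks the output format from the extension of the output file, so converting one downloaded page is a single command. A small Python wrapper, assuming calibre is installed and `ebook-convert` is on the PATH; `convert_command` and `convert` are just illustrative names:

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(html_file, output_format="lrf"):
    """Build the ebook-convert argument list for one HTML file.

    The output file sits next to the input, with the extension swapped;
    ebook-convert uses that extension to choose the output format.
    """
    src = Path(html_file)
    return ["ebook-convert", str(src), str(src.with_suffix("." + output_format))]


def convert(html_file, output_format="lrf"):
    """Run calibre's ebook-convert on one file and return the output path."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH (is calibre installed?)")
    cmd = convert_command(html_file, output_format)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd[-1]
```

For example, `convert("Solar_System.html")` should leave `Solar_System.lrf` next to the HTML file.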
The bad thing is that I tried it with mixed results, which I blame on Wikipedia's layout, not on calibre (I would trash my PRS-700 without that marvelous software). To be fair to Wikipedia, I think recursive downloading of articles is discouraged in the terms of service or something similar, and I can understand that: it overloads the servers.
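If you do write the script yourself, keep the recursion shallow and pause between requests. A minimal breadth-first sketch in Python, assuming article links are relative `/wiki/Title` hrefs; `fetch` is a placeholder for whatever download function you use (e.g. one built on `urllib.request`), and the one-second default delay is an arbitrary choice, not a figure from Wikipedia's rules:

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class WikiLinkParser(HTMLParser):
    """Collects relative links to other Wikipedia articles (/wiki/Title)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        if parsed.netloc:  # skip links to other hosts
            return
        title = parsed.path[len("/wiki/"):] if parsed.path.startswith("/wiki/") else ""
        # Crude filter: a colon usually means File:, Help:, Special: etc.
        # (it also skips the few real articles with a colon in the title).
        if title and ":" not in title:
            self.links.append(parsed.path)  # #fragment already stripped


def crawl(start_url, fetch, max_depth=1, delay=1.0):
    """Download start_url plus linked articles up to max_depth links away.

    `fetch` is any callable url -> HTML string. Returns {url: html}.
    """
    pages = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue:
        url, depth = queue.popleft()
        pages[url] = fetch(url)
        if delay:
            time.sleep(delay)  # be polite to the servers
        if depth == max_depth:
            continue  # don't follow links any deeper
        parser = WikiLinkParser()
        parser.feed(pages[url])
        for path in parser.links:
            link = urljoin(url, path)
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

With `max_depth=1` you get the start article plus every article it links to; each extra level can multiply the number of requests, so 2 is already a lot.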
If the Plucker format is fine for you, you can try Plucker or SunriseXP; these two work very well. I was able to read the whole Solar System article (60-70 MB) on my Pocket PC.
Regards.