HTML with external links


posativ
12-01-2009, 04:10 PM
Hi,

I hope I've chosen the right subforum.
I am not really familiar with all these ebook standards...

I would like to write a little Python script that takes a given Wikipedia article and downloads all the Wikipedia entries it mentions and links to, down to a recursion depth of, let's say, 1 or 2.

The output would be a set of HTML files.
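
A minimal sketch of the depth-limited download described above, using only the Python standard library. The names `LinkCollector`, `article_links`, `fetch` and `crawl`, the User-Agent string, and the rule "keep only same-site /wiki/ links without a colon in the title" are assumptions made for illustration, not anything prescribed by Wikipedia or by an ebook tool. A real script would also want a delay between requests.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def article_links(html, base_url):
    """Return absolute, de-duplicated URLs of article links found in html."""
    parser = LinkCollector()
    parser.feed(html)
    site = urlparse(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop any #fragment.
        url = urldefrag(urljoin(base_url, href)).url
        parts = urlparse(url)
        if parts.netloc != site or not parts.path.startswith("/wiki/"):
            continue
        title = parts.path[len("/wiki/"):]
        # Assumption: titles with ':' are File:, Help:, Special: pages etc.
        if title and ":" not in title and url not in links:
            links.append(url)
    return links


def fetch(url):
    """Download one page as text. The User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "wiki-ebook-sketch/0.1"})
    with urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=1, fetch_page=fetch):
    """Breadth-first crawl; return {url: html} up to max_depth links away."""
    pages = {}
    frontier = [start_url]
    for depth in range(max_depth + 1):
        next_frontier = []
        for url in frontier:
            if url in pages:
                continue  # already downloaded via another path
            pages[url] = fetch_page(url)
            if depth < max_depth:
                next_frontier.extend(article_links(pages[url], url))
        frontier = next_frontier
    return pages
```

Calling `crawl("https://en.wikipedia.org/wiki/Solar_System", max_depth=1)` would return the start article plus every article it links to; each value can then be written out as one HTML file. Passing a different `fetch_page` makes it possible to try the logic without touching the network.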

How can I convert them to e.g. LRF, so I can click on a link in the LRF to get the related article in another LRF-file?

eksor
02-03-2010, 04:48 PM
I think that LRF files are self-contained: all the images, HTML/XML files and so on are compressed into a single file (à la CHM), with no possibility of linking to external LRF files.

calibre (http://calibre-ebook.com/) is built on Python, I think, and already has the web2disk and ebook-convert command-line utilities, which should do what you want.
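
As a rough sketch, a Python script could hand a downloaded HTML file to ebook-convert like this. It assumes calibre is installed and `ebook-convert` is on the PATH; the helper names `convert_command` and `convert` and the file names are made up for illustration. ebook-convert picks the output format from the output file's extension.

```python
import subprocess


def convert_command(html_path, lrf_path):
    """Build the ebook-convert command line: input file, then output file."""
    return ["ebook-convert", html_path, lrf_path]


def convert(html_path, lrf_path):
    """Run ebook-convert; raises CalledProcessError if the conversion fails."""
    subprocess.run(convert_command(html_path, lrf_path), check=True)
```

For example, `convert("Solar_System.html", "Solar_System.lrf")` would ask calibre to produce one LRF file from that HTML file.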

The bad news is that I tried that with mixed results; I blame Wikipedia's layout, not calibre (I would throw my PRS-700 in the trash without that marvelous software). To be fair to Wikipedia, I think recursive downloading of articles is discouraged in the TOS or something similar. I can understand that: it overloads the servers and so on.

If the Plucker format is fine for you, you can try Plucker or SunriseXP; these two work very well. I was able to read the whole Solar System article (60-70 MB) on my PPC.

Regards.

HarryT
02-07-2010, 08:27 AM
The Mobipocket file format supports linking to other files in some of its implementations. I don't know, off-hand, of any other eBook file format that does.