Old 11-23-2009, 03:19 PM   #11
eksor
Connoisseur
Posts: 94
Karma: 999884
Join Date: Jun 2009
Device: prs700, i-mate JAMin, smartq v7, GeeksPhone Zero, iPad 3rd Gen
Hi again:

I posted something about this earlier this morning (local time), but I had had a bad night and it was full of mistakes. My apologies if you saw it.

Anyway, I just tested this again a few minutes ago:

1) web2disk -r 1 http://schools-wikipedia.org/wp/inde...t.Music.Musical_Instruments.htm

This puts subject.Music.Musical_Instruments.xhtml in the working directory

1) web2disk -r 1 http://schools-wikipedia.org/wp/inde...nstruments.htm

This puts subject.Music.Musical_Instruments.xhtml in the same place

2) Then I edited test.html so that it has this content:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=utf-8">
<TITLE></TITLE>
<META NAME="GENERATOR" CONTENT="OpenOffice.org 2.3 (Unix)">
<META NAME="CREATED" CONTENT="20091123;19424400">
<META NAME="CHANGED" CONTENT="20091123;19471300">
<STYLE TYPE="text/css">
<!--
@page { size: 21cm 29.7cm; margin: 2cm }
P { margin-bottom: 0.21cm }
-->
</STYLE>
</HEAD>
<BODY LANG="en-GB" DIR="LTR">
<P STYLE="margin-bottom: 0cm"><A HREF="subject.Music.Musical_Instruments.xhtml">
Instruments</A></P>
<P STYLE="margin-bottom: 0cm"><A HREF="subject.Music.Musical_Recordings_and_compositions.xhtml">Recordings
and compositions</A></P>
</BODY>
</HTML>

3) Finally, ebook-convert test.html test.epub converted the HTML into a 3.6 MB EPUB file, with images, links and so on. If you open it with ark or another archiver you can see the structure and the 762 files. Imagine the whole thing.
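
If you end up with many downloaded pages, writing test.html by hand gets tedious. Here is a rough Python sketch that builds the same kind of index page automatically. It is only an illustration, not something I ran against the real download: the function name build_index is made up, and it assumes all the .xhtml files saved by web2disk sit in one directory and that the last dot-separated part of each file name makes a usable link label.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index(directory=".", pattern="*.xhtml"):
    """Return an HTML page with one link per file matching `pattern`.

    Hypothetical helper: assumes the downloaded pages all live directly
    in `directory`, as in the test described above.
    """
    links = []
    for path in sorted(Path(directory).glob(pattern)):
        # "subject.Music.Musical_Instruments" -> "Musical Instruments"
        label = path.stem.split(".")[-1].replace("_", " ")
        links.append(
            f'<p><a href="{quote(path.name)}">{escape(label)}</a></p>'
        )
    body = "\n".join(links)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Index</title></head>\n'
        f"<body>\n{body}\n</body></html>\n"
    )
```

To use it, write the result to a file in the same directory as the pages, for example with Path("test.html").write_text(build_index(), encoding="utf-8"), and then run ebook-convert test.html test.epub as in step 3.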

In conclusion, you could probably download the whole Wikipedia for Schools from the top page with web2disk, but:

1) It is probably not very polite, since a torrent is offered.
2) The download would take a long time.
3) The conversion would take a long time.
4) The resulting EPUB would be huge.
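
To put numbers on how big a converted file gets without opening an archiver, something like the following Python sketch could help. An EPUB is a ZIP archive, so the standard zipfile module can read it. The function name summarize_epub is hypothetical, and I am assuming you only want the file count, the total uncompressed size and a breakdown by extension.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_epub(epub_path):
    """Return (file count, total uncompressed bytes, Counter of extensions)."""
    with zipfile.ZipFile(epub_path) as archive:
        # Skip directory entries so only real files are counted.
        entries = [info for info in archive.infolist() if not info.is_dir()]
    by_extension = Counter(
        PurePosixPath(info.filename).suffix.lower() or "(none)"
        for info in entries
    )
    total_bytes = sum(info.file_size for info in entries)
    return len(entries), total_bytes, by_extension
```

Calling summarize_epub("test.epub") on the file from step 3 should report the same number of files that the archiver shows, and the extension breakdown gives a rough idea of how much of the book is images.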

Regards.

Quote:
Originally Posted by okalyddude View Post
Hmm ok, the version of the wikipedia I have does not have an indexed html for the letters, merely a folder containing all the html and jpg files (and the jpg files are not well labeled).

I will try to look at what you linked, and see if I can get it working that way.

I did intend to buy an expandable memory slot, but only if I got this working in an efficient manner.

In the way you're describing, I would have a single epub file for each letter? And the chapters catalogue would be links to all the html pages. How do you include the jpgs in the html pages? Are they already in the 'index' html? (I'm new and quite clueless about this, and have yet to try it out.)

Then, with the search function of the 600, would you have to open the letter you want, then search, and the first result will be from the index?

Would there be any way to index them all in one file? Would the reader be able to handle this? Would it be able to handle the size of individual letters that are quite large?

I've been busy, so haven't had time to play around with the wikipedia version I have, I will check out the link you provided to see how the indexing works by letter...

The poster above you mentioned what I thought I'd have to do (as what I have is a bunch of randomly named html and jpg files) but it would be a ton of work - but I could also index them all in one, by letter, etc...