Old 04-10-2009, 05:59 PM   #1
Nate the great
Sir Penguin of Edinburgh
Posts: 10,367
Karma: 3161371
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
Wikipedia Ebook Project

There is a 3.4GB offline version of Wikipedia. I have a copy, and I'm going to try to turn it into an ebook.

I've been looking at it today. The text by itself is only 400MB, and I should be able to cut that in half (or more) by removing most of the formatting.
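As a rough illustration of the formatting removal described above, here is a minimal Python sketch that strips every tag from an HTML page and keeps only the text. It assumes the dump consists of ordinary HTML files; `TextOnly` and `strip_markup` are hypothetical names, and a real conversion would presumably keep some structure (headings, links) rather than discard all of it.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def strip_markup(html: str) -> str:
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse the runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.parts).split())
```

Comparing `len(strip_markup(page))` with `len(page)` on a few sample pages would give a quick feel for how much of the 400MB is markup, before committing to a full run.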

The images take up the other 3GB. I expect to be able to reduce this to 1GB by abandoning the highest resolution images and by eliminating the duplicate images. I might get it down even further.
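One way the duplicate images could be found is by hashing file contents. The sketch below is only an assumption about how that step might look, not the method actually used in this project; `find_duplicates` is a hypothetical helper, and it only catches byte-identical files, not the same picture saved at a different resolution.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group files under root by the SHA-256 digest of their bytes."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the digests shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists paths with identical content, so all but one file per group could be dropped once the HTML references are pointed at the survivor.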


You can download it as a torrent here. You will need a BitTorrent client.
Old 04-10-2009, 06:24 PM   #2
ShortNCuddlyAm
WWHALD
Posts: 7,881
Karma: 337114
Join Date: Sep 2008
Location: Mitcham, Surrey, UK
Device: iPad. Selling my silver 505 here
Good luck!
Old 04-10-2009, 07:20 PM   #3
Nate the great
Quote:
Originally Posted by ShortNCuddlyAm View Post
Good luck!
Thank you.

Actually, what I need more than luck is some serious processing power. I will be working on 8,779 html files. With an estimate of 5 minutes per file (with my current computer), that comes to over 43k minutes (~731 hours). And that is just one run. If I have to redo it...

I _so_ wish I could afford to build a Beowulf Cluster.
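The estimate above can be checked with a few lines of Python, along with what splitting the files across several machines or cores might buy. This is a back-of-the-envelope sketch: it assumes every file takes the same 5 minutes and that files can be converted independently of each other; `run_hours` and the worker counts are hypothetical.

```python
import math

FILES = 8779            # HTML files, from the post
MINUTES_PER_FILE = 5    # per-file estimate, from the post


def run_hours(files, minutes_per_file, workers=1):
    """Wall-clock hours if the files are split evenly across workers."""
    files_per_worker = math.ceil(files / workers)
    return files_per_worker * minutes_per_file / 60


single = run_hours(FILES, MINUTES_PER_FILE)
print(f"1 worker: {single:.1f} hours (~{single / 24:.1f} days)")
for workers in (4, 16):  # hypothetical cluster sizes
    hours = run_hours(FILES, MINUTES_PER_FILE, workers)
    print(f"{workers} workers: {hours:.1f} hours")
```

With one worker this comes to 43,895 minutes, or about 731.6 hours, which is roughly a month of continuous running per pass.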
Old 04-10-2009, 07:29 PM   #4
Hypernova
Hyperreader
Posts: 66
Karma: 10
Join Date: Feb 2009
Device: Kindle DXG;Pocketbook 360
I don't know much about this at all, but I'll say this since it might help. How about using a cluster that you can access? Maybe the Microsoft Azure platform or another one? Here's a program that, according to them, would take weeks to calculate that graph, but they can do it in seconds using an Azure cluster:

http://www.dotnetsolutions.ltd.uk/ev.../wikiexplorer/

Again, it just came to mind when I saw the thread. I'm looking forward to the end result.
Old 04-10-2009, 07:53 PM   #5
Nate the great
Quote:
Originally Posted by Hypernova View Post
I don't know much about this at all, but I'll say this since it might help. How about using a cluster that you can access? Maybe the Microsoft Azure platform or another one? Here's a program that, according to them, would take weeks to calculate that graph, but they can do it in seconds using an Azure cluster:

http://www.dotnetsolutions.ltd.uk/ev.../wikiexplorer/

Again, it just came to mind when I saw the thread. I'm looking forward to the end result.
That won't help me because that is a software visualization tool.

from Wikipedia:
Quote:
Originally referring to a specific computer built in 1994, Beowulf is a class of computer clusters similar to the original NASA system. They are high-performance parallel computing clusters of inexpensive PC hardware. The name comes from the main character in the Old English epic poem Beowulf.
Basically, you build a supercomputer from a bunch of PCs (anywhere from 10 to 1k).
Old 04-10-2009, 08:16 PM   #6
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530531
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by Nate the great View Post
There is a 3.4GB offline version of Wikipedia. I have a copy, and I'm going to try to turn it into an ebook.
I recently had the same idea. But since the 2008 Wikipedia is just TOO huge, I went looking for a smaller (earlier) version to convert to an ebook. I settled on the 2006 Wikipedia SOS Children CD, converted from an existing Plucker .pdb version. See this thread for more info and screenshots.

Quote:
I've been looking at it today. The text by itself is only 400MB, and I should be able to cut that in half (or more) by removing most of the formatting.
My conversion of the 2006 Wikipedia CD showed that a text-only ebook (omitting images) came out at a manageable 15-20 MB without any reformatting or tweaking.

Quote:
The images take up the other 3GB. I expect to be able to reduce this to 1GB by abandoning the highest resolution images and by eliminating the duplicate images. I might get it even further.
For the images, I found that converting them to 4-bit (color) .gif worked best, and if you use IrfanView, you can also automatically scale the larger images down to a maximum size of 300x300 (or 150x150, which the Plucker .pdb had).

My initial tests showed that if the max. image size is 300x300, then all the (160 MB) images on the 2006 CD occupy about 60 MB whereas the 150x150 max ones only occupied about 27 MB.
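For anyone scripting the resize instead of using a batch tool, the size cap can be sketched in plain Python. This only computes target dimensions; `fit_within` is a hypothetical helper, and whether IrfanView rounds the same way is an assumption. A 4-bit palette means 2**4 = 16 colors per image.

```python
def fit_within(width, height, max_side=300):
    """Return (w, h) scaled down to fit a max_side square, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # never upscale images that already fit
    scale = max_side / longest
    # Guard against a dimension rounding down to zero on extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Under these assumptions a 1200x800 image would be stored at 300x200 with the default cap, and passing `max_side=150` gives the smaller variant.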

As a test, I actually loaded a 50MB .imp of the 2006 Wikipedia onto my REB1200 and it worked flawlessly!!! Legacy reader my a**!

Cheers,

p.s. BTW, we seem to have the same tastes in converting HUGE ebooks, so I'll throw one at you that I have not managed to try. It's the www.imdb.com!

Others that I've already done, but can't distribute due to copyrights, are:
--> CIA_World_Factbook_2005 - 13MB (you know about this one)
--> Biographies of Mathematicians (link) - 28 MB
--> The Encyclopedia of World History - Ancient, Medieval, and Modern, 6th ed (link) - 12MB (www.bartleby.com/67 site no longer available, try this archived site instead)
--> NAB-New American Bible for Catholics (2002) (link) - 15MB
--> Sacred Texts - Secret Teachings of All Ages (link) - 16MB
--> Sacred Texts - Notebooks of Leonardo Da Vinci (link) - 15MB
--> Euclid (The Elements) (link) - 4MB

Last edited by nrapallo; 05-05-2013 at 05:20 PM.
Old 04-10-2009, 08:25 PM   #7
igorsk
Wizard
Posts: 3,443
Karma: 52235
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Why can't you distribute CIA World Factbook? Isn't it public domain?
Old 04-10-2009, 08:27 PM   #8
nrapallo
Quote:
Originally Posted by Nate the great View Post
Thank you.

Actually, what I need more than luck is some serious processing power. I will be working on 8,779 html files. With an estimate of 5 minutes per file (with my current computer), that comes to over 43k minutes (~731 hours). And that is just one run. If I have to redo it...

I _so_ wish I could afford to build a Beowulf Cluster.
My initial tests, a month ago, showed that I would have to do a lot of HTML code tweaking to get the Wikipedia (soup) to display properly on small-screen ebook readers. I sort of abandoned it when I saw the results of a 50-page test .imp.

You can use the ETI PC viewer to see what I mean. :yuk:
Attached Files
File Type: imp schools-wikipedia-full-20081023-test.imp (871.0 KB, 132 views)
Old 04-10-2009, 08:30 PM   #9
nrapallo
Quote:
Originally Posted by igorsk View Post
Why can't you distribute CIA World Factbook? Isn't it public domain?
Sorry, you're right, that one is PD and was already uploaded here.

I just lumped it in with the other huge websites that I've spidered and converted to ebooks.
Old 04-10-2009, 08:55 PM   #10
Nate the great
Quote:
Originally Posted by nrapallo View Post
BTW, we seem to have the same tastes in converting HUGE ebooks, so I'll throw one at you that I have not managed to try. It's the www.imdb.com!
No, thank you. I might be crazy, but I'm not that crazy.

Quote:

Others that I've already done, but can't distribute due to copyrights, are:
--> CIA_World_Factbook_2005 - 13MB (you know about this one)
--> Biographies of Mathematicians (link) - 28 MB
--> The Encyclopedia of World History - Ancient, Medieval, and Modern, 6th ed (link) - 12MB
--> NAB-New American Bible for Catholics (2002) (link) - 15MB
--> Sacred Texts - Secret Teachings of All Ages (link) - 16MB
--> Sacred Texts - Notebooks of Leonardo Da Vinci (link) - 15MB
--> Euclid (The Elements) (link) - 4MB
Thank you for pointing me at the Encyclopedia of World History. Have you seen the _maps_? I simply have to convert it.

On a side note, have you seen my version of the 2008 World Fact Book?
Old 04-10-2009, 09:48 PM   #11
nrapallo
Quote:
Originally Posted by Nate the great View Post
No, thank you. I might be crazy, but I'm not that crazy.
Yep, I'll pass too!

Quote:
Thank you for pointing me at the Encyclopedia of World History. Have you seen the _maps_? I simply have to convert it.
Yes, I've seen them and included them in my .imp ebook, which I did two years ago (April 2007). See below for some screenshots.

I actually did three versions: a downgraded one without tables/appendices/indices (12 MB), a near-perfect copy minus the Subject/Alpha indices (25 MB), and a near-perfect copy that crashed my reader due to hyperlink overload (33 MB). The screenshots are from the 25 MB version.

Quote:
On a side note, have you seen my version of the 2008 World Fact Book?
Yes, I did get a copy of version 0.5 but didn't like the look and feel (too different from what I was producing), but thanks to your heads up, I got version 0.6 and I'm very impressed. It's a wonderful conversion and a keeper! I'll try to convert it to .imp versions if I may.

Great job, Nate!
Attached Thumbnails
WorldHistory_1.jpg (161.6 KB, 166 views, ID 27416)
WorldHistory_2.jpg (135.3 KB, 155 views, ID 27417)
WorldHistory_3.jpg (149.6 KB, 147 views, ID 27418)
WorldHistory_4.jpg (124.9 KB, 163 views, ID 27419)
WorldHistory_5.jpg (178.0 KB, 146 views, ID 27420)

Last edited by nrapallo; 04-10-2009 at 10:03 PM.
Old 04-10-2009, 10:16 PM   #12
ilovejedd
hopeless n00b
Posts: 2,308
Karma: 5761596
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PRS-350, Nexus S, Galaxy S, Nook Color, iPhone4, iPT4, iPad 2012
Quote:
Originally Posted by Nate the great View Post
Thank you.

Actually, what I need more than luck is some serious processing power. I will be working on 8,779 html files. With an estimate of 5 minutes per file (with my current computer), that comes to over 43k minutes (~731 hours). And that is just one run. If I have to redo it...

I _so_ wish I could afford to build a Beowulf Cluster.
Building your own "Beowulf" cluster isn't as expensive as it once was.
http://www.calvin.edu/~adams/research/microwulf/
Old 04-11-2009, 07:32 AM   #13
ShortNCuddlyAm
Could something like BOINC (the software used in things like SETI@home) help? Of course, you'd need to find people willing to help, but I can't be the only one here who would be.
Old 04-11-2009, 08:31 AM   #14
Nate the great
On a related note, there is a link somewhere on MobileRead to a 4.7GB version of Wikipedia in Mobipocket format. It's divided into about 40 ebooks, and has an uncompressed size of 7.5GB. I can't find the link, so I'll post a follow-up here.

I downloaded it overnight, and have found it to be almost unusable. The links between the ebooks do not work.
Old 04-17-2009, 10:46 AM   #15
dsip
Connoisseur
Posts: 73
Karma: 495694
Join Date: Feb 2009
Device: Between Devices..
Quote:
Originally Posted by Nate the great View Post
I downloaded it overnight, and have found it to be almost unusable. The links between the ebooks do not work.
The links work fine for me in Mobipocket Reader (i.e. on Windows), when I have them all loaded in my library, but they don't really work properly on my Reader (K1).

I used the German version, though I don't think that should really matter. When I browse the index in Mobipocket Reader, there's a search field you can type in; that doesn't exist on the Kindle, and the links in the index file are not even shown as links there. I guess the Kindle doesn't support links from one book to another yet? :\

Could there be any way to create a huge book from all those files with a searchable index?

http://pinguinburg.de/wpmp - for anyone who's interested.
At least I have an offline Wikipedia on my laptop now.. I just need to get it to work on the Kindle somehow..