04-10-2009, 05:59 PM | #1 |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Wikipedia Ebook Project
There is a 3.4GB offline version of Wikipedia. I have a copy, and I'm going to try to turn it into an ebook.
I've been looking at it today. The text by itself is only 400MB, and I should be able to cut that in half (or more) by removing most of the formatting. The images take up the other 3GB. I expect to be able to reduce this to 1GB by abandoning the highest resolution images and by eliminating the duplicate images. I might get it down even further. You can download it as a torrent here. You will need a bittorrent client. |
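A minimal sketch of the duplicate-elimination step mentioned above, assuming the images sit as ordinary files under one directory. The function names and the use of SHA-256 are illustrative choices, not anything stated in the post, and this only catches byte-identical copies (the same picture saved at two resolutions would not match).

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> Dict[str, List[Path]]:
    """Group files under `root` by content digest; keep only groups with 2+ files."""
    groups: Dict[str, List[Path]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups.setdefault(file_digest(path), []).append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Each returned group lists files with identical content, so all but the first entry in a group are candidates for removal, provided the HTML that references them is updated to point at the copy that is kept.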
04-10-2009, 06:24 PM | #2 |
WWHALD
Posts: 7,879
Karma: 337114
Join Date: Sep 2008
Location: Mitcham, Surrey, UK
Device: iPad. Selling my silver 505 here
|
Good luck!
|
04-10-2009, 07:20 PM | #3 |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Thank you.
Actually, what I need more than luck is some serious processing power. I will be working on 8,779 html files. With an estimate of 5 minutes per file (with my current computer), that comes to over 43k minutes (~731 hours). And that is just one run. If I have to redo it... I _so_ wish I could afford to build a Beowulf Cluster. |
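The estimate above can be checked with a few lines of arithmetic. This is a rough sketch that assumes every file takes the same time and that files can be processed independently; the `workers` parameter is a hypothetical addition to show how the figure would shrink if the work were split evenly across several machines or cores.

```python
import math


def estimate_hours(num_files: int, minutes_per_file: float, workers: int = 1) -> float:
    """Estimate wall-clock hours when files are split evenly across `workers`."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # The busiest worker handles the rounded-up share of the files.
    files_per_worker = math.ceil(num_files / workers)
    return files_per_worker * minutes_per_file / 60


if __name__ == "__main__":
    for workers in (1, 2, 4, 8):
        hours = estimate_hours(8779, 5, workers)
        print(f"{workers} worker(s): about {hours:.0f} hours")
```

With one worker this reproduces the roughly 731 hours quoted in the post. Real speedups would be lower than the even split suggests if disk access or memory becomes the bottleneck.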
04-10-2009, 07:29 PM | #4 |
Hyperreader
Posts: 130
Karma: 28678
Join Date: Feb 2009
Device: Current: Boox Leaf2 (broken) Past: H2O, Kindle PW1, DXG;Pocketbook 360
|
I don't know much about this at all, but I'll say this since it might help. How about using a cluster that you can access? Maybe the Microsoft Azure platform or another one? Here's a program that, according to them, would take weeks to calculate that graph, but they can do it in seconds using an Azure cluster:
http://www.dotnetsolutions.ltd.uk/ev.../wikiexplorer/ Again, it just came to mind when I saw the thread. I'm looking forward to the end result. |
04-10-2009, 07:53 PM | #5 | ||
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Quote:
from Wikipedia: Quote:
|
||
04-10-2009, 08:16 PM | #6 | |||
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Quote:
Quote:
Quote:
My initial tests showed that if the max. image size is 300x300, then all the (160 MB) images on the 2006 CD occupy about 60 MB whereas the 150x150 max ones only occupied about 27 MB. As a test, I actually loaded a 50MB .imp of the 2006 Wikipedia onto my REB1200 and it worked flawlessly!!! Legacy reader my a**!
Cheers,
p.s. BTW, we seem to have the same tastes in converting HUGE ebooks, so I'll throw one at you that I have not managed to try. It's the www.imdb.com!
Others that I've already done, but can't distribute due to copyrights, are:
--> CIA_World_Factbook_2005 - 13MB (you know about this one)
--> Biographies of Mathematicians (link) - 28 MB
--> The Encyclopedia of World History - Ancient, Medieval, and Modern, 6th ed (link) - 12MB (www.bartleby.com/67 site no longer available, try this archived site instead)
--> NAB-New American Bible for Catholics (2002) (link) - 15MB
--> Sacred Texts - Secret Teachings of All Ages (link) - 16MB
--> Sacred Texts - Notebooks of Leonardo Da Vinci (link) - 15MB
--> Euclid (The Elements) (link) - 4MB
Last edited by nrapallo; 05-05-2013 at 05:20 PM. |
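The 300x300 and 150x150 caps in this post are limits on the longest side of each image. As a hedged illustration (the function name and rounding behaviour are assumptions, not the tool actually used here), the target dimensions for a given cap can be computed like this, preserving aspect ratio and never enlarging a small image:

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Scale (width, height) down to fit a max_side x max_side box, keeping aspect ratio."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions and max_side must be positive")
    # A scale of 1.0 means the image already fits and is left untouched.
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    for cap in (300, 150):
        print(cap, fit_within(600, 400, cap))
```

Halving the cap from 300 to 150 cuts the pixel count of a large image to a quarter, yet the figures quoted in the post (about 60 MB versus about 27 MB) shrink by less than that, which is consistent with some images already being under the cap and with per-file overhead.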
|||
04-10-2009, 08:25 PM | #7 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
Why can't you distribute CIA World Factbook? Isn't it public domain?
|
04-10-2009, 08:27 PM | #8 | |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Quote:
You can use the ETI PC viewer to see what I mean. :yuk: |
|
04-10-2009, 08:30 PM | #9 | |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Quote:
I just lumped it in with the other huge websites that I've spidered and converted to ebooks. |
|
04-10-2009, 08:55 PM | #10 | ||
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
Quote:
Quote:
On a side note, have you seen my version of the 2008 World Fact Book? |
||
04-10-2009, 09:48 PM | #11 | ||
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
|
Yep, I'll pass too!
Quote:
I actually did a downgraded version without tables/appendices/indices (12 MB), a near-perfect copy minus the Subject/Alpha indices (25 MB) and a near-perfect copy that crashed my reader due to hyperlink overload (33 MB). The screenshots are from the 25 MB version. Quote:
Great job, Nate! Last edited by nrapallo; 04-10-2009 at 10:03 PM. |
||
04-10-2009, 10:16 PM | #12 | |
hopeless n00b
Posts: 5,111
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
|
Quote:
http://www.calvin.edu/~adams/research/microwulf/ |
|
04-11-2009, 07:32 AM | #13 |
WWHALD
Posts: 7,879
Karma: 337114
Join Date: Sep 2008
Location: Mitcham, Surrey, UK
Device: iPad. Selling my silver 505 here
|
Could something like BOINC (the software used in things like SETI@Home) help? Of course, you'd need to find people willing to help, but I can't be the only one here who would be?
|
04-11-2009, 08:31 AM | #14 |
Sir Penguin of Edinburgh
Posts: 12,375
Karma: 23555235
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
|
On a related note, there is a link somewhere on MobileRead to a 4.7GB version of Wikipedia in Mobipocket. It's divided into about 40 ebooks, and has an uncompressed size of 7.5GB. I can't find the link, so I'll post a follow-up here.
I downloaded it overnight, and have found it to be almost unusable. The links between the ebooks do not work. |
04-17-2009, 10:46 AM | #15 | |
Connoisseur
Posts: 73
Karma: 495694
Join Date: Feb 2009
Device: Between Devices..
|
Quote:
I used the German version though, but I think that shouldn't really matter. When I browse the index with the Mobipocket reader, there's a search field you can type in, that doesn't exist on the Kindle either - and the links in the index file are not even shown as links on the Kindle. I guess the Kindle doesn't support links from one book to another yet? :\ Could there be any way to create a huge book from all those files with a searchable index? http://pinguinburg.de/wpmp - for anyone who's interested. At least I have an offline Wikipedia on my laptop now.. I just need to get it to work on the Kindle somehow.. |
|