Old 02-23-2011, 07:32 PM   #1
carlb
Member
Posts: 23
Karma: 200001
Join Date: Feb 2011
Device: BPDN, Kobo wifi
Wikipedia?

I've been attempting to import the 1900+ Wikipedia articles from the kiwix.org "kiwix-0.5.iso" CD as an .epub but am encountering a few bugs along the way.

Kiwix is a project intended to create an offline reader for Wikipedia. It currently uses .ZIM, a non-standard compressed file format which can be read with the open-source libzim, according to online documentation. The "kiwix-0.5.iso" archive, however, is an old version which contains an /html/... directory tree with plain, uncompressed web pages for this small selection of Wikipedia texts.

(Update: There is a utility "zimdump" provided as part of the source code package for Zimlib, available from openzim.org; this has been used successfully to convert Kiwix .ZIM archives into CD or DVD-sized piles of individual *.html and *.png/*.jpg files. Once you have zimlib installed from source, go to zimlib/src/tools and type 'make' to build the optional command-line utilities which you will need. At that point, 'zimdump -D destination_directory -f first_article_name input_filename.ZIM' should dump everything back to the original format, articles in destination_directory/A/* and images in destination_directory/I/*. The articles will need to be renamed to add the '.html' suffix, to replace any blank spaces in the name with _ underscores and to fix any URL-encoded accented/Unicode characters before importing this mess into Sigil.)
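
For anyone repeating this, the renaming described above could be scripted rather than done by hand. What follows is only a minimal Python sketch: the function names are made up for illustration, and it assumes the dumped article names are UTF-8 text with URL-style %XX escapes, which may not hold for every archive.

```python
from pathlib import Path
from urllib.parse import unquote


def normalized_article_name(raw_name):
    """Return a tidier file name for one dumped article (hypothetical helper)."""
    name = unquote(raw_name)        # decode %XX escapes, e.g. "%C3%A9" -> "é"
    name = name.replace(" ", "_")   # blank spaces become underscores
    name = name.replace("/", "_")   # a decoded "%2F" must not act as a path separator
    if not name.lower().endswith(".html"):
        name += ".html"             # add the missing suffix
    return name


def rename_articles(article_dir):
    """Rename every file in article_dir (e.g. destination_directory/A) in place."""
    for path in sorted(Path(article_dir).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(normalized_article_name(path.name))
        # Skip files that are already fine, and never overwrite an existing file.
        if target != path and not target.exists():
            path.rename(target)
```

Calling rename_articles("destination_directory/A") would rename the articles where they sit. Links inside the pages are not rewritten by this sketch, so they may still need fixing afterwards.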

I'd tried renaming the files (which have names like /html/art/a/w/9.html) to something meaningful and then importing them into a Sigil *.epub document. I have noticed one bug in Sigil; if there are two or more images which have the same base filename but different path, the auto-rename which Sigil attempts to use to resolve this conflict tends to be sporadic at best. This leaves many missing images in the resulting *.epub files. Subsequent attempts based on the Kiwix-style *.ZIM archives appear to be more successful as these abandon the oddball three-level, one-character base name file structure of the old 0.5 version of this collection; this was used for the schools encyclopedia described in a subsequent post to this thread.
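
One possible way around the duplicate base name problem is to give every image a unique flat name before Sigil ever sees it. The Python sketch below is an assumption about how that might be done, not anything Sigil provides; the helper names are invented, and it simply folds the directory components into the file name.

```python
from pathlib import PurePosixPath


def flat_unique_name(relative_path):
    """Fold directory components into the name: 'img/a/w/9.png' -> 'img_a_w_9.png'."""
    parts = PurePosixPath(relative_path).parts
    return "_".join(part for part in parts if part not in ("/", ".", ".."))


def build_rename_map(paths):
    """Map each original path to its flat name, refusing to continue on a clash."""
    mapping = {}
    taken = {}
    for path in paths:
        new_name = flat_unique_name(path)
        if new_name in taken:
            raise ValueError(f"{path} and {taken[new_name]} both map to {new_name}")
        taken[new_name] = path
        mapping[path] = new_name
    return mapping
```

The resulting map would still have to be applied to the src attributes in the HTML pages, which this sketch leaves out.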

I also find that there seems to be a practical limit (likely no more than 500 typical encyclopaedia articles) for what can be contained in a single *.epub file without creating problems. The table of contents generation in Sigil is also problematic, insofar as it insists on taking every HTML heading (h1, h2, h3, h4, h5) from within the individual articles and creating a multi-megabyte table of contents which is unusable to the reader due to its sheer size.

I've split this project into four separate *.epub files (like the alphabetical volumes of a printed encyclopaedia) and removed all but the first-level article names from the table of contents and the result is almost usable. Almost.
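
The alphabetical split can be sketched in a few lines of Python. The volume ranges are taken from the attached file names; everything else (the function names, stripping accents from the first letter, the "other" bucket for titles that start with a digit or symbol) is an assumption made for illustration.

```python
import unicodedata
from collections import defaultdict

# Volume label, first letter, last letter (ranges as in the attached files).
VOLUMES = [("A-D", "A", "D"), ("E-K", "E", "K"), ("L-Q", "L", "Q"), ("R-Z", "R", "Z")]


def volume_for(title):
    """Pick the volume for an article title by its first letter."""
    first = title.lstrip()[:1]
    # Decompose accented letters so that "É" is filed under "E".
    first = unicodedata.normalize("NFKD", first)[:1].upper()
    for label, low, high in VOLUMES:
        if low <= first <= high:
            return label
    return "other"  # digits, symbols, empty titles


def split_into_volumes(titles):
    """Group titles by volume, each group in case-insensitive alphabetical order."""
    volumes = defaultdict(list)
    for title in sorted(titles, key=str.casefold):
        volumes[volume_for(title)].append(title)
    return dict(volumes)
```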

The handling of large tables (such as the main "Version 0.5" content overview page which appears as the first chapter of these generated *.epubs) appears to be breaking badly on Kobo wi-fi. Open the encyclopaedia to the first chapter and, instead of using the menus to skip directly to another chapter using the table of contents, just try paging through Chapter 1 (the huge table listing what's in this selection). At some point (usually on the first page turn) the Kobo will decide that it's taking too long to make sense of such a huge, unwieldy HTML table and reboot itself.

This would appear to be a firmware bug, as the text is entirely readable on PC-based tools such as the document viewer in Calibre.

Is there any fix for this issue?
Attached Files
File Type: epub Encyclopedia A-D.epub (14.35 MB, 197 views)
File Type: epub Encyclopedia E-K.epub (10.31 MB, 151 views)
File Type: epub Encyclopedia L-Q.epub (9.52 MB, 150 views)
File Type: epub Encyclopedia R-Z.epub (9.82 MB, 154 views)

Last edited by carlb; 03-04-2011 at 12:44 PM.
Old 02-23-2011, 08:12 PM   #2
almagary
Wears funny hat (cloth)
Posts: 28
Karma: 26
Join Date: Dec 2010
Location: Limbo
Device: Kobo WiFi, Kobo Touch
Kobo + WikiReader

Good luck on this; it's definitely a worthwhile project. I'm trying it out but unfortunately can't help fix bugs.

While I read on my Kobo I also have a $100 Openmoko WikiReader, a little touchscreen monochrome LCD gadget (http://thewikireader.com/) which has most Wikipedia articles (no images, tables or lists), Wiktionary, Wikitravel, Wikiquote, and ~33,000 Project Gutenberg books (not very usable but a neat try).
Old 02-24-2011, 12:05 AM   #3
AJ Starr
Guru
Posts: 796
Karma: 1029784
Join Date: May 2008
Location: Nebraska, USA
Device: PEZ, Color Libre, 2@Sony T1, Onyx i62HD
I purchased two WikiReaders, one for myself and one for my granddaughter's school work. It's great for a quick check of information while I'm in front of the TV (which is where I keep it).

Anyway, the site has the entire Wikipedia content available for download, along with lots of open-source files and "how to" guides for getting other wikis. If you can figure it out and translate it, it's there, free for download.

https://github.com/wikireader/wikireader/wiki

The main site is http://thewikireader.com/

http://dev.thewikireader.com/language-packs/ is another site, for the development work.

Unfortunately, all the development is beyond my programming skills, which stopped at BASIC (the old DOS days....)

Good luck.

AJ
Old 03-04-2011, 12:07 PM   #4
carlb
Member
Posts: 23
Karma: 200001
Join Date: Feb 2011
Device: BPDN, Kobo wifi
Your door-to-door encyclopaedia salesperson strikes again...

I've posted another set of encyclopaedia .epub's, this one with 5500 Wikipedia articles which had been distributed on CD as an encyclopaedia for schools in October 2008.

As most images (in whatever size they appeared on the original Wikipedia pages) are retained in this collection, the size of each individual volume appears to be about fifty megabytes... effectively requiring a CD's worth of space to store the full fifteen-volume set (186 MB of text, 430 MB of images).

This makes even this severely-abridged set too large to upload here. As such, I'm back to peddling encyclopaedias door-to-door and am in this fine neighbourhood today:

http://epub.wikipedia.cx

I post the first volume as a sample, absolutely free of charge, at the end of this message as a token of thanks for having heard what I have to offer you today (and, perhaps, because it is the only volume which fit in under this site's 20MB *.epub file size limit).

While this set is not itself a project of the Wikimedia Foundation, it does use content generated by various individual Wikipedia contributors and is licensed under the GNU Free Documentation License for free use.

I would urge you to go to http://epub.wikipedia.cx today and acquire this fine set of encyclopaedia volumes... that way you may sleep tonight with the security of knowing that your goldfish will not flunk out of their school for a lack of brain food and that their future will be secure. Certainly this would be a bargain at twice the price... but there's more.

This collection will use about two thirds of the memory in a stock Kobo reader, but perhaps I could interest you in this fine microSD card upgrade which would provide your home with enough shelving space to store the information equivalent of three DVDs for about C$30, using components available at any local computer store. *Some assembly required.
Attached Files
File Type: epub Encyclopedia 0-9.epub (16.18 MB, 143 views)

Last edited by carlb; 03-05-2011 at 12:59 AM.
Old 03-04-2011, 09:15 PM   #5
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,956
Karma: 2530531
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Very nice effort! Well worth the download time...

I did something similar with the 2006 version of the SOS Children's Wikipedia CD and posted my experiences and challenges in the thread entitled Creating HUGE ebooks from the 2006 Wikipedia CD Selection. I too had to locate my ebooks off-site ( see here ), due to the max. upload limit here.

I could contain everything within 1 ebook primarily since I reduced pictures to a max. image size (150x150) and max. color depth of 16 colors (4 bit). Without doing this, I too would have had to split the ebook to fit internal memory, but then I would lose some of the links pointing to the split .epub volumes.

Those pictures were too small, but allowed the final ebook size to be reduced! Also, reducing the color to 16 created some "banding" effects, but again drastically reduced the final ebook size. All in all, an acceptable compromise.

Try reducing your images to 4 bit .png (optimized) i.e. 16 colours (or .jpg/.gif if smaller), then recreate the ebook to see the file size savings...

Did you try to create a version without ANY images (just rename your images folder temporarily to another name) just to see the min. ebook filesize?

Keep up the good work; I just love these gargantuan ebooks.

Last edited by nrapallo; 03-04-2011 at 09:19 PM.
Old 03-05-2011, 03:32 AM   #6
almagary
Wears funny hat (cloth)
Posts: 28
Karma: 26
Join Date: Dec 2010
Location: Limbo
Device: Kobo WiFi, Kobo Touch
Thanks to carlb for pointing to the shorter Wikipedia at the .cx address. (CX is Christmas Island, an Australian territory not that far from Jakarta. That's exotic enough, but then I went up to the homepage, which seems to be the Portuguese Wikipedia site.) With some effort I loaded the 15 files (600 MB in total) onto the 2GB SD card, using Calibre. The Kobo accepted them, after some hiccuping.

Wikipedia on the Kobo lacks search, of course, and getting to an individual article takes loading one of the volumes, then clicking in the TOC to see if your topic is there, but it does work. The format is best at smallest type size, but all the illustrations in the files seem intact.

I hope the next firmware will facilitate using such reference works on the Kobo, ideally with ability to have at least two books open at the same time.
Old 03-05-2011, 08:27 AM   #7
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,956
Karma: 2530531
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by nrapallo View Post
Try reducing your images to 4 bit .png (optimized) i.e. 16 colours (or .jpg/.gif if smaller), then recreate the ebook to see the file size savings...
I tried this with "Encyclopedia A.epub" which is 50.1 MB (52,606,545 bytes) and produced "Encyclopedia A_fixed16.epub" which is 23.5 MB (24,727,980 bytes), a 53% size reduction!

A few things I had to change (since I'm using a Western-world WinXP computer):
  • the file "Are_You_There_God?_Its_Me,_Margaret.html" was renamed "Are_You_There_God_Its_Me,_Margaret.html" as Windows doesn't allow a question mark ("?") in a filename!
  • I reduced any image larger than 200x??? so that its short side was 200, using IrfanView; I didn't enlarge smaller pictures or use any dithering. For .gifs and .pngs, I just reduced the colors to 16 (4 bit), while with .jpgs I had better luck reducing the Quality to 70 than converting those images to .png or .gif! For images smaller than 200x???, I didn't change the size, but did reduce the colors as above.
  • I edited all the text files (using Notepad++ since it supports Unicode better than my trusty TextPad) to remove any width="???" or height="???" within <img > tags, since I had reduced some image dimensions and didn't need/like them stretched back to their specified/original size.
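
The last step in the list above (dropping width and height from the <img> tags) could also be automated. This is a rough Python sketch using a regular expression rather than a real HTML parser; the function name is invented, and it assumes reasonably tidy markup with no ">" inside attribute values.

```python
import re

# Matches a whole <img ...> tag.
_IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)

# Matches one width=... or height=... attribute, quoted or unquoted.
_SIZE_ATTR = re.compile(
    r"""\s+(?:width|height)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
    re.IGNORECASE,
)


def strip_img_dimensions(html):
    """Remove width/height attributes from <img> tags only; other tags are untouched."""
    return _IMG_TAG.sub(lambda match: _SIZE_ATTR.sub("", match.group(0)), html)
```

Sizes set through a style attribute or a stylesheet are not touched by this, so it only covers the case described in the list.
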
Despite all these efforts, the file is still too large for my epub reader to load, so I couldn't test my creation on it. However, it does load your "Encyclopedia 0-9.epub" (16MB), so that file must be under whatever maximum size my Adobe Mobile Edition 9 supports! I did preview that file using ADE (PC version).

P.S. it appears 16 color (4bit) .png images don't display properly on older versions of ADE (like the PC previewer I use), so I also prepared a 256 color .png version of "Encyclopedia A.epub" and called it "Encyclopedia A_fixed256.epub". This one added about 3MB and is 26.4 MB (27,690,600 bytes).

Quote:
Did you try to create a version without ANY images (just rename your images folder temporarily to another name) just to see the min. ebook filesize?
I tried it and for "Encyclopedia A.epub" it resulted in an ebook which is 4.63 MB (4,859,059 bytes); a 10-fold size reduction!
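
As a quick sanity check on the figures quoted in this post, the ratios can be recomputed from the byte counts. The Python snippet below only redoes that arithmetic; the helper names are made up.

```python
def reduction_percent(original_bytes, reduced_bytes):
    """Percentage by which the file shrank."""
    return 100.0 * (1 - reduced_bytes / original_bytes)


def shrink_factor(original_bytes, reduced_bytes):
    """How many times smaller the reduced file is."""
    return original_bytes / reduced_bytes


ORIGINAL = 52_606_545  # "Encyclopedia A.epub"
VARIANTS = {
    "16 colours": 24_727_980,
    "256 colours": 27_690_600,
    "no images": 4_859_059,
}

for label, size in VARIANTS.items():
    print(f"{label}: {reduction_percent(ORIGINAL, size):.1f}% smaller, "
          f"{shrink_factor(ORIGINAL, size):.2f}x")
```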

Last edited by nrapallo; 03-05-2011 at 09:54 AM. Reason: typo
Old 03-07-2011, 04:05 PM   #8
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,956
Karma: 2530531
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by nrapallo View Post
I tried this with "Encyclopedia A.epub" which is 50.1 MB (52,606,545 bytes) and produced "Encyclopedia A_fixed16.epub" which is 23.5 MB (24,727,980 bytes), a 53% size reduction!

... also prepared a 256 color .png version of "Encyclopedia A.epub" and called it "Encyclopedia A_fixed256.epub". This one added about 3MB and is 26.4 MB (27,690,600 bytes).

... and for "Encyclopedia A_fixed16-noimages.epub" resulted in an ebook which is 4.63 MB (4,859,059 bytes); a 10-fold size reduction!
Those revised ebooks of "Encyclopedia A.epub" have now been uploaded to my Gargantuan eBook server in a directory called Wikipedia_2008_fixed!

Only one for now...

Last edited by nrapallo; 03-07-2011 at 10:17 PM.
Old 03-09-2011, 12:12 PM   #9
carlb
Member
Posts: 23
Karma: 200001
Join Date: Feb 2011
Device: BPDN, Kobo wifi
Quote:
Originally Posted by nrapallo View Post
I reduced any image greater than 200x??? so that the short side was 200 using IrfanView and didn't enlarge smaller pictures nor used any dithering. For .gifs and .pngs, I just reduced the colours to 16 (4 bit) while with .jpgs I had better luck reducing the Quality to 70 than changing those images to .png or .gif! Any images less than 200x???, I didn't change the size, but did change the colours as above.
IrfanView?

For such a large quantity of images, it may be easier to use http://www.imagemagick.org/script/convert.php as it can be run from a batch file to -resize or change colour -depth of large numbers of images at once.

Wikipedia's server-side MediaWiki software invokes ImageMagick's free convert utility as one commonly-used means to generate thumbnails from uploaded photos for use on content pages; it won't convert .SVG to other formats but will do just about anything else.
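
A batch run along those lines might look like the Python sketch below, which only builds and runs convert command lines. Treat it as an assumption-laden example: the function names are invented, it uses -colors and -quality rather than -depth, and "-resize 200x200>" shrinks an image to fit inside a 200x200 box (never enlarging), which is not quite the same as making the short side 200.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(src, dst, max_side=200, colors=16):
    """Build one ImageMagick command line as an argument list."""
    # The ">" flag means: only shrink images that are larger than the box.
    cmd = ["convert", src, "-resize", f"{max_side}x{max_side}>"]
    if Path(src).suffix.lower() in (".jpg", ".jpeg"):
        cmd += ["-quality", "70"]         # JPEGs: lower the quality setting
    else:
        cmd += ["-colors", str(colors)]   # PNG/GIF: reduce the palette
    cmd.append(dst)
    return cmd


def shrink_all(src_dir, dst_dir):
    """Run convert over every image in src_dir, writing results to dst_dir."""
    # Note: on Windows, "convert" may resolve to an unrelated system utility.
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick's convert was not found on PATH")
    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).iterdir()):
        if src.suffix.lower() in (".png", ".gif", ".jpg", ".jpeg"):
            dst = Path(dst_dir) / src.name
            subprocess.run(convert_command(str(src), str(dst)), check=True)
```

Writing to a separate output directory keeps the originals intact, so the settings can be changed and the batch rerun.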
Old 03-09-2011, 12:49 PM   #10
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,956
Karma: 2530531
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by carlb View Post
IrfanView?
http://www.irfanview.com/

Quote:
For such a large quantity of images, it may be easier to use http://www.imagemagick.org/script/convert.php as it can be run from a batch file to -resize or change colour -depth of large numbers of images at once.
IrfanView has a GUI that allows batch processing with many options. I'm using Windows and IrfanView is a workhorse that I've exploited countless times...

[Attached image: Irfanview-batch processing options GUI.jpg]

By the way, do you use Linux, or did you compile the zimdump utility for Windows? I wouldn't mind having an executable copy of zimdump...

Last edited by nrapallo; 03-09-2011 at 12:52 PM.
Old 11-18-2011, 01:45 PM   #11
alg2468
Member
Posts: 18
Karma: 10
Join Date: Oct 2011
Location: RI, USA
Device: Aluratek Libre, Velocity Cruz T301, EZReader, Iview 435TPC, Wikireader
It works on the Aluratek Libre Pro!

The Wikipedia downloads that carlb posted work in the Aluratek Libre Pro! This is excellent! Great work, carlb!