
MobileRead Forums > E-Book Formats > Kindle Formats

01-11-2014, 01:36 PM   #1
EbokJunkie
Zealot
Posts: 121
Karma: 16
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
Is it feasible to build a dictionary with opf2mobi/html2mobi?

I'm trying to build a mobi dictionary using mobiperl (with a view to building big dictionaries that crash mobigen).
Unfortunately, the Kindle doesn't recognize my dictionaries as dictionaries; no index is available. Mobigen builds my test dictionary (just two entries) without problems, so I used MobiMetaEditor to copy all relevant EXTH records to the mobiperl-created mobi, but to no avail.
I also tried unpacking both dictionaries with KindleUnpack and comparing the resulting HTML files, but didn't notice any significant differences.
Please take a look at the attached files and advise whether building dictionaries with mobiperl makes sense.
Attached Files
File Type: zip Test Dictionary.zip (5.8 KB, 25 views)

Last edited by EbokJunkie; 01-11-2014 at 01:47 PM.
01-11-2014, 02:51 PM   #2
Doitsu
Wizard
Posts: 1,846
Karma: 4630359
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by EbokJunkie View Post
I'm trying to build a mobi dictionary using mobiperl (with a view to building big dictionaries that crash mobigen).
AFAIK, mobiperl is a set of tools not specifically designed to create a dictionary.

Quote:
Originally Posted by EbokJunkie View Post
Unfortunately, Kindle doesn't recognize my dictionaries as dictionaries, no index is available.
In that case the source files are most likely incorrectly formatted. You may want to compile the dictionary with the latest version of KindleGen at a command prompt and compare the output that KindleGen generates for the two-entry dictionary that works with the other one that doesn't work.

Quote:
Originally Posted by EbokJunkie View Post
Mobigen builds my test dictionary (just two entries) without problems, so I used MobiMetaEditor to copy all relevant EXTH records to the mobiperl-created mobi, but to no avail.
Just changing the metadata is not enough to convert a regular book into a dictionary. You'll need to modify the .opf file and the .html files and recompile the book.
Did you compare the .opf file of the dictionary that doesn't work with the one that does to ensure that it has all the entries required for dictionaries?
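One way to run that comparison mechanically is a small Python sketch that lists which dictionary-specific metadata elements are missing from an .opf file. It assumes a Kindle dictionary .opf declares DictionaryInLanguage and DictionaryOutLanguage (typically inside <x-metadata>); that tag list is an assumption to extend, not a complete checklist, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed minimum for a Kindle dictionary .opf; extend as needed.
DICTIONARY_TAGS = ("DictionaryInLanguage", "DictionaryOutLanguage")


def missing_dictionary_tags(opf_xml) -> list[str]:
    """Return the dictionary metadata elements that do not appear anywhere in the OPF."""
    root = ET.fromstring(opf_xml)  # accepts str or bytes
    # Strip any "{namespace}" prefix so plain and namespaced OPFs both match.
    present = {el.tag.rsplit("}", 1)[-1] for el in root.iter() if isinstance(el.tag, str)}
    return [tag for tag in DICTIONARY_TAGS if tag not in present]
```

Running it on both .opf files (e.g. `missing_dictionary_tags(open("dict.opf", "rb").read())`) and comparing the results shows at a glance whether one of them lacks the dictionary declarations.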

Quote:
Originally Posted by EbokJunkie View Post
Tried to explode two dictionaries with KindleUnpack and to compare resulting html files but didn't notice considerable differences.
KindleUnpack only has rudimentary support for dictionaries; for example, it cannot reverse-engineer inflections.
01-11-2014, 03:03 PM   #3
EbokJunkie
>Did you compare the .opf file of the dictionary that doesn't work with the one that does
>to ensure that it has all the entries required for dictionaries?
I'm using the same .opf for both mobigen and opf2mobi processing.
01-11-2014, 03:10 PM   #4
Doitsu
Quote:
Originally Posted by EbokJunkie View Post
>Did you compare the .opf file of the dictionary that doesn't work with the one that does
>to ensure that it has all the entries required for dictionaries?
I'm using the same .opf for both mobigen and opf2mobi processing.
Then there's obviously something wrong with the entry definitions. Why don't you post two entries from the file that doesn't work?
01-11-2014, 03:20 PM   #5
EbokJunkie
Sorry, I don't understand. I posted the source HTML file with all the entries.
01-11-2014, 03:51 PM   #6
pdurrant
The Grand Mouse
Posts: 30,773
Karma: 85457880
Join Date: Jul 2007
Location: Norfolk, England
Device: NOOK ST GlowLight
Quote:
Originally Posted by EbokJunkie View Post
Sorry, I don't understand. I posted the source HTML file with all the entries.
Mobipocket/Kindle dictionaries, to work properly as machine-searchable dictionaries, have to be created from specially formatted source.

See here, here, and here.
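As an illustration of what "specially formatted" means, here is a minimal Python sketch that emits one entry. It assumes the <idx:entry>/<idx:orth> markup used by Mobipocket/Kindle dictionaries, with name="default" standing in for a hypothetical lookup-index name declared in the .opf; the attribute set is illustrative, not exhaustive. Headword and definition are escaped so characters such as & and > cannot break the markup.

```python
from html import escape


def make_entry(headword: str, definition: str) -> str:
    """Return one dictionary entry in Mobipocket/Kindle idx: markup (illustrative)."""
    hw = escape(headword, quote=True)       # & < > " ' become entities
    body = escape(definition, quote=False)  # & < > become entities
    return (
        '<idx:entry name="default" scriptable="yes" spell="yes">'
        f'<idx:orth value="{hw}"><b>{hw}</b></idx:orth> {body}'
        "</idx:entry>"
    )
```

For example, `make_entry("Rock & Roll", "a genre of popular music")` yields an entry whose headword is written as `Rock &amp; Roll` both in the `value` attribute and in the visible text.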
01-11-2014, 04:17 PM   #7
EbokJunkie
pdurrant
Thank you, but I know this.
I create dictionaries by hand in an open text format, the Lingvo DSL format, and convert them to HTML/OPF using dsl2mobi, an open-source Ruby script. This script always creates valid HTML and OPF files suitable for building machine-searchable dictionaries.
My upload is an HTML file created by dsl2mobi from a short two-entry DSL file.
It works perfectly well with mobigen and KindleGen 2.9.
01-11-2014, 05:00 PM   #8
pdurrant
Quote:
Originally Posted by EbokJunkie View Post
pdurrant
Thank you, but I know this.
Oh, I see. Sorry, I read your original post too quickly.

I suspect that the only way to get to the bottom of this will be to examine the output very carefully indeed. KindleUnpack might not output enough information; you may need to delve into the file with a hex editor.

I'll take a quick look at what you uploaded.
01-11-2014, 05:06 PM   #9
pdurrant
OK, clearly mobiperl isn't going to do what you want. And you won't spot differences in the HTML as that's not really the important bit. If you use MobiUnpack with the appropriate flags to dump everything, you'll see that in your example from mobiperl you just have the Mobipocket header, html, ncx and opf sections in your generated file.

In the one from KindleGen you also have four INDX sections, an FCIS section, an FLIS section, and three tiny unknown sections.

If you want to build a Kindle dictionary without Kindlegen, you're going to need to reverse engineer the compiled dictionary format, and then create something that can build it. No easy task!
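To check a compiled file for those sections without a full unpacker, a Python sketch like the one below may help. It relies on the Palm Database (PDB) layout that Mobipocket files use: a 78-byte header whose last field is the record count (big-endian uint16 at offset 76), followed by one 8-byte entry per record whose first four bytes are that record's offset. It only inspects the first four bytes of each record for the INDX, FCIS and FLIS signatures; the function name is hypothetical, and a malformed file may make it raise struct.error.

```python
import struct
from collections import Counter

SIGNATURES = (b"INDX", b"FCIS", b"FLIS")


def section_magics(data: bytes) -> Counter:
    """Count records in a PDB/Mobipocket file that start with INDX, FCIS or FLIS."""
    # Record count: big-endian uint16 at offset 76, the last field of the 78-byte header.
    (num_records,) = struct.unpack_from(">H", data, 76)
    counts: Counter = Counter()
    for i in range(num_records):
        # Each 8-byte record-list entry begins with the record's 4-byte data offset.
        (offset,) = struct.unpack_from(">I", data, 78 + 8 * i)
        magic = data[offset:offset + 4]
        if magic in SIGNATURES:
            counts[magic.decode("ascii")] += 1
    return counts
```

Usage: `section_magics(open("dictionary.mobi", "rb").read())`. Going by the comparison above, the KindleGen build should report INDX, FCIS and FLIS records, while the mobiperl build should report none.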
01-11-2014, 05:36 PM   #10
EbokJunkie
Thanks, I suspected it would be something along those lines, and intimidating.
Unfortunately, KindleGen crashes on HTML files over 100 MB, Mobipocket eBook Creator dies even on smaller files, and mobigen chokes on anything beyond about 300 MB.
Evidently Amazon broke something dictionary-related in KindleGen (at least with respect to source size), and mobigen is the best (although inadequate) tool for this task.
01-11-2014, 07:00 PM   #11
Doitsu
Since Mobipocket Creator expects Unicode files with a byte-order mark (BOM) and your .html source file doesn't have one, it couldn't hurt to add a BOM to it, particularly if it contains non-Latin characters. (KindleGen doesn't require a BOM, but it handles UTF-8 files with a BOM fine.)
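A minimal Python sketch for the resave step, assuming the source is already UTF-8 encoded: it only prepends the three BOM bytes (EF BB BF) and never transcodes. It streams into a temporary copy so that a 300 MB source is not loaded into memory at once; the function name and the .bom-tmp suffix are hypothetical.

```python
import codecs
import os
import shutil
from pathlib import Path


def add_utf8_bom(path) -> bool:
    """Prepend a UTF-8 BOM to an (already UTF-8) file; return False if it has one."""
    path = Path(path)
    tmp = path.with_name(path.name + ".bom-tmp")  # hypothetical temp-file name
    with path.open("rb") as src:
        if src.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8:
            return False  # already has a BOM; leave the file untouched
        src.seek(0)
        with tmp.open("wb") as dst:
            dst.write(codecs.BOM_UTF8)             # EF BB BF
            shutil.copyfileobj(src, dst, 1 << 20)  # copy in 1 MiB blocks
    os.replace(tmp, path)  # swap the new file in only after the copy has finished
    return True
```

Calling `add_utf8_bom("dictionary.html")` twice is safe: the second call sees the BOM and returns False without rewriting the file.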

Resave your .html source file as a utf8 file with a BOM, empty your Temp folder and execute KindleGen using the following command line:
Code:
KindleGen your.opf > error.log
If it crashes again, have a look at error.log, which should help you identify the line that causes the problem. If the log doesn't help, post it here.

If none of the above helps, split your source file into several smaller files. The largest file I have ever compiled was an 80 MB UTF-16LE source file, which compiled fine with both Mobipocket Creator and KindleGen. Try splitting your source file into several 75 MB files and update the <manifest> and <spine> sections of your .opf accordingly.
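The splitting step could look roughly like the Python sketch below. It assumes each dictionary entry is already available as a separate string (for example, one <idx:entry> element each), so no entry is cut in half, and it measures sizes as UTF-8 bytes. Each chunk still has to be wrapped in the same HTML header and footer as the original file before being saved; the part1.html, part2.html file names, the item ids and the media-type are assumptions to adapt to your existing .opf.

```python
def split_entries(entries: list[str], max_bytes: int) -> list[list[str]]:
    """Group entries into chunks whose UTF-8 size stays within max_bytes.

    An entry larger than max_bytes on its own gets a chunk to itself.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    size = 0
    for entry in entries:
        n = len(entry.encode("utf-8"))
        # Start a new chunk if this entry would push the current one over the limit.
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(entry)
        size += n
    if current:
        chunks.append(current)
    return chunks


def manifest_and_spine(count: int, prefix: str = "part") -> tuple[str, str]:
    """Build <manifest> items and <spine> itemrefs for part1.html ... partN.html."""
    # Hypothetical naming scheme and media type; match whatever your .opf already uses.
    items = [
        f'<item id="{prefix}{i}" href="{prefix}{i}.html" media-type="application/xhtml+xml"/>'
        for i in range(1, count + 1)
    ]
    refs = [f'<itemref idref="{prefix}{i}"/>' for i in range(1, count + 1)]
    return "\n".join(items), "\n".join(refs)
```

Usage: `parts = split_entries(entries, 75 * 1024 * 1024)`, then paste the two strings from `manifest_and_spine(len(parts))` into the <manifest> and <spine> sections in place of the single original file.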

Also have a look at the output that your Ruby script creates and check for isolated ampersands (&) or angle brackets (< >) that haven't been escaped as entities (&amp;, &lt;, &gt;, etc.), as these are known to cause problems with many HTML parsers.
For example, a line such as the following will cause problems:

Code:
<idx:orth>Rock Music > Rock & Roll</idx:orth>
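A rough Python sketch for that check, scanning the generated HTML line by line. It flags any & that doesn't begin a numeric or named entity, plus any > inside a text-only <idx:orth> element. It is regex-based rather than a real parser, so it can miss cases (a stray < inside a headword, or headwords with nested markup) and may report some false positives; the function name is hypothetical.

```python
import re

# "&" not followed by a decimal, hex or named entity reference.
BARE_AMP = re.compile(r"&(?!#[0-9]+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)")
# Text-only <idx:orth> elements; orths containing nested tags are not checked.
SIMPLE_ORTH = re.compile(r"<idx:orth\b[^>]*>([^<]*)</idx:orth>")


def find_suspect_lines(html_text: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for likely unescaped characters."""
    problems = []
    for lineno, line in enumerate(html_text.splitlines(), start=1):
        if BARE_AMP.search(line):
            problems.append((lineno, "unescaped &"))
        for m in SIMPLE_ORTH.finditer(line):
            if ">" in m.group(1):
                problems.append((lineno, "unescaped > in headword"))
    return problems
```

Run on the example line above, it reports both problems on line 1; the escaped form `<idx:orth>Rock Music &gt; Rock &amp; Roll</idx:orth>` passes cleanly.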

Last edited by Doitsu; 01-12-2014 at 02:11 AM.
01-11-2014, 07:51 PM   #12
EbokJunkie
Thanks, I'll proceed with caution and check the headers for unescaped ampersands.
As for splitting the source, I usually split it at the DSL level and process each part with the Ruby script separately. I noticed that mobigen can convert at least 300 MB of HTML with option C2, which builds a 40-50 MB mobi dictionary. However, I have to leave my i7 Windows 7 desktop crunching in the background for 6-8 hours, running a few parts concurrently. To be honest, bigger dictionaries may hang the Kindle; at least that sometimes happens with my PW1.
Added:
Wow! Thank you for the heads-up about Mobipocket Creator and the BOM!
The 302 MB HTML file without a BOM used to crash MPC right at startup; with a BOM, MPC compiled an uncompressed PRC in five minutes! I can't say the same about KindleGen: it still crashed a minute after starting.
MPC+BOM looks like a solution. Thanks again.

Last edited by EbokJunkie; 01-11-2014 at 08:13 PM.
All times are GMT -4.

