01-11-2014, 01:36 PM | #1 |
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
|
Is it feasible to build a dictionary with opf2mobi/html2mobi?
I'm trying to build a mobi dictionary using mobiperl (with a view to building big dictionaries that crash mobigen).
Unfortunately, the Kindle doesn't recognize my dictionaries as dictionaries; no index is available. Mobigen builds my test dictionary (just two entries) without problems, so I used MobiMetaEditor to copy all the relevant EXTH records to the mobiperl-created mobi, but to no avail. I also tried to unpack both dictionaries with KindleUnpack and compare the resulting html files, but didn't notice any significant differences. Please take a look at the attached files and advise whether building dictionaries with mobiperl makes sense. Last edited by EbokJunkie; 01-11-2014 at 01:47 PM. |
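To check whether the EXTH copy actually took, the records can be dumped from both files and compared directly. Below is a minimal Python sketch, not a tool anyone in this thread used. It assumes the commonly documented PDB/MOBI layout: the record count at byte 76 of the file, 8-byte record-list entries starting at byte 78, a 16-byte PalmDOC header at the start of record 0 followed by the MOBI header (whose length is stored at offset 20 of record 0), and the EXTH block immediately after the MOBI header. The helper names `record_offsets` and `list_exth` are hypothetical.

```python
import struct


def record_offsets(data: bytes) -> list[int]:
    """Return the start offset of every record in a PDB container (mobi/prc)."""
    (count,) = struct.unpack_from(">H", data, 76)  # number of records
    # Each record-list entry: 4-byte offset, 1-byte attributes, 3-byte unique id.
    return [struct.unpack_from(">I", data, 78 + 8 * i)[0] for i in range(count)]


def list_exth(data: bytes) -> list[tuple[int, bytes]]:
    """Return (type, payload) pairs for every EXTH record in record 0."""
    offsets = record_offsets(data)
    end = offsets[1] if len(offsets) > 1 else len(data)
    rec0 = data[offsets[0]:end]
    if rec0[16:20] != b"MOBI":  # MOBI header follows the 16-byte PalmDOC header
        raise ValueError("record 0 has no MOBI header")
    (mobi_len,) = struct.unpack_from(">I", rec0, 20)
    exth = 16 + mobi_len
    if rec0[exth:exth + 4] != b"EXTH":
        return []  # no EXTH block at all
    # EXTH header: identifier, header length, record count.
    (count,) = struct.unpack_from(">I", rec0, exth + 8)
    pos, records = exth + 12, []
    for _ in range(count):
        # Each record: 4-byte type, 4-byte length (including these 8 bytes), data.
        rtype, rlen = struct.unpack_from(">II", rec0, pos)
        records.append((rtype, rec0[pos + 8:pos + rlen]))
        pos += rlen
    return records
```

Diffing `list_exth(open("a.mobi", "rb").read())` against the same call on the other file shows which records differ, though matching EXTH records alone may not make a file behave as a dictionary.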
01-11-2014, 02:51 PM | #2 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Did you compare the .opf file of the dictionary that doesn't work with the one that does to ensure that it has all the entries required for dictionaries? KindleUnpack only has rudimentary support for dictionaries; for example, it cannot reverse-engineer inflections. |
01-11-2014, 03:03 PM | #3 |
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
|
>Did you compare the .opf file of the dictionary that doesn't work with the one that does
>to ensure that it has all the entries required for dictionaries?
I'm using the same opf for both mobigen and opf2mobi processing. |
01-11-2014, 03:10 PM | #4 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Then there's obviously something wrong with the entry definitions. Why don't you post two entries from the file that doesn't work?
|
01-11-2014, 03:20 PM | #5 |
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
|
Sorry, I don't understand. I posted the source html file with all the entries.
|
01-11-2014, 03:51 PM | #6 |
The Grand Mouse 高貴的老鼠
Posts: 71,506
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
|
01-11-2014, 04:17 PM | #7 |
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
|
pdurrant
Thank you, but I know this. I create dictionaries by hand in an open text format called Lingvo DSL and convert them to html/opf using the open-source Ruby script dsl2mobi. This script always creates valid html and opf files suitable for subsequently building machine-searchable dictionaries. My upload is an html file created by dsl2mobi from a short two-entry dsl file. It works perfectly well with mobigen and kindlegen 2.9. |
01-11-2014, 05:00 PM | #8 |
The Grand Mouse 高貴的老鼠
Posts: 71,506
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Oh, I see. Sorry, I read your original post too quickly.
I suspect that the only way to get to the bottom of this will be to examine the output very carefully indeed. KindleUnpack might not output sufficient info; you may need to delve into it with a hex editor. I'll take a quick look at what you uploaded. |
01-11-2014, 05:06 PM | #9 |
The Grand Mouse 高貴的老鼠
Posts: 71,506
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
OK, clearly mobiperl isn't going to do what you want. And you won't spot differences in the HTML, as that's not really the important bit. If you use MobiUnpack with the appropriate flags to dump everything, you'll see that the file generated by mobiperl contains only the Mobipocket header, html, ncx and opf sections.
The one from Kindlegen also has four INDX sections, an FCIS section, an FLIS section, and three tiny unknown sections. If you want to build a Kindle dictionary without Kindlegen, you're going to need to reverse-engineer the compiled dictionary format and then create something that can build it. No easy task! |
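The section inventory described above can be reproduced without KindleUnpack by reading the first four bytes of every PDB record. This Python sketch assumes the usual PDB record-list layout (record count at byte 76, 8-byte entries from byte 78). `section_signatures` is a hypothetical helper, and every record that isn't INDX, FCIS or FLIS (header, text, ncx, images, the tiny unknown sections) is lumped together as "other".

```python
import struct
from collections import Counter

# Signatures of the auxiliary sections found in the kindlegen-built dictionary.
KNOWN = (b"INDX", b"FCIS", b"FLIS")


def section_signatures(data: bytes) -> Counter:
    """Count PDB records by their 4-byte signature (INDX/FCIS/FLIS, else 'other')."""
    (count,) = struct.unpack_from(">H", data, 76)
    offsets = [struct.unpack_from(">I", data, 78 + 8 * i)[0] for i in range(count)]
    tally = Counter()
    for start in offsets:
        sig = data[start:start + 4]
        tally[sig.decode("ascii") if sig in KNOWN else "other"] += 1
    return tally
```

Running it on the mobiperl file and on the mobigen/kindlegen file side by side should make the missing INDX sections obvious.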
01-11-2014, 05:36 PM | #10 |
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
|
Thanks, I suspected it would be something intimidating like that.
Unfortunately, kindlegen crashes on 100MB+ html files, Mobipocket eBook Creator dies even on smaller files, and mobigen chokes on anything over 300MB. Evidently, Amazon broke something dictionary-related in kindlegen (at least related to the source size), and mobigen is the best (although inadequate) tool for this task. |
01-11-2014, 07:00 PM | #11 |
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Since Mobipocket Creator expects Unicode files with byte-order marks (BOMs) and your .html source file doesn't have one, it couldn't hurt to add a BOM to your .html source file, particularly if it contains non-Latin characters. (KindleGen doesn't require a BOM, but handles utf8 files with a BOM fine.)
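Adding the BOM doesn't require an editor. Here is a minimal Python sketch that prepends the UTF-8 BOM only if it is missing, after checking that the file really is valid UTF-8. The helper name `add_utf8_bom` is hypothetical. It reads the whole file into memory, so a 300 MB source needs a correspondingly large amount of RAM.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: Path) -> bool:
    """Prefix a UTF-8 file with a byte-order mark; return False if it already has one."""
    raw = path.read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        return False
    raw.decode("utf-8")  # raises UnicodeDecodeError if the file isn't valid UTF-8
    path.write_bytes(codecs.BOM_UTF8 + raw)
    return True
```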
Resave your .html source file as a utf8 file with a BOM, empty your Temp folder and run KindleGen with the following command line:
Code:
KindleGen your.opf > error.log
If none of the above helps, split your source file into several smaller files. The largest file that I ever compiled was an 80MB utf-16LE source file, which compiled fine with both Mobipocket Creator and KindleGen. Try splitting your source file into several 75 MB files and update the <manifest> and <spine> sections of your .opf accordingly.
Also have a look at the output that your Ruby script creates and check for isolated ampersands (&) or angle brackets (<>) that haven't been escaped as entities (&amp; &lt; &gt; etc.), as these are known to cause problems with many HTML parsers. For example, a line such as the following will cause problems:
Code:
<idx:orth>Rock Music > Rock & Roll</idx:orth>
Last edited by Doitsu; 01-12-2014 at 02:11 AM. |
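On the generator side, escaping the headword text before it is wrapped in `<idx:orth>` avoids the problem, and a quick scan can flag lines that still contain an unescaped `&`. This is a minimal Python sketch. `escape_headword` and `bare_ampersand_lines` are hypothetical names, and the regex treats only well-formed named or numeric entities as already escaped. It does not try to find stray `>` characters, since telling those apart from real markup needs a proper parser.

```python
import html
import re

# An '&' that does not start a named (&amp;) or numeric (&#38; / &#x26;) entity.
BARE_AMP = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")


def escape_headword(text: str) -> str:
    """Escape &, < and > in raw headword text before wrapping it in <idx:orth>."""
    return html.escape(text, quote=False)


def bare_ampersand_lines(markup: str) -> list[int]:
    """Return 1-based line numbers that contain an unescaped '&'."""
    return [n for n, line in enumerate(markup.splitlines(), 1) if BARE_AMP.search(line)]
```

For the example above, `escape_headword("Rock Music > Rock & Roll")` gives `Rock Music &gt; Rock &amp; Roll`.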
01-11-2014, 07:51 PM | #12 |
Addict
Posts: 229
Karma: 13495
Join Date: Feb 2009
Location: SoCal
Device: Kindle 3, Kindle PW, Pocketbook 301+, Pocketbook Touch, Sony 950, 350
|
Thanks, I'll proceed with caution and check headers for isolated ampersands.
As to splitting the source, I usually split it at the DSL level and process each part with the Ruby script separately. I noticed that mobigen is able to convert at least 300MB of html with option C2; this builds a 40-50 MB mobi dictionary. However, I have to leave my i7 W7 desktop crunching in the background for 6-8 hours, running a few parts concurrently. TBH, bigger dictionaries may hang the Kindle; at least that sometimes happens with the PW1. Added: Wow!!! Thank you for the heads-up about Mobipocket Creator and the BOM! A 302 MB html file without a BOM used to crash MPC at startup; with a BOM it compiled an uncompressed prc in five minutes! I can't say the same about kindlegen; it still crashed a minute after starting. MPC+BOM looks like a solution. Thanks again. Last edited by EbokJunkie; 01-11-2014 at 08:13 PM. |
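Doitsu's suggestion of splitting the html into several ~75 MB files and listing each one in the .opf could be automated roughly as follows. This Python sketch rests on several assumptions. The entries are assumed to be already extracted as complete `<idx:entry>...</idx:entry>` strings. `head` and `tail` stand for whatever wrapper markup every part must repeat. The `media-type` default is an assumption and should be changed to whatever the existing .opf uses. All helper names are hypothetical. A single entry larger than the limit still ends up alone in an oversized part.

```python
from xml.sax.saxutils import quoteattr


def split_entries(entries: list[str], head: str, tail: str, max_bytes: int) -> list[str]:
    """Pack whole entries, in order, into documents of at most max_bytes (UTF-8)."""
    overhead = len((head + tail).encode("utf-8"))
    parts, current, size = [], [], overhead
    for entry in entries:
        n = len(entry.encode("utf-8"))
        if current and size + n > max_bytes:  # flush before exceeding the limit
            parts.append(head + "".join(current) + tail)
            current, size = [], overhead
        current.append(entry)
        size += n
    if current:
        parts.append(head + "".join(current) + tail)
    return parts


def manifest_and_spine(filenames: list[str], media_type: str = "application/xhtml+xml") -> tuple[str, str]:
    """Build <manifest> items and matching <spine> itemrefs for the split files."""
    items = [f'<item id="part{i}" href={quoteattr(name)} media-type="{media_type}"/>'
             for i, name in enumerate(filenames)]
    refs = [f'<itemref idref="part{i}"/>' for i in range(len(filenames))]
    return "\n".join(items), "\n".join(refs)
```

For 75 MB parts, `max_bytes` would be `75 * 1024 * 1024`.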
|