09-01-2014, 04:36 PM   #961
KevinH
Sigil Developer
Posts: 7,644
Karma: 5433388
Join Date: Nov 2009
Device: many
Hi Doitsu,
It is not syntax related. It is related to how many different inflection rules are needed.
For languages that make heavy use of prefixes and suffixes, with a large number of rules, one entire mobi section can't hold them all, so multiple mobi sections are needed.
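To make the capacity argument concrete, here is a toy Python 3 sketch. It is not the real MOBI layout: it treats inflection rules as plain strings and packs them greedily (next-fit) into sections that each hold at most a fixed number of bytes. The rule strings, the capacity value and the pack_rules name are all hypothetical.

```python
from typing import List


def pack_rules(rules: List[str], capacity: int) -> List[List[str]]:
    """Greedily pack rules into sections of at most `capacity` bytes each.

    Hypothetical illustration: sizes are UTF-8 lengths of plain rule strings,
    not the real encoded size of MOBI inflection rules.
    """
    sections: List[List[str]] = []
    current: List[str] = []
    used = 0
    for rule in rules:
        size = len(rule.encode("utf-8"))
        if size > capacity:
            raise ValueError(f"rule {rule!r} alone exceeds the section capacity")
        # Close the current section when this rule would overflow it.
        if current and used + size > capacity:
            sections.append(current)
            current, used = [], 0
        current.append(rule)
        used += size
    if current:
        sections.append(current)
    return sections


if __name__ == "__main__":
    rules = ["-s", "-es", "-ed", "-ing", "-er", "-est", "un-", "re-"]
    for number, section in enumerate(pack_rules(rules, capacity=8)):
        print(number, section)
```

With many prefix and suffix rules the list simply spills into more sections, which is the situation described above; a decoder then has to read every one of them.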

So I will need access to one of the broken dictionaries to figure this out.

Same thing for Japanese.

PM me to figure out how we can arrange these test cases. I should be back in 3 days or so.

Take care,

Kevin

09-06-2014, 05:23 AM   #962
elchamaco
Zealot
Posts: 128
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
I've been testing the code you posted with some dictionaries.

I have one strange case. Before the source code changes, the dictionary unpacked, but the orth values looked encrypted. The structure is also unusual: the definition comes after the </idx:entry>.
With the new code, the dictionary fails to unpack.

Before the changes, this was the output for the word "love":

Code:
<idx:entry>
<idx:orth value="owh">
</idx:entry>
<h2><b>love </b>

With the new code, it fails to unpack:

Code:
Parsing dictionary index data 26074
ocnt 0, oentries 0, op1 0, op2 0, otagx 0
parsed INDX header:
len C0 nul1 0 type 1 gen 0 start E35C count C4C code FFFFFFFF lng FFFFFFFF total 0 ordt 0 ligt 0 nligt 0 nctoc 0
{'count': 3148, 'nctoc': 0, 'code': 4294967295L, 'nul1': 0, 'len': 192, 'ligt': 0, 'start': 58204, 'nligt': 0, 'ordt': 0, 'lng': 4294967295L, 'total': 0, 'type': 1, 'gen': 0} None None
Error: unpack requires a string argument of length 0


Error: Unpacking Failed
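For what it's worth, Python 2's struct.unpack raises exactly this message when its format string describes zero bytes but the data passed in is not empty (Python 3 words it as "unpack requires a buffer of 0 bytes"). Since the log shows ocnt 0 and oentries 0, one plausible cause, a guess not confirmed in this thread, is a table size read from the header as 0. The hypothetical read_halfwords helper below (Python 3) reads such a table with struct.unpack_from, which only needs at least enough bytes, and checks the bounds explicitly.

```python
import struct


def read_halfwords(data: bytes, offset: int, count: int) -> tuple:
    """Read `count` big-endian 16-bit values from `data` at `offset`.

    Hypothetical helper for a table whose length comes from a header field.
    """
    fmt = ">%dH" % count
    end = offset + struct.calcsize(fmt)
    if end > len(data):
        raise ValueError("table of %d entries runs past the end of the record" % count)
    # unpack_from needs at least calcsize(fmt) bytes, not exactly that many,
    # so a count of 0 yields () instead of raising struct.error.
    return struct.unpack_from(fmt, data, offset)


if __name__ == "__main__":
    record = b"\x00\x01\x00\x02\xff"
    try:
        # struct.unpack insists on an exact length match, so a zero-size
        # format applied to non-empty data fails like the log above.
        struct.unpack(">0H", record)
    except struct.error as exc:
        print("struct.error:", exc)
    print(read_halfwords(record, 0, 2))  # (1, 2)
    print(read_halfwords(record, 0, 0))  # ()
```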

A big dictionary fails both before and after the changes, probably because of too many references or something similar.

Another one works both before and after, but without inflected forms. It has multiple INDX sections and the same unusual structure as the first dictionary:

Code:
<idx:entry>
<idx:orth value="love">
</idx:entry>
<div><a id="filepos62042024" />
The idx:entry is closed before the definition HTML.
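If you need to work with unpacked output laid out like this, where the definition follows the closing </idx:entry>, a rough approach is to treat everything up to the next <idx:entry> as that headword's definition. The Python 3 sketch below does that with regular expressions; the function name is hypothetical, and regexes are only adequate for this narrow, known layout, not for HTML in general.

```python
import re

# Patterns for the narrow layout shown above; not a general HTML parser.
ENTRY_RE = re.compile(r"<idx:entry\b.*?</idx:entry>", re.S)
ORTH_RE = re.compile(r'<idx:orth\s+value="([^"]*)"')


def pair_orths_with_definitions(markup: str) -> list:
    """Return (headword, definition_markup) pairs.

    A definition is taken to be whatever follows an entry's closing
    </idx:entry> up to the next <idx:entry> (or the end of the text).
    """
    entries = list(ENTRY_RE.finditer(markup))
    pairs = []
    for i, entry in enumerate(entries):
        orth = ORTH_RE.search(entry.group(0))
        if orth is None:
            continue  # no headword in this entry, nothing to pair
        end = entries[i + 1].start() if i + 1 < len(entries) else len(markup)
        pairs.append((orth.group(1), markup[entry.end():end].strip()))
    return pairs


if __name__ == "__main__":
    sample = (
        '<idx:entry>\n<idx:orth value="love">\n</idx:entry>\n'
        "<div>to feel deep affection</div>\n"
        '<idx:entry>\n<idx:orth value="lover">\n</idx:entry>\n'
        "<div>one who loves</div>\n"
    )
    for headword, definition in pair_orths_with_definitions(sample):
        print(headword, "->", definition)
```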


I also tested the free WordNet 3 English-Spanish dictionary. It worked fine before and after, with correct inflections and structure, but I suppose this one was generated with Mobipocket.

09-06-2014, 05:53 AM   #963
Doitsu
Grand Sorcerer
Posts: 5,584
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by elchamaco View Post
I've one strange case, before the sourcecode changes the dictionary unpacks but orth values where like encrypted. Also the struct is rare, the definition goes after the </idx:entry>
With the new code dictionary fails to unpack.
If you correctly apply all the mobi_dict.py code updates suggested by KevinH in this thread, both the UK and US monolingual dictionaries decompile with correct idx:orth entries (but without inflections).

(I had problems with some files because I didn't indent the code correctly.)

BTW, English inflections are freely available as part of Kevin Atkinson's AGID project.
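Once you have inflected forms (from AGID or elsewhere), they go back into the dictionary source as idx:infl / idx:iform markup nested inside idx:orth. The hypothetical Python 3 helper below emits that markup for one headword, following our reading of Amazon's Kindle Publishing Guidelines (check them for any extra attributes your entries need); parsing AGID's own file format is not shown.

```python
from html import escape
from typing import List


def entry_with_inflections(headword: str, forms: List[str]) -> str:
    """Build an idx:entry whose idx:orth lists the given inflected forms."""
    lines = ["<idx:entry>", '<idx:orth value="%s">' % escape(headword)]
    if forms:
        lines.append("<idx:infl>")
        for form in forms:
            # escape() also encodes double quotes by default, keeping the attribute valid.
            lines.append('<idx:iform value="%s"/>' % escape(form))
        lines.append("</idx:infl>")
    lines.append("</idx:orth>")
    lines.append("</idx:entry>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_with_inflections("love", ["loves", "loved", "loving"]))
```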

09-06-2014, 04:10 PM   #964
KevinH
Hi,

If you work with dictionaries, please unzip the attached mobi_dict.py.zip file, use it to replace its namesake in KindleUnpack_v073/lib/, and let me know of any successes and failures. I have a hunch, but too small a set of dictionaries to know whether it is correct. If it is, we may have these things fully unpacking. If not, I am stuck, because there are two types of ORDT tables and my hunch about how to decide which one to use will have been wrong. I have my fingers crossed.
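For readers wondering what an ORDT table does: as we understand it, index labels in these dictionaries can be stored as bytes that index into a lookup table of characters rather than as text, which would explain orth values that look "encrypted" when the wrong table, or none, is applied. The toy Python 3 sketch below shows only that lookup idea; the table contents and the decode_label name are hypothetical, and it does not model either of the two real ORDT layouts.

```python
from typing import Sequence


def decode_label(raw: bytes, ordt: Sequence[int]) -> str:
    """Decode an index label by mapping each byte through a code-point table.

    Hypothetical illustration: byte value b selects the character chr(ordt[b]).
    """
    chars = []
    for b in raw:  # iterating bytes yields ints in Python 3
        if b >= len(ordt):
            raise ValueError("byte 0x%02x has no entry in the table" % b)
        chars.append(chr(ordt[b]))
    return "".join(chars)


if __name__ == "__main__":
    table = [ord(c) for c in "love"]  # toy table: slot 0 -> 'l', slot 1 -> 'o', ...
    print(decode_label(bytes([0, 1, 2, 3]), table))  # love
    # The same bytes through a different table give unrelated text.
    print(decode_label(bytes([0, 1, 2, 3]), [ord(c) for c in "wxyz"]))  # wxyz
```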

FWIW, this version seems to work with all the dictionaries I have (German, French, Sven, Collins, American English, SampleDict, ja_dict, Liddell, etc.) and even seems to work with multiple inflection sections, although I am not sure whether it handles them correctly, as I have no source for most of them.

I hope this does the trick ...

KevinH
Attached Files
mobi_dict.py.zip (4.0 KB)

09-07-2014, 04:28 AM   #965
elchamaco
I've tested:

Case 1: Encrypted values. Now works fine, and with inflections too; before this new code it had none. Wow!
Case 2: Huge dictionary or something similar. Fails with the same error (I'm sending it to you so you can figure out what's happening).

Code:
{'count': 2316, 'nctoc': 0, 'code': 4294967295L, 'nul1': 0, 'oentries': 0, 'len': 192, 'ligt': 0, 'start': 43964, 'otype': 0, 'nligt': 0, 'ordt': 0, 'lng': 4294967295L, 'total': 0, 'type': 1, 'gen': 0} None None
Find link anchors
Insert data into html
Insert hrefs into html
Remove empty anchors from html
Insert image references into html
Error: 


Error: Unpacking Failed
Case 3: Works fine, but without inflections
Case 4: Works fine

Final result: except for the dictionary that failed, everything now works well, although one case has no inflected forms.


09-07-2014, 07:07 AM   #966
Doitsu
Quote:
Originally Posted by KevinH View Post
FWIW, This version seems to work with all dictionaries I have ... German, French, Sven, Collins, American English, SampleDict, ja_dict, Liddell, etc and even seems to work with multiple inflection sections (but I am not sure if correctly or not as I have no source for most of those).
This version is definitely a quantum leap from previous versions. I've tested it with a couple of commercial dictionaries and a home-made Arabic-English dictionary and it performed pretty much flawlessly.

For good measure I've also tested it with the default Kindle app dictionaries. It worked with most of them, except for the Spanish dictionary (B005F12G7O_EBOK.azw) and the two Chinese dictionaries (B00AZOHEFU_EBOK.azw & B00AZOHEGE_EBOK.azw).

For some odd reason, KindleUnpack apparently assumed that the Chinese dictionaries were mobi files with attached source files, because it tried to extract a build log.

Code:
File contains kindlegen build log, extracting as kindlegenbuild.log
Unpacking raw markup language
Write ncx
Info: Document contains orthographic index, handle as dictionary Error: 

Error: Unpacking Failed
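Detecting such an attachment doesn't depend on the book being a dictionary; a reader only has to check each record's leading bytes. The hypothetical Python 3 sketch below scans a list of raw records for a four-byte tag. The b"CMET" (build log) and b"SRCS" (source archive) tags are our assumption about how KindleUnpack recognizes these records, so check its source before relying on them.

```python
from typing import Iterable, List, Tuple

# Assumed record tags; verify against KindleUnpack's source before relying on them.
KNOWN_TAGS = {
    b"CMET": "kindlegen build log",
    b"SRCS": "kindlegen source archive",
}


def find_tagged_records(records: Iterable[bytes]) -> List[Tuple[int, str]]:
    """Return (record_index, description) for records that start with a known tag."""
    found = []
    for index, record in enumerate(records):
        description = KNOWN_TAGS.get(record[:4])
        if description is not None:
            found.append((index, description))
    return found


if __name__ == "__main__":
    records = [b"\x00\x01text", b"CMET...log text...", b"\xe9\x8e\x0d\x0a"]
    print(find_tagged_records(records))  # [(1, 'kindlegen build log')]
```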
The unpacking of the Spanish dictionary (B005F12G7O_EBOK.azw) failed with this message:

Code:
Info: Document contains orthographic index, handle as dictionary 
Error: Dictionary contains multiple inflection index sections, which is not yet supported
inflectionTagTable: [(5, 1, 3, 0), (26, 1, 12, 0), (27, 1, 48, 0), (0, 0, 0, 1)]
Error: 

Error: Unpacking Failed
All three dictionaries unpacked with v073; however, the idx:orth values contained non-printable characters.
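To spot affected entries in unpacked markup, a quick check like this hypothetical Python 3 helper can help. It assumes the value="..." attribute form shown earlier in the thread and relies on str.isprintable(), which also rejects format characters such as zero-width joiners, so some scripts may produce false positives.

```python
import re

ORTH_VALUE_RE = re.compile(r'<idx:orth\s+value="([^"]*)"')


def suspicious_orths(markup: str) -> list:
    """Return idx:orth values containing characters that str.isprintable() rejects."""
    return [value for value in ORTH_VALUE_RE.findall(markup) if not value.isprintable()]


if __name__ == "__main__":
    sample = '<idx:orth value="love"> <idx:orth value="l\x02ve">'
    print(suspicious_orths(sample))
```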


I used the wrong version for the test.

Last edited by Doitsu; 09-07-2014 at 10:08 AM.

09-07-2014, 09:46 AM   #967
KevinH
Hi,

Are you sure you used the latest version of the mobi_dict.py I just posted with the Spanish dictionary? This error message should no longer be present in the new mobi_dict.py.

Error: Dictionary contains multiple inflection ..

Thanks,

Kevin

09-07-2014, 10:07 AM   #968
Doitsu
Quote:
Originally Posted by KevinH View Post
Are you sure you used the latest version of the mobi_dict.py I just posted with the Spanish dictionary? This error message should no longer be present in the new mobi_dict.py.
I'm sorry. I have KindleUnpack installed on two machines. I did a quick test yesterday and accidentally used the older version on my other machine for today's test of the monolingual dictionaries.

The Spanish dictionary decompiled fine.

09-07-2014, 02:14 PM   #969
KevinH
Hi elchamaco:

Quote:
Originally Posted by elchamaco View Post
I've tested:
Case 2: Huge dictionary or something similar: Fails same error (I send you so you can figure what's happening)
Your test case unpacked with no problems on my machine. Perhaps you ran out of disk space or memory? It seems to unpack fine with 64-bit Python 2.7.8.
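Since a 32-bit Python build is limited to a 32-bit address space (typically about 2 GB usable per process on Windows) regardless of installed RAM, it's worth checking which build is running as well as the free disk space. A small Python 3 sketch using only the standard library; the environment_report name is hypothetical.

```python
import shutil
import struct
import sys


def environment_report(path: str = ".") -> dict:
    """Report interpreter version, pointer size in bits and free disk space at path."""
    return {
        "python": sys.version.split()[0],
        # struct.calcsize("P") is the size of a C pointer: 4 on 32-bit builds, 8 on 64-bit.
        "bits": struct.calcsize("P") * 8,
        "free_gib": round(shutil.disk_usage(path).free / 2**30, 1),
    }


if __name__ == "__main__":
    print(environment_report())
```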

Quote:
Case 3: Works fine without inflexions
The actual error was:

Error: Dictionary uses obsolete inflection rule scheme which is not yet supported

I am sorry but there is no chance I can decode obsolete inflection rules with a sample of one and no source. You will have to live with that failure.

Take care,

KevinH

09-07-2014, 02:15 PM   #970
KevinH
Hi Doitsu,

I am now confused. Do the two Chinese dictionaries work with the current version, or do they still need some work to decode?

Thanks,

Kevin

Quote:
Originally Posted by Doitsu View Post
I'm sorry, I've Kindleunpack installed on two machines. I did a quick test yesterday and accidentally used the older version on my other machine for the monolingual dictionaries test today.

The Spanish dictionary decompiled fine.

09-07-2014, 06:00 PM   #971
Doitsu
Quote:
Originally Posted by KevinH View Post
I am now confused. Do the two chinese dictionaries work with the current version or do that still need some work to decode.
I'm sorry for not being clearer. Both Chinese dictionaries also decompiled fine. The only odd thing about them is that they both contained attached kindlegen.log files. But this is merely a curiosity.

BTW, the Xian Dai Han Yu Ci Dian dictionary (B00AZOHEFU_EBOK.azw) source files apparently contain several syntax errors according to the kindlegen.log file (idx:entry definitions without idx:orth parameters, unresolved hyperlinks etc.).
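kindlegen already reports such problems in its log, but a similar check can be run on your own source before building. The hypothetical Python 3 helper below lists idx:entry blocks that contain no idx:orth; like any regex-based check, it assumes reasonably regular markup.

```python
import re

ENTRY_RE = re.compile(r"<idx:entry\b[^>]*>(.*?)</idx:entry>", re.S)


def entries_missing_orth(markup: str) -> list:
    """Return the 0-based positions of idx:entry blocks without an idx:orth."""
    return [
        position
        for position, match in enumerate(ENTRY_RE.finditer(markup))
        if "<idx:orth" not in match.group(1)
    ]


if __name__ == "__main__":
    source = (
        '<idx:entry><idx:orth value="a">a</idx:orth> first</idx:entry>\n'
        "<idx:entry><b>no headword here</b></idx:entry>\n"
    )
    print(entries_missing_orth(source))  # [1]
```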

09-08-2014, 07:48 AM   #972
KevinH
Hi Doitsu,

Sounds good. I will disable all of the debug output and add it into the material for the next release.

Take care,

KevinH

09-08-2014, 02:24 PM   #973
elchamaco
It's strange. I freed some space, so I now have 20 GB free on a computer with 6 GB of RAM, and it's still unable to unpack. I had Python 2.7.3, so I updated to 2.7.8, and it still fails (Win7 64-bit). I'll try another day with another computer.

And one other thing: is it normal that in some dictionaries the definition is not included inside the <idx:entry> tag?

09-08-2014, 07:07 PM   #974
KevinH
Hi elchamaco,

Are you testing with the exact same dictionary you posted for me? Are you using the KindleUnpack GUI or Calibre, or running kindleunpack.py directly from the command line? I tested by running it directly from the command line. Please give that a try.

KevinH

Last edited by KevinH; 09-08-2014 at 10:40 PM.

09-09-2014, 02:42 PM   #975
elchamaco
It's strange. I was using the GUI version (kindleunpack.pyw); now I tried the command line, but it still gives an error. This is the error from the command line:

Code:
Traceback (most recent call last):
  File "E:\pru\lib\kindleunpack.py", line 936, in <module>
    sys.exit(main())
  File "E:\pru\lib\kindleunpack.py", line 925, in main
    unpackBook(infile, outdir, apnxfile, epubver, use_hd)
  File "E:\pru\lib\kindleunpack.py", line 840, in unpackBook
    process_all_mobi_headers(files, apnxfile, sect, mhlst, K8Boundary, False, epubver, use_hd)
  File "E:\pru\lib\kindleunpack.py", line 763, in process_all_mobi_headers
    processMobi7(mh, metadata, sect, files, imgnames)
  File "E:\pru\lib\kindleunpack.py", line 583, in processMobi7
    srctext, usedmap = proc.insertHREFS()
  File "E:\pru\lib\mobi_html.py", line 101, in insertHREFS
    srctext = srctext[0:12]+'<meta http-equiv="content-type" content="text/html; charset='+metadata.get('Codec')[0]+'" />'+srctext[12:]
MemoryError
Perhaps it needs an 8 GB computer; tomorrow I'll try another computer with 8 GB.
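The traceback points at a line that rebuilds the whole book text with +, creating a full-size copy of srctext[12:] plus the concatenated result while the original is still alive. A 64-bit Python is the more robust fix, but as a hypothetical Python 3 sketch (not the change KindleUnpack made), inserting the tag into a bytearray in place avoids those temporary copies. It assumes the text is held as bytes and, like the original line, that offset 12 falls just after the opening tags (presumably <html><head>).

```python
def insert_meta(srctext: bytearray, codec: str, pos: int = 12) -> bytearray:
    """Insert a content-type <meta> tag into srctext at byte offset pos, in place.

    Hypothetical alternative to srctext[0:12] + meta + srctext[12:]: slice
    assignment grows the existing buffer instead of building temporary copies
    (CPython may still reallocate it once to make room).
    """
    meta = '<meta http-equiv="content-type" content="text/html; charset=%s" />' % codec
    srctext[pos:pos] = meta.encode("ascii")
    return srctext


if __name__ == "__main__":
    page = bytearray(b"<html><head><title>t</title></head></html>")
    insert_meta(page, "utf-8")
    print(page.decode("ascii"))
```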

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.