Old 12-01-2019, 05:05 PM   #2
Markismus
I spent yesterday trying to guess the restrictions of PocketBook's dictionary converter.exe* to get the whole of the Oxford Dictionary 2nd Edition into dic-format. The Oxford dictionary has entries of up to 115k characters, so it is not odd that converter.exe crashes, just irritating. Duden (de-de) and Oxford Learners Dictionary 8th Ed. (en-en) work with a little tweaking of the xdxf-files.**

I wish I had a clue about that format so I could skip converter.exe altogether: my Perl script already runs to 250 lines!
Does anyone have or know a link to the source code of converter.exe? Does anyone know PocketBook's dic-format, so I can generate it straight from xdxf- or csv-format?

The known restrictions of converter.exe are:
  1. A line should not be longer than 4096 bytes. It cuts the line off at this length and then reports that the XML is missing closing tags.
  2. If '&' or '<' is found in the XML content outside of tags, etc., it quits and reports malformed XML.
  3. If a dictionary entry definition (the block enclosed by <def> and </def> tags) exceeds roughly 100 kB, it crashes without any message. (103916 bytes works, but 104992 bytes already crashes.)***
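
The three limits above can be checked before running converter.exe. What follows is only a rough sketch in Python (not the Perl script mentioned earlier), and it rests on several assumptions: the function name check_xdxf is made up for illustration, <def> tags are assumed to carry no attributes and not to be nested, the <def> size is measured including the tags themselves, and only bare '&' is detected for limit 2, because telling a stray angle bracket from a real tag needs an actual parser.

```python
import re

MAX_LINE_BYTES = 4096    # limit 1, as observed above
MAX_DEF_BYTES = 103916   # largest <def> block reported to work (limit 3)

# An '&' that does not start an entity such as &amp; or &#38; or &#x26;
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
# Assumption: plain, non-nested <def> ... </def> blocks
DEF_BLOCK = re.compile(r"<def>.*?</def>", re.DOTALL)


def check_xdxf(text):
    """Return a list of (limit_number, description) for suspected problems."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        size = len(line.encode("utf-8"))
        if size > MAX_LINE_BYTES:
            problems.append((1, f"line {number} is {size} bytes"))
        if BARE_AMP.search(line):
            problems.append((2, f"line {number} has an unescaped '&'"))
    for match in DEF_BLOCK.finditer(text):
        size = len(match.group(0).encode("utf-8"))
        if size > MAX_DEF_BYTES:
            problems.append((3, f"<def> block at offset {match.start()} is {size} bytes"))
    return problems
```

Calling check_xdxf on the contents of an xdxf-file returns an empty list when none of the three suspected limits is hit.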

Possible resolutions are:
  1. Split the dictionary entry at the tags, or use something like prettify or auto-indent.
  2. '&' and '<' should be replaced with '&amp;' and '&lt;'.
  3. I can resolve this by splitting an entry into multiple entries with identical lemmas.
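
A minimal sketch of these three resolutions in Python (3.7 or later, because it splits on an empty regex match) could look like the following. All function names are hypothetical. The '<' detection is a heuristic that only catches a '<' not followed by something that can start a tag. The 4000 and 100,000 byte budgets are safety margins I picked below the observed limits, not documented values. The <ar><k>...</k><def>...</def></ar> layout is assumed, and split_entry expects definition parts that are each well-formed on their own (for example one sense per part), otherwise the split would cut through open tags.

```python
import re

BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")
# Heuristic: a '<' that is not followed by a character that can start a tag
BARE_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_bare(text):
    """Resolution 2: escape '&' and '<' that are not part of an entity or tag."""
    return BARE_LT.sub("&lt;", BARE_AMP.sub("&amp;", text))


def pack(chunks, limit):
    """Greedily join chunks into groups of at most `limit` UTF-8 bytes.

    A single chunk larger than the limit is kept whole in its own group.
    """
    groups, current, size = [], [], 0
    for chunk in chunks:
        n = len(chunk.encode("utf-8"))
        if current and size + n > limit:
            groups.append("".join(current))
            current, size = [], 0
        current.append(chunk)
        size += n
    if current:
        groups.append("".join(current))
    return groups


def break_long_line(line, limit=4000):
    """Resolution 1: break a line only between adjacent tags ('><')."""
    return pack(re.split(r"(?<=>)(?=<)", line), limit)


def split_entry(lemma, def_parts, limit=100_000):
    """Resolution 3: spread definition parts over entries with the same lemma."""
    return [f"<ar><k>{lemma}</k><def>{body}</def></ar>"
            for body in pack(def_parts, limit)]
```

Breaking only at '><' boundaries avoids inserting a newline inside the text of a key or definition, at the cost that a long run of plain text between two tags can still exceed the limit.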

If someone has tinkered with this before and has pointers for me, I would be much obliged.

____________________________________
* I used DictionaryConverter-neu 171109. Search this forum or look here for more info.
** For the conversion of dictionaries to xdxf-format I used linguae. Search this forum or look here for more info.
*** This differs from @Rkomar's post, which states that he converted a dictionary with 33283 lines. The limit seems to apply to a single dictionary entry.


EDIT:
I just removed all the lines longer than 4096 bytes. The result was:
Loading collates...
Loading morphems...
Loading keyboard...
Loading dictionary file...
140407 words loaded
Sorting dictionary...
Searching for equal words...
Packing dictionary...

maximum block count reached

So it doesn't crash anymore; however, it still can't pack the dictionary.
It is slightly larger than the 33283 lines Rkomar reported: 1,185,340 lines. That's why I wanted it! Maybe it will work if I split the dictionary into 6 parts for PocketBook instead of the 2 parts it currently has for StarDict... crappy.
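
Splitting into parts could be sketched roughly like this in Python. The function name and the idea of cutting the already sorted entry list into consecutive, near-equal slices are my own assumptions. Whether 6 parts is enough to stay under the "maximum block count" is untested, since it is not known what converter.exe counts as a block.

```python
def split_into_parts(entries, parts=6):
    """Split a list of entries into `parts` consecutive slices of near-equal length."""
    base, extra = divmod(len(entries), parts)
    result, start = [], 0
    for i in range(parts):
        # The first `extra` slices get one entry more than the rest
        end = start + base + (1 if i < extra else 0)
        result.append(entries[start:end])
        start = end
    return result
```

Each slice would then be written to its own xdxf-file and converted separately. Keeping the slices consecutive means every part covers one alphabetical range.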

Last edited by Markismus; 11-16-2021 at 02:32 PM.