Old 08-18-2022, 06:04 AM   #160
Markismus
@furmirek KindleUnpack, the code used for unpacking mobi- and azw-files, generates the result:
rawml-file found. Mobi-file already converted, but KindleUnpack failed to convert it to html.
Then my code gives the rawml-file a try and fails:
Will try at the rawml-file, but don't get your hopes up!
Dictionary name is 'English Polish Dictionary'.
...
Converting indexentries from RAWML to XDXF. This will take some time. 20220818 11:43:38
Done at 20220818 11:43:38
Not able to handle the rawml-file. Quitting!


You could write a dedicated conversion in sub convertRAWML2XDXF{} and I would happily merge it with the code.

Looking at a snippet of the rawml after adding a newline character after every tag and indenting:
Code:
 
<h2>
    -</h2>
<dl>
    <div height="0">
    </div>
    <dt>
        <b>
            symb.</b>
    </dt>
    <div height="0">
    </div>
    <div height="0">
    </div>
    <dd>
        <blockquote>
            &bull; Used in a compound term when the constituent parts are already hyphenated.
    </dd>
    <div height="0">
    </div>
    <div height="0">
    </div>
    <dd>
        <blockquote>
            &bull; Used to hide letters
    </dd>
    <div height="0">
    </div>
    <div height="0">
    </div>
    <dd>
        <blockquote>
            &bull; Used to connect compound terms with the sense of &quot;to&quot; or to show a relationship
    </dd>
    <div height="0">
    </div>
</dl>
<p>
    <a href="http://en.wiktionary.org/wiki/-">
        <font size="3">
            en&sup2;
    </a>
</p>
<hr />
You can reduce it to:
Code:
<h2> ... </h2>
<dl> ...</dl>
<p> ... </p>
<hr />
Every keyword-meaning pair is separated by a <hr />-tag. The keyword is in the <h2>-block, and the meaning is given in the <dl>-block. If you want, you could also include the hyperlink to Wiktionary from the <p>-block in the meaning.
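
As a rough illustration of that splitting idea, here is a minimal sketch in Python. It is not the converter's code: the real place for this would be the Perl sub convertRAWML2XDXF{}, and the names split_entries and strip_tags are made up for the example. It assumes every entry follows the <h2> ... <dl> ... <hr /> pattern shown above and simply skips chunks that don't.

```python
import re
from html import unescape

# Hypothetical names for illustration only; the actual converter is written in Perl.
ENTRY_SEPARATOR = re.compile(r"<hr\s*/?>", re.IGNORECASE)
KEYWORD_BLOCK = re.compile(r"<h2[^>]*>(.*?)</h2>", re.IGNORECASE | re.DOTALL)
MEANING_BLOCK = re.compile(r"<dl[^>]*>(.*?)</dl>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def strip_tags(fragment):
    """Drop all tags, decode entities such as &bull;, and collapse whitespace."""
    text = unescape(ANY_TAG.sub(" ", fragment))
    return " ".join(text.split())


def split_entries(rawml):
    """Return (keyword, meaning) pairs, one per <hr />-separated chunk."""
    pairs = []
    for chunk in ENTRY_SEPARATOR.split(rawml):
        keyword = KEYWORD_BLOCK.search(chunk)
        meaning = MEANING_BLOCK.search(chunk)
        if keyword is None or meaning is None:
            continue  # chunk does not follow the <h2>/<dl> pattern
        pairs.append((strip_tags(keyword.group(1)), strip_tags(meaning.group(1))))
    return pairs


# Shortened version of the snippet above, written on one line as in the raw file.
sample = (
    '<h2>-</h2><dl><div height="0"></div><dt><b>symb.</b></dt>'
    '<dd><blockquote>&bull; Used to hide letters</dd></dl>'
    '<p><a href="http://en.wiktionary.org/wiki/-"><font size="3">en&sup2;</a></p><hr />'
)

for keyword, meaning in split_entries(sample):
    print(keyword, "=>", meaning)
```

This flattens the meaning to plain text, so the unclosed <blockquote>-tags in the rawml do no harm. A real conversion would probably want to keep some of the markup and wrap each pair in the XDXF entry format the rest of the code expects.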

Last edited by Markismus; 08-18-2022 at 06:10 AM.