I recently received a series of great translation dictionaries in epub format. Here is how I converted them to a working mobi dictionary for use on the Kindle:
- unzip epub file
- convert html files to tab delimited files
- concatenate and sort tab delimited files
- convert tab delimited file to opf format using tab2opf.py
- edit metadata
- convert opf file to mobi file using kindlegen
This resulted in a mobi file that the Kindle recognised as a dictionary. The non-obvious steps are detailed below.
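For the two mechanical steps (unzipping the epub, then concatenating and sorting the tab delimited files), here is a minimal Python sketch. The function names `unzip_epub` and `merge_tab_files` are made up for illustration, and the sort is a plain code-point sort, which may order accented words differently from the `sort` command under your locale.

```python
import zipfile
from pathlib import Path


def unzip_epub(epub_path, out_dir):
    """Extract every file from the epub (a zip container) into out_dir."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out_dir)


def merge_tab_files(tab_dir, merged_path):
    """Concatenate all *.tab files in tab_dir and write the sorted lines.

    Returns the number of non-empty lines written to merged_path.
    """
    lines = []
    for tab_file in sorted(Path(tab_dir).glob("*.tab")):
        text = tab_file.read_text(encoding="utf-8")
        # Skip blank lines so they do not end up as empty entries.
        lines.extend(line for line in text.splitlines() if line.strip())
    lines.sort()
    Path(merged_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

Write the merged file outside `tab_dir` (or give it a different extension), otherwise a second run would pick it up as input.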
Convert HTML to tab delimited files
The HTML files in the epub container contain two lines per word, one like this:
Code:
<p class="ww"><sub>qword</sub></p>
and one like this:
Code:
<p><span class="gag">word</span> ...
I wrote the attached Perl script (htmldict2tab.pl) to convert this to the tab-delimited format required by tab2opf.py.
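For readers who prefer Python, here is a rough sketch of the same kind of conversion. It is not a port of the Perl script. It assumes the headword is the text of the `class="ww"` line exactly as it appears, that the definition is the next line starting with `<p>`, and that one output line is `headword<TAB>definition`. The names `strip_tags` and `html_to_tab` are invented for this example.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
HEADWORD_RE = re.compile(r'<p class="ww">(.*?)</p>')


def strip_tags(fragment):
    """Remove tags, decode entities, and collapse all whitespace to single spaces."""
    text = html.unescape(TAG_RE.sub("", fragment))
    # Collapsing whitespace also removes stray tabs that would break the format.
    return " ".join(text.split())


def html_to_tab(lines):
    """Yield 'headword<TAB>definition' for each headword/definition line pair."""
    headword = None
    for line in lines:
        match = HEADWORD_RE.search(line)
        if match:
            # Assumption: the text of the "ww" line is used as the headword as-is.
            headword = strip_tags(match.group(1))
        elif headword is not None and line.lstrip().startswith("<p>"):
            definition = strip_tags(line)
            if headword and definition:
                yield f"{headword}\t{definition}"
            headword = None
```

Calling `"\n".join(html_to_tab(open(path, encoding="utf-8")))` on one HTML file would give the contents of the matching .tab file. The encoding is a guess, so check what your epub actually uses.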
Convert tab delimited file to opf format using tab2opf.py
I found this Python script at
http://www.klokan.cz/projects/stardict-lingea/
Edit metadata
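Edit the .opf file generated by tab2opf.py and set the title and other metadata to the desired values. The following tags, which give the language of the headwords (DictionaryInLanguage) and of the definitions (DictionaryOutLanguage), might be important to get it working: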
Code:
<DictionaryInLanguage>en-us</DictionaryInLanguage>
<DictionaryOutLanguage>en-us</DictionaryOutLanguage>
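If you have several dictionaries to convert, the two language tags can be set from a script instead of by hand. This is only a sketch: `set_dictionary_languages` is a made-up helper, it does a plain text substitution rather than real XML parsing, and it changes nothing if the tags are missing from the file.

```python
import re


def set_dictionary_languages(opf_text, in_lang, out_lang):
    """Return opf_text with both dictionary language tags set to the given codes."""
    for tag, value in (("DictionaryInLanguage", in_lang),
                       ("DictionaryOutLanguage", out_lang)):
        pattern = re.compile(rf"<{tag}>.*?</{tag}>", re.DOTALL)
        # A function replacement avoids surprises with backslashes in the value.
        opf_text = pattern.sub(lambda m: f"<{tag}>{value}</{tag}>", opf_text)
    return opf_text
```

For a Dutch to English dictionary you would call it as `set_dictionary_languages(text, "nl", "en-us")` and write the result back to the .opf file. The language codes here are only examples.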
Convert opf file to mobi file using kindlegen
KindleGen is provided free of charge by Amazon:
http://www.amazon.com/gp/feature.htm...cId=1000234621
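The conversion itself is a single kindlegen call with the .opf file as argument. Below is a small sketch of how it could be driven from Python, assuming the `kindlegen` executable is on your PATH. The helper names are invented, and the optional `-o` argument takes an output file name only, not a full path.

```python
import subprocess


def kindlegen_command(opf_path, mobi_name=None, executable="kindlegen"):
    """Build the argument list for a kindlegen run on opf_path."""
    command = [executable, opf_path]
    if mobi_name is not None:
        # -o sets the name of the output file, created next to the .opf file.
        command += ["-o", mobi_name]
    return command


def run_kindlegen(opf_path, mobi_name=None):
    """Run kindlegen and return its exit code."""
    return subprocess.run(kindlegen_command(opf_path, mobi_name)).returncode
```

For example, `run_kindlegen("dictionary.opf", "dictionary.mobi")` runs `kindlegen dictionary.opf -o dictionary.mobi`.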
Everything together
You can use the following (clumsy) script to automate all this a bit more:
Code:
#!/bin/sh
# Argument 1: epub file to convert
TMPDIR=/tmp/convertvandale
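MYDIR=$(dirname "$0")
mkdir -p "$TMPDIR"
unzip -d "$TMPDIR" "$1"
# Convert each HTML file to a tab delimited file next to it
for i in "$TMPDIR"/content/*.html; do "$MYDIR/dicthtml2tab.pl" "$i" > "$i.tab"; done
# Merge all entries into one sorted file
cat "$TMPDIR"/content/*.tab | sort > "$TMPDIR/dictionary.tab"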
python "$MYDIR/tab2opf.py"