MobileRead Forums - How I converted an epub dictionary to mobi format

Mindtrap · 06-30-2011, 04:56 AM

I recently received a series of great translation dictionaries in epub format. Here is how I converted them to a working mobi dictionary for use on the Kindle:

- unzip epub file
- convert html files to tab delimited files
- concatenate and sort tab delimited files
- convert tab delimited file to opf format using tab2opf.py
- edit metadata
- convert opf file to mobi file using kindlegen

This resulted in a mobi file that was recognised as a dictionary by the Kindle. The non-obvious steps are detailed below.
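
For the two steps that get no section of their own (unzipping, and concatenating plus sorting), here is a minimal Python sketch. It relies on the fact that an epub is a zip archive. The function names and the UTF-8 encoding are assumptions for illustration, and the sort is a plain code point sort, so the order may differ from what `sort` produces under your locale.

```python
import zipfile
from pathlib import Path


def unzip_epub(epub_path, workdir):
    """Extract the epub (a zip archive) into workdir."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(workdir)


def merge_tab_files(tab_paths, merged_path):
    """Concatenate tab delimited files and write their lines sorted.

    Assumes the files are UTF-8 encoded; blank lines are dropped.
    Returns the number of lines written.
    """
    lines = []
    for path in tab_paths:
        text = Path(path).read_text(encoding="utf-8")
        lines.extend(line for line in text.splitlines() if line.strip())
    lines.sort()  # code point order, not locale-aware
    Path(merged_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

This does the same job as the `unzip` and `cat | sort` commands, just without depending on those tools being installed.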

Convert HTML to tab delimited files
The HTML files in the epub container contain two lines per word, one like this:

Code:

<p class="ww"><sub>qword</sub></p>

and one like this:

Code:

<p><span class="gag">word</span> ...

I wrote the attached Perl script (htmldict2tab.pl) to convert this to the tab delimited format required by tab2opf.py.
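
For readers without the attachment, here is a rough Python sketch of the same idea. It is not the Perl script itself. It assumes that the `ww` paragraph holds the headword, that the next line starting with `<p>` is its definition, and that the output should be one `headword<TAB>definition` line per entry. The function name `html_to_tab` is made up for this example.

```python
import re

# Matches the headword paragraph shown above.
HEADWORD_RE = re.compile(r'<p class="ww">(.*?)</p>')
# Matches any HTML tag, used to strip markup such as <sub> from the headword.
TAG_RE = re.compile(r"<[^>]+>")


def html_to_tab(html_lines):
    """Yield 'headword<TAB>definition' for each headword/definition pair."""
    headword = None
    for raw in html_lines:
        line = raw.strip()
        match = HEADWORD_RE.search(line)
        if match:
            # Headword line: keep only the text inside the paragraph.
            headword = TAG_RE.sub("", match.group(1)).strip()
        elif headword and line.startswith("<p>"):
            # Assumed definition line; a literal tab would break the format.
            yield headword + "\t" + line.replace("\t", " ")
            headword = None
```

Passing an open file works, since iterating over a file yields its lines: `for row in html_to_tab(open(path, encoding="utf-8")): print(row)`. The definition keeps its HTML markup here; whether you want that depends on how the result should look on the device.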

Convert tab delimited file to opf format using tab2opf.py
I found this Python script at http://www.klokan.cz/projects/stardict-lingea/

Edit metadata
Edit the .opf file generated by tab2opf.py and set the title etc. to the desired values. The following tags might be important to get it working:

Code:

<DictionaryInLanguage>en-us</DictionaryInLanguage>
<DictionaryOutLanguage>en-us</DictionaryOutLanguage>
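
If you convert several dictionaries, this edit can be scripted. The sketch below assumes the generated .opf already contains both tags, each exactly as shown above; the function name is hypothetical and you choose the language codes yourself.

```python
import re


def set_dictionary_languages(opf_text, in_lang, out_lang):
    """Return opf_text with both dictionary language tags set.

    Raises ValueError if one of the tags is missing, rather than
    guessing where it should be inserted.
    """
    replacements = {
        "DictionaryInLanguage": in_lang,
        "DictionaryOutLanguage": out_lang,
    }
    for tag, value in replacements.items():
        pattern = re.compile(r"<%s>.*?</%s>" % (tag, tag), re.DOTALL)
        new_element = "<%s>%s</%s>" % (tag, value, tag)
        # A function replacement avoids backslash handling in the new text.
        opf_text, count = pattern.subn(lambda _match: new_element, opf_text)
        if count == 0:
            raise ValueError("tag not found: " + tag)
    return opf_text
```

Read the .opf file into a string, pass it through this function, and write the result back before running kindlegen.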

Convert opf file to mobi file using kindlegen
KindleGen is provided for free by Amazon: http://www.amazon.com/gp/feature.htm...cId=1000234621
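
Running it comes down to passing the .opf file as the only argument. Here is a small Python sketch, assuming the `kindlegen` binary is on your PATH; the wrapper names are made up for this example, and no extra options are passed because none were needed here.

```python
import subprocess


def kindlegen_command(opf_path, kindlegen="kindlegen"):
    """Build the command line that asks kindlegen to compile opf_path."""
    return [kindlegen, opf_path]


def build_mobi(opf_path, kindlegen="kindlegen"):
    """Run kindlegen on opf_path and return its exit code."""
    result = subprocess.run(kindlegen_command(opf_path, kindlegen))
    return result.returncode
```

For example, `build_mobi("dictionary.opf")` runs `kindlegen dictionary.opf`. Check the program's own output to see whether the conversion succeeded.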

Everything together
You can use the following (clumsy) script to automate all this a bit more:

Code:

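#!/bin/sh
# Argument 1: epub file to convert

# Named WORKDIR rather than TMPDIR so the standard TMPDIR variable is not overridden
WORKDIR=/tmp/convertvandale
MYDIR=$(dirname "$0")

mkdir -p "$WORKDIR"
unzip -d "$WORKDIR" "$1"

# Convert each HTML file to a tab delimited file
for i in "$WORKDIR"/content/*.html; do "$MYDIR/htmldict2tab.pl" "$i" > "$i.tab"; done
# Concatenate and sort the tab delimited files
cat "$WORKDIR"/content/*.tab | sort > "$WORKDIR/dictionary.tab"
# Assumption: tab2opf.py takes the tab delimited file as its argument
python "$MYDIR/tab2opf.py" "$WORKDIR/dictionary.tab"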