06-30-2011, 04:56 AM
Mindtrap
Junior Member
Posts: 1
Karma: 15548
Join Date: Jun 2011
Device: Kindle
How I converted an epub dictionary to mobi format

I recently received a series of great translation dictionaries in epub format. Here is how I converted them into a working mobi dictionary for use on the Kindle:

- unzip epub file
- convert html files to tab delimited files
- concatenate and sort tab delimited files
- convert tab delimited file to opf format using tab2opf.py
- edit metadata
- convert opf file to mobi file using kindlegen

This resulted in a mobi file that the Kindle recognised as a dictionary. The non-obvious steps are detailed below.
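The unzip step needs nothing special, because an epub is an ordinary zip archive. As a minimal Python 3 sketch, assuming the entries sit under `content/*.html` as in the shell script further down (the helper name `extract_epub` is hypothetical):

```python
import zipfile
from pathlib import Path


def extract_epub(epub_path, workdir):
    """Unzip an epub (a plain zip archive) and return its HTML content files.

    Assumption: the dictionary pages live under content/*.html, as in the
    shell script; other epubs may use a different folder such as OEBPS/.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(workdir)
    # Sort so the files are always processed in a predictable order
    return sorted((workdir / "content").glob("*.html"))
```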

Convert HTML to tab delimited files
The HTML files in the epub container contain two lines per word, one like this:
Code:
<p class="ww"><sub>qword</sub></p>
and one like this:
Code:
<p><span class="gag">word</span> ...
I wrote the attached Perl script (dicthtml2tab.pl) to convert these lines into the tab-delimited format required by tab2opf.py.
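The attached Perl script is not reproduced here. The following is only a rough Python 3 sketch of the same idea, built from the two sample lines above. The regular expressions, the choice of the `gag` span as the headword, and keeping the whole second line as the definition are all assumptions to adjust against the real files:

```python
import re

# Assumed patterns, taken from the two sample lines in the post
MARKER = re.compile(r'<p class="ww">')
HEADWORD = re.compile(r'<span class="gag">(.*?)</span>')


def html_to_tab(lines):
    """Yield 'headword<TAB>definition' strings from one dictionary HTML file.

    Assumes each entry is a <p class="ww"> line followed by a line whose
    first <span class="gag"> holds the headword; that whole second line is
    kept as the (HTML) definition.
    """
    expect_entry = False
    for line in lines:
        line = line.strip()
        if MARKER.match(line):
            expect_entry = True
            continue
        if expect_entry:
            expect_entry = False
            match = HEADWORD.search(line)
            if match:
                # A tab inside the definition would break the one-entry-per-line format
                definition = line.replace("\t", " ")
                yield f"{match.group(1)}\t{definition}"
```

To convert one file, iterate over `html_to_tab(open(path, encoding="utf-8"))` and write each string on its own line.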

Convert tab delimited file to opf format using tab2opf.py
I found this Python script at http://www.klokan.cz/projects/stardict-lingea/
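tab2opf.py reads a single file with one entry per line (headword, a tab, then the definition), so the per-page output has to be merged and sorted first. Here is a minimal Python 3 equivalent of the `cat ... | sort` step. The helper name `merge_tab_files` is hypothetical. Note that Python sorts by code point, which can differ from `sort` under a non-C locale:

```python
from pathlib import Path


def merge_tab_files(tab_paths, out_path):
    """Concatenate several .tab files into one sorted file for tab2opf.py."""
    entries = []
    for path in tab_paths:
        text = Path(path).read_text(encoding="utf-8")
        # Skip blank lines so they do not become empty entries
        entries.extend(line for line in text.splitlines() if line.strip())
    # Plain code-point order, the same as `LC_ALL=C sort` on UTF-8 text
    entries.sort()
    Path(out_path).write_text("\n".join(entries) + "\n", encoding="utf-8")
    return len(entries)
```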

Edit metadata
Edit the .opf file generated by tab2opf.py and set the title etc. to the desired values. The following tags might be important to get it working:

Code:
<DictionaryInLanguage>en-us</DictionaryInLanguage>
<DictionaryOutLanguage>en-us</DictionaryOutLanguage>
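If you convert several dictionaries, you can script the tag edit instead of doing it by hand. This is a rough sketch that assumes each tag appears once in the .opf, exactly as shown above. The function name is hypothetical, and any language codes you pass in are your own choice (the post itself uses en-us for both):

```python
import re


def set_dictionary_languages(opf_text, in_lang, out_lang):
    """Return opf_text with the dictionary language tags replaced.

    Assumes <DictionaryInLanguage> and <DictionaryOutLanguage> each appear
    once, as in the .opf written by tab2opf.py.
    """
    for tag, value in (("DictionaryInLanguage", in_lang),
                       ("DictionaryOutLanguage", out_lang)):
        pattern = rf"<{tag}>.*?</{tag}>"
        # A function replacement avoids re treating backslashes in value specially
        opf_text = re.sub(pattern, lambda _m: f"<{tag}>{value}</{tag}>", opf_text)
    return opf_text
```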
Convert opf file to mobi file using kindlegen
Kindlegen is provided free of charge by Amazon: http://www.amazon.com/gp/feature.htm...cId=1000234621
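The basic invocation is simply `kindlegen` followed by the .opf file. If you want to call it from a Python script, here is a sketch. Both function names are hypothetical. Treating exit status 1 as "built, but with warnings" is an assumption about kindlegen's behaviour, so check its own output if in doubt:

```python
import subprocess


def kindlegen_command(opf_path, kindlegen="kindlegen"):
    """Build the command line that compiles an .opf file into a .mobi."""
    return [kindlegen, str(opf_path)]


def run_kindlegen(opf_path, kindlegen="kindlegen"):
    """Run kindlegen and return True if it reported a successful build.

    Assumption: exit status 0 means success and 1 means success with
    warnings; any other status is treated as a failure.
    """
    result = subprocess.run(kindlegen_command(opf_path, kindlegen))
    return result.returncode in (0, 1)
```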

Everything together
You can use the following (clumsy) script to automate most of this. It stops after tab2opf.py, so you still have to edit the metadata and run kindlegen by hand:
Code:
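#!/bin/sh
# Argument 1: epub file to convert
set -e

# Working directory (not named TMPDIR, which sort and other tools read)
WORKDIR=/tmp/convertvandale
# Absolute path of this script's directory, so it still works after cd
MYDIR=$(cd "$(dirname "$0")" && pwd)

mkdir -p "$WORKDIR"
unzip -o -d "$WORKDIR" "$1"

# Convert each HTML file into a tab-delimited file next to it
for i in "$WORKDIR"/content/*.html; do
    "$MYDIR/dicthtml2tab.pl" "$i" > "$i.tab"
done
# Merge all entries into one sorted file
cat "$WORKDIR"/content/*.tab | sort > "$WORKDIR/dictionary.tab"

# Run tab2opf.py inside the working directory so its output lands there
cd "$WORKDIR"
python "$MYDIR/tab2opf.py" dictionary.tab
# Next: edit the metadata in the generated .opf file, then run kindlegen on it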
Attached Files
File Type: pl dicthtml2tab.pl (371 Bytes, 620 views)

Last edited by Mindtrap; 06-30-2011 at 07:19 PM.