Old 01-22-2018, 09:12 AM   #7
cryptocoryne
Member
Posts: 19
Karma: 10
Join Date: Jul 2014
Location: stupidville, Florida
Device: DXG (B009), K4 (9023), PW1 (B024), KT2 (90C6)
SOLVED:
I got stardict-tools (both the GUI and the command-line tools) from the Arch User Repository (AUR).

To remove HTML and make a nice looking, Duokan-compatible dictionary:
  • Install stardict-tools-git from AUR.
  • "Decompile" your trio of StarDict files (whatever.dict, whatever.idx, whatever.ifo) to a "tabfile" using the stardict-editor GUI. (Select the .ifo file.) This gives you a text file, whatever.tab.
  • Use sed, Python, or whatever script you like to remove the HTML crap from this file. Search around for a regex to strip HTML tags if you don't want to think about it. For example:
    Code:
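    sed -i -e 's/<[^>]*>//g' whatever.tab
    The -i flag makes sed edit whatever.tab in place; without it, sed only prints the result and leaves the file untouched. Afterward, you might want to search for and replace leftover HTML entities like &apos; and &quot; in a text editor.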
  • Run
    Code:
    stardict-tabfile whatever.tab
    to generate the new .dict, .idx, and .ifo files. You can open the .ifo file in a text editor to customize the dictionary name (the bookname= line). Leave the other lines alone.
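
If you'd rather do the tag stripping and the &apos;/&quot; cleanup in one pass, here is a rough Python sketch of the clean-up step. It assumes the tabfile is UTF-8 with one "word<TAB>definition" entry per line. The function names and the output file name are made up for illustration; they are not part of stardict-tools. It uses the same tag regex as the sed example and only touches the definition, not the headword.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]*>")  # same pattern as the sed example


def clean_definition(text):
    """Strip HTML tags, then decode entities such as &apos; and &quot;."""
    text = TAG_RE.sub("", text)
    text = html.unescape(text)
    # A decoded entity must not introduce a real tab or line break,
    # because the tabfile holds one "word<TAB>definition" entry per line.
    return text.replace("\t", " ").replace("\r", "").replace("\n", "\\n")


def clean_line(line):
    """Clean one tabfile line, leaving the headword untouched."""
    word, sep, definition = line.rstrip("\n").partition("\t")
    if not sep:
        # Not a "word<TAB>definition" line; pass it through unchanged.
        return line.rstrip("\n")
    return word + "\t" + clean_definition(definition)


def clean_tabfile(src, dst):
    """Read src, write the cleaned copy to dst (hypothetical file names)."""
    with open(src, encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(clean_line(line) + "\n")


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        clean_tabfile(sys.argv[1], sys.argv[2])
```

Saved as, say, clean_tab.py (name is arbitrary), you would run it as python clean_tab.py whatever.tab whatever-clean.tab and then give the cleaned file to stardict-tabfile. Writing to a separate file keeps the original tabfile around in case the regex eats something it shouldn't.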

Last edited by cryptocoryne; 01-22-2018 at 09:22 AM.