And these are probably a bit better: I used --disable-trim and the following substitution rules to strip all tags except <U> (grammar), <M> (syllabification, phonetics), <L> (expanded abbreviations) and <F> (word senses).
Code:
# Strip most (but not all) tags from Lingoes Hebrew dictionaries.
# $1 = input file, $2 = output file
cat "$1" |\
sed -e "s|<U>|<br\><i>|g" \
-e "s|</U>|</i> |g" \
-e "s|<M>| <b>|g" \
-e "s|</M>|</b> |g" \
-e "s|<F>|<br\>‣|g" \
-e "s|</F>| |g" \
-e "s|<L>| •|g" \
-e "s|</L>| |g" \
-e "s|<[/]*[NCIŅ]>||g" \
-e "s|<H>|<span>|g" \
-e 's|<H J="rtl">|<span dir="rtl">|g' \
-e 's|<H J="rtl" />||g' \
-e 's|<H />||g' \
-e "s|</H>|</span>|g" \
-e 's|\\\"|״|g' \
-e "s|>>|←|g" \
> "$2"
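To sanity-check the rules before converting a whole dictionary, a few of them can be run against a single entry. This is only a minimal sketch: the sample entry is made up for illustration (real Lingoes entries may be structured differently), only a subset of the rules is included, and the bracket expression is reduced to ASCII letters so the pattern does not depend on the locale.

```shell
# Hypothetical sample entry, invented for this test; not taken from a real dictionary.
sample='<N><M>sha-LOM</M><U>noun</U><F>peace</F></N>'

# Subset of the substitutions above. The line break is written as <br> here,
# which is assumed to be what the <br\> in the original rules produces.
cleaned=$(printf '%s\n' "$sample" | sed \
  -e 's|<U>|<br><i>|g' \
  -e 's|</U>|</i> |g' \
  -e 's|<M>| <b>|g' \
  -e 's|</M>|</b> |g' \
  -e 's|<F>|<br>‣|g' \
  -e 's|</F>| |g' \
  -e 's|<[/]*[NCI]>||g')

# Show the result; the <N> wrapper should be gone and <M>, <U>, <F> rewritten.
printf '%s\n' "$cleaned"
```

Running the rules on one entry at a time like this makes it easier to see which substitution is responsible for a stray space or a missing line break.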
ETA: Still using plain substitutions rather than a real XML parser, I've slightly improved the handling of tags for better formatting in KR, and got pyglossary to recognise the dictionary languages correctly. The one remaining issue I see is a bidi one: in the Hebrew-English dictionary, parentheses around Hebrew text are misplaced, even when they fall inside a <span dir="rtl"></span>.