If anyone is interested, I did manage the conversion to StarDict format for use with KOReader, which I found way better.
I first converted the Apple dictionaries to the StarDict XML format using pyglossary. This format is human-readable, and I could see some things that looked suspicious compared to working Wiki dictionaries.
I wrote this Python script to do some clean-up:
Code:
import sys
import re
import fileinput
import shutil


def replace_in_xml(xml_file_path):
    # Regular expression patterns and their replacement strings.
    # They are applied to every line, in the order listed here.
    # [^"]* keeps each match inside a single attribute value.
    patterns_to_replace = {
        r'<d:': '<',
        r'</d:': '</',
        r'soundFile="[^"]*" ': ' ',
        r'soundFile="[^"]*">': '>',
        r'source="[^"]*" ': ' ',
        r'source="[^"]*">': '>',
        r'<!DOCTYPE html><html><head><link rel="stylesheet" href="style.css"></head>': '',
        r' href="[^"]*"': ' ',
        r'<span d:': '<span ',
        # Add more patterns and replacements as needed
    }
    tmp_path = xml_file_path + '.tmp'
    try:
        # Write the updated content to a temporary file
        with open(tmp_path, mode='w', encoding='utf-8') as tmp_file:
            with fileinput.FileInput(xml_file_path, openhook=fileinput.hook_encoded("utf-8")) as file:
                for line in file:
                    for pattern, replacement in patterns_to_replace.items():
                        line = re.sub(pattern, replacement, line)
                    tmp_file.write(line)
        # Replace the original file with the updated content
        shutil.move(tmp_path, xml_file_path)
    except FileNotFoundError:
        print(f"Error: File '{xml_file_path}' not found.")
    except Exception as e:
        print(f"Error occurred: {e}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python cleanse_apple_xml.py <xml_file_path>")
    else:
        replace_in_xml(sys.argv[1])
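To show what the clean-up does to a single line, here is a small sketch using a few patterns modeled on the ones in the script. The sample entry is made up for illustration and is not taken from a real Apple dictionary.

```python
import re

# A subset of the clean-up patterns, applied in this order.
# The sample line below is invented for the demonstration.
PATTERNS = [
    (r'<d:', '<'),                    # drop the d: prefix on opening tags
    (r'</d:', '</'),                  # drop the d: prefix on closing tags
    (r'soundFile="[^"]*" ', ' '),     # soundFile attribute followed by another attribute
    (r'soundFile="[^"]*">', '>'),     # soundFile attribute at the end of a tag
    (r'<span d:', '<span '),          # drop the d: prefix on span attributes
]


def clean_line(line):
    """Apply every pattern in order and return the cleaned line."""
    for pattern, replacement in PATTERNS:
        line = re.sub(pattern, replacement, line)
    return line


sample = '<d:entry id="x"><span d:pr="US" soundFile="a.mp3">hi</span></d:entry>'
print(clean_line(sample))
# prints: <entry id="x"><span pr="US" >hi</span></entry>
```

The leftover space before `>` is harmless in XML, which is why the script does not bother to remove it.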
I then converted the cleaned files to the StarDict .ifo format.
For the formatting, I took the CSS from Apple and cleaned it up with various tools.
Here is a link to the CSS file (there are still some quirks). It should be in the same folder as the .ifo and have the same name.
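For anyone who wants to script the two conversions, here is a rough sketch that only builds and prints the pyglossary command lines. The file names are hypothetical, and the format names (`AppleDictBin`, `StardictTextual`, `Stardict`) and the `--read-format` / `--write-format` options are my assumptions about pyglossary's command line, so check them against `pyglossary --help` for your version before running anything.

```python
import shlex


def pyglossary_cmd(src, dst, read_format=None, write_format=None):
    """Build a pyglossary command line as a list of arguments.

    The format names passed in are assumptions; verify them against
    the pyglossary version you have installed.
    """
    cmd = ["pyglossary", src, dst]
    if read_format:
        cmd.append(f"--read-format={read_format}")
    if write_format:
        cmd.append(f"--write-format={write_format}")
    return cmd


# Hypothetical file names, for illustration only.
steps = [
    # Step 1: Apple dictionary -> StarDict XML (human-readable)
    pyglossary_cmd("MyDict.dictionary", "MyDict.xml", "AppleDictBin", "StardictTextual"),
    # Step 2 (after the clean-up script): StarDict XML -> StarDict .ifo
    pyglossary_cmd("MyDict.xml", "MyDict.ifo", "StardictTextual", "Stardict"),
]

for cmd in steps:
    # Print the commands instead of running them, so they can be reviewed first.
    print(shlex.join(cmd))
```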
https://pastebin.com/u2puYFRZ
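Since the CSS has to sit next to the .ifo under the same base name, a small helper can do the copy and rename in one go. This is only a sketch, and the function name and paths are made up for the example.

```python
import shutil
from pathlib import Path


def install_css(css_source, ifo_path):
    """Copy css_source next to the .ifo file, using the .ifo's base name.

    Example: for /dicts/MyDict.ifo the CSS ends up as /dicts/MyDict.css.
    """
    target = Path(ifo_path).with_suffix(".css")
    shutil.copyfile(css_source, target)
    return target
```

For example, `install_css("downloaded.css", "dicts/MyDict.ifo")` would write `dicts/MyDict.css`.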