07-30-2023, 03:22 PM   #3
Legrumsx
Device: Kobo Sage
If anyone is interested, I managed to convert the dictionaries to StarDict format for use with KOReader, which I found way better.
I first converted the Apple dictionaries to the StarDict XML format using pyglossary. This format is human-readable, and I could spot some things that looked suspicious compared to working Wiki dictionaries.
I wrote this Python script to do some clean-up:
Code:
import sys
import re
import fileinput
import shutil
import os

def replace_in_xml(xml_file_path):
    # Define the regular expression patterns and replacement strings
    patterns_to_replace = {
        r'<d:': '<',
        r'</d:': '</',
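        # [^"]+ keeps each match inside one attribute value; (.+?) could run
        # past the closing quote and swallow text up to a later attribute
        r'soundFile="[^"]+" ': ' ',
        r'soundFile="[^"]+">': '>',
        r'source="[^"]+" ': ' ',
        r'source="[^"]+">': '>',
        r'<!DOCTYPE html><html><head><link rel="stylesheet" href="style.css"></head>': '',
        r' href="[^"]+"': ' ',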
        r'<span d:': '<span '
        # Add more patterns and replacements as needed
    }

    tmp_path = xml_file_path + '.tmp'
    try:
        # Write the cleaned content to a temporary file next to the original
        with open(tmp_path, mode='w', encoding='utf-8') as tmp_file:
            with fileinput.FileInput(xml_file_path, openhook=fileinput.hook_encoded("utf-8")) as file:
                for line in file:
                    for pattern, replacement in patterns_to_replace.items():
                        line = re.sub(pattern, replacement, line)
                    tmp_file.write(line)

        # Replace the original file with the updated content
        shutil.move(tmp_path, xml_file_path)

    except FileNotFoundError:
        print(f"Error: File '{xml_file_path}' not found.")
    except Exception as e:
        print(f"Error occurred: {e}")
    finally:
        # Remove the temporary file if a failure left it behind
        if os.path.exists(tmp_path):
            os.remove(tmp_path)

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python cleanse_apple_xml.py <xml_file_path>")
    else:
        xml_file_path = sys.argv[1]
        replace_in_xml(xml_file_path)
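
To see what the clean-up does to a single line, here is a small sketch that applies the same substitutions to a made-up entry. The sample line is hypothetical (not copied from a real dictionary), and the attribute values are matched with `[^"]+` so that a match cannot run past the closing quote.

```python
import re

# Same substitutions as the script, applied in the same order.
# Attribute values use [^"]+ so a match stops at the closing quote.
PATTERNS = {
    r'<d:': '<',
    r'</d:': '</',
    r'soundFile="[^"]+" ': ' ',
    r'soundFile="[^"]+">': '>',
    r'source="[^"]+" ': ' ',
    r'source="[^"]+">': '>',
    r'<!DOCTYPE html><html><head><link rel="stylesheet" href="style.css"></head>': '',
    r' href="[^"]+"': ' ',
    r'<span d:': '<span ',
}


def clean_line(line):
    """Apply every substitution to one line and return the result."""
    for pattern, replacement in PATTERNS.items():
        line = re.sub(pattern, replacement, line)
    return line


# Hypothetical sample line, only for illustration.
sample = ('<d:entry><span d:pr="US" soundFile="abc.mp3">word</span>'
          '<a href="x" title="t">see</a></d:entry>')
cleaned = clean_line(sample)
print(cleaned)
```

The `d:` prefixes, the `soundFile` attribute and the `href` attribute are gone, while the text and the other attributes are left alone.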

Then I converted the cleaned files to the StarDict .ifo format.
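
For reference, the two conversions could be scripted roughly like this. It is only a sketch under several assumptions: that pyglossary is on the PATH, that it was also used for the second conversion (the post only names it for the first), that the format names are `AppleDictBin`, `StardictTextual` and `Stardict`, and that the file names (`Body.data`, `dict.xml`, `dict.ifo`) are placeholders for your own paths.

```python
import subprocess


def build_pyglossary_command(src, dst, read_format=None, write_format=None):
    """Build a pyglossary command line as a list of arguments."""
    cmd = ["pyglossary", src, dst]
    if read_format:
        cmd.append(f"--read-format={read_format}")
    if write_format:
        cmd.append(f"--write-format={write_format}")
    return cmd


def run(cmd):
    """Run one conversion; raises CalledProcessError if pyglossary fails."""
    subprocess.run(cmd, check=True)


# Step 1 (assumed names): Apple dictionary data -> StarDict XML (textual)
to_xml = build_pyglossary_command(
    "Body.data", "dict.xml", "AppleDictBin", "StardictTextual")

# Step 2 (assumed names): cleaned XML -> StarDict .ifo
to_ifo = build_pyglossary_command(
    "dict.xml", "dict.ifo", "StardictTextual", "Stardict")

# Only print the commands here; call run(to_xml) and run(to_ifo) to execute.
print(" ".join(to_xml))
print(" ".join(to_ifo))
```

The clean-up script goes between the two steps, run on the XML file produced by step 1.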
For the formatting, I took the CSS from Apple and did some cleanup with various tools.

Here is a link to the CSS file (there are still some quirks). It should be placed in the same folder as the .ifo file and have the same base name.
https://pastebin.com/u2puYFRZ
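
If it helps, the naming rule can be handled by a few lines of Python: copy the downloaded CSS next to the .ifo and give it the same base name. The function name and the paths in the usage comment are made up for illustration.

```python
import shutil
from pathlib import Path


def install_css(css_source, ifo_path):
    """Copy css_source next to the .ifo file, using the .ifo's base name."""
    ifo = Path(ifo_path)
    # e.g. /dicts/MyDict/MyDict.ifo -> /dicts/MyDict/MyDict.css
    target = ifo.with_suffix(".css")
    shutil.copyfile(css_source, target)
    return target


# Hypothetical usage:
# install_css("downloaded.css", "/mnt/onboard/dicts/MyDict/MyDict.ifo")
```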