MobileRead Forums - View Single Post - Issues Converting Translated ZIP Content Back to EPUB in Calibre

DNSB · 03-09-2024, 01:17 PM

Quote:

Originally Posted by Gaunc

Hello everyone,

This is my first thread here, and I'm reaching out for some assistance with Calibre. I've recently begun managing a small catalogue of EPUB files, which has led me to learn and use Calibre for the first time.

My workflow involves several steps designed around translating EPUB content. Here's a brief overview:

1. Convert EPUB to ZIP: Using Calibre, I first convert my EPUB files into ZIP format.
2. Unzip and Translate: After unzipping the EPUB, I run a script that translates the content within all the HTML files to another language, utilizing an AI LLM API.
3. Re-zip and Convert Back to EPUB: Once translation is complete, I rezip the files and attempt to convert this ZIP back into an EPUB format using Calibre.

The issue arises in the final step. Despite the translated HTML content displaying perfectly in a web browser, the re-converted EPUB file from the ZIP is a total mess. Interestingly, this issue persists even when I try converting the original EPUB to ZIP and then back again, without any modifications.

As someone new to Calibre and this process, I'm unsure where the problem lies or how to fix it. Has anyone here dealt with similar conversion challenges or have experience with translating content in EPUB files? Any insights or advice would be greatly appreciated.

Thank you in advance for your help!

For what it may be worth, an ePub is already a zip container, so there is no need to convert it to zip. You do have to maintain the structure of the container so that links point to the correct locations. I would suggest unzipping the ePub into a directory, running your translation on the html/xhtml files only, and then rezipping the directory contents. One special note: the mimetype file must be in the root of the .zip container, must be the first file added to it, and must be stored with no compression.
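The rezipping step could look roughly like the sketch below, which uses only Python's standard zipfile module. The function name pack_epub and its arguments are made up for illustration. Writing mimetype as the first entry follows the ePub container spec; the directory layout is assumed to be whatever came out of unzipping the original ePub, and the output file should live outside that directory.

```python
import zipfile
from pathlib import Path

def pack_epub(src_dir, epub_path):
    """Zip an unpacked ePub directory back into an .epub file (hypothetical helper)."""
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # mimetype goes in first, at the root, stored without compression.
        zf.write(src / "mimetype", "mimetype", compress_type=zipfile.ZIP_STORED)
        # Everything else keeps its relative path so internal links still resolve.
        for path in sorted(src.rglob("*")):
            arcname = path.relative_to(src).as_posix()
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Something like pack_epub("book_unzipped", "book_de.epub") would then produce a file that can be opened directly, with no conversion step in Calibre.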

You will need to correct language references in the ePub. If, for instance, you are translating from English to German, any references to lang="en" or xml:lang="en" would need to be changed to lang="de" or xml:lang="de".
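A rough sketch of that language fix-up, using plain regular expressions, is shown here. The function name retag_language is hypothetical. Beyond the two attributes mentioned above, it also rewrites the dc:language element found in the .opf package file, and it treats regional variants such as en-US as English; both are my assumptions about what a typical ePub contains, so check the result by hand.

```python
import re

def retag_language(text, old="en", new="de"):
    """Rewrite lang / xml:lang attributes and dc:language from old to new."""
    variant = re.escape(old) + r"(?:-[A-Za-z0-9-]+)?"  # en, en-US, en-GB, ...
    # lang="en" and xml:lang="en"; the lookbehind leaves hreflang="en" alone.
    attr = re.compile(r'''(?<![\w-])((?:xml:)?lang\s*=\s*)(["'])''' + variant + r"\2")
    text = attr.sub(lambda m: f"{m.group(1)}{m.group(2)}{new}{m.group(2)}", text)
    # <dc:language>en</dc:language> in the package (.opf) file.
    elem = re.compile(r"(<dc:language[^>]*>)\s*" + variant + r"\s*(</dc:language>)")
    return elem.sub(lambda m: f"{m.group(1)}{new}{m.group(2)}", text)
```

It would be run over the text of each .xhtml, .html and .opf file in the unzipped directory before rezipping.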

The last time I saw an attempt to machine translate an ePub, it also ran into the issue that the translation attempted to translate everything, markup included. For example, <body> was translated to <körper>, which is not valid, and class= was translated as klasse=, which again is not valid. Hopefully, the tools have improved over the last few years so that will not be an issue.
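One way to keep the translator away from the markup is to split each file into tags and text and send only the text pieces out for translation. The sketch below does this with a regular expression rather than a full XML parser, so it is only an approximation: it assumes well-formed XHTML with no stray < characters in the text. The translate argument is a placeholder for whatever function calls your translation API, and that function would still need to leave entities such as &amp; intact.

```python
import re

# A tag or a comment; re.split keeps these because of the capturing group.
TAG = re.compile(r"(<!--.*?-->|<[^>]+>)", re.DOTALL)

def translate_markup(markup, translate):
    """Apply translate() to text between tags; tags and attributes pass through."""
    out = []
    skip = False  # True while inside <script> or <style>
    for i, piece in enumerate(TAG.split(markup)):
        if i % 2 == 1:  # odd indices are the tags and comments themselves
            name = re.match(r"</?\s*([A-Za-z0-9:]+)", piece)
            if name and name.group(1).lower() in ("script", "style"):
                skip = not piece.startswith("</") and not piece.endswith("/>")
            out.append(piece)
        elif skip or not piece.strip():
            out.append(piece)  # CSS, scripts and pure whitespace stay as they are
        else:
            out.append(translate(piece))
    return "".join(out)
```

With str.upper standing in for the translator, `<p class="x">Hello</p>` comes back as `<p class="x">HELLO</p>`: the text changes, while the element and attribute names do not.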

You might be better off converting the ePub to a .docx Word document, translating that document and then converting back to ePub.
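If you go the .docx route, the two conversions can be scripted with Calibre's ebook-convert command-line tool, which picks the output format from the file extension. The sketch below only builds and runs those two commands; the function names and file names are placeholders, and the translation of the .docx itself happens in between, by whatever means you choose.

```python
import shutil
import subprocess

def roundtrip_commands(epub_in, docx_path, epub_out):
    """Return the two ebook-convert invocations: ePub to docx, then docx to ePub."""
    return [
        ["ebook-convert", epub_in, docx_path],
        ["ebook-convert", docx_path, epub_out],
    ]

def run(command):
    """Run one command, failing early if Calibre's CLI is not on PATH."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError(command[0] + " not found on PATH; is Calibre installed?")
    subprocess.run(command, check=True)
```

You would run the first command, translate the resulting .docx, and then run the second.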