03-09-2024, 10:16 AM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2024
Device: Ipad Pro 12.9, Kobo
|
Issues Converting Translated ZIP Content Back to EPUB in Calibre
Hello everyone,
This is my first thread here, and I'm reaching out for some assistance with Calibre. I've recently begun managing a small catalogue of EPUB files, which has led me to learn and use Calibre for the first time. My workflow involves several steps designed around translating EPUB content. Here's a brief overview:
1. Convert EPUB to ZIP: Using Calibre, I first convert my EPUB files into ZIP format.
2. Unzip and Translate: After unzipping the EPUB, I run a script that translates the content within all the HTML files to another language, utilizing an AI LLM API.
3. Re-zip and Convert Back to EPUB: Once translation is complete, I rezip the files and attempt to convert this ZIP back into an EPUB format using Calibre.
The issue arises in the final step. Despite the translated HTML content displaying perfectly in a web browser, the re-converted EPUB file from the ZIP is a total mess. Interestingly, this issue persists even when I try converting the original EPUB to ZIP and then back again, without any modifications.
As someone new to Calibre and this process, I'm unsure where the problem lies or how to fix it. Has anyone here dealt with similar conversion challenges or have experience with translating content in EPUB files? Any insights or advice would be greatly appreciated. Thank you in advance for your help! |
03-09-2024, 12:17 PM | #2 | |
Bibliophagist
Posts: 36,873
Karma: 147879470
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
You will need to correct language references in the ePub. If for instance you are translating from English to German, any references to lang="en" or xml:lang="en" would need to be changed to lang="de" or xml:lang="de". The last time I saw an attempt to machine translate an ePub, it also ran into the issue that the translation attempted to translate everything. I.e. <body> was translated to <körper> which is not valid and class= was translated as klasse= which again is not valid. Hopefully, the tools have improved over the last few years so that will not be an issue. You might be better off converting the ePub to a .docx Word document, translating that document and then converting back to ePub. |
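A minimal sketch of the language-reference fix described above, assuming the English-to-German case from the example. The function name `retag_language` and the sample markup are made up for illustration, and a regex is only a rough tool for this: it touches `lang` and `xml:lang` attributes and nothing else (the `dc:language` entry in the OPF metadata, for instance, is not covered).

```python
import re


def retag_language(markup: str, old: str = "en", new: str = "de") -> str:
    """Rewrite lang/xml:lang attribute values from `old` to `new`.

    Regional variants such as "en-US" are collapsed to plain `new`.
    """
    pattern = re.compile(
        # The lookbehind skips attributes that merely end in "lang",
        # such as hreflang or data-lang.
        r'(?<![\w-])((?:xml:)?lang\s*=\s*)(["\'])'
        + re.escape(old)
        + r'(?:-[A-Za-z0-9-]+)?\2'
    )
    # Keep the attribute name and the original quote style, swap only the value.
    return pattern.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{new}{m.group(2)}", markup
    )


# Hypothetical sample markup, not taken from the book in this thread.
sample = '<html lang="en" xml:lang="en-US"><a hreflang="en" href="x.html">x</a></html>'
print(retag_language(sample))
```

The `hreflang` attribute is deliberately left alone, since it describes the link target rather than the translated text.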
|
03-09-2024, 12:35 PM | #3 |
the rook, bossing Never.
Posts: 11,701
Karma: 87663461
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
|
Epub is a zip. Just rename it.
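Since an EPUB is a zip archive, unpacking and repacking can be sketched with Python's standard `zipfile` module, with no conversion step involved. The function names `unpack_epub` and `pack_epub` are made up for illustration. One detail a generic zip tool may not respect: the EPUB container format expects the `mimetype` file to be the first entry in the archive and to be stored uncompressed. The sketch assumes the output file is written outside the folder being packed.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, out_dir):
    """Extract an EPUB the same way as any other zip archive."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(out_dir)


def pack_epub(src_dir, epub_path):
    """Zip an unpacked EPUB folder back into a single .epub file."""
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # "mimetype" goes in first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            name = path.relative_to(src).as_posix()
            # Skip directories and the mimetype file already written above.
            if path.is_file() and name != "mimetype":
                zf.write(path, name, compress_type=zipfile.ZIP_DEFLATED)
```

Usage would be along the lines of `unpack_epub("book.epub", "book_dir")`, editing the files in `book_dir`, then `pack_epub("book_dir", "book_translated.epub")`; the file names here are placeholders.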
However, export/conversion to docx, rtf or whatever for translation would be better. Your method risks messing up the epub manifest, CSS etc., as you've discovered. Also the elephant in the room is the so-called AI, the LLM. Either they are rubbish or plagiarising.
An epub is simply a zip, but the contents for an epub2 are:
- HTML files, in order. Each new file causes a page break. The HTML headers ideally import CSS.
- The CSS file(s), if any. Bad design if there are none.
- The font files, if any. Order is irrelevant.
- The image files, if any. Order is irrelevant.
- A content.opf, which is mandatory. It lists the files and what they do (a manifest).
- A toc.ncx, which is optional. It's the "system" Table of Contents for an app or ereader.
Epub3 has other possible files.
Calibre has an editor which manages the relation between the files. Editing the HTML outside of Calibre is risky. Simply passing the HTML via an API is a disaster as IDs (anchors), imports, classes etc. won't be preserved, apart from the risk of mangling the HTML tags. It's best to export docx, translate each section/chapter separately (copy / paste only same style blocks with no images), check all links, anchors, headings, etc., save as docx, import to Calibre. What you are doing only works (badly) for web pages. The HTML files in an ebook are not the same as a standalone web page even though using HTML5. Doing this with an epub3 is even more fraught with disaster.
Last edited by Quoth; 03-09-2024 at 12:39 PM. |
03-09-2024, 03:11 PM | #4 | |
null operator (he/him)
Posts: 20,692
Karma: 26966376
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
There are a couple of Word add-ins you might want to consider: TransTools – Translation productivity tools. e-Book Tools. There is overlap between them, but they have their own strengths and weaknesses, for example: TransTools Unbreaker and e-BookTools Dialogue checker are unique to each and invaluable. I have them installed in the desktop version of Word from latest Office 365 with no issues… I keep everything local. BR |
|
03-09-2024, 08:25 PM | #5 |
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2024
Device: Ipad Pro 12.9, Kobo
|
Thank you all for your insights! I admit my knowledge of ePub is not very deep. I've been troubleshooting based on my workflow, and I'm currently trying to figure out the last step: converting a .zip file back to an .epub. Initially, I assumed that the process I used in Calibre to convert an ePub to a zip file could be simply reversed, but it seems that's not the case.
I attempted to rename my ePub file to .zip, but that approach didn't work. Regarding the LLM translation, that part went smoothly and without errors. I've completed a script segment that employs BeautifulSoup to parse individual HTML files, extracting content from specific tags. The content of the book was within three <div> classes, so the script needed to fetch the content from those specified classes, pass it through the LLM, and use the output to replace the original HTML content. I've been using Google Gemini for this, and it's quite remarkable—it didn't alter any HTML tags, and the formatting remained unchanged when viewing the HTML files. I've uploaded the translated HTML "website" to GitHub as a demonstration of this part of the process working. You can view it here: https://gaunc1.github.io/brobromybookishere/. |
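For readers following along, here is a rough sketch of the kind of BeautifulSoup step described above. It is not the poster's actual script: the class names in `TARGET_CLASSES` are hypothetical (the thread does not name them), `fake_translate` is a stand-in for the LLM call, and the sketch uses a stricter variation in which only text nodes are sent for translation, so tags, IDs, classes and links cannot be altered by the translator. The trade-off is that text split by inline tags is translated in fragments, which can cost sentence context.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical class names; the thread does not say which three were used.
TARGET_CLASSES = ["chapter-title", "chapter-body", "footnotes"]


def fake_translate(text: str) -> str:
    """Stand-in for the LLM call (hypothetical); it just upper-cases."""
    return text.upper()


def translate_html(html: str, translate=fake_translate,
                   classes=TARGET_CLASSES) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Collect text nodes first so nested target divs are not translated twice.
    nodes, seen = [], set()
    for div in soup.find_all("div", class_=classes):
        for node in div.find_all(string=True):
            # Plain text only: comments and similar nodes are skipped.
            if (type(node) is NavigableString and node.strip()
                    and id(node) not in seen):
                seen.add(id(node))
                nodes.append(node)
    # Replace text only; tags, ids, classes and hrefs are never sent out.
    for node in nodes:
        node.replace_with(NavigableString(translate(str(node))))
    return str(soup)
```

Content outside the listed classes is left untouched. The sketch uses Python's built-in `html.parser`; EPUB content files are XHTML, so a real run may need an XML-aware parser to round-trip them faithfully.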
03-09-2024, 10:03 PM | #6 |
null operator (he/him)
Posts: 20,692
Karma: 26966376
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|