03-09-2024, 10:16 AM   #1
Gaunc
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2024
Device: Ipad Pro 12.9, Kobo
Issues Converting Translated ZIP Content Back to EPUB in Calibre

Hello everyone,

This is my first thread here, and I'm reaching out for some assistance with Calibre. I've recently begun managing a small catalogue of EPUB files, which has led me to learn and use Calibre for the first time.

My workflow involves several steps designed around translating EPUB content. Here's a brief overview:

Convert EPUB to ZIP: Using Calibre, I first convert my EPUB files into ZIP format.
Unzip and Translate: After unzipping the EPUB, I run a script that translates the content within all the HTML files to another language, utilizing an AI LLM API.
Re-zip and Convert Back to EPUB: Once translation is complete, I rezip the files and attempt to convert this ZIP back into an EPUB format using Calibre.
The issue arises in the final step. Despite the translated HTML content displaying perfectly in a web browser, the re-converted EPUB file from the ZIP is a total mess. Interestingly, this issue persists even when I try converting the original EPUB to ZIP and then back again, without any modifications.

As someone new to Calibre and this process, I'm unsure where the problem lies or how to fix it. Has anyone here dealt with similar conversion challenges or have experience with translating content in EPUB files? Any insights or advice would be greatly appreciated.

Thank you in advance for your help!
03-09-2024, 12:17 PM   #2
DNSB
Bibliophagist
Posts: 35,464
Karma: 145525534
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Gaunc
. . .
For what it may be worth, an ePub is a zip container, so there is no need to convert it to zip. You do have to maintain the structure of the container so that links point to the correct locations. I would suggest unzipping the epub into a directory, trying your translation on the html/xhtml files only, and then rezipping the directory contents. One special note: the mimetype file must be in the root of the .zip container, must be the first file in the archive, and must be stored with no compression.
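The rezip step described above can be sketched with Python's standard `zipfile` module. This is a minimal sketch, assuming the book was unzipped into a folder that still contains `mimetype`, `META-INF/` and the content files; the function name `pack_epub` is hypothetical. It writes `mimetype` first and uncompressed, then adds everything else compressed, keeping the original relative paths so links still resolve.

```python
import os
import zipfile


def pack_epub(src_dir: str, out_path: str) -> None:
    """Zip an unpacked EPUB folder (hypothetical helper) into out_path.

    'mimetype' goes in first and uncompressed; all other files are
    deflate-compressed with paths relative to src_dir.
    """
    mimetype_path = os.path.join(src_dir, "mimetype")
    with zipfile.ZipFile(out_path, "w") as zf:
        # First entry, stored with no compression
        zf.write(mimetype_path, "mimetype", compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # deterministic archive order
            for name in sorted(files):
                full = os.path.join(root, name)
                # Zip paths always use forward slashes
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

For example, `pack_epub("book_de", "book_de.epub")`. Write the output file outside the source folder, otherwise the walk can pick up the half-written archive.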

You will need to correct language references in the ePub. If for instance you are translating from English to German, any references to lang="en" or xml:lang="en" would need to be changed to lang="de" or xml:lang="de".
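A rough sketch of the language fix, treating each XHTML and OPF file as a plain string; the helper names are hypothetical. Besides the `lang`/`xml:lang` attributes, the OPF metadata normally has a `<dc:language>` element that should change too. A regex keeps the sketch short; an XML parser would be more robust against unusual markup.

```python
import re

# lang="xx" or xml:lang="xx", single or double quotes
LANG_ATTR = re.compile(r"""\b((?:xml:)?lang)=(["'])([^"']*)\2""")
# <dc:language>xx</dc:language> in the OPF metadata
DC_LANGUAGE = re.compile(r"(<dc:language[^>]*>)[^<]*(</dc:language>)")


def retag_language(markup: str, old: str, new: str) -> str:
    """Change lang/xml:lang values equal to `old` (or a regional
    variant such as en-GB) to `new`; other languages are left alone."""
    def repl(m):
        attr, quote, value = m.groups()
        if value == old or value.startswith(old + "-"):
            value = new
        return f"{attr}={quote}{value}{quote}"
    return LANG_ATTR.sub(repl, markup)


def retag_opf_language(opf: str, new: str) -> str:
    """Replace the text of every <dc:language> element with `new`."""
    return DC_LANGUAGE.sub(lambda m: m.group(1) + new + m.group(2), opf)
```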

The last time I saw an attempt to machine translate an ePub, it also ran into the issue that the translation attempted to translate everything. For example, <body> was translated to <körper>, which is not valid, and class= was translated as klasse=, which again is not valid. Hopefully, the tools have improved over the last few years so that this will not be an issue.
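One way to avoid the `<körper>`/`klasse=` problem is to hand the translator only text nodes, never markup. A minimal sketch using Python's `xml.etree.ElementTree`, where `translate` stands in for whatever API call you use (hypothetical here). Assumptions: each file is well-formed XHTML without named entities such as `&nbsp;` (ElementTree rejects undefined entities), and losing the XML declaration and DOCTYPE on output is acceptable; a library such as lxml preserves more. Translating fragment by fragment also gives the translator less context than whole paragraphs would.

```python
import xml.etree.ElementTree as ET
from typing import Callable

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # write xmlns="..." instead of ns0:
ET.register_namespace("epub", "http://www.idpf.org/2007/ops")

SKIP = {"style", "script"}  # their text is code, not prose


def translate_text_nodes(xhtml: str, translate: Callable[[str], str]) -> str:
    """Apply `translate` to text content only; tag names and
    attributes are never passed to it."""
    root = ET.fromstring(xhtml)
    for el in root.iter():
        local = el.tag.rsplit("}", 1)[-1]  # strip the {namespace}
        if el.text and el.text.strip() and local not in SKIP:
            el.text = translate(el.text)
        # tail = text after this element's closing tag, owned by the parent
        if el.tail and el.tail.strip():
            el.tail = translate(el.tail)
    return ET.tostring(root, encoding="unicode")
```

Passing `str.upper` as `translate` makes a handy dry run: every word is upper-cased while tags and attributes stay untouched.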

You might be better off converting the ePub to a .docx Word document, translating that document and then converting back to ePub.
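The docx round trip can be scripted with calibre's `ebook-convert` command-line tool, which picks the input and output formats from the file extensions. A small sketch, assuming calibre's command-line tools are on your PATH; the wrapper names are hypothetical.

```python
import shutil
import subprocess


def convert_cmd(src: str, dst: str) -> list:
    """ebook-convert infers the formats from the extensions,
    e.g. .epub -> .docx or .docx -> .epub."""
    return ["ebook-convert", src, dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion, failing loudly if calibre is missing or errors."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is calibre on PATH?")
    subprocess.run(convert_cmd(src, dst), check=True)
```

For example, `convert("book.epub", "book.docx")`, translate and check the document in Word, then `convert("book_de.docx", "book_de.epub")`.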
03-09-2024, 12:35 PM   #3
Quoth
the rook, bossing Never.
Posts: 11,164
Karma: 85874891
Join Date: Jun 2017
Location: Ireland
Device: All 4 Kinds: epub eink, Kindle, android eink, NxtPaper11
An epub is a zip; just rename it.
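Renaming only changes the extension (Python's `zipfile` opens an `.epub` without renaming, since it checks contents, not names); whether the result is a usable EPUB depends on what is inside. A quick diagnostic sketch, with a hypothetical function name, that checks only the container-level rules mentioned earlier in the thread; it does not validate content, for which a dedicated checker such as EPUBCheck is the usual tool.

```python
import zipfile


def container_problems(path: str) -> list:
    """Return basic container problems; an empty list means the
    zip-level checks passed (content is not validated)."""
    if not zipfile.is_zipfile(path):
        return ["not a zip archive"]
    problems = []
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        elif infos[0].compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' is compressed")
        elif zf.read("mimetype") != b"application/epub+zip":
            problems.append("unexpected mimetype contents")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("missing META-INF/container.xml")
    return problems
```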

However, exporting or converting to docx, rtf or similar for translation would be better. Your method risks messing up the epub manifest, CSS, etc., as you've discovered.

Also, the elephant in the room is the so-called AI, the LLM. Either they are rubbish or they are plagiarising.

An epub is simply a zip, but the contents for an epub2 are:
HTML files, in order. Each new file causes a page break. The HTML headers ideally import css.
The CSS file(s), if any. Bad design if there are not.
The font files, if any. Order is irrelevant.
The image files, if any. Order is irrelevant.
A content.opf which is mandatory. It lists the files and what they do (a manifest).
A toc.ncx, which epub2 requires (in epub3 it is optional, and a navigation document takes its place). It's the "system" Table of Contents for an app or ereader.
Epub3 has other possible files.
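How these files relate can be seen by following the chain a reading system follows: `META-INF/container.xml` points to the OPF, the OPF manifest maps ids to files, and the spine lists those ids in reading order. A standard-library sketch, assuming a typical EPUB whose manifest hrefs are relative to the OPF; `reading_order` is a hypothetical name.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(epub_path: str) -> list:
    """Return the zip paths of the content documents in spine order."""
    with zipfile.ZipFile(epub_path) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(".//c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
    base = posixpath.dirname(opf_path)  # hrefs are relative to the OPF
    manifest = {
        item.get("id"): unquote(item.get("href"))
        for item in opf.iterfind("opf:manifest/opf:item", OPF_NS)
    }
    return [
        posixpath.join(base, manifest[ref.get("idref")])
        for ref in opf.iterfind("opf:spine/opf:itemref", OPF_NS)
    ]
```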

Calibre has an editor which manages the relation between the files. Editing the HTML outside of Calibre is risky.

Simply passing the HTML via an API is a disaster, as IDs (anchors), imports, classes, etc. won't be preserved, quite apart from the risk of mangling the HTML tags.
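A cheap safeguard against an API mangling markup is to compare each file's markup skeleton before and after translation. This sketch removes text and a small, assumed set of human-readable attributes (`alt`, `title`) and compares what remains, so a changed id, class or tag name is caught. The names are hypothetical, and the check is deliberately strict: reordered attributes also count as a change.

```python
import xml.etree.ElementTree as ET

# Attributes whose values are prose and may legitimately be translated
# (an assumption; extend as needed).
TRANSLATABLE_ATTRS = {"alt", "title"}


def markup_skeleton(xhtml: str) -> str:
    """Serialize the document with all text and prose attributes removed."""
    root = ET.fromstring(xhtml)
    for el in root.iter():
        el.text = el.tail = None
        for name in [n for n in TRANSLATABLE_ATTRS if n in el.attrib]:
            del el.attrib[name]
    return ET.tostring(root, encoding="unicode")


def same_markup(original: str, translated: str) -> bool:
    """True if translation changed only text (and alt/title values)."""
    return markup_skeleton(original) == markup_skeleton(translated)
```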

It's best to export to docx, translate each section/chapter separately (copy/paste only blocks of the same style, with no images), check all links, anchors, headings, etc., save as docx and import into Calibre.

What you are doing only works (badly) for web pages. The HTML files in an ebook are not the same as a standalone web page, even though both use HTML. Doing this with an epub3 is even more fraught with disaster.

Last edited by Quoth; 03-09-2024 at 12:39 PM.
03-09-2024, 03:11 PM   #4
BetterRed
null operator (he/him)
Posts: 20,575
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by DNSB
. . .

You might be better off converting the ePub to a .docx Word document, translating that document and then converting back to ePub.


There are a couple of Word addins you might want to consider:

TransTools – Translation productivity tools.

e-Book Tools.

There is overlap between them, but they have their own strengths and weaknesses; for example, TransTools Unbreaker and the e-Book Tools Dialogue checker are each unique to their respective add-ins, and both are invaluable.

I have them installed in the desktop version of Word from the latest Office 365 with no issues. I keep everything local.

BR
03-09-2024, 08:25 PM   #5
Gaunc
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2024
Device: Ipad Pro 12.9, Kobo
Thank you all for your insights! I admit my knowledge of ePub is not very deep. I've been troubleshooting based on my workflow, and I'm currently trying to figure out the last step: converting a .zip file back to an .epub. Initially, I assumed that the process I used in Calibre to convert an ePub to a zip file could be simply reversed, but it seems that's not the case.

I attempted to rename my ePub file to .zip, but that approach didn't work. Regarding the LLM translation, that part went smoothly and without errors. I've completed a script segment that employs BeautifulSoup to parse individual HTML files, extracting content from specific tags. The content of the book was inside <div> elements with three specific classes, so the script needed to fetch the content from those classes, pass it through the LLM, and use the output to replace the original HTML content. I've been using Google Gemini for this, and it's quite remarkable: it didn't alter any HTML tags, and the formatting remained unchanged when viewing the HTML files.

I've uploaded the translated HTML "website" to GitHub as a demonstration of this part of the process working. You can view it here: https://gaunc1.github.io/brobromybookishere/.
03-09-2024, 10:03 PM   #6
BetterRed
null operator (he/him)
Posts: 20,575
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Perhaps you could use the Calibre Unpack tool: it provides 'Explode' and 'Rebuild' features.
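If you prefer the command line, `calibre-debug` offers a similar pair of options, `--explode-book` (`-x`) and `--implode-book` (`-i`), which work on EPUB among other formats. A sketch, assuming a reasonably recent calibre (check `calibre-debug --help` on your version); the helper names are hypothetical.

```python
import subprocess


def explode_cmd(book: str, folder: str) -> list:
    """Unpack `book` into `folder` as editable HTML plus metadata."""
    return ["calibre-debug", "--explode-book", book, folder]


def implode_cmd(folder: str, book: str) -> list:
    """Rebuild `book` from a folder made by a previous explode."""
    return ["calibre-debug", "--implode-book", folder, book]


def run(cmd: list) -> None:
    """Execute a command, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For example, `run(explode_cmd("book.epub", "book_src"))`, translate the HTML files inside `book_src`, then `run(implode_cmd("book_src", "book.epub"))`. Use the same file type for both steps and keep a backup of the original book, since implode writes to the file you name.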

[Screenshot: Screenshot 2024-03-10 140055.jpg]

BR
Similar Threads
Thread Thread Starter Forum Replies Last Post
Trouble with Korean font when converting ZIP to EPUB junj Conversion 2 04-11-2021 10:55 PM
Problem converting from zip to epub. nstock Conversion 2 10-31-2017 03:18 AM
Error message when converting from ZIP to ePub luthar28 Conversion 2 05-24-2011 01:04 PM
Conversion error when converting zip to epub siebert Conversion 2 02-27-2011 11:40 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.