MobileRead Forums - View Single Post - Issues Converting Translated ZIP Content Back to EPUB in Calibre

Quoth · 03-09-2024, 12:35 PM

An EPUB is a ZIP. Just rename it.

However, exporting/converting to DOCX, RTF or similar for translation would be better. Your method risks messing up the EPUB manifest, the CSS, etc., as you've discovered.
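Calibre's command-line converter, ebook-convert, can do that export and the later re-import without the GUI. It infers the input and output formats from the file extensions. A small Python wrapper as a sketch (the wrapper name and the dry_run flag are hypothetical; only ebook-convert itself comes from Calibre):

```python
import shutil
import subprocess


def ebook_convert(src: str, dst: str, dry_run: bool = False) -> list[str]:
    """Convert src to dst with calibre's ebook-convert command.

    The output format is picked from dst's extension (.docx, .rtf, .epub, ...).
    With dry_run=True the command is only built and returned, not executed.
    """
    cmd = ["ebook-convert", src, dst]
    if not dry_run:
        if shutil.which("ebook-convert") is None:
            raise RuntimeError("calibre's ebook-convert is not on PATH")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# Round trip (hypothetical file names):
# ebook_convert("book.epub", "book.docx")
# ... translate book.docx chapter by chapter in a word processor ...
# ebook_convert("book.docx", "book_translated.epub")
```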

Also, the elephant in the room is the so-called AI, i.e. the LLM: either they are rubbish or they are plagiarising.

An EPUB is simply a ZIP, but the contents of an EPUB2 are:
- HTML files, in order (the reading order comes from the spine in content.opf). Each new file causes a page break. The HTML headers should ideally link the CSS.
- The CSS file(s), if any. It's bad design if there are none.
- The font files, if any. Order is irrelevant.
- The image files, if any. Order is irrelevant.
- A content.opf, which is mandatory. It lists the files and what they do (a manifest).
- A toc.ncx, the "system" Table of Contents for an app or ereader. Strictly, the EPUB2 spec expects one, though many readers cope without it.
EPUB3 has other possible files.
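Since the original problem is turning an unzipped folder back into an EPUB, one container detail matters here: besides the files listed above, the ZIP must hold a 'mimetype' file containing exactly application/epub+zip as its very first entry, stored uncompressed, plus META-INF/container.xml pointing at the OPF. A generic archiver often gets that order or compression wrong. A minimal Python sketch, assuming the folder still has the unpacked layout (pack_epub is a name made up for illustration):

```python
import zipfile
from pathlib import Path

EPUB_MIMETYPE = b"application/epub+zip"


def pack_epub(src_dir: str, out_path: str) -> None:
    """Zip an unpacked EPUB folder (mimetype, META-INF/, content folder)
    back into a single .epub file."""
    root = Path(src_dir)
    if not (root / "META-INF" / "container.xml").is_file():
        raise FileNotFoundError("missing META-INF/container.xml; is this an unpacked EPUB?")
    out = Path(out_path).resolve()

    with zipfile.ZipFile(out_path, "w") as zf:
        # 'mimetype' must be the first entry and stored uncompressed so that
        # readers can recognise the file from its first bytes.
        zf.writestr("mimetype", EPUB_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # Everything else may be compressed; sorted() keeps the entry order stable.
        for path in sorted(root.rglob("*")):
            arcname = path.relative_to(root).as_posix()
            if not path.is_file() or arcname == "mimetype":
                continue
            if path.resolve() == out:
                continue  # don't zip the output file into itself
            zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Usage would be something like pack_epub("book_unzipped", "book.epub"); afterwards open the result in Calibre's editor or run EPUBCheck on it to confirm nothing else in the manifest was broken.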

Calibre has an editor (Edit book) which manages the relationships between the files. Editing the HTML outside of Calibre is risky.

Simply passing the HTML through an API is a disaster, as IDs (anchors), CSS imports, classes, etc. won't be preserved, quite apart from the risk of mangling the HTML tags.
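To see what such a round trip actually lost, you can compare a chapter's markup before and after translation. A rough Python sketch using only the standard library; lost_markup is a hypothetical helper, and it only compares attribute values (ids, class names, href/src targets such as the CSS link), not the text or the tag structure:

```python
from html.parser import HTMLParser


class _AttrCollector(HTMLParser):
    """Collects id values, class names and href/src targets from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()
        self.classes: set[str] = set()
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<img/>, <link/>) also arrive here, because the
        # default handle_startendtag calls handle_starttag.
        for name, value in attrs:
            if value is None:
                continue
            if name == "id":
                self.ids.add(value)
            elif name == "class":
                self.classes.update(value.split())
            elif name in ("href", "src"):
                self.links.add(value)


def lost_markup(original: str, translated: str) -> dict[str, set[str]]:
    """Return what is present in the original chapter but missing
    from the translated one."""
    before, after = _AttrCollector(), _AttrCollector()
    before.feed(original)
    before.close()
    after.feed(translated)
    after.close()
    return {
        "ids": before.ids - after.ids,
        "classes": before.classes - after.classes,
        "links": before.links - after.links,
    }
```

Any non-empty set in the result points at an anchor, style or stylesheet/image reference the translation dropped.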

It's best to export to DOCX, translate each section/chapter separately (copy/paste only blocks of the same style, with no images), check all links, anchors, headings, etc., save as DOCX and import that into Calibre.
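For the "check all links, anchors" step, here is a rough Python sketch that scans every chapter for internal a href links and reports those whose target file or id no longer exists. It assumes the chapters are available as a dict of book-relative path to HTML text (for instance read out of the unzipped EPUB); broken_links is a hypothetical helper, not part of Calibre, whose editor has its own Check Book tool:

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class _LinkScanner(HTMLParser):
    """Records the ids defined in a file and the <a href> values it uses."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)
            elif name == "href" and tag == "a" and value:
                self.hrefs.append(value)


def broken_links(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source file, href) pairs whose target file or anchor is missing."""
    scanned = {}
    for path, html in files.items():
        scanner = _LinkScanner()
        scanner.feed(html)
        scanner.close()
        scanned[path] = scanner

    broken = []
    for path, scanner in scanned.items():
        for href in scanner.hrefs:
            parts = urlsplit(href)
            if parts.scheme or parts.netloc:
                continue  # external link (http:, mailto:, ...): not checked here
            if parts.path:
                # Resolve relative to the linking file's folder, e.g. ../Text/ch2.xhtml
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(path), unquote(parts.path)))
            else:
                target = path  # a bare "#anchor" points into the same file
            if target not in scanned:
                broken.append((path, href))
            elif parts.fragment and unquote(parts.fragment) not in scanned[target].ids:
                broken.append((path, href))
    return broken
```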

What you are doing only works (badly) for web pages. The HTML files in an ebook are not the same as a standalone web page, even when they use HTML5. An EPUB3 is even more fraught with disaster for this approach.
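One concrete, checkable difference from a standalone web page: EPUB content documents are XHTML, so each file has to parse as well-formed XML, whereas a browser silently tolerates things like an unclosed br tag. A machine-translated file whose tags were mangled will often fail this. A minimal Python sketch using the standard library's XML parser (xhtml_errors is a made-up name; it only checks well-formedness, and a full validator such as EPUBCheck is far more thorough):

```python
import xml.etree.ElementTree as ET


def xhtml_errors(files: dict[str, str]) -> dict[str, str]:
    """Return {file name: parser error message} for every content file
    that is not well-formed XML. Values may be str or raw bytes."""
    errors = {}
    for name, text in files.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as exc:
            errors[name] = str(exc)  # message includes line and column
    return errors
```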

Last edited by Quoth; 03-09-2024 at 12:39 PM.