As far as I can tell, this is not a 'solved problem' within calibre or with its plugins, or even with public domain tools generally.
KindleUnpack does 'pretty well' with AZW3 to ePub. At least Thorium reader likes the result pretty much all of the time.
Image-only fixed layout (usually comics) is the low-hanging fruit, but any variation in page image size can throw a wrench into things.
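A quick way to see whether a book's pages are uniform before attempting an image-only conversion is to tally the page dimensions. This is only a sketch: it assumes PyMuPDF is installed and can open the file (CBZ or PDF), and the function names page_sizes and tally_sizes are made up for illustration.

```python
from collections import Counter


def page_sizes(path):
    """Return a (width, height) tuple, rounded to whole units, for every page.

    Assumption: PyMuPDF is installed and can open the file (PDF, CBZ, ...).
    """
    import pymupdf  # imported lazily so tally_sizes() works without it

    doc = pymupdf.open(path)
    try:
        return [(round(page.rect.width), round(page.rect.height)) for page in doc]
    finally:
        doc.close()


def tally_sizes(sizes):
    """Return the most common page size and the 1-based pages that deviate from it."""
    if not sizes:
        return None, []
    common, _count = Counter(sizes).most_common(1)[0]
    odd_pages = [(number, size) for number, size in enumerate(sizes, start=1) if size != common]
    return common, odd_pages
```

Calling tally_sizes(page_sizes("comic.cbz")) (the file name is just a placeholder) would then list the pages whose size differs from the majority, which are the ones likely to cause trouble.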
The harder case is PDF (with text objects) to fixed-layout ePub or AZW3 (with positioned text), which nothing seems to do a good job with ('good job' meaning 'preserves text and positioning in some way').
So what is missing (FL = fixed layout) is:
- PDF to FL ePub
- PDF to FL AZW3
- FL ePub to PDF
- FL ePub to AZW3
PyMuPDF claims to support conversions between any of its supported formats:
Document formats (input or output): PDF, XPS, ePub, Mobi, FB2, CBZ, SVG, TXT
Image formats:
Input formats: JPG/JPEG, PNG, BMP, GIF, TIFF, PNM, PGM, PBM, PPM, PAM, JXR, JPX/JP2, PSD
Output formats: JPG/JPEG, PNG, PNM, PGM, PBM, PPM, PAM, PSD, PS
It also has OCR support if it finds Tesseract's language support data.
This is example code to convert XPS to PDF:
Code:
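import pymupdf

# Open the XPS source document
xps = pymupdf.open("input.xps")
# Convert the whole document to PDF in memory (returns bytes)
pdfbytes = xps.convert_to_pdf()
# Re-open those bytes as a PDF document; "pdf" gives the type of the stream
pdf = pymupdf.open("pdf", pdfbytes)
pdf.save("output.pdf")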
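For the PDF to FL ePub direction, the 'preserves text and positioning' part could in principle start from PyMuPDF's text extraction, which reports a bounding box for each text span. Below is a rough sketch under several assumptions: page.get_text("dict") returns its usual blocks/lines/spans structure, one PDF point is mapped to one CSS pixel, and fonts and images are ignored. The helper names extract_spans and spans_to_xhtml are hypothetical, and the output is a single XHTML page, not a packaged ePub.

```python
from html import escape


def extract_spans(page):
    """Collect bounding box, font size and text for each text span on a PyMuPDF page."""
    spans = []
    for block in page.get_text("dict")["blocks"]:
        for line in block.get("lines", []):  # image blocks have no "lines" key
            for span in line["spans"]:
                if span["text"].strip():  # skip whitespace-only spans
                    spans.append(
                        {"bbox": span["bbox"], "size": span["size"], "text": span["text"]}
                    )
    return spans


def spans_to_xhtml(spans, width, height):
    """Build one fixed-layout XHTML page with absolutely positioned text spans."""
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml"><head>',
        "<title>page</title>",
        f'<meta name="viewport" content="width={width:.0f}, height={height:.0f}"/>',
        "</head>",
        f'<body style="position:relative;width:{width:.0f}px;height:{height:.0f}px;margin:0">',
    ]
    for s in spans:
        x0, y0, _x1, _y1 = s["bbox"]  # top-left corner is enough for positioning
        style = (
            f"position:absolute;left:{x0:.1f}px;top:{y0:.1f}px;"
            f"font-size:{s['size']:.1f}px;white-space:pre"
        )
        parts.append(f'<span style="{style}">{escape(s["text"])}</span>')
    parts.append("</body></html>")
    return "\n".join(parts)
```

For each page one would call something like spans_to_xhtml(extract_spans(page), page.rect.width, page.rect.height). Whether a given reader lays the text out acceptably without the original fonts is exactly the kind of thing that would need experimenting.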
(I assume 'mobi' is not the same as 'azw3', so even if everything else worked, one would still need to add conversion to AZW3 somehow, perhaps by taking the KindleUnpack code and reversing its workflow.)
I am wondering if anyone has tried PyMuPDF for converting between fixed layout formats. I do not have high expectations for the resulting conversions, but maybe they fall in the 'not too bad' category.
Is anyone else interested in this problem?
For ebooks (ePub or Kindle formats), reader support for fixed layout content is not very good: annotation and even text search are rarely available. So even the best possible conversion might not serve any great purpose.
At any rate, I hope to find a little time here and there to try some experiments.