Old 06-23-2026, 09:12 PM   #1231
DNSB
Bibliophagist
Posts: 52,842
Karma: 180988364
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by yoshi View Post
Thank you for your explanation.

Adding "rendition:page-spread-center" without using "Plugin Tweaks" would be beneficial for EPUB readers.
You could try wrapping it in [noparse] ... [/noparse]. This keeps the :p from being seen as a winky.
Old 06-23-2026, 09:58 PM   #1232
theducks
Well trained by Cats
Posts: 31,854
Karma: 64181416
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by yoshi View Post
I tried to correct the above corrupted character but failed to do it.
You need to disable smilies under advanced edit option. (I did it for you)
Old 07-02-2026, 06:03 PM   #1233
utakata
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2026
Device: KindleOasis
Performance fix + configurable PDF page DPI for print replica conversion

Hi jhowell,

First of all, thank you for maintaining this excellent plugin.

While converting Japanese print replica books (PDF-backed fixed layout KFX), I ran into two issues and would like to propose fixes for both. A patch against v2.25.0 is included below.

1. O(n^2) slowdown in check_consistency() with PDF-backed books

get_pdf_page_size() creates a new pypdf.PdfReader (and re-flattens the page tree) for every page reference. For a 336-page print replica book, decode_book() takes over 30 seconds, nearly all of it in this loop.

The patch caches the PdfReader per PDF content (keyed by a content fingerprint, since each resource fragment may hold a distinct bytes object with identical content). Results on the same book:
- get_pdf_page_size across 336 pages: over 30 sec -> 0.18 sec
- decode_book() incl. consistency check: over 30 sec -> about 11 sec

2. Configurable DPI for PDF page rendering (currently fixed at 150)

convert_pdf_to_jpeg() is limited to 150 dpi because calibre's page_images() does not accept a resolution argument. 150 dpi is quite low for technical books - small text and ruby annotations become hard to read.
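
To put numbers on that, a rough sketch of the pixel dimensions a rendered page ends up with. It assumes the PDF default of 72 points per inch (a page with user_unit of 1) and a hypothetical page of 420 x 595 points, which is roughly A5; rendered_pixels() is an illustrative helper, not a plugin function.

```python
POINTS_PER_INCH = 72  # PDF default user space: 1 point = 1/72 inch


def rendered_pixels(width_pt, height_pt, dpi):
    """Approximate pixel size of a page rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# Hypothetical page size (roughly A5), in points.
page_w, page_h = 420, 595

print(rendered_pixels(page_w, page_h, 150))  # (875, 1240)
print(rendered_pixels(page_w, page_h, 300))  # (1750, 2479)
```

Doubling the DPI doubles each dimension, so the rendered image has about four times as many pixels.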

The patch invokes pdftoppm directly (using calibre's bundled poppler binary when available, falling back to page_images() at 150 dpi on failure) and adds a "pdf_page_dpi" conversion option (default 300, clamped to 72-600), e.g.:

ebook-convert book.kfx book.epub --pdf-page-dpi 300

I set the default to 300 in the patch, but keeping 150 as the default would of course preserve existing behavior.
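
The option handling described above (fall back to the default on missing or invalid input, then clamp to 72-600) can be sketched as a standalone helper. The name clamp_dpi() and its parameters are illustrative only; the patch does the same thing inline.

```python
def clamp_dpi(value, default=300, low=72, high=600):
    """Parse a DPI setting, using the default for missing or invalid input,
    then clamp the result to the supported range."""
    try:
        dpi = int(value or default)
    except (TypeError, ValueError):
        dpi = default
    return max(low, min(high, dpi))


for raw in (None, "150", "abc", 10, 1200):
    print(repr(raw), "->", clamp_dpi(raw))
```

Note that a value of 0 is treated as "not set" and yields the default rather than the lower bound.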

Verified on Windows (calibre 9.8) and Linux with two Japanese print replica books.

Happy to adjust anything if you'd prefer a different approach. Thanks again!

Code:
--- a/kfxlib/resources.py
+++ b/kfxlib/resources.py
@@ -314,17 +314,62 @@
     return outfile.getvalue()
 
 
-def convert_pdf_to_jpeg(pdf_data, page_num, dpi=150, reported_errors=None):
-    pdf_file = temp_filename("pdf", pdf_data)
-    jpeg_dir = create_temp_dir()
+PDF_PAGE_DPI = 150      # default; overridden by the KFX Input "pdf_page_dpi" conversion option
+
+
+def find_pdftoppm():
+    """Locate the pdftoppm executable (calibre's bundled poppler, or the system PATH)."""
+    candidates = []
 
     if calibre_numeric_version is not None:
+        try:
+            from calibre.ebooks.pdf.pdftohtml import PDFTOHTML
+            base = os.path.dirname(PDFTOHTML)
+            candidates.append(os.path.join(base, "pdftoppm.exe"))
+            candidates.append(os.path.join(base, "pdftoppm"))
+        except Exception:
+            pass
+
+    for candidate in candidates:
+        if os.path.exists(candidate):
+            return candidate
+
+    return "pdftoppm"      # rely on the system PATH
+
+
+def convert_pdf_to_jpeg(pdf_data, page_num, dpi=None, reported_errors=None):
+    if dpi is None:
+        dpi = PDF_PAGE_DPI
+
+    pdf_file = temp_filename("pdf", pdf_data)
+    jpeg_dir = create_temp_dir()
 
-        if dpi != 150:
-            raise Exception("calibre PDF page_images supports only default 150dpi")
+    rendered = False
+    try:
+        import subprocess
+        args = [
+            find_pdftoppm(), "-jpeg", "-r", str(dpi), "-cropbox",
+            "-f", str(page_num), "-l", str(page_num),
+            pdf_file, os.path.join(jpeg_dir, "page")]
+
+        kwargs = {}
+        if os.name == "nt":
+            kwargs["creationflags"] = 0x08000000    # CREATE_NO_WINDOW
+
+        subprocess.run(args, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=120, **kwargs)
+        rendered = True
+    except Exception as e:
+        if reported_errors is not None and "pdftoppm_direct" not in reported_errors:
+            reported_errors.add("pdftoppm_direct")
+            log.warning("Direct pdftoppm rendering failed (%s), falling back to calibre page_images at 150dpi" % repr(e))
 
+    if not rendered and calibre_numeric_version is not None:
         from calibre.ebooks.metadata.pdf import page_images
         page_images(pdf_file, jpeg_dir, first=page_num, last=page_num)
+        rendered = True
+
+    if not rendered:
+        raise Exception("No PDF rendering method available (pdftoppm not found)")
 
     for dirpath, dirnames, filenames in os.walk(jpeg_dir):
         if len(filenames) != 1:
@@ -495,9 +540,25 @@
     return best_raw_media, best_quality
 
 
+_pdf_reader_cache = {}      # content fingerprint -> PdfReader
+
+
+def get_cached_pdf_reader(pdf_data):
+    """Cache PdfReader per PDF content. Avoids O(n^2) re-parsing and page-tree
+    re-flattening when a book references hundreds of pages of the same embedded
+    PDF (print replica). Keyed by a content fingerprint since each resource may
+    hold a distinct bytes object with identical content."""
+    key = (len(pdf_data), bytes(pdf_data[:256]), bytes(pdf_data[-256:]))
+    reader = _pdf_reader_cache.get(key)
+    if reader is None:
+        reader = pypdf.PdfReader(io.BytesIO(pdf_data))
+        len(reader.pages)       # force one-time page tree flatten while caching
+        _pdf_reader_cache[key] = reader
+    return reader
+
+
 def get_pdf_page_size(pdf_data, resource_name, page_num):
-    raw_media_file = io.BytesIO(pdf_data)
-    pdf = pypdf.PdfReader(raw_media_file)
+    pdf = get_cached_pdf_reader(pdf_data)
     page = pdf.pages[page_num - 1]
 
     if page.user_unit != 1:
--- a/__init__.py
+++ b/__init__.py
@@ -25,7 +25,7 @@
     name = "KFX Input"
     author = "jhowell"
     file_types = {"azw8", "kfx", "kfx-zip", "kpf"}
-    version = (2, 25, 0)
+    version = (2, 25, 1)    # custom build: configurable PDF page DPI + PdfReader cache
     minimum_calibre_version = (5, 0, 0)     # Python 3.8.5
     supported_platforms = ["windows", "osx", "linux"]
     description = "Convert from Amazon KFX format"
@@ -36,6 +36,11 @@
             help="Allow conversion to proceed even if the KFX book contains unexpected or incorrect data "
             "that may not convert properly. If this option is selected it is recommend that the log of each "
             "conversion be checked for error messages."),
+        OptionRecommendation(
+            name="pdf_page_dpi", recommended_value=300,
+            help="Resolution (DPI) used to render embedded PDF pages of print replica books as images "
+            "during conversion. Higher values produce sharper text at the cost of larger output files. "
+            "Default is 300. (The original plugin used a fixed 150 dpi.)"),
     }
 
     recommendations = EPUBInput.recommendations
@@ -93,13 +98,23 @@
             job_log = set_logger(JobLog(log))
             job_log.info("Converting %s" % name_of_file(stream))
 
+            from calibre_plugins.kfx_input.kfxlib import resources as kfx_resources
+            try:
+                pdf_page_dpi = int(getattr(options, "pdf_page_dpi", 300) or 300)
+            except (TypeError, ValueError):
+                pdf_page_dpi = 300
+            kfx_resources.PDF_PAGE_DPI = max(72, min(600, pdf_page_dpi))
+            job_log.info("PDF page rendering resolution: %d dpi" % kfx_resources.PDF_PAGE_DPI)
+
             book = YJ_Book(stream, symbol_catalog_filename=get_symbol_catalog_filename())
             book.decode_book(retain_yj_locals=True)
 
             if book.has_pdf_resource:
                 job_log.warning(
-                    "This book contains PDF content. It can be extracted using either the From KFX user interface "
-                    "plugin or the KFX Input plugin CLI. See the KFX Input plugin documentation for more information.")
+                    "This book contains PDF content. Its pages will be rendered as %d dpi images for this "
+                    "conversion. To obtain the original PDF without quality loss use either the From KFX user "
+                    "interface plugin or the KFX Input plugin CLI instead. See the KFX Input plugin "
+                    "documentation for more information." % kfx_resources.PDF_PAGE_DPI)
 
             if book.is_fixed_layout or book.is_magazine:
                 job_log.warning(
Old 07-02-2026, 07:06 PM   #1234
JSWolf
Resident Curmudgeon
Posts: 84,011
Karma: 153695583
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
You should be doing your testing with the latest version of calibre.
Old Yesterday, 08:36 AM   #1235
jhowell
Grand Sorcerer
Posts: 7,383
Karma: 95902893
Join Date: Nov 2011
Location: Charlottesville, VA
Device: Kindles
Quote:
Originally Posted by utakata View Post
While converting Japanese print replica books (PDF-backed fixed layout KFX), I ran into two issues and would like to propose fixes for both. A patch against v2.25.0 is included below.
I will take a look at your patches and incorporate something similar in the next release of the plugin. (I am working on other projects, so it may not happen immediately.)


Update:

There have been a lot of changes to the plugin since version 2.25.0, which your patch is based on.

In cases where a PDF page consists of just an image, that image is extracted with no loss of resolution. Otherwise, the DPI for PDF pages that need to be rendered as images has already been changed from 150 to 300. I believe that those existing changes should be sufficient. If you need a higher DPI, you can alter the value of PDF_TO_IMAGE_DPI in resources.py to whatever you want.

I still need to look into the caching issue that you raised.

Last edited by jhowell; Yesterday at 09:52 AM. Reason: Update