11-20-2020, 09:11 AM   #403
feuille
Connoisseur
PDF and "Exception when scanning for ISBN: not all arguments converted..."

Same error message here, so I played around with the plugin code.

As I found out, this message is just the tip of the iceberg.

To avoid this error message, the line
log.error('Exception when scanning for ISBN:', e)
in extract_threaded() in jobs.py should be changed to
log.error('Exception when scanning for ISBN: {}: {}'.format(type(e).__name__, e))
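or, to log the full traceback (this needs import traceback at the top of jobs.py),
log.error(traceback.format_exc())
or similar. Passing e.__traceback__ directly would only log the repr of the traceback object, not the traceback text.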
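
To show why the original call produces the misleading message, here is a minimal sketch. It assumes the plugin's log object ends up applying %-formatting to its extra arguments; percent_format_like_logger is a hypothetical stand-in of mine, not the plugin's actual logger.

```python
import traceback


def format_scan_error(exc):
    """Return a one-line message naming the exception type and its text."""
    return 'Exception when scanning for ISBN: {}: {}'.format(type(exc).__name__, exc)


def percent_format_like_logger(msg, *args):
    """Hypothetical stand-in for a logger that applies %-formatting to extra args."""
    try:
        return msg % args
    except TypeError as formatting_error:
        # The real problem is replaced by this formatting complaint.
        return str(formatting_error)


try:
    raise ValueError('no text layer in PDF')
except ValueError as exc:
    # Message has no %s placeholder, so the extra argument cannot be consumed.
    masked = percent_format_like_logger('Exception when scanning for ISBN:', exc)
    # Pre-formatted message: nothing is left for the logger to interpolate.
    one_line = format_scan_error(exc)
    # Full traceback text, captured while the exception is being handled.
    full_trace = traceback.format_exc()

print(masked)
print(one_line)
print(full_trace)
```

The point is that the string is fully built before it reaches the logger, so the real exception is what ends up in the job log.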

Next I came across:

Starting job: Extract ISBN for 1 books
===================================================
Path: E:\Libraries\Literature\Theodor Fontane\Unterm Birnbaum (264)\Unterm Birnbaum - Theodor Fontane.pdf
---------------------------------------------------
WorkerError: Traceback (most recent call last):
File "calibre\utils\ipc\simple_worker.py", line 300, in main
File "calibre_plugins.extract_isbn.pdf", line 92, in get_isbn
UnboundLocalError: local variable 'scanner' referenced before assignment

Failed to run pdfinfo/pdftohtml
Error in jobs.py:
Exception when scanning for ISBN:

Failed to extract ISBN
===================================================
Scan complete, with 1 failures


So in pdf.py immediately after def get_isbn(output_dir, pdf_name, log=None): I added the line
scanner = BookScanner(log)
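
For anyone wondering how this kind of UnboundLocalError comes about: a rough sketch of the pattern, not the plugin's real code. BookScanner here is a hypothetical stand-in class, and the pdfinfo_ok flag is my own invention to simulate the failing branch.

```python
class BookScanner:
    """Hypothetical stand-in for the plugin's scanner class."""

    def __init__(self, log=None):
        self.log = log
        self.isbn = None


def get_isbn_broken(pdfinfo_ok, log=None):
    if pdfinfo_ok:
        # scanner is only bound on this branch
        scanner = BookScanner(log)
    # When pdfinfo_ok is False, scanner was never assigned.
    return scanner.isbn


def get_isbn_fixed(pdfinfo_ok, log=None):
    # Bind scanner first, so every later code path can use it.
    scanner = BookScanner(log)
    if pdfinfo_ok:
        pass  # the real scanning work would happen here
    return scanner.isbn


try:
    get_isbn_broken(False)
    broken_raised = False
except UnboundLocalError:
    broken_raised = True

print(broken_raised)
print(get_isbn_fixed(False))
```

Creating the scanner at the top of the function means a failure in an earlier step can no longer hide behind a missing variable.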

The next step was:

Starting job: Extract ISBN for 1 books
===================================================
Title: Unterm Birnbaum.
Format: PDF
Path: E:\Bibliotheken\Literatur\Theodor Fontane\Unterm Birnbaum (264)\Unterm Birnbaum - Theodor Fontane.pdf
---------------------------------------------------
pdfinfo returned no UTF-8 data

Scan time: 0.64 secs
The scan failed to find an isbn in 0.64 secs
Failed to extract ISBN
===================================================
Scan complete, with 1 failures


The error message "pdfinfo returned no UTF-8 data" comes from get_page_count() in pdf.py.

In this method, the line
raw = raw.decode('utf-8')
should be made fault tolerant:
raw = raw.decode('utf-8', errors='replace')
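
A small sketch of what the tolerant decode buys you. The sample bytes and the parse_page_count helper are made up for illustration (I am assuming pdfinfo prints a "Pages:" line and that a non-UTF-8 byte sits in some other field such as the title). Note that the keyword argument of bytes.decode is errors, with an s.

```python
import re


def parse_page_count(raw):
    """Decode pdfinfo-style output tolerantly and pull out the page count."""
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    text = raw.decode('utf-8', errors='replace')
    match = re.search(r'^Pages:\s*(\d+)', text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Made-up sample: \xe9 is a Latin-1 e-acute, which is not valid UTF-8 here.
sample = b'Title:          Caf\xe9 Birnbaum\nPages:          12\n'

try:
    sample.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(strict_failed)
print(parse_page_count(sample))
```

One bad byte in the title no longer prevents reading the page count, which is all this method needs.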

With these changes, ISBN extraction for PDF files is now running smoothly for me!