MobileRead Forums - View Single Post

feuille · 11-20-2020, 09:11 AM

Same error message here, so I played around with the plugin code.

As I found out, this message is just the tip of the iceberg.

To avoid this error message, the line
log.error('Exception when scanning for ISBN:', e)
in extract_threaded() in jobs.py should be changed to
log.error('Exception when scanning for ISBN: {}: {}'.format(type(e).__name__, e))
or
log.error(traceback.format_exc())
(with import traceback at the top of jobs.py; passing e.__traceback__ directly would only log the repr of the traceback object) or similar.
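A minimal sketch of why the original call fails, assuming the plugin's log object applies %-style formatting to its arguments the way Python's standard logging module does (an assumption, not verified against the plugin's source). The helper names broken_message and fixed_message are hypothetical and only serve the illustration:

```python
def broken_message(e):
    # Mimics what a %-style logger does internally: msg % args.
    # The message has no placeholder, so the extra argument cannot be consumed.
    return 'Exception when scanning for ISBN:' % (e,)


def fixed_message(e):
    # Build the complete string first, so the logger receives a single argument.
    return 'Exception when scanning for ISBN: {}: {}'.format(type(e).__name__, e)


try:
    broken_message(ValueError('bad data'))
except TypeError as err:
    print(err)  # not all arguments converted during string formatting

print(fixed_message(ValueError('bad data')))
# Exception when scanning for ISBN: ValueError: bad data
```

Formatting the message up front sidesteps the problem regardless of how the log object treats extra arguments.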

Next I came across:

Starting job: Extract ISBN for 1 books
===================================================
Path: E:\Libraries\Literature\Theodor Fontane\Unterm Birnbaum (264)\Unterm Birnbaum - Theodor Fontane.pdf
---------------------------------------------------
WorkerError: Traceback (most recent call last):
File "calibre\utils\ipc\simple_worker.py", line 300, in main
File "calibre_plugins.extract_isbn.pdf", line 92, in get_isbn
UnboundLocalError: local variable 'scanner' referenced before assignment

Failed to run pdfinfo/pdftohtml
Error in jobs.py:
Exception when scanning for ISBN:

Failed to extract ISBN
===================================================
Scan complete, with 1 failures

The traceback shows that get_isbn() uses scanner on a path where it was never assigned. So in pdf.py, immediately after def get_isbn(output_dir, pdf_name, log=None):, I added the line
scanner = BookScanner(log)
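The pattern behind this UnboundLocalError can be sketched as follows. This is not the plugin's actual code: StubScanner, its result() method, and the tool_ok flag are hypothetical stand-ins, assuming the original function only created the scanner when the external tools ran successfully.

```python
class StubScanner:
    """Hypothetical stand-in for the plugin's BookScanner."""

    def __init__(self, log=None):
        self.log = log

    def result(self):
        return None  # nothing was scanned in this sketch


def get_isbn_unbound(tool_ok):
    # 'scanner' is assigned only when the external tool succeeded ...
    if tool_ok:
        scanner = StubScanner()
    # ... so on the failure path this line raises UnboundLocalError.
    return scanner.result()


def get_isbn_fixed(tool_ok, log=None):
    scanner = StubScanner(log)  # assigned on every path, as in the fix above
    if tool_ok:
        pass  # the real code would feed the extracted text to the scanner here
    return scanner.result()


try:
    get_isbn_unbound(False)
except UnboundLocalError as err:
    print(type(err).__name__)  # UnboundLocalError

print(get_isbn_fixed(False))  # None
```

Creating the scanner before any branching means a failed pdfinfo/pdftohtml run ends in a normal "no ISBN found" result instead of a crash in the worker.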

The next step was:

Starting job: Extract ISBN for 1 books
===================================================
Title: Unterm Birnbaum.
Format: PDF
Path: E:\Bibliotheken\Literatur\Theodor Fontane\Unterm Birnbaum (264)\Unterm Birnbaum - Theodor Fontane.pdf
---------------------------------------------------
pdfinfo returned no UTF-8 data

Scan time: 0.64 secs
The scan failed to find an isbn in 0.64 secs
Failed to extract ISBN
===================================================
Scan complete, with 1 failures

The error message "pdfinfo returned no UTF-8 data" comes from get_page_count() in pdf.py.

In this method, the line
raw = raw.decode('utf-8')
should be made fault tolerant:
raw = raw.decode('utf-8', errors='replace')
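A small sketch of the difference between strict and fault-tolerant decoding. The keyword argument of bytes.decode is errors (plural). The function page_count_from_pdfinfo and the sample bytes are made up for illustration and are not taken from the plugin; only the "Pages:" line format follows what pdfinfo prints.

```python
import re


def page_count_from_pdfinfo(raw):
    # raw: bytes as captured from pdfinfo's stdout.
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    text = raw.decode('utf-8', errors='replace')
    match = re.search(r'^Pages:\s+(\d+)', text, re.MULTILINE)
    return int(match.group(1)) if match else 0


# Made-up sample: 0xFC is not valid UTF-8 (e.g. a Latin-1 encoded umlaut).
sample = b'Title:          Unterm Birnbaum \xfc\nPages:          120\n'

try:
    sample.decode('utf-8')
except UnicodeDecodeError:
    print('strict decoding fails')

print(page_count_from_pdfinfo(sample))  # 120
```

Since only the ASCII "Pages:" line matters for the page count, replacing a few bad characters elsewhere in the metadata loses nothing that the scan depends on.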

With these changes, ISBN extraction for PDF files is now running smoothly for me!

Post #403 · Subject: PDF and "Exception when scanning for ISBN: not all arguments converted..."