[GUI Plugin] Extract ISBN - Page 27

Ppaa · 10-01-2020, 03:58 AM

The plugin works with EPUBs and with PDFs converted to EPUB. But I have very few EPUBs, and conversion is not the best option. I have the standard Defender. Disabling it did not affect the error.

This is the first time I have used the plugin and calibre.

Starting job: Extract ISBN for 1 books
===================================================
Title: Hacker and Moore's Essentials of Obstetrics and Gynecology
Format: EPUB
Path: \\192.168.1.3\NASpace\Recipe\Library\Neville F. Hacker\Hacker and Moore's Essentials of Ob (123)\Hacker and Moore's Essentials o - Neville F. Hacker.epub
---------------------------------------------------
Scanning first 10, then last 5, then remaining 68 files
Invalid ISBN match: 19103-2899
Valid ISBN13: 9781416059400
Valid ISBN13: 9780808924166
Invalid ISBN match: 215 239 3804
Valid ISBN10: 1865843830
Invalid ISBN match: 1865 853333
Valid ISBN13: 9781416059400
Invalid ISBN match: 22 2008025860
Invalid ISBN match: 9 8 7 6 5 4 3 2
Scan time: 3.90 secs
The isbn was found in 3.90 secs
Identical ISBN extracted of: 9781416059400
===================================================
Scan complete, with 0 failures

davidfor · 10-01-2020, 04:27 AM

Quote:

Originally Posted by Ppaa

The plugin works with EPUBs and with PDFs converted to EPUB. But I have very few EPUBs, and conversion is not the best option. I have the standard Defender. Disabling it did not affect the error.

This is the first time I have used the plugin and calibre.

OK, that confirms that the plugin works and that it isn't a more general problem. The code that extracts the text from the PDF is fairly old, and there may be a better way to do it now. Though I cannot see why it works for me but not for you.

Ppaa · 10-01-2020, 04:43 AM

How else can you extract isbn from pdf?

davidfor · 10-01-2020, 07:58 AM

Quote:

Originally Posted by Ppaa

How else can you extract isbn from pdf?

Not that I know of. I'm not sure how many people actually use this plugin. Personally, I download metadata for my books, and that can get the ISBN if one is available. Otherwise, I don't worry about it.

If the metadata download doesn't work, the brute force method would be to convert it to epub, run the plugin and then delete the epubs. That should work, but will take time.

I do plan to have a look at it, but I don't know when. As it is a plugin I don't use, it isn't a high priority for me. If there is someone else who wants to look...

Ppaa · 10-01-2020, 02:24 PM

Perhaps this will help.

Title: Management Of High-Risk Pregnancy An Evidence-Based Approach Queenan
Format: PDF
Path: \\192.168.1.3\NASpace\Recipe\Library\2007\Management Of High-Risk Pregnancy A (544)\Management Of High-Risk Pregnan - 2007.pdf
---------------------------------------------------
Traceback (most recent call last):
File "site-packages\calibre\utils\ipc\simple_worker.py", line 308, in main
File "calibre_plugins.extract_isbn.pdf", line 90, in get_isbn
UnboundLocalError: local variable 'scanner' referenced before assignment

Failed to extract ISBN

JSWolf · 10-02-2020, 11:15 AM

Quote:

Originally Posted by Ppaa

Perhaps this will help.

Title: Management Of High-Risk Pregnancy An Evidence-Based Approach Queenan
Format: PDF
Path: \\192.168.1.3\NASpace\Recipe\Library\2007\Management Of High-Risk Pregnancy A (544)\Management Of High-Risk Pregnan - 2007.pdf
---------------------------------------------------
Traceback (most recent call last):
File "site-packages\calibre\utils\ipc\simple_worker.py", line 308, in main
File "calibre_plugins.extract_isbn.pdf", line 90, in get_isbn
UnboundLocalError: local variable 'scanner' referenced before assignment

Failed to extract ISBN

You should stop using an IP address and use a local library. Maybe the IP address is causing your problems.

davidfor · 10-02-2020, 09:08 PM

Quote:

Originally Posted by JSWolf

You should stop using an IP address and use a local library. Maybe the IP address is causing your problems.

I'd love to know why you think that. Especially as @Ppaa has demonstrated the problem with a book in a library on the local disk.

Ppaa · 10-03-2020, 05:04 AM

Title: Epilepsy and Pregnancy - What Every Woman with Epilepsy Should Know Chillemi
Format: PDF
Path: C:\Users\magio\Documents\Library\2006\Epilepsy and Pregnancy - What Every (1)\Epilepsy and Pregnancy - What E - 2006.pdf
---------------------------------------------------
Exception when scanning for ISBN: not all arguments converted during string formatting
Failed to extract ISBN
===================================================
Scan complete, with 1 failures

davidfor · 10-03-2020, 05:16 AM

Quote:

Originally Posted by Ppaa

Title: Epilepsy and Pregnancy - What Every Woman with Epilepsy Should Know Chillemi
Format: PDF
Path: C:\Users\magio\Documents\Library\2006\Epilepsy and Pregnancy - What Every (1)\Epilepsy and Pregnancy - What E - 2006.pdf
---------------------------------------------------
Exception when scanning for ISBN: not all arguments converted during string formatting
Failed to extract ISBN
===================================================
Scan complete, with 1 failures

I think it is going to fail for all of your PDFs. Unless the 5.1.1 beta solves it, it is going to have to wait until I have some time to look at it. Or until someone else interested in it steps in.

chaley · 10-03-2020, 05:46 AM

Quote:

Originally Posted by davidfor

I think it is going to fail for all of your PDFs. Unless the 5.1.1 beta solves it, it is going to have to wait until I have some time to look at it. Or until someone else interested in it steps in.

I tried it with the idea of "taking a look at it". It doesn't fail on V5.1 (portable) or V5.1.1 (64 bit). It successfully found the ISBN in the PDF.

davidfor · 10-03-2020, 06:31 AM

Quote:

Originally Posted by chaley

I tried it with the idea of "taking a look at it". It doesn't fail on V5.1 (portable) or V5.1.1 (64 bit). It successfully found the ISBN in the PDF.

I haven't been able to get it to fail on any version. That makes it hard to debug. I did follow the code down and I can see there is a difference in how pdftohtml is used compared to how it is used when doing a conversion. But, that might just be coding style or because the desired results are different. I haven't had a chance to study it. Or a lot of desire to.

erymaxuk · 10-09-2020, 10:44 PM

Hi, I would be grateful for some advice regarding Extract ISBN. The problem: the plugin extracts the ISBN from the "ISBN xxxxxxx" entry in the book. Using this ISBN, the metadata downloader cannot find the book. If I manually insert the "ISBN xxxxxxxxx (ebook)" or the eISBN from the book, the metadata download finds it with no problem. I have many books that contain two ISBNs: ISBN and ISBN (ebook). Is there any way to modify the plugin to read the ISBN (ebook) number instead of just the ISBN number? Thanks in advance.

feuille · 11-20-2020, 09:11 AM

Same error message here, so I played around with the plugin code.

As I found out, this message is just the tip of the iceberg.

To avoid this error message, the line
log.error('Exception when scanning for ISBN:', e)
in extract_threaded() in jobs.py should be changed to
log.error('Exception when scanning for ISBN: {}: {}'.format(type(e).__name__, e))
or
log.error(e.__traceback__)
or similar to avoid the formatting error message.
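A minimal sketch of why the original call can fail, assuming the log object applies %-style formatting to its arguments (the error text suggests this, but the plugin's logger is not shown in this thread). `format_percent_style` and `describe` are hypothetical stand-ins, not calibre's API.

```python
def format_percent_style(msg, *args):
    """Hypothetical stand-in for a logger that computes msg % args."""
    return msg % args if args else msg


def describe(exc):
    """Build the whole message first, so nothing is left for the logger to format."""
    return 'Exception when scanning for ISBN: {}: {}'.format(type(exc).__name__, exc)


err = ValueError('bad pdf')

try:
    # The message has no %s placeholder, yet one extra argument is supplied.
    format_percent_style('Exception when scanning for ISBN:', err)
    failure = None
except TypeError as formatting_error:
    failure = str(formatting_error)

print(failure)
print(format_percent_style(describe(err)))
```

Formatting the message before handing it to the logger sidesteps the problem regardless of how the logger treats extra arguments.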

Next I came across:

Starting job: Extract ISBN for 1 books
===================================================
Path: E:\Libraries\Literature\Theodor Fontane\Unterm Birnbaum (264)\ Unterm Birnbaum - Theodor Fontane.pdf
---------------------------------------------------
WorkerError: Traceback (most recent call last):
File "calibre\utils\ipc\simple_worker.py", line 300, in main
File "calibre_plugins.extract_isbn.pdf", line 92, in get_isbn
UnboundLocalError: local variable 'scanner' referenced before assignment

Failed to run pdfinfo/pdftohtml
Error in jobs.py:
Exception when scanning for ISBN:

Failed to extract ISBN
===================================================
Scan complete, with 1 failures

So in pdf.py immediately after def get_isbn(output_dir, pdf_name, log=None): I added the line
scanner = BookScanner(log)
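The pattern behind this UnboundLocalError can be sketched as follows. This is not the plugin's actual code: `BookScanner` here is a simplified stand-in, and `run_pdf_tools` is a hypothetical helper that always fails, to mimic "Failed to run pdfinfo/pdftohtml". The assumption is that `scanner` was only assigned on the success path.

```python
class BookScanner:
    """Simplified stand-in for the plugin's scanner class (assumption)."""

    def __init__(self, log=None):
        self.log = log
        self.isbn = None


def run_pdf_tools():
    """Hypothetical helper that simulates the external tools failing."""
    raise OSError('Failed to run pdfinfo/pdftohtml')


def get_isbn_buggy(log=None):
    try:
        run_pdf_tools()
        scanner = BookScanner(log)  # never reached when the tools fail
    except OSError:
        pass
    return scanner.isbn  # 'scanner' was never bound on the failure path


def get_isbn_fixed(log=None):
    scanner = BookScanner(log)  # bound before anything can fail
    try:
        run_pdf_tools()
    except OSError:
        pass
    return scanner.isbn


try:
    get_isbn_buggy()
except UnboundLocalError as exc:
    print('buggy version:', type(exc).__name__)

print('fixed version:', get_isbn_fixed())
```

Creating the scanner first means a failure in the external tools results in "no ISBN found" instead of a crash in the worker.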

The next step was:

Starting job: Extract ISBN for 1 books
===================================================
Title: Unterm Birnbaum.
Format: PDF
Path: E:\Bibliotheken\Literatur\Theodor Fontane\Unterm Birnbaum (264)\Unterm Birnbaum - Theodor Fontane.pdf
---------------------------------------------------
pdfinfo returned no UTF-8 data

Scan time: 0.64 secs
The scan failed to find an isbn in 0.64 secs
Failed to extract ISBN
===================================================
Scan complete, with 1 failures

The error message pdfinfo returned no UTF-8 data comes from get_page_count() in pdf.py

In this method, the line
raw = raw.decode('utf-8')
should be made fault tolerant:
raw = raw.decode('utf-8', errors='replace')
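A small sketch of the fault-tolerant decode, assuming the page count is read from a "Pages:" line in pdfinfo-style output. Note that the keyword argument of `bytes.decode` is `errors` (plural). The sample bytes and `parse_page_count` are invented for illustration and are not the plugin's code.

```python
import re


def parse_page_count(raw):
    """Return the page count from pdfinfo-style output, or None if absent."""
    # Undecodable bytes become U+FFFD instead of raising UnicodeDecodeError.
    text = raw.decode('utf-8', errors='replace')
    match = re.search(r'^Pages:\s*(\d+)', text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


# Invented sample: the title holds a Latin-1 'e acute' byte (0xE9), which is not valid UTF-8.
sample = b'Title:          Caf\xe9\nPages:          12\n'

try:
    sample.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print('strict decode succeeded:', strict_ok)
print('page count:', parse_page_count(sample))
```

Only the metadata text is affected by the replacement character; the digits on the "Pages:" line are plain ASCII and survive intact.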

With these changes, ISBN extraction for PDF files is now running smoothly for me!

davidfor · 11-21-2020, 10:50 PM

Quote:

Originally Posted by erymaxuk

Hi, I would be grateful for some advice regarding Extract ISBN. The problem: the plugin extracts the ISBN from the "ISBN xxxxxxx" entry in the book. Using this ISBN, the metadata downloader cannot find the book. If I manually insert the "ISBN xxxxxxxxx (ebook)" or the eISBN from the book, the metadata download finds it with no problem. I have many books that contain two ISBNs: ISBN and ISBN (ebook). Is there any way to modify the plugin to read the ISBN (ebook) number instead of just the ISBN number? Thanks in advance.

That isn't strictly a problem of this plugin. The plugin searches for an ISBN, if it finds one, it uses it. If that ISBN does not exist in the metadata source you are using, that is more a problem of the metadata source. Or that the ISBN found was (for want of a better term) a false positive.

Of course, it could be done. But, the plugin just uses regex to find the possibilities and then validates what was found. Adding more checks would complicate that search. But, it isn't something I am interested in doing. I don't use the plugin, and only made changes to fix it for calibre 5. If anyone wants to add this, please do so.
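To illustrate the "regex, then validate" approach in general terms, here is a rough sketch using the standard ISBN-10 and ISBN-13 check-digit rules. The pattern and function names are assumptions made for this example; the plugin's real regex and validation code are not shown in this thread and may differ.

```python
import re

# Loose candidate pattern (assumption): a digit, 8-15 digits/spaces/hyphens, then a digit or X.
CANDIDATE = re.compile(r'\d[\d -]{8,15}[\dXx]')


def is_valid_isbn10(s):
    """ISBN-10: weights 10..1, total divisible by 11, 'X' as last char means 10."""
    if len(s) != 10 or not s[:9].isdigit() or s[9] not in '0123456789Xx':
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] in 'Xx' else int(s[9])]
    return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0


def is_valid_isbn13(s):
    """ISBN-13: alternating weights 1 and 3, total divisible by 10."""
    if len(s) != 13 or not s.isdigit():
        return False
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0


def find_valid_isbns(text):
    """Return normalized candidates that pass a checksum, in the order found."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r'[ -]', '', match.group(0))
        if is_valid_isbn10(digits) or is_valid_isbn13(digits):
            found.append(digits.upper())
    return found


text = 'Tel 215 239 3804. ISBN 978-1-4160-5940-0. ISBN-10: 1865843830'
print(find_valid_isbns(text))
```

The sample numbers are taken from the scan log earlier in the thread: the phone-like match fails the checksum, while the other two pass. A checksum cannot tell a print ISBN from an ebook ISBN, which is why preferring one over the other would need extra checks on the surrounding text.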

Freakeao · 12-26-2020, 01:34 PM

How about an option to use the last ISBN found instead of the first? ebook ISBNs seem to be listed last pretty frequently and those are the ones I prefer.

Thanks.
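A sketch of what such an option might look like, assuming the scan already yields the valid ISBNs in the order they were found. `choose_isbn` and `prefer_last` are hypothetical names and are not part of the plugin.

```python
def choose_isbn(valid_isbns, prefer_last=False):
    """Pick one ISBN from the valid ones, in the order they were found."""
    if not valid_isbns:
        return None
    return valid_isbns[-1] if prefer_last else valid_isbns[0]


# Distinct valid ISBNs in the order they appear in the scan log earlier in the thread.
found = ['9781416059400', '9780808924166', '1865843830']

print(choose_isbn(found))
print(choose_isbn(found, prefer_last=True))
```

Whether the last ISBN is really the ebook one depends on the book, so this would only be a heuristic.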
