[GUI Plugin] Extract ISBN - Page 23

theducks · 03-16-2018, 04:16 PM

Quote:

Originally Posted by aquiaolado

Hi.
Can someone explain to me why the plugin does not retrieve any ISBN?
Thanks.

This is the result

Starting job: Extract ISBN for 1 books
===================================================
Title: (Lecture Notes in Social Networks) James A. Dator, John A. Sweeney, Aubrey M. Yee (auth.)-Mutative Media Communication Technologies and Power Relations in the Past, Present, and Futures-Springer Inte
Format: PDF
Path: C:\Users\Paulo Martins\Documents\Biblioteca do Calibre\Desconhecido\(Lecture Notes in Social Networks) (2354)\(Lecture Notes in Social Networ - Desconhecido.pdf
---------------------------------------------------
Failed to extract ISBN
===================================================
Scan complete, with 1 failures

Can you SEE an ISBN in the first few pages? This plugin (PI) looks for commonly found ISBN-like patterns.
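For reference, here is a minimal Python sketch of what "looking for ISBN-like patterns" can mean. It is not the plugin's actual code: `ISBN_LIKE` and `find_isbn_candidates` are hypothetical names, the pattern is an assumption, and a real scanner would also verify the check digit before accepting a candidate.

```python
import re

# Hypothetical pattern: a digit, then a run of digits and common separators
# (hyphen, dot, ordinary space), ending in a digit or X, roughly the shape of a printed ISBN.
ISBN_LIKE = re.compile(r"[0-9][0-9\-. ]{8,16}[0-9Xx]")

def find_isbn_candidates(text):
    """Return separator-free strings that have the length of an ISBN-10 or ISBN-13."""
    candidates = []
    for match in ISBN_LIKE.finditer(text):
        # Drop separators so "978-1-84520-132-6" becomes "9781845201326".
        compact = re.sub(r"[^0-9Xx]", "", match.group(0)).upper()
        if len(compact) in (10, 13):
            candidates.append(compact)
    return candidates

print(find_isbn_candidates("Printed in Germany. ISBN-13: 978-1-84520-132-6"))
# → ['9781845201326']
```

Because the class only allows hyphens, dots and an ordinary space between digits, a number whose digits are separated by some other character would be split into short runs and missed.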

BetterRed · 03-16-2018, 04:55 PM

From the first post in this thread:

Quote:

This plugin can be used to try to find the ISBN for a book using the text within a book format [file].

That is, from within the PDF, EPUB etc.

BR

aquiaolado · 03-16-2018, 05:20 PM

Hi again.
Thanks for your answers.
It's not just one PDF file; it happens with about 1000 of my files. I can see the ISBN in the first 10 pages.

aquiaolado · 03-16-2018, 05:24 PM

The format is, as an example: ISBN-13: 978-1-84520-132-6
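That example is a well-formed ISBN-13. Under the standard ISBN-13 rule, the digits are weighted alternately 1 and 3 and the weighted sum must be divisible by 10; for 978-1-84520-132-6 the sum is 90. A minimal Python check (the function name is hypothetical):

```python
def is_valid_isbn13(isbn):
    """Check an ISBN-13 using the standard alternating 1/3 weighting."""
    digits = [c for c in isbn if c in "0123456789"]  # ignore hyphens, spaces, "ISBN-13:" etc.
    if len(digits) != 13:
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0

print(is_valid_isbn13("978-1-84520-132-6"))  # → True
```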

BetterRed · 03-16-2018, 06:38 PM

Quote:

Originally Posted by aquiaolado

Hi again.
Thanks for your answers.
It's not just one PDF file; it happens with about 1000 of my files. I can see the ISBN in the first 10 pages.

Can you see the ISBN as a string of copyable characters, or as characters within an image? AFAIK the plugin doesn't do OCR on PDFs created from images.

BR

aquiaolado · 03-16-2018, 06:41 PM

Yes. It is a normal, copyable PDF.

BetterRed · 03-16-2018, 07:06 PM

Try converting one of the PDFs to TXT and running the plugin against the TXT version; it's probably best to isolate the TXT format into a separate book.

If the PI can find the ISBN in the TXT version, then something in the PDF must be effectively hiding it. Are the PDFs 'protected' in any way?

BR
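With around 1000 affected files, the conversion can also be scripted outside the GUI with calibre's `ebook-convert` command-line tool, which chooses the output format from the output file's extension. A rough sketch, assuming `ebook-convert` is on your PATH; the helper names are hypothetical, and the TXT is written next to the PDF rather than into the library, so you would still need to add it to a book to run the plugin on it.

```python
import subprocess
from pathlib import Path

def txt_path_for(pdf_path):
    """Place the text output next to the PDF, e.g. book.pdf -> book.txt."""
    return Path(pdf_path).with_suffix(".txt")

def convert_command(pdf_path):
    """Build the ebook-convert call; the .txt extension selects TXT output."""
    return ["ebook-convert", str(pdf_path), str(txt_path_for(pdf_path))]

def convert_pdf_to_txt(pdf_path):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(convert_command(pdf_path), check=True)
    return txt_path_for(pdf_path)
```

For a batch, something like `for pdf in Path("pdfs").glob("*.pdf"): convert_pdf_to_txt(pdf)` would work, where `pdfs` is a placeholder folder name.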

theducks · 03-16-2018, 07:59 PM

I've never seen an ISBN-13 written with dashes quite like that.
It mixes the old Language-Publisher-Book number-check digit representation from the print-only days with the 978 barcode series of ISBNs.
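For what it's worth, the hyphenated ISBN-13 does keep the older ISBN-10 segments: dropping the `978` prefix leaves the group, publisher and title digits, and only the check digit differs, because ISBN-10 uses a mod-11 check (weights 10 down to 2 on the first nine digits, `X` for 10) while ISBN-13 uses mod 10. A small sketch of the conversion (the function name is hypothetical; ISBNs with the `979` prefix have no ISBN-10 equivalent):

```python
def isbn13_to_isbn10(isbn13):
    """Convert a 978-prefixed ISBN-13 to ISBN-10 by dropping the prefix
    and recomputing the mod-11 check digit."""
    digits = "".join(c for c in isbn13 if c in "0123456789")
    if len(digits) != 13 or not digits.startswith("978"):
        raise ValueError("not a 978-prefixed ISBN-13: %r" % isbn13)
    body = digits[3:12]  # group, publisher and title digits; the ISBN-13 check digit is dropped
    # ISBN-10 weights run 10, 9, ..., 2 over the first nine digits.
    total = sum(int(d) * w for d, w in zip(body, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return body + ("X" if check == 10 else str(check))

print(isbn13_to_isbn10("978-1-84520-132-6"))  # → 1845201329
```

So 978-1-84520-132-6 corresponds to the ISBN-10 1-84520-132-9.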

Alvgon · 03-18-2018, 12:25 PM

I currently have a PDF with this "copyable" text:

ISBN 0 7506 4790 6

It is not detected as an ISBN, in either 10- or 13-digit format. I suppose it is because the text string does not have the right length for the scan.py/_evaluate_isbn_match function to detect it.
I'm not a Python programmer, though.
Could any knowledgeable fellow comment on possible solutions?

Divingduck · 03-20-2018, 04:10 AM

Because it isn't a valid ISBN-10 declaration.

Valid is ISBN:0750647906 and ISBN:0-7506-4790-6 not ISBN 0 7506 4790 6

https://en.wikipedia.org/wiki/Intern...rd_Book_Number
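Separators aside, the digits in that example do form a valid ISBN-10 under the standard mod-11 rule (weights 10 down to 1, `X` allowed only as the final character, standing for 10): the weighted sum is 220, which is divisible by 11. A minimal check that ignores spaces and hyphens (the function name is hypothetical):

```python
def is_valid_isbn10(isbn):
    """Check an ISBN-10 with the mod-11 rule, ignoring spaces, hyphens and the 'ISBN' label."""
    chars = [c for c in isbn.upper() if c in "0123456789X"]
    if len(chars) != 10 or "X" in chars[:9]:
        return False  # wrong length, or X somewhere other than the check position
    values = [10 if c == "X" else int(c) for c in chars]
    total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
    return total % 11 == 0

print(is_valid_isbn10("ISBN 0 7506 4790 6"))  # → True
```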

Nicolas F · 03-20-2018, 07:59 AM

Quote:

Originally Posted by Alvgon

I currently have a PDF with this "copyable" text:

ISBN 0 7506 4790 6

It is not detected as an ISBN, in either 10- or 13-digit format. I suppose it is because the text string does not have the right length for the scan.py/_evaluate_isbn_match function to detect it.
I'm not a Python programmer, though.
Could any knowledgeable fellow comment on possible solutions?

Quote:

Originally Posted by Divingduck

Because it isn't a valid ISBN-10 declaration.

Valid is ISBN:0750647906 and ISBN:0-7506-4790-6 not ISBN 0 7506 4790 6

https://en.wikipedia.org/wiki/Intern...rd_Book_Number

It may be an invalid way to write it down, but that's not really the problem here. The regex used by the plugin will recognize it, so the problem is elsewhere.
(Just try adding "ISBN 0 7506 4790 6" anywhere in an EPUB; the plugin has no problem detecting it.)

The plugin probably has difficulty accessing the text of the PDF.

theducks · 03-20-2018, 10:31 AM

Quote:

Originally Posted by Nicolas F

It may be an invalid way to write it down, but that's not really the problem here. The regex used by the plugin will recognize it, so the problem is elsewhere.
(Just try adding "ISBN 0 7506 4790 6" anywhere in an EPUB; the plugin has no problem detecting it.)

The plugin probably has difficulty accessing the text of the PDF.

Those 'spaces' may be some other character that the regex \s does not recognize.

The problem is that the publisher did something unusual.
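One way to find out what those 'spaces' really are is to copy the text out of the PDF and print the Unicode name of every non-digit character. A minimal sketch using Python's standard `unicodedata` module; the function name is hypothetical, and the sample string assumes the separators are U+2009 THIN SPACE, which is only a guess at what this publisher used.

```python
import unicodedata

def describe_separators(text):
    """List (code point, Unicode name) for each character that is not an ASCII digit."""
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ch not in "0123456789"
    ]

# Assumed sample: the digits are separated by THIN SPACE rather than an ordinary space.
sample = "0\u20097506\u20094790\u20096"
for code, name in describe_separators(sample):
    print(code, name)  # prints "U+2009 THIN SPACE" three times
```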

BeckyEbook · 04-02-2018, 03:17 PM

Quote:

Originally Posted by Alvgon

I currently have a PDF with this "copyable" text:

ISBN 0 7506 4790 6

As mentioned by @theducks, there is no ordinary "space" between the numbers.

Change line 15 in the file scan.py to:

Code:

RE_ISBN = re.compile(u'\s*([0-9\-\.–*―—\^ \u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A]{9,18}[0-9xX])', re.UNICODE)

And try again.
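A small Python 3 test of why widening the character class helps. `NARROW` is a hypothetical pattern whose separators are limited to hyphen, dot and an ordinary space; it is not the plugin's original line 15. `WIDE` uses the separator set from the suggested fix, with U+2000 to U+200A written as a range and the dash characters written as escapes. The thin-space sample is an assumption about what the PDF contains.

```python
import re

# Hypothetical "narrow" pattern: separators limited to hyphen, dot and an ordinary space.
NARROW = re.compile(r"\s*([0-9\-. ]{9,18}[0-9xX])")
# Separator set from the suggested fix; \u2013, \u2015 and \u2014 are the dash characters,
# and \u2000-\u200A covers the range of typographic spaces (including THIN SPACE, U+2009).
WIDE = re.compile(r"\s*([0-9\-.\u2013*\u2015\u2014\^ \u2000-\u200A]{9,18}[0-9xX])")

def extract(pattern, text):
    """Return the digits/X of the first ISBN-like match, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    return re.sub(r"[^0-9xX]", "", match.group(1))

plain = "ISBN 0 7506 4790 6"
thin = "ISBN 0\u20097506\u20094790\u20096"  # assumed: THIN SPACE separators

print(extract(NARROW, plain))  # → 0750647906
print(extract(NARROW, thin))   # → None
print(extract(WIDE, thin))     # → 0750647906
```

Under Python 3, `\s` itself already matches Unicode spaces such as U+2009; what decides the result here is the character class, which has to cover every character of the number.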

excaliber · 03-04-2019, 07:18 AM

@kiwidude: Thanks for the plugin!
I have one issue with it. Every time a job finishes, a dialog box like this appears:
Scan complete
Extract ISBN found x new isbn(s). Proceed with updating your library?

If there are only a few jobs, it's not a problem: I click Yes and all is OK. The problem arises when there are a few hundred jobs and I have to click the Yes button every time, which becomes annoying.
Would it be possible to implement a "Yes to all" and maybe a "No to all" or "Cancel" option?

JVarga · 10-12-2019, 11:19 AM

This is a rather old thread, but I hope somebody is still reading it...
I keep failing to extract the ISBN from PDF files (all PDFs) with this error message:

Traceback (most recent call last):
File "site-packages\calibre\utils\ipc\simple_worker.py", line 290, in main
File "calibre_plugins.extract_isbn.pdf", line 86, in get_isbn
UnboundLocalError: local variable 'scanner' referenced before assignment

This error seems to have been around for quite a while; I'm currently using Calibre 4.1.0 under Windows 10.

Can anybody help me solve or work around the problem?
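The traceback points at `get_isbn` in the plugin's `pdf` module. Without that source this is only a guess, but the error message itself has a well-known cause: a local variable is assigned on only one code path (for example inside a `try` block whose exception is swallowed) and then read unconditionally. A generic reproduction with hypothetical names, not the plugin's code:

```python
def open_scanner(path):
    """Hypothetical stand-in for whatever step creates the scanner; here it always fails."""
    raise OSError("could not read %s" % path)

def get_isbn_buggy(path):
    try:
        scanner = open_scanner(path)
    except OSError:
        pass  # the error is swallowed, so 'scanner' is never bound
    return scanner.found  # raises UnboundLocalError

def get_isbn_fixed(path):
    scanner = None  # bind the name on every code path
    try:
        scanner = open_scanner(path)
    except OSError as err:
        print("scan failed:", err)
    return scanner.found if scanner is not None else None
```

If the plugin follows this pattern, the real problem would be whatever earlier step failed to create `scanner`, not the line the traceback reports.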

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
Extract ISBN from PDF?	mdroberts	Calibre	14	12-16-2016 07:32 AM
[Old Thread] Extract ISBN from file name	ChristianQ	Calibre	59	12-09-2015 05:08 AM
[GUI Plugin] Plugin Updater Deprecated	kiwidude	Plugins	159	06-19-2011 12:27 PM
[Old Thread] Auto Extract ISBN-Feature request	UnraisedArc	Calibre	60	03-23-2011 09:31 AM
Displaying ISBN column in the main GUI	tilleydog	Library Management	26	02-25-2011 04:08 AM
