[GUI Plugin] Extract ISBN - Page 11

capnm · 06-08-2011, 12:45 AM

1)
When I run this on a single epub, and it finds an isbn that matches the existing metadata I get this error message:

Job: "Extract ISBN for 1 books" failed with error:
Traceback (most recent call last):
File "site-packages\calibre\gui2\threaded_jobs.py", line 83, in start_work
File "calibre_plugins.extract_isbn.jobs", line 81, in extract_threaded
AttributeError: 'set' object has no attribute 'append'

Called with args: ([100], <calibre.library.database2.LibraryDatabase2 object at 0x05186170>) {u'notifications': <Queue.Queue instance at 0x11F8FF80>, u'abort': <threading._Event object at 0x0BD3C9D0>, u'log': <calibre.utils.logging.GUILog object at 0x0BD3C0B0>}

Not a big deal since the extract works ok ....
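For context on item 1: a traceback like this usually means code written for a list was handed a set. The plugin's source is not shown in this thread, so the function and variable names below are hypothetical; this is only a minimal sketch of the error class and one way around it.

```python
# Hypothetical reconstruction: the plugin's real code is not shown in the thread.
def record_unchanged(book_ids, book_id):
    """Record a book whose extracted ISBN matched its existing metadata."""
    if isinstance(book_ids, set):
        book_ids.add(book_id)      # sets grow with add(), not append()
    else:
        book_ids.append(book_id)   # lists grow with append()
    return book_ids


same_isbn_ids = set()
try:
    same_isbn_ids.append(100)      # reproduces the kind of error reported above
except AttributeError as err:
    message = str(err)

record_unchanged(same_isbn_ids, 100)
```

Either keeping the container a list throughout or calling `add()` on the set would avoid the error; which one the plugin chose is not stated here.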

2)

It's not finding the ISBN in several epubs.
This doesn't seem to be yet another dash variant; they're using ordinary 2D (hex 0x2D, ASCII hyphen-minus) dashes.
They are in the last couple of lines of the last htm if that might have something to do with it.

Two examples:

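<p class="crt">eISBN: 978-0-375-89036-9</p>
<p class="crt">v3.0</p>
</div>
</body>
</html>

<p class="center"><strong>eISBN: 978-0-307-54803-0</strong></p>
<p class="center"><a class="pubhlink" href="http://www.vintagebooks.com">www.vintagebooks.com</a></p>
<p class="center">v3.0</p>
</div>
</body>
</html>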
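To show that hyphenated eISBNs like these are detectable in principle, here is a minimal sketch of tag stripping, candidate matching, and checksum validation. The regex and all function names are assumptions for illustration, not the plugin's actual code; the checksum rules are the standard ISBN-10 and ISBN-13 ones.

```python
import re

# Hypothetical pattern: 10 to 13 digits (last may be X), optional hyphen or space between them.
CANDIDATE = re.compile(r"(?<!\d)(?:\d[- ]?){9,12}[\dXx](?!\d)")


def strip_tags(html):
    """Crudely replace HTML tags with spaces so only text remains."""
    return re.sub(r"<[^>]+>", " ", html)


def is_valid_isbn13(digits):
    if len(digits) != 13 or not digits.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ... and the total must be a multiple of 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(digits))
    return total % 10 == 0


def is_valid_isbn10(chars):
    if len(chars) != 10 or not chars[:9].isdigit() or chars[9] not in "0123456789X":
        return False
    values = [int(c) for c in chars[:9]] + [10 if chars[9] == "X" else int(chars[9])]
    # Weights run 10 down to 1 and the total must be a multiple of 11.
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0


def find_isbns(html):
    """Return all checksum-valid ISBNs in the text, in document order."""
    found = []
    for match in CANDIDATE.finditer(strip_tags(html)):
        normalized = re.sub(r"[- ]", "", match.group()).upper()
        if is_valid_isbn13(normalized) or is_valid_isbn10(normalized):
            found.append(normalized)
    return found


sample = '<p class="crt">eISBN: 978-0-375-89036-9</p> <p class="crt">v3.0</p>'
print(find_isbns(sample))
```

Both eISBNs quoted above pass the ISBN-13 checksum, so the dash style is not what keeps them from being returned.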

3)
A side question-
Is there an easy way to grab the ISBN (or other fields) from the embedded metadata?
The only things I've come up with are:
Reimport the epub, or
Open in Sigil, edit metadata, cut/paste.
Both are pretty cumbersome.

Thanks!

kiwidude · 06-08-2011, 03:36 AM

@capnm - please pm me a link to your file for #2.

I will look into #1, that looks like a simple bug to fix, strange no one else has noticed it.

Re #3 - the other way without Sigil is to use the ebook viewer which you could similarly copy the data from, not much difference in steps though. I don't know of anything else beyond what you have listed.

Ersatzreifen · 06-08-2011, 07:11 AM

I downloaded this plugin and tried to install it, but got this error:

Code:

calibre, version 0.7.45
ERROR: Unhandled exception: <b>InvalidPlugin</b>:No valid plugin found in /home/russ/Downloads/Extract ISBN.zip
Traceback (most recent call last):
  File "/usr/lib64/calibre/calibre/gui2/preferences/plugins.py", line 280, in add_plugin
    plugin = add_plugin(path)
  File "/usr/lib64/calibre/calibre/customize/ui.py", line 377, in add_plugin
    plugin = load_plugin(path_to_zip_file)
  File "/usr/lib64/calibre/calibre/customize/ui.py", line 93, in load_plugin
    raise InvalidPlugin(_('No valid plugin found in ')+path_to_zip_file)
InvalidPlugin: No valid plugin found in /home/russ/Downloads/Extract ISBN.zip

Ok, how to install?

Ersatzreifen · 06-08-2011, 07:27 AM

Update:

I just tried to install a different plugin, and got the same type of error.
I restarted Calibre and tried again. No joy.

kiwidude · 06-08-2011, 07:35 AM

Look at your calibre version as it is way too old for this plugin. Upgrade calibre then try again.

kiwidude · 06-08-2011, 08:38 AM

@capnm - thx for the epub. There is no issue to do with nearness of the ISBN to the bottom of the page. In fact Extract ISBN does return a 10-digit ISBN when I run it. What it doesn't do is return the ISBN that you want - instead it returns the first one it finds, which is a few pages before that and refers to an audio edition of the book.

Were this a PDF, then it would be picking up the correct ISBN, as it checks the final pages of a book in reverse order.

Maybe it is time I applied similar reverse-scan logic to formats other than PDF, as you are not the first person to comment on it. The problem is that, unlike with a PDF, the way I access the text in an ePub is by iterating through the spine (manifest) of the book, so there is no concept of "pages", only of "files". Depending on how well the book is split, the last few "pages" might be in one file or in multiple files; in fact, the whole book could be in one file. For the same reason I cannot apply the logic of scanning only the first 10 "pages" as I do with PDFs.

So it all gets a bit messy and crude. Maybe I shall make it that I scan the very last page in reverse order first, and then scan the rest of the book in normal order.

I fixed the other bug you reported, btw. As you said, it doesn't really impact the functionality, which is why no one else noticed it, but it's nice to get rid of the error nonetheless.

capnm · 06-08-2011, 08:55 AM

Oh ....
I guess I was confused by the fact the [desired] ISBN didn't show up in the log.
I thought you listed all the potential ISBNs found.

Rather than attempt messy page parsing, how about just preferring ISBN-13s to ISBN-10s when both are found?

(Actually I thought you were already doing that, looking at the logs ...)

kiwidude · 06-08-2011, 09:05 AM

@capnm - it does already do that. But the logic currently is to stop scanning at the page/file in which it finds its first valid ISBN (it parses that whole page before stopping). The assumption is that if a book has both a 10-digit and a 13-digit ISBN, they will both be in the same page/file, and hence the ISBN13 will be selected if it exists there. The other assumption, which holds here, is that if the ISBNs are spread across different pages/files, they refer to different books. It is not uncommon for books to contain ISBNs for other books, just as in this case you have an ISBN for an audio edition.
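The selection rule described in this post can be sketched as follows. This is an illustration under stated assumptions rather than the plugin's code: `pick_isbn` is a hypothetical name, and the input is assumed to be a list of already-validated ISBNs per file, in scan order.

```python
def pick_isbn(isbns_per_file):
    """Stop at the first file containing any valid ISBN; prefer a 13-digit one from that file."""
    for isbns in isbns_per_file:
        if not isbns:
            continue  # nothing in this file, keep scanning
        thirteens = [isbn for isbn in isbns if len(isbn) == 13]
        # The whole file was parsed, so an ISBN-13 anywhere in it wins over an ISBN-10.
        return thirteens[0] if thirteens else isbns[0]
    return None


# Placeholder data: an ISBN-10 in an early file hides the ISBN-13 in the last file.
book = [[], ["0306406152"], ["9780375890369"]]
print(pick_isbn(book))
```

With this rule an ISBN-13 in a later file never gets a chance once an earlier file yields an ISBN-10, which matches the behaviour reported above.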

capnm · 06-08-2011, 10:21 AM

Quote:

Originally Posted by kiwidude

the logic currently is to stop scanning on the page/file that it finds its first valid ISBN (parse that whole page before stopping)

Ahhh ... that's the part I didn't get.

I'm pretty darn happy with the plug-in as is. I'm not sure how much you can start second-guessing the layout.

You'll never avoid the books with "further reading" isbn lists, etc.

Maybe include an option to search only for 13s?

That would also quench some of the 800 number false positives I've noticed, but not worried much about. (They're easy enough to spot).

edit:
Better - config option to only return 10s if no 13 is found.
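The suggested option could look something like the sketch below. It is only an illustration of the proposal: the function name and the `allow_isbn10_fallback` flag are hypothetical, and it assumes every valid ISBN in the book has already been collected in scan order.

```python
def pick_isbn_whole_book(isbns, allow_isbn10_fallback=True):
    """Prefer the first ISBN-13 anywhere in the book; optionally fall back to an ISBN-10."""
    for isbn in isbns:
        if len(isbn) == 13:
            return isbn
    if allow_isbn10_fallback and isbns:
        return isbns[0]  # no ISBN-13 found, so return the first 10-digit candidate
    return None


found = ["0306406152", "9780375890369"]
print(pick_isbn_whole_book(found))
```

Setting `allow_isbn10_fallback=False` corresponds to the earlier "search only for 13s" idea. The trade-off is that the whole book must be scanned before a choice is made.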

kiwidude · 06-08-2011, 11:44 AM

I've made the change to scan the last two files of non-PDF books in reverse (if the book has more than two files). As you say it is a lottery, but I know it is common to put the ISBN at the back of the book, so sometimes you will get lucky.

I'm making some other changes regarding logging which will be dependent on the next Calibre release, so I won't release it until Friday.

kiwidude · 06-12-2011, 02:19 PM

v1.3.6 Released

Changes in this release:

Fix bug occurring when same ISBN extracted for a book
For non-PDF file types, based on the number of files in the book, scan the first x files, then the last y files in reverse, then the rest
When a scan fails, still give the option to view the log rather than the standard error dialog

Note that this requires Calibre 0.8.5 or later.
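The new scan order for non-PDF formats might be sketched like this. The release notes do not give the values of x and y or how they depend on the file count, so `first` and `last` below are hypothetical parameters, as is the fallback to plain order for short books.

```python
def scan_order(n_files, first=2, last=2):
    """Return file indexes: the first `first` files, the last `last` in reverse, then the rest."""
    indexes = list(range(n_files))
    if n_files <= first + last:
        return indexes  # assumed behaviour for short books: plain front-to-back
    head = indexes[:first]
    tail = indexes[n_files - last:][::-1]   # back of the book, scanned last file first
    middle = indexes[first:n_files - last]
    return head + tail + middle


print(scan_order(10))
```

For a ten-file book with the default values this visits files 0 and 1, then 9 and 8, then 2 through 7.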

Philosopher · 06-13-2011, 03:51 AM

Is it possible to integrate this plug-in with the Jobs indicator to monitor its progress and know that it is still working?

kiwidude · 06-13-2011, 05:24 AM

Not any more than it already has, for when you do a batch of books. It cannot report progress within a book.

nynaevelan · 06-24-2011, 02:05 PM

Hi Kiwidude:

Since I FINALLY have my library to where I want it to be and all my fabulous plugins are doing what I want/need them to do, it is now time to start playing with some new plugins.

I was looking at this one and I was thinking this would be a good tool to check to see if I have the correct isbn assigned to the correct book, however this plugin appears to only download into the isbn identifier field. Is there a way to have it download into a custom column or perhaps you are a regex expert and you could help me with a regex that would take my existing isbn and move it to a custom column which I have for the isbn?? Yes I have now become as obsessed with my ebook library as I am with my digital music library.

Nyn

capnm · 06-24-2011, 03:52 PM

Quote:

Originally Posted by nynaevelan

Is there a way to have it download into a custom column or perhaps you are a regex expert and you could help me with a regex that would take my existing isbn and move it to a custom column which I have for the isbn??

Been doing just that ... it's not really even a regex:

On the search & replace tab of bulk editing metadata -
Search Mode = Regular Expression
Search Field = Identifiers
Identifier Type = isbn
Search For + Replace With = leave these blank
Destination Field = your custom column

And to clear the isbn, as above but:
Search for = .
Destination Field = Identifiers
Identifier type = isbn (here it's not a drop down, but it still works)
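Why the two recipes behave as they do can be seen with plain Python regular expressions. This is an analogy only: it assumes calibre's search & replace applies the pattern the way Python's `re` module would, which has not been checked against calibre's source.

```python
import re

isbn = "9780375890369"

# Blank search and blank replace: nothing is substituted, so the value is copied unchanged.
copied = re.sub("", "", isbn)

# "." matches every character; replacing each with nothing leaves an empty value.
cleared = re.sub(".", "", isbn)

print(repr(copied), repr(cleared))
```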
