[GUI Plugin] Extract ISBN - Page 9

wolfelric · 05-08-2011, 11:09 AM

Awesome plugin, been hoping for this.

crivicris · 05-13-2011, 03:39 AM

I have just discovered plugins for my calibre, and this one is a must for me. Thanks

kiwidude · 05-13-2011, 12:48 PM

Thanks crivicris and welcome to MobileRead. There will be a new version of this plugin at some point, so if you haven't already, I suggest installing the Plugin Updater plugin to make it easier to keep up to date and install other plugins that take your fancy...

xXTGMKXx · 05-14-2011, 05:28 AM

Hi, I am new to this forum. I searched far and wide for a better program than Calibre and found none. Despite issues I found troublesome, the steady stream of updates solved them in a timely manner. Kudos to the development team. Now that I have discovered plugins, I thought I would contribute my observations to the development of this key aspect of the program.

First of all thank you to the author of ExtractISBN. I am in full agreement with an earlier poster that the edit metadata window should have a button utilizing this plugin on a piecemeal basis. The plugin performs exceptionally well on a single file and this is how I prefer to update my catalog... since choice of covers [and the alteration of some metadata] is so subjective.

I download collections of books; it is a compulsion. I am branching out as quickly as I can organize them with Calibre. Your plugin has been instrumental in this regard. I would like to share my experience with it (ExtractISBN 1.3.1 - Windows Vista [I know] - 1 GB RAM - Calibre 0.8.1).

With the plugin set to run on a small collection of 6700, the progress seems to slow to a crawl on a linear curve until the UI hangs. For example, extracting 1 book is instantaneous, 100 takes 8 minutes, 250 takes 30 minutes, 500 takes 1 hour 15 minutes, and 1000 takes 3 hours; beyond that, honestly, I haven't had the patience... The UI is unresponsive for increasingly long periods of time. If you could fix this issue I would be ever so thankful. I'm not a good programmer by any means... but I have an idea... Is it possible (I could be wrong by a wide margin here, so be kind) that you save the results of the search to memory, and that instead a file on disk could be updated after each successful hit, with that file then referenced at the end of the job to apply the changes? With that, I don't see how there would be any discrepancy between the extremely short runtime of one file and the runtime when deep into a collection. Like I said... I suck at coding... if it doesn't work... at least I've raised the issue.

Keep up the good work, bibliophiles and digital hoarders everywhere are in your debt!!
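To put the reported timings on one scale, they can be converted into seconds per book. This is only a quick arithmetic sketch in Python using the four batch sizes quoted above; the function name is made up for illustration and nothing here comes from the plugin itself.

```python
# Batch size -> total minutes, as reported in the post above.
reported_minutes = {100: 8, 250: 30, 500: 75, 1000: 180}


def seconds_per_book(books, minutes):
    """Average time spent on each book in a batch, in seconds."""
    return minutes * 60 / books


rates = {n: round(seconds_per_book(n, m), 1) for n, m in reported_minutes.items()}

for books, rate in rates.items():
    print(f"{books:>5} books: {rate:.1f} s per book")
```

The average cost per book rises from 4.8 s at 100 books to 10.8 s at 1000, so the total time grows faster than the batch size does.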

kiwidude · 05-14-2011, 07:30 AM

Welcome to MR and thanks for your post...

This issue has been discussed recently in this thread. The problem is caused by some nasty memory leaks inside the calibre conversion code that this plugin calls to get a standard format that it can scan for the ISBN.

The simplest solution is to follow the same approach that conversions use: run the conversion and scan in an external worker process, so that after each conversion the memory is completely released. Currently my approach is to run as a separate thread inside the calibre exe, like metadata downloads do; however, this means that leaked memory cannot be reclaimed without restarting calibre.

However, I cannot make this change without changes to the calibre API. It is not currently possible for a plugin to create jobs that run in an external process, because the list of known "things to do" that the worker executable understands is hard-coded. It needs some extra code so that it can be passed information about the plugin code to call.

I have asked Kovid to make this change, as there are likely other code changes that could be made to give me more reusable code that I could use too. He has only just returned from holiday, so hopefully it might get done this week and then I can start rewriting this plugin.

The only other option is to fix the memory leaks. However having helped Kovid track down some memory leak issues in the metadata download over a 5 hour period one Sunday night I know just how painful and difficult this is. Plus it could well be that the issues lie in some library calibre calls or whatever. And multiply that out over the dozens of format converters and you can see why the simplest solution is to use the code the same way calibre does.

Glad you are finding the plugin useful, but in the meantime keep your batches small and use Ctrl+R to restart calibre periodically when you see the impact.
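The difference between the two approaches can be sketched in plain Python. This is not calibre's job API (which, as the post says, did not yet let plugins do this); it is a generic illustration using the standard `subprocess` module. The child script, the `scan_in_child` name and the 10 MB "leak" are all hypothetical stand-ins.

```python
import json
import os
import subprocess
import sys

# Code run in a short-lived child interpreter. The bytearray stands in for
# memory that a leaky converter never frees (an assumption for illustration).
CHILD_CODE = r"""
import json, os, sys
path = sys.argv[1]
leaked = bytearray(10_000_000)
print(json.dumps({"path": path, "pid": os.getpid()}))
"""


def scan_in_child(path):
    """Run one (pretend) conversion and scan in a separate process.

    Whatever the child leaks is returned to the operating system when the
    child exits, so the parent's memory use does not grow with batch size.
    """
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    for book in ["first.epub", "second.lit"]:
        result = scan_in_child(book)
        print(result["path"], "scanned in process", result["pid"])
```

A thread inside the main program shares that program's memory, so the same leak would accumulate there until the program is restarted.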

xXTGMKXx · 05-14-2011, 08:11 AM

Quote:

Originally Posted by kiwidude

The only other option is to fix the memory leaks. However having helped Kovid track down some memory leak issues in the metadata download over a 5 hour period one Sunday night I know just how painful and difficult this is.

Thanks for your timely and well-written reply! I understand fully the problems you outlined. I am completely fine with waiting however long it takes for optimization, after all the plugin already works like a charm - not everyone will find keeping extraction batches under 1000 a problem, lol.

I am definitely a loyal Calibre user... nothing like it... so no complaints from me about waiting. On the same note, contributions from plugin developers are just as significant to my loyalty as the viability of the main platform itself.

In the meantime, I have but one more humble suggestion... which floated from the ether overnight. How about creation of a new tag within/for use with ExtractISBN? Basically the antithesis of identifier_updated; something like extract_failed, to allow marking/sorting of documents for which ExtractISBN came back negative ([extract_failed:false & identifier:false] as a way to select a new batch for extraction). I find myself rehashing the same files... with little in the way of keeping track. I suppose it wouldn't have to be persistent. It could have a half-life... or perhaps the value resets when calibre restarts. Or it could be batch reset with a command when no longer needed. Hell... as long as the failed files are marked until the next invocation of ExtractISBN... then those files could be called and copy-deleted to a container library to get them out of the way. That would cost time, but would technically be more efficient than the process is at the moment. Just an idea.

Anyway, I must apologize for raising an issue previously discussed. I only skimmed the thread. On the other hand I was taught it never hurts to ask.

Viva Calibre!

xXTGMKXx · 05-15-2011, 06:46 AM

Quote:

How about creation of a new tag within/for use with ExtractISBN

Well I found a quick-fix... user-added column with a yes/no configuration.
I still think it's a good idea... but thought I'd throw my solution out there for those with the same problem.

kiwidude · 05-15-2011, 06:53 AM

Quote:

Originally Posted by xXTGMKXx

Well I found a quick-fix... user-added column with a yes/no configuration.
I still think it's a good idea... but thought I'd throw my solution out there for those with the same problem.

Can you not just use a search of:
isbn:false

When I added the ability to temp mark the ids that were updated, I did consider an option in the dropdown to show those that failed and in fact had it coded but ripped it out before release. I didn't include it for two reasons:

The first is that there is overlap with isbn:false. Of course isbn:false covers your whole database, not just the selection you ran the extract on.

The second is the definition of "failure". Does failure mean that it could not find an ISBN by scanning? What if the book already had an ISBN?

Or does it mean that the book was not updated with an ISBN (it might have found one, but it matched an existing value on the book and so did nothing)?

It gets a bit murky. If we can agree a definition that would be "useful" then I can put it in a future release in that dropdown of the configuration screen for the plugin. My guess would be that you are only going to be interested in books that still do not have an ISBN from the set that you scanned?
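One way to make the competing definitions concrete is to name each outcome separately. The sketch below is purely illustrative: the `Outcome` names, the `classify` function and the sample data are hypothetical and are not taken from the plugin.

```python
from enum import Enum


class Outcome(Enum):
    UPDATED = "updated"                        # scan found an ISBN that differs from the stored one
    UNCHANGED = "unchanged"                    # scan found the ISBN the book already had
    NOT_FOUND_HAS_ISBN = "not_found_has_isbn"  # scan found nothing, book already had an ISBN
    NOT_FOUND_NO_ISBN = "not_found_no_isbn"    # scan found nothing, book still has no ISBN


def classify(existing_isbn, found_isbn):
    """Classify one scanned book from its stored ISBN and the scan result."""
    if found_isbn is None:
        if existing_isbn:
            return Outcome.NOT_FOUND_HAS_ISBN
        return Outcome.NOT_FOUND_NO_ISBN
    if found_isbn == existing_isbn:
        return Outcome.UNCHANGED
    return Outcome.UPDATED


# Hypothetical scan results: book id -> (stored ISBN, ISBN found by scanning).
scanned = {
    1: (None, "0425207439"),
    2: ("0425207439", "0425207439"),
    3: ("0425207439", None),
    4: (None, None),
}

still_missing = [
    book_id
    for book_id, (existing, found) in scanned.items()
    if classify(existing, found) is Outcome.NOT_FOUND_NO_ISBN
]
print(still_missing)  # [4]
```

Under the guess in the last paragraph, only the final category (scanned, nothing found, still no ISBN) would be shown as "failed".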

rloveking · 05-15-2011, 11:42 PM

I apologize if this has been covered before, I did not read all of the prior posts in this forum before posting. I have been LOVING this plugin. But I'm having problems with some of my .lit files. Here is an example:

isbn found by Extract ISBN: 2360011111

ISBN part of the .lit file:

All rights reserved.

ISBN: 0-425-20743-9

BERKLEY SENSATION®

I don't see how one comes from the other. And I didn't find anywhere in the .lit file that has "236" anywhere in it.

I have 98 books (I believe all .lit) that have this "Wrong" ISBN number. All that I have looked at so far seem to have a correct ISBN within them (like above).

Any ideas?

- Becky
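For anyone comparing the two numbers in this report: both pass the standard ISBN-10 check-digit test, so a checksum alone cannot tell them apart. The Python sketch below implements that standard test (weights 10 down to 1, sum divisible by 11). Whether Extract ISBN itself performs this check is an assumption, and the function name is hypothetical.

```python
def is_valid_isbn10(raw):
    """Return True if raw passes the ISBN-10 check-digit test.

    Spaces and hyphens are ignored. The final character may be 'X',
    which counts as 10.
    """
    chars = [c for c in raw.upper() if c not in " -"]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char == "X" and position == 9:
            value = 10
        elif char in "0123456789":
            value = int(char)
        else:
            return False
        # Weights run from 10 for the first digit down to 1 for the last.
        total += (10 - position) * value
    return total % 11 == 0


print(is_valid_isbn10("0-425-20743-9"))  # True  (weighted sum 154 = 11 * 14)
print(is_valid_isbn10("2360011111"))     # True  (weighted sum 110 = 11 * 10)
```

Any ten digits have roughly a one in eleven chance of passing this test by accident.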

xXTGMKXx · 05-16-2011, 12:12 AM

Quote:

My guess would be that you are only going to be interested in books that still do not have an ISBN from the set that you scanned?

Precisely... instead of only marking books where an isbn was found (isbn:updated) my idea was to mark books where an isbn was scanned for but not found.

Now that I think about it though... it is as murky as you thought. Since search parameters would start to confuse each other. I think my solution of a yes/no column is more elegant... if you could somehow change your plugin to create a yes/no marker... let's call it "Extracted" and mark those updated with a checkmark, and those failed with an x, that would be pretty elegant. By that logic, you could still have the option to view the updated isbns at the end of the job - and you could also leave the user the option to search on their own terms... for example "identifiers:false & extracted:false" would return a clean list of documents yet to be scanned.

If all this is impractical, though... I highly recommend my user-column solution to those that come after me. It's rather easy to search identifiers:false - highlight a selection - extract that selection - bulk metadata change the selection to true (then the documents with identifiers disappear from the list) - bulk metadata change the remaining documents to false - then do a search for customtag:true. That way, a search of identifiers:false can be sorted by the customtag column... the false documents would be easily identifiable.

kiwidude · 05-16-2011, 03:40 AM

Quote:

Originally Posted by rloveking

I apologize if this has been covered before, I did not read all of the prior posts in this forum before posting. I have been LOVING this plugin. But I'm having problems with some of my .lit files. Here is an example:

isbn found by Extract ISBN: 2360011111

ISBN part of the .lit file:

I don't see how one comes from the other. And I didn't find anywhere in the .lit file that has "236" anywhere in it.

I have 98 books (I believe all .lit) that have this "Wrong" ISBN number. All that I have looked at so far seem to have a correct ISBN within them (like above).

Any ideas?

- Becky

Becky,

Can you please PM me a link to one or two of your books that have this problem so I can take a look?

dm101 · 05-17-2011, 10:18 AM

Quote:

Originally Posted by rloveking

I apologize if this has been covered before, I did not read all of the prior posts in this forum before posting. I have been LOVING this plugin. But I'm having problems with some of my .lit files. Here is an example:

isbn found by Extract ISBN: 2360011111

ISBN part of the .lit file:

I don't see how one comes from the other. And I didn't find anywhere in the .lit file that has "236" anywhere in it.

I have 98 books (I believe all .lit) that have this "Wrong" ISBN number. All that I have looked at so far seem to have a correct ISBN within them (like above).

Any ideas?

- Becky

Dear kiwidude,
I have exactly the same problem with many ebooks, but mine are PDFs.
Greets dm101

kiwidude · 05-17-2011, 10:24 AM

@dm101 - until becky sends me a link to an example book there is nothing much I can do about it. When I loosened the regex so that it no longer requires one of the many variations of "ISBN" text before the number, there was always going to be a risk that this situation could arise.

If you want to PM me a link to the pdf then that would help, though this will likely not be exactly the same issue as becky has and so I still would need a file from her.

kiwidude · 05-17-2011, 02:48 PM

Thx @dm101 for the files. I can see what the problem is (and indeed this is probably becky's issue as well - you mentioned pdf which is why I thought it may be different but it is an ePub you sent that showed the issue). It is when you have a file with those annoying embedded font-face declarations at the top like this:

<style type="text/css">
@font-face {
font-family: Courier;
panose-1: 2 7 4 9 2 2 5 2 4 4
}

I've never understood the point of these (and rip them out of my own ePubs). Obviously with enough of them in there, the chances of hitting a number that coincidentally looks like an ISBN are higher.

I already have some code in there that rips out HTML tags. I will tweak that a bit to make sure these get ignored as well when evaluating.
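A minimal sketch of that idea in Python, assuming a loose pattern that accepts ten digits separated by optional spaces or hyphens. The regular expressions and the `candidates` helper are hypothetical, not the plugin's actual code, and only ISBN-10 shaped numbers are considered to keep it short.

```python
import re

# Everything from <style ...> to </style>, including the CSS inside.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)
# Any single remaining tag; its surrounding text content is kept.
TAG = re.compile(r"<[^>]+>")
# Loose candidate: ten digits (last may be X), each optionally followed
# by one space or hyphen, with no "ISBN" prefix required.
CANDIDATE = re.compile(r"(?<!\d)(?:\d[ -]?){9}[\dXx](?!\d)")


def candidates(markup, strip_style=True):
    """Return ISBN-10 shaped strings found in markup, separators removed."""
    if strip_style:
        markup = STYLE_BLOCK.sub(" ", markup)
    text = TAG.sub(" ", markup)
    return [re.sub(r"[ -]", "", m.group(0)) for m in CANDIDATE.finditer(text)]


SAMPLE = """<html><head><style type="text/css">
@font-face {
font-family: Courier;
panose-1: 2 7 4 9 2 2 5 2 4 4
}
</style></head>
<body><p>All rights reserved.</p><p>ISBN: 0-425-20743-9</p></body></html>"""

print(candidates(SAMPLE, strip_style=False))  # ['2749225244', '0425207439']
print(candidates(SAMPLE, strip_style=True))   # ['0425207439']
```

Removing only the tags leaves the panose-1 digits in the text, which is why the whole style block has to go before scanning.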

kiwidude · 05-17-2011, 03:23 PM

v1.3.2 Released

Changes in this release:

Strip the <style> tag contents to ensure panose-1 numbers are not picked up as false positives

Hopefully this should resolve some of the problems reported above of false positives on ISBNs.
