03-23-2011, 10:37 AM | #1 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
[GUI Plugin] Extract ISBN
This plugin can be used to try to find the ISBN for a book using the text within a book format. It is intended as an alternative to various script based solutions to this problem posted in this thread.
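A minimal sketch of the general idea, not the plugin's actual code: pull ISBN-looking strings out of a book's text with a regular expression and keep only those whose check digit is valid. The pattern and the function names (`find_isbns`, `is_valid_isbn10`, `is_valid_isbn13`) are assumptions made for illustration.

```python
import re

# Assumed pattern: 10 or 13 digits (ISBN-13 starting 978/979), optionally
# separated by hyphens or spaces; the last ISBN-10 character may be X.
ISBN_CANDIDATE = re.compile(
    r"(?<![0-9Xx-])(?:97[89][- ]?)?(?:[0-9][- ]?){9}[0-9Xx](?![0-9Xx])"
)


def normalise(raw):
    """Strip hyphens and spaces and upper-case a trailing x."""
    return re.sub(r"[- ]", "", raw).upper()


def is_valid_isbn10(s):
    """ISBN-10 check: weights 10..1, weighted sum divisible by 11."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """ISBN-13 check: alternating weights 1 and 3, sum divisible by 10."""
    if not re.fullmatch(r"97[89][0-9]{10}", s):
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def find_isbns(text):
    """Return the valid ISBNs found in text, in order, without duplicates."""
    found = []
    for match in ISBN_CANDIDATE.finditer(text):
        isbn = normalise(match.group())
        if (is_valid_isbn10(isbn) or is_valid_isbn13(isbn)) and isbn not in found:
            found.append(isbn)
    return found
```

Validating the check digit filters out most phone numbers and other long digit runs that would otherwise look like an ISBN.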
Main Features:
Special Notes:
Paypal Donations:
Last edited by kiwidude; 09-08-2024 at 10:09 PM. Reason: New version |
03-23-2011, 10:54 AM | #2 |
Junior Member
Posts: 9
Karma: 12
Join Date: Mar 2011
Device: Kindle
|
I. Love. You.
Now... if we could add the extraction to the Edit Book Details window (like to the right of the ISBN text box) and then have an option to download metadata if an ISBN is found... I would have your baby. (Although, yes, I can edit a batch and then download a batch. I tend to edit one at a time so I think one at a time.)
This has worked beautifully on 480 out of 500 books. And the 20 that didn't work I confirmed were PDFs where the contents were JPG images rather than text -- so no way for the regex to pick up the ISBN.
Oh, some sort of progress indicator would be beneficial. (Dunno if possible.)
Last edited by talonius; 03-23-2011 at 11:07 AM. |
03-23-2011, 11:16 AM | #3 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Cool, glad it worked for you!
I agree that some sort of progress indicator would be useful. I just wanted to get "something" out there to see what the interest was, how people wanted to approach the multiple format/selection issue etc.
Your point about the edit book details window also confirms why I did not invest a great deal more effort at this point beyond "proving it was possible". As this plugin just wires together and reuses a few bits of Calibre code there really isn't any technical reason why it couldn't be built natively into Calibre. It is entirely down to Kovid and whether he wants to make the functionality available from screens like the Edit Metadata and Bulk Metadata dialogs. |
03-23-2011, 12:22 PM | #4 | |
Addict
Posts: 385
Karma: 6514
Join Date: Aug 2010
Location: Denmark
Device: Kindle 3 3G+Wifi, Oasis
|
Just me being SILLY ! - Sorry
Quote:
Being a bachelor, old and all, I haven't been keeping up to date with procreation, I see.
Sorry all, especially talonius & kiwidude!!! WILL TRY to just read from now on, instead of being "funny".
Last edited by pchrist7; 03-23-2011 at 12:27 PM. |
|
03-23-2011, 02:08 PM | #5 |
Junior Member
Posts: 9
Karma: 12
Join Date: Mar 2011
Device: Kindle
|
Minor issue: If there's a format stored in Calibre that Calibre doesn't know how to handle (DjVu in this instance) the plugin throws an error and aborts processing.
Possible optimization: Abort searching through the book once a certain percentage/amount of text has been searched. This would help speed up the search for 95% of the books.
Building it into Calibre would be fantastic but since this is the major roadblock to me finishing my catalog, I'm going to continue to push it. <g>
No worries, I'm looking at how to do all of my suggestions myself as possible improvements. I work in C#/C++ professionally, just not Python/Calibre. I'll just have to buckle down and do some (gasp!) reading.
As for jokes... ha! Trust me, I'm far from serious. One reason I don't participate in projects is because my joking attitude tends to grate on the more serious folks who tend to inhabit the programmer's world. |
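The "abort after a certain percentage" idea could look roughly like the sketch below. It assumes the book's text is already available as a list of chunks in reading order. The name `scan_with_budget`, the `max_fraction` parameter and the simplified ISBN-13-only pattern are all hypothetical, and nothing here is taken from the plugin.

```python
import re

# Simplified pattern for illustration: unhyphenated ISBN-13 only.
ISBN13 = re.compile(r"(?<![0-9])97[89][0-9]{10}(?![0-9])")


def scan_with_budget(chunks, max_fraction=0.2):
    """Scan chunks in order; give up once max_fraction of the text is read.

    Returns the first ISBN-13 found, or None if the budget ran out first.
    """
    chunks = list(chunks)
    total_chars = sum(len(chunk) for chunk in chunks)
    budget = total_chars * max_fraction
    scanned = 0
    for chunk in chunks:
        match = ISBN13.search(chunk)
        if match:
            return match.group()
        scanned += len(chunk)
        if scanned >= budget:
            break  # budget spent: stop even though text remains
    return None
```

The trade-off is visible in the second case of a quick test: a book that carries its ISBN on the last page is reported as having none unless the budget covers the whole text.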
03-23-2011, 04:32 PM | #6 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Talonius - I will push a 1.0.1 version shortly which will ensure any errors are more gracefully handled. It will also display progress in the status bar.
The optimization stuff is a tough one. The problem is that I have seen books where the copyright/ISBN information has been put at the end of the EPUB. Granted this is the exception rather than the rule, but maybe others have seen it frequently? This is the sort of operation that you will only do once on your books though so performance shouldn't be too much of an issue...
Also, I think most of the slowdown will be in the time taken to convert each book into text, not the bit the plugin does of applying regular expressions on each file in it. I haven't profiled it but I am pretty confident that will be the case.
What I have done is get it to short-circuit gathering ISBNs once it has found an ISBN and finished processing the current internal file of the converted format. The logic I "borrowed" from bazbar scanned the whole book and built up lists of ISBNs should a book have multiple ISBN13s for instance. I don't know enough about when that ever happens (most books I have seen have only either one or both of an ISBN10/ISBN13 but not more than that). Finishing processing a file (hopefully all ISBNs are on the same one) and then stopping should be enough. This won't help speed up books with no ISBN inside though.
I am also about to make it so that if you ctrl+click or shift+click on the toolbar button it will make a non-interactive decision about which format to interrogate when you have multiple. This will be based on your preferred input format list in Preferences for now. I'll wait for suggestions for alternatives before doing anything else around that.
For people who only have formats produced by converting the same version that will work well. Where it won't is say if they got a PDF from somewhere and an EPUB from somewhere else, and the EPUB has had the ISBN stuff removed. Still, at least you will see in the report which books it failed to find an ISBN for, and you can always then just do a normal toolbar button click to get the interactive choice of format to extract from.
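The short-circuit described above (finish the current internal file, then stop) can be sketched as follows. This is an illustration only: it assumes the converted book arrives as `(name, text)` pairs in reading order, and `first_file_with_isbns` and the simplified ISBN-13-only pattern are made-up names, not the plugin's code.

```python
import re

# Simplified pattern for illustration: unhyphenated ISBN-13 only.
ISBN13 = re.compile(r"(?<![0-9])97[89][0-9]{10}(?![0-9])")


def first_file_with_isbns(files):
    """Return (name, isbns) for the first internal file containing any ISBN.

    Every ISBN in that file is collected, but later files are never read.
    Returns (None, []) when no file contains an ISBN.
    """
    for name, text in files:
        # dict.fromkeys removes duplicates while keeping first-seen order
        hits = list(dict.fromkeys(ISBN13.findall(text)))
        if hits:
            return name, hits
    return None, []
```

Because `files` can be a lazy iterator, files after the first hit are never even produced, but a book with no ISBN still costs a full pass.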
Last edited by kiwidude; 03-23-2011 at 05:06 PM. Reason: Added more info about performance bottleneck |
03-23-2011, 05:15 PM | #7 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v1.0.1 Released
I've mentioned most of this in the previous post but to recap:
|
03-25-2011, 04:01 AM | #8 |
Connoisseur
Posts: 54
Karma: 442
Join Date: Oct 2010
Location: Detroit
Device: iPad
|
Great and very useful plugin, thanks much.
One comment though: I have been able to (inadvertently) "choke" the plugin on a document with 1800 pages and 2 million words. It is a text PDF, and as it turns out there is no ISBN amongst the 2 million words. Is it possible to have a "fail gracefully after x time" capability?
Thanks again for what is otherwise a very useful plugin. |
03-25-2011, 06:11 AM | #9 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@garcle - see my comments above in post #6. The way to test this would be to go into convert, choose search & replace and click one of the wizard buttons. That will ask Calibre to convert the document in the exact same way that my ISBN extract does. Check how long it takes for it to do this with your big PDF file to get to a point of text being displayed in the wizard box, versus how long it takes the extract ISBN functionality.
If the times are comparable, there is nothing I can do, at least not without rewriting the text conversion functionality to, say, convert just a small % of the document, which I have no intention of doing myself.
OTOH, if you think the ISBN functionality is still significantly slower than the S&R wizard then I could take a look at it. If you point me at a download somewhere of a PDF typical of the issue I will see what I can do. |
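The comparison suggested here boils down to timing the conversion step separately from the scanning step. A rough sketch of that measurement is below, with a stand-in converter: `fake_convert` is purely hypothetical and only sleeps to mimic an expensive conversion, and none of these names are Calibre APIs.

```python
import re
import time

# Simplified pattern for illustration: unhyphenated ISBN-13 only.
ISBN13 = re.compile(r"(?<![0-9])97[89][0-9]{10}(?![0-9])")


def scan_for_isbn(text):
    """Return the first ISBN-13 in text, or None."""
    match = ISBN13.search(text)
    return match.group() if match else None


def time_stages(convert, scan, source):
    """Time the conversion and the scan separately, in seconds."""
    start = time.perf_counter()
    text = convert(source)
    converted = time.perf_counter()
    result = scan(text)
    scanned = time.perf_counter()
    return {
        "convert_s": converted - start,
        "scan_s": scanned - converted,
        "result": result,
    }


def fake_convert(source):
    """Hypothetical stand-in for an expensive book-to-text conversion."""
    time.sleep(0.05)
    return source
```

If `convert_s` dominates for a given book, speeding up the regular expression pass will make little difference to the total time.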
03-26-2011, 01:15 AM | #10 | |
Member
Posts: 18
Karma: 10
Join Date: Feb 2011
Device: Nook
|
Quote:
1) Run in non-interactive by default or interactive by default.
2) Follow preferred input format, or continue searching all if not found in first? I format down some of my epubs which is my preferred format. |
|
03-26-2011, 08:35 AM | #11 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Doug-W - thanks for the suggestions. I am applying them at the moment and will push a new version when done.
When searching all formats, do you think that option should be dependent on whether the user has interactively chosen a format? i.e. If I have interactively chosen a specific format, it should always stop after searching just that format. Whereas the "search all formats until found using preferred order" only applies when you are doing a non-interactive search?
Hope I explained myself, it is very difficult to wrap the wording around as per the screenshot - any suggestions for alternate wording welcomed.
EDIT: Removed the screenshot, came up with a simpler approach...
Last edited by kiwidude; 03-27-2011 at 12:01 PM. |
03-27-2011, 12:16 PM | #12 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v1.1 Released
This release adds some configuration options over the scan behaviour for when there are multiple formats for a book. You can configure both a default behaviour and an alternate behaviour (the latter when you shift+click or ctrl+click on the plugin as a toolbar button).
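The thread does not spell out how the scan behaviours work internally, so the following is only a sketch of the logic discussed in posts #10 and #11: an interactively chosen format is the only one scanned, while a non-interactive run follows the preferred input order and optionally keeps going until an ISBN is found. The names `formats_to_scan` and `extract`, and the `scan(fmt, data)` callback, are assumptions.

```python
def formats_to_scan(available, preferred_order, search_all):
    """Order a book's formats by preference; keep one or all of them."""
    ordered = [fmt for fmt in preferred_order if fmt in available]
    # Formats missing from the preference list go last, alphabetically.
    ordered += sorted(fmt for fmt in available if fmt not in ordered)
    return ordered if search_all else ordered[:1]


def extract(book_formats, preferred_order, scan, search_all=False, chosen=None):
    """Return (format, isbn) for the first format that yields an ISBN.

    book_formats maps a format name to its data; scan(fmt, data) returns an
    ISBN string or None. A non-None `chosen` models the interactive case.
    """
    if chosen is not None:
        candidates = [chosen]
    else:
        candidates = formats_to_scan(book_formats, preferred_order, search_all)
    for fmt in candidates:
        isbn = scan(fmt, book_formats[fmt])
        if isbn:
            return fmt, isbn
    return None, None
```

A default behaviour and an alternate (shift+click or ctrl+click) behaviour would then just be two different combinations of `chosen` and `search_all`.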
The options you have are:
|
03-28-2011, 12:16 AM | #13 |
Connoisseur
Posts: 54
Karma: 442
Join Date: Oct 2010
Location: Detroit
Device: iPad
|
Any way to force a refresh on the book list?
The ISBNs don't show up in the book list (but do show in the book metadata editor form) after the plugin runs. |
03-28-2011, 07:52 AM | #14 |
Calibre Plugins Developer
Posts: 4,685
Karma: 2162246
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
v1.1.1 Released
This adds two things:
Thanks @garcle for reporting the refresh issue. |
03-29-2011, 05:23 AM | #15 |
Enthusiast
Posts: 49
Karma: 12
Join Date: Feb 2011
Device: Kobo Aura, Sony PRS-350 and PRS-T1
|
Thanks kiwidude for this very useful plugin!
|
|