03-23-2011, 09:37 AM
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
[GUI Plugin] Extract ISBN

This plugin can be used to try to find the ISBN for a book using the text within a book format. It is intended as an alternative to the various script-based solutions to this problem posted in this thread.

Main Features of v1.4.4
  • Scans all formats for the selected book(s) in preferred input format order until an ISBN-13 or ISBN-10 is found
  • Runs as a background job in Calibre, prompting you to update the books when scanning completes.
  • Scans only the book content, excluding HTML tag markup.
  • For PDF formats, scans only the first 10 pages and then, if no ISBN is found, the last 5 pages in reverse order.
  • For other formats, scans a number of files at the front, then a number of files at the end in reverse order, then the remainder of the book.
  • Restricts valid ISBN-13s to those starting with 977, 978 or 979. You can add further prefixes in the configuration if required.
  • Optionally performs a search on completion that shows only the updated books (off by default). Some users then use this to run a metadata download.
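The checks behind "an ISBN-13 or ISBN-10 is found" and the 977/978/979 prefix rule can be sketched with the standard ISBN check-digit rules. This is a minimal Python sketch, not the plugin's code: the function names and the DEFAULT_PREFIXES tuple are hypothetical, and it assumes candidates have already been reduced to bare digits (plus a possible trailing X for ISBN-10).

```python
# Sketch only: hypothetical helpers, not the Extract ISBN plugin's own code.
DEFAULT_PREFIXES = ("977", "978", "979")  # assumed default, per the feature list


def _ascii_digits(s: str) -> bool:
    """True if s is non-empty and every character is an ASCII digit 0-9."""
    return bool(s) and all(c in "0123456789" for c in s)


def is_valid_isbn10(isbn: str) -> bool:
    """ISBN-10: weights 10..1, check character may be 'X' (= 10), sum % 11 == 0."""
    if len(isbn) != 10 or not _ascii_digits(isbn[:9]):
        return False
    check = isbn[9].upper()
    if check != "X" and not _ascii_digits(check):
        return False
    values = [int(c) for c in isbn[:9]] + [10 if check == "X" else int(check)]
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


def is_valid_isbn13(isbn: str, prefixes=DEFAULT_PREFIXES) -> bool:
    """ISBN-13: allowed prefix, alternating weights 1 and 3, sum % 10 == 0."""
    if len(isbn) != 13 or not _ascii_digits(isbn):
        return False
    if not isbn.startswith(tuple(prefixes)):
        return False
    total = sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn))
    return total % 10 == 0
```

For example, 978-0-306-40615-7 and its ISBN-10 form 0-306-40615-2 both pass once the hyphens are removed, while changing the final digit makes either fail.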

Special Notes:
  • Requires calibre v0.8.54 or later.
  • As this runs in the background, you must be careful not to change the books being scanned while it is running. Changing metadata such as the title or author, deleting a book, or converting it risks causing problems. Restrict any editing to other books in your library while the scan is running and you will be fine.

Installation Notes:
  • Download the attached zip file, then install the plugin, add it to a context menu or toolbar, and restart calibre as described in the Introduction to plugins thread.

Paypal Donations:
  • If you find this or any of my other plugins useful please feel free to show your appreciation. I have spent many hundreds of unpaid hours in their development and support so any encouragement for me to continue is appreciated!

Version History:

Version 1.4.4 - 30 Jul 2014
Support for upcoming calibre 2.0

Version 1.4.3 - 01 Aug 2012
Split bulk extraction into batches, with the batch size changeable via the plugin configuration
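Splitting a bulk extraction into batches is plain chunking of the selected book ids. A minimal sketch, assuming the selection is a list of ids and the batch size is an integer read from the plugin configuration; the `batches` helper is hypothetical and not taken from the plugin.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(book_ids: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of book_ids, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(book_ids), batch_size):
        yield list(book_ids[start:start + batch_size])
```

With five selected books and a batch size of 2, this yields [1, 2], [3, 4] and [5]; each batch could then be handed to its own background job.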

Version 1.4.2 - 03 Jun 2012
Minimum version set to calibre 0.8.54 (but preferred version is 0.8.55)
Performance optimisation for ePubs for calibre 0.8.51 to reduce unneeded computation
Change of calibre API for a deprecated dialog, which had caused calibre to crash intermittently
Minor fix to ensure the HTMLPreProcessor object is initialised correctly
Switch to different PDF engines for PDF processing, because calibre 0.8.53 broke the one I was using
Stability improvement, active with calibre 0.8.55, that runs PDF analysis on a forked thread

Version 1.4.1 - 12 Nov 2011
Exclude leading spaces before the ISBN, which had prevented some valid ISBNs from being detected.

Version 1.4.0 - 11 Sep 2011
Upgrade to support the centralised keyboard shortcut management in Calibre

Version 1.3.7 - 02 Jul 2011
Fix bug where the question dialog was not displayed when metadata had changed

Version 1.3.6 - 12 Jun 2011
Fix bug occurring when the same ISBN was extracted for a book
For non-PDF file types, scan the first x files, then the last y in reverse order, then the rest, with x and y based on the number of files in the book
When a scan fails, still offer the option to view the log rather than a standard error dialog
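The non-PDF ordering above, and the PDF rule from the feature list (first 10 pages, then the last 5 in reverse, with no middle pages), can both be expressed as one ordering function over file or page indices. A sketch only: the post does not say how x and y are derived from the number of files, so they are plain parameters here, and `scan_order` is a hypothetical name.

```python
def scan_order(count: int, first: int, last: int, include_rest: bool = True) -> list:
    """Return 0-based indices in the order they would be scanned.

    The first `first` items come in order, then up to `last` items from the
    end in reverse order, then (optionally) every index not yet visited.
    """
    front = list(range(min(first, count)))
    seen = set(front)
    # Walk backwards from the final index, skipping anything already scanned.
    back = [i for i in range(count - 1, max(count - last, 0) - 1, -1) if i not in seen]
    seen.update(back)
    rest = [i for i in range(count) if i not in seen] if include_rest else []
    return front + back + rest
```

For a 30-page PDF, `scan_order(30, 10, 5, include_rest=False)` gives pages 0 to 9 followed by 29, 28, 27, 26, 25; for a ten-file ePub with x = 3 and y = 2 it gives 0, 1, 2, 9, 8 and then 3 to 7.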

Version 1.3.5 - 25 May 2011
Add yet another Unicode variant of the hyphen separator to the regex
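A candidate-finding regex that accepts several dash variants as separators might look like the sketch below. The exact set of dashes, allowing a space as a separator, and the `find_candidates` name are all assumptions; this is not drMerry's or the plugin's actual regex. Candidates still need a check-digit test before being accepted.

```python
import re

# Assumed separator set: ASCII hyphen-minus, Unicode hyphen, non-breaking
# hyphen, figure dash, en dash, em dash and minus sign, plus a plain space.
DASHES = "-\u2010\u2011\u2012\u2013\u2014\u2212"
SEP = "[" + re.escape(DASHES) + " ]"

# 10 to 13 characters: digits with optional single separators between them,
# the last character may be X, and no digit may touch either end.
CANDIDATE = re.compile(
    "(?<![0-9])[0-9](?:" + SEP + "?[0-9]){8,11}" + SEP + "?[0-9Xx](?![0-9Xx])"
)


def find_candidates(text: str):
    """Yield separator-free 10- or 13-character ISBN candidates from text."""
    for match in CANDIDATE.finditer(text):
        digits = re.sub(SEP, "", match.group()).upper()
        if len(digits) in (10, 13):
            yield digits
```

Including a space as a separator catches ISBNs printed as "978 0 306 40615 7", at the cost of occasionally joining unrelated neighbouring numbers, which the check digit then usually rejects.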

Version 1.3.4 - 21 May 2011
Run the ISBN extraction out of process to get around the memory leak issues

Version 1.3.3 - 19 May 2011
Ensure stripped HTML tags are replaced with a ! to prevent an ISBN running into another number and becoming invalid

Version 1.3.2 - 17 May 2011
Strip the <style> tag contents to ensure panose-1 numbers are not picked up as false positives
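Versions 1.3, 1.3.2 and 1.3.3 describe removing markup before scanning: dropping <style> contents and replacing each remaining tag with a "!". A simplified regex version of that idea is sketched below. It is an assumption for illustration (the plugin itself mentions calibre's HTMLPreProcessor, whose use is not shown here), and regexes are not a full HTML parser; `html_to_scan_text` is a hypothetical name.

```python
import re

# Whole <style> elements, including their CSS (where panose-1 numbers live).
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL)
# Any remaining tag, e.g. <p>, </span>, <svg width="1234567890">.
TAG = re.compile(r"<[^>]+>")


def html_to_scan_text(html: str) -> str:
    """Return text to scan, with markup replaced by '!' separators."""
    text = STYLE_BLOCK.sub("!", html)
    # A '!' between fragments stops digits on either side of a tag merging.
    return TAG.sub("!", text)
```

For `<p>978</p><p>0306406157</p>` this yields `!978!!0306406157!`, so the two runs of digits stay separate instead of forming one 13-digit string.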

Version 1.3.1 - 06 May 2011
Strip non-ASCII characters from the pdfreflow XML, which had made it invalid
Support the ^ character being part of the ISBN number
Attempt to minimise any memory leak issues caused by this plugin itself

Version 1.3 - 29 Apr 2011
Do all scanning as a background job to keep the UI responsive
Remove all interactive UI options - it will now always scan all formats in preferred order
Make sure that ISBN-13s start with 977, 978 or 979 (configurable).
Exclude the various repeating-digit ISBNs such as 1111111111
Exclude all HTML markup tags to prevent issues like SVG sizes being picked up as ISBNs
Include en dash and other dash variants as possible separators
When scanning PDF documents, scan the last 5 pages in reverse order so that the ISBN found is the last one in the book
Configuration options for ISBN-13 prefixes and for showing updated books when extraction completes
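The repeating-digit exclusion matters because such numbers pass the ISBN-10 check-digit test: 1111111111 gives 1 × (10 + 9 + ... + 1) = 55, which is divisible by 11, and the same holds for every repeated digit. A sketch of that filter, plus parsing of a prefix setting, follows; the comma-separated setting format and both function names are assumptions, not the plugin's configuration format.

```python
def parse_prefixes(setting: str) -> tuple:
    """Turn an assumed comma-separated setting such as '977,978,979' into a tuple."""
    return tuple(p.strip() for p in setting.split(",") if p.strip())


def is_repeating_digits(digits: str) -> bool:
    """True for strings like '1111111111' made of one repeated character.

    Every such ISBN-10 passes the check digit (d * 55 is divisible by 11),
    so these have to be rejected explicitly.
    """
    return len(set(digits)) == 1
```

The resulting tuple can be passed straight to `str.startswith`, which accepts a tuple of prefixes.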

Version 1.2.1 - 09 Apr 2011
Support skinning of icons by putting them in a plugin name subfolder of local resources/images

Version 1.2 - 03 Apr 2011
Rewritten for the new plugin infrastructure in Calibre 0.7.53
ISBN matching regex replaced using an approach from drMerry
PDFs now processed with the new Calibre PDF engine, scanning just the first 10 and last 5 pages

Version 1.1 - 28 Mar 2011
Add configuration options over the scan behaviour (default + alternate)
The options you have are:
Ask me which format to scan
Scan only the first format in preferred input order
Scan all formats in preferred input order until an ISBN found

Version 1.0.1 - 24 Mar 2011
Skip book formats which we are unable to read, such as djvu
Display progress in the status bar
Ctrl+click or Shift+click the toolbar button to choose a format non-interactively when your book has several.
It uses the first one found based on your preferred input format order list from Preferences->Behaviour
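Picking formats by the preferred input format order amounts to a stable sort on each format's position in that list. A sketch, assuming format names are strings such as "EPUB" and the preference list is passed in directly (how calibre stores it is not covered here); `formats_in_preferred_order` is a hypothetical name.

```python
def formats_in_preferred_order(available, preferred):
    """Order a book's formats by the user's preferred input order.

    Formats missing from the preference list go last, keeping their
    original relative order (sorted() is stable).
    """
    rank = {}
    for i, fmt in enumerate(preferred):
        rank.setdefault(fmt.upper(), i)  # first occurrence wins on duplicates
    return sorted(available, key=lambda fmt: rank.get(fmt.upper(), len(rank)))
```

The first element is the format a non-interactive choice would use; scanning "all formats until an ISBN is found" just walks the whole list.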

Version 1.0 - 24 Mar 2011
Initial release of Extract ISBN plugin
Attached Thumbnails
Screenshot_1_Summary.png (36.0 KB)
Screenshot_2_Configuration.png (15.8 KB)
Attached Files
Extract ISBN.zip (76.6 KB)

Last edited by kovidgoyal; 07-29-2014 at 11:41 PM. Reason: v1.4.4 Released