Old 06-16-2012, 02:07 AM   #241
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
The biggest use I personally make of it is when importing books whose title and author fields are not set, for example because the book has a random filename. Rather than typing them in manually, you can extract the ISBN and do a metadata download with the option to overwrite title and author.

A metadata download with an ISBN is far more likely to return the right metadata from the website, since most metadata plugins will look up by ISBN if one is available and fall back to a title and author search if not. The latter is more error prone due to spelling errors, typos, series info in the title field etc.

And a small minority undoubtedly use it because they are sufficiently fussy to want the ISBN field to contain the value for their specific edition of that book.
kiwidude is offline   Reply With Quote
Old 07-28-2012, 07:09 PM   #242
stanmarsh
Member
stanmarsh began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Oct 2011
Device: galaxy tab
group extract

hello kiwidude,

i'm wondering if it's possible to add some sort of group limit to extraction?
i tried selecting all books but it will take too long to finish. the group limit would split the queue into groups of 10, 20 or 50 (depending on how powerful the computer is).

e.g.
  1. library 5000 books with no identifier
  2. using identifiers:"=false"
  3. select all the books
  4. extract isbn (group into 10 - slow computer/usb2)
  5. 5000 books = 500 groups (groupings could be similar to find duplicate)
  6. translate to 500 jobs
  7. --go to work - shut down computer--
  8. check calibre-jobs for what group you are in (N)
  9. close calibre
  10. --start calibre--
  11. extract isbn from group N to group N

something like that, is that possible? it will prevent calibre from hanging.

thanks
stanmarsh is offline   Reply With Quote
 
Old 07-31-2012, 05:28 AM   #243
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Beta for next version

@stanmarsh - give this version a whirl. By default the batch size is 100, but you can increase/reduce it in Preferences -> Plugins -> Extract ISBN -> Configure plugin.

Note that there are a few side effects if the number of books you have selected is more than your batch size, causing multiple jobs to be run:
  1. You will get prompted each and every time a batch completes. This will not be changed and is the same behaviour as what you see with bulk metadata downloads.
  2. As part of the output this plugin displays what books it did not even attempt to retrieve data for (e.g. book had no formats). This information will now get displayed on the first batch job completing only.
  3. There is an option which some users have turned on to execute a search to show which books have been updated. If you use this option, you are only going to see books for that batch. When the next batch completes, you will then only see books for that batch and so on.
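
The batching described above can be sketched roughly as follows. This is only an illustration and not the plugin's actual code: `split_into_batches` and the book ids are hypothetical, and the default of 100 simply mirrors the default batch size mentioned in this post.

```python
from typing import Iterator, List, Sequence


def split_into_batches(book_ids: Sequence[int], batch_size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive slices of book_ids, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(book_ids), batch_size):
        yield list(book_ids[start:start + batch_size])


# 5000 hypothetical book ids, as in the earlier feature request.
selected = list(range(1, 5001))
batches = list(split_into_batches(selected, batch_size=100))
print(len(batches), "jobs of up to", len(batches[0]), "books each")
```

With 5000 selected books the default size gives 50 jobs; a batch size of 10 would give 500.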

Last edited by kiwidude; 08-01-2012 at 05:14 AM. Reason: Remove attachment as officially released
kiwidude is offline   Reply With Quote
Old 08-01-2012, 05:15 AM   #244
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
v1.4.3 Released

Changes in this release:
  • Split bulk extraction into batches with size changeable via plugin configuration
kiwidude is offline   Reply With Quote
Old 08-05-2012, 10:01 PM   #245
stanmarsh
Member
stanmarsh began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Oct 2011
Device: galaxy tab
hello kiwidude!

thanks for implementing the feature request! will test it out!
stanmarsh is offline   Reply With Quote
Old 09-26-2012, 07:34 PM   #246
userpaul
Junior Member
userpaul began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Jan 2012
Device: prs-500, Ipad 2
Fantastic...thank you
userpaul is offline   Reply With Quote
Old 10-05-2012, 05:20 AM   #247
myce
Member
myce began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Oct 2012
Device: Sony PRS-T1
Extract ISBN is really great at extracting ISBNs from the book's text. But this made it stumble.

From "The Definitive Guide to How Computers Do Math: Featuring the Virtual Diy Calculator" page 2:
Code:
For general information on our other products and services please contact our Customer Care
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data is available.
ISBN-13 978-0471-73278-5
ISBN-10 0-471-73278-8
results in the log file:
Code:
      Invalid ISBN match: 877-762-2974 
      Valid ISBN10: 3175723993 
      Invalid ISBN match: 317-572-4002 
      Invalid ISBN match: -13 978-0471-73278 
      Invalid ISBN match: -10 0-471-73278-8
I understand that it detects 3175723993 as a valid ISBN. But maybe you could make it reparse substrings if the number it found is longer than 10/13 digits. Or maybe even look for the string ISBN.{,3}1[03] explicitly and give the numbers in its vicinity higher precedence.
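
A rough sketch of the second idea, giving precedence to numbers that directly follow an "ISBN-10"/"ISBN-13" style label. Everything here is an assumption for illustration: the `LABELLED` and `GENERIC` patterns and the `ordered_candidates` helper are made up and are not how the plugin currently works. No checksum validation is done in this sketch; it only reorders candidates.

```python
import re

# Label as suggested above (ISBN, up to three characters, then 10 or 13),
# followed by the number itself. re.ASCII keeps \d to the digits 0-9.
LABELLED = re.compile(r"ISBN.{,3}1[03][ :]*(\d[\d -]*[\dX])", re.ASCII)
# Any run of digits, spaces and dashes that ends in a digit or X.
GENERIC = re.compile(r"\d[\d -]*[\dX]", re.ASCII)


def normalise(raw):
    """Drop spaces and dashes from a matched candidate."""
    return re.sub(r"[ -]", "", raw)


def ordered_candidates(text):
    """Return candidates with labelled ones first, the rest in text order."""
    labelled = [normalise(m.group(1)) for m in LABELLED.finditer(text)]
    generic = [normalise(m.group(0)) for m in GENERIC.finditer(text)]
    return labelled + [c for c in generic if c not in labelled]


text = (
    "Department within the U.S. at 877-762-2974, outside the U.S. at "
    "317-572-3993 or fax 317-572-4002.\n"
    "ISBN-13 978-0471-73278-5\n"
    "ISBN-10 0-471-73278-8\n"
)
for candidate in ordered_candidates(text):
    print(candidate)
```

On this excerpt the two labelled numbers, 9780471732785 and 0471732788, come out ahead of the phone numbers, so a validator working through the list in order would reach them first.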
myce is offline   Reply With Quote
Old 10-05-2012, 09:51 AM   #248
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 14,251
Karma: 5495470
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by myce View Post
Extract ISBN is really great at extracting ISBNs from the book's text. But this made it stumble.

From "The Definitive Guide to How Computers Do Math: Featuring the Virtual Diy Calculator" page 2:
Code:
For general information on our other products and services please contact our Customer Care
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data is available.
ISBN-13 978-0471-73278-5
ISBN-10 0-471-73278-8
results in the log file:
Code:
      Invalid ISBN match: 877-762-2974 
      Valid ISBN10: 3175723993 
      Invalid ISBN match: 317-572-4002 
      Invalid ISBN match: -13 978-0471-73278 
      Invalid ISBN match: -10 0-471-73278-8
I understand that it detects 3175723993 as a valid ISBN. But maybe you could make it reparse substrings if the number it found is longer than 10/13 digits. Or maybe even look for the string ISBN.{,3}1[03] explicitly and give the numbers in its vicinity higher precedence.
IMHO only one parse rule should be used at a time. The last two matches broke that rule and therefore failed to yield a valid ISBN: space or dash, not both in the same substring.

Once a 10-character candidate (ISBN-10) is found, the check digit should validate. A NANP phone number should fail in nearly 100% of cases; the 317-572-3993 number is one of those edge cases.
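
For reference, the ISBN-10 check digit test can be sketched like this. It is the standard mod 11 check, with the function name `is_valid_isbn10` chosen just for this example, and it is not taken from the plugin's source. It is applied to the numbers from the log above.

```python
def is_valid_isbn10(candidate: str) -> bool:
    """Return True if candidate passes the ISBN-10 mod 11 check."""
    chars = candidate.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "0123456789":
            value = int(char)
        elif char == "X" and position == 9:
            value = 10  # X is only allowed as the check digit
        else:
            return False
        # Weights run from 10 for the first character down to 1 for the last.
        total += (10 - position) * value
    return total % 11 == 0


for number in ("877-762-2974", "317-572-3993", "317-572-4002", "0-471-73278-8"):
    print(number, is_valid_isbn10(number))
```

Of the three phone and fax numbers only 317-572-3993 passes, which matches the "Valid ISBN10: 3175723993" line in the log; the real ISBN-10 passes as well.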
theducks is offline   Reply With Quote
Old 10-05-2012, 05:45 PM   #249
myce
Member
myce began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Oct 2012
Device: Sony PRS-T1
Quote:
Originally Posted by theducks View Post
IMHO only 1 parse rule at a time should be used. the last 2 broke that rule and therefore failed to find a valid ISBN. Space or Dash, not both in the same substring
Well, yes and no. Had the publisher decided to use spaces instead of dashes, your suggestion would still find the number 13 978 0471 73278 5, which wouldn't validate without checking all of its 13-digit substrings.
myce is offline   Reply With Quote
Old 10-05-2012, 06:25 PM   #250
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 14,251
Karma: 5495470
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by myce View Post
Well, yes and no. Had the publisher decided to use spaces instead of dashes, your suggestion would still find the number 13 978 0471 73278 5 which wouldn't be valid without parsing all substrings of 13 digits length.
have you seen a book written ISBN 10 or ISBN 13 ?

ISBN and ISBN13 are more normal (ISBN 10 is redundant. ISBN is 10 chars)
theducks is offline   Reply With Quote
Old 10-06-2012, 01:43 PM   #251
myce
Member
myce began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Oct 2012
Device: Sony PRS-T1
Quote:
Originally Posted by theducks View Post
have you seen a book written ISBN 10 or ISBN 13 ?
The blank or dash before the number isn't relevant. The regexp will (should) start matching at the first digit.
Quote:
Originally Posted by theducks View Post
ISBN and ISBN13 are more normal (ISBN 10 is redundant. ISBN is 10 chars)
You are right in saying it is redundant. That doesn't mean it won't be used by publishers.
myce is offline   Reply With Quote
Old 10-06-2012, 02:29 PM   #252
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 14,251
Karma: 5495470
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by myce View Post
The blank or dash before the number isn't relevant. The regexp will (should) start matching at the first digit.
No it should start at the first Matching pattern:
The pattern should attempt to match starting at the first digit.
\d+\-\d+\-\d+\-(\d|X)
is one pattern, the first digit pair followed by a space should not be included in the match because it violates this pattern (dash separator)

Now if they had used spaces everywhere, you are correct that the match would have started with the 10 (because we don't know where the line ends, we can't tell that there were more digits than the pattern could capture if it had not started with the wrong number group).

Because of all the various ways of printing the ISBN, lots of post-capture validation needs to be done (the 317-572-3993 phone number managed to MOD11 validate, a fairly rare case).
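
To illustrate, here is the dash-only pattern from this post run over the excerpt quoted earlier in the thread. It uses Python's `re` module purely as an example; the variable names are arbitrary and this is not the plugin's code.

```python
import re

# The dash-separated pattern exactly as written above.
DASHED = re.compile(r"\d+\-\d+\-\d+\-(\d|X)")

text = (
    "Department within the U.S. at 877-762-2974, outside the U.S. at "
    "317-572-3993 or fax 317-572-4002.\n"
    "ISBN-13 978-0471-73278-5\n"
    "ISBN-10 0-471-73278-8\n"
)

# group(0) is the whole match; the capturing group holds only the last character.
matches = [m.group(0) for m in DASHED.finditer(text)]
print(matches)
```

On this text the pattern picks up 978-0471-73278-5 and 0-471-73278-8. The phone numbers contain only two dashes, so they never match, and the "13 " and "10 " prefixes are left out because a space cannot stand in for a dash.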
theducks is offline   Reply With Quote
Old 10-07-2012, 07:07 AM   #253
myce
Member
myce began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Oct 2012
Device: Sony PRS-T1
Quote:
Originally Posted by theducks View Post
No it should start at the first Matching pattern:
The pattern should attempt to match starting at the first digit.
\d+\-\d+\-\d+\-(\d|X)
is one pattern, the first digit pair followed by a space should not be included in the match because it violates this pattern (dash separator)
That is what I tried to express with the "should" in parentheses. At first I wrote "will" because I would have implemented the pattern with a leading digit. Then I saw from the log file that it is currently implemented differently, because the match started with the dash.

The pattern you suggested will only match if the ISBN contains exactly three dashes. I'd implement something along the lines of \d(\d| |-)+(\d|X) and then validate all substrings consisting of 10 and 13 digits.
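
A minimal sketch of that idea, under a few assumptions of my own: the helper names are invented, 13-digit windows must start with 978 or 979, and 10-digit windows are only tried when no 13-digit window validates. None of this is the plugin's actual implementation.

```python
import re

# Loose candidate pattern from above; re.ASCII keeps \d to the digits 0-9.
CANDIDATE = re.compile(r"\d(\d| |-)+(\d|X)", re.ASCII)


def isbn10_ok(chars):
    """Mod 11 check; X may only appear as the final check character."""
    if len(chars) != 10 or "X" in chars[:9]:
        return False
    values = [10 if c == "X" else int(c) for c in chars]
    return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0


def isbn13_ok(chars):
    """Mod 10 check with weights 1, 3, 1, 3, ... and a 978/979 prefix."""
    if len(chars) != 13 or "X" in chars or not chars.startswith(("978", "979")):
        return False
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars)) % 10 == 0


def windows(chars, size):
    """All contiguous substrings of the given size."""
    return [chars[i:i + size] for i in range(len(chars) - size + 1)]


def find_isbns(text):
    found = []
    for match in CANDIDATE.finditer(text):
        chars = re.sub(r"[ -]", "", match.group(0))
        hits = [w for w in windows(chars, 13) if isbn13_ok(w)]
        if not hits:  # only fall back to ISBN-10 when no ISBN-13 validated
            hits = [w for w in windows(chars, 10) if isbn10_ok(w)]
        for hit in hits:
            if hit not in found:
                found.append(hit)
    return found


print(find_isbns("ISBN-13 978-0471-73278-5\nISBN-10 0-471-73278-8"))
```

The ISBN-10 fallback is skipped once a 13-digit window validates because a 10-digit window inside a longer number can pass the mod 11 check by accident: here 8047173278, a slice of the ISBN-13 line, does. A phone number such as 317-572-3993 would still be accepted by this approach, so it addresses the "Invalid ISBN match" lines in the log but not that false positive.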

But I fail to see why we are discussing the best way to implement this. I could see the point in a discussion with the maintainer of the plugin, but that would be kiwidude. It's his decision how he writes his plugin. I just wanted to point out a case in which the current implementation fails.
myce is offline   Reply With Quote
Old 10-09-2012, 11:29 AM   #254
RotAnal
Enthusiast
RotAnal can extract oil from cheese
 
Posts: 37
Karma: 1234
Join Date: Sep 2012
Device: Onyx Boox M92
I have noticed that during the execution of the Extract ISBN plugin certain attempts take much more time than others (the execution seems to hang at a particular % while CPU usage rises), thus slowing down the whole search process.
Afterwards, some of the files processed invariably fail to return any ISBN and one has to extract them manually anyway.
Therefore I wonder if the developer could add an adjustable timeout in the GUI to limit the time wasted on a single search that is very likely to fail.
RotAnal is offline   Reply With Quote
Old 10-10-2012, 05:09 AM   #255
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
@RotAnal - the majority of any computation involved in terms of "slowdown" is converting the book to a format it can extract the ISBN from. If your book is an EPUB (or indeed a PDF) then the performance will be as good as it is going to get. For any other format it must convert to EPUB behind the scenes, which is where the lag comes from.

The actual time taken searching for ISBNs is minuscule by comparison, and it is already optimised to only search a small proportion of pages at the front and back of the book rather than the whole thing.
kiwidude is offline   Reply With Quote