An update.
The ISBN check is indeed used and functions well, I see.
I have made this version.
It runs about twice as fast as the original.
I scanned 600 EPUBs that had no ISBN in their metadata (I did not check whether an ISBN was present inside the file).
I got 100 new ISBN numbers.
Seems nice, BUT:
I had 2 numbers that were valid ISBNs but were not the book's own.
There were ISBN numbers in the file, but the numbers I found were only there because of a bad EPUB conversion.
You cannot use \d; you have to use [0-9], because with \d calibre freezes on some files.
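To illustrate the point, here is a sketch of a candidate-matching regex built with explicit [0-9] classes instead of \d. This is my own illustrative pattern, not the plugin's actual regex; it only finds candidates, and validity still has to be checked separately.

```python
import re

# Illustrative pattern (not the plugin's exact regex): match 10- or 13-digit
# ISBN candidates using an explicit [0-9] class instead of \d, since \d
# reportedly made calibre freeze on some files.
ISBN_PATTERN = re.compile(
    r'(?<![0-9])'             # not preceded by a digit
    r'(97[89][0-9]{10}'       # ISBN-13 candidate starting 978/979
    r'|[0-9]{9}[0-9Xx])'      # ISBN-10 candidate, last char may be X
    r'(?![0-9])'              # not followed by a digit
)
```

The lookarounds keep the pattern from matching ten digits out of the middle of a longer number.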
I have some trouble with multi-line matches.
I can detect:
NUR 123
ISBN 1234567890
and
NUR 123
ISBN 123 456.78
9
0
and
123 456.789
0
but NOT
NUR 123
1234567890
In this case 1231234567 is returned as a possible ISBN and rejected as invalid.
(EDIT: added the 7; of course I do not get 213123456..)
Maybe someone can find a solution?
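One possible way around the "NUR 123" case (my own sketch, untested inside the plugin, with hypothetical helper names): when a digit run with separators matches but fails the checksum, slide the search one character to the right and try again, instead of discarding the whole run.

```python
import re

# An ISBN-10-length run of digits that may have whitespace, '-' or '.'
# between the digits (so a number split across lines still matches).
ISBN10_RUN = re.compile(r'(?:[0-9][\s.-]*){9}[0-9Xx]')

def isbn10_checksum_ok(isbn):
    # ISBN-10 check: weighted sum with weights 10..1 must be 0 mod 11;
    # a final 'X' counts as 10.
    total = sum((10 - i) * (10 if ch in 'Xx' else int(ch))
                for i, ch in enumerate(isbn))
    return total % 11 == 0

def find_isbn10(text):
    pos = 0
    while True:
        m = ISBN10_RUN.search(text, pos)
        if m is None:
            return None
        candidate = re.sub(r'[\s.-]', '', m.group(0))
        if isbn10_checksum_ok(candidate):
            return candidate
        # False start (e.g. the NUR digits bled into the match):
        # retry one character further right instead of giving up.
        pos = m.start() + 1
```

With "NUR 123" followed by a real ISBN on the next line, the first candidate (starting at the NUR digits) fails the checksum, and the retry loop walks forward until the match starts at the actual ISBN.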
I built in some restrictions to avoid problems.
A run of 13 or 10 zeros passes the checksum, but you don't want to extract that.
I also test whether ISBN-13 candidates start with 978 or 979. If not, I don't even test their validity.
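Those two restrictions plus the checksum can be sketched as a single pre-filter. This is a hypothetical helper for illustration, not the plugin's actual code:

```python
def looks_like_isbn13(digits):
    """Pre-filter plus checksum for a 13-digit candidate (illustrative).

    Candidates that do not start with 978/979 are rejected before the
    checksum is even computed, and an all-zeros string is rejected even
    though its checksum happens to be valid.
    """
    if len(digits) != 13 or not digits.isdigit():
        return False
    if not digits.startswith(('978', '979')):
        return False
    if digits == '0' * 13:
        return False
    # ISBN-13 checksum: alternating weights 1 and 3, total must be 0 mod 10
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits))
    return total % 10 == 0
```

The cheap prefix test runs first, so most junk digit runs never reach the arithmetic.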
I'm a bad programmer when it comes to changelogs, but here is some log info:
I changed extract_isbn_code
Added strings at the top of the file
Changed the regex
Changed look_for_isbn_in_text
I'm not a Python programmer, so if someone knows a better way to do the txt.replace (stripping all whitespace, including \n and \r, and removing - and .), let me know.
On the other hand, I have sometimes put an ISBN including hyphens into the meta-info and calibre updated the info itself, so maybe only \n and \r need to be removed?
(In that case you don't even have to (and can't) test for ISBN-10 vs ISBN-13, so it should go even faster.)
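For the stripping question above, a common Python idiom is a single re.sub with a character class, rather than chained str.replace calls. A minimal sketch of both variants, assuming the goal is exactly what the text describes:

```python
import re

raw = "978-90\n123.456 7\r8"

# Strip all whitespace (including \n and \r) plus '-' and '.' in one pass
cleaned = re.sub(r'[\s.-]', '', raw)

# Or, keeping hyphens and dots so calibre can normalise the ISBN itself,
# remove only the line breaks:
no_breaks = raw.replace('\n', '').replace('\r', '')
```

The `[\s.-]` class covers spaces, tabs, \n and \r via \s, so no separate replace per character is needed.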
I also included a PDF with the legal ISBN ranges. If you add that check next to the validity check, you're 99.99999% sure it is an ISBN number.
Last edited by kiwidude; 05-28-2012 at 10:34 AM.
Reason: Remove attachment so others do not get confused