An update.
The ISBN check is indeed used and functions well, I see.
I have made this version.
It runs about twice as fast as the original.
I scanned 600 EPUBs that had no ISBN in their metadata (I did not check whether an ISBN was present inside the file).
I got 100 new ISBN numbers.
Seems nice, BUT:
I had 2 numbers that were valid ISBNs but were not the book's own.
There were ISBN numbers in the file, but the numbers I found were only there because of a bad EPUB conversion.
You cannot use \d; you have to use [0-9], because with \d calibre freezes on some files.
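To illustrate the point, here is a sketch of a candidate-matching regex built with explicit [0-9] classes instead of \d. This is my own illustrative pattern, not the plugin's actual regex; it only finds candidates, and validity still has to be checked separately.

```python
import re

# Illustrative pattern (not the plugin's exact regex): match 10- or 13-digit
# ISBN candidates using an explicit [0-9] class instead of \d, since \d
# reportedly made calibre freeze on some files.
ISBN_PATTERN = re.compile(
    r'(?<![0-9])'             # not preceded by a digit
    r'(97[89][0-9]{10}'       # ISBN-13 candidate starting 978/979
    r'|[0-9]{9}[0-9Xx])'      # ISBN-10 candidate, last char may be X
    r'(?![0-9])'              # not followed by a digit
)
```

The lookarounds keep the pattern from matching ten digits out of the middle of a longer number.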
I have some trouble with multi-line matches.
I can detect:
NUR 123
ISBN 1234567890
and
NUR 123
ISBN 123 456.78
9
0
and
123 456.789
0
but NOT
NUR 123
1234567890
In this case 1231234567 is returned as a possible ISBN and rejected as invalid.
(EDIT: added the 7; of course I do not get 213123456..)
Maybe someone can find a solution?
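One possible way around the "NUR 123" case (my own sketch, untested inside the plugin, with hypothetical helper names): when a digit run with separators matches but fails the checksum, slide the search one character to the right and try again, instead of discarding the whole run.

```python
import re

# An ISBN-10-length run of digits that may have whitespace, '-' or '.'
# between the digits (so a number split across lines still matches).
ISBN10_RUN = re.compile(r'(?:[0-9][\s.-]*){9}[0-9Xx]')

def isbn10_checksum_ok(isbn):
    # ISBN-10 check: weighted sum with weights 10..1 must be 0 mod 11;
    # a final 'X' counts as 10.
    total = sum((10 - i) * (10 if ch in 'Xx' else int(ch))
                for i, ch in enumerate(isbn))
    return total % 11 == 0

def find_isbn10(text):
    pos = 0
    while True:
        m = ISBN10_RUN.search(text, pos)
        if m is None:
            return None
        candidate = re.sub(r'[\s.-]', '', m.group(0))
        if isbn10_checksum_ok(candidate):
            return candidate
        # False start (e.g. the NUR digits bled into the match):
        # retry one character further right instead of giving up.
        pos = m.start() + 1
```

With "NUR 123" followed by a real ISBN on the next line, the first candidate (starting at the NUR digits) fails the checksum, and the retry loop walks forward until the match starts at the actual ISBN.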
I built in some restrictions to avoid problems.
A run of 13 or 10 zeros passes the checksum, but you don't want to extract that.
I also test whether ISBN-13 candidates start with 978 or 979. If not, I don't even test their validity.
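Those two restrictions plus the checksum can be sketched as a single pre-filter. This is a hypothetical helper for illustration, not the plugin's actual code:

```python
def looks_like_isbn13(digits):
    """Pre-filter plus checksum for a 13-digit candidate (illustrative).

    Candidates that do not start with 978/979 are rejected before the
    checksum is even computed, and an all-zeros string is rejected even
    though its checksum happens to be valid.
    """
    if len(digits) != 13 or not digits.isdigit():
        return False
    if not digits.startswith(('978', '979')):
        return False
    if digits == '0' * 13:
        return False
    # ISBN-13 checksum: alternating weights 1 and 3, total must be 0 mod 10
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits))
    return total % 10 == 0
```

The cheap prefix test runs first, so most junk digit runs never reach the arithmetic.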
I'm a bad programmer when it comes to changelogs, but here is some log info:
I changed extract_isbn_code
Added strings at the top of the file
Changed the regex
Changed look_for_isbn_in_text
I'm not a Python programmer, so if someone knows a better way to do the txt.replace (stripping all whitespace, including \n and \r, and removing - and .), let me know.
On the other hand, I have sometimes put an ISBN including hyphens into the meta-info and calibre updated the info itself, so maybe only \n and \r need to be removed?
(In that case you don't even have to (and can't) test for ISBN-10 vs ISBN-13, so it should go even faster.)
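For the stripping question above, a common Python idiom is a single re.sub with a character class, rather than chained str.replace calls. A minimal sketch of both variants, assuming the goal is exactly what the text describes:

```python
import re

raw = "978-90\n123.456 7\r8"

# Strip all whitespace (including \n and \r) plus '-' and '.' in one pass
cleaned = re.sub(r'[\s.-]', '', raw)

# Or, keeping hyphens and dots so calibre can normalise the ISBN itself,
# remove only the line breaks:
no_breaks = raw.replace('\n', '').replace('\r', '')
```

The `[\s.-]` class covers spaces, tabs, \n and \r via \s, so no separate replace per character is needed.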
I also included a PDF with the legal ISBN ranges. If you add that check next to the validity check, you're 99.99999% sure it is an ISBN number.
Last edited by kiwidude; 05-28-2012 at 10:34 AM.
Reason: Remove attachment so others do not get confused