Old 03-31-2011, 08:16 AM   #33
drMerry
An update.

I see that the ISBN check is indeed used and works well.
I have made this version.
It runs 2 times faster than the original.
I scanned 600 epubs that had no ISBN set (I did not check whether there was an ISBN inside them).
I got 100 new ISBN numbers.

Seems nice, BUT:
I got 2 numbers that pass validation but are not real ISBNs.
There were ISBNs in those files, but the numbers I found were there because of a bad epub conversion.

You cannot use \d; you have to use 0-9, because with \d calibre freezes on some files.
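
To illustrate the 0-9 point, here is a minimal sketch of an ISBN-10 style pattern written with [0-9] only. The names SEP, ISBN10_RE and first_isbn10_like are made up for this example and are not the plugin's actual code. Separator runs are capped at 3 characters as an assumption, to keep backtracking bounded; whether that avoids the freeze described above is untested.

```python
import re

# Hypothetical pattern, NOT the regex used in the plugin.
# [0-9] is used instead of \d, and each separator run is limited to
# at most 3 characters (space, dot, hyphen, CR, LF).
SEP = r"[ .\-\r\n]{0,3}"
ISBN10_RE = re.compile(r"[0-9](?:" + SEP + r"[0-9]){8}" + SEP + r"[0-9Xx]")

def first_isbn10_like(text):
    """Return the first 10-character candidate with separators removed, or None."""
    match = ISBN10_RE.search(text)
    if match is None:
        return None
    return re.sub(r"[ .\-\r\n]", "", match.group(0))
```

A pattern like this picks up a number that is split over several lines, but it only produces candidates; each one still has to go through the validity check.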
I have some trouble with multi-line matching.

I can detect:

NUR 123
ISBN 1234567890

and

NUR 123
ISBN 123 456.78
9

0

and

123 456.789

0

but NOT
NUR 123
1234567890

In this case 1231234567 is returned as a possible ISBN and fails validation.
(EDIT: added the 7; of course I do not get 213123456..)

Maybe someone can find a solution?
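
One possible approach, as a sketch only: instead of gluing all digits together first, collect the separate digit groups and try every run of neighbouring groups that adds up to 10 or 13 characters, keeping only the ones that pass the checksum. All function names here (isbn10_ok, isbn13_ok, find_isbns) are invented for the example, it assumes Python 3 (re.fullmatch), and the 978/979 test is folded into isbn13_ok. It has not been tried inside calibre.

```python
import re

# Characters allowed between two digit groups of the same ISBN (assumption).
GAP_RE = re.compile(r"[ \t\r\n.\-]*")

def isbn10_ok(s):
    """ISBN-10 checksum: weights 10..1, total must be divisible by 11."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", s):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0

def isbn13_ok(s):
    """ISBN-13 checksum (weights 1,3,1,3,...) plus the 978/979 prefix test."""
    if not re.fullmatch(r"97[89][0-9]{10}", s):
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s))
    return total % 10 == 0

def find_isbns(text):
    """Try every run of adjacent digit groups and keep the valid ones."""
    groups = list(re.finditer(r"[0-9]+[Xx]?", text))
    found = []
    for i in range(len(groups)):
        joined = ""
        for j in range(i, len(groups)):
            if j > i:
                gap = text[groups[j - 1].end():groups[j].start()]
                if not GAP_RE.fullmatch(gap):
                    break  # letters or other text in between: stop joining
            joined += groups[j].group(0).upper()
            if len(joined) > 13:
                break
            if isbn10_ok(joined) or isbn13_ok(joined):
                found.append(joined)
    return found
```

With this, a leading "NUR 123" no longer poisons the result: the run starting at 123 is tried and rejected, and the run starting at the next group is tried on its own. The price is that more candidates are checked, so it may be slower than a single regex.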

I built in some restrictions to avoid some problems.
13 or 10 zeros is a valid ISBN, but you don't want to extract that.
I also test whether 13-digit ISBNs start with 978 or 979. If not, I do not even test validity.
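
The restrictions above can be sketched as a cheap screening step that runs before the checksum. The function name worth_validating is hypothetical and the input is assumed to be a candidate with all separators already removed; this is an illustration, not the code in the attached version.

```python
def worth_validating(candidate):
    """Cheap screening before the checksum test (illustrative only)."""
    # Only 10- and 13-character candidates can be ISBNs.
    if len(candidate) not in (10, 13):
        return False
    # All zeros passes both checksums, but is never a real ISBN.
    if set(candidate) == {"0"}:
        return False
    # 13-digit ISBNs must start with 978 or 979.
    if len(candidate) == 13 and not candidate.startswith(("978", "979")):
        return False
    return True
```

Only candidates for which this returns True need to go on to the validity check.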

I'm bad at keeping a changelog, so here is some log info:
I changed extract_isbn_code
Added strings on top of the file
changed the regex
changed loor_for_isbn_in_text

I'm not a Python programmer, so if someone knows a better way to do the txt.replace (stripping all whitespace, including \n and \r, and removing - and .), please let me know.

On the other hand, I have sometimes put an ISBN including - into the metadata, and calibre updated the info itself, so maybe only \n and \r need to be removed?
(In that case you don't even have to, and can't, test for 10 / 13 digits, so it should go even faster.)
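
For the txt.replace question, one option is a single re.sub instead of a chain of replace calls. A rough sketch, with invented helper names (compact, drop_line_breaks), covering both variants discussed above:

```python
import re

# Whitespace (including \n and \r), dots and hyphens.
_JUNK_RE = re.compile(r"[\s.\-]+")

def compact(txt):
    """Strip all whitespace, '-' and '.' in one pass."""
    return _JUNK_RE.sub("", txt)

def drop_line_breaks(txt):
    """Lighter variant: remove only \\r and \\n, keep hyphens and dots."""
    return txt.replace("\r", "").replace("\n", "")
```

For example, compact("123 456.78\n9\n\n0") gives "1234567890", while drop_line_breaks leaves the hyphens in place for calibre to handle. Whether calibre accepts every hyphenated form is an assumption based on the observation above, not something checked here.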

I also included a PDF with the legal ISBN ranges. If you add this check next to the validity check, you're 99.99999% sure it is an ISBN.
Attached Files
File Type: pdf RangeMessage.pdf (1.27 MB, 786 views)

Last edited by kiwidude; 05-28-2012 at 10:34 AM. Reason: Remove attachment so others do not get confused