@Noobish and @kiwidude:
Thanks for the information. However, what I mean is this:
Here is the example I mentioned. The PDF content is:
Tel: ... (0)1279 623623
Fax: ... (0)1279 431059
...
...
ISBN 0 273 65667 8
...
Then, from the ISBN extraction log:
Invalid ISBN match: 1279 623623
Valid ISBN10: 1279431059
Invalid ISBN match: ...
Invalid ISBN match: ...
Valid ISBN10: 0273656678
Invalid ISBN match: ...
...
New ISBN extracted of: 1279431059
This means it uses the first 'Valid ISBN10' it finds and ignores the second, 'true' ISBN. That makes sense, but unfortunately the first match here is the fax number, which happens to pass the ISBN-10 check without being an ISBN.
Another eBook from the same publisher:
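To illustrate why the fax number gets through, here is a minimal sketch of the standard ISBN-10 checksum (weights 10 down to 1, sum divisible by 11). This is not the plugin's actual code; the function name is my own and only the checksum rule itself is standard.

```python
def is_valid_isbn10(candidate: str) -> bool:
    """Return True if the candidate passes the ISBN-10 checksum.

    Hypothetical helper for illustration; spaces and hyphens are ignored.
    """
    chars = [c for c in candidate if c not in " -"]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in "0123456789":
            value = int(char)
        elif char in "xX" and position == 9:
            value = 10  # 'X' is only allowed as the check digit
        else:
            return False
        # Weights run from 10 (first digit) down to 1 (check digit).
        total += (10 - position) * value
    return total % 11 == 0


for number in ("1279 623623", "1279431059", "0 273 65667 8"):
    print(number, is_valid_isbn10(number))
```

The fax number 1279431059 gives a weighted sum of 209 = 11 x 19, so it passes by coincidence, just as the real ISBN 0273656678 does (220 = 11 x 20). The checksum alone cannot tell them apart.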
Tel: ... (0)1279 623623
Fax: ... (0)1279 431059
...
...
ISBN: 978-0-273-71492-7
...
And the log:
Invalid ISBN match: 1279 623623
Valid ISBN10: 1279431059
Valid ISBN13: 9780273714927
Invalid ISBN match: ...
Invalid ISBN match: ...
Invalid ISBN match: ...
...
New ISBN extracted of: 9780273714927
Here it prioritizes the ISBN13, so it works well.
But, as kiwidude said, there are "false positives" when searching for "ISBN", hence my proposal to check against online resources, although that would increase the complexity of the plugin. Perhaps selecting or rejecting among the 'Valid ISBN' matches, depending on whether the keyword 'ISBN' precedes them, could improve the plugin without much development effort.
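The keyword idea could look roughly like the sketch below. It is only an assumption of how such a filter might work, not the plugin's code: the pattern, the function name and the fallback to the first valid candidate are all my own, and it expects the list of already validated numbers as input.

```python
import re

# Hypothetical pattern: the keyword "ISBN" (optionally "-10"/"-13" and a colon),
# then a run of digits, spaces and hyphens ending in a digit or X.
LABELLED_ISBN = re.compile(r"ISBN(?:-1[03])?:?\s*([0-9][0-9 -]{8,15}[0-9Xx])")


def pick_labelled_isbn(text, valid_candidates):
    """Prefer a valid candidate that directly follows the keyword 'ISBN'.

    valid_candidates: normalized numbers that already passed the checksum,
    in the order they were found in the text.
    """
    for match in LABELLED_ISBN.finditer(text):
        normalized = re.sub(r"[ -]", "", match.group(1)).upper()
        if normalized in valid_candidates:
            return normalized
    # Assumed fallback: no labelled match, so keep the first valid candidate.
    return valid_candidates[0] if valid_candidates else None


page = "Tel: ... (0)1279 623623\nFax: ... (0)1279 431059\nISBN 0 273 65667 8\n"
print(pick_labelled_isbn(page, ["1279431059", "0273656678"]))
```

With the first PDF above, this would select 0273656678 instead of the fax number, because only that candidate follows the keyword.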
By the way, these books are true PDFs, so the plugin can read them.
Anyway, thanks for the plugin. We can work with it even if it stays as it is.