10-05-2012, 09:51 AM   #248
theducks
Well trained by Cats
Posts: 29,801
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by myce
Extract ISBN is really great at extracting ISBNs from a book's text. But this made it stumble.

From "The Definitive Guide to How Computers Do Math: Featuring the Virtual Diy Calculator" page 2:
Code:
For general information on our other products and services please contact our Customer Care
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data is available.
ISBN-13 978-0471-73278-5
ISBN-10 0-471-73278-8
results in the log file:
Code:
      Invalid ISBN match: 877-762-2974 
      Valid ISBN10: 3175723993 
      Invalid ISBN match: 317-572-4002 
      Invalid ISBN match: -13 978-0471-73278 
      Invalid ISBN match: -10 0-471-73278-8
I understand that it detects 3175723993 as a valid ISBN. But maybe you could make it reparse substrings if the number it found is longer than 10/13 digits. Or maybe even look for the string ISBN.{,3}1[03] explicitly and give the numbers in its vicinity higher precedence.
IMHO only one parse rule should be used at a time. The last two matches broke that rule and therefore failed to find a valid ISBN. Separators should be spaces or dashes, not both in the same substring.
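
To illustrate the "one separator style per candidate" idea, here is a minimal sketch. It is not the plugin's code; the pattern and the name single_separator_candidates are my own assumptions. It only collects candidates and does not validate check digits.

```python
import re

# Hypothetical pattern (not taken from the plugin): a run of digits,
# dashes and spaces that does not start right after a digit or a dash,
# so the "13" in "ISBN-13" is never used as a starting point.
CANDIDATE = re.compile(r"(?<![\d-])\d[\d -]{8,15}[\dXx](?![\d-])")

def single_separator_candidates(text):
    """Return 10- or 13-character candidates that use dashes OR spaces, never both."""
    found = []
    for match in CANDIDATE.finditer(text):
        raw = match.group()
        if "-" in raw and " " in raw:
            continue  # mixed separators: reject instead of guessing
        digits = re.sub(r"[ -]", "", raw)
        if len(digits) in (10, 13):
            found.append(digits)
    return found
```

With this rule, "ISBN-13 978-0471-73278-5" yields the single candidate 9780471732785, while a string such as "978 0471-73278-5" is rejected outright. Phone numbers still come through as candidates, so a check-digit test is needed afterwards.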

Once a candidate is found (10 characters for an ISBN-10), the check digit should validate. A NANP phone number should fail that check in the large majority of cases; the 317-572-3993 number is one of those edge cases that happens to pass.
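
For reference, a minimal sketch of the standard check-digit tests (ISBN-10 uses weights 10 down to 1, modulo 11; ISBN-13 uses alternating weights 1 and 3, modulo 10). The function names are mine, not the plugin's, and the input is assumed to have separators already stripped.

```python
ASCII_DIGITS = set("0123456789")

def is_valid_isbn10(s):
    """ISBN-10: weighted sum with weights 10..1 must be divisible by 11."""
    if len(s) != 10 or not set(s[:9]) <= ASCII_DIGITS:
        return False
    if s[9] not in ASCII_DIGITS and s[9] not in "Xx":
        return False
    total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))
    total += 10 if s[9] in "Xx" else int(s[9])  # 'X' stands for 10
    return total % 11 == 0

def is_valid_isbn13(s):
    """ISBN-13: alternating weights 1, 3 and the sum must be divisible by 10."""
    if len(s) != 13 or not set(s) <= ASCII_DIGITS:
        return False
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0
```

Run against the numbers from the quoted page: 0471732788 and 9780471732785 pass, 8777622974 and 3175724002 fail, and 3175723993 passes as an ISBN-10 by coincidence, which matches the log above.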