Old 10-05-2012, 05:20 AM   #247
myce
Device: Sony PRS-T1
Extract ISBN is really great at extracting ISBNs from a book's text, but this one made it stumble.

From "The Definitive Guide to How Computers Do Math: Featuring the Virtual Diy Calculator" page 2:
Code:
For general information on our other products and services please contact our Customer Care
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.
Library of Congress Cataloging-in-Publication Data is available.
ISBN-13 978-0471-73278-5
ISBN-10 0-471-73278-8
results in the log file:
Code:
      Invalid ISBN match: 877-762-2974 
      Valid ISBN10: 3175723993 
      Invalid ISBN match: 317-572-4002 
      Invalid ISBN match: -13 978-0471-73278 
      Invalid ISBN match: -10 0-471-73278-8
I understand why it accepts 3175723993 as a valid ISBN: the phone number 317-572-3993 happens to pass the ISBN-10 checksum. But maybe you could make it reparse substrings when the number it found is longer than 10 or 13 digits. Or maybe even look for the string ISBN.{,3}1[03] explicitly and give the numbers in its vicinity higher precedence.
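To make the second suggestion concrete, here is a rough Python sketch of what I mean. It is not Extract ISBN's actual code; the names LABELLED, CANDIDATE, normalize, is_valid_isbn and find_isbns are all made up for illustration. It first looks for numbers that directly follow an "ISBN-10" / "ISBN-13" label and only falls back to unlabelled digit runs if no labelled number passes its checksum.

```python
import re

# Hypothetical sketch, not the plugin's real implementation.
# A label such as "ISBN-13" or "ISBN 10", then the number that follows it.
LABELLED = re.compile(r"ISBN.{0,3}(1[03])\s*:?\s*([0-9][0-9\- ]{8,15}[0-9Xx])")
# Any unlabelled run of digits, hyphens and spaces that could be an ISBN.
CANDIDATE = re.compile(r"[0-9][0-9\- ]{8,15}[0-9Xx]")


def normalize(raw):
    """Drop hyphens and spaces, keep digits and the ISBN-10 check character X."""
    return re.sub(r"[^0-9Xx]", "", raw).upper()


def is_valid_isbn(digits):
    """Check the ISBN-10 (mod 11) or ISBN-13 (mod 10) checksum."""
    if re.fullmatch(r"[0-9]{9}[0-9X]", digits):
        total = sum((10 - i) * (10 if c == "X" else int(c))
                    for i, c in enumerate(digits))
        return total % 11 == 0
    if re.fullmatch(r"[0-9]{13}", digits):
        total = sum(int(c) * (3 if i % 2 else 1)
                    for i, c in enumerate(digits))
        return total % 10 == 0
    return False


def find_isbns(text):
    """Return labelled ISBNs if any are valid, else valid unlabelled ones."""
    labelled = []
    for match in LABELLED.finditer(text):
        digits = normalize(match.group(2))
        # The label says how many digits to expect (10 or 13).
        if len(digits) == int(match.group(1)) and is_valid_isbn(digits):
            labelled.append(digits)
    if labelled:
        return labelled
    # Fallback: no usable label, so accept any run that passes a checksum.
    return [d for d in (normalize(m.group(0)) for m in CANDIDATE.finditer(text))
            if is_valid_isbn(d)]


sample = """Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.
ISBN-13 978-0471-73278-5
ISBN-10 0-471-73278-8"""

print(find_isbns(sample))
```

With the excerpt above this picks up the two labelled numbers and never looks at the phone numbers. Without the two ISBN lines it would still fall back to 3175723993, so the label only changes precedence, it does not reject checksum-valid numbers elsewhere.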