MobileRead Forums - [Old Thread] Auto Extract ISBN - Feature request

TMF · 07-11-2009, 02:28 PM

Today, I opened a ticket with a similar feature request at http://calibre.kovidgoyal.net/ticket/2822:

"When importing PDF e-books, the ISBN is usually not part of the PDF metadata, but can be found on the copyright page (the page with the publication information, printing history, cataloguing information etc.) Most often it is provided on a line of its own in the form of "ISBN: xxxx", "ISBN-13: xxxx", "ISBN (hardcover): xxxx" or similar.

I am proposing an enhancement that would load the text of the first 10 or so pages of a PDF and search it for ISBN lines of this kind using a user-configurable regex. If several matches are found (e.g. "ISBN-10", "ISBN-13" and "eISBN-10"), the user could be shown a dialog to select one. This function would be invoked during the "Download metadata" process for books that don't already have an ISBN in their metadata, or it could be invoked manually from the "Edit metadata" dialog for individual books.

This enhancement would greatly improve the automation of metadata entry for PDF files, because with an ISBN the "Fetch metadata from server" function will always provide the correct result, whereas when it relies only on author and title, it will often yield ambiguous or wrong results."
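The ticket leaves the actual regex up to the user. As a minimal sketch in Python (the language calibre is written in), a default pattern for the line formats quoted above might look like this. The pattern, the `find_isbn_lines` function and the sample copyright page are hypothetical illustrations, not anything that exists in calibre.

```python
import re

# Hypothetical default pattern (an assumption, not taken from calibre) for the
# label styles quoted in the ticket: "ISBN: ...", "ISBN-13: ...",
# "ISBN (hardcover): ...", plus the "eISBN-10" variant. The ISBN must sit on a
# line of its own: label, optional qualifier in parentheses, optional colon,
# then a 10- or 13-character number that may be split by hyphens or spaces.
ISBN_LINE = re.compile(
    r"^[ \t]*(?P<label>e?ISBN(?:-1[03])?[ \t]*(?:\([^)\n]*\))?)[ \t]*:?[ \t]*"
    r"(?P<number>(?:97[89][- ]?)?(?:\d[- ]?){9}[\dX])[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_isbn_lines(text):
    """Return (label, isbn) pairs for each line that holds only a labelled ISBN."""
    results = []
    for match in ISBN_LINE.finditer(text):
        label = match.group("label").strip()
        # Remove separators so "0-596-52068-9" becomes "0596520689".
        isbn = re.sub(r"[- ]", "", match.group("number")).upper()
        results.append((label, isbn))
    return results


# Made-up copyright page text for illustration.
copyright_page = (
    "Copyright 2009\n"
    "ISBN: 0-596-52068-9\n"
    "ISBN-13: 978-0-596-52068-7\n"
    "ISBN (hardcover): 978 0 596 52068 7\n"
    "Printed in the United States of America\n"
)
for label, isbn in find_isbn_lines(copyright_page):
    print(label, isbn)
```

Anchoring the match to a whole line follows the observation that the ISBN is usually printed on a line of its own; it keeps ISBNs mentioned in running text (e.g. in a bibliography) from being picked up.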
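The search step of the proposal (scan the first 10 or so pages, apply the user's regex, collect every hit, let the user pick if there is more than one) could be sketched roughly as below. This assumes the page text has already been extracted from the PDF by some other tool, which is not shown; `find_isbn_candidates`, `isbn_checksum_ok` and the sample pages are hypothetical. The check-digit test is an addition that is not in the ticket: it discards misprinted or mis-extracted numbers before the user is asked to choose.

```python
import re


def isbn_checksum_ok(isbn):
    """Check the ISBN-10 or ISBN-13 check digit of a separator-free string."""
    if re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        # ISBN-10: weights 10..1, "X" stands for 10, sum must be divisible by 11.
        values = [10 if c == "X" else int(c) for c in isbn]
        return sum((10 - i) * v for i, v in enumerate(values)) % 11 == 0
    if re.fullmatch(r"97[89][0-9]{10}", isbn):
        # ISBN-13: alternating weights 1 and 3, sum must be divisible by 10.
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn)) % 10 == 0
    return False


def find_isbn_candidates(page_texts, pattern, max_pages=10):
    """Return unique, checksum-valid ISBNs found in the first pages of a book.

    page_texts: one string per page (text extraction assumed to happen
    elsewhere). pattern: user-configurable regex whose first group captures
    the number.
    """
    regex = re.compile(pattern, re.IGNORECASE | re.MULTILINE)
    candidates = []
    for text in page_texts[:max_pages]:
        for match in regex.finditer(text):
            isbn = re.sub(r"[^0-9X]", "", match.group(1).upper())
            # Drop bad check digits and repeats, keep the order found.
            if isbn_checksum_ok(isbn) and isbn not in candidates:
                candidates.append(isbn)
    return candidates


# Made-up page texts standing in for extracted PDF text.
pages = [
    "A Book Title\nby Some Author\n",
    "Copyright 2009\n"
    "ISBN-10: 0-596-52068-9\n"
    "ISBN-13: 978-0-596-52068-7\n"
    "ISBN: 0-596-52068-0\n",
]
found = find_isbn_candidates(pages, r"^\s*e?ISBN[^:\n]*:\s*([0-9Xx][0-9Xx -]+)$")
if len(found) == 1:
    print("Use", found[0])
elif found:
    print("Ask the user to choose from:", found)  # stands in for a selection dialog
else:
    print("No ISBN found; fall back to author/title lookup")
```

Here the third line fails the check digit and is dropped, leaving the ISBN-10 and ISBN-13 of the same edition, so the code reaches the branch where the user would be asked to pick one, as the ticket suggests.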
