Old 02-18-2019, 04:04 PM   #1
Cynosarges
Junior Member
Posts: 8
Karma: 53566
Join Date: Mar 2017
Device: Kindle Touch
Preserving the hyphens in ISBN number

Is there any way in Calibre to preserve the hyphens in the ISBN in the Ids field?

I am raising this question after noticing the patchy quality of the Publisher field that Download Metadata obtains from Amazon, Google Books and OCLC WorldCat.

The ISBN has a formal structure: 978 (EAN prefix) - <Registration group> (approximates to language) - <Registrant> (Publisher/Imprint) - <Publication element> (Publisher's book/edition Id) - <Check digit>
This means that the ISBN encodes an authoritative identifier for publishers, not subject to the vagaries of data entry by Amazon, Google Books, or library staff.

This would allow identification and correction of Metadata errors, using Calibre's catalogue function to list publisher and ISBN, ordered by ISBN. However, as the elements are variable length, it is far easier to see the Registrant if the hyphens remain in the data.

I don't want to add a Custom column, as this would duplicate data. Currently, I am considering exporting a CSV file and writing an Excel macro to insert hyphens. However, this is an ugly fudge, and I wondered whether there was any way to stop Calibre removing the hyphens (and losing useful information in the process).
Old 02-18-2019, 06:14 PM   #2
theducks
Well trained by Cats
Posts: 29,798
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
The information is still there; you have only lost the implied parse (publisher and book number). A BPH gets 1 or 2 digits of publisher code, while a vanity press may get only 1 digit of book number.

Note ISBN 13 (EAN) does not have the check digit that ISBN10 does (0-9,X)
Old 02-18-2019, 06:52 PM   #3
gbm
Wizard
Posts: 2,082
Karma: 8796704
Join Date: Jun 2010
Device: Kobo Clara HD,Hisence Sero 7 Pro RIP, Nook STR, jetbook lite
Quote:
Originally Posted by Cynosarges View Post
Is there any way in Calibre to preserve the hyphens in the ISBN in the Ids field?

I just checked an EPUB downloaded directly from the publisher (a Big Five publisher), unzipped it using Archive Manager, checked the OPF file, and the ISBN has no hyphens.

bernie
Old 02-19-2019, 01:36 AM   #4
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No information is lost by removing hyphens. Hyphens are there simply for humans to read; the ISBN means the same thing with or without hyphens.
Old 02-19-2019, 04:31 AM   #5
pdurrant
The Grand Mouse 高貴的老鼠
Posts: 71,504
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
Quote:
Originally Posted by theducks View Post
Note ISBN 13 (EAN) does not have the check digit that ISBN10 does (0-9,X)
Oh yes it does! It's just not the same algorithm. But the last character of an ISBN-13 is a checksum.
Old 02-19-2019, 04:34 AM   #6
pdurrant
The Grand Mouse 高貴的老鼠
Quote:
Originally Posted by Cynosarges View Post
Is there any way in Calibre to preserve the hyphens in the ISBN in the Ids field?
To identify the publisher in an ISBN you must have a list of publisher prefixes. So long as you have the publisher prefix, you don't need the hyphens. Just match as many characters as there are in each specific publisher prefix.

Note that some big publishers have multiple prefixes, and some publishers will not be consistent in matching prefix to imprint.
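
A minimal sketch of that prefix matching, for illustration only. The table `PUBLISHER_PREFIXES`, its publisher names, and the function `find_publisher` are all hypothetical; a real table would have to be compiled by hand or from another source.

```python
from typing import Optional

# Hypothetical lookup table: un-hyphenated ISBN-13 prefix -> publisher name.
# The names are placeholders, not real assignments.
PUBLISHER_PREFIXES = {
    "9780684": "Example Publisher A",
    "97807653": "Example Publisher B",
}


def find_publisher(isbn: str, prefixes: dict) -> Optional[str]:
    """Return the publisher whose prefix starts the ISBN, or None."""
    digits = isbn.replace("-", "").replace(" ", "")
    # Try longer prefixes first so the most specific entry wins.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if digits.startswith(prefix):
            return prefixes[prefix]
    return None


print(find_publisher("9780684843285", PUBLISHER_PREFIXES))
```

Because a publisher may hold several prefixes, the same name can simply appear under more than one key.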
Old 02-19-2019, 04:41 AM   #7
stuartjmz
Nameless Being
 
Quote:
Originally Posted by theducks View Post

Note ISBN 13 (EAN) does not have the check digit that ISBN10 does (0-9,X)
From https://www.isbn-international.org/content/what-isbn

since 1 January 2007 they now always consist of 13 digits. ISBNs are calculated using a specific mathematical formula and include a check digit to validate the number. (e.a)

Each ISBN consists of 5 elements with each section being separated by spaces or hyphens. Three of the five elements may be of varying length:

...
Check digit – this is always the final single digit that mathematically validates the rest of the number. It is calculated using a Modulus 10 system with alternate weights of 1 and 3.
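
As a small sketch of the Modulus 10 calculation quoted above (the function names are my own and not part of calibre or any ISBN library): the first 12 digits are weighted alternately 1 and 3, and the check digit is whatever brings the total up to a multiple of 10.

```python
def isbn13_check_digit(first12: str) -> str:
    """Return the ISBN-13 check digit for the first 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def is_valid_isbn13(isbn: str) -> bool:
    """Validate an ISBN-13, ignoring any hyphens or spaces."""
    digits = isbn.replace("-", "").replace(" ", "")
    return (len(digits) == 13 and digits.isdigit()
            and isbn13_check_digit(digits[:12]) == digits[12])


print(isbn13_check_digit("978068484328"))    # 5
print(is_valid_isbn13("978-0-684-84328-5"))  # True
```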
Old 02-19-2019, 04:53 AM   #8
theducks
Well trained by Cats
Quote:
Originally Posted by pdurrant View Post
Oh yes it does! It's just not the same algorithm. But the last character of an ISBN-13 is a checksum.
You are correct.
The ISBN10 (MOD11) check digit is dropped and the standard EAN13 check is used.
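
To illustrate that conversion, here is a hedged sketch (function names are my own): validate the old MOD 11 check character, drop it, put 978 in front of the remaining nine digits, and compute a fresh check digit with the 1/3 weighting.

```python
def isbn10_check_char(first9: str) -> str:
    """MOD 11 check character for the first 9 digits of an ISBN-10."""
    # Weights run 10, 9, ..., 2 from the leftmost digit.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to ISBN-13: prefix 978, drop old check, recompute."""
    chars = isbn10.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or isbn10_check_char(chars[:9]) != chars[9]:
        raise ValueError("not a valid ISBN-10")
    body = "978" + chars[:9]
    # EAN-13 style check: weights alternate 1, 3 from the left.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


print(isbn10_to_isbn13("0-8044-2957-X"))  # 9780804429573
```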
Old 02-19-2019, 04:22 PM   #9
Cynosarges
Junior Member
Hi, theducks and kovidgoyal

Quote:
Originally Posted by theducks View Post
The information is still there; you have only lost the implied parse (publisher and book number). A BPH gets 1 or 2 digits of publisher code, while a vanity press may get only 1 digit of book number.

Note ISBN 13 (EAN) does not have the check digit that ISBN10 does (0-9,X)
Quote:
Originally Posted by kovidgoyal View Post
No information is lost by removing hyphens. Hyphens are there simply for humans to read; the ISBN means the same thing with or without hyphens.
The information can only be found if combined with the current version of a 156K XML file.

The file defines the codes (of varying length) that the International ISBN Agency assigns to each national ISBN agency. It then describes the ranges that each national ISBN agency will use for allocating Registrant codes. Note that these ranges are allocated by national ISBN agencies according to the structure of their national publishing industry. (Example: Israel (3-digit code) allocates 2-digit registrants in the range 00-19, while neighboring Jordan (4-digit code) allocates 2-digit registrants in the range 10-49. Their ranges for 3-digit and 4-digit registrants also differ, while Jordan has a single 1-digit registrant and Israel has a range of 5-digit registrants.)
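
A rough sketch of reading such a file in Python. The element names (`Group`, `Prefix`, `Rule`, `Range`, `Length`) and the 7-digit range layout are my assumptions about the file and should be checked against a downloaded copy; `SAMPLE_XML` is a tiny stand-in built from the Israel example above, not an excerpt of the real file.

```python
import xml.etree.ElementTree as ET

# Stand-in data; element names and layout are assumed, not verified.
SAMPLE_XML = """
<ISBNRangeMessage>
  <RegistrationGroups>
    <Group>
      <Prefix>978-965</Prefix>
      <Agency>Israel</Agency>
      <Rules>
        <Rule><Range>0000000-1999999</Range><Length>2</Length></Rule>
      </Rules>
    </Group>
  </RegistrationGroups>
</ISBNRangeMessage>
"""


def load_registrant_rules(xml_text: str) -> dict:
    """Map 'prefix-group' -> list of (low, high, registrant_length)."""
    rules = {}
    root = ET.fromstring(xml_text.strip())
    for group in root.iter("Group"):
        entries = []
        for rule in group.iter("Rule"):
            low, high = rule.findtext("Range").split("-")
            length = int(rule.findtext("Length"))
            if length > 0:  # length 0 is assumed to mean "not allocated"
                # Keep only the leading digits that form the registrant code.
                entries.append((low[:length], high[:length], length))
        rules[group.findtext("Prefix")] = entries
    return rules


print(load_registrant_rules(SAMPLE_XML))
```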

Yes, the publisher information is contained within the 13 digits of the ISBN, but due to variable length Registration Group and Registrant, it is impossible to obtain without either
(1) hyphens separating each part of the ISBN, or
(2) code to apply the International ISBN agency XML file.
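
Option (2) can be sketched in a few lines once the ranges are available. `SAMPLE_RULES` below is a simplified, hand-typed illustration (the Israel entry comes from the example above; the English-language entry is a simplification and may not match the current file), and `hyphenate_isbn13` is a hypothetical helper, not a calibre function.

```python
# Illustrative rules: "EAN prefix + registration group" -> list of
# (lowest, highest) registrant codes. Real values must come from the
# International ISBN Agency's range file.
SAMPLE_RULES = {
    "9780": [("00", "19"), ("200", "699")],
    "978965": [("00", "19")],
}


def hyphenate_isbn13(isbn: str, rules: dict) -> str:
    """Insert hyphens into a 13-digit ISBN using registrant range rules."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("expected a 13-digit ISBN")
    for group_prefix, ranges in rules.items():
        if not digits.startswith(group_prefix):
            continue
        rest = digits[len(group_prefix):12]  # registrant + publication element
        for low, high in ranges:
            registrant = rest[:len(low)]
            # Equal-length digit strings compare correctly as text.
            if low <= registrant <= high:
                return "-".join([digits[:3], group_prefix[3:], registrant,
                                 rest[len(low):], digits[12]])
    raise LookupError("no matching range rule")


print(hyphenate_isbn13("9780684843285", SAMPLE_RULES))  # 978-0-684-84328-5
```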

If you go to https://www.isbn-international.org/r...ile_generation, you can get a copy of the XML file (Or a PDF file for humans to read).

BTW, just to be clear, ISBN-13 has a check digit, calculated using a Modulus 10 system with alternate weights of 1 and 3. (see https://www.isbn-international.org/content/what-isbn). A different formula from the ISBN-10, but still a check digit.

I agree that the information is there, in theory. However, it is not accessible. To offer an analogy, you have a PGP encrypted message, but you do not have the decryption key. The information is present, but unobtainable.
Old 02-19-2019, 10:25 PM   #10
kovidgoyal
creator of calibre
Well, I'm afraid I'm not interested in preserving hyphens in that field, a bit too much work and also I doubt many of the metadata sources preserve the hyphens either.

You could, however, write a simple script to process the metadata using the calibredb command line tool. It could read the XML file and thereby extract the needed information and either directly correct the publisher or stick the hyphenated ISBN value into a custom column.
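
A very rough sketch of what the custom-column variant might look like. It only builds the `calibredb set_custom` argument lists and does not run them. The assumptions are loud ones: the JSON is taken to be what `calibredb list --fields identifiers --for-machine` prints, assumed here to be a list of objects with an `id` and an `identifiers` mapping; `isbn_hyphenated` is a hypothetical custom column label; and `fixed_split` is a stand-in for a real range-based hyphenator.

```python
import json


def fixed_split(isbn: str) -> str:
    """Stand-in hyphenator using a fixed 3-1-3-5-1 split (not generally correct)."""
    return f"{isbn[:3]}-{isbn[3]}-{isbn[4:7]}-{isbn[7:12]}-{isbn[12]}"


def build_set_custom_commands(list_json: str, column: str, hyphenate) -> list:
    """Build one `calibredb set_custom` argument list per book with an ISBN."""
    commands = []
    for book in json.loads(list_json):
        isbn = (book.get("identifiers") or {}).get("isbn")
        if not isbn:
            continue  # nothing to hyphenate for this book
        commands.append(["calibredb", "set_custom", column,
                         str(book["id"]), hyphenate(isbn)])
    return commands


# Assumed shape of the machine-readable listing.
SAMPLE = ('[{"id": 1, "identifiers": {"isbn": "9780684843285"}},'
          ' {"id": 2, "identifiers": {}}]')
commands = build_set_custom_commands(SAMPLE, "isbn_hyphenated", fixed_split)
print(commands)
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)`, ideally against a copy of the library first.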

If you wanted to get really ambitious you could write a calibre plugin to do this as well.
Old 02-20-2019, 06:05 AM   #11
Cynosarges
Junior Member
Quote:
Originally Posted by kovidgoyal View Post
Well, I'm afraid I'm not interested in preserving hyphens in that field, a bit too much work and also I doubt many of the metadata sources preserve the hyphens either.
I understand, Kovid. When I posted the question I was hoping that there was a tweak that could achieve this, or an existing plugin that retained the hyphens (for some other reason) but did not make this obvious in the tweak/plugin documentation.

However, I know one excellent data source that does preserve the hyphens: the Internet Speculative Fiction Database (www.isfdb.org). Although its coverage is primarily limited to science fiction and fantasy, in practice it extends to many thrillers, some detective fiction and some historical fiction. It preserves the formatted ISBN, links variant titles (where the same novel has been printed with different titles), is *much* better at documenting series and sub-series data than any other source I have found, and is excellent for pen names and publisher data. I use this to verify Metadata downloads.

Quote:
Originally Posted by kovidgoyal View Post
You could, however, write a simple script to process the metadata using the calibredb command line tool. It could read the XML file and thereby extract the needed information and either directly correct the publisher or stick the hyphenated ISBN value into a custom column.

If you wanted to get really ambitious you could write a calibre plugin to do this as well.
As I had a 6-year-old (unfortunately incomplete) spreadsheet of my books/ebooks containing many hyphenated ISBNs and publisher names, I've already started down the road of an Excel macro. (As the spreadsheet already has publisher names, it takes me one step beyond simply obtaining publisher codes.) When the spreadsheet is missing data, I add a row using the data from an ISFDB query. Ugly but practical. In the future, I might, as an intellectual exercise, look at what would be involved in writing a plug-in to use the ISFDB as a metadata source, but this would be after I complete my retirement abroad in about 18 months - too late for me to use now.
Old 02-20-2019, 12:12 PM   #12
DaltonST
Deviser
Posts: 2,265
Karma: 2090983
Join Date: Aug 2013
Location: Texas
Device: none
Since Calibre will automatically strip out the hyphens given any Edit Metadata chance to do so, you could temporarily add the hyphens, then immediately copy part of the ISBN into a custom column using Bulk Metadata Edit Search & Replace.

This SQL will work in any SQLite tool, such as 'SQLite Expert-Personal' (which is free) to add the hyphens before you use BME S&R.

Code:
/* Fixed 3-1-3-5-1 split, e.g. 978-0-684-84328-5; correct only for a 1-digit group and a 3-digit registrant. */
UPDATE identifiers SET val = (SUBSTR(val,1,3)||'-'||SUBSTR(val,4,1)||'-'||SUBSTR(val,5,3)||'-'||SUBSTR(val,8,5)||'-'||SUBSTR(val,13,1))
WHERE type = 'isbn' AND LENGTH(val) = 13 AND val NOT LIKE '%-%' /* AND book = 25376 */;
You can remove the comments around the 'AND book=nnnnn' while you test if you wish to impact only a particular book.


DaltonST
Old 02-20-2019, 02:19 PM   #13
DNSB
Bibliophagist
Posts: 35,367
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
One issue is that in content.opf, the hyphens from the ISBN seem to be removed in almost all of the ebooks that included that information. I found that I had to check inside the book to see if the ISBN is there, most often on the copyright page. Quite often more than 1 ISBN is present there since for some reason, publishers seem to like both the print and ebook ISBNs to be present.
Old 02-20-2019, 02:29 PM   #14
BetterRed
null operator (he/him)
Posts: 20,565
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@Cynosarges - if you have the hyphenated ISBN values in a spreadsheet, you could wrangle them into a calibre custom column via the Import List plugin. The plugin can match rows in a CSV table to books in a library.

BR