What is "id=" For?

enuddleyarbl · 06-14-2022, 05:19 PM

Sorry for this. It's been bothering me for a while, but, I just can't figure it out. What is an ID even for in an epub? According to things like:

https://www.w3.org/publishing/epub3/...tml#attrdef-id

it's a "shared attribute" and "The ID [XML] of the element, which MUST be unique within the document scope." So, I assumed it was just a name applied to part of the epub that could be referenced in other places to conveniently work with them.

But, for instance, right now, I'm looking at a book in the Editor and all the ID= elements are like:

<body id="BE6O0-76267ef1661c4ca4a716bfbfb65daab2" class="calibre7">
or
<p class="calibre8"><a id="c07" class="title"></a></p>

The body ones are understandable. They're for pointing at the chapters and are referenced in the text table of contents and in the toc.ncx file. But, AFAICS, the ids that follow those body ones (simple strings like "c07", which in this case stands for Chapter 7) aren't referenced anywhere. They're not in the text table of contents, the toc.ncx file or anywhere else in the document except where they're first defined. What are they for? Are they just an artifact of Calibre's conversion process?

And, while I'm embarrassing myself here, what's with the <a ...></a> that hold those ids? I thought those were to reference http pages somewhere with an "href=".

theducks · 06-14-2022, 07:11 PM

'a' = Anchor
Need to be able to jump to someplace other than the start of a section (the top of the file is assumed)?
Make an anchor: a place to land on.

The other end (the calling link) says where to go.
A simple #c07 means the target is in the same section (file).

https://www.w3.org/publishing/epub3/...tml#attrdef-id
This is an off-page (off-site) anchor reference.
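To make the "where to go" part concrete, here is a minimal Python sketch of how a link value splits into a file part and an id part. The function name `split_link` and the file names are made up for illustration, and the sketch does not resolve relative paths between folders.

```python
from urllib.parse import urldefrag

def split_link(href, current_file):
    """Return (target_file, target_id) for an href found in current_file."""
    url, fragment = urldefrag(href)
    # An empty file part means the link lands in the file it appears in.
    return (url or current_file, fragment or None)

# File names below are hypothetical examples.
print(split_link("#c07", "chapter07.xhtml"))
# ('chapter07.xhtml', 'c07')
print(split_link("chapter07.xhtml#c07", "toc.xhtml"))
# ('chapter07.xhtml', 'c07')
print(split_link("chapter07.xhtml", "toc.xhtml"))
# ('chapter07.xhtml', None)
```

The last case has no id at all, which is the "top of file is assumed" situation.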

enuddleyarbl · 06-14-2022, 09:46 PM

Thanks for the reply. Your link to w3 got mangled in adding it here. But, I found another location to explain things:

https://www.w3schools.com/htmL/html_id.asp

However, I went through all the files in the epub I'm currently looking at and none of those simple ids are referenced anywhere except where they're defined. I even searched for all occurrences of # and they were always associated with the non-simple ids in the body statements.

The only thing I can think of is that those simple ids are from the original epub (since they closely resemble the chapter names) and were used in the original TOC. But, maybe in Calibre's conversion or my playing around with editing the TOC, they got superseded with those big honkin' ids in the body statement. But, even that is odd since the id's shouldn't be defined in the anchor statements. They should just be referenced.

Anyway, thanks again.

theducks · 06-14-2022, 09:52 PM

Quote:

Originally Posted by DaveLessnau

Thanks for the reply. Your link to w3 got mangled in adding it here. But, I found another location to explain things:

https://www.w3schools.com/htmL/html_id.asp

However, I went through all the files in the epub I'm currently looking at and none of those simple ids are referenced anywhere except where they're defined. I even searched for all occurrences of # and they were always associated with the non-simple ids in the body statements.

The only thing I can think of is that those simple ids are from the original epub (since they closely resemble the chapter names) and were used in the original TOC. But, maybe in Calibre's conversion or my playing around with editing the TOC, they got superseded with those big honkin' ids in the body statement. But, even that is odd since the id's shouldn't be defined in the anchor statements. They should just be referenced.

Anyway, thanks again.

IMHO you are better off using a full id. Deleting and then splitting files can orphan the simple reference (AKA break it).

rjwse@aol.com · 06-15-2022, 07:16 AM

I think of ID and CLASS as follows: when starting from scratch, don't use ID at all. It is a 'one-shot' designation: you can only use a given one once per document. A CLASS definition can be used as many times as you want. In the stylesheet it has a dot in front of it and the word is made up, something like .whatever {property1: value1; property2: value2; property3: value3;} In the xhtml there is no dot and the CLASS is inside of a tag, such as <p class="whatever">text</p>

Total purists do not use CLASSES at all and go to the extra effort of using only styles. This requires extra typing. It has the great advantage of letting you see errors from the get-go, whereas classes are real head scratchers sometimes. Classes let you use advanced stuff that you probably don't need unless you goof with ultracomplex stuff. Nevertheless, I do.

Best regards, Pop

enuddleyarbl · 06-15-2022, 11:24 AM

If the formatting isn't so bad (i.e., if I can actually read the book on my Forma), I usually just leave everything alone. But, if they've done something silly enough that it bothers me while reading, I go into the Editor and clean things up. In general, I hate those class statements (I'm not smart enough to figure them out). So, I usually delete most of them and use my generic styles. I've already gotten rid of the class statements on the <div>s around every paragraph and converted them over to <p>s. Since those anchored id= things don't seem to be used, I'm going to rip out the whole line. Then I'll do the same for the formatting around the chapter headings and just use <h2>s there. I don't understand why these publishers put all these weird things into what should be a simple, consistent, easy-to-read set of formats for a book. If it were a web page, fine (I guess). But, it's a book. It was sold as a book. For a specific ereader. There's no reason for this kind of stuff.

KevinH · 06-15-2022, 02:58 PM

The ids can be referenced in css selectors, ncx (toc, pagelist), opf guide, nav (toc, landmarks, pagelist), external cfi's, javascripts (if epub3 using javascript), smil, and opf (internally) in general not to mention normal links, footnotes, endnotes, etc.

So before deleting them, check carefully.
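One way to "check carefully" is to let a script do the first pass. The following is a rough Python sketch, not a complete checker: it only looks at href/src fragments (links, ncx, opf guide, nav) and at simple id selectors in .css files. It does not cover inline style blocks, JavaScript, SMIL text references, or external CFIs, and it treats an id as referenced if any file points at that name. The function name and the file-extension list are assumptions for illustration.

```python
import re
import zipfile

# id="..." definitions in content documents (double or single quotes).
ID_DEF = re.compile(r'''\sid\s*=\s*["']([^"']+)["']''')
# Fragment references such as href="ch7.xhtml#c07" or src="ch7.xhtml#c07".
FRAG_REF = re.compile(r'''(?:href|src)\s*=\s*["'][^"'#]*#([^"']+)["']''')
# Very rough match for CSS id selectors such as "#c07 {" or "#c07,".
CSS_REF = re.compile(r'#([A-Za-z_][\w-]*)\s*[,{]')

CONTENT_EXTS = (".xhtml", ".html", ".htm")
OTHER_EXTS = (".ncx", ".opf", ".smil")

def unreferenced_ids(epub_path):
    """Return ids defined in content files that nothing seems to point at."""
    defined, referenced = set(), set()
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            lower = name.lower()
            if not lower.endswith(CONTENT_EXTS + OTHER_EXTS + (".css",)):
                continue
            text = zf.read(name).decode("utf-8", errors="replace")
            if lower.endswith(".css"):
                referenced.update(CSS_REF.findall(text))
                continue
            if lower.endswith(CONTENT_EXTS):
                defined.update(ID_DEF.findall(text))
            referenced.update(FRAG_REF.findall(text))
    return defined - referenced

# Usage ("book.epub" is a placeholder path):
# print(sorted(unreferenced_ids("book.epub")))
```

Anything the script reports is only a candidate for deletion; the other reference types in the list above still need a manual look.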

JSWolf · 06-15-2022, 03:52 PM

I've noticed that a lot of publisher ID's are just to say what the section is that you are reading and have no other reason to be there and can be deleted.

Quoth · 06-22-2022, 07:21 AM

Quote:

Originally Posted by JSWolf

I've noticed that a lot of publisher ID's are just to say what the section is that you are reading and have no other reason to be there and can be deleted.

One ebook from a big publisher had a unique ID on EVERY paragraph. I only keep the ones used by the TOC, i.e. Chapter headings and similar.

Regex is your friend!
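As an illustration of the regex approach, here is a small Python sketch that strips every id attribute except the ones on a keep list. The sample markup, the ids, and the function name `strip_ids` are invented for the example. Regex on markup is fragile: this version only handles double-quoted attributes and leaves behind any element that becomes empty.

```python
import re

# Matches the leading whitespace plus id="...", so removal leaves a clean tag.
ID_ATTR = re.compile(r'\s+id\s*=\s*"([^"]*)"')

def strip_ids(xhtml, keep):
    """Remove id="..." attributes whose value is not in keep."""
    def repl(match):
        return match.group(0) if match.group(1) in keep else ""
    return ID_ATTR.sub(repl, xhtml)

# Hypothetical sample: one heading id used by the TOC, two per-paragraph ids.
sample = '<h2 id="ch1">One</h2><p id="p0001">Text</p><p id="p0002">More</p>'
print(strip_ids(sample, keep={"ch1"}))
# <h2 id="ch1">One</h2><p>Text</p><p>More</p>
```

The same pattern can be adapted for the search-and-replace box of an editor, but the keep list is easier to manage in a script.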

enuddleyarbl · 06-22-2022, 09:40 AM

The book I just edited has ids for each page number. For instance:

<a id="page_4"/>

I wonder if that's used by things like the default epub reader on Kobo? It's sure not used anywhere in the document (at least after what I did to it).

KevinH · 06-22-2022, 10:33 AM

Most likely it is for a PageList in an ncx or nav section. The ids probably match the pages of a specific printed release. That is useful if the book is academic and citations to pages or page ranges are needed. But oft-times they are just left over from OCR scans.

BobC · 06-22-2022, 01:04 PM

Quote:

Originally Posted by DaveLessnau

The book I just edited has ids for each page number. For instance:

<a id="page_4"/>

I wonder if that's used by things like the default epub reader on Kobo? It's sure not used anywhere in the document (at least after what I did to it).

This sort of id can be useful when comparing a badly OCR'd EPUB with a PDF, when you are correcting the EPUB's spelling to match the words used in the PDF. It can make it easier to locate the offending word in the PDF if you know what page it is on.

I doubt if that is why the ids were generated in the first place but I've been very thankful for them on a couple of occasions.

BobC
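The page lookup described above can be sketched in a few lines of Python. This assumes the anchors look exactly like the `<a id="page_4"/>` example quoted in the thread; other books may use a different naming scheme. The function name `page_of` and the sample text are made up for illustration.

```python
import re

# Assumes page anchors of the form <a id="page_N"/> as in the quoted example.
PAGE_ANCHOR = re.compile(r'<a\s+id="page_(\d+)"\s*/>')

def page_of(xhtml, word):
    """Return the number of the last page anchor before the first hit of word."""
    pos = xhtml.find(word)
    if pos == -1:
        return None
    page = None
    for match in PAGE_ANCHOR.finditer(xhtml):
        if match.start() > pos:
            break
        page = int(match.group(1))
    return page

# Hypothetical sample with an OCR error ("ovcr") on printed page 5.
sample = '<p><a id="page_4"/>The quick brown fox <a id="page_5"/>jumps ovcr the dog</p>'
print(page_of(sample, "ovcr"))   # 5
print(page_of(sample, "quick"))  # 4
```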

JSWolf · 06-22-2022, 04:06 PM

Quote:

Originally Posted by Quoth

One ebook from a big publisher had a unique ID on EVERY paragraph. I only keep the ones used by the TOC, i.e. Chapter headings and similar.

Regex is your friend!

I've seen that ID per paragraph on a number of eBooks. Really stupid IMHO.

theducks · 06-22-2022, 05:18 PM

Quote:

Originally Posted by JSWolf

I've seen that ID per paragraph on a number of eBooks. Really stupid IMHO.

But the question is: 'Do they really HARM anything?' They do add to the file size, but so do 14 screens of raves, excerpts from other work...
On a modern device, we may fit one less book (that I probably won't have time to read this year anyway).

We should worry more about things that don't work correctly like dead or wrong landing links.

enuddleyarbl · 06-22-2022, 11:02 PM

In my case, I'm mostly worried about what happens if I delete those ids. So far, I've had no problem deleting every id= thing I've found. Usually, those are just for TOC types of things. But, after ripping everything out of the files and putting the proper <h1> and <h2> tags where I need them, I have the Calibre Editor recreate the toc.ncx file and then have it create an inline TOC from that. I then replace the book's inline TOC with the Calibre generated one. Again, no problems yet.

Of course, some of the silly HTML I think I'm seeing does bother me and I do wish publishers would be a bit more reasonable in what they put in there. But, then again, I also wish they'd read and correct the resulting books after they OCR scan them to a digital format.
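For anyone curious what "recreate the TOC from the headings" amounts to, here is a minimal Python sketch that collects <h1> and <h2> text in document order. It is only an illustration of the idea and is not how the Calibre Editor does it; the class and function names and the sample markup are invented.

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1 and h2 elements."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._level = None   # heading level currently being read, if any
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2"):
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.entries.append((self._level, "".join(self._parts).strip()))
            self._level = None

def outline(xhtml):
    parser = HeadingCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.entries

sample = "<body><h1>Part One</h1><h2>Chapter 1</h2><p>Text</p><h2>Chapter <i>2</i></h2></body>"
print(outline(sample))
# [(1, 'Part One'), (2, 'Chapter 1'), (2, 'Chapter 2')]
```

A real TOC also needs a link target for each entry, which is where ids on the headings (or the file itself) come back into the picture.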

Last edited by enuddleyarbl; 06-22-2022 at 11:04 PM.


