23 ID characters are too many!?

chaot · 02-20-2017, 12:06 PM

Code:

  <h3 id="uwSzW8DWUPEK6RzzmX5nCi4"><a href="Inhalt.html#u2BtpeVxwIEyIFtCSZ7oQU6">Golkonda</a></h3>

This is how some hundreds of linked headers look in a book. IMO 23 ID characters are too many. How can I reduce them to a realistic 6 or 9 characters?

If this can't be changed in calibre, a regex would also do the job.

JSWolf · 02-20-2017, 12:23 PM

Yes, that's way too many characters in an ID. While it's valid, it's ugly code.

Tex2002ans · 02-21-2017, 05:27 AM

Are these IDs being generated by some outside program?

It is typically better to create human-readable code... so this gibberish:

Code:

<h3 id="uwSzW8DWUPEK6RzzmX5nCi4"><a href="Inhalt.html#u2BtpeVxwIEyIFtCSZ7oQU6">Golkonda</a></h3>

might be fixed into this:

Code:

<h3 id="Golkonda"><a href="Inhalt.html#Section5">Golkonda</a></h3>

Making it that way will also make your life a lot easier when you are trying to debug problems (broken links, links that send you to the wrong locations, etc.).

chaot · 02-21-2017, 11:47 AM

Quote:

Originally Posted by Tex2002ans

Are these IDs being generated by some outside program?

No, manually made, just as an example to show how ugly and long-winded this code gets.

Quote:

It is typically better to create human-readable code... so this gibberish:

Code:

<h3 id="uwSzW8DWUPEK6RzzmX5nCi4"><a href="Inhalt.html#u2BtpeVxwIEyIFtCSZ7oQU6">Golkonda</a></h3>

might be fixed into this:

Code:

<h3 id="Golkonda"><a href="Inhalt.html#Section5">Golkonda</a></h3>

Making it that way will also make your life a lot easier when you are trying to debug problems (broken links, links that send you to the wrong locations, etc.).

It's one possibility.

[Attached image: Inhalt.png]

Start of the Contents of one book of six. Altogether more than a thousand headings. That needs a sophisticated system. In keeping with the nature of an ID, one should look like e.g. u7yl38 (a randomly generated sequence of characters), not like Mai_und_Mais etc.

Status Quo:

Code:

IN CONTENTS
  <li><a href="Wiener_Symptome.html#Mai und Mais" id="umBEMpICNPq2GwgIRSssJq5">Mai und Mais</a></li>
IN BOOK
   <h3 id="ughWpyPh6ETHp8Eqf99tJc8"><a href="Inhalt.html#Mai und Mais">Mai und Mais</a></h3>

Your way would look like this:

Code:

IN CONTENTS
  <li><a href="Wiener_Symptome.html#Mai_und_Mais" id="Mai_und_Mais">Mai und Mais</a></li>
IN BOOK
  <h3 id="Mai_und_Mais"><a href="Inhalt.html#Mai_und_Mais">Mai und Mais</a></h3>

Warning message

Quote:

The id 1 example for example is not a valid id. IDs must start with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

Reasons: the ID starts with a number and contains spaces. I have headings with a range of disallowed characters, such as parentheses ("( )"), guillemets ("« »"), question marks ("?"), exclamation marks ("!"), etc.
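The rule quoted in the warning can be turned into a small check plus a clean-up step. This is only a minimal Python sketch: `is_valid_id` and `heading_to_id` are hypothetical helper names, and the `h_` prefix is an arbitrary choice for headings that do not start with a letter.

```python
import re

# The rule quoted in the warning: a letter first, then letters, digits,
# hyphens, underscores, colons, or periods.
VALID_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.\-]*")


def is_valid_id(candidate: str) -> bool:
    """True if the whole string satisfies the quoted rule."""
    return VALID_ID.fullmatch(candidate) is not None


def heading_to_id(heading: str, prefix: str = "h_") -> str:
    """Hypothetical helper: squeeze a heading into the allowed character set."""
    # Replace every run of disallowed characters with a single underscore.
    slug = re.sub(r"[^A-Za-z0-9_:.\-]+", "_", heading).strip("_")
    # An ID must start with a letter, so prepend a prefix when it does not.
    if not slug or not slug[0].isalpha():
        slug = prefix + slug
    return slug


print(heading_to_id("Mai und Mais"))   # Mai_und_Mais
print(heading_to_id("1 example"))      # h_1_example
```

Note that this drops non-ASCII letters such as umlauts, and two different headings can end up with the same ID, so the result would still need a uniqueness check per file.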

Anyhow, I have to find a generally acceptable way to create them automatically, with some regex. I lean toward a randomly generated 6-character ID (lowercase a-z alone would give 26^6 = 308,915,776 possible IDs, errors excepted).
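The arithmetic and the random-ID idea can be checked with a few lines of Python. This is a sketch under the assumption that only lowercase a-z is used (so every ID automatically starts with a letter); `new_short_id` and `count_possible_ids` are made-up names.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase  # a-z only, so every ID starts with a letter


def count_possible_ids(length: int = 6) -> int:
    """Number of distinct IDs of the given length over ALPHABET."""
    return len(ALPHABET) ** length


def new_short_id(used: set, length: int = 6) -> str:
    """Draw random IDs until one is not in `used`, then record and return it."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in used:
            used.add(candidate)
            return candidate


print(count_possible_ids())  # 308915776
```

Random draws alone do not guarantee uniqueness, which is why the sketch keeps a `used` set and redraws on a collision.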

DiapDealer · 02-21-2017, 11:59 AM

Sorry, but nobody "manually" made ids like that. Those were automatically generated by a program of some sort. You can change them to anything you like. Unfortunately, changing them WILL be a manual process. And if the book has lots and lots of links, it may also prove painful.

No regex is going to do what you want. You'd likely have to write a parsing algorithm that is capable of examining the current id and matching it with url fragments throughout the book so it could change them all to match. It would also need to maintain a list of all ids currently in use to ensure that your short, yet random id generator (which will need to be taught to create valid ids) creates ids that are guaranteed to be unique for that file. Your algorithm could also just create ids that started with a letter and added an auto-incrementing number (but would still need to parse the rest of the book to adjust all links as it went).

Whether you do it manually, or you write a program/script to do it for you, there's nothing but hard work and time ahead of you.
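The "letter plus auto-incrementing number" variant described above might look roughly like this in Python. It is a sketch, not a finished tool, and it rests on several assumptions: the book is already loaded as a dict of file name to markup, all files sit in one folder so an href path equals a file name, and attributes are written with double quotes. The function name `shorten_ids` and the prefix `x` are made up.

```python
import re

# Assumes double-quoted attributes preceded by whitespace.
ID_ATTR = re.compile(r'(?<=\s)id="([^"]+)"')
HREF_ATTR = re.compile(r'(?<=\s)href="([^"#]*)#([^"]+)"')


def shorten_ids(book: dict, prefix: str = "x") -> dict:
    """book maps file name -> markup; returns a new dict with short IDs."""
    # Pass 1: give every (file, old id) pair a new ID from a running counter.
    mapping = {}
    for name, text in book.items():
        for old in ID_ATTR.findall(text):
            if (name, old) not in mapping:
                mapping[(name, old)] = f"{prefix}{len(mapping) + 1}"

    # Pass 2: rewrite the id attributes and every link fragment targeting them.
    result = {}
    for name, text in book.items():
        def fix_id(m, name=name):
            return f'id="{mapping[(name, m.group(1))]}"'

        def fix_href(m, name=name):
            target = m.group(1) or name  # an empty path means "this file"
            new = mapping.get((target, m.group(2)), m.group(2))
            return f'href="{m.group(1)}#{new}"'

        result[name] = HREF_ATTR.sub(fix_href, ID_ATTR.sub(fix_id, text))
    return result


book = {
    "Inhalt.html": '<li><a href="Wiener_Symptome.html#ughWpyPh6ETHp8Eqf99tJc8" '
                   'id="umBEMpICNPq2GwgIRSssJq5">Mai und Mais</a></li>',
    "Wiener_Symptome.html": '<h3 id="ughWpyPh6ETHp8Eqf99tJc8">'
                            '<a href="Inhalt.html#umBEMpICNPq2GwgIRSssJq5">Mai und Mais</a></h3>',
}
for name, text in shorten_ids(book).items():
    print(name, text)
```

Fragments that point at an ID the sketch never saw are left untouched, so broken links stay visible instead of being silently rewritten. A real run would want a proper XHTML parser and a backup of the book.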

chaot · 02-21-2017, 12:02 PM

Yes, right. I put the IDs together manually.

Do you have an idea for making the process less painful?

A first step would be reducing the IDs to, e.g., 6 characters!

DiapDealer · 02-21-2017, 12:28 PM

Quote:

Originally Posted by chaot

Do you have an idea for making the process less painful?

Unfortunately, no. By the time you have a python algorithm written to do something like this FOR you, you probably could have had everything manually edited already. It certainly doesn't lend itself well to regex (except manually correcting/replacing one id at a time) and there are no "diddle my ids for me" pre-written tools that I know of with which to start. If everything works, I honestly wouldn't mess with it (unless it was a tiny book with only a handful of links).

stumped · 02-21-2017, 01:19 PM

What do the IDs actually do anyway? I have stripped all of them from some epub books with no apparent impact on readability. Everything still seems to work?

chaot · 02-21-2017, 01:39 PM

IDs allow navigation in books!

Normally (in ordinary books) one-way navigation is enough, that is, from the Contents to the headers. (Even that isn't necessary.)

In my books it is absolutely essential to have two-way navigation: there and back ... and more.
Keyword: something like <span class="return">top</span>

stumped · 02-21-2017, 01:53 PM

For what I read, fiction start to finish, the TOC satisfies my navigation needs.
So the ids are for visiting and returning from footnotes?

DiapDealer · 02-21-2017, 02:57 PM

Quote:

Originally Posted by stumped

For what I read, fiction start to finish, the TOC satisfies my navigation needs.
So the ids are for visiting and returning from footnotes?

Even a standard ToC might be broken by deleting ids. It may not be very likely in a typical work of fiction, but I still wouldn't recommend deleting them without checking.

Tex2002ans · 02-21-2017, 05:53 PM

Quote:

Originally Posted by chaot

Start of the Contents of one book of six. Altogether more than a thousand headings. That needs a sophisticated system.

Thousands of headings? What is this book, a collection of poems?

If your book has all the headings marked as <h3>, then you could use Sigil to help you.

Take your original code:

Spoiler:

If you run Sigil's Tools -> Table of Contents -> Generate Table of Contents, your HTML will get the BOLD parts added:

Spoiler:

Then you use Sigil to just generate the HTML Table of Contents:

Spoiler:

Now you can use Sigil's unique IDs as a basis to Regex (instead of your convoluted crazy "u3458976345" system).

Now, for some reason, you want to jump from your HTML headings BACK to the TOC... so you might want to do this:

Search: <a href="([^\#"]+#)([^"]+)">
Replace: <a id="\2" href="\1\2">

Spoiler:
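Outside Sigil, the same search and replace can be tried in Python before touching the book. The pattern and replacement are copied from the post; the sample TOC line, its file name, and its ID are made up for illustration.

```python
import re

# Same pattern and replacement as the Sigil search above.
SEARCH = r'<a href="([^\#"]+#)([^"]+)">'
REPLACE = r'<a id="\2" href="\1\2">'


def add_ids_to_toc_links(toc_markup: str) -> str:
    """Copy each link's fragment into an id attribute on the same <a>."""
    return re.sub(SEARCH, REPLACE, toc_markup)


# Hypothetical TOC line; the file name and ID are placeholders.
toc_line = '<li><a href="../Text/Chap1.xhtml#sigil_toc_id_1">Golkonda</a></li>'
print(add_ids_to_toc_links(toc_line))
```

The pattern only matches `<a>` tags whose sole attribute is `href`, so links that already carry an `id` or a `class` are left alone.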

Then you go back to the text and do this:

Search: <h3 id="([^"]+)">(.+?)</h3>
Replace: <h3 id="\1"><a href="../Text/TOC.xhtml#\1">\2</a></h3>

This will allow your <h3>s to point right back to their spot in the TOC:

Spoiler:
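The heading step can be dry-run the same way. Again the pattern and replacement come straight from the post, while the sample heading and its ID are assumptions made for the example.

```python
import re

# Same pattern and replacement as the second Sigil search above.
SEARCH = r'<h3 id="([^"]+)">(.+?)</h3>'
REPLACE = r'<h3 id="\1"><a href="../Text/TOC.xhtml#\1">\2</a></h3>'


def link_headings_to_toc(chapter_markup: str) -> str:
    """Wrap each <h3> text in a link back to the same ID in TOC.xhtml."""
    return re.sub(SEARCH, REPLACE, chapter_markup)


# Hypothetical heading; the ID is a placeholder.
heading = '<h3 id="sigil_toc_id_1">Golkonda</h3>'
print(link_headings_to_toc(heading))
```

Two things to watch: the replace should be run only once, because a second pass would wrap the new link in another link, and headings that span several lines or carry extra attributes are not matched by this pattern.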

Quote:

Originally Posted by stumped

What do the IDs actually do anyway? I have stripped all of them from some epub books with no apparent impact on readability. Everything still seems to work?

As others have stated, in your typical Fiction, it probably wouldn't have too many uses (although each book is unique, I wouldn't go ripping IDs out without seeing if they serve some purpose).

But take Non-Fiction for example, you might have something like this:

Quote:

<p class="equation" id="Equation1.1">E = mc<sup>2</sup></p>

[...]

<p>As Einstein said in <a href="../Text/Chap1.xhtml#Equation1.1">Equation 1.1</a>, energy is mass, and mass is energy.</p>

or let us say you had an annotated version. You might use the ID to point to a specific paragraph:

Quote:

<p id="ActIII.Scene1.p20"><b>Hamlet.</b> To be, or not to be―that is the question:</p>

[...]

<p>One of the most famous lines in all of literature <a href="../Text/ActIII.xhtml#ActIII.Scene1.p20">was spoken by Hamlet</a>.</p>

So let us say you ripped out all of the IDs out of Hamlet... on the surface everything looks A-OK... but if you pushed the link, it wouldn't lead you to the correct location (or the link might not work at all).
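Before stripping anything, it may help to list which IDs are actually link targets. The following Python sketch assumes the book is loaded as a dict of file name to markup, that file names are unique across the book, and that attributes use double quotes; `audit_ids` is a made-up name.

```python
import re

# Assumes double-quoted attributes preceded by whitespace.
ID_ATTR = re.compile(r'(?<=\s)id="([^"]+)"')
HREF_ATTR = re.compile(r'(?<=\s)href="([^"#]*)#([^"]+)"')


def audit_ids(book: dict):
    """Return (unreferenced ids, broken links) as sorted (file, id) pairs."""
    defined = {(name, i) for name, text in book.items()
               for i in ID_ATTR.findall(text)}
    referenced = set()
    for name, text in book.items():
        for path, fragment in HREF_ATTR.findall(text):
            # Compare by bare file name; an empty path means "this file".
            target = path.rsplit("/", 1)[-1] or name
            referenced.add((target, fragment))
    return sorted(defined - referenced), sorted(referenced - defined)


book = {
    "ActIII.xhtml": '<p id="ActIII.Scene1.p20">To be, or not to be</p>\n'
                    '<p id="ActIII.Scene1.p21">Whether tis nobler</p>',
    "Notes.xhtml": '<p><a href="../Text/ActIII.xhtml#ActIII.Scene1.p20">'
                   'was spoken by Hamlet</a></p>',
}
unreferenced, broken = audit_ids(book)
print(unreferenced)  # [('ActIII.xhtml', 'ActIII.Scene1.p21')]
print(broken)        # []
```

Only the IDs in the first list are candidates for removal, and only as far as this sketch can see: it looks at `href` attributes alone, so targets referenced elsewhere (for example from an NCX table of contents, which uses `src`) would need to be checked separately.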

stumped · 02-22-2017, 12:55 AM

Quote:

Originally Posted by DiapDealer

Even a standard ToC might be broken by deleting ids. It may not be very likely in a typical work of fiction, but I still wouldn't recommend deleting them without checking.

I probably get away with it because I use polish / modify to repair TOCs, or Sigil to generate new ones if I've removed stuff, like next-in-series extracts or show-off epigraphs. I don't always do it, but I think I've had some books with an ID on every paragraph, making it harder to see the styling.
They are on my mental list of stuff that's not essential, like those mobipagebreak stylings at the end of every epub chapter.

DiapDealer · 02-22-2017, 09:46 AM

Quote:

Originally Posted by stumped

I probably get away with it because I use polish / modify to repair TOCs, or Sigil to generate new ones if I've removed stuff, like next-in-series extracts or show-off epigraphs. I don't always do it, but I think I've had some books with an ID on every paragraph, making it harder to see the styling.
They are on my mental list of stuff that's not essential, like those mobipagebreak stylings at the end of every epub chapter.

If you're regenerating ToCs (and all links point to a file instead of a point IN a file), then you'll probably be fine. The only time you may run into trouble is when there is more than one chapter (or toc entry) in one xhtml file. Or you have footnote links to multiple locations in one endnote file (even fiction occasionally has foot|end notes).

In short: any situation with multiple links to multiple points in ONE destination file will require unique ids to function properly.

chaot · 02-22-2017, 11:48 AM

Quote:

Originally Posted by stumped

For what I read, fiction start to finish, the TOC satisfies my navigation needs.

But appetite grows with eating!

Quote:

So the ids are for visiting and returning from footnotes?

Yes, and for more! Visiting and returning from headers.

Quote:

Originally Posted by Tex2002ans

Thousands of headings? What is this book, a collection of poems?

The life's work of someone who was more than just a star journalist in 1915-1935: Joseph Roth, Werke 1-6.

Quote:

If your book has all the headings marked as <h3>, then you could use Sigil to help you.

The linkable headings are all <h3> (but there are also <h2> and <h4>).

My homework is done. By you! Everything you wrote is tailor-made to my needs. You thought of everything. No further requests. BRAVO!
Now I'll start with Sigil.

Over the last few days I thought again and again about how I could get rid of or shorten these unmanageable calibre IDs; my efforts were worth nothing! For once, the spoiler is used to hide something:

Spoiler:

Kovid Goyal, please rethink the calibre ID-generation!

