MobileRead Forums

Diagnosing a Sigil ID nightmare (https://www.mobileread.com/forums/showthread.php?t=211033)

curiousgeorge 04-18-2013 04:55 PM

Diagnosing a Sigil ID nightmare
 
I'm correcting an epub on which Sigil has wreaked havoc. For some reason, the person asked me to examine their epub, and I said OK before looking at it. Now I have a 600-page epub with almost 2500 lines of code from Sigil that have duplicate ids. Can someone please explain this process to me and why Sigil does this? I typically only hand-code my epubs.

Code:

<p class="tx" id="d7e791985">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>

<p class="tx" id="d7e791985">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>

<p class="tx" id="d7e791985">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>

Text removed but you get the idea. Thanks!

theducks 04-18-2013 05:09 PM

Just remove them.
You only need ids as link targets, and even then only if the target is not at the top of the file.
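
Before deleting anything, it may be worth checking which ids are actually used as link targets. The following is only a rough sketch: the helper names are made up for illustration, it scans a single string of markup, and a real epub would also need its other chapters and the TOC checked for links into this file. It also treats any attribute ending in "-id" as an id, which is an accepted simplification here.

Code:

```python
import re

def referenced_ids(xhtml: str) -> set[str]:
    """Return the fragment identifiers used by href attributes in the markup."""
    # The part before '#' may be empty (same-file link) or a file name.
    return set(re.findall(r'href\s*=\s*["\'][^"\'#]*#([^"\']+)["\']', xhtml))

def unreferenced_ids(xhtml: str) -> set[str]:
    """Return ids that appear in the markup but are never linked to."""
    ids = set(re.findall(r'\bid\s*=\s*["\']([^"\']+)["\']', xhtml))
    return ids - referenced_ids(xhtml)

sample = (
    '<p><a href="#note1">see note</a></p>'
    '<p class="tx" id="d7e791985">First</p>'
    '<p class="tx" id="d7e791985">Second</p>'
    '<p id="note1">A note</p>'
)
print(sorted(unreferenced_ids(sample)))  # ['d7e791985']
```

Anything this reports as unreferenced is a candidate for removal; anything that is referenced needs to stay (and stay unique).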

curiousgeorge 04-18-2013 05:37 PM

I know that. My apologies for not explaining what I meant. I'm curious to know WHY Sigil does it.

DiapDealer 04-18-2013 06:03 PM

Quote:

Originally Posted by curiousgeorge (Post 2486664)
I know that. My apologies for not explaining what I meant. I'm curious to know WHY Sigil does it.

There would normally have to be something drastically wrong with the epub in the first place for Sigil to make anything other than small changes to the code on its own. I find it highly doubtful that Sigil auto-inserted a bunch of ids into p tags (duplicate or otherwise), myself. Not unless it was handed a complete mess to begin with and it just did its best to cope. It'd be nice to see a sample of what the epub looked like before you opened it in Sigil.

Doitsu 04-19-2013 02:45 AM

As DiapDealer already pointed out, it's highly unlikely that Sigil did this. Since styles can also be assigned by ids, it's possible that these duplicated ids are used for style assignments.

Did you check the stylesheet for id-based styles? For example:

Code:

#d7e791985 {
    text-align: center;
}
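
If the stylesheet is large, a quick scan for id selectors can answer that question. This is a minimal sketch with a hypothetical helper name, not a full CSS parser: it strips comments and declaration blocks first so that hex colours are not mistaken for ids, and it ignores escaped selectors.

Code:

```python
import re

def id_selectors(css: str) -> set[str]:
    """Return the ids used as #id selectors in a stylesheet."""
    # Drop comments, then empty out declaration blocks so that values such
    # as "#ff0000" inside { ... } are not reported as id selectors.
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)
    selectors_only = re.sub(r"\{[^{}]*\}", "{}", css)
    return set(re.findall(r"#([A-Za-z_][\w-]*)", selectors_only))

css = """
#d7e791985 {
    text-align: center;
}
p.tx { color: #ff0000; }
"""
print(id_selectors(css))  # {'d7e791985'}
```

An empty result would suggest the ids are not there for styling purposes.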


curiousgeorge 04-19-2013 11:04 AM

Yes, I have checked the CSS, and I'm told the entire epub was created in Sigil. I never use Sigil, so I wanted to know from here if this is something Sigil commonly does.

theducks 04-19-2013 11:40 AM

id is not class
I would not expect to see that value in the CSS.

' class="tx" '

That book may have started elsewhere and been malformed at import.
Tidy then *fixed* it :rolleyes:

Remember :smack:
GIGO (garbage in, garbage out)

curiousgeorge 04-19-2013 12:26 PM

lol, you say that, but the horror stories I could tell...

Doitsu 04-19-2013 12:49 PM

Quote:

Originally Posted by theducks (Post 2487379)
id is not class
I would not expect to see that value in the CSS.

Me neither, but it's perfectly acceptable. For example, the following code:

Code:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
 
  <style type="text/css">
    #d7e791985 {
      text-align: center;
      color: red;
    }
  </style>
</head>

<body>
  <p id="d7e791985">A centered, red paragraph.</p>
</body>
</html>

displays fine in Sigil and ADE, passes epubcheck and compiles OK with KindleGen.

However, since the original epub creator didn't use these particular ids to assign a style and apparently doesn't reference them anywhere else, it looks more like a global search and replace action gone awry.

curiousgeorge 04-23-2013 10:27 AM

It won't pass epubcheck, because epubcheck falls back on (X)HTML validation, which rejects duplicate ids

such as:
Code:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
 
  <style type="text/css">
    #d7e791985 {
      text-align: center;
      color: red;
    }
  </style>
</head>

<body>
  <p id="d7e791985">A centered, red paragraph.</p>
  <p id="d7e791985">A centered, red paragraph.</p>
</body>
</html>
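
To find every duplicated id in a file before running epubcheck, something along these lines might help. It is a sketch using Python's standard html.parser; the class and function names are invented for illustration, and it only looks at one file's markup at a time.

Code:

```python
from collections import Counter
from html.parser import HTMLParser

class IdCollector(HTMLParser):
    """Collects every id attribute value seen in start tags."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.append(value)

def duplicate_ids(xhtml: str) -> dict[str, int]:
    """Map each id that occurs more than once to its number of occurrences."""
    collector = IdCollector()
    collector.feed(xhtml)
    collector.close()
    return {i: n for i, n in Counter(collector.ids).items() if n > 1}

page = """
<body>
  <p id="d7e791985">A centered, red paragraph.</p>
  <p id="d7e791985">A centered, red paragraph.</p>
  <p id="other">Unique.</p>
</body>
"""
print(duplicate_ids(page))  # {'d7e791985': 2}
```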


Doitsu 04-23-2013 11:07 AM

Quote:

Originally Posted by curiousgeorge (Post 2491384)
It won't pass epubcheck, because epubcheck falls back on (X)HTML validation, which rejects duplicate ids

I never said that an epub with multiple identical ids would pass epubcheck. :)
I merely wanted to show that ids can also be used to assign styles.

curiousgeorge 04-23-2013 12:22 PM

Quote:

Originally Posted by Doitsu (Post 2491428)
I never said that an epub with multiple identical ids would pass epubcheck. :)
I merely wanted to show that ids can also be used to assign styles.

OK good, I was scared for a minute there :thumbsup:

LukeA 05-23-2013 02:17 AM

Copying and pasting in Sigil copies ids as well as everything else. If you are in book view you won't know this - it is only obvious in code view. I've found that just fiddling with paragraphs can result in duplicate ids across successive paragraphs without copying.

It seems to me that if duplicate ids are forbidden, Sigil should be smart enough to not create duplicate ids automatically - when copying, it should generate a new id for the pasted item. Ditto for generating multiple paragraphs out of one.
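
As an illustration of the idea only (this is not how Sigil works internally, and the function name is hypothetical), a post-processing pass could keep the first occurrence of each id and give later repeats a numeric suffix. The sketch assumes double-quoted id attributes.

Code:

```python
import re

ID_ATTR = re.compile(r'\bid="([^"]*)"')

def make_ids_unique(xhtml: str) -> str:
    """Keep the first occurrence of each id; rename later repeats."""
    taken = set(ID_ATTR.findall(xhtml))  # every id already in the document
    seen = set()                         # ids whose first occurrence was kept

    def rename(match):
        value = match.group(1)
        if value not in seen:
            seen.add(value)
            return match.group(0)
        n = 2
        while f"{value}-{n}" in taken:   # never reuse an existing id
            n += 1
        new_id = f"{value}-{n}"
        taken.add(new_id)
        return f'id="{new_id}"'

    return ID_ATTR.sub(rename, xhtml)

sample = (
    '<p class="tx" id="d7e791985">One</p>'
    '<p class="tx" id="d7e791985">Two</p>'
    '<p class="tx" id="d7e791985">Three</p>'
)
result = make_ids_unique(sample)
print(ID_ATTR.findall(result))  # ['d7e791985', 'd7e791985-2', 'd7e791985-3']
```

Any link pointing at the original id would still land on the first paragraph, which is the only sensible target among identical ids anyway.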

Hitch 05-23-2013 04:13 AM

Quote:

Originally Posted by LukeA (Post 2522251)
Copying and pasting in Sigil copies ids as well as everything else. If you are in book view you won't know this - it is only obvious in code view. I've found that just fiddling with paragraphs can result in duplicate ids across successive paragraphs without copying.

It seems to me that if duplicate ids are forbidden, Sigil should be smart enough to not create duplicate ids automatically - when copying, it should generate a new id for the pasted item. Ditto for generating multiple paragraphs out of one.

+1! LukeA beat me to it. I was about to say, some noob created a paragraph style in CV (Code View) and, without thinking, switched over to BV (Book View) and started typing, hitting "enter" and typing merrily on, or cutting and pasting paragraphs (probably from a PDF, heaven help us, if not Word) into BV. Sigil did what it is supposed to do: it duplicated the previous paragraph's style, class and id. Thus, you have thousands of paragraphs with the same id present. This is what happens when a DIY'er reads some posting on the KDP forum and decides to use Sigil like a word processor. Fortunately, you can simply regex that out, and the ePUB should pass validation, assuming everything else is fine.

Given how basic that error is, though, I wouldn't count on not finding other mistakes just as basic and just as painful.
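
For anyone who prefers to do the regex cleanup outside Sigil, a minimal sketch in Python could look like this. The function name is made up, and it assumes the file has already been read into a string; it removes only the one named id and leaves every other attribute alone.

Code:

```python
import re

def strip_id(xhtml: str, unwanted_id: str) -> str:
    """Remove every id="unwanted_id" attribute, leaving the rest of the tag."""
    # Matches the leading whitespace too, so no stray space is left behind.
    pattern = re.compile(
        r'\s+id\s*=\s*(["\'])' + re.escape(unwanted_id) + r'\1'
    )
    return pattern.sub("", xhtml)

sample = (
    '<p class="tx" id="d7e791985">One</p>\n'
    '<p class="tx" id="d7e791985">Two</p>'
)
print(strip_id(sample, "d7e791985"))
# <p class="tx">One</p>
# <p class="tx">Two</p>
```

Work on a copy of the epub, and re-run epubcheck afterwards to confirm nothing else depended on the id.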

Quote:

It seems to me that if duplicate ids are forbidden, Sigil should be smart enough to not create duplicate ids automatically - when copying, it should generate a new id for the pasted item. Ditto for generating multiple paragraphs out of one.

Don't see why, and I'd say, not necessarily. Do you seriously think that Sigil should have generated thousands of new ids for the thousands of paragraphs in that ePUB of 600 printed pages? Crap, I'd rather regex 10,000 identical ids than 10,000 different ones, had I made that mistake (and don't think that I didn't make one that dumb, a long time ago). ;-) Sigil's not a word processor; it assumes that its users are smart enough to know HTML, XHTML and CSS. {smile}.

Hitch

Doitsu 05-23-2013 05:52 AM

Quote:

Originally Posted by Hitch (Post 2522291)
Don't see why, and I'd say, not necessarily. Do you seriously think that Sigil should have generated thousands of new ids for the thousands of paragraphs in that ePUB of 600 printed pages? Crap, I'd rather regex 10,000 identical ids than 10,000 different ones, had I made that mistake (and don't think that I didn't make one that dumb, a long time ago). ;-) Sigil's not a word processor; it assumes that its users are smart enough to know HTML, XHTML and CSS. {smile}.

Actually, generating new unique ids for paragraphs isn't that complicated in Sigil. Thanks to the comprehensive Index code implemented by Meme, all you have to do is:
  1. Delete all paragraph ids and select Tools > Index > Index Editor.
  2. Right-click the Index Editor window and select Autofill.
  3. Select Tools > Index > Create Index.
This will cause Sigil to automatically add consecutive ids to each paragraph in the epub. (Not that they'll be particularly useful.)


All times are GMT -4.