Old 11-22-2011, 01:38 PM   #1
Artha
-----
Artha began at the beginning.
 
Posts: 114
Karma: 10
Join Date: Jun 2011
Device: Samsung SNE65
Bug or feature of the TOC generator?

I have Sigil 0.4.2 and am trying to convert a book from PDF to ePub by hand. I have reduced the file to bare-bones HTML and will attach a CSS file later. At this point the markup should be clean, with only plain HTML tags and nothing more.

Yet when I hit "Generate TOC from headings", an id="heading_id_2" or id="heading_id_3" is added to the headings. Why is that?

And can it be disabled?
Old 11-22-2011, 02:05 PM   #2
Serpentine
Evangelist
Serpentine ought to be getting tired of karma fortunes by now.
 
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
The ToC needs to tell the viewer where each element is, and it does this through the id attribute. Removing the ids makes no sense and will break the ePub in any case. If the elements already have ids, Sigil will not generate new ones.
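To make the role of the id concrete, here is a minimal Python sketch of how a TOC entry can point at a heading. It is not Sigil's actual code; the file name `Section0001.xhtml`, the sample markup, and the helper `toc_targets` are all made up for illustration. The only thing taken from the thread is the idea that the target is the file plus a `#` anchor built from the heading's id.

```python
import re

# Hypothetical chapter markup; the file name and ids are made up for illustration.
FILE_NAME = "Section0001.xhtml"
XHTML = """<h1 id="heading_id_2">Chapter One</h1>
<p>Some text.</p>
<h2 id="heading_id_3">A Subsection</h2>
<p>More text.</p>"""

# Match each heading that carries an id; capture level, id, and title text.
HEADING = re.compile(r'<h([1-6])[^>]*\bid="([^"]+)"[^>]*>(.*?)</h\1>', re.S)

def toc_targets(file_name, xhtml):
    """Return (title, target) pairs; the target is file name + '#' + id."""
    return [(m.group(3).strip(), f"{file_name}#{m.group(2)}")
            for m in HEADING.finditer(xhtml)]

for title, target in toc_targets(FILE_NAME, XHTML):
    print(f"{title} -> {target}")
```

If the id were stripped from a heading, this sketch would have nothing to build the `#` target from, which is the situation described above.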
Old 11-22-2011, 02:26 PM   #3
Artha
So, in the end there is a TOC file in the final ePub?
Old 11-22-2011, 02:36 PM   #4
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
Posts: 29,872
Karma: 55267620
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Artha View Post
So, in the end there is a TOC file in the final ePub?
Yes, it is part of the NCX file. EPUB does not need the inline TOC that mobi needs.
Old 11-22-2011, 03:12 PM   #5
Artha
Oh! That makes sense. Thanks.
Old 11-22-2011, 05:52 PM   #6
Serpentine
If you want to strip attributes from most tags, you can try something like:
Code:
find:
<(/)?\b(h\d|[uod]l|[pisbuq]|hr|br|abbr|acronym|address|area|base|basefont|bdo|big|blockquote|body|caption|center|cite|code|col|colgroup|dd|del|dfn|dir|div|dt|em|font|frame|frameset|hr|ins|kbd|label|legend|li|map|menu|noframes|noscript|object|param|pre|samp|select|small|span|strike|strong|sub|sup|table|tbody|td|textarea|tfoot|th|thead|title|tr|tt|var)\b([^<>]+[^/])(/)?>
replace:
<\1\2\4>
You will, however, need to make _sure_ that you are not removing formatting that might be important. For example, calibre often uses a span class to mark italic (etc.) text instead of <i> tags, so read the CSS and replace those tags correctly first. You will also need to regenerate the TOC if you leave the <h> tags in the pattern, since their ids are stripped as well. Remove tags from the list as needed and, well, be careful.
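For anyone who wants to try the pattern outside Sigil first, here is a rough Python sketch of the same find/replace. It assumes Python 3.5 or later, where a group that did not take part in the match expands to an empty string in the replacement. The tag list is shortened for readability, and `strip_attributes` and the sample markup are made up for illustration.

```python
import re

# Shortened tag list for readability; the full alternation from the post
# can be dropped in unchanged.
STRIP_ATTRS = re.compile(
    r'<(/)?\b(h\d|[uod]l|[pisbuq]|hr|br|div|span|em|strong|li|blockquote)\b'
    r'([^<>]+[^/])(/)?>'
)

def strip_attributes(html):
    """Rebuild each matched tag from its slash(es) and name only."""
    # Groups 1 and 4 are the optional slashes; unmatched groups become ''.
    return STRIP_ATTRS.sub(r'<\1\2\4>', html)

sample = ('<h2 id="heading_id_3" class="chap">Title</h2>'
          '<p class="calibre1">Text <span class="italic">word</span></p>')
print(strip_attributes(sample))
```

Note that the span loses its class in this sample, so any italics that depended on that class would be gone. That is the case the warning above is about.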
Old 11-23-2011, 03:24 AM   #7
Artha
Weird. Why would Calibre use span, or <i>, when there's <em> for that?
Old 11-23-2011, 03:27 AM   #8
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
Posts: 74,316
Karma: 129333690
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
Quote:
Originally Posted by Artha View Post
I have Sigil 0.4.2 and am trying to convert a book from PDF to ePub by hand. I have reduced the file to bare-bones HTML and will attach a CSS file later. At this point the markup should be clean, with only plain HTML tags and nothing more.

Yet when I hit "Generate TOC from headings", an id="heading_id_2" or id="heading_id_3" is added to the headings. Why is that?

And can it be disabled?
You don't need the id="heading_id_2" if each chapter is a separate file. In the NCX, each chapter entry can simply point to that chapter's file, with no # anchor.

What I do is use a regex to strip it. I search for id="heading_id_[0-9]*" and replace it with nothing. This works in Notepad++. I have not tried it in Sigil, so I do not know whether that regex works there. Someone may be able to fix it if it's incorrect.

Quote:
Originally Posted by Artha View Post
Weird. Why would Calibre use span, or <i>, when there's <em> for that?
Because that's what is in the HTML generated from the PDF.

I've seen code from some conversions where there was something like <p class="para"><span>text of the book</span></p> on every line, and it got worse with italics. I was able to remove most of it with regex and then removed the rest manually on every line that had italics.
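As a rough illustration of that kind of cleanup (not the exact regex used above, which is not given), here is a Python sketch that unwraps only attribute-less spans containing plain text. The pattern, the helper name `unwrap_plain_spans`, and the sample lines are assumptions made for this example.

```python
import re

# Only unwrap <span> elements with no attributes and no nested tags inside.
PLAIN_SPAN = re.compile(r'<span>([^<]*)</span>')

def unwrap_plain_spans(html):
    """Replace <span>text</span> with just the text."""
    return PLAIN_SPAN.sub(r'\1', html)

simple = '<p class="para"><span>text of the book</span></p>'
nested = '<p class="para"><span>some <span class="italic">word</span> here</span></p>'

print(unwrap_plain_spans(simple))
print(unwrap_plain_spans(nested))
```

The second line is left untouched because the outer span contains another tag, which mirrors why the lines with italics needed manual work.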

With Calibre, a lot of the oddities are in the source fed to it.

Last edited by JSWolf; 11-23-2011 at 03:34 AM.
Old 11-23-2011, 09:25 AM   #9
theducks
@JSWolf
I would use (in Sigil)
search for:
Code:
\s+id="heading_id_\d+"
which handles numeric-only suffixes of any digit count,

and replace with nothing
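A quick way to check this pattern before running it on a whole book is a small Python sketch like the one below. The sample markup and the helper name `strip_heading_ids` are made up for illustration; the pattern itself is the one given above.

```python
import re

# The leading \s+ also removes the whitespace in front of the attribute,
# so no stray space is left inside the tag.
HEADING_ID = re.compile(r'\s+id="heading_id_\d+"')

def strip_heading_ids(html):
    """Remove generated heading ids and leave every other attribute alone."""
    return HEADING_ID.sub('', html)

sample = ('<h1 id="heading_id_2">Chapter One</h1>\n'
          '<h2 class="sub" id="heading_id_13">Part</h2>\n'
          '<p id="note1">kept</p>')
print(strip_heading_ids(sample))
```

Ids that do not follow the heading_id_ plus digits form, such as note1 in the sample, are not touched.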
Old 11-25-2011, 05:59 PM   #10
JSWolf
What type of regex does Sigil use in case I need to look online for help?
Old 11-26-2011, 11:03 PM   #11
opitzs
Avid Reader
opitzs can successfully navigate the Paris bus system.
Posts: 161
Karma: 36472
Join Date: Sep 2008
Location: Look for rain, hail and snow...
Device: PRS-505, PRS-600, PRS T1, Kobo Glo
It uses Qt regex at the moment, but with 0.5 this will change to PCRE regex. I can't wait...
All times are GMT -4.