Old 09-16-2017, 12:29 AM   #1
AlanHK
Guru
Posts: 668
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Generate TOC and linebreaks

I often have books with chapter heads like:

<h2>One<br/>Chapter Title</h2>

If I do "Generate TOC" that makes a TOC entry named
"OneChapter Title"

i.e., just deletes the linebreak.
When rewrapping lines, normally you replace a linebreak with a space. You don't join words together.

I can of course manually edit spaces or punctuation, but could this be made automatic? Then if I have to regenerate the TOC and then the HTML TOC from it, I don't have to make the same corrections again.

Ideally, make it an option: "Replace <br/> with" any character(s), defaulting to a space, or with nothing, as now (some book headings have blank lines for spacing).
Also should collapse multiple <br/>, and ignore any at the beginning or end of the text.
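
To make the request concrete, here is a minimal Python sketch of the behaviour described above. It is not Sigil code: the function name `heading_to_toc_text` and the `separator` parameter are made up for illustration, and it does not decode HTML entities.

```python
import re

# One or more consecutive <br>, <br/> or <br /> tags, with optional whitespace between them.
BR_RUN = re.compile(r"(?:<br\s*/?>\s*)+", re.IGNORECASE)
# Any other tag (span, em, ...) is simply dropped from the TOC label.
TAG = re.compile(r"<[^>]+>")


def heading_to_toc_text(inner_html: str, separator: str = " ") -> str:
    """Build a one-line TOC label from the inner HTML of a heading (hypothetical helper)."""
    cleaned = []
    for part in BR_RUN.split(inner_html):
        text = TAG.sub("", part)
        text = " ".join(text.split())  # collapse whitespace inside each piece
        if text:  # empty pieces come from leading/trailing breaks, so skip them
            cleaned.append(text)
    return separator.join(cleaned)


print(heading_to_toc_text("One<br/>Chapter Title"))        # default: a space
print(heading_to_toc_text("One<br/>Chapter Title", ": "))  # a colon instead
```

Passing an empty `separator` reproduces the current behaviour, where the two lines are run together.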

Could extend this idea to more general transformations, e.g. to add words: if the pages have just numbers, <h2>1</h2>, but you want the TOC to say "Chapter 1".

But the handling of breaks now does need to be addressed.


Just thought of a workaround:
Replace "<br/>" with " <br/>", generate TOC. Though space at line end doesn't affect HTML unless monospaced, I'd normally clean that up.
But wouldn't work if I wanted to have colons or dashes.

Last edited by AlanHK; 09-16-2017 at 12:32 AM.
Old 09-16-2017, 04:31 AM   #2
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Sigil's Generate TOC can make use of the title attribute.

Example:

Code:
<h2 title="1: Example Chapter">One<br/>Example Chapter</h2>
Whatever you shove in the title will show up in the toc.ncx.

I personally use a Regex along these lines:

Search: <h2>([^<]+)<br/>([^<]+)</h2>
Replace: <h2 title="\1. \2">\1<br/>\2</h2>

Then depending on the book, I use a period or em dash or colon between the chapter # + chapter name.
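
For anyone who wants to try the same search/replace outside Sigil, here is a rough Python equivalent. The pattern is the one from the post; the function name `add_title` and the `joiner` parameter are just illustrative, and the sketch assumes the heading text contains no double quotes (which would break the attribute).

```python
import re

# Same pattern as the Sigil search above: an <h2> with exactly one <br/> and no nested tags.
HEADING = re.compile(r"<h2>([^<]+)<br/>([^<]+)</h2>")


def add_title(html: str, joiner: str = ". ") -> str:
    """Copy both heading lines into a title attribute, joined by `joiner` (hypothetical helper)."""
    def repl(match: re.Match) -> str:
        first, second = match.group(1), match.group(2)
        return f'<h2 title="{first}{joiner}{second}">{first}<br/>{second}</h2>'

    return HEADING.sub(repl, html)


print(add_title("<h2>One<br/>Chapter Title</h2>"))
print(add_title("<h2>One<br/>Chapter Title</h2>", joiner=": "))
```

Headings that already have attributes, or that contain nested tags such as a span, will not match this pattern and are left untouched.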

Last edited by Tex2002ans; 09-16-2017 at 04:37 AM.
Old 09-16-2017, 04:51 AM   #3
Klecks
Enthusiast
Posts: 39
Karma: 59154
Join Date: May 2010
Location: Stuttgart, Germany
Device: Kobo H2O, PocketBook Touch HD, Tolino Vision 4
Generate TOC

Hi AlanHK,

I know your problem and I like your ideas.
It would be really nice to have the TOC tool translate <br/> into a space.

But for the extended idea you must keep in mind that an <h2> can look like: <h2>1</h2> or <h2>1.</h2> or <h2>–1–</h2> or <h2>*** 1 ***</h2>. Plus: most books have chapters like Prologue or Epilogue or About the Author. So you would need some kind of RegEx for the TOC tool.
I'm not a programmer, but I don't think that's easy to implement.

Meanwhile my workaround is:
1. edit the TOC, if I want minor changes
2. RegEx the toc.ncx directly, if I want more changes
But if you have to recreate the TOC, you have to do these steps again.

If you want a solution that survives the recreation of the TOC, you can (using RegEx) insert the title="" attribute in the <h1>, <h2>, etc., so it looks like:
<h1 title="1. Some Text">1<br/><span class="Sub">Some Text</span></h1> or
<h1 title="Chapter 1">1</h1>
The TOC tool will pick up the text from title="" and give you "1. Some Text" and "Chapter 1" respectively.

Last edited by Klecks; 09-16-2017 at 04:53 AM.
Old 09-16-2017, 05:17 AM   #4
AlanHK
Quote:
Originally Posted by Klecks
title=""-property in the <h1> <h2> etc
Thanks. Didn't know about that.

However, of course, if you change the heading, you also have to remember to change the "Title=" manually.

Quote:
Originally Posted by Klecks
But for the extended idea you must keep in mind, that a <h2> can look like: <h2>1</h2> or <h2>1.</h2> or <h2>–1–</h2> or <h2>*** 1 ***</h2>. Plus: most books have chapters like Prologue or Epilogue or About the Author.
I can handle a few special cases manually. I almost always edit the TOC after generation to add or delete a few items at the beginning or end. (May use the "title=" hack to do that now.) Just preferably not every single entry.


I still think it's undesirable to just delete breaks and run lines together as it does now.

Last edited by AlanHK; 09-16-2017 at 05:35 AM.
Old 09-16-2017, 05:58 AM   #5
DiapDealer
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I thought Sigil used to handle the exact situation you describe differently (i.e. the break DID become a space when generating the ncx), but I could be wrong. I'll take a peek at what's involved.
Old 09-16-2017, 12:54 PM   #6
theducks
Well trained by Cats
Posts: 29,783
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by DiapDealer
I thought Sigil used to handle the exact situation you describe differently (i.e. the break DID become a space when generating the ncx), but I could be wrong. I'll take a peek at what's involved.
Folk keep saying that, but I have always needed the space before the BR.

I adjust the line height for the class assigned to the H# to vertically space out the 2 lines within the book (no need for <br /> <br />).
Old 09-16-2017, 01:47 PM   #7
DiapDealer
Quote:
Originally Posted by theducks
Folk keep saying that, but I have always needed the space before the BR.
I did say I could be wrong (it's an old, vague memory).
Old 09-16-2017, 03:20 PM   #8
Tex2002ans
Quote:
Originally Posted by DiapDealer
I did say I could be wrong (it's an old, vague memory).
There was no space generated by the <br/> as long as I can remember. But your ancient memory is probably much olderer than mine!

Quote:
Originally Posted by AlanHK
I can handle a few special cases manually. I almost always edit the TOC after generation to add or delete a few items at the beginning or end. (May use the "title=" hack to do that now.) Just preferably not every single entry.
The way I do it is using the title, then you can easily Regex what's needed, without affecting the displayed text:

Original: <h2 title="Chapter One The Example">Chapter One<br/>The Example</h2>

Search: title="Chapter One (including the space that follows "One")
Replace: title="1—

After: <h2 title="1—The Example">Chapter One<br/>The Example</h2>

If you work with a lot of books that have that format, you could probably make a Saved Search group to handle the word -> number conversion.

From there, you could easily tweak what symbol you need for that specific book:

Search: (title="\d+)—
Replace: \1: (with a trailing space after the colon)

After: <h2 title="1: The Example">Chapter One<br/>The Example</h2>

Using title means that Generate TOC won't throw your manual changes in the garbage if you run it again. Saves a lot of headaches.
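
The two passes above can also be scripted. This is only a sketch of the idea in Python, not a Sigil feature: the `WORDS` table and both function names are invented for the example, and the table would need extending for real books.

```python
import re

# Hypothetical word -> number table; extend as needed for a given book.
WORDS = {"One": 1, "Two": 2, "Three": 3}

CHAPTER_WORD = re.compile(r'title="Chapter (' + "|".join(WORDS) + r")\s+")


def number_titles(html: str) -> str:
    """First pass: turn title="Chapter One ..." into title="1—..."."""
    return CHAPTER_WORD.sub(lambda m: f'title="{WORDS[m.group(1)]}—', html)


def change_separator(html: str, old: str = "—", new: str = ": ") -> str:
    """Second pass: swap the symbol that follows the chapter number in the title."""
    pattern = re.compile(r'(title="\d+)' + re.escape(old))
    return pattern.sub(lambda m: m.group(1) + new, html)


original = '<h2 title="Chapter One The Example">Chapter One<br/>The Example</h2>'
step1 = number_titles(original)
step2 = change_separator(step1)
print(step1)
print(step2)
```

Only the title attribute is rewritten; the visible heading text between the tags stays as it was.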

Last edited by Tex2002ans; 09-16-2017 at 03:32 PM.
Old 09-16-2017, 07:21 PM   #9
KevinH
Sigil Developer
Posts: 7,637
Karma: 5433388
Join Date: Nov 2009
Device: many
The headings code calls GumboInterface::get_local_text_of_node, which builds up the local text of the node, keeping any and all whitespace, including carriage returns and line feeds. The h2 node in your example:

Code:
<h2>One<br/>Chapter Title</h2>
has 3 children: a text node "One", an element node br, and a text node "Chapter Title". Since an element node like br has no text value in and of itself, you end up seeing what you see.

We can try to detect br element child nodes and convert them to a space or to a newline in that routine instead. The Qt QString.simplified() call should then work properly to remove leading and trailing spaces but leave internal whitespace.
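
As a rough illustration of the node walk described above (a Python stand-in using the standard library's `html.parser`, not the actual Gumbo-based C++ routine), the class name `LocalText` and the `br_as` parameter here are made up for the example:

```python
from html.parser import HTMLParser


class LocalText(HTMLParser):
    """Collect the text of a fragment, optionally emitting something for each <br>."""

    def __init__(self, br_as: str = ""):
        super().__init__()
        self.br_as = br_as  # "" mimics the old behaviour; "\n" or " " mimics the proposed change
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing <br/>, via the default handle_startendtag.
        if tag == "br":
            self.parts.append(self.br_as)

    def handle_data(self, data):
        self.parts.append(data)


def local_text(fragment: str, br_as: str = "") -> str:
    parser = LocalText(br_as)
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


heading = "<h2>One<br/>Chapter Title</h2>"
print(repr(local_text(heading)))              # br contributes nothing
print(repr(local_text(heading, br_as="\n")))  # br becomes a newline
```

With `br_as=""` the two text nodes are simply concatenated, which is the "OneChapter Title" result reported at the top of the thread.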
Old 09-16-2017, 08:23 PM   #10
DiapDealer
QString.simplified() collapses internal whitespace, too. But we want it to in many cases. It collapses multiple spaces, and any combination of tabs, linefeeds and carriage returns to a single space (as well as trimming the leading/trailing whitespace).
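
For readers without Qt at hand, the effect described here can be approximated in Python. This is an approximation only: the helper name `simplified` is borrowed for readability, and the exact set of characters treated as whitespace may differ from Qt's.

```python
def simplified(text: str) -> str:
    """Approximate QString::simplified(): trim the ends and collapse whitespace runs to one space."""
    # str.split() with no arguments splits on any run of whitespace and drops empty pieces.
    return " ".join(text.split())


print(repr(simplified("  One\r\n\tChapter   Title \n")))
print(repr(simplified("One\nChapter Title")))
```

So if a br is turned into a newline first, a pass like this ends up giving a single space between the two heading lines.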

Would converting br child nodes into line-breaks (which simplified() would ultimately convert to a space) be difficult to accomplish in get_local_text_of_node? If not, should we always do so, or pass a boolean parameter (defaulted to false) that's only true for the GetHeadingsListForOneFile function in Headings.cpp?

If it's awkward, or potentially introduces other complications, I'm not sure if it's worth it (but I'm not opposed to the idea of "fixing" this on principle alone, either).

Last edited by DiapDealer; 09-16-2017 at 08:43 PM.
Old 09-17-2017, 12:11 PM   #11
KevinH
It should be an easy change and that routine is really only called for toc headings, and the nav. So it should be a very safe change.

I will push something today (or soon) that should fix this issue for us.



Quote:
Originally Posted by DiapDealer
QString.simplified() collapses internal whitespace, too. But we want it to in many cases. It collapses multiple spaces, and any combination of tabs, linefeeds and carriage returns to a single space (as well as trimming the leading/trailing whitespace).

Would converting br child nodes into line-breaks (which simplified() would ultimately convert to a space) be difficult to accomplish in get_local_text_of_node? If not, should we always do so, or pass a boolean parameter (defaulted to false) that's only true for the GetHeadingsListForOneFile function in Headings.cpp?

If it's awkward, or potentially introduces other complications, I'm not sure if it's worth it (but I'm not opposed to the idea of "fixing" this on principle alone, either).
Old 09-17-2017, 02:20 PM   #12
KevinH
FYI - just pushed a "fix" for this to master. It should be a safe change (hope those were not famous last words!)

Last edited by KevinH; 09-17-2017 at 03:25 PM.
Old 09-17-2017, 03:27 PM   #13
DiapDealer
Quote:
Originally Posted by KevinH
FYI - just pushed a "fix" for this to master. It should be a safe change (hope those were not famous last words!)
It performs as expected for EPUB2, but there's no call to QString.simplified() on the EPUB3 side of things. So there's a line-break in the nav, the ToC Widget entry, and ultimately the ncx (if you generate one from the nav).

I've found a fairly central location in Headings.cpp to call .simplified() that should ensure that whitespace gets collapsed properly in all toc/nav/ncx generation. Should I push?
Old 09-17-2017, 04:12 PM   #14
KevinH
yes please!
Old 09-17-2017, 04:41 PM   #15
DiapDealer
Done.