09-16-2017, 12:29 AM | #1 |
Guru
Posts: 668
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Generate TOC and linebreaks
I often have books with chapter heads like:
<h2>One<br/>Chapter Title</h2>
If I do "Generate TOC", that makes a TOC entry named "OneChapter Title", i.e., it just deletes the linebreak. When rewrapping lines, you normally replace a linebreak with a space; you don't join words together.
I can of course manually edit in spaces or punctuation, but could this be made automatic? Then if I have to regenerate the TOC, and then the HTML TOC from it, I don't have to make the same corrections again.
Ideally, make it an option: "Replace <br/> with" any character(s), defaulting to a space, or nothing, as now (some book headings have blank lines for spacing). It should also collapse multiple <br/> and ignore any at the beginning or end of the text.
This idea could be extended to more general transformations, e.g., adding words: if the pages have just numbers, <h2>1</h2>, but you want the TOC to say "Chapter 1". But the current handling of breaks does need to be addressed.
Just thought of a workaround: replace "<br/>" with " <br/>", then generate the TOC. Though a space at the end of a line doesn't affect the HTML unless it's monospaced, I'd normally clean that up afterwards. And it wouldn't work if I wanted colons or dashes.
Last edited by AlanHK; 09-16-2017 at 12:32 AM. |
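To make the requested option concrete, here is a minimal Python sketch of how a TOC entry could be built from a heading under that proposal. It is only an illustration: `toc_entry` and `br_replacement` are hypothetical names, not Sigil settings, and Sigil itself is written in C++. Each run of <br/> becomes the chosen separator, leading and trailing breaks are dropped, and an empty separator reproduces the current "OneChapter Title" result.

```python
from html.parser import HTMLParser

BREAK = "\x00"  # sentinel marking where a <br/> was


class _HeadingText(HTMLParser):
    """Collects the text of a heading fragment, recording each <br> as BREAK."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # <br/> is routed here too: HTMLParser's default handle_startendtag
        # calls handle_starttag followed by handle_endtag.
        if tag == "br":
            self.parts.append(BREAK)

    def handle_data(self, data):
        self.parts.append(data)


def toc_entry(heading_html, br_replacement=" "):
    """Hypothetical TOC-entry builder: each run of <br/> becomes br_replacement."""
    parser = _HeadingText()
    parser.feed(heading_html)
    parser.close()
    pieces = "".join(parser.parts).split(BREAK)
    # Collapse whitespace inside each line, then drop empty lines so that
    # repeated, leading, or trailing <br/> elements leave no trace.
    lines = [" ".join(p.split()) for p in pieces]
    return br_replacement.join(line for line in lines if line)


print(toc_entry("<h2>One<br/>Chapter Title</h2>"))        # One Chapter Title
print(toc_entry("<h2>One<br/>Chapter Title</h2>", ": "))  # One: Chapter Title
print(toc_entry("<h2>One<br/>Chapter Title</h2>", ""))    # OneChapter Title
```

Filtering out empty lines is what handles the "blank lines for spacing" case: <br/><br/> yields an empty piece between the two breaks, which is discarded rather than producing a doubled separator.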
09-16-2017, 04:31 AM | #2 |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Sigil's Generate TOC can make use of the title attribute.
Example:
Code:
<h2 title="1: Example Chapter">One<br/>Example Chapter</h2>
I personally use a regex along these lines:
Search: <h2>([^<]+)<br/>([^<]+)</h2>
Replace: <h2 title="\1. \2">\1<br/>\2</h2>
Then, depending on the book, I use a period, em dash, or colon between the chapter number and the chapter name.
Last edited by Tex2002ans; 09-16-2017 at 04:37 AM.
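The Search/Replace pair above can be tried outside Sigil with a short Python sketch. Sigil's own search uses PCRE rather than Python's `re`, but for a pattern this simple the two should behave the same; `add_title` is a hypothetical helper name.

```python
import re

# Search and Replace strings quoted from the post. Python's re accepts the
# same pattern and the same \1, \2 backreferences in the replacement.
SEARCH = r"<h2>([^<]+)<br/>([^<]+)</h2>"
REPLACE = r'<h2 title="\1. \2">\1<br/>\2</h2>'


def add_title(html):
    """Copy a two-line <h2> into a title attribute, joined by '. '."""
    return re.sub(SEARCH, REPLACE, html)


print(add_title("<h2>One<br/>Example Chapter</h2>"))
# <h2 title="One. Example Chapter">One<br/>Example Chapter</h2>
```

Because the pattern requires a bare `<h2>` and `[^<]+` cannot cross a tag, headings that already carry attributes (a class, or a title added by an earlier run) or nested markup such as a <span> are left untouched. That also makes rerunning the search harmless.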
09-16-2017, 04:51 AM | #3 |
Enthusiast
Posts: 39
Karma: 59154
Join Date: May 2010
Location: Stuttgart, Germany
Device: Kobo H2O, PocketBook Touch HD, Tolino Vision 4
Generate TOC
Hi AlanHK,
I know your problem and I like your ideas. It would be really nice to have the TOC tool translate <br/> into a space. But for the extended idea you must keep in mind that an <h2> can look like <h2>1</h2>, <h2>1.</h2>, <h2>–1–</h2>, or <h2>*** 1 ***</h2>. Plus, most books have chapters like Prologue, Epilogue, or About the Author. So you would need some kind of regex for the TOC tool. I'm not a programmer, but I don't think that's easy to implement.
Meanwhile, my workaround is:
1. edit the TOC, if I want minor changes
2. regex the toc.ncx directly, if I want more changes
But if you have to recreate the TOC, you have to do these steps again. If you want a solution that survives recreating the TOC, you can (using regex) insert the title="" attribute into the <h1>, <h2>, etc., so it looks like:
<h1 title="1. Some Text">1<br/><span class="Sub">Some Text</span></h1>
or
<h1 title="Chapter 1">1</h1>
The TOC tool will pick up the text from title="" and give you "1. Some Text" or "Chapter 1", respectively.
Last edited by Klecks; 09-16-2017 at 04:53 AM.
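The numbered-heading variants listed above can be handled by one pattern, as long as the heading holds nothing but a number plus non-letter decoration. A minimal Python sketch under that assumption; `add_chapter_titles` is a hypothetical helper and the "Chapter N" wording is an assumed convention, not something Sigil produces. Headings such as Prologue or Epilogue contain letters, so they are skipped.

```python
import re

# <hN> whose whole content is one number, optionally wrapped in non-letter
# decoration such as "1.", "–1–" or "*** 1 ***". [^\w<] excludes letters,
# digits, underscores, and tags, so "Prologue" or "Chapter 1" never match,
# and neither does a heading that already has attributes.
NUMBERED = re.compile(r"<(h[1-6])>([^\w<]*?(\d+)[^\w<]*)</\1>")


def add_chapter_titles(html):
    """Give each number-only heading a title="Chapter N" attribute."""
    return NUMBERED.sub(r'<\1 title="Chapter \3">\2</\1>', html)


for h in ["<h2>1</h2>", "<h2>1.</h2>", "<h2>–1–</h2>",
          "<h2>*** 1 ***</h2>", "<h2>Prologue</h2>"]:
    print(add_chapter_titles(h))
```

The visible heading text is kept exactly as it was (group 2); only the title attribute is normalized, which is what the TOC tool reads.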
09-16-2017, 05:17 AM | #4 | |
Guru
Posts: 668
Karma: 929286
Join Date: Apr 2014
Device: PW-3, iPad, Android phone
Thanks. Didn't know about that.
However, of course, if you change the heading, you also have to remember to change the title= attribute manually.
I still think it's undesirable to just delete breaks and run lines together, as it does now.
Last edited by AlanHK; 09-16-2017 at 05:35 AM.
09-16-2017, 05:58 AM | #5 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I thought Sigil used to handle the exact situation you describe differently (i.e. the break DID become a space when generating the ncx), but I could be wrong. I'll take a peek at what's involved.
09-16-2017, 12:54 PM | #6 | |
Well trained by Cats
Posts: 29,783
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
I adjust the line height for the class assigned to the H# to space out the two lines vertically within the book (no need for <br /><br />).
09-16-2017, 01:47 PM | #7 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
09-16-2017, 03:20 PM | #8 | |
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
There was no space generated by the <br/> as long as I can remember. But your ancient memory is probably much olderer than mine!
Original:
<h2 title="Chapter One The Example">Chapter One<br/>The Example</h2>
Search: title="Chapter One (including the trailing space)
Replace: title="1—
After:
<h2 title="1—The Example">Chapter One<br/>The Example</h2>
If you work with a lot of books that have that format, you could probably make a Saved Search group to handle the word -> number conversion. From there, you could easily tweak whichever symbol you need for that specific book:
Search: (title="\d+)—
Replace: \1: (with a trailing space)
After:
<h2 title="1: The Example">Chapter One<br/>The Example</h2>
Using title means that Generate TOC won't throw your manual changes in the garbage if you run it again. Saves a lot of headaches.
Last edited by Tex2002ans; 09-16-2017 at 03:32 PM.
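The two-step workflow above can be sketched in Python to show what the Saved Search group would do. This is an illustration under assumptions: the word-to-number table is hypothetical and stands in for the group of saved searches, and `number_titles` and `set_separator` are made-up helper names. The em dash is written as the escape `\u2014` but is the same placeholder character used in the post.

```python
import re

DASH = "\u2014"  # the em dash used as a placeholder separator in the post

# Hypothetical word -> number table; in Sigil this role would be played by a
# Saved Search group with one search/replace pair per number word.
WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5,
         "Six": 6, "Seven": 7, "Eight": 8, "Nine": 9, "Ten": 10}

CHAPTER_WORD = re.compile(r'title="Chapter (' + "|".join(WORDS) + r") ")
NUMBERED_SEP = re.compile(r'(title="\d+)' + DASH)


def number_titles(html):
    """Step 1: title="Chapter One The Example" -> title="1<dash>The Example"."""
    return CHAPTER_WORD.sub(lambda m: f'title="{WORDS[m.group(1)]}{DASH}', html)


def set_separator(html, sep=": "):
    """Step 2: swap the placeholder dash for the separator this book uses."""
    return NUMBERED_SEP.sub(lambda m: m.group(1) + sep, html)


step1 = number_titles('<h2 title="Chapter One The Example">Chapter One<br/>The Example</h2>')
print(set_separator(step1))
# <h2 title="1: The Example">Chapter One<br/>The Example</h2>
```

Only the title attribute is rewritten; the visible "Chapter One" inside the heading is not matched because both patterns start with `title="`.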
09-16-2017, 07:21 PM | #9 |
Sigil Developer
Posts: 7,637
Karma: 5433388
Join Date: Nov 2009
Device: many
The headings call GumboInterface::get_local_text_of_node, which builds up the local text of the node, keeping any and all whitespace, including carriage returns and line feeds. The h2 node in your example:
Code:
<h2>One<br/>Chapter Title</h2>
We can try to detect br element child nodes in that routine and convert them to a space or to a newline instead. The Qt QString.simplified() call should then work properly to remove leading and trailing spaces but leave internal whitespace.
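A rough Python model of the proposed change, for illustration only: `local_text` is a hypothetical stand-in for GumboInterface::get_local_text_of_node, and it walks an ElementTree rather than Gumbo's parse tree. Text and tails are concatenated with their whitespace intact, and each <br/> child contributes a newline instead of nothing.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an ElementTree namespace prefix: '{http://...xhtml}br' -> 'br'."""
    return tag.rsplit("}", 1)[-1]


def local_text(node, br_text="\n"):
    """Concatenate the text under node, keeping all whitespace as-is.

    Each <br/> child is replaced by br_text; br_text="" mimics the behavior
    described in the thread, where a break simply disappeared.
    """
    parts = [node.text or ""]
    for child in node:
        if local_name(child.tag) == "br":
            parts.append(br_text)
        else:
            parts.append(local_text(child, br_text))
        parts.append(child.tail or "")  # text after the child, inside node
    return "".join(parts)


h2 = ET.fromstring("<h2>One<br/>Chapter Title</h2>")
print(repr(local_text(h2, br_text="")))  # 'OneChapter Title'
print(repr(local_text(h2)))              # 'One\nChapter Title'
```

The newline by itself is not the final TOC text; it only marks the break so that a later whitespace-normalizing step can turn it into a space.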
09-16-2017, 08:23 PM | #10 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
QString.simplified() collapses internal whitespace, too. But we want it to in many cases. It collapses multiple spaces, and any combination of tabs, linefeeds, and carriage returns, to a single space (as well as trimming the leading/trailing whitespace).
Would converting br child nodes into line breaks (which simplified() would ultimately convert to a space) be difficult to accomplish in get_local_text_of_node? If not, should we always do so, or pass a boolean parameter (defaulting to false) that's only true for the GetHeadingsListForOneFile function in Headings.cpp?
If it's awkward, or potentially introduces other complications, I'm not sure it's worth it (but I'm not opposed to the idea of "fixing" this on principle alone, either).
Last edited by DiapDealer; 09-16-2017 at 08:43 PM.
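To see why a newline per <br/> is enough, here is a rough Python approximation of QString::simplified() based on its documented behavior (trim both ends, replace each internal whitespace run with one space). It is not Qt code, and Python's notion of whitespace may differ slightly from QChar::isSpace().

```python
def simplified(text):
    """Approximate QString::simplified(): trim leading/trailing whitespace and
    replace every internal run of whitespace with a single space.

    str.split() with no argument splits on runs of Unicode whitespace, which is
    close to (but not guaranteed identical with) Qt's QChar::isSpace().
    """
    return " ".join(text.split())


print(repr(simplified("One\nChapter Title")))            # 'One Chapter Title'
print(repr(simplified("  One \r\n\t Chapter  Title ")))  # 'One Chapter Title'
print(repr(simplified("One\n\n\nChapter Title")))        # 'One Chapter Title'
```

The last case shows that several consecutive <br/> elements, each turned into a newline, still collapse to a single space, which also covers the request to collapse multiple breaks.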
09-17-2017, 12:11 PM | #11 | |
Sigil Developer
Posts: 7,637
Karma: 5433388
Join Date: Nov 2009
Device: many
It should be an easy change, and that routine is really only called for TOC headings and the nav, so it should be a very safe change.
I will push something today (or soon) that should fix this issue for us.
09-17-2017, 02:20 PM | #12 |
Sigil Developer
Posts: 7,637
Karma: 5433388
Join Date: Nov 2009
Device: many
FYI - just pushed a "fix" for this to master. It should be a safe change (hope those were not famous last words!)
Last edited by KevinH; 09-17-2017 at 03:25 PM. |
09-17-2017, 03:27 PM | #13 | |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
I've found a fairly central location in Headings.cpp to call .simplified() that should ensure that whitespace gets collapsed properly in all toc/nav/ncx generation. Should I push? |
09-17-2017, 04:12 PM | #14 |
Sigil Developer
Posts: 7,637
Karma: 5433388
Join Date: Nov 2009
Device: many
yes please!
09-17-2017, 04:41 PM | #15 |
Grand Sorcerer
Posts: 27,547
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Done.