05-02-2017, 04:28 PM | #1 |
Member
Posts: 14
Karma: 10
Join Date: May 2015
Location: Kent, UK
Device: Kobo Aura
|
Forcing indexing to skip a block of text?
Hi folks,
Sorry to be back so soon with another indexing question; it should be the last one (for a while at least!). Whilst nosing about in Sigil's code view I noticed there is a class called "sigil_index_marker" used to flag index entries from within the epub content. Does the opposite exist, i.e. a class that can be applied to a block of text, maybe with a <div> or similar, to tell the indexing tool to skip the marked block when looking for text from the index editor? Failing that, is there a way to tell the indexing tool to skip a whole file? The reason is that I'm in the process of adding a "List of Illustrations" to my epub and don't want the list content included in the index, even if it matches the index editor text. I realise that I could use the "sigil_index_marker" class and mark all the occurrences I want to include, but this would mean upwards of 1000 entries to flag manually - ouch! Hope this makes sense...
Andy B. |
05-02-2017, 10:00 PM | #2 |
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
|
The indexer will skip "toc" guide items in the opf in epub2 and the nav in epub3.
So by manually playing with the opf guide items you might be able to force it to ignore a file. |
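To see which files an epub2 guide currently marks as "toc" (and which the indexer would therefore skip, going by the description above), something like the following could be run against content.opf. It is only a sketch: the sample OPF fragment and the helper name `guide_hrefs_of_type` are made up for illustration, and it is not how Sigil itself reads the guide.

```python
import xml.etree.ElementTree as ET

# Namespace used by OPF package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Made-up sample data standing in for a real content.opf.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <guide>
    <reference type="toc" title="Table of Contents" href="Text/contents.xhtml"/>
    <reference type="text" title="Text" href="Text/01-early-doors.xhtml"/>
  </guide>
</package>"""


def guide_hrefs_of_type(opf_xml, guide_type="toc"):
    """Return the href of every guide reference with the given type."""
    root = ET.fromstring(opf_xml)
    refs = root.findall("opf:guide/opf:reference", OPF_NS)
    return [ref.get("href") for ref in refs if ref.get("type") == guide_type]


print(guide_hrefs_of_type(SAMPLE_OPF))
```

Any href returned for "toc" is a candidate for being ignored by the indexer.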
05-03-2017, 01:42 AM | #3 |
Member
Posts: 14
Karma: 10
Join Date: May 2015
Location: Kent, UK
Device: Kobo Aura
|
Many thanks, I'll give it a try...
Andy B. |
05-03-2017, 02:53 AM | #4 |
Member
Posts: 14
Karma: 10
Join Date: May 2015
Location: Kent, UK
Device: Kobo Aura
|
Oops - close but not quite cigar worthy ...
My "List of Illustrations" is now fixed: its content is not appearing in the index, but I now have text from my ToC appearing... A ToC snippet looks like this before indexing:
Code:
<h3 class="sigil_not_in_toc">12. <a href="../Text/12-green-horrors.xhtml">Green Horrors</a></h3>
<ul>
  <li>What constitutes a horror? Opinions vary…</li>
  <li>Japanese knotweed</li>
  <li>Giant hogweed</li>
  <li>A succession of pondweeds</li>
  <li>New Zealand pigmyweed</li>
  <li>Azolla</li>
  <li>Least duckweed</li>
  <li>Floating pennywort</li>
  <li>Himalayan balsam</li>
  <li>Rhododendron ponticum</li>
</ul>
And this after indexing:
Code:
. . .
  <li>Least duckweed</li>
  <li>Floating pennywort</li>
  <li>Himalayan balsam</li>
  <li id="sigil_index_id_1">Rhododendron ponticum</li>
</ul>
The relevant part of my tweaked (as suggested) content.opf file looks like this:
Code:
<guide>
  <reference type="copyright-page" title="Copyright Page" href="Text/copyright.xhtml"/>
  <reference type="dedication" title="Dedication" href="Text/dedication.xhtml"/>
  <reference type="toc" title="Table of Content" href="Text/contents.xhtml"/>
  <reference type="toc" title="Table of Content" href="Text/illustrations.xhtml"/>
  <reference type="preface" title="Preface" href="Text/preface.xhtml"/>
  <reference type="acknowledgements" title="Acknowledgements" href="Text/acknowledgements.xhtml"/>
  <reference type="text" title="Text" href="Text/01-early-doors.xhtml"/>
  <reference type="text" title="Text" href="Text/02-chill-out.xhtml"/>
  . . .
I'm confused... This is starting to get quite involved; I'm happy to take the discussion off list if required...
Andy B.
Last edited by andyb; 05-03-2017 at 05:19 AM. Reason: Not quite as fixed as I thought it was! |
05-03-2017, 10:56 AM | #5 |
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
|
I assumed you had removed the old toc from the document first and just wanted to hide one other file. The epub2 guide items allow only one true toc, not multiple.
That said, should an index ever index front matter? Are dedications ever indexed? Perhaps we should modify Sigil to skip all guide types that map to any type of front matter? That could be easily done. What is the right/official thing to do here? Should all front matter be excluded by an index generator or not? |
05-03-2017, 12:10 PM | #6 | ||
Member
Posts: 14
Karma: 10
Join Date: May 2015
Location: Kent, UK
Device: Kobo Aura
|
As the semantics can be set for each file by right-clicking on the filename and selecting "Add Semantics", how about controlling the index tool's access by semantic type? A new panel could be created in "Preferences" showing a list of the available semantic types, with a tick box for each one to say "Include files of this type in the Index" - maximum flexibility achieved with (fingers crossed!) not too much effort, and everybody happy? (he said - running for cover!) Just out of interest, where does the semantic list come from? I couldn't spot it in the epub2 standard.
Andy B. |
||
05-03-2017, 12:33 PM | #7 |
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
|
|
05-03-2017, 12:39 PM | #8 |
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
|
It's an idea, but changing a semantic type is easy and an index should only be generated once the book is complete. So having a list of front matter guide types/nav types that are not indexed would work. If you decide you want to index those semantic types for any reason, you can temporarily toggle them off and then re-enable them after the index is built.
Should work easily that way if we can agree on which semantic types should default to not being indexed. |
05-03-2017, 12:48 PM | #9 | |
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
|
According to the MIT Press on how to create an index...
Quote:
|
|
05-03-2017, 12:52 PM | #10 |
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
|
So to be consistent, we should default to excluding all front matter semantic tags. If a developer instead wants to index it, they can temporarily toggle off that semantic type before creating the index.
I will look into modifying Sigil's code to do that. |
05-03-2017, 12:52 PM | #11 |
Member
Posts: 14
Karma: 10
Join Date: May 2015
Location: Kent, UK
Device: Kobo Aura
|
That'll teach me to play - I think I've found a bug!
There appears to be an issue with Right-click-on-filename > Add Semantics: it appears to be toggling the semantics rather than setting them, and not telling you what it's doing. For example, on a clean start of Sigil the file "Section0001.xhtml" has no semantic set and the <guide> tag in content.opf is empty. If I set the file Section0001.xhtml to have a semantic type of "text", the correct entry is created in content.opf:
Code:
<guide>
  <reference type="text" title="Text" href="Text/Section0001.xhtml"/>
</guide>
This is obviously a new meaning for the word "Add" that I previously wasn't aware of. Not realizing this was happening initially caused no end of confusion!
Andy B. |
05-03-2017, 01:23 PM | #12 |
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Yes, it is and always has been a toggle. If you have already added something it will turn it back off. This is how Sigil has always worked. If you want to see if a particular file has something already set, you can do one of the following:
- Mouse over the file name in the BookBrowser window
- Create a Report
- Edit the content.opf |
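For anyone puzzled by the toggle behaviour, here is a rough model of it in Python. This is not Sigil's code; the function name `toggle_guide_reference` is invented for the example and namespaces are left out to keep it short. It only shows the idea that "adding" a semantic that is already set removes it again.

```python
import xml.etree.ElementTree as ET


def toggle_guide_reference(guide, guide_type, title, href):
    """Remove a matching reference if present, otherwise add it.

    Returns True if the semantic is set after the call, False if it was removed.
    """
    for ref in guide.findall("reference"):
        if ref.get("type") == guide_type and ref.get("href") == href:
            guide.remove(ref)
            return False
    ET.SubElement(
        guide, "reference", {"type": guide_type, "title": title, "href": href}
    )
    return True


guide = ET.Element("guide")  # empty guide, as on a clean start

# First call sets the semantic.
first = toggle_guide_reference(guide, "text", "Text", "Text/Section0001.xhtml")
xml_after_first = ET.tostring(guide, encoding="unicode")

# Second identical call turns it back off.
second = toggle_guide_reference(guide, "text", "Text", "Text/Section0001.xhtml")

print(first, second, len(guide))
```

Checking the guide (or the mouse-over tooltip) before and after each click is the easiest way to confirm which state a file is in.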
05-03-2017, 02:50 PM | #13 | |
Sigil Developer
Posts: 7,633
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Okay, I have gone over the set of possible epub2 guide types and epub3 landmarks (i.e. semantics) and can see that by default we should only be indexing files with no semantics set at all OR that have the following semantic types:
Quote:
If an author decides they want to index one of these files, they can simply toggle off the semantics before running the index generator and then toggle it back on. That should give the user more control over what gets indexed and what does not at the file level and follows the guidelines provided by the MIT Press for generating indexes. I will commit that change now so that it will be available in the next release. |
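The rule described above (index a file if it has no semantics at all, or if its semantic type is on an allowed list) might be sketched as follows. The allowed set here is a placeholder containing only "text"; the actual list of types is not reproduced in this copy of the thread, so do not read this as Sigil's real default. The file names, including appendix.xhtml, are sample data.

```python
# Placeholder only: the real list of indexable semantic types is not shown here.
INDEXABLE_TYPES = frozenset({"text"})


def files_to_index(semantics):
    """Pick the files an index generator should scan.

    `semantics` maps each file name to its guide/landmark type, or None
    when no semantic is set on that file.
    """
    return [
        name
        for name, kind in semantics.items()
        if kind is None or kind in INDEXABLE_TYPES
    ]


book = {
    "Text/dedication.xhtml": "dedication",
    "Text/contents.xhtml": "toc",
    "Text/01-early-doors.xhtml": "text",
    "Text/appendix.xhtml": None,  # no semantic set, so it is indexed
}

print(files_to_index(book))
```

Toggling a file's semantic off before indexing simply moves it into the "None" case, which is why the temporary toggle works as an override.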
|
05-03-2017, 04:15 PM | #14 |
Member
Posts: 14
Karma: 10
Join Date: May 2015
Location: Kent, UK
Device: Kobo Aura
|
Many thanks for the time and effort...
Andy B. |
05-03-2017, 07:14 PM | #15 | |||
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
Quote:
Side Note: Being able to ignore certain files would also be helpful in Tools > Spellcheck. Sometimes you don't want to include words in places like the Cover/Title/Copyright/TOC/Index, and just want to focus on the main text itself. This is a much worse issue in Calibre though, because that also looks through many non-HTML files like the content.opf + toc.ncx. Plus Calibre's spellchecking includes numbers (you can imagine how many numbers show up in an Index and drown out legitimate numbers in the main text itself). I think that would work better as a little GUI checkbox though: (You can tell my artistic skills are impeccable.)
Quote:
As you can see from MIT's recommendations, their indexes typically focus on the main matter of the book. But some may include the Foreword/Introduction/Preface (Roman numeral pages). I have seen all types. Here is a portion of the 16th Edition of the Chicago Manual of Style, "Chapter 16: Indexes":
What Parts of the Work to Index
16.109 Indexing the text, front matter, and back matter. The entire text of a book or journal article, including substantive content in notes (see 16.110), should be indexed. Much of the front matter, however, is not indexable – title page, dedication, epigraphs, lists of illustrations and tables, and acknowledgments. A preface, or a foreword by someone other than the author of the work, may be indexed if it concerns the subject of the work and not simply how the work came to be written. A true introduction, whether in the front matter or, more commonly, in the body of the work, is always indexed (for introduction versus preface, see 1.42). Book appendixes should be indexed if they contain information that supplements the text, but not if they merely reproduce documents that are discussed in the text (the full text of a treaty, for example, or a questionnaire). Appendixes to journal articles are indexed as part of the articles. Glossaries, bibliographies, and other such lists are usually not indexed.
-----
Another potential/alternate solution might also be allowing Sigil to index Front Matter, but change the numbering scheme:
Current Output
Example, <a href="Preface.xhtml">[1]</a>, <a href="Introduction.xhtml">[2]</a>, <a href="Chap01.xhtml">[3]</a>, <a href="Chap02.xhtml">[4]</a>, <a href="Chap03.xhtml">[5]</a>
With Front Matter
Example, <a href="Preface.xhtml">[i]</a>, <a href="Introduction.xhtml">[ii]</a>, <a href="Chap01.xhtml">[1]</a>, <a href="Chap02.xhtml">[2]</a>, <a href="Chap03.xhtml">[3]</a>
Without Front Matter
Example, <a href="Chap01.xhtml">[1]</a>, <a href="Chap02.xhtml">[2]</a>, <a href="Chap03.xhtml">[3]</a>
I don't know how the edge case would work of many people shifting traditional Front Matter to the back of ebook files... or if somehow the Index generator can be revamped to take into account "Real Page Numbers" (RPNs).
Quote:
I am also scratching my head at MIT's guidelines... according to all of the books I have digitized, Footnotes and Endnotes are indexed quite often. I am most familiar with the formats:
###n [page number + n]
###n## [page number + n + footnote number]
or similar variants. Of course, that specific naming scheme isn't applicable to Sigil's current Index generation though. Here is the relevant piece in the 16th Edition of the Chicago Manual of Style:
16.110 Indexing footnotes and endnotes. Notes, whether footnotes or endnotes, should be indexed only if they continue or amplify discussion in the text (substantive notes). Notes that merely contain source citations documenting statements in the text (reference notes) need not be indexed.
16.111 Endnote locators in index entries. Endnotes in printed works are referred to by page, the letter n (for note), and – extremely important – the note number, with no internal space (334n14). If two or more consecutive notes are referred to, two n's and an en dash are used (e.g., 334nn14–16). Nonconsecutive notes on the same page are treated separately (334n14, 334n16, 334n19). Occasionally, when reference to a note near the end of one chapter of a book is followed by reference to a note near the beginning of the next, nonchronological order will result (334n19, 334n2). To avoid the appearance of error, the chapter number may be added in parentheses after the lower note number.
cats, 334n19, 334n2 (chap. 9), 335n5
16.112 Footnote locators in index entries. Footnotes in a printed work are generally referred to in the same way as endnotes. When a footnote is the only one on the page, however, the note number (or symbol, if numbers are not used) may be omitted (156n). Note numbers should never be omitted when several notes appear on the same page. If there is indexable material in a text passage and in a related footnote, only the page number need be given. But if the text and the footnote materials are not connected, both text and note should be cited (156, 156n, 278, 278n30).
Last edited by Tex2002ans; 05-03-2017 at 07:48 PM. |
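Two of the locator conventions from this post can be mocked up in a few lines of Python, purely as an illustration. The first helper labels front-matter files i, ii, ... and body files 1, 2, ... as in the "With Front Matter" example; the second builds the print-style note locators from 16.111 (334n14, 334nn14–16). All function names are invented here, and none of this reflects how Sigil's index generator actually numbers entries.

```python
def to_roman(n):
    """Lower-case Roman numeral for 1 <= n <= 3999."""
    if not 1 <= n <= 3999:
        raise ValueError("value out of range for Roman numerals")
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def locator_labels(files, front_matter):
    """Map each file (in reading order) to i, ii, ... or 1, 2, ..."""
    labels = {}
    front = body = 0
    for name in files:
        if name in front_matter:
            front += 1
            labels[name] = to_roman(front)
        else:
            body += 1
            labels[name] = str(body)
    return labels


def note_locator(page, first_note, last_note=None):
    """Page + n + note number, or nn with an en dash for a consecutive run."""
    if last_note is None or last_note == first_note:
        return f"{page}n{first_note}"
    return f"{page}nn{first_note}\u2013{last_note}"


spine = ["Preface.xhtml", "Introduction.xhtml",
         "Chap01.xhtml", "Chap02.xhtml", "Chap03.xhtml"]
print(locator_labels(spine, {"Preface.xhtml", "Introduction.xhtml"}))
print(note_locator(334, 14), note_locator(334, 14, 16))
```

Passing an empty front-matter set gives plain 1, 2, 3 numbering for whatever files are in the list, which covers the "Without Front Matter" case once those files are left out.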
|||
|