05-03-2017, 07:14 PM   #15
Tex2002ans
Quote:
Originally Posted by KevinH
All of the other semantic types in both epub2 and epub3 represent front matter, back matter, notes/footnotes, appendices, toc/indexes, etc and these should not be indexed by default.
Those seem like reasonable assumptions.

Side Note: Being able to ignore certain files would also be helpful in Tools > Spellcheck.

Sometimes you don't want to include words from places like the Cover/Title/Copyright/TOC/Index, and just want to focus on the main text itself. This is a much worse issue in Calibre, because its spellchecker also looks through many non-HTML files like content.opf and toc.ncx. Plus, Calibre's spellchecking includes numbers (you can imagine how many numbers show up in an Index and drown out legitimate numbers in the main text).

I think that would work better as a little GUI checkbox though:

[Attached image: CrudeIncludeGUI.png]

(You can tell my artistic skills are impeccable.)

Quote:
Originally Posted by KevinH
What is the right/official thing to do here? Should all front matter be excluded by an index generator or not?
Front matter being indexed depends on the publisher/book.

As you can see in MIT's recommendations, their indexes typically focus on the main matter of the book. But some may include the Foreword/Introduction/Preface (Roman Numeral pages). I have seen all types.

Here is a portion of the 16th Edition of the Chicago Manual of Style, "Chapter 16: Indexes":

What Parts of the Work to Index

16.109 Indexing the text, front matter, and back matter. The entire text of a book or journal article, including substantive content in notes (see 16.110), should be indexed. Much of the front matter, however, is not indexable: title page, dedication, epigraphs, lists of illustrations and tables, and acknowledgments. A preface, or a foreword by someone other than the author of the work, may be indexed if it concerns the subject of the work and not simply how the work came to be written. A true introduction, whether in the front matter or, more commonly, in the body of the work, is always indexed (for introduction versus preface, see 1.42). Book appendixes should be indexed if they contain information that supplements the text, but not if they merely reproduce documents that are discussed in the text (the full text of a treaty, for example, or a questionnaire). Appendixes to journal articles are indexed as part of the articles. Glossaries, bibliographies, and other such lists are usually not indexed.

-----

Another possible solution would be to let Sigil index Front Matter, but change the numbering scheme:

Current Output

Example, <a href="Preface.xhtml">[1]</a>, <a href="Introduction.xhtml">[2]</a>, <a href="Chap01.xhtml">[3]</a>, <a href="Chap02.xhtml">[4]</a>, <a href="Chap03.xhtml">[5]</a>

With Front Matter

Example, <a href="Preface.xhtml">[i]</a>, <a href="Introduction.xhtml">[ii]</a>, <a href="Chap01.xhtml">[1]</a>, <a href="Chap02.xhtml">[2]</a>, <a href="Chap03.xhtml">[3]</a>

Without Front Matter

Example, <a href="Chap01.xhtml">[1]</a>, <a href="Chap02.xhtml">[2]</a>, <a href="Chap03.xhtml">[3]</a>

I don't know how this would handle the edge case of people shifting traditional Front Matter to the back of the ebook... or whether the Index generator could somehow be revamped to take "Real Page Numbers" (RPNs) into account.
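As a rough sketch of the "With Front Matter" / "Without Front Matter" numbering above (this is not Sigil's actual index code; `to_roman`, `label_files`, and `render_entry` are hypothetical names, and the set of front matter files is assumed to be supplied by the user):

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def label_files(files, front_matter, include_front_matter=True):
    """Pair each file with its index label, in reading order.

    Files listed in `front_matter` get Roman numerals (or are skipped
    entirely when include_front_matter is False); all other files get
    Arabic numbers starting at 1.
    """
    labels = []
    front_count = 0
    body_count = 0
    for name in files:
        if name in front_matter:
            if not include_front_matter:
                continue
            front_count += 1
            labels.append((name, to_roman(front_count)))
        else:
            body_count += 1
            labels.append((name, str(body_count)))
    return labels


def render_entry(term, labels):
    """Render one index entry as a line of linked locators."""
    links = ", ".join(f'<a href="{name}">[{label}]</a>' for name, label in labels)
    return f"{term}, {links}"


files = ["Preface.xhtml", "Introduction.xhtml",
         "Chap01.xhtml", "Chap02.xhtml", "Chap03.xhtml"]
front = {"Preface.xhtml", "Introduction.xhtml"}
print(render_entry("Example", label_files(files, front)))
print(render_entry("Example", label_files(files, front, include_front_matter=False)))
```

Because this sketch keys off file names rather than position, a Preface moved to the back of the book would still get a Roman numeral. Whether that is the desired behaviour is an open question, and it does nothing for Real Page Numbers.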

Quote:
Originally Posted by KevinH
That should give the user more control over what gets indexed and what does not at the file level and follows the guidelines provided by the MIT Press for generating indexes.
Again, see my image above. I think something along those lines would be an ok way to include/exclude certain files from Indexing.
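A minimal sketch of how a per-file include/exclude checkbox could sit on top of semantic defaults. Everything here is an assumption for illustration: `DEFAULT_INDEXABLE`, `files_to_index`, and the `overrides` mapping are hypothetical names, and treating only `text` (EPUB 2 guide type) and `bodymatter` (EPUB 3 epub:type) plus unmarked files as indexable is a guess, not Sigil's actual behaviour.

```python
# Semantic types assumed to be indexed by default (an assumption for this
# sketch): "text" is an EPUB 2 guide type, "bodymatter" an EPUB 3 epub:type.
DEFAULT_INDEXABLE = {"text", "bodymatter"}


def files_to_index(files, overrides=None):
    """Return the file names that should be indexed.

    files:     list of (file name, semantic type or None) in reading order.
    overrides: optional {file name: bool} holding the user's checkbox
               choices; these win over the semantic default.
    """
    overrides = overrides or {}
    selected = []
    for name, semantic in files:
        # Files with no semantic marking are treated as main text.
        default = semantic is None or semantic in DEFAULT_INDEXABLE
        if overrides.get(name, default):
            selected.append(name)
    return selected


book = [
    ("Cover.xhtml", "cover"),
    ("Preface.xhtml", "preface"),
    ("Chap01.xhtml", None),
    ("Chap02.xhtml", "bodymatter"),
    ("Notes.xhtml", "notes"),
]
print(files_to_index(book))
print(files_to_index(book, overrides={"Preface.xhtml": True}))
```

The override mapping is what the checkbox would write to: a publisher who wants the Preface indexed ticks it on, and the semantic default handles everything else.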

I am also scratching my head at MIT's guidelines... according to all of the books I have digitized, Footnotes and Endnotes are indexed quite often. I am most familiar with the formats:

###n [page number + n]
###n## [page number + n + footnote number]

or similar variants. That specific naming scheme isn't applicable to Sigil's current Index generation, though.

Here is the relevant piece in the 16th Edition of the Chicago Manual of Style:

16.110 Indexing footnotes and endnotes. Notes, whether footnotes or endnotes, should be indexed only if they continue or amplify discussion in the text (substantive notes). Notes that merely contain source citations documenting statements in the text (reference notes) need not be indexed.

16.111 Endnote locators in index entries. Endnotes in printed works are referred to by page, the letter n (for note), and (extremely important) the note number, with no internal space (334n14). If two or more consecutive notes are referred to, two n's and an en dash are used (e.g., 334nn14–16). Nonconsecutive notes on the same page are treated separately (334n14, 334n16, 334n19). Occasionally, when reference to a note near the end of one chapter of a book is followed by reference to a note near the beginning of the next, nonchronological order will result (334n19, 334n2). To avoid the appearance of error, the chapter number may be added in parentheses after the lower note number.

cats, 334n19, 334n2 (chap. 9), 335n5

16.112 Footnote locators in index entries. Footnotes in a printed work are generally referred to in the same way as endnotes. When a footnote is the only one on the page, however, the note number (or symbol, if numbers are not used) may be omitted (156n). Note numbers should never be omitted when several notes appear on the same page. If there is indexable material in a text passage and in a related footnote, only the page number need be given. But if the text and the footnote materials are not connected, both text and note should be cited (156, 156n, 278, 278n30).
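The locator rules in 16.111 and 16.112 can be sketched as a small formatter. This is only an illustration of the quoted rules, not anything Sigil does; `note_locators` and its parameters are hypothetical, and it leaves out the "(chap. 9)" disambiguation.

```python
def note_locators(page, notes=(), only_note_on_page=False):
    """Format note locators for one page, following the quoted rules.

    page:              page number the notes appear on.
    notes:             note numbers referenced on that page.
    only_note_on_page: if True, the note number is omitted (e.g. "156n"),
                       which the quoted text allows for a lone footnote.
    """
    if only_note_on_page:
        return [f"{page}n"]

    # Group the note numbers into runs of consecutive values.
    runs = []
    for n in sorted(set(notes)):
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n
        else:
            runs.append([n, n])

    locators = []
    for first, last in runs:
        if first == last:
            locators.append(f"{page}n{first}")
        else:
            # Two n's and an en dash (U+2013) for consecutive notes.
            locators.append(f"{page}nn{first}\u2013{last}")
    return locators


print(", ".join(note_locators(334, [14])))
print(", ".join(note_locators(334, [14, 15, 16])))
print(", ".join(note_locators(334, [14, 16, 19])))
print(", ".join(note_locators(156, only_note_on_page=True)))
```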

Last edited by Tex2002ans; 05-03-2017 at 07:48 PM.