Quote:
Originally Posted by KevinH
All of the other semantic types in both epub2 and epub3 represent front matter, back matter, notes/footnotes, appendices, toc/indexes, etc and these should not be indexed by default.
|
Seems like reasonable assumptions.
Side Note: Being able to ignore certain files would also be helpful in Tools > Spellcheck.
Sometimes you don't want to include words in places like the Cover/Title/Copyright/TOC/Index, and just want to focus on the main text itself. This is a much worse issue in Calibre though, because that also looks through many non-HTML files like the content.opf + toc.ncx. Plus Calibre's Spellchecking includes numbers (you can imagine how many numbers show up in an Index and drown out legitimate numbers in the main text itself).
I think that would work better as a little GUI checkbox though:
(You can tell my artistic skills are impeccable.)
Quote:
Originally Posted by KevinH
What is the right/official thing to do here? Should all front matter be excluded by an index generator or not?
|
Front matter being indexed depends on the publisher/book.
As you can see in MIT's recommendations, their indexes typically focus on the main matter of the book. But some may include the Foreword/Introduction/Preface (Roman Numeral pages). I have seen all types.
Here is a portion of the 16th Edition of the Chicago Manual of Style, "Chapter 16: Indexes":
What Parts of the Work to Index
16.109 Indexing the text, front matter, and back matter. The entire text of a book or journal article, including substantive content in notes (see 16.110), should be indexed. Much of the front matter, however, is not indexable: title page, dedication, epigraphs, lists of illustrations and tables, and acknowledgments. A preface, or a foreword by someone other than the author of the work, may be indexed if it concerns the subject of the work and not simply how the work came to be written. A true introduction, whether in the front matter or, more commonly, in the body of the work, is always indexed (for introduction versus preface, see 1.42). Book appendixes should be indexed if they contain information that supplements the text, but not if they merely reproduce documents that are discussed in the text (the full text of a treaty, for example, or a questionnaire). Appendixes to journal articles are indexed as part of the articles. Glossaries, bibliographies, and other such lists are usually not indexed.
-----
Another potential/alternate solution might also be allowing Sigil to index Front Matter, but change the numbering scheme:
Current Output
Example, <a href="Preface.xhtml">[1]</a>, <a href="Introduction.xhtml">[2]</a>, <a href="Chap01.xhtml">[3]</a>, <a href="Chap02.xhtml">[4]</a>, <a href="Chap03.xhtml">[5]</a>
With Front Matter
Example, <a href="Preface.xhtml">[i]</a>, <a href="Introduction.xhtml">[ii]</a>, <a href="Chap01.xhtml">[1]</a>, <a href="Chap02.xhtml">[2]</a>, <a href="Chap03.xhtml">[3]</a>
Without Front Matter
Example, <a href="Chap01.xhtml">[1]</a>, <a href="Chap02.xhtml">[2]</a>, <a href="Chap03.xhtml">[3]</a>
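To make the numbering idea concrete, here is a rough Python sketch of how the locators could be generated. It is only an illustration under my own assumptions: `to_roman`, `index_links`, and the `front_matter` set are hypothetical names, not anything that exists in Sigil's Index generator today.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase Roman numeral."""
    if n <= 0:
        raise ValueError("Roman numerals need a positive integer")
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def index_links(files, front_matter, include_front_matter=True):
    """Build the <a> locators for one index entry.

    files: hrefs (in reading order) where the term occurs.
    front_matter: set of hrefs treated as front matter (hypothetical input).
    """
    links = []
    front_count = 0  # counter for Roman-numbered front matter hits
    main_count = 0   # counter for Arabic-numbered main text hits
    for href in files:
        if href in front_matter:
            if not include_front_matter:
                continue  # "Without Front Matter": skip the file entirely
            front_count += 1
            label = to_roman(front_count)
        else:
            main_count += 1
            label = str(main_count)
        links.append(f'<a href="{href}">[{label}]</a>')
    return ", ".join(links)


files = ["Preface.xhtml", "Introduction.xhtml",
         "Chap01.xhtml", "Chap02.xhtml", "Chap03.xhtml"]
front = {"Preface.xhtml", "Introduction.xhtml"}
print("Example, " + index_links(files, front))
print("Example, " + index_links(files, front, include_front_matter=False))
```

Keeping two separate counters is what lets the main text start at [1] whether or not the front matter is included.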
I don't know how this would handle the edge case of the many people who shift traditional Front Matter to the back of ebook files... or whether the Index generator could somehow be revamped to take into account "Real Page Numbers" (RPNs).
Quote:
Originally Posted by KevinH
That should give the user more control over what gets indexed and what does not at the file level and follows the guidelines provided by the MIT Press for generating indexes.
|
Again, see my image above. I think something along those lines would be an ok way to include/exclude certain files from Indexing.
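In code terms, I picture the logic as roughly the following Python sketch: a default based on each file's semantic type, with a per-file checkbox that wins over the default. Everything here is hypothetical (`INDEXED_BY_DEFAULT`, `files_to_index`, and the choice of labels are my own guesses for illustration, not Sigil's actual behaviour or KevinH's exact list).

```python
# Assumption for illustration: only these semantic labels (or no label at
# all) are indexed by default; every other semantic type is skipped.
INDEXED_BY_DEFAULT = {"text", "bodymatter"}


def files_to_index(semantics, overrides=None):
    """Pick the files the Index generator should scan.

    semantics: dict of href -> semantic label (None if the file has none).
    overrides: dict of href -> bool, i.e. the state of a per-file checkbox.
               A value here always beats the semantic default.
    """
    overrides = overrides or {}
    selected = []
    for href, label in semantics.items():
        default = label is None or label in INDEXED_BY_DEFAULT
        if overrides.get(href, default):
            selected.append(href)
    return selected


book = {
    "Cover.xhtml": "cover",
    "Preface.xhtml": "preface",
    "Chap01.xhtml": None,
    "Notes.xhtml": "notes",
}
print(files_to_index(book))                           # defaults only
print(files_to_index(book, {"Preface.xhtml": True}))  # user ticks the Preface
```

The same include/exclude list could in principle be reused for something like Spellcheck, since it is just a filter over files.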
I am also scratching my head at MIT's guidelines... according to all of the books I have digitized, Footnotes and Endnotes are indexed quite often. I am most familiar with the formats:
###n [page number + n]
###n## [page number + n + footnote number]
or similar variants. Of course, that specific naming scheme isn't applicable to Sigil's current Index generation though.
Here is the relevant piece in the 16th Edition of the Chicago Manual of Style:
16.110 Indexing footnotes and endnotes. Notes, whether footnotes or endnotes, should be indexed only if they continue or amplify discussion in the text (substantive notes). Notes that merely contain source citations documenting statements in the text (reference notes) need not be indexed.
16.111 Endnote locators in index entries. Endnotes in printed works are referred to by page, the letter n (for note), and (extremely important) the note number, with no internal space (334n14). If two or more consecutive notes are referred to, two n's and an en dash are used (e.g., 334nn14-16). Nonconsecutive notes on the same page are treated separately (334n14, 334n16, 334n19). Occasionally, when reference to a note near the end of one chapter of a book is followed by reference to a note near the beginning of the next, nonchronological order will result (334n19, 334n2). To avoid the appearance of error, the chapter number may be added in parentheses after the lower note number.
cats, 334n19, 334n2 (chap. 9), 335n5
16.112 Footnote locators in index entries. Footnotes in a printed work are generally referred to in the same way as endnotes. When a footnote is the only one on the page, however, the note number (or symbol, if numbers are not used) may be omitted (156n). Note numbers should never be omitted when several notes appear on the same page. If there is indexable material in a text passage and in a related footnote, only the page number need be given. But if the text and the footnote materials are not connected, both text and note should be cited (156, 156n, 278, 278n30).
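For anyone who wants to see the 16.111/16.112 locator rules spelled out, here is a small Python sketch of how I read them. The function name `note_locators` and its parameters are made up for this post, and it only covers the single-page cases quoted above (no chapter disambiguation).

```python
EN_DASH = "\u2013"  # CMOS asks for an en dash between consecutive note numbers


def note_locators(page, notes, lone_footnote=False):
    """Format note locators for one page, following the quoted CMOS rules.

    page: the page number the notes appear on.
    notes: note numbers on that page that the index entry refers to.
    lone_footnote: True when the footnote is the only one on the page,
                   in which case the number may be omitted (156n).
    """
    notes = sorted(set(notes))
    if lone_footnote:
        if len(notes) > 1:
            raise ValueError("note numbers cannot be omitted for several notes")
        return [f"{page}n"]

    # Group the note numbers into runs of consecutive values.
    runs = []
    for n in notes:
        if runs and n == runs[-1][1] + 1:
            runs[-1][1] = n        # extend the current run
        else:
            runs.append([n, n])    # start a new run

    locators = []
    for start, end in runs:
        if start == end:
            locators.append(f"{page}n{start}")                    # 334n14
        else:
            locators.append(f"{page}nn{start}{EN_DASH}{end}")     # 334nn14-16 with an en dash
    return locators


print(", ".join(note_locators(334, [14, 15, 16])))
print(", ".join(note_locators(334, [14, 16, 19])))
print(", ".join(note_locators(156, [3], lone_footnote=True)))
```

Whether a note is "substantive" enough to index at all (16.110) is still a human judgment call, so a sketch like this only helps with the formatting side.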