Old 05-02-2017, 04:28 PM   #1
andyb
Member
Posts: 14
Karma: 10
Join Date: May 2015
Location: Kent, UK
Device: Kobo Aura
Forcing indexing to skip a block of text?

Hi folks,

Sorry to be back so soon with another indexing question; it should be the last one (for a while at least!)

Whilst nosing about in Sigil's Code View I noticed there is a class called "sigil_index_marker" used to flag index entries from within the epub content.

Does the opposite exist, i.e. a class that can be applied to a block of text (maybe with a <div> or similar) to tell the indexing tool to skip the marked block when looking for text from the index editor?

Failing that, is there a way to tell the indexing tool to skip a file or similar?

Reason being, I'm in the process of adding a "List of Illustrations" to my epub and don't want the list content included in the index even if it matches the index editor text.

I realise that I could use the "sigil_index_marker" class and mark all the occurrences I want to include but this would mean upwards of 1000 entries to manually flag - ouch!

Hope this makes sense...

Andy B.
Old 05-02-2017, 10:00 PM   #2
KevinH
Wizard
Posts: 2,374
Karma: 756400
Join Date: Nov 2009
Device: many
The indexer will skip "toc" guide items in the opf in epub2 and the nav in epub3.
So by manually playing with the opf guide items you might be able to force it to ignore a file.
Old 05-03-2017, 01:42 AM   #3
andyb
Many thanks, I'll give it a try...

Andy B.
Old 05-03-2017, 02:53 AM   #4
andyb
Oops - close but not quite cigar worthy ...

My "List of Illustrations" is now fixed, its content is not appearing in the index, but I now have text from my ToC appearing...

A ToC snippet looks like this before indexing:
Code:
  <h3 class="sigil_not_in_toc">12. <a href="../Text/12-green-horrors.xhtml">Green Horrors</a></h3>

  <ul>
    <li>What constitutes a horror? Opinions vary…</li>
    <li>Japanese knotweed</li>
    <li>Giant hogweed</li>
    <li>A succession of pondweeds</li>
    <li>New Zealand pigmyweed</li>
    <li>Azolla</li>
    <li>Least duckweed</li>
    <li>Floating pennywort</li>
    <li>Himalayan balsam</li>
    <li>Rhododendron ponticum</li>
  </ul>
The unordered list is my own addition because a high level summary was required in the ToC for which there were no headings in the main text.

And this after indexing:
Code:
    .
    .
    .
    <li>Least duckweed</li>
    <li>Floating pennywort</li>
    <li>Himalayan balsam</li>
    <li id="sigil_index_id_1">Rhododendron ponticum</li>
  </ul>
"Rhododendron ponticum" is in my index editor

The relevant part of my tweaked (as suggested) content.opf file looks like this:
Code:
  <guide>
    <reference type="copyright-page" title="Copyright Page" href="Text/copyright.xhtml"/>
    <reference type="dedication" title="Dedication" href="Text/dedication.xhtml"/>
    <reference type="toc" title="Table of Content" href="Text/contents.xhtml"/>
    <reference type="toc" title="Table of Content" href="Text/illustrations.xhtml"/>
    <reference type="preface" title="Preface" href="Text/preface.xhtml"/>
    <reference type="acknowledgements" title="Acknowledgements" href="Text/acknowledgements.xhtml"/>
    <reference type="text" title="Text" href="Text/01-early-doors.xhtml"/>
    <reference type="text" title="Text" href="Text/02-chill-out.xhtml"/>
    .
    .
    .
Seems odd that stuff that was in illustrations.xhtml that was causing a problem appears fixed whilst stuff in contents.xhtml that was fine now appears broken!

I'm confused...

This is starting to get quite involved, I'm happy to take the discussion off list if required...

Andy B.

Last edited by andyb; 05-03-2017 at 05:19 AM. Reason: Not quite as fixed as I thought it was!
Old 05-03-2017, 10:56 AM   #5
KevinH
I assumed you had removed the old toc from the document first and just wanted to hide one other file. The epub2 guide items allow only one true toc, not multiple.
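To make the "only one true toc" point concrete, here is a minimal Python sketch (not Sigil's own code) that lists every guide reference of a given type, so a second "toc" entry is easy to spot. The function name `guide_hrefs` and the sample OPF string are illustrative assumptions; the sample leaves out the namespace that a real content.opf declares, so the lookup strips namespaces to cope with both cases.

```python
import xml.etree.ElementTree as ET


def guide_hrefs(opf_xml, guide_type="toc"):
    """Return the hrefs of all <reference> elements with the given type."""
    root = ET.fromstring(opf_xml)
    hrefs = []
    for elem in root.iter():
        # Drop any "{namespace}" prefix so namespaced OPF files also match.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "reference" and elem.get("type") == guide_type:
            hrefs.append(elem.get("href"))
    return hrefs


# Hypothetical, cut-down OPF based on the guide posted earlier in the thread.
SAMPLE_OPF = """<package>
  <guide>
    <reference type="toc" title="Table of Content" href="Text/contents.xhtml"/>
    <reference type="toc" title="Table of Content" href="Text/illustrations.xhtml"/>
    <reference type="preface" title="Preface" href="Text/preface.xhtml"/>
  </guide>
</package>"""

tocs = guide_hrefs(SAMPLE_OPF)
if len(tocs) > 1:
    print("More than one toc guide entry:", tocs)
```

With the guide as posted, this reports both contents.xhtml and illustrations.xhtml as "toc" entries.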

That said, should an index ever index front matter? Are dedications ever indexed? Perhaps we should modify Sigil to skip all guide types that map to any type of front matter? That could be easily done.

What is the right/official thing to do here? Should all front matter be excluded by an index generator or not?
Old 05-03-2017, 12:10 PM   #6
andyb
Quote:
Originally Posted by KevinH View Post
I assumed you had removed the old toc from the document first and just wanted to hide one other file. The epub2 guide items allow only one true toc not multiple.
Shame - though re-visiting the Semantics types I've just spotted "List of Illustrations" which I had missed before - I'll go and play with that. At least I'll be back with one ToC...

Quote:
Originally Posted by KevinH View Post
That said, should an index ever index front matter? Are dedications ever indexed? Perhaps we should modify Sigil to skip all guide types that map to any type of front matter? That could be easily done.

What is the right/official thing to do here? Should all front matter be excluded by an index generator or not?
I'm sorely tempted to say "exclude all front matter" however I'm sure that someone will disagree and I have no wish to start a flame war so...

As the Semantics can be set for each file by right-clicking on the filename and selecting "Add Semantics", how about controlling the index tool's access by semantic type?

A new panel could be created in "Preferences" showing a list of the available semantic types and allowing a tick box for each one to say "Include files of this type in the Index" - maximum flexibility achieved with (fingers crossed!) not too much effort and everybody happy? (he said - running for cover!)

Just out of interest where does the semantic list come from - I couldn't spot it in the epub2 standard?

Andy B.
Old 05-03-2017, 12:33 PM   #7
KevinH
http://www.idpf.org/epub/20/spec/OPF...htm#Section2.6
Old 05-03-2017, 12:39 PM   #8
KevinH
It's an idea, but changing a semantic type is easy and an index should only be generated once the book is complete. So having a list of front matter guide types/nav types that are not indexed would work. If you decide you want to index those semantic types for any reason, you can temporarily toggle them off and then reenable them after the index is built.

Should work easily that way if we can agree on which semantic types should default to not being indexed.
Old 05-03-2017, 12:48 PM   #9
KevinH
According to the MIT Press on how to create an index...

Quote:
Guidelines for Preparing an Index
The purpose of the index is to give the reader an informative, balanced portrait of what is in the book and a concise, useful guide to all pertinent facts in the book. These facts, in the form of an alphabetically ordered list of main entries and subentries, will include both proper names and subjects.

WHAT TO INDEX

As a general rule, only the body of the text is indexed. Front matter, back matter (glossary, bibliography, appendixes, notes, etc.), and footnotes are not usually indexed. Possible exceptions are an introduction that has been placed in the front matter, endnotes or footnotes that contribute substantively to the discussion, and appendixes that do much more than document the text. Figures, tables, and charts in the text are indexed lightly—only when items contribute significantly to the text discussion. Exceptions to this rule are certain art or architecture books that require a thorough coverage of illustrations in the index.
Old 05-03-2017, 12:52 PM   #10
KevinH
So to be consistent, we should default to excluding all front matter semantic tags. If a developer instead wants to index it, they can temporarily toggle off that semantic type before creating that index.

I will look into modifying Sigil's code to do that.
Old 05-03-2017, 12:52 PM   #11
andyb
That'll teach me to play - I think I've found a bug!

There appears to be an issue with Right-click-on-filename>Add Semantics - it appears to be toggling the semantics rather than setting them and not telling you what it's doing.

For example: on a clean start of Sigil, the file "Section0001.xhtml" has no semantic set and the <guide> tag in content.opf is empty.

If I set the file Section0001.xhtml to have a semantic type of "text" the correct entry is created in content.opf
Code:
  <guide>
    <reference type="text" title="Text" href="Text/Section0001.xhtml"/>
  </guide>
If I repeat the exact same process and apply "text" to "Section0001.xhtml" again, the entry in <guide> vanishes.

This is obviously a new meaning for the word "Add" that I previously wasn't aware of

Not realizing this was happening initially caused no end of confusion!

Andy B.
Old 05-03-2017, 01:23 PM   #12
KevinH
Yes, it is and always has been a toggle. If you have already added something it will turn it back off. This is how Sigil has always worked. If you want to see if a particular file has something already set, you can do one of the following:

- Mouse Over the File name in the BookBrowser Window
- Create a Report
- Edit the content.opf
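The toggle behaviour described above can be modelled with a few lines of Python. This is only a sketch of the observed behaviour, not Sigil's implementation: the function name `toggle_semantic` and the idea of holding the guide as a set of (href, type) pairs are assumptions made for illustration.

```python
def toggle_semantic(guide, href, semantic_type):
    """Toggle one (href, type) entry in a guide modelled as a set of pairs.

    Returns True if the entry is present afterwards, False if it was removed.
    """
    entry = (href, semantic_type)
    if entry in guide:
        guide.remove(entry)  # applying the same semantic again switches it off
        return False
    guide.add(entry)  # first application switches it on
    return True


guide = set()
print(toggle_semantic(guide, "Text/Section0001.xhtml", "text"))  # True
print(toggle_semantic(guide, "Text/Section0001.xhtml", "text"))  # False
print(len(guide))  # 0
```

Applying "text" twice leaves the guide empty again, which matches the vanishing <reference> entry reported in the previous post.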
Old 05-03-2017, 02:50 PM   #13
KevinH
Okay, I have gone over the set of possible epub2 guide types and epub3 landmarks (i.e. semantics) and can see that by default we should only be indexing files with no semantics set at all OR that have the following semantic types:

Quote:
"text", "bodymatter", "chapter", "conclusion", "division", "epilogue", "introduction", "part", "preamble", "subchapter", "prologue", "volume"
All of the other semantic types in both epub2 and epub3 represent front matter, back matter, notes/footnotes, appendices, toc/indexes, etc and these should not be indexed by default.

If an author decides they want to index one of these files, they can simply toggle off the semantics before running the index generator and then toggle it back on.

That should give the user more control over what gets indexed and what does not at the file level and follows the guidelines provided by the MIT Press for generating indexes.

I will commit that change now so that it will be available in the next release.
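The default rule described in this post can be sketched in Python as follows. It is a hedged illustration, not the code that was committed to Sigil: the names `INDEXABLE_SEMANTICS` and `should_index` are hypothetical, and treating a file that carries both an indexable and a non-indexable type as indexable is an assumption the post does not settle.

```python
# Semantic types listed in the post as indexable by default.
INDEXABLE_SEMANTICS = frozenset({
    "text", "bodymatter", "chapter", "conclusion", "division", "epilogue",
    "introduction", "part", "preamble", "subchapter", "prologue", "volume",
})


def should_index(semantics):
    """Index a file with no semantics, or with at least one listed type."""
    semantics = set(semantics)
    # Assumption: one indexable type is enough, even alongside other types.
    return not semantics or bool(semantics & INDEXABLE_SEMANTICS)


print(should_index([]))           # True: no semantics set
print(should_index(["chapter"]))  # True: body matter
print(should_index(["toc"]))      # False: not in the indexable list
```

Under this rule, toggling a file's semantic off returns it to the "no semantics" case, so it is indexed again.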
Old 05-03-2017, 04:15 PM   #14
andyb
Many thanks for the time and effort...

.

Andy B.
Old 05-03-2017, 07:14 PM   #15
Tex2002ans
Guru
Posts: 812
Karma: 3849999
Join Date: Jul 2012
Device: Nook
Quote:
Originally Posted by KevinH View Post
All of the other semantic types in both epub2 and epub3 represent front matter, back matter, notes/footnotes, appendices, toc/indexes, etc and these should not be indexed by default.
Seems like reasonable assumptions.

Side Note: Being able to Ignore certain files would also be helpful in the Tools > Spellcheck.

Sometimes you don't want to include words in places like the Cover/Title/Copyright/TOC/Index, and just want to focus on the main text itself. This is a much worse issue in Calibre though, because that also looks through many non-HTML files like the content.opf + toc.ncx. Plus Calibre's Spellchecking includes numbers (you can imagine how many numbers show up in an Index and drown out legitimate numbers in the main text itself).

I think that would work better as a little GUI checkbox though:

[Attached image: CrudeIncludeGUI.png]

(You can tell my artistic skills are impeccable.)

Quote:
Originally Posted by KevinH View Post
What is the right/official thing to do here? Should all front matter be excluded by an index generator or not?
Front matter being indexed depends on the publisher/book.

As you can see from MIT's recommendations, their indexes typically focus on the main matter of the book. But some may include the Foreword/Introduction/Preface (Roman Numeral pages). I have seen all types.

Here is a portion of the 16th Edition of the Chicago Manual of Style, "Chapter 16: Indexes":

What Parts of the Work to Index

16.109 Indexing the text, front matter, and back matter. The entire text of a book or journal article, including substantive content in notes (see 16.110), should be indexed. Much of the front matter, however, is not indexable-title page, dedication, epigraphs, lists of illustrations and tables, and acknowledgments. A preface, or a foreword by someone other than the author of the work, may be indexed if it concerns the subject of the work and not simply how the work came to be written. A true introduction, whether in the front matter or, more commonly, in the body of the work, is always indexed (for introduction versus preface, see 1.42). Book appendixes should be indexed if they contain information that supplements the text, but not if they merely reproduce documents that are discussed in the text (the full text of a treaty, for example, or a questionnaire). Appendixes to journal articles are indexed as part of the articles. Glossaries, bibliographies, and other such lists are usually not indexed.

-----

Another potential/alternate solution might also be allowing Sigil to index Front Matter, but change the numbering scheme:

Current Output

Example, <a href="Preface.xhtml">[1]</a>, <a href="Introduction.xhtml">[2]</a>, <a href="Chap01.xhtml">[3]</a>, <a href="Chap02.xhtml">[4]</a>, <a href="Chap03.xhtml">[5]</a>

With Front Matter

Example, <a href="Preface.xhtml">[i]</a>, <a href="Introduction.xhtml">[ii]</a>, <a href="Chap01.xhtml">[1]</a>, <a href="Chap02.xhtml">[2]</a>, <a href="Chap03.xhtml">[3]</a>

Without Front Matter

Example, <a href="Chap01.xhtml">[1]</a>, <a href="Chap02.xhtml">[2]</a>, <a href="Chap03.xhtml">[3]</a>
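A rough Python sketch of the "With Front Matter" numbering is below. It is only an illustration of the scheme in the examples above, not how Sigil's index generator assigns labels; the helpers `to_roman` and `index_labels`, and the per-file `is_front_matter` flag, are assumptions.

```python
def to_roman(n):
    """Convert a positive integer to a lowercase Roman numeral."""
    numerals = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    parts = []
    for value, symbol in numerals:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def index_labels(files):
    """files: list of (href, is_front_matter). Returns a list of (href, label).

    Front matter gets its own Roman numeral sequence; body files get Arabic.
    """
    labels = []
    front = body = 0
    for href, is_front_matter in files:
        if is_front_matter:
            front += 1
            labels.append((href, to_roman(front)))
        else:
            body += 1
            labels.append((href, str(body)))
    return labels


files = [
    ("Preface.xhtml", True),
    ("Introduction.xhtml", True),
    ("Chap01.xhtml", False),
    ("Chap02.xhtml", False),
]
for href, label in index_labels(files):
    print(f'<a href="{href}">[{label}]</a>')
```

Dropping the front matter entries from `files` before calling `index_labels` gives the "Without Front Matter" output.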

I don't know how the edge case would work of many people shifting traditional Front Matter to the back of ebook files... or if somehow the Index generator can be revamped to take into account "Real Page Numbers" (RPNs).

Quote:
Originally Posted by KevinH View Post
That should give the user more control over what gets indexed and what does not at the file level and follows the guidelines provided by the MIT Press for generating indexes.
Again, see my image above. I think something along those lines would be an ok way to include/exclude certain files from Indexing.

I am also scratching my head at MIT's guidelines... according to all of the books I have digitized, Footnotes and Endnotes are indexed quite often. I am most familiar with the formats:

###n [page number + n]
###n## [page number + n + footnote number]

or similar variants. Of course, that specific naming scheme isn't applicable to Sigil's current Index generation though.

Here is the relevant piece in the 16th Edition of the Chicago Manual of Style:

16.110 Indexing footnotes and endnotes. Notes, whether footnotes or endnotes, should be indexed only if they continue or amplify discussion in the text (substantive notes). Notes that merely contain source citations documenting statements in the text (reference notes) need not be indexed.

16.111 Endnote locators in index entries. Endnotes in printed works are referred to by page, the letter n (for note), and-extremely important-the note number, with no internal space (334n14). If two or more consecutive notes are referred to, two n's and an en dash are used (e.g., 334nn14-16). Nonconsecutive notes on the same page are treated separately (334n14, 334n16, 334n19). Occasionally, when reference to a note near the end of one chapter of a book is followed by reference to a note near the beginning of the next, nonchronological order will result (334n19, 334n2). To avoid the appearance of error, the chapter number may be added in parentheses after the lower note number.

cats, 334n19, 334n2 (chap. 9), 335n5

16.112 Footnote locators in index entries. Footnotes in a printed work are generally referred to in the same way as endnotes. When a footnote is the only one on the page, however, the note number (or symbol, if numbers are not used) may be omitted (156n). Note numbers should never be omitted when several notes appear on the same page. If there is indexable material in a text passage and in a related footnote, only the page number need be given. But if the text and the footnote materials are not connected, both text and note should be cited (156, 156n, 278, 278n30).
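The locator formats quoted above (156n, 334n14, 334nn14-16) are mechanical enough to sketch in Python. This is an illustration of the quoted rules only and has nothing to do with Sigil's current index generator; the function name `note_locator` is hypothetical, and it uses a plain hyphen for the range, as in the quoted text here.

```python
def note_locator(page, notes=()):
    """Format a page-plus-note locator such as 156n, 334n14 or 334nn14-16."""
    notes = sorted(notes)
    if not notes:
        # Footnote that is the only one on the page: note number omitted.
        return f"{page}n"
    if len(notes) == 1:
        return f"{page}n{notes[0]}"
    consecutive = all(b - a == 1 for a, b in zip(notes, notes[1:]))
    if consecutive:
        # Two or more consecutive notes: two n's and a range.
        return f"{page}nn{notes[0]}-{notes[-1]}"
    # Nonconsecutive notes on the same page are listed separately.
    return ", ".join(f"{page}n{n}" for n in notes)


print(note_locator(334, [14]))          # 334n14
print(note_locator(334, [14, 15, 16]))  # 334nn14-16
print(note_locator(334, [14, 16, 19]))  # 334n14, 334n16, 334n19
print(note_locator(156))                # 156n
```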

Last edited by Tex2002ans; 05-03-2017 at 07:48 PM.