MobileRead Forums

brolny 08-07-2016 09:10 AM

Create Index
 
Sigil's "Tools -> Index -> Create Index" also includes entries from the NAV file in the index file. Is there a way to ignore the NAV file when creating the index?

Also, could you please treat a word with a soft hyphen and the same word without it as identical for indexing?

Thanks a lot.

DiapDealer 08-07-2016 10:13 AM

There is no way to ignore the NAV when using the auto Create Index feature (nor is there a way to ignore the copyright page, or the half-title, or the epigraph, or the dedication, etc...). You can, of course, edit the index to remove any entries you don't want after creating it (or manually create the index with exactly the item you want). It's not an unreasonable request to be able to ignore files in the auto-creation process, though. We'll look into it.

As for the second request: as far as I'm concerned, you've just been handed one more reason why littering an ebook's text with soft-hyphens is a terrible idea for ebook creators. ;)

KevinH 08-07-2016 11:09 AM

As I mentioned once before, adding in soft hyphens should be the very last step when producing an ebook and only done if the targeted downstream platform supports it.

I recommend using search and replace to remove all soft hyphens first. Then build your epub, tweak it to your heart's content, validate it, and then as a final step use a hyphenator to put soft hyphens back if you truly need them.
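For anyone who prefers to script that first step rather than use Sigil's own Find & Replace, here is a minimal sketch in Python. It is not part of Sigil; the function name strip_soft_hyphens is made up for illustration, and it assumes soft hyphens appear either as the literal U+00AD character or as the &shy; / &#173; / &#xAD; entity spellings.

```python
import re

# U+00AD as a literal character, plus the entity spellings it may take in XHTML.
SOFT_HYPHEN_RE = re.compile(r"\u00ad|&shy;|&#0*173;|&#x0*ad;", re.IGNORECASE)


def strip_soft_hyphens(text: str) -> str:
    """Return text with every soft hyphen (character or entity) removed."""
    return SOFT_HYPHEN_RE.sub("", text)


if __name__ == "__main__":
    sample = "hy\u00adphen&shy;ation"
    print(strip_soft_hyphens(sample))
```

Running the same normalisation on both the index term and the book text before comparing them would also make a hyphenated word and its plain form match.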

As DiapDealer says, we will look into excluding the nav from index creation, but you can always edit the index after creation.

KevinH

KevinH 08-07-2016 12:36 PM

BTW, we could easily skip any resource that has an appropriate semantic type set (guide types in the OPF, landmark types in the nav). If someone could take a look at the landmarks/guide types and let me know which, if any, of these should not be indexed, I would be happy to add code to handle that while I am excluding the nav from index generation.

KevinH

brolny 08-07-2016 02:36 PM

Quote:

Originally Posted by DiapDealer (Post 3366639)

As for the second request: as far as I'm concerned, you've just been handed one more reason why littering an ebook's text with soft-hyphens is a terrible idea for ebook creators. ;)

I stopped using soft hyphens after your first suggestion. Thanks.
It's hard to give up this lovely whimsy of mine ;)

Doitsu 08-07-2016 02:41 PM

Quote:

Originally Posted by KevinH (Post 3366724)
If someone could take a look at the landmarks/guide types and let me know which, if any, of these should not be indexed, I would be happy to add code to handle that while I am excluding the nav from index generation.

How about excluding all items that have a spine itemref linear="no" attribute?
This would finally put this heavily underappreciated attribute to good use. :)
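As a rough illustration of that idea (not Sigil's code, which is C++), the sketch below reads an OPF file and lists the manifest hrefs whose spine itemref carries linear="no". The function name nonlinear_hrefs and the sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical minimal OPF, used only to exercise the function below.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ch1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="notes" href="Text/notes.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="notes" linear="no"/>
  </spine>
</package>"""


def nonlinear_hrefs(opf_xml: str) -> list:
    """Return hrefs of spine items marked linear="no", in spine order."""
    root = ET.fromstring(opf_xml)
    # Map manifest ids to hrefs so spine idrefs can be resolved to files.
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
        if ref.get("linear") == "no" and ref.get("idref") in manifest
    ]


if __name__ == "__main__":
    print(nonlinear_hrefs(SAMPLE_OPF))
```

An indexer could then skip every file in the returned list.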

DiapDealer 08-07-2016 02:52 PM

Quote:

Originally Posted by brolny (Post 3366791)
It's hard to give up this lovely whimsy of mine ;)

I understand completely. ;)

Hyphenation is just one of those things that I think is best left to the end-user. If hyphenation is important enough to them, they'll be using devices and apps that have hyphenation algorithms built into them (or they'll edit the book themselves to insert soft hyphens). And if it's NOT important to them (or they dislike hyphenation), then adding soft-hyphens everywhere could actually prevent them from choosing the reading experience they prefer.

KevinH 08-07-2016 04:13 PM

Hi,
That is a possibility, but an index may want to access those items.

So I think controlling this via guide and landmarks settings is better. If someone wants to index one of these special xhtml files, they can toggle on and off the semantic settings before creating the index. It gives easy gui control to the user who may not be up to adding or removing linear attributes in the opf.

I will commit a fix for this, later this evening or tomorrow evening at the latest.

KevinH

Quote:

Originally Posted by Doitsu (Post 3366793)
How about excluding all items that have a spine itemref linear="no" attribute?
This would finally put this heavily underappreciated attribute to good use. :)


Doitsu 08-07-2016 06:47 PM

Quote:

Originally Posted by KevinH (Post 3366813)
That is a possibility, but an index may want to access those items.

True, but since only some epub3 apps support linear="no" and linear="no" is usually used for items that are supposed to be somewhat hidden, I thought this would be the easiest solution without adding a dedicated GUI option.

Quote:

Originally Posted by KevinH (Post 3366813)
So I think controlling this via guide and landmarks settings is better.

I may be mistaken, but I thought that you can only mark certain files as an index via guide and landmarks items, but not the files that an index is generated from.

brolny 08-07-2016 06:49 PM

There is one more question. I want to be sure that all index entries are correct and that there is no junk in the index.

For example, the term is "Aorist".
So I replace this word in all files with <span class="dic">Aorist</span>.
Then I change, delete, or lowercase each unnecessary occurrence and remove its span tag.
I do this by hand, but it's easy to spot them all in Book View because of the text's style.
Then, fortunately, Sigil adds id="yyyyy" directly to the remaining spans when it creates the index.
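The marking and unmarking steps above could also be scripted. The following is only a sketch under assumptions: the helper names mark_term and unmark are invented, the marker spans are assumed to carry nothing but the class attribute (so spans that already have an id are left alone), and the XHTML is assumed to be simple enough for regular expressions, with no nested marker spans.

```python
import re

TAG_RE = re.compile(r"(<[^>]+>)")


def mark_term(xhtml: str, term: str, css_class: str = "dic") -> str:
    """Wrap whole-word occurrences of term in a marker span, outside tags only."""
    word = re.compile(r"\b" + re.escape(term) + r"\b")
    parts = TAG_RE.split(xhtml)
    # With one capturing group, even indexes hold text and odd indexes hold tags.
    for i in range(0, len(parts), 2):
        parts[i] = word.sub(
            lambda m: f'<span class="{css_class}">{m.group(0)}</span>', parts[i]
        )
    return "".join(parts)


def unmark(xhtml: str, css_class: str = "dic") -> str:
    """Remove marker spans that have only the class attribute, keeping their text."""
    span = re.compile(
        r'<span class="' + re.escape(css_class) + r'">(.*?)</span>', re.DOTALL
    )
    return span.sub(r"\1", xhtml)


if __name__ == "__main__":
    page = '<p title="Aorist">The Aorist tense.</p>'
    marked = mark_term(page, "Aorist")
    print(marked)
    print(unmark(marked))
```

The manual review in Book View still has to happen between the two calls; the script only saves the repetitive search and replace.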

Is this OK, or can anybody suggest a better way to check?
Thanks a lot.

PS
As I remember, a point like this was discussed about a year ago. Not exactly the same, but about the absence of class="..." for index entries from the list and its presence for manually added index entries...

DiapDealer 08-07-2016 08:13 PM

Quote:

Originally Posted by KevinH (Post 3366813)
So I think controlling this via guide and landmarks settings is better. If someone wants to index one of these special xhtml files, they can toggle on and off the semantic settings before creating the index.

The only concerns I would have would be the Text (epub2) and Body Matter (epub3) guide/landmarks. Those files would nearly always be included in any indexing. Though as you mention, they can easily be toggled off while the index is generated to give the user control.

Tex2002ans 08-07-2016 08:59 PM

Quote:

Originally Posted by Doitsu (Post 3366892)
True, but since only some epub3 apps support linear="no" and linear="no" is usually used for items that are supposed to be somewhat hidden, I thought this would be the easiest solution without adding a dedicated GUI option.

Or similar to how there is a class "sigil_not_in_toc", there could be a class called "sigil_not_in_index".

Or there could be a GUI similar to "Tools - Generate Table of Contents", where you could check/uncheck which files to include in the Index.

KevinH 08-07-2016 09:20 PM

Not exactly ...
Only a specific list of selected guide types/landmarks would be automatically excluded from index generation. That list would NOT include Text or BodyMatter; it would instead include things like title-page, etc. That is why I was asking people which of the guide types and landmarks should be automatically excluded from index generation.

If someone instead decided they want those autoexcluded files to actually be included in the index generation, they could simply unset their semantics, before generating the index.

Thereby no gui is needed. It already exists in the form of setting and unsetting guide types/landmarks.

All that is left to do then is look at the list of all possible guide types/landmarks and decide which ones should not automatically be included when building an index.

For example, for epub2, would this be a good list of types to automatically exclude from index generation ...

Code:

GuideItems to remove from index
        acknowledgements
        bibliography
        colophon
        copyright-page
        dedication
        foreword
        glossary
        index
        loi
        lot
        preface
        title-page
        toc

I can build a similar list from the epub3 landmarks too.

I just wanted input about the list of ones to autoexclude.
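To make the proposal concrete, here is a minimal Python sketch of how such a list might be applied to an epub2 OPF. It is an illustration only, not the code that will go into Sigil; the function name files_to_skip and the sample file names are hypothetical, and the set simply mirrors the draft list above.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Mirrors the draft list in this post; the final list is still open for input.
EXCLUDED_GUIDE_TYPES = {
    "acknowledgements", "bibliography", "colophon", "copyright-page",
    "dedication", "foreword", "glossary", "index", "loi", "lot",
    "preface", "title-page", "toc",
}

# Hypothetical minimal OPF, used only to exercise the function below.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <guide>
    <reference type="title-page" title="Title Page" href="Text/title.xhtml"/>
    <reference type="text" title="Start" href="Text/chapter1.xhtml"/>
    <reference type="toc" title="Contents" href="Text/toc.xhtml#start"/>
  </guide>
</package>"""


def files_to_skip(opf_xml: str) -> set:
    """Return hrefs whose guide type is on the exclusion list."""
    root = ET.fromstring(opf_xml)
    skip = set()
    for ref in root.findall("opf:guide/opf:reference", OPF_NS):
        if ref.get("type") in EXCLUDED_GUIDE_TYPES:
            # Drop any fragment identifier so only the file path is compared.
            skip.add(ref.get("href", "").split("#", 1)[0])
    return skip


if __name__ == "__main__":
    print(sorted(files_to_skip(SAMPLE_OPF)))
```

Because "text" is not in the set, the body file stays indexable, and unsetting a file's guide type removes it from the skip list without any extra GUI.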

KevinH

DiapDealer 08-07-2016 09:54 PM

Gotcha. I misunderstood.
I don't really use the index feature so I'll let those who do give their input on which specific ones make the most sense to auto-exclude.

HarryT 08-08-2016 07:09 AM

Quote:

Originally Posted by Doitsu (Post 3366793)
How about excluding all items that have a spine itemref linear="no" attribute?
This would finally put this heavily underappreciated attribute to good use. :)

I've never come across that attribute - what does it do? Omit an item from the linear reading sequence and leave it only accessible from the TOC?


All times are GMT -4.
