MobileRead Forums - View Single Post

Doitsu · 12-25-2015, 09:56 AM

Quote:

Originally Posted by roger64

2. Letters with diacritics classed at the end of the alphabetical order.

That is the default sort order when words are sorted by character code, since the character code for â (226/00E2) is higher than the character code for a (97/0061). Most likely the index generation code doesn't do locale-specific sorting.
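To illustrate the difference, here's a small Python sketch (not Sigil's actual code; the word list and the strip_accents helper are made up for the example). It compares a plain code-point sort with a sort on an accent-folded key. Note that accent folding is only an approximation of real locale-specific collation.

```python
import unicodedata

def strip_accents(text):
    """Drop combining marks after NFD decomposition (stdlib only)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Sample words (hypothetical); NFC makes sure accented letters are single code points.
words = [unicodedata.normalize("NFC", w)
         for w in ["zone", "âge", "age", "école", "eau"]]

# A plain sort compares code points, so accented initials land after "z".
by_code_point = sorted(words)

# Sorting on an accent-folded key puts them next to their base letters;
# the original word is kept as a tie-breaker.
by_folded_key = sorted(words, key=lambda w: (strip_accents(w).casefold(), w))

print(by_code_point)
print(by_folded_key)
```

The first list ends with the accented words, which matches what roger64 is seeing; the second one keeps âge next to age.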

@KevinH: Does the Sigil index generation code use built-in C++ sorting functions that allow you to specify a locale for sorting? If so, would it be possible to use the language defined in the epub metadata as the locale?

@roger64: As a work-around you could add the unaccented version of the index entry in the index entries field. For example:

Code:

Text to include    Index entries
âge                age

Of course, you'd have to fix the spelling of the index entry in the generated index afterwards.

BTW, there's a Python package that'll automatically transform accented characters to unaccented characters: Unidecode. (IIRC, this package is also used by Calibre for transliterating non-Latin alphabets.)

Since all index entries are stored in a text file (sigil_index.ini), you might be able to write a simple Python script that'll add the unaccented version as the second entry.
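A rough sketch of the core of such a script, under a few assumptions: I'm not showing how to read or write sigil_index.ini here, so the entries are represented as hypothetical (text to include, index entries) pairs, and add_unaccented_entries is a made-up helper name. It uses Unidecode if it's installed and otherwise falls back to a stdlib approximation that only strips accents.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party package mentioned above
except ImportError:
    def unidecode(text):
        # Fallback (assumption): strip combining marks only, no transliteration.
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def add_unaccented_entries(rows):
    """rows: (text_to_include, index_entries) pairs; empty entries get filled in."""
    result = []
    for text, entries in rows:
        plain = unidecode(text)
        # Only touch rows that have no entry yet and actually contain accents.
        if not entries and plain != text:
            entries = plain
        result.append((text, entries))
    return result

# Hypothetical sample data, not the real file layout.
rows = [("âge", ""), ("age", ""), ("école", "school")]
updated = add_unaccented_entries(rows)
print(updated)
```

Rows that already have an index entry are left alone, so you can run it more than once without overwriting manual edits.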

This might also be a good first Sigil plugin project. For example, you could access sigil_index.ini and display all index entries from a Sigil plugin as follows:

Spoiler: