Quote:
Originally Posted by roger64
2. Letters with diacritics classed at the end of the alphabetical order.
That is the default sort order if words are sorted by character code, since the character code for â (226/00E2) is higher than the character code for a (97/0061), and in fact higher than that of every unaccented letter up to z (122/007A). Most likely the index generation code doesn't do locale-specific sorting.
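A quick way to see the effect, as a minimal sketch: Python's built-in sorted() also compares strings by code point, so it reproduces the same ordering. The word list is made up for illustration and has nothing to do with Sigil's actual index code.

```python
# -*- coding: utf-8 -*-
# Illustrative word list (hypothetical, not taken from any real index).
words = ["zone", "âge", "agent", "abri"]

# sorted() compares strings code point by code point, with no locale rules.
by_code_point = sorted(words)

print(ord("a"), ord("z"), ord("â"))
print(by_code_point)
```

Because the code point of â is above that of z, "âge" ends up after "zone" instead of next to "agent".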
@KevinH: Does the Sigil index generation code use built-in c++ sorting functions that allow you to specify a locale for sorting? If so would it be possible to use the language defined in the epub metadata as the locale?
@roger64: As a work-around you could add the unaccented version of the index entry in the index entries field. For example:
Code:
Text to include    Index entries
âge                age
Of course, you'd have to fix the spelling of the index entry in the generated index afterwards.
BTW, there's a Python package that'll automatically transform accented characters to unaccented characters: Unidecode. (IIRC, this package is also used by Calibre for transliterating non-Latin alphabets.)
Since all index entries are stored in a text file (sigil_index.ini), you might be able to write a simple Python script that'll add the unaccented version as the second entry.
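Here's a rough sketch of just the "unaccented version" step. It is an assumption-laden stand-in: it uses the standard-library unicodedata module instead of Unidecode so it runs without installing anything, and the helper name strip_accents and the sample entries are hypothetical. It doesn't write anything back to sigil_index.ini.

```python
# -*- coding: utf-8 -*-
import unicodedata

def strip_accents(text):
    """Return text with combining marks removed (hypothetical helper)."""
    # NFD splits a letter like â into the base letter a plus a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Illustrative entries (made up, not read from sigil_index.ini).
entries = ["zone", "âge", "agent", "abri"]

# Pair each entry with its unaccented form, as in the work-around above.
pairs = [(entry, strip_accents(entry)) for entry in entries]

# Sorting on the unaccented form puts the accented word where a reader expects it.
sorted_entries = sorted(entries, key=strip_accents)

print(pairs)
print(sorted_entries)
```

This only handles letters that decompose into a base letter plus an accent. Characters such as ø or ß pass through unchanged, which is where a real transliteration package like Unidecode would be the better choice.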
This might also be a good first Sigil plugin project. For example, you could access sigil_index.ini and display all index entries from a Sigil plugin as follows:
Spoiler:
Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals, division, absolute_import, print_function
import os, sys

PY2 = sys.version_info[0] == 2
if PY2:
    import ConfigParser as configparser
else:
    import configparser

# main routine
def run(bk):
    # get sigil_index.ini path
    index_ini = os.path.abspath(os.path.join(bk._w.usrsupdir, 'sigil_index.ini'))
    print('sigil_index.ini path:', index_ini)

    # read values
    config = configparser.ConfigParser(allow_no_value = True)
    config.read(index_ini)
    number_of_entries = config.getint('index_entries', 'size')

    # print entries
    for index_entry in range(1, number_of_entries + 1):
        # keys look like 1\Text%20to%20Include; '\\' is a single literal backslash
        key = str(index_entry) + '\\Text%20to%20Include'
        if PY2:
            entry = unicode(config.get('index_entries', key), 'unicode-escape')
        else:
            entry = bytes(config.get('index_entries', key), "utf-8").decode("unicode_escape")
        print(entry)

    # Sigil treats a return value of 0 as success
    return 0

def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())