Old 12-25-2015, 08:56 AM   #32
Doitsu
Quote:
Originally Posted by roger64 View Post
2. Letters with diacritics classed at the end of the alphabetical order.
That is the default order when words are sorted by character code, since the character code for â (226/00E2) is higher than the character code for a (97/0061). Most likely the index generation code doesn't do locale-specific sorting.
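
To illustrate the character-code point, here's a small Python sketch. The word list is made up for the example, and Python's default sort is only standing in for whatever Sigil does internally:

```python
# Sample words are invented for illustration only.
words = ['zone', 'âge', 'abri', 'été', 'encre']

# Python's default string sort compares Unicode code points,
# so every accented initial letter lands after plain 'z'.
print(ord('a'), ord('z'), ord('â'))

by_code_point = sorted(words)
print(by_code_point)
```

The accented entries end up after "zone", which matches the behaviour roger64 described.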

@KevinH: Does the Sigil index generation code use built-in C++ sorting functions that allow you to specify a locale for sorting? If so, would it be possible to use the language defined in the epub metadata as the locale?

@roger64: As a work-around you could add the unaccented version of the index entry in the index entries field. For example:

Code:
Text to include    Index entries
âge                age
Of course, you'd have to fix the spelling of the index entry in the generated index afterwards.

BTW, there's a Python package that'll automatically transform accented characters to unaccented characters: Unidecode. (IIRC, this package is also used by Calibre for transliterating non-Latin alphabets.)

Since all index entries are stored in a text file (sigil_index.ini), you might be able to write a simple Python script that'll add the unaccented version as the second entry.
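
A rough sketch of the core step, using the standard-library unicodedata module instead of Unidecode so it runs without extra installs. The helper names strip_accents and add_unaccented_entries are made up for this example, and it works on a plain list rather than on sigil_index.ini itself, since the key name of the second field isn't shown here:

```python
import unicodedata

def strip_accents(text):
    """Return text with combining diacritical marks removed (hypothetical helper)."""
    # NFD splits 'â' into 'a' plus a combining circumflex.
    decomposed = unicodedata.normalize('NFD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def add_unaccented_entries(texts):
    """Pair each 'Text to include' value with an unaccented 'Index entries' value."""
    return [(text, strip_accents(text)) for text in texts]

pairs = add_unaccented_entries(['âge', 'été', 'abri'])
for text, entry in pairs:
    print(text, entry, sep='\t')
```

Note that this only drops diacritics. Unlike Unidecode, it won't transliterate letters such as œ or ß, so Unidecode is probably the better choice for a real script.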

This might also be a good first Sigil plugin project. For example, you could access sigil_index.ini and display all index entries from a Sigil plugin as follows:

Spoiler:

Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals, division, absolute_import, print_function
import os, sys

PY2 = sys.version_info[0] == 2

if PY2:
    import ConfigParser as configparser
else:
    import configparser

# main routine
def run(bk):
    # get sigil_index.ini path
    index_ini = os.path.abspath(os.path.join(bk._w.usrsupdir, 'sigil_index.ini'))
    print('sigil_index.ini path:', index_ini)
    
    # read values 
    config = configparser.ConfigParser(allow_no_value = True)
    config.read(index_ini)
    number_of_entries = config.getint('index_entries', 'size')
    
    # print entries
    for index_entry in range(1, number_of_entries + 1):
        # keys look like 1\Text%20to%20Include; the backslash has to be escaped
        key = str(index_entry) + '\\Text%20to%20Include'
        if PY2:
            entry = unicode(config.get('index_entries', key), 'unicode-escape')
        else:
            entry = bytes(config.get('index_entries', key), "utf-8").decode("unicode_escape")
        print(entry)

    # Sigil expects run() to return 0 on success
    return 0

def main():
    print('I reached main when I should not have\n')
    return -1

if __name__ == "__main__":
    sys.exit(main())
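
The decode step in the Python 3 branch assumes that the ini file stores non-ASCII characters as backslash escapes. That's inferred from the code above, not checked against every sigil_index.ini. A minimal illustration with a made-up value:

```python
# Made-up raw value, as configparser might return it: a literal backslash escape.
raw_value = r'\xe2ge'

# Encode to bytes, then let the unicode_escape codec interpret the escape.
decoded = bytes(raw_value, 'utf-8').decode('unicode_escape')
print(decoded)
```

If an entry ever contained a literal non-ASCII character instead of an escape, this round trip would garble it, so treat it as a sketch only.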