MobileRead Forums - View Single Post

roger64 · 12-24-2015, 11:54 PM

A practical test

Hi

I ran a test using 30 entries (taken from a medium-sized index of 550 entries) from a French book. It was mostly successful, though a few quirks still need to be ironed out. The attached zip file contains:
- petiteliste.txt, the text file I imported to generate the index
- index.xhtml, the index output that Sigil 9.2 (Arch Linux build) produced.
You can reproduce it by copying the text list into any EPUB text file. I have also included a screenshot of the index file in Sigil.

I - Two points that probably need to be considered.

1. Entries sorted ahead of the alphabetical order.

For no reason that I can explain, some entries were placed right at the beginning, ahead of the alphabetical order and under no heading. Here you can see it happen for Armani but not for Acton, and for apparat and ascot but not for other words beginning with an a.

The same thing happened in the medium-sized index, and the entries concerned (between 5 and 10% of the total) seemed to be picked at random.

2. Letters with diacritics sorted at the end of the alphabetical order.

Some words beginning with an accented letter (like âge or écossais) are sorted under their own heading at the end of the alphabetical order. For French at least, they should be sorted as if there were no diacritic: for example, in a French dictionary you find âge between, say, affreux and agonie.
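To illustrate the ordering I would expect, here is a minimal Python sketch. It is not how Sigil sorts its index; it only shows one common way to ignore diacritics when sorting. The function names fold_diacritics, sort_entries and heading_for are made up for this example, and the word list is taken from the words mentioned above.

```python
import unicodedata


def fold_diacritics(word: str) -> str:
    """Return the word case-folded, with combining accents removed."""
    # NFD splits "â" into "a" plus a combining circumflex.
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def sort_entries(entries):
    """Sort entries as if they carried no diacritics."""
    # The original spelling is a tie-breaker, so the order is deterministic.
    return sorted(entries, key=lambda w: (fold_diacritics(w), w))


def heading_for(word: str) -> str:
    """Return the index heading letter an entry would be filed under."""
    return fold_diacritics(word)[:1].upper()


entries = ["agonie", "écossais", "âge", "affreux", "Acton"]
print(sort_entries(entries))
print(heading_for("âge"), heading_for("écossais"))
```

With this key, âge lands between affreux and agonie, and écossais is filed under E rather than under a separate heading. It is a simplification: ligatures such as œ are not decomposed by NFD, and full French collation rules are more subtle than plain accent stripping.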

II - Practical tips.

3. Using only the vertical bar (pipe).

In previous attempts, I had problems with "Text to include" entries containing perfectly valid regex constructs such as (?i) and [es]: they were not interpreted but reproduced literally in the entry list. I dropped them entirely and now use only the pipe (|), which works quite reliably for processing the index and copes with all the cases I need.

I do not know whether this problem comes from my Linux build or can be reproduced with other builds of Sigil.
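For anyone wanting to do the same conversion, the sketch below shows why the pipe can stand in for the two constructs that gave me trouble. It uses Python's re module, not Sigil's own regex engine, so it only illustrates that the patterns are equivalent on a given text. The sample string and the patterns (ascot, ami[es]) are hypothetical and are not taken from petiteliste.txt.

```python
import re


def matches(pattern: str, text: str):
    """Return every non-overlapping match of pattern in text, in order."""
    return re.findall(pattern, text)


sample = "Ascot, ascot, amie, amis, ami"

# (?i) makes the match case-insensitive; the pipe version lists each variant.
print(matches(r"(?i)ascot", sample))
print(matches(r"Ascot|ascot", sample))

# [es] matches exactly one character, e or s; the pipe version lists each ending.
print(matches(r"ami[es]", sample))
print(matches(r"amie|amis", sample))
```

The pipe form is longer, since every variant has to be written out, but it relies on nothing beyond plain alternation.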

4. Using a fixed value for the tab.

I used a Linux text editor named gedit to prepare the text file. I found it handy to set a fixed tab width in its Preferences (I chose 24). This gives a more uniform presentation.