12-24-2015, 11:54 PM #30
roger64
A practical test

Hi

I ran a test using 30 entries (out of a medium-size index of 550) from a French book. It was mostly successful, though some quirks still need to be ironed out. The attached zip file contains:
- petiteliste.txt, the text file I imported to generate the index;
- index.xhtml, the index that Sigil 0.9.2 (Linux Arch build) produced from it.
You can reproduce the test by pasting the text list into any text file of an EPUB. I have also attached a screenshot of the index file as shown in Sigil.

I - Two issues that probably need attention.

1. Entries sorted at the very beginning of the alphabetical order.

For no reason that I can explain, some entries were placed right at the beginning of the alphabetical order, under no letter heading. In this sample it happens to Armani but not to Acton, and to apparat and ascot but not to the other words beginning with a.

The same thing happened in the full medium-sized index, where the affected entries (between 5 and 10% of the total) seemed to be picked at random.
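One possible cause on the input side, which is only a guess and not something this test established, would be invisible characters (a byte order mark, a non-breaking space, a zero-width space) at the start of some lines of the list. A quick check of the list file could look like the sketch below; the function names `suspicious_lines` and `check_file` are hypothetical, and the check only flags lines whose first character is not a letter.

```python
import unicodedata

def suspicious_lines(lines):
    """Return (line_number, character_name) for lines not starting with a letter.

    Hypothetical diagnostic: flags a leading BOM, non-breaking space,
    zero-width space or any other non-letter that might keep an entry
    out of its letter heading.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            continue  # blank lines are ignored
        first = line[0]
        if not first.isalpha():
            # Control characters such as TAB have no Unicode name.
            flagged.append((number, unicodedata.name(first, "UNKNOWN")))
    return flagged

def check_file(path):
    # "utf-8" rather than "utf-8-sig" so that a leading BOM stays visible.
    with open(path, encoding="utf-8") as f:
        for number, name in suspicious_lines(f.read().splitlines()):
            print(f"line {number}: starts with {name}")
```

Usage: `check_file("petiteliste.txt")`. An empty result would point away from the input file and towards the sorting itself.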

2. Letters with diacritics sorted at the end of the alphabetical order.

Some words beginning with an accented letter (such as âge or écossais) are sorted under a heading of their own at the end of the alphabetical order. For French at least, they should be sorted as if the letter carried no diacritic: in a French dictionary, for example, âge comes somewhere between affreux and agonie.
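To illustrate the expected behaviour, here is a minimal sketch of an accent-insensitive sort key and letter grouping. It is not how Sigil sorts its index; it simply decomposes each word (Unicode NFD), drops the combining marks, and compares the result case-insensitively. This is a simplification: ligatures such as œ are not decomposed by NFD, and full French collation (for example via the Unicode Collation Algorithm) handles more cases.

```python
import unicodedata
from itertools import groupby

def strip_diacritics(word):
    """Drop combining marks after canonical decomposition: 'âge' -> 'age'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def french_sort_key(word):
    # Primary key ignores accents and case; the original word is a
    # tie-breaker so words differing only by accent sort consistently.
    return (strip_diacritics(word).casefold(), word)

def index_headings(words):
    """Group sorted words under a heading letter taken from the accent-free form."""
    ordered = sorted(words, key=french_sort_key)
    return [(letter, list(group))
            for letter, group in groupby(
                ordered, key=lambda w: strip_diacritics(w)[0].upper())]
```

With this key, `sorted(["agonie", "âge", "affreux"], key=french_sort_key)` gives `["affreux", "âge", "agonie"]`, and écossais lands under E rather than under a separate heading at the end.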

II - Practical tips.

3. Using only the vertical bar (pipe).

In earlier attempts, I had problems with "Text to include" entries containing perfectly valid regex constructs such as (?i) and [es]: they were not interpreted but copied verbatim into the entry list. I dropped them altogether and used only the pipe (|), which works reliably for building the index and covers all the cases I need.

I do not know whether this problem is specific to my Linux build or can be reproduced with other Sigil builds.
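The sketch below uses Python's `re` module only to illustrate the trade-off; it is an assumption that Sigil's "Text to include" field behaves like Python regex, which this post does not confirm. An inline (?i) flag matches every casing, while a pipe-only pattern matches exactly the forms you spell out, so each form you need (including what a class like [es] would cover) has to be listed explicitly. The helper `pipe_pattern` is hypothetical.

```python
import re

def pipe_pattern(forms):
    """Join explicit word forms with '|', escaping each so it matches literally."""
    return "|".join(re.escape(form) for form in forms)

text = "Écossais, écossais et ÉCOSSAIS"

# Inline flag: matches every casing, including forms not listed anywhere.
flag_hits = re.findall("(?i)écossais", text)

# Pipe only: matches exactly the spelled-out forms and nothing else.
pipe_hits = re.findall(pipe_pattern(["Écossais", "écossais"]), text)
```

Here `flag_hits` contains all three occurrences, while `pipe_hits` contains only "Écossais" and "écossais". Listing forms is more verbose but predictable, which fits the goal of a reliable index.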

4. Using a fixed tab width.

I prepared the text file with gedit, a Linux text editor. I found it handy to set a fixed tab width in its Preferences (I chose 24), which gives the list a more uniform layout.
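As a rough illustration, Python's `str.expandtabs` shows how a tab-separated line looks with a fixed width of 24, assuming (as the tip implies) that lines of the list contain a tab; the sample lines below are hypothetical. The setting only changes the display: the file still contains a real tab character.

```python
def preview_line(line, tab_width=24):
    """Render a tab-separated line as an editor with a fixed tab width would.

    Display only: the underlying file keeps its tab character.
    """
    return line.expandtabs(tab_width)
```

For example, `preview_line("Acton\tActon")` pads "Acton" with 19 spaces so the second column starts at column 24, whatever the length of the first word (up to 23 characters).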
Attached Thumbnails
indexscreenshot.png (45.4 KB)
Attached Files
index.zip (2.3 KB)

Last edited by roger64; 12-25-2015 at 05:56 AM. Reason: pipe and alphabetical order