#31 |
Sigil Developer
Posts: 8,788
Karma: 6000000
Join Date: Nov 2009
Device: many
|
Please verify that the words sorted before the first A entry do not include any leading whitespace, including nbsp, thin spaces, etc.
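A quick way to run this check outside Sigil is a few lines of Python. This is only a sketch: `find_leading_whitespace` is a hypothetical helper name, and it assumes the index entries have already been read into a list of strings.

```python
import unicodedata


def find_leading_whitespace(entries):
    """Return (entry, character name) pairs for entries that start with whitespace.

    str.isspace() is Unicode-aware, so it also catches the no-break space
    (U+00A0) and the thin space (U+2009), not just plain spaces and tabs.
    """
    suspicious = []
    for entry in entries:
        if entry and entry[0].isspace():
            # Control characters such as tab have no Unicode name, hence the default.
            name = unicodedata.name(entry[0], "UNNAMED CONTROL CHARACTER")
            suspicious.append((entry, name))
    return suspicious


if __name__ == "__main__":
    sample = ["Acton", "\u00a0agenda", "\u2009agrafe"]
    for entry, name in find_leading_whitespace(sample):
        print(repr(entry), "starts with", name)
```

Printing with `repr()` makes the otherwise invisible character show up as an escape sequence.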
|
#32 | |
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
@KevinH: Does the Sigil index generation code use built-in C++ sorting functions that allow you to specify a locale for sorting? If so, would it be possible to use the language defined in the EPUB metadata as the locale?
@roger64: As a workaround, you could add the unaccented version of the index entry in the index entries field. For example:
Code:
Text to include    Index entries
âge                age
BTW, there's a Python package that will automatically transform accented characters to unaccented characters: Unidecode. (IIRC, this package is also used by Calibre for transliterating non-Latin alphabets.)
Since all index entries are stored in a text file (sigil_index.ini), you might be able to write a simple Python script that adds the unaccented version as the second entry. This might also be a good first Sigil plugin project. For example, you could access sigil_index.ini and display all index entries from a Sigil plugin as follows:
Spoiler:
|
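The workaround described in this post (adding an unaccented second entry) could be sketched as follows. Several things here are assumptions, since none of them are shown in the thread: each line of the word list is taken to be `text to include<TAB>index entry`, the helper names are made up for illustration, and the standard library's `unicodedata` stands in for Unidecode so the sketch has no dependencies. Unidecode also transliterates letters such as ø or ß, which this approach leaves untouched.

```python
import unicodedata


def unaccent(text):
    """Strip combining accents: decompose, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def add_unaccented_entry(line):
    """Turn a word-list line into 'text to include<TAB>unaccented index entry'.

    Only the first column (the text to include) is kept from the input;
    any existing second column is replaced. The tab-separated layout is an
    assumption about the word list, not something confirmed by Sigil's docs.
    """
    pattern = line.rstrip("\n").split("\t")[0]
    return f"{pattern}\t{unaccent(pattern)}"


if __name__ == "__main__":
    for word in ["âge", "événement", "agenda"]:
        print(add_unaccented_entry(word))
```

Running it over every line of a word list and saving the result would give a file in which each accented word is indexed under its unaccented form.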
|
#33 |
Sigil Developer
Posts: 8,788
Karma: 6000000
Join Date: Nov 2009
Device: many
|
Doitsu,
The complete entries are never actually sorted. Entries are built "sorted" by being inserted in order into the IndexEditorModel by QString comparison (that is, by Unicode character value). Any fixups need to be done by the user inside the Index Editor before the index itself is generated. |
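To illustrate why comparison by character value pushes accented entries to the end, here is a small Python sketch. Python also compares strings by code point, so for these lowercase samples it behaves like the comparison described in this post. It is an analogy only, not Sigil's actual code, and the sample words are made up.

```python
def codepoint_sorted(entries):
    """Sort by raw Unicode code point, which is Python's default string order."""
    return sorted(entries)


if __name__ == "__main__":
    entries = ["zèbre", "âge", "agenda", "agrafe"]
    for entry in codepoint_sorted(entries):
        # Show the code point of the first letter next to each entry.
        print(f"U+{ord(entry[0]):04X}  {entry}")
    # 'â' is U+00E2, which is greater than 'z' (U+007A),
    # so "âge" lands after "zèbre" instead of next to "agenda".
```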
#34 | |
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
The scrambling of entry names happened during the index generation process. As it seems to affect 5 to 10% of the entries, there is no way I could consider reordering them manually. Hopefully it will be possible to find a way to re-sort the entry names once the index file has been processed (this time taking into account the locale rules and the above defect?).
As for writing a plugin, sorry, but this is way beyond my technical knowledge.
Last edited by roger64; 12-25-2015 at 07:55 PM. |
|
#35 |
Sigil Developer
Posts: 8,788
Karma: 6000000
Join Date: Nov 2009
Device: many
|
roger64,
As I cannot recreate the out-of-order entries before the A section at my end, will you please create a small test EPUB that reproduces the issue, so that I can see what might be causing it? It could be caused by whitespace or newlines being captured as part of a pattern, or it could be inherent in the layout of the tags or spans involved.
I noticed that each entry generated before the A has a / in your word list, as opposed to a |. How or why are you using the / in that way? The part before the / is supposed to be the actual category name, while the part after the / is supposed to be the entry name (which can be left blank if needed, in which case the entry itself should be used). As far as I can tell from your sample, you are using it backwards (at least I think so). I almost never use indexing, so maybe I am the one who has it backwards. But that is my reading of the online help Doitsu pointed us at.
Last edited by KevinH; 12-25-2015 at 09:42 PM. |
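The category/entry reading given in this post can be written down as a tiny Python function. This is a sketch of that reading only, not of Sigil's real parsing code: the function name, its arguments, and the fallback to the matched text are all assumptions made for illustration.

```python
def split_index_entry(index_entry, matched_text):
    """Split 'category/entry name' following the reading described above.

    Returns (category, entry_name). If there is no '/', there is no
    category. If the part after '/' is blank, the matched text is used
    as the entry name (assumed fallback).
    """
    category, separator, name = index_entry.partition("/")
    if not separator:
        return (None, index_entry or matched_text)
    return (category, name or matched_text)


if __name__ == "__main__":
    print(split_index_entry("ages/âges de la vie", "âge"))
    print(split_index_entry("ages/", "âge"))
    print(split_index_entry("age", "âge"))
```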
#36 |
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thanks for taking care of this.
I will build a complete test EPUB around this list and also provide the text list to use. I think the obvious solution, if we wish to take some locale rules into account without too much trouble, is to let the user provide an already sorted text list of entries, including delineations for the new index letters to be used. |
#37 |
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Here it is.
This is what I did: I wanted to provide a real book and not a two-page test book. So I selected one of the EPUB2 books I produced this year (in French), inserted the words or expressions from petiteliste.txt without any regard for meaning, and saved. I apologize to anyone who wishes to read it: they will face some comprehension problems...
Then, opening Sigil 0.9.2 (Arch Linux x64 build), I inserted the text file in the Index Editor window, saved, and closed it. Then I generated the index.
Result: the same phenomenon occurred. This time I had two entries (instead of three) at the beginning before the A, and the French accented entry was placed at the end, so the two main problems are confirmed with this Sigil build. |
#38 |
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
The problem with out-of-order index entries appears to have been caused by two tab characters in a row in the petiteliste.txt index file. Here's what it looks like after the import on a Windows machine:
|
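Doubled tabs are hard to spot by eye, so a short Python check of the word list before importing it can help. This is a sketch under one assumption taken from the discussion: every non-empty line should contain exactly one tab between the text to include and the index entry. `find_tab_problems` is a hypothetical helper name.

```python
def find_tab_problems(lines):
    """Return (line number, line) for every non-empty line without exactly one tab."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines are ignored
        if line.count("\t") != 1:
            problems.append((number, line))
    return problems


if __name__ == "__main__":
    sample = ["âge\tage\n", "Acton\t\tActon Harold\n", "agenda\tagenda\n"]
    for number, line in find_tab_problems(sample):
        # repr() makes each tab visible as \t
        print(f"line {number}: {line!r}")
```

To check a real file, pass it an open file object, for example `find_tab_problems(open("petiteliste.txt", encoding="utf-8"))`.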
#39 |
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
@Doitsu
So this is my mistake, and for this please accept my apologies. I just hope it will help other people avoid repeating it.
I had checked the left column many times and saw nothing suspicious. I also thought I had inserted only one tab, of fixed length, per line. This is another easy, practical tip to remember. There should be a way to colour the tabs in my text editor.
So only the locale question remains to be solved.
Last edited by roger64; 12-26-2015 at 03:50 AM. |
#40 | |
frumious Bandersnatch
Posts: 7,550
Karma: 19500001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
For example, in Spanish the order is similar to French, with all accented versions of vowels being sorted as if they were the plain vowel (but "ñ" is a separate letter, sorted after "n"). In Swedish the letters "å", "ä", "ö" are separate and sorted at the end, after "z".
If I have a list of words in English, Spanish and Swedish, how should I sort "nino", "niña", "ninå"? An English reader would expect "ninå"/"niña", "nino"; a Spanish speaker would expect "ninå", "nino", "niña"; a Swedish speaker would expect "niña", "nino", "ninå". |
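The three expectations can be reproduced with a small hand-rolled sort key in Python. This is a deliberately simplified sketch, not a real collation algorithm: each "language" is reduced to an alphabet string plus the accented letters it treats as separate letters, and every other accented letter is folded to its base letter. The function names and the alphabet strings are assumptions made for illustration.

```python
import unicodedata


def fold(ch, keep):
    """Fold an accented letter to its base letter unless the language keeps it."""
    if ch in keep:
        return ch
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def make_sort_key(alphabet, keep=""):
    """Build a sort key that ranks letters by their position in `alphabet`."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(word):
        letters = unicodedata.normalize("NFC", word).lower()
        # Letters missing from the alphabet sort after everything else.
        return [rank.get(fold(ch, keep), len(alphabet)) for ch in letters]

    return key


ENGLISH = make_sort_key("abcdefghijklmnopqrstuvwxyz")
SPANISH = make_sort_key("abcdefghijklmnñopqrstuvwxyz", keep="ñ")
SWEDISH = make_sort_key("abcdefghijklmnopqrstuvwxyzåäö", keep="åäö")

if __name__ == "__main__":
    words = ["nino", "niña", "ninå"]
    for name, key in [("English", ENGLISH), ("Spanish", SPANISH), ("Swedish", SWEDISH)]:
        print(name, sorted(words, key=key))
```

Under the English key, "niña" and "ninå" compare equal, which matches the "ninå"/"niña" tie in the post. Which of the two comes first then depends only on their order in the input, because Python's sort is stable.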
|
#41 | |
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Code:
Default locale collation order:
Zebra ar förnamn zebra ängel år ögrupp

English locale collation order:
ängel ar år förnamn ögrupp zebra Zebra

Swedish locale collation order:
ar förnamn zebra Zebra år ängel ögrupp

For example, if you change the last entry to:
Code:
âge    ages/âges de la vie
Code:
Acton
Acton Harold 1
agenda 1
ages
âges de la vie 1
Agnelli
Agnelli Gianni 1
agrafe 1
|
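The code that produced the collation listings in this post is not shown in the thread. One way to get locale-dependent ordering in Python is the standard `locale` module, sketched below. The locale name `sv_SE.UTF-8` is an assumption: it must be installed on the system, its spelling differs between platforms (Windows uses other names), and the exact order returned depends on the platform's collation tables. `sort_with_locale` is a hypothetical helper name.

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words using the collation rules of `locale_name`.

    Returns None if the locale is not available on this system.
    The previous LC_COLLATE setting is restored afterwards.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None  # locale not installed
    try:
        # strxfrm turns each string into a key that compares per the locale.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    words = ["Zebra", "ar", "förnamn", "zebra", "ängel", "år", "ögrupp"]
    result = sort_with_locale(words, "sv_SE.UTF-8")
    print(result if result is not None else "Swedish locale not available here")
```

A plugin could pass the language from the EPUB metadata instead of a hard-coded name, provided that language maps to a locale installed on the user's machine.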
|
#42 |
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Setting aside bilingual texts, which probably require two separate indexes, a book is usually published in one main language, and I think the reader expects the index to follow that language's rules, even if there are some foreign words. But there could also be, as a courtesy, some hypertext refinements: the same word could be placed in several places. |
#43 | |
Grand Sorcerer
Posts: 5,731
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
What you call "courtesy" actually means a lot of work for developers. However, thanks to the very user-friendly Sigil plugin framework, you could easily add your own custom indexing plugin with all the features that you require. |
|
#44 |
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
@Doitsu
I surely would appreciate a locale-aware index order since, right now, I cannot see myself publishing a French book index with accented words placed at the end (à, â, é, è, ê, ô, ç, etc. Really?), and I can't think of any French publisher who would agree to do it. I will study your proposal and will need some time to test it.
As for the "refinements", yes, this is extra work for everybody, not only the developers. The foreign words would need to be tagged with their own language, which is rarely done. For a few words, this can be done manually. For my part, I really don't push for it.
Last edited by roger64; 12-26-2015 at 09:05 AM. |
#45 | |
Wizard
Posts: 2,625
Karma: 3120635
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Quote:
Last edited by roger64; 12-26-2015 at 09:44 AM. Reason: accented |
|