05-08-2014, 01:26 PM | #1 |
Enthusiast
Posts: 40
Karma: 10
Join Date: Feb 2014
Device: Kindle 4
|
Author sort algorithm and accented characters
I was modifying the Author sort tweak to include some additional copy words and came across a problem when adding "Académie". I was able to add the word to the copy list okay, but after restarting calibre I noticed that the author sort values for the appropriate entries in my library weren't being calculated correctly when I recalculated all author sort values. Opening up the tweak, I see that the word had been mangled to: "Acad\xc3\xa9mie"
Further testing showed that other accented characters behaved similarly. Using unaccented characters in the tweak doesn't work either ("Academie" stays the same in the tweak, but doesn't appear to match "Académie" in the author name, so it isn't recognized as a copy word). Any idea how to add words with accented characters to the copy words list (and the other lists too, while we're at it)? |
05-08-2014, 10:23 PM | #2 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Works for me. Is your system not utf-8? In any case, just use the escaped form for non-ASCII chars, to be absolutely safe. So Académie would become
u'Acad\xe9mie' or, if you want to use unicode code points (which are easier to look up), u'Acad\u00e9mie' |
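As a minimal sketch of the point above: the two escaped spellings are the same unicode string, so either one can go into the tweak. The list `copy_words` and the helper `has_copy_word` are hypothetical names made up for illustration; they are not calibre's actual matching code.

```python
# Both literals spell the same text: U+00E9 is LATIN SMALL LETTER E WITH ACUTE.
hex_escape = u'Acad\xe9mie'
codepoint_escape = u'Acad\u00e9mie'

assert hex_escape == codepoint_escape
assert len(hex_escape) == 8  # eight characters; the accented e counts as one

# Hypothetical copy-word list, for illustration only.
copy_words = [u'Inc', u'Acad\u00e9mie']
lowered = {w.lower() for w in copy_words}


def has_copy_word(author):
    """Return True if any whitespace-separated token is a listed copy word."""
    return any(tok.lower() in lowered for tok in author.split())
```

With this sketch, an author name containing "Académie" matches, while the unaccented "Academie" does not, which is consistent with the behavior reported in the first post.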
05-08-2014, 10:30 PM | #3 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
And note that "Acad\xc3\xa9mie" is the utf-8 encoded form.
|
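To illustrate the difference between the text and its utf-8 encoded form, here is a small sketch (written for Python 3; in the Python 2 of that era a plain '...' literal was already a byte string). The variable names are illustrative only.

```python
text = u'Acad\u00e9mie'          # the unicode text, 8 characters
encoded = text.encode('utf-8')   # the same word as utf-8 bytes

# In utf-8 the accented e becomes two bytes, 0xC3 0xA9.
assert encoded == b'Acad\xc3\xa9mie'
assert len(text) == 8
assert len(encoded) == 9

# Treating each byte as one character (Latin-1 here) yields a different,
# nine-character string, so it no longer compares equal to the original.
mangled = encoded.decode('latin-1')
assert mangled == u'Acad\u00c3\u00a9mie'
assert mangled != text

# Decoding with the right codec recovers the original text.
assert encoded.decode('utf-8') == text
```

This suggests why a word stored as "Acad\xc3\xa9mie" without being decoded would fail to match "Académie" in an author name.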
05-09-2014, 10:37 AM | #4 |
Enthusiast
Posts: 40
Karma: 10
Join Date: Feb 2014
Device: Kindle 4
|
Before this problem, I would have said yes. This makes me rethink things. I'll have to look into it further.
So, changing to the unicode escaped character works for that word, but I've run into a problem with another word. I tried adding "père" to the suffixes in the same manner (i.e. as u'p\u00e8re', which calibre shows as u'p\xe8re' after restart) and the suffix is recognized, but when the author sort value is calculated it gets transformed into 'pére' (note the accent has changed from grave to acute).
Example: Alexandre Dumas, père -> Dumas, Alexandre, pére
That's clearly not the correct behavior. Am I still doing something wrong or is this a bug? |
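One way to check which accent a string really contains, rather than relying on how it is displayed, is to print the code point and Unicode name of each non-ASCII character. This is a generic sketch using Python's standard `unicodedata` module; `non_ascii_names` is a hypothetical helper name.

```python
import unicodedata


def non_ascii_names(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    return [
        (ch, 'U+%04X' % ord(ch), unicodedata.name(ch, 'UNKNOWN'))
        for ch in text
        if ord(ch) > 127
    ]


grave = non_ascii_names(u'p\u00e8re')   # the suffix as entered in the tweak
acute = non_ascii_names(u'p\u00e9re')   # the suffix as it reportedly came out
```

Here `grave` reports U+00E8 (LATIN SMALL LETTER E WITH GRAVE) and `acute` reports U+00E9 (LATIN SMALL LETTER E WITH ACUTE), so running the helper on the tweak value and on the computed author sort value would show where the two diverge.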
05-09-2014, 10:57 AM | #5 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I cannot replicate that either. Adding père to the list of author name suffixes and clicking the calculate author sort button in the edit metadata dialog gives:
Alexandre Dumas père -> Dumas, Alexandre père
Alexandre Dumas, père -> Alexandre Dumas, père |
05-09-2014, 11:18 AM | #6 |
Enthusiast
Posts: 40
Karma: 10
Join Date: Feb 2014
Device: Kindle 4
|
Other issues with the algorithm:
I have the author "Libreria Editrice Vaticana", which I'd like to trip the copy mechanism rather than the inversion one. I tried adding "Editrice" to the copy word list, but that doesn't seem to be working. I know the word is validly in the list because an author value of "Libreria Editrice" works correctly, but the three-word combination does not. I get similar behavior when adding "Libreria" or "Vaticana" to the list: a two-word name that includes the listed term works, but a three-word name does not.
Sometimes a suffix that is preceded by a comma is not recognized as a suffix, despite being in the list. This seems to happen only with long suffixes. For example, "John Smith, Jr" gets changed to "Smith, John, Jr", but "John Smith, Junior" stays the same ("Junior" is treated as a first name after a comma rather than as a suffix). Furthermore, "John Smith Junior" gets changed to "Smith, John Junior" ("Junior" is treated, correctly, as a suffix). Is there a way to mess with this behavior?
Finally, is there a way to exclude certain words from being placed in the author sort field automatically? For instance, I have some books which were edited by Eric Flint and some books which were written by him. I distinguish this in the Author field by doing something like "edited by Eric Flint" or "Eric Flint editor" (I'd like a comma there, but that's running into the suffix error above). However, I'd like the author sort value to simply be "Flint, Eric" so that all works by him, whether edited or authored, are sorted the same. Right now I have to do this manually; is there a way to do it automatically? |
05-09-2014, 11:29 AM | #7 |
Enthusiast
Posts: 40
Karma: 10
Join Date: Feb 2014
Device: Kindle 4
|
It appears that using parentheses works for the last one. I.e. "Eric Flint (editor)" becomes "Flint, Eric" as I want it to.
|
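The parentheses observation above can be sketched as follows. This only mimics the reported result ("Eric Flint (editor)" becoming "Flint, Eric"); the helpers `strip_parenthesized` and `invert_name` are hypothetical and are not taken from calibre's source, which also handles prefixes, suffixes, and copy words.

```python
import re


def strip_parenthesized(name):
    """Remove any (...) groups and collapse the remaining whitespace."""
    without = re.sub(r'\([^()]*\)', ' ', name)
    return ' '.join(without.split())


def invert_name(name):
    """Move the last word to the front, followed by a comma (simplified)."""
    tokens = strip_parenthesized(name).split()
    if len(tokens) < 2:
        # Nothing to invert for a single-word (or empty) name.
        return name
    return tokens[-1] + ', ' + ' '.join(tokens[:-1])


editor_sort = invert_name(u'Eric Flint (editor)')
author_sort = invert_name(u'Eric Flint')
```

Under these assumptions both calls produce "Flint, Eric", so edited and authored works would sort together.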
05-09-2014, 11:38 AM | #8 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You're making this way too complex. If you want to apply special values for author sort for certain author names, simply click the author name in the tag browser and use manage authors to manually specify the author sort value. Then calibre will always use that value when it encounters that author in the future.
|
05-09-2014, 04:11 PM | #9 |
Enthusiast
Posts: 40
Karma: 10
Join Date: Feb 2014
Device: Kindle 4
|
I realize that I can use a manual override for these things. It's what I have been doing. I'm just trying to make sure that I'm leveraging all of calibre's capabilities and using said manual override as little as possible.
After some further playing, I've been able to resolve the issue with "Libreria Editrice Vaticana". I'm not sure what I've done differently, but it is working as expected now. |
|