05-08-2014, 01:26 PM   #1
rpspringuel
Enthusiast
Posts: 40
Karma: 10
Join Date: Feb 2014
Device: Kindle 4
Author sort algorithm and accented characters

I was modifying the Author sort tweak to include some additional copy words and came across a problem when adding "Académie". The word went into the copy list without complaint, but after restarting calibre and recalculating all author sort values, I noticed that the author sort values for the affected entries in my library weren't being calculated correctly. When I opened the tweak again, I saw that the word had been mangled to: "Acad\xc3\xa9mie"

Further testing showed that other accented characters behave the same way. Using unaccented characters in the tweak doesn't work either: "Academie" stays intact in the tweak, but it doesn't match "Académie" in the author name, so it isn't recognized as a copy word.

Any idea how to add words with accented characters to the copy words list (and to the other lists too, while we're at it)?
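To see why "Academie" never matches "Académie", here is a minimal sketch of a copy-word check. It assumes (this is not taken from calibre's source) that the author name is split on whitespace and each word is compared case-insensitively against the list; the helper `uses_copy_method` and the list contents are hypothetical.

```python
def uses_copy_method(author: str, copy_words: tuple[str, ...]) -> bool:
    """Return True if any whitespace-separated word of `author` is a copy word.

    Hypothetical helper: comparison ignores case but is otherwise exact,
    so an accented and an unaccented spelling count as different words.
    """
    wanted = {w.lower() for w in copy_words}
    return any(token.lower() in wanted for token in author.split())


# Hypothetical copy-word list containing the accented spelling.
copy_words = ("Institute", "Society", "Académie")

print(uses_copy_method("Académie française", copy_words))     # True
print(uses_copy_method("Academie francaise", copy_words))     # False: 'e' is not 'é'
print(uses_copy_method("Académie française", ("Academie",)))  # False
```

Under this assumption, the only way to match an accented author name is for the list entry to contain exactly the same accented characters.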
05-08-2014, 10:23 PM   #2
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Works for me. Is your system not utf-8? In any case, to be absolutely safe, just use the escaped form for non-ASCII characters. So Académie would become

u'Acad\xe9mie'

or, if you want to use unicode code points (which are easier to look up),

u'Acad\u00e9mie'
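The two escaped spellings above are just different notations for the same character, U+00E9 (é), so either one produces a string identical to the literal word. A quick check in Python 3, where every str is Unicode and the u prefix is accepted but optional; the variable names are only for illustration:

```python
import unicodedata

# Three notations for the same word: a literal é, a \x escape and a \u escape.
literal = "Académie"
hex_escape = u"Acad\xe9mie"
unicode_escape = u"Acad\u00e9mie"

assert literal == hex_escape == unicode_escape

# Look up the code point and official Unicode name of the accented letter.
ch = literal[4]
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+00E9 LATIN SMALL LETTER E WITH ACUTE
```

`unicodedata.name` is a convenient way to confirm which code point to type into the escape.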
05-08-2014, 10:30 PM   #3
kovidgoyal
creator of calibre
And note that "Acad\xc3\xa9mie" is the utf-8 encoded form.
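This can be checked directly: encoding the word as UTF-8 turns é into the two bytes 0xC3 0xA9, which is exactly the mangled text seen in the tweak. If those bytes are later read back one byte per character, the result no longer equals the original word. Latin-1 is used below purely as an illustration of such a misreading; it is an assumption, not a diagnosis of what happened on the original poster's system.

```python
word = "Académie"

# UTF-8 stores é (U+00E9) as two bytes: 0xC3 0xA9.
encoded = word.encode("utf-8")
print(encoded)  # b'Acad\xc3\xa9mie'

# Decoding with the right codec restores the original word...
assert encoded.decode("utf-8") == word

# ...but treating each byte as a separate character gives mojibake.
misread = encoded.decode("latin-1")
print(misread)  # AcadÃ©mie
assert misread != word
```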
05-09-2014, 10:37 AM   #4
rpspringuel
Enthusiast
Quote:
Originally Posted by kovidgoyal
Works for me. Is your system not utf-8?
Before this problem, I would have said yes. This makes me rethink things. I'll have to look into it further.

So, changing to the unicode escaped character works for that word, but I've run into a problem with another word.

I tried adding "père" to the suffixes in the same manner (i.e. as u'p\u00e8re', which calibre shows as u'p\xe8re' after a restart). The suffix is recognized, but when the author sort value is calculated it gets transformed into 'pére' (note that the accent has changed from grave to acute).

Example:

Alexandre Dumas, père -> Dumas, Alexandre, pére

That's clearly not the correct behavior. Am I still doing something wrong or is this a bug?
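calibre writing u'p\xe8re' back in place of u'p\u00e8re' is only a change of notation: both escapes name the same code point. The output reported above is a different matter, since è (grave, U+00E8) and é (acute, U+00E9) are distinct characters, so "pére" is a genuinely different string from "père" rather than a display quirk. A short check in Python 3 (variable names are illustrative):

```python
import unicodedata

# The escape written in the tweak and the form calibre displays afterwards.
written = u"p\u00e8re"
displayed = u"p\xe8re"
assert written == displayed

# The author sort output reported in the post uses a different letter.
reported = u"p\u00e9re"  # "pére"
for ch in (u"\u00e8", u"\u00e9"):
    print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)}")
# è U+00E8 LATIN SMALL LETTER E WITH GRAVE
# é U+00E9 LATIN SMALL LETTER E WITH ACUTE

assert reported != written
```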
05-09-2014, 10:57 AM   #5
kovidgoyal
creator of calibre
I cannot replicate that either. Adding père to the list of author name suffixes and clicking the calculate author sort button in the edit metadata dialog gives

Alexandre Dumas père -> Dumas, Alexandre père

and

Alexandre Dumas, père -> Alexandre Dumas, père
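Both results are consistent with a suffix rule along these lines: peel recognized suffixes off the end of the name, leave the name alone if what remains already contains a comma, and otherwise move the last word to the front. This is a hypothetical sketch written to reproduce the two outputs above, not calibre's actual author sort code; `invert_author` and the suffix tuple are illustrative only, and the sketch does not try to model every behavior reported later in the thread.

```python
def invert_author(name: str, suffixes: tuple[str, ...]) -> str:
    """Hypothetical 'Last, First suffix' inversion with suffix handling."""
    tokens = name.split()
    if len(tokens) < 2:
        return name

    # Case-insensitive suffix set, also accepting a trailing period ("Jr.").
    wanted = {s.lower() for s in suffixes}
    wanted |= {s + "." for s in wanted}

    # Peel suffixes off the end of the name, keeping their original order.
    trailing = []
    while tokens and tokens[-1].lower() in wanted:
        trailing.insert(0, tokens.pop())
    if not tokens:
        return name

    # A comma in what remains is taken to mean "already sorted": copy as-is.
    if any("," in t for t in tokens):
        return name

    surname, given = tokens[-1], tokens[:-1]
    head = f"{surname}," if given else surname
    return " ".join([head, *given, *trailing])


suffixes = ("Jr", "Sr", "Junior", "père")
print(invert_author("Alexandre Dumas père", suffixes))   # Dumas, Alexandre père
print(invert_author("Alexandre Dumas, père", suffixes))  # Alexandre Dumas, père
```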
05-09-2014, 11:18 AM   #6
rpspringuel
Enthusiast
Other issues with the algorithm:

I have the author "Libreria Editrice Vaticana", which I'd like to trigger the copy mechanism rather than the inversion one. I tried adding "Editrice" to the copy word list, but that doesn't seem to work. I know the word made it into the list correctly, because an author value of "Libreria Editrice" is handled as expected, but the three-word name is not. I see the same behavior when adding "Libreria" or "Vaticana" to the list: a two-word name containing the listed term works, but a three-word name does not.

Sometimes a suffix preceded by a comma is not recognized as a suffix, despite being in the list. This seems to happen only with long suffixes. For example, "John Smith, Jr" becomes "Smith, John, Jr", but "John Smith, Junior" stays the same ("Junior" is treated as a first name after a comma rather than as a suffix). By contrast, "John Smith Junior" becomes "Smith, John Junior" ("Junior" is correctly treated as a suffix). Is there a way to change this behavior?

Finally, is there a way to keep certain words out of the automatically generated author sort value? For instance, I have some books edited by Eric Flint and some written by him. I distinguish these in the Author field with something like "edited by Eric Flint" or "Eric Flint editor" (I'd like a comma there, but that runs into the suffix problem above). However, I'd like the author sort value to be simply "Flint, Eric", so that all his works, whether edited or authored, sort together. Right now I have to do this manually; is there a way to do it automatically?
05-09-2014, 11:29 AM   #7
rpspringuel
Enthusiast
It appears that using parentheses works for the last one. That is, "Eric Flint (editor)" becomes "Flint, Eric", as I want.
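One rule that would produce this result is to drop bracketed text before computing the sort value. Below is a minimal sketch of that idea, assuming only round and square brackets matter. The helpers `strip_bracketed` and `simple_sort` are hypothetical, and the plain "move the last word to the front" inversion stands in for whatever calibre actually does.

```python
import re

# Hypothetical: match a "(...)" or "[...]" group plus surrounding whitespace.
BRACKETED = re.compile(r"\s*(?:\([^)]*\)|\[[^\]]*\])\s*")


def strip_bracketed(name: str) -> str:
    """Remove bracketed annotations such as '(editor)' from a name."""
    return BRACKETED.sub(" ", name).strip()


def simple_sort(name: str) -> str:
    """Hypothetical inversion: 'First Middle Last' -> 'Last, First Middle'."""
    tokens = strip_bracketed(name).split()
    if len(tokens) < 2:
        return " ".join(tokens)
    return f"{tokens[-1]}, {' '.join(tokens[:-1])}"


print(simple_sort("Eric Flint (editor)"))  # Flint, Eric
print(simple_sort("Eric Flint"))           # Flint, Eric
```

Because the annotation is removed before inversion, edited and authored works end up with the same sort value.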
05-09-2014, 11:38 AM   #8
kovidgoyal
creator of calibre
You're making this way too complex. If you want to apply special values for author sort for certain author names, simply click the author name in the tag browser and use manage authors to manually specify the author sort value. Then calibre will always use that value when it encounters that author in the future.
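The effect of a stored author sort value can be pictured as a lookup that takes precedence over the automatic rule. A minimal sketch, assuming a plain dictionary keyed by the exact author name; in calibre the value lives in the library database and is edited through Manage authors, and `author_sort_for`, `overrides` and `last_first` are hypothetical names.

```python
from typing import Callable


def author_sort_for(
    author: str,
    overrides: dict[str, str],
    automatic: Callable[[str], str],
) -> str:
    """Use a manually stored sort value if one exists, else compute one."""
    if author in overrides:
        return overrides[author]
    return automatic(author)


def last_first(name: str) -> str:
    # Hypothetical fallback rule: move the last word to the front.
    tokens = name.split()
    if len(tokens) < 2:
        return name
    return f"{tokens[-1]}, {' '.join(tokens[:-1])}"


overrides = {"edited by Eric Flint": "Flint, Eric"}
print(author_sort_for("edited by Eric Flint", overrides, last_first))  # Flint, Eric
print(author_sort_for("edited by Eric Flint", {}, last_first))         # Flint, edited by Eric
```

Without the stored value, the fallback rule keeps "edited by" in the sort value, which is why a one-time manual entry is the simpler fix for such names.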
05-09-2014, 04:11 PM   #9
rpspringuel
Enthusiast
I realize that I can use a manual override for these things. It's what I have been doing. I'm just trying to make sure that I'm leveraging all of calibre's capabilities and using said manual override as little as possible.

After some further playing, I've been able to resolve the issue with the "Libreria Editrice Vaticana". I'm not sure what I've done differently, but it is working as expected now.

Similar Threads
Thread Thread Starter Forum Replies Last Post
Author Sort Name Algorithm and multiple libraries texasnightowl Calibre 11 06-14-2012 10:34 PM
\b matches accented characters ElMiko Sigil 11 06-14-2012 12:50 PM
Sorting with accented characters chaley Calibre 20 12-11-2010 07:14 AM
Accented characters on PRS-505 gandalfbp Calibre 4 04-19-2010 07:48 AM
Accented characters bingle Sony Reader 7 07-25-2007 06:36 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.