View Poll Results: Do you want sorting as described in the first post?
Yes 5 23.81%
No 6 28.57%
Don't care 10 47.62%
Voters: 21. You may not vote on this poll

Old 10-03-2010, 09:07 AM   #1
chaley
Grand Sorcerer
Posts: 11,849
Karma: 7035877
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Sorting with accented characters

Mixx has suggested in this post that calibre should sort accented characters as equivalent to their non-accented ASCII version.

What the suggestion means is that e, è, and é would be sorted as if they are exactly the same letter. There is no guarantee that one would come in front of the other. The same is true for c and ç, s and ß, A and Å, etc.

I have no idea what it means for non-Latin scripts such as Greek, Cyrillic or Chinese, but my guess is that they would sort using the letters that are used when creating file system names. It should (at least) be consistent.
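
For readers who want to see what "sorted as if they are exactly the same letter" could look like in practice, here is a minimal sketch. It is not calibre's code: the thread does not show the codepoint groupings calibre uses for file names, so this stand-in uses Python's standard `unicodedata` module to strip combining marks. The function names `fold_accents` and `accent_insensitive_key` are made up for illustration.

```python
import unicodedata


def fold_accents(text):
    """Drop combining marks, so 'è' and 'é' both become 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_key(text):
    """Sort key that ignores accents and case (an assumption of this sketch)."""
    return fold_accents(text).casefold()


names = ["Funke", "Fèvre", "Furst", "Fuentes"]

# Plain codepoint order puts 'è' after every ASCII letter.
print(sorted(names))  # ['Fuentes', 'Funke', 'Furst', 'Fèvre']

# With the folding key, 'Fèvre' sorts as if it were 'Fevre'.
print(sorted(names, key=accent_insensitive_key))  # ['Fèvre', 'Fuentes', 'Funke', 'Furst']
```

Because è and é produce the same key, their relative order is decided only by their input order, which matches the "no guarantee" caveat above. Letters without a Unicode decomposition, such as ø, are left untouched by this approach.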

Do you, the calibre users, want this?
Old 10-03-2010, 09:19 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
I'd like to elaborate beyond just voting. Personally, I don't care either way, as I most often use search to find the book(s) I want. But, as I see it, this would make sorting more consistent for international users (for example, I personally feel that "ö" should follow "o", or at least be sorted in the same area), and that, in my eyes, would be a Good Thing. One modifier applies, though: if it's worth the trouble for you developers. If this means you have to rip apart everything and rewrite most of it, my vote would change to no.
Another comment regarding sorting that goes slightly off-topic: personally, I'd much rather see an option to include non-English articles in the list that Calibre ignores when sorting, like the German "der, die, das". Ideally, this would be implemented as a tweak, pre-filled with the most common articles, say English, French, Spanish and German. For this, the same modifier as above applies.
Old 10-03-2010, 09:47 AM   #3
Man Eating Duck
Addict
Posts: 254
Karma: 69786
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
Quote:
Originally Posted by chaley View Post
[...] A and Å, etc. [...]
I'm all for new features, but please make this one optional, or at least with selectable collation algorithms as you can do in MySQL.

In Norwegian, collating ø with o and å with a is wildly incorrect; in fact 'a' and 'å' sit at the beginning and end of our alphabet, respectively.

For names you might also consider language-specific collation. For instance "van Eeden, Frederik Willem" should be collated under E.

I presume you would use some sort of library for this?

Last edited by Man Eating Duck; 10-03-2010 at 09:50 AM.
Old 10-03-2010, 09:50 AM   #4
chaley
Quote:
Originally Posted by Man Eating Duck View Post
I'm all for new features, but please make this one optional, or at least with selectable collation algorithms as you can do in MySQL.

In Norwegian, collating ø with o and å with a is wildly incorrect; in fact 'a' and 'å' sit at the beginning and end of our alphabet, respectively.

For names you might also consider language-specific collation. For instance "van Eeden, Frederik Willem" should be collated under E.
It won't be optional. It is, or it isn't. Otherwise the development isn't worth the trouble.
Quote:
I presume you would use a library for this, which one?
No library. It would use Unicode codepoint groupings, the same ones that are currently used for cleaning file names.

I have looked at using Unicode sorting libraries. The amount of work to do so is enormous, and I have no desire to do it.

Edit: for names, that is under your control. Enter the sort string using the 'manage authors' function available by right-clicking on any author in the left-hand tag pane.

Last edited by chaley; 10-03-2010 at 09:52 AM.
Old 10-03-2010, 10:00 AM   #5
Man Eating Duck
Quote:
Originally Posted by chaley View Post
It won't be optional. It is, or it isn't. Otherwise the development isn't worth the trouble.
Ok, in that case I'll vote against. This might seem negative, but at least for Scandinavian users it is just plain wrong
Old 10-03-2010, 10:03 AM   #6
chaley
Quote:
Originally Posted by Man Eating Duck View Post
This might seem negative, but at least for Scandinavian users it is just plain wrong
Not in the least negative. I am not at all sure about changing from one (well-understood) wrong thing to some other (not understood) wrong thing.
Old 10-03-2010, 03:01 PM   #7
chaley
Quote:
Originally Posted by Manichean View Post
Another comment regarding sorting that goes slightly off-topic: personally, I'd much rather see an option to include non-English articles in the list that Calibre ignores when sorting, like the German "der, die, das". Ideally, this would be implemented as a tweak, pre-filled with the most common articles, say English, French, Spanish and German. For this, the same modifier as above applies.
Tweak will be in the next release.

It is 'pre-filled' as today, with A, An, The. You can add what you want, or turn it off completely.
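
As a rough illustration of how such an article tweak might behave, here is a minimal sketch. The tweak's real name and format are not given in this thread, so `TITLE_SORT_ARTICLES` and `title_sort` are hypothetical names, and moving the article to the end of the title is an assumption of this sketch.

```python
import re

# Hypothetical tweak value. "a", "an", "the" mirror the defaults named in
# the thread; the German entries show what a user might add.
TITLE_SORT_ARTICLES = ["a", "an", "the", "der", "die", "das"]


def title_sort(title, articles=TITLE_SORT_ARTICLES):
    """Move a leading article to the end: 'The Hobbit' -> 'Hobbit, The'."""
    if not articles:  # an empty list turns the feature off completely
        return title
    alternatives = "|".join(re.escape(article) for article in articles)
    # The article must be followed by whitespace, so 'Another' is not split.
    match = re.match(r"^(%s)\s+(.+)$" % alternatives, title, flags=re.IGNORECASE)
    if match is None:
        return title
    return "%s, %s" % (match.group(2), match.group(1))


print(title_sort("The Hobbit"))               # Hobbit, The
print(title_sort("Das Boot"))                 # Boot, Das
print(title_sort("The Hobbit", articles=[]))  # The Hobbit
```

One limit of a single flat list: "die" is a German article but also an English word, so a title such as "Die Hard" would be reordered too.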
Old 10-03-2010, 03:54 PM   #8
Manichean
Quote:
Originally Posted by chaley View Post
Tweak will be in the next release.

It is 'pre-filled' as today, with A, An, The. You can add what you want, or turn it off completely.
Wow, great, thank you!
Old 10-03-2010, 11:16 PM   #9
Scott Nielsen
Groupie
Posts: 155
Karma: 112134
Join Date: May 2009
Location: Kuala Lumpur
Device: iPad, K3, K4, T1
Quote:
Originally Posted by Man Eating Duck View Post
Ok, in that case I'll vote against. This might seem negative, but at least for Scandinavian users it is just plain wrong
Agree - it's just plain wrong. Another vote against, here.
Old 12-03-2010, 07:38 AM   #10
Coleccionista
Connoisseur
Posts: 67
Karma: 40
Join Date: Aug 2010
Device: iPad, Kindle Paperwhite
The poll is closed but my vote should have been a resounding yes for the very minimum, common sense implementation:

Code:
a = á = à
e = é = è
i = í = ì
o = ò = ó
u = ú = ù
I cannot find any reason why the current sorting:

Quote:
Carlos Fuentes
Cornelia Funke
Alan Furst
Francis Fèvre
should be preferred to the much more sensible:

Quote:
Francis Fèvre
Carlos Fuentes
Cornelia Funke
Alan Furst
I understand that there may be some difficulties with other characters and languages, but for these vowels it looks like a sensible change.
Old 12-03-2010, 07:45 AM   #11
itimpi
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Note that you can already control the sorting of Authors (where I expect it is most important) by setting the Author-sort field to only use the non-accented versions of characters.
Old 12-03-2010, 08:23 AM   #12
chaley
Quote:
Originally Posted by Coleccionista View Post
The poll is closed but my vote should have been a resounding yes for the very minimum, common sense implementation:
...
I understand that there may be some difficulties with other characters and languages but for these vowels it looks a sensible change
This is a non-issue for me, but it seems that you could be the needed 'someone who cares'. I invite you to do the work and submit it. I am sure that Kovid would entertain such a patch.
Old 12-03-2010, 09:49 AM   #13
Coleccionista
Quote:
Originally Posted by chaley View Post
This is a non-issue for me, but it seems that you could be the needed 'someone who cares'. I invite you to do the work and submit it. I am sure that Kovid would entertain such a patch.
Given that I don't have a clue about calibre's code, or Python in general, I feel free to make some bold assumptions.

If the grid where calibre displays the data is actually the result of a database query (some view query), we should be able to skip any change to calibre itself and use the database's collation functions.

Would the following be applicable?

How to sort text in sqlite3 with specified locale?

So in the end we could use the calibre locale to build database queries like:

Code:
SELECT * FROM authors ORDER BY name COLLATE POLISH;
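
A note on feasibility, with a sketch: stock SQLite only ships the BINARY, NOCASE and RTRIM collations, so a name like POLISH would have to be registered by the application first. Python's `sqlite3` module allows this through `Connection.create_collation`. The example below is an assumption-laden illustration, not calibre code: the `authors` table is a throwaway in-memory table, and the `UNACCENTED` collation (which simply ignores accents) stands in for a real language-specific one.

```python
import sqlite3
import unicodedata


def fold(text):
    """Strip combining marks and case so accented letters compare as plain ones."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def accent_insensitive(a, b):
    """Comparison callback: negative, zero or positive, as SQLite expects."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
# "UNACCENTED" is a made-up collation name for this sketch.
conn.create_collation("UNACCENTED", accent_insensitive)
conn.execute("CREATE TABLE authors (name TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?)",
    [("Fuentes",), ("Funke",), ("Furst",), ("Fèvre",)],
)

default_order = [r[0] for r in conn.execute("SELECT name FROM authors ORDER BY name")]
folded_order = [
    r[0]
    for r in conn.execute("SELECT name FROM authors ORDER BY name COLLATE UNACCENTED")
]
print(default_order)  # ['Fuentes', 'Funke', 'Furst', 'Fèvre']
print(folded_order)   # ['Fèvre', 'Fuentes', 'Funke', 'Furst']
```

The collation lives on the connection, not in the database file, so every connection that runs such a query has to register it again.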
Old 12-03-2010, 10:06 AM   #14
theducks
Well trained by Cats
Posts: 30,086
Karma: 57259778
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Coleccionista View Post
So in the end we could use the calibre locale to build database queries like:

Code:
SELECT * FROM authors ORDER BY name COLLATE POLISH;
So we have a multilingual library. Which sort is the "correct" one to use?

Your example works beautifully as long as the data is monolingual.

Kovid and crew have no easy task making Calibre work across many regional differences.
Old 12-03-2010, 10:27 AM   #15
Coleccionista
The idea I tried to convey was that you have all locale queries created:

Example:
Code:
SELECT * FROM authors ORDER BY name COLLATE POLISH;
SELECT * FROM authors ORDER BY name COLLATE SPANISH;
SELECT * FROM authors ORDER BY name COLLATE FRENCH;
.............
When you install calibre you select a language, so from then on the queries launched against the database would use the collation for the language that you are using in the interface. It seems quite logical.

To summarize, is this SQLite/database-based approach possible?

Code:
calibre_lang = German // or English, or Korean, or whatever
db_query = SELECT foo, bar, baz ... whatever COLLATE $calibre_lang
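
The pseudocode above could be fleshed out roughly as follows. This is only a sketch under several assumptions: the `authors` table, the `COLLATIONS` registry and the function names are hypothetical, only a Norwegian alphabet is spelled out, and "English" falls back to plain case-insensitive codepoint order. A collation name cannot be passed as a bound SQL parameter, so the sketch checks it against a whitelist before building the query string.

```python
import sqlite3

# Hypothetical per-language alphabet: in Norwegian, æ, ø and å follow z.
NORWEGIAN = "abcdefghijklmnopqrstuvwxyzæøå"


def alphabet_key(alphabet):
    """Build a sort key that ranks characters by their position in alphabet."""
    rank = {ch: i for i, ch in enumerate(alphabet)}

    def key(text):
        # Characters outside the alphabet sort after it, by codepoint.
        return [(rank.get(ch, len(alphabet)), ch) for ch in text.casefold()]

    return key


def make_collation(key):
    """Turn a key function into the compare(a, b) callback SQLite expects."""
    def compare(a, b):
        ka, kb = key(a), key(b)
        return (ka > kb) - (ka < kb)

    return compare


COLLATIONS = {
    "NORWEGIAN": make_collation(alphabet_key(NORWEGIAN)),
    "ENGLISH": make_collation(str.casefold),
}


def sorted_authors(conn, calibre_lang):
    """Return author names ordered by the collation for calibre_lang."""
    name = calibre_lang.upper()
    if name not in COLLATIONS:  # whitelist: never splice arbitrary text into SQL
        raise ValueError("no collation registered for %r" % calibre_lang)
    conn.create_collation(name, COLLATIONS[name])
    query = "SELECT name FROM authors ORDER BY name COLLATE " + name
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (name TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?)",
    [("Ørn",), ("Åsen",), ("Zola",), ("Andersen",)],
)
print(sorted_authors(conn, "Norwegian"))  # ['Andersen', 'Zola', 'Ørn', 'Åsen']
print(sorted_authors(conn, "English"))    # ['Andersen', 'Zola', 'Åsen', 'Ørn']
```

This still applies one collation to the whole library, so the objection raised earlier about multilingual libraries stands: the interface language decides the order for every book, whatever language the book is in.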
All times are GMT -4.