Quote:
Originally Posted by BeccaPrice
Where do special characters come in? for example, I've got "-And he built a crooked house" that sorts first, because of the -.
it's not classic ASCII sorting, because lower case and upper case letters are treated the same.
also, my dates are European style (dd/mm/yyyy) - is there a tweak to put them American style (mm/dd/yyyy)?
Abandon hope all ye who enter here...
The problem is that people want the data to sort in the order that they know is correct, where "correct" depends on what the data represents. This is a perfectly reasonable point of view, but one that drives developers batty. For example, consider the three following strings that happen to be addresses:
28 Aardvark Lane
1 Zebra's Run
33 Mystreet
Most people (but not all) would want these sorted by house number within street name, as in
28 Aardvark Lane
33 Mystreet
1 Zebra's Run
To make things more interesting, the postal delivery person would want them sorted by the distance between them. Unfortunately, the computer does not know these are addresses and in general doesn't know an address from a hole in the wall, so the computer does neither. Instead it sorts them in the order indicated by the characters.
1 Zebra's Run
28 Aardvark Lane
33 Mystreet
which annoys almost everyone.
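Here is a rough sketch in Python of the difference, purely for illustration (this is not calibre code). The helper street_then_number is a made-up name, and it assumes every string looks like "<house number> <street name>", which real addresses often do not.

```python
addresses = ["28 Aardvark Lane", "1 Zebra's Run", "33 Mystreet"]

# What the computer does by default: compare character by character.
by_characters = sorted(addresses)

def street_then_number(address):
    # Hypothetical helper: assumes "<house number> <street name>".
    number, street = address.split(" ", 1)
    return (street.casefold(), int(number))

# What most people expect: street name first, then house number.
by_street = sorted(addresses, key=street_then_number)

print(by_characters)
print(by_street)
```

The second sort only works because the key function tells the computer what the parts of the string mean.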
The most common case arises when the user considers the items to be numbers but the computer thinks that the items are text (like dictionary entries). Consider the values
2
1
19
In dictionary order, you get
1
19
2
but in numeric order you get
1
2
19
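The same thing in a few lines of Python, as an illustration only. The values are stored as text, and the only difference between the two sorts is whether the computer is told to treat them as numbers.

```python
values = ["2", "1", "19"]

# Sorted as text: "19" comes before "2" because "1" < "2".
as_text = sorted(values)

# Sorted as numbers: each string is converted with int() for comparison.
as_numbers = sorted(values, key=int)

print(as_text)
print(as_numbers)
```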
When considering "the order indicated by the characters", don't forget the "national language rules" that specify the correct order of letters. For example, the letter "ä" sorts near the letter "a" in German but after the letter "z" in Swedish. In addition, we run into "Natural Order" (sometimes called "dictionary order"), which amongst other things combines lower- and upper-case together instead of putting "a" after "Z" (or vice versa).
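A small Python sketch of the upper/lower case part, for illustration only. It uses str.casefold as a stand-in for case-insensitive comparison and does not attempt the national language rules, which need a proper collation library. The titles are made up.

```python
titles = ["banana", "Zebra", "apple", "Apple"]

# Raw character order: every upper-case letter sorts before any lower-case one.
raw_order = sorted(titles)

# Case ignored: "apple" and "Apple" compare equal and keep their input order.
case_ignored = sorted(titles, key=str.casefold)

print(raw_order)
print(case_ignored)
```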
In the end, the person must tell the computer (somehow) what the data represents. The common cases are numbers, text, dates, and currencies. Calibre supports the first 3. It uses the International Components for Unicode (ICU) for text, which ignores case when comparing. In addition, calibre uses some special rules for what to do with "no value given". A less common case supported by calibre is Yes/No (Boolean) values.
Dates add their own complexity. They are a bit like addresses in that the components of a date have specific meanings. Unfortunately (again), it isn't usually obvious which component is which part of the date. Consider the date 10/12/2005. In the US, this is October 12, 2005. In Europe (or at least in the places I am familiar with) this is 10 December 2005. Clearly they must sort differently, but how does calibre know what to do? Calibre uses the current "locale" to guess at the order of fields by converting a date like "4 May 2006" to a slash-separated form, such as 4/5/2006. The underlying converter looks at the settings in the computer for dates and builds the "right" result, which tells calibre where the month, day, and year parts are in a normal date.
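To show how much the field order matters, here is a sketch using Python's standard datetime module. It is not how calibre does it internally; the two format strings are simply written out by hand instead of being guessed from the locale.

```python
from datetime import datetime

text = "10/12/2005"

# US reading: month/day/year.
us = datetime.strptime(text, "%m/%d/%Y")

# European reading: day/month/year.
eu = datetime.strptime(text, "%d/%m/%Y")

print(us.strftime("%d %B %Y"))
print(eu.strftime("%d %B %Y"))

# Same text, two different days, so the two readings sort differently.
print(us < eu)
```

The same string ends up almost two months apart depending on which reading is chosen.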
You might enjoy reading http://www.codinghorror.com/blog/200...ort-order.html, and especially the comments.
In general, special characters appear in sorted order where they appear in the older ASCII tables, but this is not always true. The order can be changed by the "code page" that the computer has been told to use or by the "standard order" for that language that in calibre is specified by the ICU package.
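A quick Python illustration of the plain ASCII case, using the title from the question plus two titles I made up. This shows raw character-code order only; ICU and code pages can move things around, as noted above.

```python
titles = [
    "And Then There Were None",        # made-up example
    "-And he built a crooked house",   # from the question
    "1984",                            # made-up example
]

# Character codes of the first character of each title.
codes = {title[0]: ord(title[0]) for title in titles}
print(codes)

# "-" has a lower code than any digit or letter, so it sorts first.
ascii_order = sorted(titles)
print(ascii_order)
```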
As for your last question: there are date format tweaks (Control how dates are displayed) that can be used to put the components of a date where you want them. Every custom date column has its own format.