MobileRead Forums > E-Book Software > Calibre

View Poll Results: Do you want sorting as described in the first post?
Yes 5 23.81%
No 6 28.57%
Don't care 10 47.62%
Voters: 21.

Old 12-03-2010, 10:29 AM   #16
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
It's not obvious, but sorting by author and title is almost completely independent of what's in the author and title fields. There are actually three sort fields: two author sort fields (one that sorts authors and one that sorts books by author) and one title sort field (for sorting books by title). They can be completely different from whatever is in the corresponding author/title fields.

By default, the sort fields are hidden. I suspect one could use Search and Replace to modify them by changing accented chars to whatever is desired to control sort order. (I'm not sure if S&R has access to the title sort field, but I imagine Charles would add it on request if it doesn't.)

Last edited by Starson17; 12-03-2010 at 10:33 AM.
Old 12-03-2010, 10:36 AM   #17
chaley
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Coleccionista View Post
Given that I don't have a clue of calibre code and python in general I feel free to make some bold assumptions.
And challenge some of ours.
Quote:
If the grid where calibre is displaying the data is actually the result of a database query (some view query), we should be able to skip any calibre change and use database functions for collation.
For various reasons, calibre sorts in memory. The biggest one relates to sort stability: sorting on column A and then later on column B leaves items ordered by A within each value of B. There are also some issues with null and special values to overcome.
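
To illustrate the stability point with plain Python (a minimal sketch using made-up book data, not calibre code): because list.sort() is stable, sorting by one column and then by another keeps the first ordering within equal values of the second.

```python
# Hypothetical records: (title, author). Not taken from calibre.
books = [
    ("Dune", "Herbert"),
    ("Emma", "Austen"),
    ("Children of Dune", "Herbert"),
    ("Persuasion", "Austen"),
]

# First sort on column A (title), then later on column B (author).
books.sort(key=lambda b: b[0])
books.sort(key=lambda b: b[1])

# Python's sort is stable, so titles stay ordered within each author.
for title, author in books:
    print(author, "-", title)
```

A database ORDER BY on a single column gives no such guarantee about the previous ordering, which is one way to read the argument for sorting in memory.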

The python code to handle sorting is under the spoiler. Most of the work to generate a value is done in the SortKeyGenerator class. Ignoring the python magic involving iterators and multiple fields, you will see how various types are handled differently. Series and dates are good examples.

The multisort function actually does the work. In particular, the call to sort() does the sorting (imagine that). It uses the sort key generator to build a sort key per record, a process that is done once when the record is first touched. The python sort routine compares these keys using a strict value-based collation.

So, to do collated sorts, the strings must be changed so that strict value-based collation generates the right result. Therefore the sort key generator would need to do character equivalence mapping on the strings when generating the key. That would happen inside the 'elif dt in ('text', ...)' block. The trick is to do the scanning and mapping in a way that is both sufficiently performant and sufficiently customizable. I could imagine using a translate table, but I haven't thought much about it.
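
The translate-table idea could look roughly like this in Python 3. This is only a sketch: the EQUIVALENTS mapping and the text_sort_key helper are hypothetical names for illustration, and the mapping would presumably be user-configurable in a real tweak.

```python
# Hypothetical mapping of accented characters to their sort equivalents.
EQUIVALENTS = {ord("é"): "e", ord("è"): "e", ord("ç"): "c", ord("å"): "a"}


def text_sort_key(val):
    """Build a sort key for a text field: None becomes '', then lowercase, then map."""
    if val is None:
        val = ""
    return val.lower().translate(EQUIVALENTS)


names = ["Zola", "Étienne", "Cézanne", "Eco", "Çelik"]
print(sorted(names))                     # strict code point order
print(sorted(names, key=text_sort_key))  # accented letters sort with their base letters
```

str.translate does the mapping in a single pass over the string, which is why the cost per key should stay small.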

Spoiler:
Code:
    .....
    # Sorting functions {{{

    def sanitize_sort_field_name(self, field):
        field = self.field_metadata.search_term_to_field_key(field.lower().strip())
        # translate some fields to their hidden equivalent
        if field == 'title': field = 'sort'
        elif field == 'authors': field = 'author_sort'
        return field

    def sort(self, field, ascending, subsort=False):
        self.multisort([(field, ascending)])

    def multisort(self, fields=[], subsort=False):
        fields = [(self.sanitize_sort_field_name(x), bool(y)) for x, y in fields]
        keys = self.field_metadata.sortable_field_keys()
        fields = [x for x in fields if x[0] in keys]
        if subsort and 'sort' not in [x[0] for x in fields]:
            fields += [('sort', True)]
        if not fields:
            fields = [('timestamp', False)]

        keyg = SortKeyGenerator(fields, self.field_metadata, self._data)
        if len(fields) == 1:
            self._map.sort(key=keyg, reverse=not fields[0][1])
        else:
            self._map.sort(key=keyg)

        tmap = list(itertools.repeat(False, len(self._data)))
        for x in self._map_filtered:
            tmap[x] = True
        self._map_filtered = [x for x in self._map if tmap[x]]


class SortKey(object):

    def __init__(self, orders, values):
        self.orders, self.values = orders, values

    def __cmp__(self, other):
        for i, ascending in enumerate(self.orders):
            ans = cmp(self.values[i], other.values[i])
            if ans != 0:
                return ans * ascending
        return 0

class SortKeyGenerator(object):

    def __init__(self, fields, field_metadata, data):
        self.field_metadata = field_metadata
        self.orders = [-1 if x[1] else 1 for x in fields]
        self.entries = [(x[0], field_metadata[x[0]]) for x in fields]
        self.library_order = tweaks['title_series_sorting'] == 'library_order'
        self.data = data

    def __call__(self, record):
        values = tuple(self.itervals(self.data[record]))
        if len(values) == 1:
            return values[0]
        return SortKey(self.orders, values)

    def itervals(self, record):
        for name, fm in self.entries:
            dt = fm['datatype']
            val = record[fm['rec_index']]

            if dt == 'datetime':
                if val is None:
                    val = UNDEFINED_DATE

            elif dt == 'series':
                if val is None:
                    val = ('', 1)
                else:
                    val = val.lower()
                    if self.library_order:
                        val = title_sort(val)
                    sidx_fm = self.field_metadata[name + '_index']
                    sidx = record[sidx_fm['rec_index']]
                    val = (val, sidx)

            elif dt in ('text', 'comments', 'composite', 'enumeration'):
                if val is None:
                    val = ''
                val = val.lower()

            elif dt == 'bool':
                val = {True: 1, False: 2, None: 3}.get(val, 3)

            yield val
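
Note that the SortKey class in the listing relies on __cmp__ and cmp(), which exist only in Python 2. A rough Python 3 equivalent is sketched below. It is not calibre's code, and it assumes an order of 1 for ascending and -1 for descending, which is a convention chosen for this sketch.

```python
import functools


@functools.total_ordering
class SortKey:
    """Compare tuples of values field by field, each with its own direction."""

    def __init__(self, orders, values):
        # orders: 1 for ascending, -1 for descending (assumption for this sketch)
        self.orders, self.values = orders, values

    def _cmp(self, other):
        for order, mine, theirs in zip(self.orders, self.values, other.values):
            ans = (mine > theirs) - (mine < theirs)
            if ans != 0:
                return ans * order
        return 0

    def __eq__(self, other):
        return self._cmp(other) == 0

    def __lt__(self, other):
        return self._cmp(other) < 0


# Hypothetical rows: (author_sort, rating); author ascending, rating descending.
rows = [("austen", 3), ("herbert", 5), ("austen", 5)]
orders = (1, -1)
rows.sort(key=lambda r: SortKey(orders, r))
print(rows)
```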
Old 12-03-2010, 11:11 AM   #18
kovidgoyal
creator of calibre
Posts: 43,859
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This is fairly simple to do. What you have to implement is a python C extension that defines a cmp function that compares two unicode objects. Given that, integrating it into calibre would be trivial.

In pseudo-code the thing would look like:

Code:
function set_collation(lang_code) {
    ...
}

function cmp(a, b) {
    // "a - b" is shorthand: return -1 if a < b, +1 if a > b and 0 if a == b
    return a - b;
}
It should be possible to link the extension against ICU and use that to implement cmp, though that would require that ICU be compilable on win/linux/osx.
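
For readers who want to experiment with the cmp idea without writing a C extension, here is a pure-Python sketch. It does not use ICU: the _base and collated_cmp helpers are hypothetical stand-ins that compare accent-stripped, lowercased text first and fall back to the raw strings on ties. functools.cmp_to_key is the standard way to plug a cmp-style function into Python 3 sorting.

```python
import functools
import unicodedata


def _base(s):
    """Lowercase and drop combining marks (a crude stand-in for ICU collation)."""
    decomposed = unicodedata.normalize("NFD", s.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collated_cmp(a, b):
    """Return -1 if a < b, +1 if a > b, 0 if equal, comparing base letters first."""
    ka, kb = (_base(a), a), (_base(b), b)
    return (ka > kb) - (ka < kb)


words = ["zebra", "éclair", "apple", "Eagle"]
print(sorted(words, key=functools.cmp_to_key(collated_cmp)))
```

A real implementation would delegate the comparison to a collation library and would need a set_collation step to pick the language rules; this sketch hard-codes one simple rule.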
Old 12-03-2010, 11:16 AM   #19
chaley
Grand Sorcerer
@Kovid: I am working on a tweak (almost done) that will permit user-specified character translation. The tweak would be used in sort key gen.

My test translating 'é' to 'e' and 'ç' to 'c' works fine, and there is almost zero performance cost.

I will submit the tweak shortly.
Old 12-06-2010, 01:16 AM   #20
chaley
Grand Sorcerer
@Everyone: Kovid integrated the relevant parts of the IBM ICU (International Components for Unicode) into calibre, after which I modified sorting and case conversion functions to use them. Sorting now works properly.

By default, calibre uses the locale specified by its language. A tweak has been provided to override that locale with a different one. See under the spoiler for the tweak documentation.

I tested the example provided by Man Eating Duck, sorting names beginning with a and å. With the locale set to 'en', the letters are equivalent. With the locale set to 'nb' (Norwegian), å sorts after z.
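
The difference between the two locales can be imitated with a toy example. This sketch does not call ICU or calibre: NB_ALPHABET, strip_accents and toy_sort_key are hypothetical helpers with hand-written rules that only mimic the behaviour described above for 'en' and 'nb'.

```python
import unicodedata

# Hand-written toy rules, NOT ICU data: Norwegian places æ, ø, å after z.
NB_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
NB_RANK = {ch: i for i, ch in enumerate(NB_ALPHABET)}


def strip_accents(s):
    """Drop combining marks so that, for example, å compares like a."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_sort_key(name, locale):
    s = name.lower()
    if locale == "nb":
        # Characters outside the toy alphabet sort after it.
        return [NB_RANK.get(ch, len(NB_ALPHABET)) for ch in s]
    # 'en'-like default: accented letters are treated as their base letter.
    return list(strip_accents(s))


names = ["Åse", "Zoe", "Anna"]
print(sorted(names, key=lambda n: toy_sort_key(n, "en")))
print(sorted(names, key=lambda n: toy_sort_key(n, "nb")))
```

With the 'en'-like rules Åse lands between Anna and Zoe; with the 'nb'-like rules it moves to the end of the list.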

Tweak documentation:
Spoiler:
Code:
# Language to use when sorting. Setting this tweak will force sorting to use the
# collating order for the specified language. This might be useful if you run
# calibre in English but want sorting to work in the language where you live.
# Set the tweak to the desired ISO 639-1 language code, in lower case.
# You can find the list of supported locales at
# http://publib.boulder.ibm.com/infocenter/iseries/v5r3/topic/nls/rbagsicusortsequencetables.htm
# Default: locale_for_sorting = '' -- use the language calibre displays in
# Example: locale_for_sorting = 'fr' -- sort using French rules.
# Example: locale_for_sorting = 'nb' -- sort using Norwegian rules.
locale_for_sorting =  ''
Old 12-11-2010, 07:14 AM   #21
Coleccionista
Connoisseur
Posts: 67
Karma: 40
Join Date: Aug 2010
Device: iPad, Kindle Paperwhite
Although I'll have to wait until Monday to upgrade and check the new feature on my books, I couldn't wait to say that I am thankful. It's amazing how fast the calibre devs keep adding features and fixing things.
Tags
accent, sorting


All times are GMT -4.