MobileRead Forums - View Single Post

chaley · 12-03-2010, 10:36 AM

Quote:

Originally Posted by Coleccionista

Given that I don't have a clue of calibre code and python in general I feel free to make some bold assumptions.

And challenge some of ours.

Quote:

If the grid where calibre is displaying the data is actually the result of a database query (some view query), we should be able to skip any calibre change and use database functions for collation.

For various reasons, calibre sorts in memory. The biggest one relates to sort stability: sorting on column A and then later on column B leaves the items ordered by A within each group of equal B values. There are also some issues with null and special values to overcome.
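
To make the stability point concrete, here is a minimal sketch. The records and column positions are made up for illustration and are not calibre's real data layout; the only thing it relies on is that Python's built-in sort is stable.

```python
from operator import itemgetter

# Hypothetical records: (title, author). Not calibre's real data layout.
books = [
    ("Emma", "Austen"),
    ("Dubliners", "Joyce"),
    ("Persuasion", "Austen"),
    ("Ulysses", "Joyce"),
]

# First sort on column A (title) ...
by_title = sorted(books, key=itemgetter(0))

# ... then later on column B (author). Python's sort is stable, so records
# with the same author keep the title order left by the first pass.
by_author_then_title = sorted(by_title, key=itemgetter(1))

for title, author in by_author_then_title:
    print(author, "-", title)
```

The second sort never looks at the title, yet the titles stay ordered within each author. That is the behaviour a user gets by clicking one column header and then another.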

The python code to handle sorting is under the spoiler. Most of the work to generate a value is done in the SortKeyGenerator class. Ignoring the python magic involving iterators and multiple fields, you will see how various types are handled differently. Series and dates are good examples.
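
As a rough standalone illustration of the series and date handling, the sketch below replaces missing values with a stand-in of the same type so that every key can be compared. The helper names date_key and series_key are hypothetical, and the value used for UNDEFINED_DATE here is an assumption; calibre defines its own.

```python
from datetime import datetime

# Stand-in for calibre's UNDEFINED_DATE. The real value is not shown in this
# post, so this particular date is an assumption.
UNDEFINED_DATE = datetime(101, 1, 1)

def date_key(val):
    """Missing dates get a fixed placeholder so all keys are datetimes."""
    return UNDEFINED_DATE if val is None else val

def series_key(name, index):
    """A series sorts by its lowercased name, then by its index."""
    if name is None:
        return ("", 1)
    return (name.lower(), index)

dates = [datetime(2010, 12, 3), None, datetime(2009, 1, 1)]
dates_sorted = sorted(dates, key=date_key)

series = [("Dune", 2), (None, None), ("dune", 1), ("Culture", 3)]
series_sorted = sorted(series, key=lambda s: series_key(*s))
```

Books with no date or no series end up grouped together at one end instead of breaking the comparison.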

The multisort function actually does the work. In particular, the call to sort() does the sorting (imagine that). It uses the sort key generator to build a sort key per record, a process that is done once when the record is first touched. The python sort routine compares these keys using a strict value-based collation.
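
A stripped-down sketch of that mechanism, with a hypothetical data dictionary and key function standing in for calibre's cache and SortKeyGenerator: the list being sorted holds record ids, and sort() asks the key function for one key per id before it compares anything.

```python
# Hypothetical in-memory rows keyed by record id (not calibre's real layout).
data = {
    0: {"title": "Zorba"},
    1: {"title": "Anabasis"},
    2: {"title": "middlemarch"},
}

calls = []  # records every key computation, just to show how often it runs

def sort_key(record_id):
    """Build the key for one record."""
    calls.append(record_id)
    return data[record_id]["title"].lower()

id_map = [0, 1, 2]          # plays the role of self._map
id_map.sort(key=sort_key)   # one key per record, then plain value comparison

print(id_map, len(calls))
```

Because the key is computed once per record rather than once per comparison, any extra work done while building the key is paid only N times.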

So, to do collated sorts, the strings must be changed so that strict value-based collation generates the right result. The sort key generator would therefore need to do character equivalence mapping on the strings when it generates the key. That would happen inside the "elif dt in ('text', ...)" block of itervals. The trick is to do the scanning and mapping in a way that is fast enough yet still customizable. I could imagine using a translate table, but I haven't thought much about it.
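
For what it's worth, a translate-table version might look something like the sketch below. This is only an idea, not anything calibre does: EQUIVALENTS and collation_key are hypothetical names, the table is a tiny made-up sample, and the syntax is Python 3 (the spoiler code is Python 2, where unicode.translate takes a dict keyed by code point instead).

```python
import unicodedata

# Hypothetical equivalence table: each character maps to the text it should
# sort as. A real table would be user-configurable and much larger.
EQUIVALENTS = str.maketrans({
    "á": "a", "à": "a", "ä": "a",
    "é": "e", "è": "e",
    "ñ": "n",
    "ö": "o",
    "ß": "ss",
})

def collation_key(text):
    """Return a string whose plain value-based ordering follows the table."""
    if text is None:
        text = ""
    # Compose first so an accented letter is a single code point in the table.
    text = unicodedata.normalize("NFC", text.lower())
    return text.translate(EQUIVALENTS)

titles = ["Zebra", "Ángel", "apple", "Éclair", "banana"]
ordered = sorted(titles, key=collation_key)
print(ordered)
```

The mapping is a single pass over each string done when the key is built, so it costs nothing extra per comparison. Customizing it means swapping in a different table.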

Spoiler:

Code:

    .....
    # Sorting functions {{{

    def sanitize_sort_field_name(self, field):
        field = self.field_metadata.search_term_to_field_key(field.lower().strip())
        # translate some fields to their hidden equivalent
        if field == 'title': field = 'sort'
        elif field == 'authors': field = 'author_sort'
        return field

    def sort(self, field, ascending, subsort=False):
        self.multisort([(field, ascending)])

    def multisort(self, fields=[], subsort=False):
        fields = [(self.sanitize_sort_field_name(x), bool(y)) for x, y in fields]
        keys = self.field_metadata.sortable_field_keys()
        fields = [x for x in fields if x[0] in keys]
        if subsort and 'sort' not in [x[0] for x in fields]:
            fields += [('sort', True)]
        if not fields:
            fields = [('timestamp', False)]

        keyg = SortKeyGenerator(fields, self.field_metadata, self._data)
        if len(fields) == 1:
            self._map.sort(key=keyg, reverse=not fields[0][1])
        else:
            self._map.sort(key=keyg)

        tmap = list(itertools.repeat(False, len(self._data)))
        for x in self._map_filtered:
            tmap[x] = True
        self._map_filtered = [x for x in self._map if tmap[x]]


class SortKey(object):

    def __init__(self, orders, values):
        self.orders, self.values = orders, values

    def __cmp__(self, other):
        for i, ascending in enumerate(self.orders):
            ans = cmp(self.values[i], other.values[i])
            if ans != 0:
                return ans * ascending
        return 0

class SortKeyGenerator(object):

    def __init__(self, fields, field_metadata, data):
        self.field_metadata = field_metadata
        # 1 for an ascending field, -1 for descending; SortKey.__cmp__ multiplies by this
        self.orders = [1 if x[1] else -1 for x in fields]
        self.entries = [(x[0], field_metadata[x[0]]) for x in fields]
        self.library_order = tweaks['title_series_sorting'] == 'library_order'
        self.data = data

    def __call__(self, record):
        values = tuple(self.itervals(self.data[record]))
        if len(values) == 1:
            return values[0]
        return SortKey(self.orders, values)

    def itervals(self, record):
        for name, fm in self.entries:
            dt = fm['datatype']
            val = record[fm['rec_index']]

            if dt == 'datetime':
                if val is None:
                    val = UNDEFINED_DATE

            elif dt == 'series':
                if val is None:
                    val = ('', 1)
                else:
                    val = val.lower()
                    if self.library_order:
                        val = title_sort(val)
                    sidx_fm = self.field_metadata[name + '_index']
                    sidx = record[sidx_fm['rec_index']]
                    val = (val, sidx)

            elif dt in ('text', 'comments', 'composite', 'enumeration'):
                if val is None:
                    val = ''
                val = val.lower()

            elif dt == 'bool':
                val = {True: 1, False: 2, None: 3}.get(val, 3)

            yield val
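
The SortKey class relies on cmp and __cmp__, which exist only in Python 2. For anyone who wants to experiment with the multi-field comparison outside calibre, here is a rough Python 3 sketch of the same idea using functools.total_ordering. It is not calibre code, and the convention of 1 for ascending and -1 for descending is this sketch's own assumption.

```python
from functools import total_ordering

@total_ordering
class SortKey:
    """Compare tuples of values field by field, each with its own direction."""

    def __init__(self, orders, values):
        # orders: 1 for an ascending field, -1 for a descending one
        self.orders, self.values = orders, values

    def _cmp(self, other):
        for order, a, b in zip(self.orders, self.values, other.values):
            if a != b:
                # The first differing field decides; flip the sign if descending.
                return order if a > b else -order
        return 0

    def __eq__(self, other):
        return self._cmp(other) == 0

    def __lt__(self, other):
        return self._cmp(other) < 0

# Hypothetical rows: (author_sort, rating)
rows = [("austen", 3), ("joyce", 5), ("austen", 5)]
orders = (1, -1)  # author ascending, rating descending
ordered = sorted(rows, key=lambda r: SortKey(orders, r))
print(ordered)
```

A wrapper like this is only needed when the fields sort in different directions. With a single field, the raw value plus reverse= does the job, which is what multisort does.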