Quote:
Originally Posted by Coleccionista
Given that I don't have a clue of calibre code and python in general I feel free to make some bold assumptions.
|
And challenge some of ours.
Quote:
If the grid where calibre is displaying the data is actually the result of a database query (some view query), we should be able to skip any calibre change and use database functions for collation.
|
For various reasons, calibre sorts in memory. The biggest one relates to sort stability: sorting on column A and then later on column B leaves the items ordered by A within each group of equal B values. There are also some issues with null and special values to overcome.
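To make the stability point concrete, here is a minimal sketch in modern Python. The book list and field names are made up for illustration and are not calibre's data structures.

```python
# Hypothetical records, not calibre's real cache layout.
books = [
    {"title": "Emma", "author": "Austen"},
    {"title": "Dune", "author": "Herbert"},
    {"title": "Persuasion", "author": "Austen"},
    {"title": "Children of Dune", "author": "Herbert"},
]

# Sort on column A (title) first, then later on column B (author).
# list.sort() is stable, so records with equal authors keep the
# title order produced by the first sort.
books.sort(key=lambda b: b["title"].lower())
books.sort(key=lambda b: b["author"].lower())

titles = [b["title"] for b in books]
print(titles)
```

The second sort never looks at the title, yet the titles stay ordered within each author. A fresh database query per sort would not give you that for free.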
The python code to handle sorting is under the spoiler. Most of the work to generate a value is done in the SortKeyGenerator class. Ignoring the python magic involving iterators and multiple fields, you will see how various types are handled differently. Series and dates are good examples.
The multisort function actually does the work. In particular, the call to sort() does the sorting (imagine that). It uses the sort key generator to build a sort key per record, a process that is done once when the record is first touched. The python sort routine compares these keys using a strict value-based collation.
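As a rough illustration of the key-per-record idea and of what strict value-based comparison does to accented text, here is a small sketch in modern Python. The names titles_by_id, row_map and sort_key are hypothetical stand-ins, not calibre's.

```python
# Hypothetical data: record id -> title.
titles_by_id = {0: "zebra", 1: "Apple", 2: "mango", 3: "éclair"}
row_map = [0, 1, 2, 3]
key_calls = []

def sort_key(record_id):
    # Record each call so we can see how often the key is built.
    key_calls.append(record_id)
    return titles_by_id[record_id].lower()

# sort() builds one key per record, then compares only the keys.
row_map.sort(key=sort_key)
sorted_titles = [titles_by_id[i] for i in row_map]

print(sorted_titles)
print(len(key_calls))
```

The key function runs once per record, and "éclair" lands after "zebra" because the comparison is by code point value. That is the behaviour a collated sort would have to change.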
So, to do collated sorts, the strings must be changed so that strict value-based comparison generates the right result. Therefore the sort key generator would need to do character equivalence mapping on the strings when generating the key. That would happen inside the "elif dt in ('text', ...)" block. The trick is to do the scanning and mapping in a way that is sufficiently performant, yet sufficiently customizable. I could imagine using a translate table, but I haven't thought much about it.
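A minimal sketch of the translate-table idea, in modern Python, might look like the following. This is only an assumption about how it could be done, not calibre code: EQUIVALENTS, EQUIV_TABLE and collation_key are hypothetical names, and the mapping is a tiny sample that a user would extend.

```python
# Hypothetical, user-editable equivalence mapping (sample entries only).
EQUIVALENTS = {
    "é": "e", "è": "e", "ê": "e",
    "ñ": "n",
    "ß": "ss",
}

# Build the table once; str.translate then does the scan in C.
EQUIV_TABLE = str.maketrans(EQUIVALENTS)

def collation_key(val):
    # Mirrors the text branch of the key generator: None becomes '',
    # the value is lowercased, then equivalent characters are mapped.
    if val is None:
        val = ""
    return val.lower().translate(EQUIV_TABLE)

names = ["zebra", "Éclair", "apple", "eclipse"]
collated = sorted(names, key=collation_key)
print(collated)
```

Because the mapping is applied when the key is built, the cost is paid once per record and the sort itself still does plain value comparisons. Customizing the collation is then just a matter of editing the dictionary.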
Spoiler:
Code:
.....
    # Sorting functions {{{

    def sanitize_sort_field_name(self, field):
        field = self.field_metadata.search_term_to_field_key(field.lower().strip())
        # translate some fields to their hidden equivalent
        if field == 'title': field = 'sort'
        elif field == 'authors': field = 'author_sort'
        return field

    def sort(self, field, ascending, subsort=False):
        self.multisort([(field, ascending)])

    def multisort(self, fields=[], subsort=False):
        fields = [(self.sanitize_sort_field_name(x), bool(y)) for x, y in fields]
        keys = self.field_metadata.sortable_field_keys()
        fields = [x for x in fields if x[0] in keys]
        if subsort and 'sort' not in [x[0] for x in fields]:
            fields += [('sort', True)]
        if not fields:
            fields = [('timestamp', False)]
        keyg = SortKeyGenerator(fields, self.field_metadata, self._data)
        if len(fields) == 1:
            self._map.sort(key=keyg, reverse=not fields[0][1])
        else:
            self._map.sort(key=keyg)
        tmap = list(itertools.repeat(False, len(self._data)))
        for x in self._map_filtered:
            tmap[x] = True
        self._map_filtered = [x for x in self._map if tmap[x]]

class SortKey(object):

    def __init__(self, orders, values):
        self.orders, self.values = orders, values

    def __cmp__(self, other):
        for i, ascending in enumerate(self.orders):
            ans = cmp(self.values[i], other.values[i])
            if ans != 0:
                return ans * ascending
        return 0

class SortKeyGenerator(object):

    def __init__(self, fields, field_metadata, data):
        self.field_metadata = field_metadata
        self.orders = [-1 if x[1] else 1 for x in fields]
        self.entries = [(x[0], field_metadata[x[0]]) for x in fields]
        self.library_order = tweaks['title_series_sorting'] == 'library_order'
        self.data = data

    def __call__(self, record):
        values = tuple(self.itervals(self.data[record]))
        if len(values) == 1:
            return values[0]
        return SortKey(self.orders, values)

    def itervals(self, record):
        for name, fm in self.entries:
            dt = fm['datatype']
            val = record[fm['rec_index']]
            if dt == 'datetime':
                if val is None:
                    val = UNDEFINED_DATE
            elif dt == 'series':
                if val is None:
                    val = ('', 1)
                else:
                    val = val.lower()
                    if self.library_order:
                        val = title_sort(val)
                    sidx_fm = self.field_metadata[name + '_index']
                    sidx = record[sidx_fm['rec_index']]
                    val = (val, sidx)
            elif dt in ('text', 'comments', 'composite', 'enumeration'):
                if val is None:
                    val = ''
                val = val.lower()
            elif dt == 'bool':
                val = {True: 1, False: 2, None: 3}.get(val, 3)
            yield val