Old 12-09-2010, 02:24 AM   #1
rkworthy
Junior Member
 
Posts: 2
Karma: 10
Join Date: Dec 2010
Device: Blackberry, Nook
Development Question: author_sort in books table?

I've just started working on calibre and I'm still feeling my way around the source, but I had a question.

Why does db.author_sort() look up the author sort in the books table rather than in the authors table via the authors link table? As far as I can tell, the only places db.author_sort() is used are the initialization of the metadata_single dialog and db.get_metadata(), which also uses authors_with_sort_strings().

I'm probably missing something, but it looks like a one-way street: if authors->author_sort is updated using set_sort_field_for_author, then books is also updated with the new author_sort, but if books->author_sort is updated with set_author_sort, then only books is updated.

I'm specifically working on Ticket #847 and noticed that when I built author_sort using db.get_metadata().author_sort_map, it came out differently from db.get_metadata().author_sort. After looking around I saw that this is because they draw from two different tables, and in my particular case those tables had different data. For example, my books table lists the author_sort of "Oath of Swords" as Weber, David J., but looking up that book through book_authors_link and then authors shows Weber, David.
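The discrepancy described above can be reproduced in a small sketch. The schema below is a deliberately simplified stand-in, not calibre's real one: only the columns needed for the illustration are present, and the helper names `book_author_sort` and `linked_author_sorts` are hypothetical.

```python
import sqlite3

# Simplified, hypothetical schema (assumption: not calibre's actual layout).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_sort TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, sort TEXT);
    CREATE TABLE book_authors_link (book INTEGER, author INTEGER);
    INSERT INTO books VALUES (1, 'Oath of Swords', 'Weber, David J.');
    INSERT INTO authors VALUES (1, 'David Weber', 'Weber, David');
    INSERT INTO book_authors_link VALUES (1, 1);
""")


def book_author_sort(book_id):
    """Read the per-book value stored directly in the books table."""
    row = conn.execute(
        "SELECT author_sort FROM books WHERE id = ?", (book_id,)
    ).fetchone()
    return row[0]


def linked_author_sorts(book_id):
    """Read the per-author sort strings reached through the link table."""
    rows = conn.execute(
        "SELECT a.sort FROM authors a "
        "JOIN book_authors_link l ON l.author = a.id "
        "WHERE l.book = ? ORDER BY l.rowid",
        (book_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

The two functions read different columns, so nothing forces them to agree for the same book.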

I'm not really sure how they got out of sync, but it would make sense that, if the author_sort column is necessary in both tables, there would be a single function that changes both tables rather than separate ones that update one table or the other.

To summarize that wall of ramble:
  • Why is there author_sort in the books table when author is looked up through the link table?
  • Why are there functions for updating one table but not both?
  • Why does get_metadata() draw from both rather than just building author_sort from the author_sort_map?

I'm still learning to develop non-trivial programs and would appreciate your thoughts.

Also, is there a better place to post questions like this?
Old 12-09-2010, 03:56 AM   #2
itimpi
Wizard
 
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
I think that historically there was just the value on the books table, and the author_sort on the authors table is relatively recent. I believe that the reason both are kept is that at times one can want books to be sorted differently to the authors table - particularly in the case of multiple authors.
Old 12-09-2010, 04:38 AM   #3
chaley
Grand Sorcerer
 
Posts: 11,738
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by rkworthy View Post
Why does db.author_sort() look up the author sort in the books table rather than in the authors table via the authors link table?
Because author_sort can be (and is) different from the concatenation of authors' sort strings.
Quote:
As far as I can tell the only places db.author_sort() is used are in the initialization of the metadata_single dialog and in db.get_metadata(), which also uses authors_with_sort_strings().
Most of the accesses of author_sort are done through the meta2 view. There is seldom a need to use db.author_sort.
Quote:
I'm probably missing something, but it looks like a one way street, if authors->author_sort is updated using set_sort_field_for_author then books is also updated with the new author_sort, but if books->author_sort is updated with set_author_sort then just books is updated.
Setting a book's author sort does not change the sort strings for the author(s), because there may be no discernible relation between the two. Nothing stops me from setting a book's author_sort to whatever I want.

The choice to update the books when the sort string for an author is changed was not made lightly, and it could be argued that it was wrong. In the end, I decided that if you are playing with an author's sort, then you *probably* want to change the author's books. No one has complained yet. It is arguably better to change only the books where the value of the author_sort equals the value of the previous author's sort, but I didn't do this because it hinders repair when the author sort values get wildly out of sync. It would also be slow and possibly mysterious.
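The two propagation policies weighed above can be sketched side by side. This is an illustration only, assuming a simplified schema and single-author books; `propagate_to_all` and `propagate_to_matching` are hypothetical names, not calibre functions.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory library (simplified, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_sort TEXT);
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, sort TEXT);
        CREATE TABLE book_authors_link (book INTEGER, author INTEGER);
        INSERT INTO authors VALUES (1, 'David Weber', 'Weber, David');
        INSERT INTO books VALUES (1, 'On Basilisk Station', 'Weber, David');
        INSERT INTO books VALUES (2, 'Oath of Swords', 'Weber, David J.');
        INSERT INTO book_authors_link VALUES (1, 1);
        INSERT INTO book_authors_link VALUES (2, 1);
    """)
    return conn


def propagate_to_all(conn, author_id, new_sort):
    """Policy described as chosen: overwrite every linked book's author_sort."""
    conn.execute("UPDATE authors SET sort = ? WHERE id = ?", (new_sort, author_id))
    conn.execute(
        "UPDATE books SET author_sort = ? WHERE id IN "
        "(SELECT book FROM book_authors_link WHERE author = ?)",
        (new_sort, author_id),
    )


def propagate_to_matching(conn, author_id, new_sort):
    """Alternative policy: touch only books still holding the previous value."""
    old_sort = conn.execute(
        "SELECT sort FROM authors WHERE id = ?", (author_id,)
    ).fetchone()[0]
    conn.execute("UPDATE authors SET sort = ? WHERE id = ?", (new_sort, author_id))
    conn.execute(
        "UPDATE books SET author_sort = ? WHERE author_sort = ? AND id IN "
        "(SELECT book FROM book_authors_link WHERE author = ?)",
        (new_sort, old_sort, author_id),
    )


def book_sorts(conn):
    """Return every book's author_sort, ordered by book id."""
    return [r[0] for r in conn.execute("SELECT author_sort FROM books ORDER BY id")]
```

Under the second policy, the book whose author_sort was already out of step (book 2) keeps its old value, which is the repair problem mentioned above.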
Quote:
I'm specifically working on the Ticket #847 and noticed that when I built author_sort using db.get_metadata().author_sort_map it came out differently than db.get_metadata().author_sort. After looking around I saw it was because they are drawing from two different tables, and in my particular case those tables had different data. For example my books table has "Oath of Swords" author_sort listed as Weber, David J. but looking up that book through book_authors_link and then to authors it shows as Weber, David.
This is precisely the situation that must be accounted for. For whatever reason, accident or intentional, the book has a different value than the author.

As regards the enhancement request, the custom columns already do much of what the ticket is asking for, at least for fields with single values (although I seem to have forgotten series, which is probably a bug). Multi-value (is_multiple) fields are more complicated: what does equals mean, exactly? The authors field is even more complicated, because order matters, which isn't the case with tags.

It isn't at all clear to me what should be displayed for is_multiple fields. Should it display only if all are identical? Should it display the ones that are identical (which would open up some interesting UI issues)? For authors, should it respect order (I think yes)?
Quote:
I'm not really sure how they got out of sync, but it would make sense that, if the author_sort column is necessary in both tables, there would be a single function that changes both tables rather than separate ones that update one table or the other.
No. Remember that it is intentional that a user can set the author_sort for a book to be different from the 'natural' string, which is built from the authors' sort strings. There are reasons for people to do this, especially in edited volumes, multi-author books, when devices expect strange author values (although plugboards ameliorate this one), or simply personal preference.
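As a rough illustration of the distinction drawn above, the 'natural' string can be thought of as a default that a stored per-book value overrides. The function names and the " & " separator are assumptions made for this sketch, not confirmed calibre behaviour.

```python
def natural_author_sort(author_sorts, separator=" & "):
    """Join the per-author sort strings in author order (separator is assumed)."""
    return separator.join(author_sorts)


def effective_author_sort(stored_value, author_sorts):
    """Prefer the value stored on the book; fall back to the natural string."""
    if stored_value:
        return stored_value
    return natural_author_sort(author_sorts)
```

For an edited volume, a user might store something like "Various" on the book while the individual authors keep their own sort strings.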
Quote:
To summarize that wall of ramble.
  • Why is there author_sort in the books table when author is looked up through the link table?
Because they contain different information. The book table author_sort is used all over the place through its inclusion in the meta2 view.

The link table question is different. All is_multiple or single-field tag-like information is normalized in the database, which is good practice and generally useful.
Quote:
  • Why are there functions for updating one table but not both?
There are functions for updating both. If you are asking about the side effect of setting the sort field for an author, I discussed that above.
Quote:
  • Why does get_metadata() draw from both rather than just building author_sort from the author_sort_map?
Because as noted above, they can be (and are) different.

Including the author_sort_map isn't strictly necessary because it doesn't appear in the user-visible metadata for a book. It is included for completeness, making it easier to compute on authors without needing to split the authors string and so that the OPF contains all the information for a book.

Final note: as itimpi said, there are some legacy aspects involved. I added the sort string for authors in the 0.7 releases (don't remember exactly when), something that permitted fixing some bugs but also made explicit the difference between the author_sort and authors' sort strings. If I were designing it today, I would need to reflect on whether the two values are strictly necessary. I think yes, for the same reason that there is title and title_sort. Second question: does author_sort need to be editable? Good question. I think yes, because users have all sorts of needs that we can't anticipate. That raises the title/title_sort functionality discrepancy; title_sort cannot be changed by the user. I have wondered about that decision, because I have had instances where I wanted to manually edit title_sort.
Old 12-09-2010, 10:53 AM   #4
kovidgoyal
creator of calibre
 
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@chaley: title_sort was actually introduced for performance reasons, so that the title_sort function would not have to be called for each entry every time a sort on the title field was requested. That's why it isn't editable.
Old 12-09-2010, 11:22 AM   #5
chaley
Grand Sorcerer
 
Posts: 11,738
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by kovidgoyal View Post
@chaley: title_sort was actually introduced for performance reasons, so that the title_sort function would not have to be called for each entry every time a sort on the title field was requested. That's why it isn't editable.
I agree that it should be there. What I object to is not being able to change it (that trigger is very persistent). For example, it is rather hard to get the book 'A is for Alibi' to sort where it goes -- under 'A' instead of under 'is'.

Instead of the trigger, I would rather title_sort operate like author_sort, giving me a default value but letting me change it in edit metadata. I haven't proposed/built this up to now because it could be that I am just weird and no one else cares.
Old 12-09-2010, 11:23 AM   #6
kovidgoyal
creator of calibre
 
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I certainly don't, but I won't object to the change.
Old 12-09-2010, 11:25 AM   #7
kovidgoyal
creator of calibre
 
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Thinking about it a little, why is the trigger a problem? Surely if the title is changed, the sort value also needs to be changed?
Old 12-09-2010, 11:35 AM   #8
chaley
Grand Sorcerer
 
Posts: 11,738
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by kovidgoyal View Post
Thinking about it a little, why is the trigger a problem? Surely if the title is changed, the sort value also needs to be changed?
I can work around the trigger, but it takes knowledge of how the DB does its writes. Also, because it is a books update trigger, it is invoked whenever any field is written, even if the title hasn't changed.

I am not up enough on triggers to know if one can easily check the old and new values of title and not do the update of title_sort if the title hasn't changed.

Then, even if the trigger is changed, one would need to be careful with transaction order. For example, assume that I can change title_sort in edit metadata. Assume further that I change the title and the title sort. I want my title sort to win, so I would need to ensure that the write of the title happened before the write of title sort, so that the trigger doesn't overwrite my change.

I think it would probably be easier to add a parameter to set_title, specifying the title_sort to use. If it is None, then generate a new one. If not, then use it. However, I haven't thought a lot about it up to now.
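The set_title idea in the last paragraph might look roughly like the sketch below. It assumes a books table with a title_sort column and no trigger in play; `default_title_sort` is a hypothetical stand-in for whatever function generates the default, and the signature shown is only a proposal, not the existing API.

```python
import sqlite3

ARTICLES = ("a ", "an ", "the ")


def default_title_sort(title):
    """Hypothetical default: move a leading English article to the end."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            n = len(article)
            return f"{title[n:]}, {title[:n - 1]}"
    return title


def set_title(conn, book_id, title, title_sort=None):
    """Write title and title_sort together; None means 'generate a new one'."""
    if title_sort is None:
        title_sort = default_title_sort(title)
    # A single UPDATE writes both columns, so no write-ordering issue arises here.
    conn.execute(
        "UPDATE books SET title = ?, title_sort = ? WHERE id = ?",
        (title, title_sort, book_id),
    )
```

Calling `set_title(conn, 1, 'A is for Alibi', title_sort='A is for Alibi')` would keep the book under 'A', while omitting the argument falls back to the generated value.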
Old 12-09-2010, 01:24 PM   #9
kovidgoyal
creator of calibre
 
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
IIRC in a sqlite update trigger you have access to OLD.title and NEW.title
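A minimal sketch of such a trigger, run through Python's sqlite3 module. The table layout, the trigger name, and `simple_title_sort` are assumptions for illustration and are not taken from calibre's schema; the point is only that the trigger body runs when the title actually changes.

```python
import sqlite3


def simple_title_sort(title):
    """Hypothetical stand-in for the real title_sort function."""
    for article in ("A ", "An ", "The "):
        if title.startswith(article):
            return title[len(article):] + ", " + article.strip()
    return title


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the name title_sort.
conn.create_function("title_sort", 1, simple_title_sort)
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, sort TEXT, pubdate TEXT);

    -- Fires only for updates that name the title column, and the WHEN
    -- clause skips rows whose title did not actually change.
    CREATE TRIGGER books_title_changed
    AFTER UPDATE OF title ON books
    WHEN OLD.title IS NOT NEW.title
    BEGIN
        UPDATE books SET sort = title_sort(NEW.title) WHERE id = NEW.id;
    END;

    INSERT INTO books VALUES (1, 'A is for Alibi', 'A is for Alibi', '1982');
""")


def current_sort(book_id=1):
    """Return the stored sort value for one book."""
    return conn.execute(
        "SELECT sort FROM books WHERE id = ?", (book_id,)
    ).fetchone()[0]
```

With this shape, writing an unrelated column leaves a hand-edited sort value alone. A statement that changes both title and sort would still have its sort overwritten by the trigger, which is the ordering concern raised earlier in the thread.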
Old 12-09-2010, 09:43 PM   #10
rkworthy
Junior Member
 
Posts: 2
Karma: 10
Join Date: Dec 2010
Device: Blackberry, Nook
Quote:
Originally Posted by chaley View Post
Because author_sort can be (and is) different from the concatenation of authors' sort strings.
OK, that makes sense. I was looking at it from the perspective of duplicated data, and hadn't connected the fact that although there were many forms of author_sort for an author in Books, there was only one instance of that author in Authors. I was still under the impression that one author had one author sort, and that if the sorts were different there was more than one instance of the author.

Quote:
Originally Posted by chaley View Post
Most of the accesses of author_sort are done through the meta2 view. There is seldom a need to use db.author_sort.
This is an area that's unfamiliar to me. I can find only 3 files that reference the meta2 view, and they don't seem to provide an interface to it. Could you point me in the right direction to learn more about this? Is db.get_metadata not the best way to retrieve metadata for a book?

Quote:
Originally Posted by chaley View Post
Setting a book's author sort does not change the sort strings for the author(s), because there may be no discernible relation between the two. Nothing stops me from setting a book's author_sort to whatever I want.

The choice to update the books when the sort string for an author is changed was not made lightly, and it could be argued that it was wrong. In the end, I decided that if you are playing with an author's sort, then you *probably* want to change the author's books. No one has complained yet. It is arguably better to change only the books where the value of the author_sort equals the value of the previous author's sort, but I didn't do this because it hinders repair when the author sort values get wildly out of sync. It would also be slow and possibly mysterious.
That's certainly understandable.

Quote:
Originally Posted by chaley View Post
This is precisely the situation that must be accounted for. For whatever reason, accident or intentional, the book has a different value than the author.

As regards the enhancement request, the custom columns already do much of what the ticket is asking for, at least for fields with single values (although I seem to have forgotten series, which is probably a bug). Multi-value (is_multiple) fields are more complicated: what does equals mean, exactly? The authors field is even more complicated, because order matters, which isn't the case with tags.

It isn't at all clear to me what should be displayed for is_multiple fields. Should it display only if all are identical? Should it display the ones that are identical (which would open up some interesting UI issues)? For authors, should it respect order (I think yes)?
Since I'm only adding it for the built-in fields, specifically authors, author_sort, series, publisher, and maybe tags (I haven't approached this area yet), I figure the use case is pretty straightforward. Take my David Weber example from earlier: if I were just sorting by author in the library view, I wouldn't be able to tell which books had Weber, David and which had Weber, David J. unless I happened to notice that the titles weren't alphabetical. If I bulk edit them for something else, let's say publisher, and the author sort field isn't filled in, then I know something is wonky and can fix it then. Otherwise I would keep working from that erroneous assumption until something else went wrong. It's similar to the colored backgrounds in the metadata-single dialog: just a quick visual check to make sure your assumptions are accurate.

The main goal is a quick check that the metadata agrees when selecting a group of books, which in my mind means that the values match exactly. It would only test whether all are identical. If it showed only the values that matched and forgave any extras, that would lead to more erroneous assumptions. Programmatically, I'm saying that if there is only one unique item in the set of values for a particular field, then they all match and there is no need to overwrite that field unless the user changes it. I agree that tags would be different, though you could just test for set equality rather than string equality.
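The "only one unique item" test described above can be sketched as follows. `common_value` and its `multiple`/`ordered` flags are hypothetical names for this illustration; `ordered=True` models a field like authors where order matters, and `ordered=False` models tags compared by set equality.

```python
def common_value(values, multiple=False, ordered=False):
    """Return the shared value if every selected book agrees, else None.

    values   -- one entry per selected book
    multiple -- True when each entry is itself a list (authors, tags)
    ordered  -- for multiple fields, whether item order is significant
    """
    def normalize(value):
        if not multiple:
            return value
        # Tuples keep order (authors); frozensets ignore it (tags).
        return tuple(value) if ordered else frozenset(value)

    unique = {normalize(v) for v in values}
    if len(unique) == 1:
        return values[0]
    return None
```

A bulk-edit dialog could pre-fill a field only when this returns something other than None, leaving it blank as the visual cue that the selected books disagree.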
Old 12-10-2010, 03:07 AM   #11
chaley
Grand Sorcerer
 
Posts: 11,738
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by rkworthy View Post
This is an area that's unfamiliar to me. I can find only 3 files that reference the meta2 view, and they don't seem to provide an interface to it. Could you point me in the right direction to learn more about this? Is db.get_metadata not the best way to retrieve metadata for a book?
These show up as an array reference because of how caching works.

Look at library.caches.py, and in particular refresh(), to see how the view is used to build the cache. You will see that it populates an array '_data'. There are some related pointer arrays that are used for sorting and searching.

Next look at library.database2, line 306 in current source. There you will see that the variable 'data' is assigned the cache object. If data is subscripted (data[12]), then through Python magic the ResultCache method __getitem__ is called, which returns the row from _data, and thus the values returned by querying the meta2 view.
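The subscripting behaviour described above can be sketched with a toy class. `ResultCacheSketch` is a hypothetical, heavily simplified stand-in for the real ResultCache: it assumes rows are tuples whose first element is the book id, and it omits the sorting and searching arrays entirely.

```python
class ResultCacheSketch:
    """Toy cache: holds rows from a view and serves them by book id."""

    def __init__(self):
        self._data = []

    def refresh(self, rows):
        """Rebuild _data so that a book id can be used directly as an index."""
        max_id = max((row[0] for row in rows), default=-1)
        self._data = [None] * (max_id + 1)
        for row in rows:
            self._data[row[0]] = list(row)

    def __getitem__(self, book_id):
        # data[12] ends up here and returns the cached row for book 12.
        return self._data[book_id]


data = ResultCacheSketch()
data.refresh([(12, "Oath of Swords", "Weber, David J.")])
row = data[12]  # the cached row, with author_sort in the third cell
```

Because the cache object defines `__getitem__`, code that holds `data` can treat it like an array of rows without knowing how the rows were loaded.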

There are zillions of examples of using 'data' in caches.py and gui2.library.models.py. In particular, look at 'data' in models.py, where you will see how column names are translated to the cell in a line in data.