08-31-2010, 07:23 AM | #1 |
Grand Sorcerer
Posts: 11,703
Karma: 6658935
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Sony collections and custom fields: how to handle duplicates?
I am (finally) getting around to implementing custom field support for file name templates and collections, and have reached a point where choices must be made. I am looking for guidance.
The question: what do I do if two fields being used for collection generation contain the same value? For example, I might have a tag 'Fiction' and a #genre 'Fiction'. If I name the collection 'Fiction', then books with either the tag or the #genre will end up in the same collection, even if they mean something different in the context of their field.
The question becomes more complicated if one of the two (or three or ...) is a series. Imagine I have a tag 'Foo' and a series 'Foo'? They will both end up in the same collection, named 'Foo'. How should the collection be sorted? If by series_index, then what index do I use for books that are tagged 'Foo', but are members of the series 'Bar'? (This can happen today, and the results are strange.)
I can imagine several ways forward.
1) Have a single category name and put everything into it, regardless of where the value comes from. Live with the strange sorting and the meaning collisions.
2) Add the field lookup value to the category, thereby creating multiple categories. In this case and using the above examples, we would have categories:
- "Fiction (tags)" and "Fiction (#genre)"
- "Foo (tags)" and "Foo (series)"
3) As in #2, but reversed: "(tags) Fiction" and "(#genre) Fiction".
4) As in #2 or #3, but using the column name: "Fiction (Tags)" and "Fiction (Genre)"
5) As in #2-#4, but using something like colon separation, e.g., Foo:tags
6) Refuse to do anything, saying that there is a name conflict.
7) Let the first one win, and ignore all the rest.
Anyone care to discuss? Or provide other ideas? Speak now, or forever hold your peace.
My choice would be #4, or possibly its #5 equivalent.
It is worth noting that user-defined categories make this problem even worse. These categories have no provenance (no source field), but can conflict with categories that come from metadata fields. This happens today with series, but I have a hack in the code to detect that conflict.
In the end, I may need to refuse to allow custom-field categories if the 'Metadata Management' option is set to 'Manual'. |
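To make the difference between option 1 and option 4 concrete, here is a rough Python sketch. It is not calibre's code: the book records, the HEADINGS table and the function names are all made up for illustration, and it only models how collection names would be formed, not sorting or device handling.

```python
from collections import defaultdict

# Hypothetical book records; field names and headings are illustrative only.
BOOKS = [
    {"title": "Book A", "tags": ["Fiction"], "#genre": "Fiction"},
    {"title": "Book B", "tags": ["Foo"], "series": "Bar"},
    {"title": "Book C", "tags": [], "series": "Foo"},
]
HEADINGS = {"tags": "Tags", "#genre": "Genre", "series": "Series"}


def field_values(book, field):
    """Return the values of a field as a list (tags are multi-valued)."""
    value = book.get(field)
    if value is None:
        return []
    return list(value) if isinstance(value, list) else [value]


def build_collections(books, fields, headings, qualify=True):
    """Group titles by collection name.

    qualify=False models option 1 (one shared name per value).
    qualify=True models option 4 (value plus column heading).
    """
    collections = defaultdict(list)
    for book in books:
        for field in fields:
            for value in field_values(book, field):
                name = f"{value} ({headings[field]})" if qualify else value
                if book["title"] not in collections[name]:
                    collections[name].append(book["title"])
    return dict(collections)


if __name__ == "__main__":
    fields = ["tags", "#genre", "series"]
    print(build_collections(BOOKS, fields, HEADINGS, qualify=False))
    print(build_collections(BOOKS, fields, HEADINGS, qualify=True))
```

With qualify=False the tag 'Foo' and the series 'Foo' land in one collection, which is the collision described above. With qualify=True they are kept apart as 'Foo (Tags)' and 'Foo (Series)'.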
08-31-2010, 07:47 AM | #2 | |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
If I had the option I would love to use a user defined #genre column for collections. Given this option I would drop tags from the send template and use genre and series. Good luck moving forward and thanks in advance for all the work you're about to undertake. Maybe I can add something of substance this evening. |
|
08-31-2010, 11:27 AM | #3 |
creator of calibre
Posts: 43,776
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I'd vote for 4, but since I don't use custom cols or the SONY reader, I probably don't grasp the subtleties involved.
|
08-31-2010, 01:04 PM | #4 |
Wizard
Posts: 1,377
Karma: 9400
Join Date: Sep 2009
Location: Europe
Device: PRS-650, iPod touch 4G, iPad 3
|
#4 looks good and reasonable to me.
|
08-31-2010, 02:26 PM | #5 |
Evangelist
Posts: 469
Karma: 600816
Join Date: Sep 2009
Device: Kobo Aura HD, Kobo Aura One
|
+1 for #4. However, on user-defined series I think merging is preferable (#1), since one can use index for sorting. So I would suggest that if you run into a name conflict that only involves series-type columns, the name is left as is.
For example, let's say I have a series (built-in column) John Doe and it has books 1-6 in it and another series (built-in) Jane Doe with books 1-3 & 5-7 in it. If I proceed to create a user-defined series column and place the value of Jane Doe[4] in it for the John Doe[3] then I would only want to see two series on the reader: John Doe (six books) and Jane Doe (seven books), with one book appearing in both series. Does this approach make sense? |
08-31-2010, 03:59 PM | #6 | |
Grand Sorcerer
Posts: 11,703
Karma: 6658935
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
I can imagine cases where the series assertion isn't true. For example, I might have a series field for publication order that uses the same names as the series column. One prime example would be MZ Bradley's Darkover books, where series order is very different from publication order. I would use the standard series order to indicate temporal ordering in the Darkover universe, while using a #pub_series column to indicate publication order. In this case I cannot merge the two columns.
I think that your proposal would be appropriate when one wants to say that a given book is in more than one series. I would create N series columns, where N is the maximum number of series a book can have, then put the series information in some column or another.
I can see why you want this, but it does raise some interesting problems related to searching and sorting. In your example, if I ask for series:"Jane Doe", I would not see book 4. It would not be possible to sort the Jane Doe series into order. The disparity between what one sees on the reader and what one sees in Calibre concerns me, especially from a support standpoint. However, I could be fretting over nothing. |
|
08-31-2010, 04:28 PM | #7 | |
Evangelist
Posts: 469
Karma: 600816
Join Date: Sep 2009
Device: Kobo Aura HD, Kobo Aura One
|
Quote:
Tickets: 2581, 6249 and 4943. Last edited by dmapr; 08-31-2010 at 04:32 PM. |
|
08-31-2010, 05:09 PM | #8 | |
Grand Sorcerer
Posts: 11,703
Karma: 6658935
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
The one that won't happen is a series column that can have multiple (name,index) values. The database schema simply won't allow it, and changing the schema is too hard. Kovid is talking about someday doing a library redesign, and that is the only time when it might happen.
Searching across all columns with a common prefix (series, series_1, series_2, etc) is feasible. I had a conversation with starson17 about this, where the idea was to search for series*:value. Someone (probably me) needs to be convinced that the effort is worth it. I am leaning toward 'yes'.
Merging instead of separating series is also feasible. My guess is that it should be done with a tweak, so that the standard behavior is consistent across all column types. I will look (hard) at this possibility as I implement the collection management stuff. Of course, having this tweak for series creates the possibility/need for a similar tweak for tags/text columns, so I suppose I need to look at that as well. The more tweaks that appear, the less happy I am about doing any of them, because feature interaction problems and bugs become too hard for my brain to work through.
It seems to me that common prefix searching and series collapsing will give you a usable approximation of what you want. Given that what you want is reasonable, I will try to make it happen. |
|
08-31-2010, 06:36 PM | #9 |
Evangelist
Posts: 469
Karma: 600816
Join Date: Sep 2009
Device: Kobo Aura HD, Kobo Aura One
|
OK, just to make sure I understand you correctly. If in my previous example I used the name series_1 for the column where I stuck the Jane Doe[4] value then I would get the behavior I was after and there would also be a way to consolidate the view in Calibre; while if the column was named secondary_character then I'd have a Jane Doe series as well as Jane Doe (Secondary Character) series on the reader?
If this is what you propose then I believe this to be the best of both worlds. Last edited by dmapr; 08-31-2010 at 07:15 PM. |
08-31-2010, 06:56 PM | #10 | |
Grand Sorcerer
Posts: 11,703
Karma: 6658935
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
The name of the column has nothing to do with what happens on the reader. What I hope to be able to do is to provide a 'tweak' that indicates that *all* series columns are to be collapsed when creating collections on the reader. In other words, from the reader's point of view, there is only one column. Strange things will happen if two columns for a given book contain the same series name but different series index numbers. I cannot predict which index will be used to sort, or even if the same index will be consistently used. (Hmmm.... I suppose I could look at doing prefix matching when building collections, but then I would need some way to know what prefixes to look at. My suspicion is that doing so will complicate life beyond my willingness to deal with it, but I will think about it.)
Now to searching. What I propose and hope to be able to build is the ability to search across a set of columns sharing a common search name prefix. In your example, that prefix would be 'series'. Searching for 'series*' would check columns named series, series_1, series_2, seriesblunderbuss, etc. This allows you to search for series*:"=Jane Doe" and find values across all the matching series columns. You can search an individual column by providing its name without the '*'.
The sorting problem remains unresolved. There is no way to sort on a merger of two columns.
Note that all of these ideas are subject to Kovid's approval. He rightly does not want to add functionality that is not consistent with his plans for calibre's future. |
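A small Python sketch of the prefix search idea, to show what 'series*' would and would not match. None of this is calibre's search code: the book dictionaries and the two function names are invented for the example, and the "=" exact-match operator is modelled simply as string equality.

```python
# Hypothetical books; each key other than 'title' is a column search name.
BOOKS = [
    {"title": "Book 3", "series": "John Doe", "series_1": "Jane Doe"},
    {"title": "Book 5", "series": "Jane Doe", "series_1": None},
    {"title": "Book 9", "series": "John Doe", "series_1": None},
]


def matching_columns(search_name, columns):
    """Expand 'series*' to every column name starting with 'series'.

    A name without the trailing '*' matches only that one column.
    """
    if search_name.endswith("*"):
        prefix = search_name[:-1]
        return [c for c in columns if c.startswith(prefix)]
    return [c for c in columns if c == search_name]


def search(books, search_name, wanted):
    """Return titles where any matching column holds exactly `wanted`."""
    hits = []
    for book in books:
        columns = [c for c in book if c != "title"]
        if any(book[c] == wanted for c in matching_columns(search_name, columns)):
            hits.append(book["title"])
    return hits


if __name__ == "__main__":
    print(search(BOOKS, "series*", "Jane Doe"))  # checks series and series_1
    print(search(BOOKS, "series", "Jane Doe"))   # checks series only
```

Note that the sketch only finds the books. It does nothing about ordering them, which is the unresolved sorting problem mentioned above.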
|
08-31-2010, 07:19 PM | #11 | ||
Evangelist
Posts: 469
Karma: 600816
Join Date: Sep 2009
Device: Kobo Aura HD, Kobo Aura One
|
Quote:
Quote:
|
||
09-01-2010, 03:01 AM | #12 | ||
Grand Sorcerer
Posts: 11,703
Karma: 6658935
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Quote:
I would prefer a tweak over duplicating the driver. Quote:
Kovid made a fundamental choice early in calibre's development, to use a denormalized view as the interface between the GUI and the DB. This choice makes a lot of sense when using a tabular interface vs a forms interface. A consequence is that there is a view column per display/sortable/searchable column. Naturally multiple fields like tags and authors are collapsed into a single comma-separated list. Last edited by chaley; 09-01-2010 at 03:45 PM. |
||
09-01-2010, 02:46 PM | #13 | ||
Evangelist
Posts: 469
Karma: 600816
Join Date: Sep 2009
Device: Kobo Aura HD, Kobo Aura One
|
Quote:
Quote:
On the other hand, this does present an interesting problem to think about — obviously the comma-separated list of series doesn't make much sense… |
||
09-02-2010, 04:11 PM | #14 |
Grand Sorcerer
Posts: 11,703
Karma: 6658935
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Result:
I picked #4, and added 2 tweaks: one that combines collections and another that creates new search terms grouping existing ones.
Example:
- Assume custom fields #myseries and #mybool.
- Assume two books:
1. Title: Title1, Series: foo[1], #read: Yes, #myseries: bar[2]
2. Title: Title2, Series: bar[1], #read: Undefined, #myseries: mumble[1]
- Assume device customization 'make collections from': series, #mybool, #myseries
- Assume Metadata Management=Automatic
- Assume column headings #read:'Read' and #myseries:'My Series'
Default case (no tweaks set): Collections on device:
- 'foo' contains Title1
- 'bar' contains Title2
- 'bar (My Series)' contains Title1
- 'mumble (My Series)' contains Title2
- 'Yes (Read)' contains Title1
Case with tweak set merging series and #myseries collections:
- 'foo' contains Title1
- 'bar' contains Title2 and Title1
- 'mumble' contains Title2
- 'Yes (Read)' contains Title1
Case with tweak merging series and #myseries, renaming the collection source to 'Zap':
- 'foo (Zap)' contains Title1
- 'bar (Zap)' contains Title2 and Title1
- 'mumble (Zap)' contains Title2
- 'Yes (Read)' contains Title1
In addition, I added a tweak to create new search terms that search multiple existing categories. This permits creating a search term that in the above example would search both 'series' and '#myseries'. For example, the tweak "'myseries':['series', '#myseries']" creates a search term 'myseries'. Searching for 'myseries:bar' will find both Title1 and Title2.
The search term tweak will appear in the next calibre release. The custom field collections stuff will be in a beta that will come out in a week or so. |
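For anyone who wants to check the three cases by hand, the example can be modelled in a few lines of Python. This is only a sketch of the behaviour described in this post, not the driver code. The `renaming` dictionary is a made-up stand-in for the real tweak, sorting merged collections by series index is an assumption, and the yes/no column is called #read throughout (the example above refers to it as both #mybool and #read).

```python
# The two books from the example; series values are (name, index) pairs.
BOOKS = [
    {"title": "Title1", "series": ("foo", 1), "#read": "Yes", "#myseries": ("bar", 2)},
    {"title": "Title2", "series": ("bar", 1), "#read": None, "#myseries": ("mumble", 1)},
]
# Column headings for custom fields; the built-in series gets no suffix.
HEADINGS = {"#read": "Read", "#myseries": "My Series"}


def collections_for(books, fields, headings, renaming=None):
    """Build {collection name: [titles]}.

    `renaming` (hypothetical) maps a field to the suffix to use instead of
    its column heading; an empty string means no suffix at all.
    """
    renaming = renaming or {}
    found = {}  # collection name -> list of (index, title)
    for book in books:
        for field in fields:
            value = book.get(field)
            if value is None:
                continue
            name, index = value if isinstance(value, tuple) else (value, 0)
            suffix = renaming.get(field, headings.get(field, ""))
            label = f"{name} ({suffix})" if suffix else name
            found.setdefault(label, []).append((index, book["title"]))
    # Assumption: members of a collection are ordered by series index.
    return {label: [t for _, t in sorted(members)] for label, members in found.items()}


def grouped_search(books, term, wanted, grouped):
    """Search every field listed under a grouped search term."""
    hits = []
    for book in books:
        for field in grouped.get(term, [term]):
            value = book.get(field)
            name = value[0] if isinstance(value, tuple) else value
            if name == wanted:
                hits.append(book["title"])
                break
    return hits


if __name__ == "__main__":
    fields = ["series", "#read", "#myseries"]
    print(collections_for(BOOKS, fields, HEADINGS))
    print(collections_for(BOOKS, fields, HEADINGS, {"series": "", "#myseries": ""}))
    print(collections_for(BOOKS, fields, HEADINGS, {"series": "Zap", "#myseries": "Zap"}))
    print(grouped_search(BOOKS, "myseries", "bar", {"myseries": ["series", "#myseries"]}))
```

The three `collections_for` calls correspond to the default case, the merged case and the 'Zap' case, in that order.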