Old 04-18-2011, 01:18 PM   #1
meme
Sigil developer
Posts: 1,275
Karma: 1101600
Join Date: Jan 2011
Location: UK
Device: Kindle PW, K4 NT, K3, Kobo Touch
Use of comma as field separator and author_sort

This is probably too niche an issue to do anything about, but I wanted to make you aware that the use of a comma (",") as the value separator for columns (as in comma-separated text columns like tags) conflicts with the way Author Sort uses the comma.

Specifically, I have an import/export function in my plugin so that users can import collection names from their Kindle into a custom column. They can then export/create collections on their Kindle from this column once they have added, removed, or edited collection names and books.

If the collection names on their device have commas in them (e.g. "LN, FN"), then when they import they have to use a simple text custom column - otherwise calibre splits the name on the comma into multiple, incorrect values ("LN" and "FN"). The custom column will look reasonably OK with multiple collections for the book (e.g. "LN, FN, Mystery"), but when they export using that data all of the entries are sent as one joined name, so they end up with different names than expected (they get "LN, FN, Mystery" instead of "LN, FN" and "Mystery").

Is there a way to change the separator character (per field?) to something other than a comma? Or is there some other simple way around this (other than saying collection names shouldn't have commas in them or there will be problems)?
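
The problem described in this post can be reproduced without calibre. The following is only a sketch: `split_multi_value` is a hypothetical stand-in for how a comma-separated column splits its text, not calibre's actual code.

```python
def split_multi_value(text, sep=","):
    """Split a tags-like field on sep and trim whitespace (hypothetical helper)."""
    return [part.strip() for part in text.split(sep) if part.strip()]


# Two collections on the device, one of which contains a comma.
device_collections = ["LN, FN", "Mystery"]

# In a plain text column the names are simply joined into one string.
stored = ", ".join(device_collections)

# Splitting that string on commas cannot recover the original two names.
recovered = split_multi_value(stored)

print(stored)
print(recovered)
```

Once the names are joined, nothing in the stored text distinguishes a comma inside a name from a comma between names.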
Old 04-18-2011, 01:38 PM   #2
chaley
Grand Sorcerer
Posts: 11,734
Karma: 6690881
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Author_sort is a plain text field, not an is_multiple field, so commas aren't a problem.

Commas are significant for tags. That won't change.

Perhaps your users should use the authors-variant of the tags-like column. It takes names and, like authors, uses the '&' as the separator. Note that an is_names column respects the author_sort tweak when sorting on the tags browser.

To directly answer your question, there is no way to change the separator beyond the two you already have (comma and ampersand). That said, assuming you know the correct values, you could do what calibre does for authors (horrible hack) and change the comma to something else, then change it back when you create the collection. Calibre uses the vertical bar.
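
The substitution hack suggested here can be sketched roughly as follows. This is an illustration under stated assumptions, not calibre's implementation: `encode_value` and `decode_value` are hypothetical names, and the column is modelled as a plain comma-joined string.

```python
SEPARATOR = ","    # separator used by the tags-like column
PLACEHOLDER = "|"  # stand-in character (the vertical bar mentioned above)


def encode_value(name):
    """Hide commas inside a single value before it is stored."""
    return name.replace(SEPARATOR, PLACEHOLDER)


def decode_value(stored):
    """Restore the commas when the value is read back out."""
    return stored.replace(PLACEHOLDER, SEPARATOR)


names = ["LN, FN", "Mystery"]

# Store: each name is encoded first, then the values are joined.
column_text = ", ".join(encode_value(n) for n in names)

# Read back: split on the real separator, then decode each value.
restored = [decode_value(v.strip()) for v in column_text.split(SEPARATOR)]

print(column_text)
print(restored)
```

The round trip only holds if no original name already contains the placeholder character; such a name would come back with a comma in its place.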
Old 04-18-2011, 03:15 PM   #3
meme
Ah, I see - the checkbox called Contains Names in the add new column dialog.

I'll need to test it out but it looks like it'd mean everything has to have & instead of comma which I don't think will be too intuitive.

Using the hack of a vertical bar sounds reasonable and would only require a little bit of explanation. So if I import a book with the collection names "LN, FN" and "Mystery", I store them as "LN|FN, Mystery". If the user manually edits the column, they need to enter "LN|FN" or "LN| FN", as in "Smith|John, Mystery". When I export I always convert "|" to "," (as long as it's preceded by a-z, to avoid issues with prefixes used for sorting on the Kindle) so the collection name looks right on the device. There shouldn't be many, if any, collection names with a | in them that isn't a prefix. And I can restrict it to comma-separated text fields, since it'd be of no use in a plain text field. Great suggestion.
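
One way to express the "only after a letter" export rule is a regular expression with a lookbehind. This is a sketch, not the plugin's code: `export_name` is a hypothetical name, and matching upper-case letters as well as a-z is an assumption.

```python
import re

# A vertical bar counts as a hidden comma only when the character just
# before it is an ASCII letter (case-insensitive here, by assumption).
BAR_AFTER_LETTER = re.compile(r"(?<=[a-z])\|", re.IGNORECASE)


def export_name(stored):
    """Turn stand-in bars back into commas, leaving prefix bars alone."""
    return BAR_AFTER_LETTER.sub(",", stored)


column_text = "Smith|John, LN| FN, |Mystery"
exported = [export_name(v.strip()) for v in column_text.split(",")]
print(exported)
```

Here "Smith|John" is exported as "Smith,John" (no space is added), "LN| FN" becomes "LN, FN", and the leading bar in "|Mystery" is kept because no letter precedes it.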
Old 04-19-2011, 05:50 AM   #4
meme
Ok, I've currently implemented this so that if you import collections and the collection name has a "," in it (after the first occurrence of a-zA-Z), I convert the "," to a "\" before loading it into the custom column.

When automatically creating collections I convert any "\" to "," (after a-zA-Z). Users can manually enter "\" instead of "," in the custom column so that "," can act as the separator for is_multiple columns. I do it for plain text columns as well just to be consistent.

I'm not sure about "\" (regex flashbacks), but I figured it isn't likely to be in any collection name and it's on the keyboard, though it doesn't look great. ";" would be OK, but I can't help but look at that and think the default separator should have been ";".


I tried to use "|" but I get odd results. If I type it in manually or import it into a multiple-value text custom column, it gets converted to a ",". It's clear "|" is quite special, since it gets left as is for plain text, converted to "&" in a Contains Names column, and to "," in a multiple-value column. I think I'll stay away from "|".
Old 04-19-2011, 06:25 AM   #5
chaley
Quote:
Originally Posted by meme View Post
Ok, I've currently implemented this so that if you import collections and the collection name has a "," in it (after the first occurrence of a-zA-Z), I convert the "," to a "\" before loading it into the custom column.
What happens if I am using non-ascii characters? For example, what happens if the collection name is "é,çèö"? Or if the name is "1001,dalmations"? There are no a-zA-Z characters in the first string, and there are none before the comma in the second string.

I think that you want to do 'foo.strip(',').strip()' before processing an item. That will remove any leading or trailing commas, then remove any leading or trailing whitespace.
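
The two counter-examples can be checked with a small sketch. `convert_after_first_ascii_letter` is a hypothetical reading of the "comma after the first a-zA-Z" rule from the quoted post, not the plugin's actual code.

```python
import re


def convert_after_first_ascii_letter(name, replacement="\\"):
    """Replace only the commas that follow the first ASCII letter (assumed rule)."""
    match = re.search(r"[a-zA-Z]", name)
    if match is None:
        # No ASCII letter at all, so no comma is ever converted.
        return name
    head, tail = name[:match.start()], name[match.start():]
    return head + tail.replace(",", replacement)


for name in ["LN, FN", "é,çèö", "1001,dalmations"]:
    print(repr(name), "->", repr(convert_after_first_ascii_letter(name)))
```

Under this reading only "LN, FN" is converted. The other two names keep their commas and would still be split into separate values by a comma-separated column.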
Old 04-19-2011, 06:47 AM   #6
meme
Good point. I was trying to account for people using "\" as part of their prefix for a collection and getting over-fancy by trying to allow for it to be more than just the first character (e.g. |\ stuff), but I think just ignoring one or more \ at the start is good enough. And the end as well. Of course I have to save those characters and leave them in the collection name.
Old 04-19-2011, 11:30 AM   #7
meme
Wouldn't you know it - after finally figuring out how to use re.match to deal with this I realized that it wasn't going to work. I've simplified it to a simple re.sub of "," to "\" for every occurrence. If I left some it would just end up creating a mess - better to warn people not to use commas in the prefix, but they'll still work ok.
Old 04-19-2011, 01:17 PM   #8
chaley
If you are replacing all occurrences of commas with backslashes, then you should use str.replace instead of re.sub. Replace will be much faster.

I am also wondering about the choice of backslash. My concern is that it could be misinterpreted as an escape in some situations. In particular, backslashes could easily get stripped or misinterpreted in search expressions.
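
Both points in this post show up in a short comparison. This is just an illustrative sketch with made-up variable names: str.replace treats its arguments as literal text, while the replacement string passed to re.sub is itself subject to backslash escaping.

```python
import re

name = "LN, FN"

# str.replace: both arguments are literal text, so one backslash is enough.
with_replace = name.replace(",", "\\")

# re.sub: backslashes in the replacement are interpreted as escapes, so a
# literal backslash has to be written as an escaped pair.
with_sub = re.sub(",", r"\\", name)

print(with_replace)
print(with_sub)
```

Both calls produce the same text, but the re.sub version already needs an extra layer of escaping for the backslash, which is a small instance of the misinterpretation risk raised above.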
Old 04-19-2011, 02:43 PM   #9
meme
Yeah, me too. I haven't settled on it yet, and I'm holding the release until after 7.57 so I can add the device checks. Once I pick something, though, I'm stuck with it unless I want to have to explain to users how to manually edit their data. / is a popular separator in names. + probably is too. I thought of ¬ but that might not be on every keyboard and looks quite odd. ; maybe, = or ~ maybe. Hmmm, these are the tough choices.

One thought was to make it configurable - but I'd rather not add another setting, and it might make troubleshooting harder.

I'll try the str.replace. I haven't tried to tune the code much. I can see there's a bulk update for calibre fields I'll need to test - right now I update the full collection list id by id so I'd have to change it to update the list of ids collection by collection.
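
Switching from an id-by-id update to a collection-by-collection update mostly means inverting the mapping first. The sketch below uses plain dictionaries and makes no calibre API calls; `group_ids_by_collection` and the sample data are hypothetical.

```python
from collections import defaultdict


def group_ids_by_collection(collections_by_id):
    """Invert {book_id: [collection, ...]} into {collection: [book_id, ...]}."""
    ids_by_collection = defaultdict(list)
    for book_id, names in sorted(collections_by_id.items()):
        for name in names:
            ids_by_collection[name].append(book_id)
    return dict(ids_by_collection)


collections_by_id = {
    1: ["Mystery", "Favourites"],
    2: ["Mystery"],
    3: ["Favourites"],
}

grouped = group_ids_by_collection(collections_by_id)
print(grouped)
```

Each entry in the result is a collection name with the list of book ids that belong to it, which is the shape a per-collection bulk update would presumably need.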
Old 04-19-2011, 04:26 PM   #10
meme
I think it'll be semicolon (";") since it's very close to comma, and when someone's semicolon in their Kindle collection name is converted to a comma after import/export it won't matter much, as it won't look much different. And it looks OK in the custom column.
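
The trade-off accepted here can be made concrete with a small sketch. `to_column` and `to_device` are hypothetical names for the import and export steps; this is not the plugin's code.

```python
STAND_IN = ";"  # character stored in the column in place of a comma


def to_column(device_name):
    """Import step: hide commas so the column does not split the name."""
    return device_name.replace(",", STAND_IN)


def to_device(column_value):
    """Export step: every stand-in becomes a comma again."""
    return column_value.replace(STAND_IN, ",")


for original in ["LN, FN", "Sci-fi; Fantasy"]:
    round_tripped = to_device(to_column(original))
    print(repr(original), "->", repr(round_tripped))
```

A name with a comma survives the round trip unchanged, while a name that originally contained a semicolon comes back with a comma instead, which is the small loss the post considers acceptable.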
All times are GMT -4.