#1 |
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
|
Query/Request/Nuisance
Hi there, calibre developers! I know you have oodles of free time, particularly since none of you have recently made a life-changing relocation of any sort.
I have a question about future functionality with respect to foreign languages in general, and definite and indefinite articles in particular. With calibre's increasing support for foreign-language e-texts, I was curious about the possibility/desirability of a selectable function that would, in a batch (or individual) edit, move the leading article in the "title sort" field based on a Language selection. That is, upon selecting "French" (in a drop-down menu, or something), the edit function would use the title "Une fois pour toutes" to fill the "title sort" field with "fois pour toutes, Une". Or if you were to select "Spanish", it would use "El amor en los tiempos del cólera" in the title field to create a "title sort" of "amor en los tiempos del cólera, El". Or for Italian, it would use "Il padrino" to get "padrino, Il". (I'm all out of multilingual examples.)
Currently, the user can go through and do this manually, but it sets off the "not matching" red goof light. Also, it's pretty tedious (if admittedly kind of meditative).
Seriously (for a moment), I know that calibre development is a labor of love, and that the financial compensation in all likelihood falls way short of the man-hours you've all dedicated to it. So of the three words in the thread title, I hope you pay most attention to the first one. This post (and the others I've made of its ilk) is really written in the spirit of providing "food for thought", not critical scrutiny. As far as I'm concerned, every single piece of added functionality is a luxury -- pure gravy, since calibre already does more than any tool I could've possibly designed myself.
And, as always, thanks for taking the time to read. |
#2 |
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There is a problem with that, namely that title sort is not actually dynamically calculated, except on initial import. After that it is stored in the db. Often on initial import, the language metadata is not present/correct, so dynamically adjusting the algorithm might actually yield worse results on average. Another place it could be applied dynamically is when the auto-generate title sort button is clicked, based on the current book language. However, for this to happen someone will have to write the language-specific sorting code for each language.
|
#3 |
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
|
So if you (or the other developers) were provided with lists of definite and indefinite articles sorted by language, would that make implementing the second kind of dynamic alteration more feasible?
|
#4 |
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Certainly, if someone wrote/tested the regexes/provided a list of articles for a bunch of different languages, it should be fairly easy for me to implement. To make it worthwhile, it would have to be a fair number of languages, say about a dozen. I don't really want to go to the trouble for a couple of languages.
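As a rough illustration of what "a list of articles plus a regex" per language might look like, here is a minimal Python sketch. The `ARTICLES` table and the `title_sort` function are hypothetical names invented for this example, the lists are deliberately incomplete, and nothing here is taken from calibre's actual code.

```python
import re

# Hypothetical, incomplete per-language article lists (illustration only).
ARTICLES = {
    "eng": ["A", "An", "The"],
    "fra": ["Le", "La", "Les", "Un", "Une", "Des"],
    "spa": ["El", "La", "Los", "Las", "Un", "Una", "Unos", "Unas"],
    "ita": ["Il", "Lo", "La", "I", "Gli", "Le", "Un", "Uno", "Una"],
}


def title_sort(title, lang="eng"):
    """Move a leading article to the end of the title for sorting."""
    articles = ARTICLES.get(lang)
    if not articles:
        # Unknown language: leave the title untouched.
        return title
    alternatives = "|".join(re.escape(a) for a in articles)
    # A leading article, then whitespace, then the rest of the title.
    pattern = re.compile(r"^(%s)\s+(.+)$" % alternatives, re.IGNORECASE)
    match = pattern.match(title)
    if match is None:
        return title
    return "%s, %s" % (match.group(2), match.group(1))


print(title_sort("Une fois pour toutes", "fra"))  # fois pour toutes, Une
print(title_sort("Il padrino", "ita"))            # padrino, Il
print(title_sort("Of Mice and Men", "eng"))       # Of Mice and Men
```

Keeping the data (article lists) separate from the logic means that adding a language is just a matter of adding one more list, which is the part forum members can contribute without knowing regex.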
|
#5 |
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
|
I'm very familiar with the Romance languages, having studied 3 of them. As for the rest, the internet proved conveniently educational. The only one for which I'm afraid I might be missing an article or two is Greek (It's Greek to me! Ha. Ha. Oh geez, I'm too much). I also omitted the genitive case for articles since "title sort" doesn't modify it in English (Of Mice And Men).
Additionally, I'm not hating on Slavic or Asian languages by omitting them. In the course of my research, I found that many of the Slavic languages (and Arabic!) express articles as suffixes (or simply don't have them). As for the Asian languages, my own study of Chinese and my passing exposure to Japanese (neither of which has articles of any kind) lead me to believe the same is likely true of other East Asian languages. That said, if anyone else speaks Korean, Thai, Vietnamese, Khmer, or any of the Austronesian languages, please contribute! As for South Asian languages, I plead total ignorance, but I figure Kovid would be able to provide some input there.
Now, I realize I've only provided half (the easiest half, no less) of what you asked for. Unfortunately, regex is one language that I don't understand at all. Still, this list (and any others that forum members might contribute!) could serve as a reference in the event that someday someone gets an irresistible urge to create a bunch of regular expressions for relocating definite and indefinite articles in a title sort.
Spoiler:
Last edited by ElMiko; 11-27-2011 at 12:29 PM. Reason: Updated list of articles to include Spanish "Lo"; German 4 new articles; Hungarian articles; French partitive articles |
#6 |
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Cool, please open a ticket and I will get to it once my immediate TODO pile has wound down a little.
|
#7 |
Connoisseur
Posts: 52
Karma: 8860
Join Date: Jul 2009
Location: Madrid, Spain, EU
Device: Sony PRS-505, Sony PRS-T1, Sony PRS-T3
|
I found this thread almost by chance, via a referral in Calibre's bug tracking system. I know it is an old thread, but just for the record, and in case it is of interest to someone someday, I would like to point out that in the Spanish language section the "singular definite neuter" article Lo has been omitted.
Therefore, the full list of standard articles in the Spanish language is:
El La Lo Los Las Un Una Unos Unas
Best from Madrid,
-Enrique
EDIT: It is not such an old thread after all. Sorry: I misread, or misinterpreted, the message dates.
Last edited by enriquep; 11-11-2011 at 07:12 AM. Reason: Added last paragraph. |
#8 |
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
|
@Enrique - Thanks! This is where a native speaker is particularly useful. I omitted "Lo" based on its English translation of "That which" or "It", neither of which would be modified in a title sort. Should a hypothetical title "Lo dudo" be sorted as "dudo, Lo"? Or "Lo que piensan" as "que piensan, Lo"? Or "Lo pensado" as "pensado, Lo"? I would have thought no, but again I'm basing this on observed English norms. As a Spaniard, you're in a better position to speak of prevailing practices in Spanish. If you could provide some confirmation, that'd be great!
|
#9 |
Connoisseur
Posts: 52
Karma: 8860
Join Date: Jul 2009
Location: Madrid, Spain, EU
Device: Sony PRS-505, Sony PRS-T1, Sony PRS-T3
|
Elmiko,
I understand your point, and it is extremely reasonable. On one hand you are completely right, as I admit that sometimes Spanish sorting might sound bizarre to an English speaker. But also keep in mind that, to a Spanish speaker, "lo" is only the singular form of "los", and it would be really difficult to understand that a given sorting rule applies to "los" and not to "lo". Also, from a Spanish speaker's point of view, if a sorting rule applies to articles it should apply to all articles, and that's it... even if it sounds weird in English (and probably in Spanish). To all "standard" articles, at least, as I said in my previous post.
We can (and perhaps should) leave out of this discussion the contractions "al" and "del", which technically are articles, although not included in the "really 100% standard articles" list that we all memorize at school and never forget.
By the way, in the two examples you mention, "lo" does not behave as an article but as a direct object (somehow replacing the English "it", not the English "the"), and these are two cases where Calibre's manual title sort will have to come into action.
Best from Madrid (well, from Málaga right now, accidentally),
-Enrique |
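The distinction between "lo" as an article and "lo" as a pronoun is exactly what a pattern-based rule cannot see. A minimal Python sketch of that limitation follows; `spanish_title_sort` and its `manual_override` parameter are hypothetical names made up for this example, standing in for whatever manual edit the user makes to the title sort field.

```python
import re

# The Spanish article list given earlier in the thread, including "Lo".
SPANISH_ARTICLES = ["El", "La", "Lo", "Los", "Las", "Un", "Una", "Unos", "Unas"]
_PATTERN = re.compile(
    r"^(%s)\s+(.+)$" % "|".join(SPANISH_ARTICLES), re.IGNORECASE
)


def spanish_title_sort(title, manual_override=None):
    """Apply the article rule unless the user supplied a sort value by hand."""
    if manual_override is not None:
        # A human knows whether "Lo" is an article here; trust the manual value.
        return manual_override
    match = _PATTERN.match(title)
    if match is None:
        return title
    return "%s, %s" % (match.group(2), match.group(1))


# "Lo" as an article: the mechanical rule gives the intended result.
print(spanish_title_sort("Lo pensado"))  # pensado, Lo

# "Lo" as a direct object: the rule cannot tell the difference...
print(spanish_title_sort("Lo dudo"))     # dudo, Lo

# ...so this title needs a manually entered sort value.
print(spanish_title_sort("Lo dudo", manual_override="Lo dudo"))  # Lo dudo
```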
#10 |
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
|
Awesome! Thanks! Went ahead and added it to my list, too.
|
#11 |
Junior Member
Posts: 1
Karma: 300
Join Date: Nov 2011
Location: Hungary
Device: Kindle 4
|
Hungarian
A Az Egy |
#12 |
Zealot
Posts: 143
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
|
On a related note:
I see now (Version 0.8.28) that there is a language-specific list of values in Preferences/Tweaks/Set of words considered to be articles, called per_language_title_sort_articles.
Where does the language code for this list come from? "German" is "deu", it says. Is that some kind of official code? Are those all 3 characters? How about other languages? Why not the usual language codes (en, de, ...)?
I suppose that the same code is to be used for the other tweak, default_language_for_title_sort.
Can anyone enlighten me, please? I'd like to use this facility from now on.
Thanxx, Mixx |
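For reference, "deu" is the three-letter ISO 639-2/T and ISO 639-3 code for German, whereas "de" is the two-letter ISO 639-1 code, so the tweak appears to be keyed on the three-letter ISO 639 codes. The Python fragment below is only a sketch of how the two tweaks named above might be filled in: the tuple-of-regex value format and the abbreviated article lists are assumptions for illustration, so check the comments in Preferences->Tweaks for the authoritative format.

```python
# Sketch only: the value format and the shortened lists are assumptions.
per_language_title_sort_articles = {
    # Keys are three-letter ISO 639 language codes.
    'eng': (r'A\s+', r'An\s+', r'The\s+'),
    'deu': (r'Der\s+', r'Die\s+', r'Das\s+', r'Ein\s+', r'Eine\s+'),
    # Hungarian articles as posted earlier in this thread.
    'hun': (r'A\s+', r'Az\s+', r'Egy\s+'),
}

# Presumably the same kind of code, naming the language whose list is
# used as the default.
default_language_for_title_sort = 'deu'

# Quick sanity check that every key is a three-letter code.
for code in per_language_title_sort_articles:
    assert len(code) == 3, code
```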
#13 |
Zealot
Posts: 100
Karma: 2092
Join Date: Sep 2011
Location: UK
Device: Kobo Sage, iPad
|
Quote:
de du d’
And would “de la” need listing separately, or would it already be covered by “de” and “la”*? |
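Whether "de la" is covered depends on how the matching is done. If only one leading article is stripped per title, then listing "de" and "la" separately is not enough, and a multi-word entry has to be tried before its shorter prefix. The Python sketch below illustrates this under that single-pass assumption; the `FRENCH` list, `build_pattern`, and `strip_article` are hypothetical names, the title is made up, and this is not a description of how calibre itself matches articles.

```python
import re

# Hypothetical French list including partitive forms (illustration only).
FRENCH = ["Le", "La", "Les", "Un", "Une", "Des", "Du", "De la", "De"]


def build_pattern(articles):
    """Build one regex that matches a single leading article."""
    # Longest first, so that "De la" is tried before "De".
    ordered = sorted(articles, key=len, reverse=True)
    # Allow any run of whitespace between the words of a multi-word article.
    alternatives = "|".join(
        r"\s+".join(re.escape(word) for word in article.split())
        for article in ordered
    )
    return re.compile(r"^(%s)\s+(.+)$" % alternatives, re.IGNORECASE)


def strip_article(title, pattern):
    """Move the matched leading article to the end of the title."""
    match = pattern.match(title)
    if match is None:
        return title
    return "%s, %s" % (match.group(2), match.group(1))


title = "De la neige en avril"  # made-up title with a partitive article

# With "De la" listed explicitly, the whole article moves.
print(strip_article(title, build_pattern(FRENCH)))        # neige en avril, De la

# With only "De" and "La" listed, a single pass strips just "De".
print(strip_article(title, build_pattern(["De", "La"])))  # la neige en avril, De
```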
|
#14 |
Avid Reader
Posts: 161
Karma: 36472
Join Date: Sep 2008
Location: Look for rain, hail and snow...
Device: PRS-505, PRS-600, PRS T1, Kobo Glo
|
|
#15 |
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I have added them, for the next release. You can add them yourself via Preferences->Tweaks in the meantime.
|
Tags |
calibre, metadata, multi-lingual support |
|