MobileRead Forums > E-Book Software > Calibre
11-05-2011, 08:06 AM   #1
ElMiko
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Query/Request/Nuisance

Hi there, calibre developers! I know you have oodles of free time, particularly since none of you have recently made a life-changing relocation of any sort.

I have a question about future functionality with respect to foreign languages in general, and articles (definite and indefinite) in particular. Namely, with calibre's increasing support for foreign-language e-text, I was curious about the possibility/desirability of a selectable function that would, in a batch (or individual) edit, appropriately adjust the leading article in the "title sort" field based on a Language selection. That is, upon selecting "French" (in a drop-down menu, or something), the edit function would use the title "Une fois pour toutes" to fill the "title sort" field with "fois pour toutes, Une". Or if you were to select "Spanish", it would use "El amor en los tiempos del cólera" in the title field to create a "title sort" of "amor en los tiempos del cólera, El". Or for Italian, use "Il padrino" to get "padrino, Il". (I'm all out of multilingual examples.)

Currently, the user can go through and do this manually, but it sets off the "not matching" red goof light. Also, it's pretty tedious (if admittedly kind of meditative).

Seriously (for a moment), I know that calibre development is a labor of love, and that the financial compensation in all likelihood falls way short of the man-hours you've all dedicated to it. So of the three words in the thread title, I hope you pay most attention to the first one. This post (and the others I've made of its ilk) is really written in the spirit of providing "food for thought", not critical scrutiny. As far as I'm concerned, every single piece of added functionality is a luxury: pure gravy, since calibre already does more than any tool I could've possibly designed myself.

And, as always, thanks for taking the time to read.
11-05-2011, 08:29 AM   #2
kovidgoyal
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
There is a problem with that, namely that title sort is not actually calculated dynamically, except on initial import. After that it is stored in the database. Often on initial import the language metadata is missing or incorrect, so adjusting the algorithm dynamically might actually yield worse results on average. Another place it can be applied dynamically is when the user clicks the auto-generate title sort button, based on the book's current language. However, for this to happen, someone will have to write the language-specific sorting code for each language.
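To make the second option concrete, here is a minimal Python sketch of what an "auto-generate title sort" step driven by the book's language could look like. The pattern table, the `title_sort` function and the fallback to English are illustrative assumptions, not calibre's code. The fallback also shows why missing language metadata hurts: an Italian title with no language set is simply left untouched.

```python
import re
from typing import Optional

# Hypothetical per-language article patterns, keyed by three-letter language code.
# Each pattern matches one leading article plus the whitespace after it.
ARTICLE_PATTERNS = {
    "eng": re.compile(r"^(A|An|The)\s+", re.IGNORECASE),
    "fra": re.compile(r"^(Le|La|Les|Un|Une)\s+", re.IGNORECASE),
    "spa": re.compile(r"^(El|La|Lo|Los|Las|Un|Una|Unos|Unas)\s+", re.IGNORECASE),
    "ita": re.compile(r"^(Il|Lo|La|I|Gli|Le)\s+", re.IGNORECASE),
}

# Assumed fallback when the book has no (or an unknown) language.
DEFAULT_LANG = "eng"


def title_sort(title: str, lang: Optional[str] = None) -> str:
    """Move a leading article to the end, e.g. 'Il padrino' -> 'padrino, Il'."""
    pattern = ARTICLE_PATTERNS.get(lang or DEFAULT_LANG, ARTICLE_PATTERNS[DEFAULT_LANG])
    match = pattern.match(title)
    if not match:
        return title  # no recognised article: keep the title as-is
    article = match.group(1)
    rest = title[match.end():]
    return f"{rest}, {article}"


print(title_sort("Une fois pour toutes", "fra"))  # fois pour toutes, Une
print(title_sort("Il padrino"))                   # Il padrino (no language, English rules)
```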
11-05-2011, 10:10 AM   #3
ElMiko
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
So if you (or the other developers) were provided with lists of articles sorted by language, would that make implementing the second kind of dynamic alteration more feasible?
11-05-2011, 10:32 AM   #4
kovidgoyal
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Certainly. If someone wrote/tested the regexes or provided a list of articles for a bunch of different languages, it should be fairly easy for me to implement. To make it worthwhile, it would have to be a fair number of languages, say about a dozen. I don't really want to go to the trouble for a couple of languages.
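Since the offer covers either regexes or plain word lists, here is a hedged sketch of turning one of these lists into a single anchored regex. `articles_to_pattern` is a hypothetical helper. It assumes that elided articles ending in an apostrophe (French L' and D’, for example) attach directly to the next word, while every other article must be followed by whitespace, and it accepts either a straight or a curly apostrophe in the title.

```python
import re
from typing import Iterable

APOSTROPHES = ("'", "\u2019")  # straight and curly apostrophe


def articles_to_pattern(articles: Iterable[str]) -> re.Pattern:
    """Build one case-insensitive regex matching any listed article at the start."""
    elided, spaced = [], []
    for word in articles:
        word = word.strip()
        if word.endswith(APOSTROPHES):
            # Elided article: accept either apostrophe, no whitespace required.
            elided.append(re.escape(word[:-1]) + "['\u2019]")
        else:
            spaced.append(re.escape(word))
    alternatives = []
    if spaced:
        # Longest first so the alternation reads naturally (Les before Le).
        words = sorted(spaced, key=len, reverse=True)
        alternatives.append(r"(?:%s)\s+" % "|".join(words))
    if elided:
        alternatives.append(r"(?:%s)" % "|".join(sorted(elided, key=len, reverse=True)))
    return re.compile(r"^(?:%s)" % "|".join(alternatives), re.IGNORECASE)


french = articles_to_pattern(["Le", "La", "L'", "Les", "Un", "Une", "Des", "De", "Du", "D’"])
print(french.match("L'Étranger").group(0))      # L'
print(french.match("Lettres persanes"))         # None
```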
11-05-2011, 01:44 PM   #5
ElMiko
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
I'm very familiar with the Romance languages, having studied three of them. As for the rest, the internet proved conveniently educational. The only one for which I'm afraid I might be missing an article or two is Greek (It's Greek to me! Ha. Ha. Oh geez, I'm too much). I also omitted the genitive case for articles, since "title sort" doesn't modify it in English (Of Mice and Men).

Additionally, I'm not hating on Slavic or Asian languages by omitting them. In the course of my research, I found that many of the Slavic languages (and Arabic!) attach their articles to the noun as affixes (or simply don't have them). As for the Asian languages, my own study of Chinese and my passing exposure to Japanese (neither of which has articles of any kind) lead me to believe the same is likely true of other East Asian languages. That said, if anyone else speaks Korean, Thai, Vietnamese, Khmer, or any of the Austronesian languages, please contribute! As for South Asian languages, I plead total ignorance, but I figure Kovid would be able to provide some input there.

Now, I realize I've only provided half (the easiest half, no less) of what you asked for. Unfortunately, regex is one language that I don't understand at all. Still, this list (and any others that forum members might contribute!) could serve as a reference in the event that someday someone gets an irresistible urge to write a bunch of regular expressions for relocating definite and indefinite articles in a title sort.

Spanish
El
La
Lo
Los
Las
Un
Una
Unos
Unas

French
Le
La
L'
Les
Un
Une
Des
De
Du
D’

Italian
Lo
Il
L'
La
Gli
I
Le

Portuguese
A [This can also mean "To" as in "To Whom It May Concern"]
O
Os
As
Um
Uns
Uma
Umas

Romanian
Un
O
Nişte [note the non-standard letter]

German
Der
Die
Das
Des
Den
Dem
Ein
Eine
Einen
Einem
Eines

Dutch
De
Het
Een

Swedish
En
Ett
Det
Den
De

Turkish
Bir

Afrikaans
'n [Never capitalized]
Die

Greek
O
I
To
Ta
Tus
Tis
'Enas
'Mia
'Ena
'Enan

Hungarian
A
Az
Egy


Last edited by ElMiko; 11-27-2011 at 12:29 PM. Reason: Updated list of articles to include Spanish "Lo"; German 4 new articles; Hungarian articles; French partitive articles
11-05-2011, 09:54 PM   #6
kovidgoyal
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Cool, please open a ticket and I will get to it once my immediate TODO pile has wound down a little.
11-10-2011, 09:25 PM   #7
enriquep
Connoisseur
Posts: 52
Karma: 8860
Join Date: Jul 2009
Location: Madrid, Spain, EU
Device: Sony PRS-505, Sony PRS-T1, Sony PRS-T3
I found this thread almost by chance, through a reference to it in Calibre's bug tracking system. I know it is an old thread, but just for the record, and in case it is of interest to someone someday, I would like to point out that the Spanish section omits the singular definite neuter article "Lo".

Therefore, the full list of standard articles in the Spanish language is:

El
La
Lo
Los
Las
Un
Una
Unos
Unas

Best from Madrid,
-Enrique

EDIT: It is not such an old thread after all. Sorry: I misread, or misinterpreted, the message dates.

Last edited by enriquep; 11-11-2011 at 07:12 AM. Reason: Added last paragraph.
11-12-2011, 11:53 AM   #8
ElMiko
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
@Enrique - Thanks! This is where a native speaker is particularly useful. I omitted "Lo" based on its English translation of "That which" or "It", neither of which would be modified in a title sort. Should a hypothetical title "Lo dudo" be sorted as "dudo, Lo"? Or "Lo que piensan" as "que piensan, Lo"? Or "Lo pensado" as "pensado, Lo"? I would have thought not, but again I'm basing this on observed English norms. As a Spaniard, you're in a better position to speak to prevailing practices in Spanish. If you could provide some confirmation, that'd be great!
11-12-2011, 02:13 PM   #9
enriquep
Connoisseur
Posts: 52
Karma: 8860
Join Date: Jul 2009
Location: Madrid, Spain, EU
Device: Sony PRS-505, Sony PRS-T1, Sony PRS-T3
Elmiko,

I understand your point, and it is extremely reasonable.

On one hand you are completely right, as I admit that Spanish sorting might sometimes sound bizarre to an English speaker. But also keep in mind that, to a Spanish speaker, "lo" is simply the singular form of "los", and it would be really difficult to understand why a given sorting rule applies to "los" and not to "lo". Also, from a Spanish speaker's point of view, if a sorting rule applies to articles it should apply to all articles, and that's it... even if it sounds weird in English (and probably in Spanish).

To all "standard" articles, at least, as I said in my previous post. We can (and perhaps should) leave out of this discussion the contractions "al" and "del", which technically are articles, although they are not on the "really 100% standard articles" list that we all memorize at school and never forget.

By the way, in the two examples you mention, "lo" does not behave as an article but as a direct object (roughly replacing the English "it", not the English "the"), and those are two cases where Calibre's manual title sort will have to come into action. But it does behave as an article in many others that do not sound so strange when translated: "Lo peor de cada casa" --> "peor de cada casa, Lo", "Lo más triste de mi vida" --> "más triste de mi vida, Lo".

Best from Madrid (well, from Málaga right now, accidentally),
-Enrique
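Enrique's point that a plain article rule cannot tell the neuter article "lo" from the pronoun "lo" is easy to demonstrate. This minimal Python sketch (the `spanish_title_sort` helper is hypothetical) uses the Spanish list from this thread: both kinds of title get rearranged, so the pronoun cases are the ones that would still need a manual title sort.

```python
import re

# Spanish articles from this thread, including the neuter "Lo".
SPANISH_ARTICLES = re.compile(r"^(El|La|Lo|Los|Las|Un|Una|Unos|Unas)\s+", re.IGNORECASE)


def spanish_title_sort(title: str) -> str:
    """Move a leading Spanish article to the end of the title."""
    m = SPANISH_ARTICLES.match(title)
    if not m:
        return title
    return f"{title[m.end():]}, {m.group(1)}"


# "Lo" used as an article: moving it is what a Spanish reader expects.
print(spanish_title_sort("Lo peor de cada casa"))  # peor de cada casa, Lo
# "Lo" used as a direct-object pronoun: the regex cannot see the difference,
# so this title is rearranged too and would need a manual override.
print(spanish_title_sort("Lo dudo"))  # dudo, Lo
```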
11-12-2011, 07:29 PM   #10
ElMiko
Evangelist
Posts: 450
Karma: 65460
Join Date: Jun 2011
Device: Kindle
Awesome! Thanks! Went ahead and added it to my list, too.
11-26-2011, 12:06 AM   #11
SPQR10
Junior Member
Posts: 1
Karma: 300
Join Date: Nov 2011
Location: Hungary
Device: Kindle 4
Cool

Hungarian
A
Az
Egy
11-26-2011, 11:58 AM   #12
Mixx
Zealot
Posts: 143
Karma: 387
Join Date: Sep 2010
Device: Kindle 3
On a related note:

I see now (version 0.8.28) that there is a language-specific list of values in Preferences/Tweaks/Set of words considered to be articles, called

per_language_title_sort_articles

Where do you take the language code for this list from? "German" is "deu", it says. Is that some kind of official code? Are they all three characters? How about other languages? Why not the usual language codes (en, de, ...)?

I suppose that the same code is to be used for the other tweak

default_language_for_title_sort

Can anyone enlighten me, please? I'd like to use this facility from now on.

Thanxx, Mixx
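For what it's worth, "deu" is the three-letter ISO 639-2/T (also ISO 639-3) code for German, so the tweak appears to key on three-letter codes rather than the two-letter ISO 639-1 ones. Assuming that holds for every language, a lookup like the following sketch converts the familiar codes. The table is hand-written for the languages in this thread, and `to_three_letter` is a hypothetical helper, not part of calibre.

```python
# Hand-written lookup (hypothetical, not part of calibre): ISO 639-1 two-letter
# codes -> ISO 639-2/T three-letter codes for the languages in this thread.
ISO_639_1_TO_3 = {
    "en": "eng", "es": "spa", "fr": "fra", "it": "ita", "pt": "por",
    "ro": "ron", "de": "deu", "nl": "nld", "sv": "swe", "tr": "tur",
    "af": "afr", "el": "ell", "hu": "hun",
}


def to_three_letter(code: str) -> str:
    """Return a three-letter code; codes that already have three letters pass through."""
    code = code.strip().lower()
    if len(code) == 3:
        return code
    try:
        return ISO_639_1_TO_3[code]
    except KeyError:
        raise ValueError(f"no three-letter mapping for {code!r}") from None


print(to_three_letter("de"))  # deu
```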
11-26-2011, 12:11 PM   #13
Insalata
Zealot
Posts: 100
Karma: 2092
Join Date: Sep 2011
Location: UK
Device: Kobo Sage, iPad
Quote:
French
Le
La
L'
Les
Un
Une
Des
I'm not a native (or even very good) French speaker, so I don't know all the sorting conventions, but I have lots of French books, so I'm interested in this option. If you're including “des” then you'd also need:

de
du
d’

And would “de la” need listing separately, or would it already be covered by “de” and “la”?
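Whether French conventions want "de la" moved as a unit is a question for native speakers, but the mechanical side can be sketched. Assuming the sorting makes a single regex pass, listing only "de" and "la" as separate words moves just one of them and leaves "la" at the front, while listing "de la" as its own alternative, tried before the single words, moves both. The two patterns and `move_article` are illustrative only.

```python
import re

# Only the single words "de" and "la" listed.
ONLY_DE = re.compile(r"^(De|La)\s+", re.IGNORECASE)
# "De la" listed as its own alternative, tried before the single words.
WITH_DE_LA = re.compile(r"^(De\s+la|De|La)\s+", re.IGNORECASE)


def move_article(pattern: re.Pattern, title: str) -> str:
    """Move whatever the pattern matched at the start of the title to the end."""
    m = pattern.match(title)
    if not m:
        return title
    return f"{title[m.end():]}, {m.group(1)}"


print(move_article(ONLY_DE, "De la terre à la lune"))     # la terre à la lune, De
print(move_article(WITH_DE_LA, "De la terre à la lune"))  # terre à la lune, De la
```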
11-26-2011, 06:39 PM   #14
opitzs
Avid Reader
Posts: 161
Karma: 36472
Join Date: Sep 2008
Location: Look for rain, hail and snow...
Device: PRS-505, PRS-600, PRS T1, Kobo Glo
Quote:
Originally Posted by ElMiko
German
Der
Die
Das
Den
Ein
Eine
Einen
I notice "Dem" is missing here, but that is exceedingly rare.
11-26-2011, 09:15 PM   #15
kovidgoyal
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I have added them for the next release. You can add them yourself via Preferences->Tweaks in the meantime.
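For anyone adding an entry via Tweaks in the meantime, here is a rough sketch of what a Hungarian entry might look like. It assumes the tweak maps each three-letter language code to a tuple of regex strings, one per article, which is how the existing entries appear to be written; check the default tweak text in your version before copying. The `strip_article` helper is hypothetical and exists only to check that the patterns behave.

```python
import re

# Hypothetical Hungarian entry in the shape assumed for the
# per_language_title_sort_articles tweak: a tuple of regex strings,
# each matching one article plus the whitespace after it.
per_language_title_sort_articles = {
    # Hungarian
    'hun': (r'A\s+', r'Az\s+', r'Egy\s+'),
}


def strip_article(title: str, lang: str) -> str:
    """Check the patterns: move the first matching leading article to the end."""
    for pattern in per_language_title_sort_articles.get(lang, ()):
        m = re.match(pattern, title)
        if m:
            return f"{title[m.end():]}, {title[:m.end()].rstrip()}"
    return title


print(strip_article("Az ember tragédiája", "hun"))  # ember tragédiája, Az
```

Note that "A" is tried before "Az" but cannot steal the match, because `A\s+` requires whitespace right after the "A".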
Tags
calibre, metadata, multi-lingual support

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.