06-13-2011, 07:46 AM | #76 | |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Quote:
de der des (van de) (van der) (van den) ter ten den |
|
06-13-2011, 07:48 AM | #77 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
After the last update, the duplicate scan is a lot slower. Does this have something to do with the highlight fix you made to the quality scan? Or was that fix not added to this plugin?
|
|
06-13-2011, 07:55 AM | #78 |
Grand Sorcerer
Posts: 11,731
Karma: 6690881
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Adding to the honorific list, you should include "de la", "von", "del", and "della". To be picky, they should be in lower case. In French, the de in "De XXX" is not an indication of nobility, while in "de XXX" it is. I think the same is true for von in German and van in Dutch.
|
06-13-2011, 08:01 AM | #79 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
I added "van" because Kovid had added it to the Calibre code a while ago.
This is the full list of words that are currently ignored by this plugin in an author's name (except for identical author searches): 'von', 'van', 'jr', 'sr', 'i', 'ii', 'iii', 'second', 'third', 'md', 'phd'

@drMerry - no changes were made that would affect performance.
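A minimal Python sketch of how an ignore list like the one above could be applied to a name that has already been split into tokens. The word list comes from the post; the helper name `significant_tokens` and the lower-case comparison are assumptions, not the plugin's actual code.

```python
# Words the post lists as ignored when comparing author names
# (except for identical-author searches). Assumption: comparison is case-insensitive.
IGNORED_AUTHOR_WORDS = frozenset(
    ["von", "van", "jr", "sr", "i", "ii", "iii", "second", "third", "md", "phd"]
)


def significant_tokens(tokens):
    """Hypothetical helper: drop ignored words from a tokenized author name."""
    return [t for t in tokens if t.lower() not in IGNORED_AUTHOR_WORDS]


print(significant_tokens(["Ludwig", "van", "Beethoven"]))      # ['Ludwig', 'Beethoven']
print(significant_tokens(["Martin", "Luther", "King", "Jr"]))  # ['Martin', 'Luther', 'King']
```

Because "i" is in the list, this sketch would also drop a lone initial "I".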
06-13-2011, 08:04 AM | #80 | |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Quote:
Last edited by drMerry; 06-13-2011 at 08:10 AM. Reason: Quality check idea should not be posted here |
|
|
06-13-2011, 08:06 AM | #81 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Then what about:
dr dr. prof m.d. ph.d. sr. jr. et al.
06-13-2011, 08:18 AM | #82 |
Connoisseur
Posts: 58
Karma: 10
Join Date: Mar 2011
Device: Kindle 3 3G
|
Oh ohh, there I have started something. ;-)
|
06-13-2011, 08:26 AM | #83 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Something completely different:
Another problem is maiden names. If A. B. marries C. D. and the author uses both surnames (B. and D.), the name could be written as A. D.-B. (seen as one name in calibre) or as A. D. B. (The author might also keep using only B. or switch to only D., but that cannot be checked, since it would mean comparing B. against D.) It would also be recognized when the author swaps B. and D. So I think "-" should be handled as " " in this plugin.
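To illustrate the suggestion, here is a hypothetical `name_key` helper that treats "-" (and ".") as a space and sorts the tokens, so the order of the two surnames no longer matters. Sorting is an assumption added to cover the swapped-name case; it is not taken from the plugin.

```python
import re


def name_key(author):
    """Hypothetical comparison key: '-' and '.' become spaces,
    case is ignored, and tokens are sorted so word order does not matter."""
    cleaned = re.sub(r"[-.]", " ", author.lower())
    return tuple(sorted(cleaned.split()))


variants = ["A. D.-B.", "A. D. B.", "A. B.-D."]
print({name_key(v) for v in variants})  # {('a', 'b', 'd')}
```

All three spellings collapse to the same key, which is the behaviour the post asks for.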
06-13-2011, 08:33 AM | #84 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
And of course:
Sir, etc. Maybe you could implement a way for users to add these prefixes and suffixes manually, since everybody seems to need different ones depending on local usage.
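One possible shape for user-defined prefixes and suffixes, sketched in Python. The comma-separated setting string and both functions are hypothetical (the plugin is not known to offer this). Multi-word entries such as "van der" or "de la" are matched as token sequences, longest first, so "van der" wins over a shorter entry like "van".

```python
def parse_user_particles(setting):
    """Hypothetical: turn 'sir, dr, van der, de la' into a set of token tuples."""
    return {tuple(p.split()) for p in setting.lower().split(",") if p.strip()}


def strip_particles(tokens, particles):
    """Remove any listed particle (single or multi-word) from the tokens,
    trying the longest particles first."""
    lengths = sorted({len(p) for p in particles}, reverse=True)
    lowered = [t.lower() for t in tokens]
    result, i = [], 0
    while i < len(tokens):
        for n in lengths:
            if tuple(lowered[i:i + n]) in particles:
                i += n  # skip the whole particle
                break
        else:
            result.append(tokens[i])  # no particle starts here; keep the token
            i += 1
    return result


particles = parse_user_particles("sir, dr, van der, de la")
print(strip_particles(["Sir", "Arthur", "Conan", "Doyle"], particles))  # ['Arthur', 'Conan', 'Doyle']
print(strip_particles(["Johannes", "van", "der", "Waals"], particles))  # ['Johannes', 'Waals']
```

Stripping more words makes more names collide, so a user list like this trades fewer missed duplicates for more false positives.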
06-13-2011, 08:34 AM | #85 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@drMerry - the plugin already strips a whole bunch of characters out of the names: "-+.:;" so most of what you have posted above is already taken care of.
|
06-13-2011, 08:47 AM | #86 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
I had a case yesterday that it missed; I hope I can find it for you.
What do you mean by stripping? Does a-b become ab or a b? And what about a - b, or a b with several spaces in between? Does this have negative effects, or is more than one space always treated as one (also in soundex)?
06-13-2011, 08:58 AM | #87 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
Those particular characters get replaced with spaces, so a-b will become "a b". Other characters get removed without space substitution.
You have the code in the plugin - look in algorithms.py for "get_author_tokens()". |
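A rough Python approximation of the behaviour described above, not the actual `get_author_tokens()` code. The post does not say exactly which "other characters" are removed, so this sketch assumes any character that is not a letter, digit, underscore, or whitespace. It also shows that `str.split()` with no argument collapses runs of spaces, which answers the multiple-spaces question for this sketch only.

```python
import re

# Characters the post says are replaced with spaces.
SPACE_CHARS = re.compile(r"[-+.:;]")
# Assumption: every other non-word, non-space character is simply removed.
DROP_CHARS = re.compile(r"[^\w\s]")


def author_tokens_sketch(author):
    """Illustrative only; not the plugin's get_author_tokens()."""
    text = SPACE_CHARS.sub(" ", author)
    text = DROP_CHARS.sub("", text)
    # split() with no argument treats any run of whitespace as one separator
    # and ignores leading/trailing whitespace.
    return text.lower().split()


print(author_tokens_sketch("a-b"))      # ['a', 'b']
print(author_tokens_sketch("a - b"))    # ['a', 'b']
print(author_tokens_sketch("O'Brien"))  # ['obrien']
```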
06-13-2011, 09:17 AM | #88 |
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
|
Thanks.
I think the problem was the ";" part. A "," is assumed to mean ln, fn (last name, first name). I think there is another way: if the field matches "((\w+)\s(\w+)\s*([,;]|$)\s*){2,}", you could (safely?) assume it is not ln, fn but multiple authors.
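A Python sketch of the multiple-author check, adapted from the regex idea in the post (the anchors and separator handling are changes made here). `looks_like_multiple_authors` is a hypothetical helper, and it only recognizes names of exactly two words, so fields like "J. R. R. Tolkien, C. S. Lewis" or "Smith, John; Doe, Jane" would not be flagged.

```python
import re

# Two or more "word word" names, each followed by ',' or ';' or the end of the field.
MULTI_AUTHOR = re.compile(r"^(?:\w+\s\w+\s*(?:[,;]|$)\s*){2,}$")


def looks_like_multiple_authors(field):
    """Hypothetical check: True if the field looks like 'fn ln, fn ln, ...'
    rather than a single 'ln, fn' author."""
    return MULTI_AUTHOR.match(field.strip()) is not None


print(looks_like_multiple_authors("John Smith, Jane Doe"))  # True
print(looks_like_multiple_authors("Smith, John"))           # False
```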
06-13-2011, 09:22 AM | #89 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
I don't believe this plugin is the correct place to start reinterpreting author names as multiple authors. It is crappy metadata, garbage in, garbage out.
I would suggest it is a Quality Check type thing, except the solution is so simple it is not worth bothering Quality Check with, since you can just search authors:; |
06-13-2011, 04:29 PM | #90 |
Enjoy Life
Posts: 26
Karma: 10
Join Date: Jun 2011
Location: Portugal
Device: Kindle
|
I was looking for this. Thank you
|
Tags |
cross library duplicates, in library duplicates |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
[GUI Plugin] Quality Check | kiwidude | Plugins | 1184 | Yesterday 06:17 PM |
[GUI Plugin] View Manager | kiwidude | Plugins | 414 | 04-13-2024 01:41 PM |
[GUI Plugin] Open With | kiwidude | Plugins | 403 | 04-01-2024 08:39 AM |
[GUI Plugin] Generate Cover | kiwidude | Plugins | 811 | 03-16-2024 11:31 PM |
[GUI Plugin] Plugin Updater **Deprecated** | kiwidude | Plugins | 159 | 06-19-2011 12:27 PM |