06-13-2011, 07:46 AM #76
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Quote:
Originally Posted by Loeffel
May I ask what else you have in the exception list? I assume "Mc" and "von", if I understand this right.
Ahead of kiwidude's reply, here are some other particles that should be in this group as well (Dutch):
de
der
des
(van de)
(van der)
(van den)
ter
ten
den
06-13-2011, 07:48 AM #77
drMerry
Addict
After the last update, the duplicate scan is a lot slower. Does this have something to do with the highlight fix you made for the quality scan, or was that fix not added to this plugin?
06-13-2011, 07:55 AM #78
chaley
"chaley", not "charley"
Posts: 5,264
Karma: 821512
Join Date: Jan 2010
Location: France
Device: Many android devices
Adding to the honorific list, you should include "de la", "von", "del", and "della". To be picky, they should be in lower case. In French, the "de" in "De XXX" is not an indication of nobility, while in "de XXX" it is. I think the same is true for "von" in German and "van" in Dutch.
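A minimal Python sketch of chaley's rule, assuming a hypothetical `strip_particles()` helper and a particle list assembled from the suggestions in this thread (it is not the plugin's code). Particles are matched case-sensitively, so "de" is dropped but "De" is kept, and multi-word particles such as "de la" or "van der" are tried before single words.

```python
# Hypothetical particle list built from the suggestions in this thread.
# Multi-word particles are tuples so they can be matched as one unit.
PARTICLES = [
    ("van", "der"), ("van", "den"), ("van", "de"), ("de", "la"),
    ("von",), ("van",), ("de",), ("der",), ("den",), ("des",),
    ("del",), ("della",), ("ter",), ("ten",),
]


def strip_particles(name: str) -> list[str]:
    """Drop lower-case particles from a name; capitalised "De" etc. are kept."""
    words = name.split()
    # Longest particles first, so "de la" is consumed before "de".
    ordered = sorted(PARTICLES, key=len, reverse=True)
    result = []
    i = 0
    while i < len(words):
        for particle in ordered:
            n = len(particle)
            # Case-sensitive comparison: only exact lower-case matches count.
            if tuple(words[i:i + n]) == particle:
                i += n
                break
        else:
            # No particle starts here, so this word is part of the name.
            result.append(words[i])
            i += 1
    return result


print(strip_particles("Jean de la Fontaine"))  # ['Jean', 'Fontaine']
print(strip_particles("Danny De Vito"))        # ['Danny', 'De', 'Vito']
```

Trying the longest particle first matters: if "de" were checked before "de la", "Jean de la Fontaine" would keep "la" as a name token.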
06-13-2011, 08:01 AM #79
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
I added "van" because Kovid had added it to the Calibre code a while ago.

This is the full list of words that are currently ignored by this plugin in an author's name (except for identical author searches):
'von', 'van', 'jr', 'sr', 'i', 'ii', 'iii', 'second', 'third', 'md', 'phd'
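A sketch of how an ignore list like this could be applied to already-tokenised author names in Python. `IGNORED_AUTHOR_WORDS` and `significant_tokens()` are hypothetical names, and lower-casing each token before the lookup is an assumption. When copying such a list into Python, keep a comma between every entry: adjacent string literals are concatenated, so `'ii' 'iii'` would become the single word `'iiiii'`.

```python
# The ignore list as quoted in the post (hypothetical constant name).
IGNORED_AUTHOR_WORDS = frozenset(
    ['von', 'van', 'jr', 'sr', 'i', 'ii', 'iii', 'second', 'third', 'md', 'phd']
)


def significant_tokens(tokens: list[str]) -> list[str]:
    """Return the tokens of an author name that are not on the ignore list.

    Tokens are lower-cased for the lookup; that is an assumption of this
    sketch, not something stated about the plugin.
    """
    return [t for t in tokens if t.lower() not in IGNORED_AUTHOR_WORDS]


print(significant_tokens(["Kurt", "Vonnegut", "Jr"]))  # ['Kurt', 'Vonnegut']
```

Because the comparison is case-insensitive, this sketch cannot make the "De"/"de" distinction chaley describes.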

@drMerry - no changes were made that would affect performance.
06-13-2011, 08:04 AM #80
drMerry
Addict
Quote:
Originally Posted by chaley
Adding to the honorific list, you should include "de la", "von", "del", and "della". To be picky, they should be in lower case. In French, the "de" in "De XXX" is not an indication of nobility, while in "de XXX" it is. I think the same is true for "von" in German and "van" in Dutch.
So then the question is how this function is implemented. And, maybe more importantly: does this mean that "Peter Pan de XXX" should be treated as a duplicate of "Peter Pan"? Or should you just check "Peter Pan" against "Peter Pan XXX"?

Last edited by drMerry; 06-13-2011 at 08:10 AM. Reason: Quality check idea should not be posted here
06-13-2011, 08:06 AM #81
drMerry
Addict
Then what about:
dr
dr.
prof
m.d.
ph.d.
sr.
jr.
et al.
06-13-2011, 08:18 AM #82
Loeffel
Connoisseur
Posts: 58
Karma: 10
Join Date: Mar 2011
Device: Kindle 3 3G
Oh ohh, there I have started something. ;-)
06-13-2011, 08:26 AM #83
drMerry
Addict
Something completely different:
Another problem is the maiden name.
If A. B. marries C. D. and the author uses both surnames (B. and D.), the name could be written as:
A. D.-B. (seen as one name in calibre)
A. D. B.
(Other options are that the author still uses only B. or only D., but that is not checkable: B cannot be matched against D.)
If the author swaps B and D, that should be recognized as well.
So I think "-" should be handled as " " (a space) in this plugin.
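drMerry's suggestion can be sketched in Python by treating "-" as a space and then comparing the name tokens as a sorted tuple, so that word order no longer matters. `name_key()` is a hypothetical helper; also stripping "." and lower-casing are assumptions of this sketch.

```python
def name_key(author: str) -> tuple[str, ...]:
    """Order-insensitive comparison key for an author name (hypothetical)."""
    # Treat '-' as a space, as suggested, and drop dots after initials.
    cleaned = author.replace("-", " ").replace(".", " ").lower()
    # Sorting makes "A. D.-B." and "A. B. D." produce the same key.
    return tuple(sorted(cleaned.split()))


print(name_key("A. D.-B."))  # ('a', 'b', 'd')
print(name_key("A. B. D."))  # ('a', 'b', 'd')
```

Sorting every token is a blunt choice: it also makes "Anne Mary Smith" and "Mary Anne Smith" identical, which may or may not be wanted for duplicate detection.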
06-13-2011, 08:33 AM #84
drMerry
Addict
And of course:
Sir, etc.
Maybe you could implement a way for users to add these prefixes and suffixes manually, since everybody seems to need different ones depending on local usage.
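One way to let users add their own prefixes and suffixes is to merge a comma-separated preference string with a built-in default set. This is a sketch only: `DEFAULT_IGNORED`, `build_ignore_set()` and the idea of a single text preference are hypothetical, and removing dots so that "Dr." and "dr" compare equal is an assumption.

```python
# Hypothetical defaults, taken from the list kiwidude quotes in this thread.
DEFAULT_IGNORED = frozenset(
    {'von', 'van', 'jr', 'sr', 'i', 'ii', 'iii', 'second', 'third', 'md', 'phd'}
)


def build_ignore_set(user_setting: str) -> frozenset[str]:
    """Merge a comma-separated user preference with the default ignore words."""
    extra = set()
    for entry in user_setting.split(","):
        # Normalise: trim, lower-case, and drop dots ("Dr." -> "dr").
        word = entry.strip().lower().replace(".", "")
        if word:  # skip empty entries such as a trailing comma
            extra.add(word)
    return DEFAULT_IGNORED | extra


print(sorted(build_ignore_set("Sir, Dr., prof") - DEFAULT_IGNORED))
# ['dr', 'prof', 'sir']
```

Multi-word entries such as "van der" would stay as a single string here and would need phrase matching, which this sketch does not do.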
06-13-2011, 08:34 AM #85
kiwidude
calibre/Sigil Developer
@drMerry - the plugin already strips a whole bunch of characters out of the names: "-+.:;" so most of what you have posted above is already taken care of.
06-13-2011, 08:47 AM #86
drMerry
Addict
I had a case yesterday that it missed; I hope I can find it for you.
What do you mean by stripping? Does a-b become "ab" or "a b"?
And "a - b"?
Or "a b" with two, three or four spaces between the letters?
Does this have negative effects, or is more than one space always treated as one (also in soundex)?
06-13-2011, 08:58 AM #87
kiwidude
calibre/Sigil Developer
Those particular characters get replaced with spaces, so a-b will become "a b". Other characters are removed without a space being substituted.

You have the code in the plugin - look in algorithms.py for "get_author_tokens()".
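For readers without the plugin source to hand, here is a rough Python approximation of the behaviour described above. It is not the plugin's `get_author_tokens()`: treating "other characters" as anything that is neither alphanumeric nor whitespace, and lower-casing the result, are assumptions. In this sketch, runs of spaces cause no harm because `str.split()` with no argument collapses any run of whitespace.

```python
SPACE_CHARS = "-+.:;"  # characters said to be replaced by a space


def author_tokens(author: str) -> list[str]:
    """Approximate tokenising of an author name (not the plugin's code)."""
    chars = []
    for ch in author:
        if ch in SPACE_CHARS:
            chars.append(" ")
        elif ch.isalnum() or ch.isspace():
            chars.append(ch)
        # Any other character (assumed: punctuation such as ' or ,) is
        # dropped without a replacement space.
    # split() with no argument collapses runs of whitespace, so
    # "a - b" and "a    b" both give ['a', 'b'].
    return "".join(chars).lower().split()


print(author_tokens("a - b"))         # ['a', 'b']
print(author_tokens("O'Brien, Pat"))  # ['obrien', 'pat']
```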
06-13-2011, 09:17 AM #88
drMerry
Addict
Thanks.
I think the problem was the ";" part.
A "," is assumed to mean "ln, fn" (last name, first name).
I think there is another way:
if the name matches "(((\w+)\s(\w+))+[,;$]){2,}", you could (safely?) assume it is not "ln, fn" but multiple authors.
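A sketch of drMerry's heuristic in Python. The regular expression is a reworked version of the one in the post: inside `[...]` a `$` is a literal dollar sign rather than end-of-string, so the separator is written as `(?:[,;]\s*|$)`, and `re.fullmatch` anchors the pattern to the whole field. `MULTI_AUTHOR` and `looks_like_multiple_authors()` are hypothetical names.

```python
import re

# Two or more "Word Word" units, each followed by ',' or ';' or end of string.
MULTI_AUTHOR = re.compile(r"(?:\w+\s+\w+\s*(?:[,;]\s*|$)){2,}")


def looks_like_multiple_authors(author: str) -> bool:
    """True if the field looks like "Fn Ln; Fn Ln" rather than "Ln, Fn"."""
    return MULTI_AUTHOR.fullmatch(author.strip()) is not None


print(looks_like_multiple_authors("John Smith; Jane Doe"))  # True
print(looks_like_multiple_authors("Smith, John"))           # False
```

It only recognises two-word names, so a field like "Mary Anne Smith; John Doe" is not flagged.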
06-13-2011, 09:22 AM #89
kiwidude
calibre/Sigil Developer
I don't believe this plugin is the correct place to start reinterpreting author names as multiple authors. It is crappy metadata, garbage in, garbage out.

I would suggest it is a Quality Check type thing, except the solution is so simple it is not worth bothering Quality Check with, since you can just search authors:;
06-13-2011, 04:29 PM #90
madeinlisboa
Enjoy Life
Posts: 24
Karma: 10
Join Date: Jun 2011
Location: Portugal
Device: Kindle
I was looking for this. Thank you

Tags
cross library duplicates, in library duplicates

Similar Threads
Thread Thread Starter Forum Replies Last Post
[GUI Plugin] Generate Cover kiwidude Plugins 489 08-15-2014 09:39 AM
[GUI Plugin] Quality Check kiwidude Plugins 738 08-02-2014 10:06 PM
[GUI Plugin] View Manager kiwidude Plugins 82 08-01-2014 12:37 PM
[GUI Plugin] Open With kiwidude Plugins 228 07-31-2014 01:06 AM
[GUI Plugin] Plugin Updater **Deprecated** kiwidude Plugins 159 06-19-2011 12:27 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.