Old 07-02-2011, 12:19 PM   #1
user_none
Sigil & calibre developer
 
Posts: 2,465
Karma: 986493
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Get Books showing if result is in library

I want to add the ability for results in Get Books to be marked if they are already in the user's library. The idea is that a user who searches for an author can see at a glance which of that author's books they already have.

My first thought is to use the comparison features of the Find Duplicates plugin. However, I'm concerned about performance and I'm also unsure about the best way to do this. The big thing that makes me think I need to use Find Duplicates is the fuzzy matching.

My initial thought is to do a soundex pass over each book in the user's library, using author and title, each time Get Books is opened, and store a hash of the title and authors. Then, as each result comes in, compute its hash the same way and check whether that hash already exists for the library.
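A minimal sketch of that idea, assuming plain Python and the classic four-character Soundex. The names `soundex`, `book_key`, `build_index` and `in_library` are made up for illustration and are not calibre or Find Duplicates APIs.

```python
import hashlib
import re


def soundex(word):
    """Classic four-character Soundex code for a single word."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return ""
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (result + "000")[:4]


def book_key(title, authors):
    """Hash of the Soundex codes of every word in the title and the authors."""
    title_codes = [soundex(w) for w in re.findall(r"[A-Za-z]+", title)]
    author_codes = [soundex(w) for a in authors
                    for w in re.findall(r"[A-Za-z]+", a)]
    phonetic = " ".join(title_codes) + "|" + " ".join(author_codes)
    return hashlib.sha1(phonetic.encode("ascii")).hexdigest()


def build_index(library):
    """library is an iterable of (title, [authors]) pairs (assumed shape)."""
    return {book_key(title, authors) for title, authors in library}


def in_library(index, title, authors):
    """Set lookup for one incoming search result."""
    return book_key(title, authors) in index
```

With this, "Toll the hounds" by "Stephen Erikson" hits an index entry for "Toll the Hounds" by "Steven Erikson", but "Hobbit" does not hit "The Hobbit", because the extra word changes the key. Soundex on its own only absorbs spelling and case differences within a word.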

What I'm looking for is advice on the best way to go about this.
Old 07-02-2011, 12:27 PM   #2
kovidgoyal
creator of calibre
 
Posts: 26,323
Karma: 5382313
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
How fuzzy do you want to get? There is a find_identical_books() method in db2 that does a little bit of fuzzy matching.

But yeah the performance of this on larger libraries (where it is more useful) will be bad enough that you will likely need to run it in a separate thread like the drm status.
Old 07-02-2011, 12:32 PM   #3
user_none
Sigil & calibre developer
Quote:
Originally Posted by kovidgoyal View Post
How fuzzy do you want to get?
Fuzzy enough that these match:

Code:
Toll the Hounds (Malazan Book of the Fallen Series #8) by Steven Erikson

Toll the hounds by Steven Erikson

and

The Hobbit by J RR Tolkien

Hobbit by J. R. R. Tolkien
I'll look at find_identical_books() in db2 and see if it matches enough tests. I know it can't be 100% accurate and it can't be too fuzzy.
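As a rough illustration of "fuzzy enough", here is a sketch of a normalization under which both example pairs above compare equal. It is only an assumption about what such matching could look like, not the Find Duplicates algorithm or find_identical_books(). The function names are hypothetical.

```python
import re


def fuzzy_title(title):
    """Lowercase, drop parenthetical notes, punctuation and a leading article."""
    t = re.sub(r"\([^)]*\)", " ", title.lower())  # e.g. "(... Series #8)"
    t = re.sub(r"[^\w\s]", " ", t)
    words = t.split()
    if words and words[0] in ("a", "an", "the"):
        words = words[1:]
    return " ".join(words)


def fuzzy_author(author):
    """Lowercase and remove punctuation and spaces: 'J RR' == 'J. R. R.'."""
    return re.sub(r"[\W_]+", "", author.lower())


def fuzzy_match(a, b):
    """a and b are (title, author) pairs."""
    return (fuzzy_title(a[0]) == fuzzy_title(b[0])
            and fuzzy_author(a[1]) == fuzzy_author(b[1]))
```

Dropping every parenthetical is aggressive and can produce false positives (for example, different editions or volumes marked only in parentheses), which is the "can't be too fuzzy" side of the trade-off.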

I can easily run it in a thread and have the result appear dynamically, like covers and additional info do.
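A sketch of the threaded approach, assuming only the Python standard library. `LibraryCheckWorker`, `is_in_library` and `on_checked` are hypothetical names, and in a Qt GUI the callback would still need to hand the result over to the GUI thread (for instance through a signal) rather than touch widgets directly.

```python
import queue
import threading


class LibraryCheckWorker(threading.Thread):
    """Checks search results against the library off the GUI thread."""

    def __init__(self, is_in_library, on_checked):
        super().__init__(daemon=True)
        self._is_in_library = is_in_library  # hypothetical matcher callable
        self._on_checked = on_checked        # hypothetical result callback
        self._pending = queue.Queue()

    def submit(self, result):
        """Queue one search result for checking."""
        self._pending.put(result)

    def stop(self):
        """Ask the worker to exit once the queued results are processed."""
        self._pending.put(None)

    def run(self):
        while True:
            result = self._pending.get()
            if result is None:  # sentinel queued by stop()
                break
            self._on_checked(result, self._is_in_library(result))
```

Results are processed in the order they are submitted, so each row can be updated as soon as its check finishes, independently of how slow the library comparison is.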
Old 07-02-2011, 12:49 PM   #4
kovidgoyal
creator of calibre
Sounds like you'll need to use find duplicates in that case. Since find duplicates is supposed to be merged into the calibre code base anyway, I'd suggest you go ahead and do that (if it is ok with kiwidude).
Old 07-02-2011, 12:58 PM   #5
kiwidude
calibre/Sigil Developer
 
Posts: 4,230
Karma: 1345754
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
I had replied to user_none with a PM while I was on holiday - if he or someone else is willing to do the work to merge the Find Duplicates plugin then please go for it. It isn't a high priority to do so for me given it is stable as is. As my available Calibre dev time is going to continue to be significantly reduced for at least the next little while I would rather spend the little I have doing any fixes for my other plugins...
Old 07-05-2011, 12:53 PM   #6
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by user_none View Post
Fuzzy enough that these match:
Code:
Toll the Hounds (Malazan Book of the Fallen Series #8) by Steven Erikson
Toll the hounds by Steven Erikson

and

The Hobbit by J RR Tolkien
Hobbit by J. R. R. Tolkien
I'll look at find_identical_books() in db2 and see if it matches enough tests. I know it can't be 100% accurate and it can't be too fuzzy.

I can easily run it in a thread and have the result appear dynamically, like covers and additional info do.
find_identical_books() is mine (from AutoMerge and Copy To Library), and predates kiwidude's Find Duplicates plugin by a lot. It won't match either of those. It will fail on the first due to the content of the parenthetical in the title. The case differences wouldn't be a problem. It will fail on the second due to the differences in the author name. Periods and other punctuation get stripped from titles, but not from author names. The "The" would get stripped for those who haven't Tweaked their articles to another language.
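To make that concrete, here is a small sketch that follows only the behaviour described above: case-insensitive, punctuation stripped from titles but not from authors, leading English article dropped. It is an illustration written for this thread, not the actual find_identical_books() code, and the names are hypothetical.

```python
import re

ARTICLES = ("a", "an", "the")  # assumed English defaults, untweaked


def title_key(title):
    """Lowercase, strip punctuation, drop a leading article."""
    words = re.sub(r"[^\w\s]", "", title.lower()).split()
    if words and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)


def author_key(author):
    """Lowercase only; punctuation in author names is kept."""
    return " ".join(author.lower().split())


def same_book(a, b):
    """a and b are (title, author) pairs."""
    return (title_key(a[0]) == title_key(b[0])
            and author_key(a[1]) == author_key(b[1]))
```

Under these rules "The Hobbit" and "Hobbit" get the same title key, but "J RR Tolkien" and "J. R. R. Tolkien" stay different, and the parenthetical words remain part of the first title once the brackets and "#" are removed, so neither pair matches.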

It looks like merging kiwidude's code is your best option to have those match.

As you do so, think about whether kiwidude's code could/should be combined with find_identical_books. Perhaps the user should get an option to control how aggressive the automated duplicate finding should be for AutoMerge, Copy To Library and Get Books?