Old 01-21-2011, 11:28 AM   #1
clittle
Zealot
Posts: 109
Karma: 12788
Join Date: Nov 2010
Device: Kindle 3
How to merge and eliminate duplicates

Hi,

I have a lot of duplicate books and others where the same book but different formats are listed separately.

Is there any easy way to merge formats and delete duplicates?

Thanks
Old 01-21-2011, 11:30 AM   #2
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Select all entries of one book and right click. There should be several merge options available.
Old 01-21-2011, 12:05 PM   #3
theducks
Grand Sorcerer
Posts: 15,219
Karma: 5940081
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by clittle View Post
Hi,

I have a lot of duplicate books and others where the same book but different formats are listed separately.

Is there any easy way to merge formats and delete duplicates?

Thanks
1) Select the destination entry FIRST, then select the additional entries. Press M (merge, then delete the merged-in entries) or right-click for more options.
Old 01-21-2011, 02:51 PM   #4
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by theducks View Post
1) Select the Destination entry FIRST, then select additional entries. M (for merge with delete) or Right-click for more options.
That's the manual merge method, which is usually the best. OTOH, if he has lots of these duplicates and all have approximately the same metadata, another option is to do it automatically. To do that he can turn on the autosort/automerge option in Export/Import|Adding Books, then copy the entire library into a new library. This process will check each book as it is copied into the new library and when it finds a book that has the same author and nearly the same title as a book that was previously copied, Calibre will copy the new format into the previous record. This method is not suitable for cases where the author/title differ significantly or where the metadata of the first record is worse than the metadata for later books.
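A minimal sketch of the kind of comparison described above, assuming "nearly the same title" means the titles are equal after lowercasing, dropping punctuation and removing a leading article. The exact rules Calibre applies are not spelled out in this thread, so `fuzzy_title` and `same_book` are hypothetical names for illustration only.

```python
import re

# Assumption: these are the leading articles worth ignoring when comparing titles.
LEADING_ARTICLES = ("a ", "an ", "the ")


def fuzzy_title(title: str) -> str:
    """Hypothetical normalisation: lowercase, strip punctuation, drop a leading article."""
    t = title.lower()
    t = re.sub(r"[^\w\s]", " ", t)      # replace punctuation with spaces
    t = re.sub(r"\s+", " ", t).strip()  # collapse runs of whitespace
    for article in LEADING_ARTICLES:
        if t.startswith(article):
            t = t[len(article):]
            break
    return t


def same_book(a, b) -> bool:
    """a and b are (author, title) tuples.

    The author must match exactly; the title only has to match after normalisation.
    """
    return a[0] == b[0] and fuzzy_title(a[1]) == fuzzy_title(b[1])
```

With these rules, `("Frank Herbert", "Dune")` and `("Frank Herbert", "dune.")` count as the same book, while `"J. R. R. Tolkien"` and `"J.R.R. Tolkien"` do not, because the author strings differ.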
Old 02-05-2011, 06:22 PM   #5
whodean
Member
Posts: 19
Karma: 10
Join Date: Aug 2010
Device: iPad
In the automerge option method you describe, Starson, how would I determine which version would be copied?
Old 02-05-2011, 08:03 PM   #6
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by whodean View Post
In the automerge option method you describe, Starson, how would I determine which version would be copied?
It's automatic. You don't have control. The first version of each format for each book sent into the new library is kept and any duplicates of that format are ignored.
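The "first one wins" behaviour can be sketched as follows. This is only an illustration of the rule as stated in this post, not Calibre's code: `copy_with_automerge` is a hypothetical function, and the title comparison is reduced to a lowercase match as a stand-in for the real title matching.

```python
def copy_with_automerge(books):
    """Simulate copying books into an empty library with automerge on.

    books: iterable of (author, title, fmt, path) tuples, in processing order.
    Returns {(author, title_key): {fmt: path}}.
    """
    library = {}
    for author, title, fmt, path in books:
        # Assumption: a simplified key stands in for the real author/title matching.
        key = (author, title.strip().lower())
        record = library.setdefault(key, {})
        # setdefault keeps the first path seen for a format;
        # a later duplicate of the same format is silently ignored.
        record.setdefault(fmt, path)
    return library


incoming = [
    ("Frank Herbert", "Dune", "EPUB", "dune_v1.epub"),
    ("Frank Herbert", "dune", "MOBI", "dune.mobi"),
    ("Frank Herbert", "Dune ", "EPUB", "dune_v2.epub"),  # duplicate EPUB, dropped
]
merged = copy_with_automerge(incoming)
```

Here the record ends up with `dune_v1.epub` and `dune.mobi`. The second EPUB never makes it into the new library, which is why the processing order matters.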
Old 02-07-2011, 02:46 AM   #7
Calliastra
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2011
Device: none
Quote:
Originally Posted by Starson17 View Post
That's the manual merge method, which is usually the best. OTOH, if he has lots of these duplicates and all have approximately the same metadata, another option is to do it automatically. To do that he can turn on the autosort/automerge option in Export/Import|Adding Books, then copy the entire library into a new library. This process will check each book as it is copied into the new library and when it finds a book that has the same author and nearly the same title as a book that was previously copied, Calibre will copy the new format into the previous record. This method is not suitable for cases where the author/title differ significantly or where the metadata of the first record is worse than the metadata for later books.
Starson, I really appreciate you taking the time to help!! I have a similar problem and am not sure how to tackle it. I started out with a large number of ebooks, probably about 12-15K. I imported them into Calibre and now I have almost 40K and loads of duplicates. The problem with going down the list and deleting the one or two (or more!) extras is that the DB is really bogging down. I am a very new user, but am a programmer/software tester so I understand the lingo. Can you give me a short set of instructions and then perhaps I can techwrite them into a more complete help item? From what I've googled up, it looks like this is a common question.

Part of what I am wondering is whether it would be worth organizing the books properly (author, title, series), downloading metadata, or doing any other prework that would make the duplicate matching process more effective or streamlined.

P.S. Happy to help out in testing or other tech stuff as needed too since I am currently out of work.

Last edited by Calliastra; 02-07-2011 at 02:49 AM. Reason: incomplete thought
Old 02-07-2011, 03:27 AM   #8
kiwidude
calibre/Sigil Developer
Posts: 4,230
Karma: 1345754
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Quote:
Originally Posted by Calliastra View Post
Starson, I really appreciate you taking the time to help!! I have a similar problem and am not sure how to tackle it. I started out with a large number of ebooks, probably about 12-15K. I imported them into Calibre and now I have almost 40K and loads of duplicates. The problem with going down the list and deleting the one or two (or more!) extras is that the DB is really bogging down. I am a very new user, but am a programmer/software tester so I understand the lingo. Can you give me a short set of instructions and then perhaps I can techwrite them into a more complete help item? From what I've googled up, it looks like this is a common question.
I will butt in since this is a topic of great interest to me currently. Firstly, have you read the Duplicate Detection thread in this forum? That discusses some changes and additions to Calibre we are in the process of making. Feedback on that thread as to what sounds useful or not is always welcomed (particularly as the plugin which will "find" duplicates has not been written yet and there's a few ways we can approach it).

As to "instructions", from a Calibre perspective Starson has given you what you need to do if you decide to try that approach. You just need to be aware of the implications:
- It will only find duplicates where the authors exactly match. There is no "fuzzy matching" on authors.
- You really have very little control over which version will be kept if you have duplicates of a format. As Starson says above it is done by order of "selection" - but if you are doing a bulk library all at once that "selection order" won't mean too much. You could maybe sort by date or something, but unless you investigate each book one by one you won't know which version to keep and it could be pot luck. And if you were doing it one by one, controlling selections, you wouldn't need Starson's approach and would just use Merge instead.
Quote:
Part of what I am wondering is whether it would be worth organizing the books properly (author, title, series), downloading metadata, or doing any other prework that would make the duplicate matching process more effective or streamlined.
There's a few other threads in the forum if you look around at approaches people have taken. At the moment I have my own tool outside of Calibre that does fuzzy matches of authors and/or titles, doing direct sql queries against the Calibre database. Other people have their own tools/scripts, some of which were made available. Hopefully we will have a Calibre plugin soon (I've offered to write it but anyone is welcome to beat me to it), but we need to make decisions about it before I start and that discussion should be kept to the other thread.

Certainly the 1.0 version may "only" have the exact same comparison logic Starson's automerge functionality has: exact match on author, fuzzy on title. In that case, in terms of cleanup preparation, getting any author dups sorted is going to greatly increase the success of any dup search on top. If you don't want to resort to sql, just use the tag browser on the left to look down your authors list, and with its alphabetical sorting you can hopefully spot a lot of the common issues like typos, initials, spacings, abbreviations of names etc. Stuff like "E.E.Doc Smith", "E. E. 'Doc' Smith", "E. E. Smith" etc. - rename the "wrong" author variations and get them down to one.
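If you would rather have a script point out likely author variants than eyeball the list, something along these lines could work. It is a rough sketch, assuming names are written with the surname last; `author_key` and `suspicious_author_groups` are hypothetical helpers, and the key (surname plus first initial) is deliberately crude, so treat the output as candidates for manual review and not as confirmed duplicates.

```python
import re
from collections import defaultdict


def author_key(name: str):
    """Crude grouping key: (surname, first initial), both lowercased.

    Assumption: the surname is the last word in the name.
    """
    tokens = re.findall(r"[^\W\d_]+", name)  # runs of letters only
    if not tokens:
        return ("", "")
    return (tokens[-1].lower(), tokens[0][0].lower())


def suspicious_author_groups(names):
    """Return only the keys that more than one distinct spelling maps to."""
    groups = defaultdict(set)
    for name in names:
        groups[author_key(name)].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


authors = ["E.E.Doc Smith", "E. E. 'Doc' Smith", "E. E. Smith", "Frank Herbert"]
candidates = suspicious_author_groups(authors)
```

All three Smith spellings land under the key `("smith", "e")`, while "Frank Herbert" is left alone. A key this loose will also lump together genuinely different authors (any two Smiths whose first names start with E), so the renaming itself should still be done by hand.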

Then at least if you decide to try Starson's described method above (not caring for instance about which EPUB to keep if you have two of them) you are in the best position to do so.

That's my 2p for what it's worth.
Old 02-07-2011, 02:22 PM   #9
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiwidude View Post
I will butt in since this is a topic of great interest to me currently.
As far as I'm concerned - your help is always appreciated.
Quote:
Firstly, have you read the Duplicate Detection thread in this forum? That discusses some changes and additions to Calibre we are in the process of making. Feedback on that thread as to what sounds useful or not is always welcomed (particularly as the plugin which will "find" duplicates has not been written yet and there's a few ways we can approach it).
Yes, comments are welcome. The new automerge code is now in Kovid's hands, but he'll always consider improvements.
Quote:
As to "instructions", from a Calibre perspective Starson has given you what you need to do if you decide to try that approach. You just need to be aware of the implications:
- It will only find duplicates where the authors exactly match. There is no "fuzzy matching" on authors.
Correct.
Quote:
- You really have very little control over which version will be kept if you have duplicates of a format.
I'm not completely sure how much control you have for Copy to Library. We'd have to test it or ask Kovid. I suspect, however, that the order of the processing for Copy to Library will be according to the sorted selection order. As each selected book is processed, each format for that book is compared to the contents of the new library being constructed by the CTL code. Merging is done with the automerge code (assuming automerge is on). In that case, the first book the CTL code handles will be the final book. (Unlike the manual Merge code I wrote, automerge has never merged metadata.)

Note that the new code in the linked thread only applies to automerge of incoming books. I did not replicate that code to automerge for Copy to Library. My suggestion to Calliastra (the OP) was to consider using Copy to Library. (In that scenario, there is always only one "identical book", since automerge in its current form silently ignores duplicate formats. It's guaranteed to merge up all books with identical titles and similar titles. It can't currently make duplicate records with automerge on.)

Quote:
As Starson says above it is done by order of "selection" - but if you are doing a bulk library all at once that "selection order" won't mean too much. You could maybe sort by date or something but unless you investigate each book one by one you won't know which version to keep and it could be pot luck.
Yes.

Quote:
There's a few other threads in the forum if you look around at approaches people have taken. At the moment I have my own tool outside of Calibre that does fuzzy matches of authors and/or titles, doing direct sql queries against the Calibre database. Other people have their own tools/scripts, some of which were made available. Hopefully we will have a Calibre plugin soon (I've offered to write it but anyone is welcome to beat me to it), but we need to make decisions about it before I start and that discussion should be kept to the other thread.
I assume you've seen the other duplicate finding threads I contributed to. They all have simple SQL queries for duplicate titles or duplicate author/title.
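For anyone who hasn't seen those threads, a duplicate author/title query looks roughly like the sketch below. It runs against a small in-memory SQLite database with a simplified, hypothetical schema (`books`, `authors`, `books_authors_link`); the table and column names are assumptions for illustration and should be checked against your own database before you adapt the query. It is not copied from any of those threads.

```python
import sqlite3

# Build a tiny stand-in database so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books_authors_link (book INTEGER, author INTEGER);
INSERT INTO books VALUES (1, 'Dune'), (2, 'dune'), (3, 'Emma');
INSERT INTO authors VALUES (1, 'Frank Herbert'), (2, 'Jane Austen');
INSERT INTO books_authors_link VALUES (1, 1), (2, 1), (3, 2);
""")

# Exact match on author name, case-insensitive match on title.
DUPLICATES_SQL = """
SELECT a.name,
       LOWER(b.title)     AS title_key,
       COUNT(*)           AS copies,
       GROUP_CONCAT(b.id) AS book_ids
FROM books b
JOIN books_authors_link l ON l.book = b.id
JOIN authors a ON a.id = l.author
GROUP BY a.name, LOWER(b.title)
HAVING COUNT(*) > 1
"""

rows = conn.execute(DUPLICATES_SQL).fetchall()
for name, title_key, copies, book_ids in rows:
    print(f"{name}: '{title_key}' appears {copies} times (book ids {book_ids})")
```

If you try something like this on a real library, run it against a copy of the database file and keep it to SELECT statements only.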

Quote:
Certainly the 1.0 version may "only" have the exact same comparison logic Starson's automerge functionality has - of exact match on author, fuzzy on title.
That's fine by me, but remember that the reason I didn't fuzzy match more aggressively is that automerge is AUTOmatic. An error in automerge meant a lost book format. If we are doing duplicate finding with manual review on a dialog screen with merging manually controlled to find the best format or not merge, we can be much more aggressive.
Old 02-07-2011, 05:33 PM   #10
kiwidude
calibre/Sigil Developer
Posts: 4,230
Karma: 1345754
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Quote:
Originally Posted by Starson17 View Post
That's fine by me, but remember that the reason I didn't fuzzy match more aggressively is that automerge is AUTOmatic. An error in automerge meant a lost book format. If we are doing duplicate finding with manual review on a dialog screen with merging manually controlled to find the best format or not merge, we can be much more aggressive.
I'll reply to this in detail on the Duplicate Detection thread as to why I suggested it may be this way in the first release. There is some reasoning behind my comment that I would like feedback on, but I don't want to split the discussion across threads. I absolutely agree that "fuzzier" searches on both author and title should be among the plugin's longer-term goals at least. Whether such additions are required for the first version of the plugin is open for debate imho.

Last edited by kiwidude; 02-07-2011 at 08:11 PM. Reason: Linked to other response