Old 01-23-2011, 05:51 AM   #1
Philosopher
Enthusiast
Posts: 43
Karma: 12
Join Date: Jun 2010
Device: Kindle
Duplicate Detection

I have a library of over 30,000 books and growing. However, I have a large number of duplicate files - both already imported and in folders waiting to be imported.

The problem I find is that the duplicate process - the pause and message about adding them - really does little. It seems to detect by title.

Yet when I import, a large number of the books come in with a title I have to correct, and only after that correction do they end up with the same title as a book already in the library. So I can't eliminate duplicates that way.

Also I have many copies of the same book in different formats or different versions or different quality. So that too makes it useless.

I wonder if it would be possible for someone to develop (it's out of my league) a really useful duplicate finder, either to use on import or even to use afterwards.

It should check for title, author, filetype, isbn, and (most importantly) file size - to see if the two really are the same.

Better yet can't it check the actual file and compare to see if the two files are identical? I know this is possible but would have no idea how to program it into the program.
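A minimal sketch of the "compare the actual files" idea, using only the Python standard library. It groups files by size first (cheap) and then hashes only the files that share a size. The function names `file_digest` and `find_identical_files` are hypothetical, and this is not how calibre itself detects duplicates.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    # Cheap first pass: only files of equal size can be identical.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Expensive second pass: hash only the files that share a size.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

This only catches exact copies. Two different conversions or editions of the same book will not hash the same, so those would still need a metadata comparison (title, author, ISBN).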

Then, in my ideal world, it would call up a list - it could be in the main window - and there should be a check box to choose which of the files you want to remove as a duplicate.

Just brainstorming: two other possible features would be (a) to have the rows alternate shading to distinguish each set of books, so that all rows with the same title share one shade and the next set in the list gets the other, making it easier to spot the dups quickly; OR (b) instead of a list with check boxes, a dialog box that steps through each suspected set of duplicates, listing only the copies of that one book, with a check box for each copy to remove.

I think checking which to remove rather than which to keep would be safer - this way you could avoid accidentally removing all of them. Although checking one to keep - if there were several - would be quicker.
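The two selection styles described above can be sketched roughly as follows. This is only an illustration of the safety check. The function names and the use of plain strings for the copies are assumptions, not anything that exists in calibre.

```python
def removals_from_checked(group: list[str], checked: set[str]) -> list[str]:
    """Return the copies to remove when the user ticks the ones to delete."""
    unknown = checked - set(group)
    if unknown:
        raise ValueError(f"not in this duplicate set: {sorted(unknown)}")
    if group and checked == set(group):
        # Every copy was ticked: refuse rather than lose the book entirely.
        raise ValueError("refusing to remove every copy of the book")
    return [copy for copy in group if copy in checked]


def removals_from_keeper(group: list[str], keeper: str) -> list[str]:
    """Return the copies to remove when the user ticks the one to keep."""
    if keeper not in group:
        raise ValueError(f"{keeper!r} is not in this duplicate set")
    return [copy for copy in group if copy != keeper]
```

Ticking the copies to remove needs the explicit guard against deleting everything. Ticking the one to keep is quicker and always leaves that copy behind.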

Just a thought - I think this would be a real enhancement to the program - and I am not sure if anyone out there actually finds the existing duplicate check very useful.

Old 01-23-2011, 06:38 AM   #2
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
If you do a search you will find plenty of other threads here discussing the problems, existing behaviour, workarounds, sql reporting etc. I won't rehash it all here. I will say that Calibre does not match just on title - it is title and author, and there is a little bit of "fuzziness" in terms of things like leading "The" etc improving the match logic.
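To illustrate what a title-plus-author match with a little "fuzziness" could look like, here is a rough sketch. It is NOT calibre's actual matching code. The function names, the list of articles and the normalisation rules are all assumptions made for the example.

```python
import re
import unicodedata

# Assumption: English articles only; a real matcher would need more languages.
LEADING_ARTICLES = ("the ", "a ", "an ")


def fold(text: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = re.sub(r"[^\w\s]", " ", no_accents.lower())
    return re.sub(r"\s+", " ", no_punct).strip()


def title_key(title: str) -> str:
    """Fold a title and drop one leading article such as "The"."""
    folded = fold(title)
    for article in LEADING_ARTICLES:
        if folded.startswith(article):
            return folded[len(article):]
    return folded


def match_key(title: str, author: str) -> tuple[str, str]:
    """Two books are treated as the same record if their keys are equal."""
    return title_key(title), fold(author)
```

With this sketch, `match_key("The Hobbit", "J.R.R. Tolkien")` and `match_key("hobbit", "J R R Tolkien")` give the same key, while a different author gives a different key.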

However I 100% agree that if like me you turn the preference on so that all books added will merge automatically (which is what you want for new formats of a book to be the same record in Calibre), then it does NOT handle the situation of the same format being added very well. The existing dialog telling you "after the event" that it "merged something" without even telling you which format it threw away is as you say not very useful.

It's on the list to be improved further I believe but as I understand it there are other priorities first.
Old 01-23-2011, 07:53 AM   #3
cybmole
Wizard
Posts: 2,596
Karma: 1065018
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Sony T2, Kindle Fire HDX 8.9 , Kindle for PC
i don't suffer from this problem, but if I did then I'd start with windows utilities & freeware file duplicate finders.

something as simple as a windows search of the calibre library for *.epub, then sorting the results on file size & eyeballing the list, would show up most dups. zap them & then use library repair in calibre. with 30,000 books this should have been done long before the collection got that big, but use other search filters & work your way through systematically. or go straight to a good utility program - where you have options to match on file attributes, file content or both

a match on file size and file extension + a related file name should be enough for manually spotting duplicate ebooks. most utilities will auto delete fully matching duplicates according to some rule (like keep oldest, or keep newest)
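A small sketch of the size-and-extension approach with a keep-oldest / keep-newest rule, for use on a folder outside calibre (such as a "to be added" folder). It only reports candidates and never deletes anything. The function names are hypothetical, and equal size is only a hint, so each group still needs eyeballing or a content comparison.

```python
from collections import defaultdict
from pathlib import Path


def candidate_duplicates(root: Path, extension: str = ".epub") -> list[list[Path]]:
    """Group files with the given extension that have exactly the same size."""
    by_size = defaultdict(list)
    for path in root.rglob(f"*{extension}"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)
    # Sorted by size, like sorting a search result list on the size column.
    return [sorted(paths) for _, paths in sorted(by_size.items()) if len(paths) > 1]


def split_keep_and_discard(paths: list[Path], keep: str = "oldest") -> tuple[Path, list[Path]]:
    """Pick one file to keep by modification time; the rest are discard candidates."""
    if keep not in ("oldest", "newest"):
        raise ValueError("keep must be 'oldest' or 'newest'")
    ordered = sorted(paths, key=lambda p: p.stat().st_mtime, reverse=(keep == "newest"))
    return ordered[0], ordered[1:]
```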

for a merge - I'd run a windows duplicate files finder utility before adding the folder to calibre, then zap dups from the "to be added" folder - then do the merge.

PS the dup finder is built into some boost speed / defrag packages - e.g. there is one in auslogics boost speed, which is free trialware for 30 days - long enough to get the job done!

Last edited by cybmole; 01-23-2011 at 07:58 AM.
Old 01-23-2011, 08:07 AM   #4
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
I'd very much urge people not to delete files from outside Calibre. Yes, there are things like database repair, but why damage it intentionally in the first place?
Old 01-23-2011, 08:59 AM   #5
cybmole
Wizard
Posts: 2,596
Karma: 1065018
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Sony T2, Kindle Fire HDX 8.9 , Kindle for PC
Quote:
Originally Posted by Manichean View Post
I'd very much urge people not to delete files from outside Calibre. Yes, there are things like database repair, but why damage it intentionally in the first place?
fair point

a better approach may be to create a new library of " good = non dup'd" books and gradually transfer the entire collection from old library to new, weeding out duplicates as you go

or dump the entire library out via save to disc -then let rip with duplicate finders on that folder. then add the results back into a newly created calibre library. once you are OK with that having worked, ditch the old library. run the main processes overnight, for 30k books!

there is no easy way for a program to "inspect" several alternative versions of a book & tell you which one to keep - you could take a view that smaller file size is more likely to have stuff missing - but that is no guide to quality of formatting , lack of typos etc.

personally, I never bulk -add to an existing library.
if I come across a big collection of books that I "may" want to read someday I keep that outside of calibre as those collections often contain rubbish quality conversions.

with bulk -add, surely there is a risk that calibre merges a poorer quality version into where you used to have a better one ?
Old 01-23-2011, 09:10 AM   #6
cybmole
Wizard
Posts: 2,596
Karma: 1065018
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Sony T2, Kindle Fire HDX 8.9 , Kindle for PC
ps isn't having over 30k books a tad extreme.

reading at 3 complete books per day, t'would take you over 27 years to read them all (30,000 / 3 = 10,000 days), and that's assuming you have no interest in reading anything that gets published in the meantime !

seems as daft as those folks who insist on putting 3000+ books onto their new Kindles then bitch about battery life & primitive collection management facilities - why would any sensible person do that...

now I will confess to having several thousand MP3 files but those songs have all been listened to, once at least. Several thousand books makes far less sense ???

Last edited by cybmole; 01-23-2011 at 09:12 AM.
Old 01-23-2011, 03:22 PM   #7
ApK
What did you call me?
Posts: 4,047
Karma: 33514047
Join Date: Feb 2010
Location: NJ, USA
Device: Kindle
Some people like choices, cybmole.

My entire library is only a thousand or so ebooks, but if it were 3 or 4 thousand, it would sure be neat to have them all with me whenever I wanted one.

And I doubt it's a matter of needing to read all 30,000 books...it's a matter of having the ONE book you want when you want it.
Old 01-23-2011, 04:20 PM   #8
cybmole
Wizard
Posts: 2,596
Karma: 1065018
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Sony T2, Kindle Fire HDX 8.9 , Kindle for PC
Quote:
Originally Posted by ApK View Post
Some people like choices, cybmole.
...

And I doubt it's a matter of needing to read all 30,000 books...it's a matter of having the ONE book you want when you want it.
duh - that's why we have google books, & internet, & wi-fi ....

PS a personal stash of 30,000 books surely ain't legal - let's be generous & say that half of them are free books from Project Gutenberg (though why d/l them all as they are on the net forever already...), value the other 15k at say average Kindle book prices - well do the math....

If I'd spent that much on books I'd want them in a vault, plus a damn good home insurance policy, not just stored on a fragile hard drive!
Old 01-23-2011, 04:50 PM   #9
Philosopher
Enthusiast
Posts: 43
Karma: 12
Join Date: Jun 2010
Device: Kindle
I am a scholar - and so I use many books without necessarily reading them all serially. I have also spent years scanning in my books for a number of reasons in how I do my research.

My physical library is well over 10,000 books - and then I have scanned a large number of old texts from research libraries - so it's not hard to get into that range.
Old 01-23-2011, 04:51 PM   #10
Philosopher
Enthusiast
Posts: 43
Karma: 12
Join Date: Jun 2010
Device: Kindle
Also "books" is a loose term - because it also includes articles in pdf form.
Old 01-23-2011, 05:07 PM   #11
DuskyRose
Evangelist
Posts: 443
Karma: 1364042
Join Date: Apr 2007
Location: Virginia
Device: Sony PRS-T1, -T2, Kindle PW, Nook Glow, Kobo Aura
Quote:
Originally Posted by Philosopher View Post
Also "books" is a loose term - because it also includes articles in pdf form.
Also, it could include fanfic.

I try to be more precise about calling them 'files' if I think it might make a difference, but I think of fanfic stories as "Books" because I treat them the same way in Calibre as I do my regular books. They all get titles, series, authors and such, and can vary in length from a 1 page drabble or pwp to full book size in page count.

And with the programs that will take stories from places like fanfic.net and pull each story into one HTML file (so I don't have to cut and paste lots of chapters together), they can add up quickly. The fandom I'm reading now can post anywhere from none to about 30 new stories a day, various sizes. And I read on average about 20 a night, depending on size and when I fall asleep.

I've been collecting various fanfic for years. I've probably got 16,000 easily. Just found a new fandom a couple of months ago, and have 1,476 last count in just a couple of months. A good chunk I've read already.

And I've pulled in a ton of free books from book stores.

So 'books' can be a very loose term. Especially since they're all basically treated the same in Calibre.
Old 01-23-2011, 06:56 PM   #12
ApK
What did you call me?
Posts: 4,047
Karma: 33514047
Join Date: Feb 2010
Location: NJ, USA
Device: Kindle
Even my mere 1000 file library contains a lot of readme-type text files and instruction manuals. I call them all ebooks cuz I convert them all to mobi in Calibre for my Kindle, and cuz, well, this is an ebook forum.

Quote:
Originally Posted by cybmole View Post
duh - that's why we have google books, & internet, & wi-fi ....
You surely are a modern child if you trust The Cloud to always be available for you. I don't.

Besides, after the fall of civilization, I can hook a solar cell up to my Kindle and keep reading, like Burgess Meredith in that Twilight Zone episode....

Last edited by ApK; 01-23-2011 at 06:58 PM.
Old 01-23-2011, 11:56 PM   #13
DoctorOhh
US Navy, Retired
Posts: 8,769
Karma: 12516053
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by kiwidude View Post
However I 100% agree that if like me you turn the preference on so that all books added will merge automatically (which is what you want for new formats of a book to be the same record in Calibre), then it does NOT handle the situation of the same format being added very well. The existing dialog telling you "after the event" that it "merged something" without even telling you which format it threw away is as you say not very useful.
I'm confused, what do you mean by which format it threw away? It will always retain the format already in calibre when automatically merging. I guess I'm confused because, with a few exceptions, I only keep ePubs in my library and don't see this often. When I do see it I know that it left the version in the library alone and dropped any book I was trying to add.
Old 01-24-2011, 01:05 AM   #14
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Quote:
Originally Posted by dwanthny View Post
I'm confused, what do you mean by which format it threw away?
I mean the situation of doing bulk adds where you have multiple formats for a book, some of which overlap with what you have in your library. I tend to add multiple formats as they may have come from different sources, so one may be better to convert than the others. The dialog only tells you which titles it threw formats away for, not which formats were duplicated.

So unless you do your books one by one and take note of what formats you had before you added, it is not easy to tell which formats need the manual investigation step to re-add/merge.
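A sketch of the bookkeeping that would otherwise have to be done by hand: before a bulk add, compare the formats in the batch with the formats already held for each title. The data structures (dicts keyed by a title/author pair) and both function names are purely hypothetical. Nothing here uses calibre's real API or database.

```python
def format_overlap(existing: set[str], incoming: set[str]) -> dict[str, list[str]]:
    """Split incoming formats into genuinely new ones and ones already held."""
    held = {fmt.upper() for fmt in existing}
    offered = {fmt.upper() for fmt in incoming}
    return {
        "new": sorted(offered - held),
        "duplicated": sorted(offered & held),
    }


def overlap_report(
    library: dict[tuple[str, str], set[str]],
    batch: dict[tuple[str, str], set[str]],
) -> dict[tuple[str, str], dict[str, list[str]]]:
    """List, per book, the formats in the batch that the library already has."""
    report = {}
    for book, formats in batch.items():
        overlap = format_overlap(library.get(book, set()), formats)
        if overlap["duplicated"]:
            # Only books with a clashing format need manual investigation.
            report[book] = overlap
    return report
```

A report like this names the clashing formats themselves, which is exactly the information the existing dialog leaves out.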

Basically it is a mess, and yes sticking to only a single format like EPUB as you do avoids it nicely if your EPUB sources are good enough
Old 01-24-2011, 01:26 AM   #15
DoctorOhh
US Navy, Retired
Posts: 8,769
Karma: 12516053
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by kiwidude View Post
What the dialog only tells you is which titles it threw formats away for, not the formats that were duplicated.
True, it adds the first x format it finds and rejects the rest. It does make things difficult.

Quote:
Originally Posted by kiwidude View Post
Basically it is a mess, and yes sticking to only a single format like EPUB as you do avoids it nicely if your EPUB sources are good enough
The only thing that avoids it for me is only adding 1 series or author at a time. I usually do a quick check of quality before I add any format to calibre. I then massage the covers, metadata and convert all the added books to epub. If I find a dud I search my book pile of xxxgigs to find a better source.

For the most part books recommended here are always at the top of my add list. This procedure isn't as ambitious as yours but it keeps me off the streets and out of trouble.