Old 01-24-2011, 02:16 AM   #16
kiwidude
calibre/Sigil Developer
Posts: 4,228
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Yeah, sounds like you have a plan. I think the only way to retain sanity is to add books in a very disciplined fashion such as by author or series as you suggest.

My original plan was to just "get everything in there" and then clean it up. Unfortunately I keep getting distracted by writing plugins etc so my backlog keeps growing and I am betwixt and between...

I think I need to change my approach. I've written tools that do a lot of preprocessing outside of Calibre to work around the issues somewhat, but they are really only band-aids that delay the inevitable. Rather than aiming to get everything into Calibre first, I will start afresh with a new library and go author by author, starting with the ones most likely to be read first. I'll obviously still have an enormous duplicated mish-mash of books for everything not yet processed, but I had that anyway before I found Calibre.
Old 01-24-2011, 09:52 AM   #17
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiwidude
If you do a search you will find plenty of other threads here discussing the problems, existing behaviour, workarounds, sql reporting etc. I won't rehash it all here.
Yes, there are lots of workarounds for specific issues and there is general agreement that improvements can be made. One problem is that this issue arises primarily during the initial import stage, when lots of books are being imported. Any developers in that first stage add fixes to solve the worst problems, but then move on to other things as their backlog of books to be added drops.

Quote:
I will say that Calibre does not match just on title - it is title and author, and there is a little bit of "fuzziness" in terms of things like leading "The" etc improving the match logic.
When the autosort/automerge option is on, you're correct (that's mostly my code), but when it's off, the initial post was right - Calibre looks only at title. In autosort/automerge mode (the "similar author/title found" option in Prefs|Import/Export|Adding Books), it adds the first unique format found for any author/title combo, then skips any further copy of a format whose author/title fuzzy-matches an existing book.
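For anyone following along, the behaviour described above can be sketched roughly as below. This is only an illustration, not Calibre's actual code: the names `fuzzy_key` and `add_with_automerge`, the list of leading articles, and the dict-based "library" are all assumptions made up for the example.

```python
import re

# Assumed list of leading articles to ignore; the real matching rules may differ.
LEADING_ARTICLES = ("the ", "a ", "an ")


def normalise(text):
    """Lower-case, drop punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def fuzzy_key(title, author):
    """Build a comparison key so trivially different spellings match."""
    norm_title = normalise(title)
    for article in LEADING_ARTICLES:
        if norm_title.startswith(article):
            norm_title = norm_title[len(article):]
            break
    return (norm_title, normalise(author))


def add_with_automerge(library, title, author, fmt):
    """Hypothetical add routine.

    `library` maps a fuzzy key to the set of formats already stored.
    Returns 'added' for a new book, 'merged' for a new format of an
    existing book, and 'skipped' when that format is already present.
    """
    key = fuzzy_key(title, author)
    fmt = fmt.upper()
    formats = library.get(key)
    if formats is None:
        library[key] = {fmt}
        return "added"
    if fmt in formats:
        return "skipped"  # duplicate format: the incoming copy is dropped
    formats.add(fmt)
    return "merged"
```

With the option off, the comparison described in this thread would use the title alone, so the same title by a different author would trigger the duplicate prompt instead of being added silently.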

Quote:
However I 100% agree that if like me you turn the preference on so that all books added will merge automatically (which is what you want for new formats of a book to be the same record in Calibre), then it does NOT handle the situation of the same format being added very well. The existing dialog telling you "after the event" that it "merged something" without even telling you which format it threw away is as you say not very useful.
Smile when you say that pardnuh! That's my code! Actually, the notification is Kovid's code - mine was worse - it just dumped any duplicates with no warning. It was written to solve a very specific problem. I wanted to add my existing library and get one entry for each author/title book, and keep one of each format for that book. I really had no idea which of any duplicate formats was best, and insufficient time to look at every one. The problem was that the existing duplicate detection simply compared titles (and it still does if my autosort/automerge option is off). If titles matched, even for different formats and different authors, it asked if you wanted to add as separate entries. There wasn't even a manual merge function then. So adding the autosort/automerge option got me what I wanted - a structure of all unique author/title (fuzzy title matched) books where one of every format was kept for each book.

In my case, I usually had only one really good master format and most of my duplicates were converted from that master. The master format would always be added to Calibre.

My plan was to worry about the "best" format later. If I was unhappy with a format when I went to read it, I could look to see if I had a better one that was skipped during the import. Usually I would find one good master format in the record and could use Calibre's excellent conversion capabilities to get a copy that was even better than whatever had been skipped.

Quote:
It's on the list to be improved further I believe but as I understand it there are other priorities first.
Yes, but there's plenty of room for others to dive in and improve it now. For me, all of my original book library is in, and I now add only a few books a month. I don't need the bulk importing code that I spent so much time writing
Old 01-24-2011, 11:18 AM   #18
itimpi
Wizard
Posts: 4,070
Karma: 777825
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
I find that as I am past the bulk input stage, one of the commonest cases for encountering a duplicate is that I add a newer (and normally better) version of a book. I would certainly like an easier way of saying it is the NEWEST one that is important and overwrite the existing copy. At the moment I find I have to go through the Edit Metadata route to achieve this (or add as a duplicate and then merge) - and cannot simply do it via the standard Add Books route.

Hopefully this will be one of the issues that will be kept in mind if a more effective dialog on what to dump and what to keep is developed.
Old 01-24-2011, 12:23 PM   #19
Starson17
Wizard
Quote:
Originally Posted by itimpi
I find that as I am past the bulk input stage, one of the commonest cases for encountering a duplicate is that I add a newer (and normally better) version of a book. I would certainly like an easier way of saying it is the NEWEST one that is important and overwrite the existing copy. At the moment I find I have to go through the Edit Metadata route to achieve this (or add as a duplicate and then merge) - and cannot simply do it via the standard Add Books route.

Hopefully this will be one of the issues that will be kept in mind if a more effective dialog on what to dump and what to keep is developed.
When I wrote the autosort/automerge code I configured it so that new formats automatically overwrote older ones. During my bulk import stage, I really had no preference for new/old, and during later stages, I only added better copies. I found it convenient to improve a format, or locate a better one, outside of Calibre, then just drag it over and have it replace the existing one.

I was aware that my default of overwrite could cause loss of book formats you don't want overwritten. There was a risk of error in the code that identifies the author/title, etc. of matching books, and I pointed that out to Kovid when I uploaded the source, but for me it just worked better, even with that risk (I always keep my source).

Kovid felt that the standard philosophy of Calibre is to always default to the lower risk option - in this case that option was to not overwrite existing formats. I understood why he preferred that. For a period of time, I ran custom code with my reverse default, which worked better in my work flow. I thought about offering an option switch or tweak for "default overwrite" to modify the behaviour of the "autosort/automerge" switch, but there are already so many options ...
Old 01-24-2011, 01:38 PM   #20
itimpi
Wizard
I understand Kovid's point of view about being paranoid about unexpected data loss. I was just making the point that if someone is designing/working on a replacement dialog or feature the reverse use case where newest wins is not that uncommon. The trick will be to come up with a user-friendly way of offering the user control without, as you say, an overwhelming number of choices.

If I saw a simple solution I would already be suggesting it. Maybe a user preference with a simple way of toggling it might be the answer as one tends to be working in one mode during bulk import and another in daily library maintenance.
Old 01-24-2011, 01:47 PM   #21
Starson17
Wizard
Quote:
Originally Posted by itimpi
I understand Kovid's point of view about being paranoid about unexpected data loss.
So did I. I'd rather deal with a complaint that the interface is inconvenient for overwriting, than one that complains all their books just got wiped out!

Quote:
I was just making the point that if someone is designing/working on a replacement dialog or feature the reverse use case where newest wins is not that uncommon. The trick will be to come up with a user-friendly way of offering the user control without, as you say, an overwhelming number of choices.

If I saw a simple solution I would already be suggesting it. Maybe a user preference with a simple way of toggling it might be the answer as one tends to be working in one mode during bulk import and another in daily library maintenance.
I'm glad you made the point. I'm not sure if/when I might do some more work in that area, but I agree that it's worth keeping in mind that the reverse default is very useful even if there is some increased risk to that setting. It's not like Calibre is 100% risk free (thinking of the very useful, but risky S&R feature).
Old 01-24-2011, 06:08 PM   #22
kiwidude
calibre/Sigil Developer
Quote:
Originally Posted by Starson17
When the autosort/automerge option is on, you're correct (that's mostly my code), but when it's off, the initial post was right - Calibre looks only at title.
Oops - my bad, thanks for correcting me. I always have automerge on so "just assumed" the matching logic was the same regardless of the setting. You know what they say about ass-u-me...
Quote:
Originally Posted by Starson17
Smile when you say that pardnuh! That's my code! Actually, the notification is Kovid's code - mine was worse - it just dumped any duplicates with no warning. It was written to solve a very specific problem.
Sorry if I caused any offense, it certainly wasn't my intent - as you and I have debated this topic and the history in detail in other threads I just figured you would know where I was coming from. I was trying, probably unsuccessfully, to walk a balance between not getting into too much detail (having suggested to the OP to look at some other threads) and sympathising with what I took to be one of his points of dissatisfaction with the dialog. In a later response on this thread I clarified the scenario in which I find the dialog "least helpful". The matching logic which you did is great, it's just the handling of same format merges I would like some additions to.

Quote:
Originally Posted by Starson17
Yes, but there's plenty of room for others to dive in and improve it now.
Well to be honest in the last thread I discussed this with you my reading of the responses was that everyone was waiting for a new merge dialog. I have a number of suggestions/ideas for possible approaches and even some slightly better Python/Qt skills to be able to contribute. I figured there was no point in pursuing it further if there was no real interest in some "partial" solutions even if they do address the main issues I face. I completely understand your lack of motivation to be too involved given you no longer have a need to do bulk imports yourself
Old 01-25-2011, 09:29 AM   #23
Starson17
Wizard
Quote:
Originally Posted by kiwidude
Oops - my bad, thanks for correcting me. I always have automerge on so "just assumed" the matching logic was the same regardless of the setting. You know what they say about ass-u-me...
I've done the same many times - run with a certain option on, and forgotten what happens with it off. Or I've made the alternative error of remembering an older limitation and forgotten that the rapid development of Calibre has already fixed it. I just thought I'd set the record straight for those who are reading this.

Quote:
Sorry if I caused any offense
I was not offended in the least - I should have put a bigger smiley on that. I completely agree with your comments.
Quote:
you and I have debated this topic and the history in detail in other threads I just figured you would know where I was coming from.
I did.
Quote:
I was trying, probably unsuccessfully, to walk a balance between not getting into too much detail (having suggested to the OP to look at some other threads) and sympathising with what I took to be one of his points of dissatisfaction with the dialog. In a later response on this thread I clarified the scenario in which I find the dialog "least helpful". The matching logic which you did is great, it's just the handling of same format merges I would like some additions to.
Did you notice the tiny change I made a while back to add a "Formats only manual merge"? That addition came from your comments. It allows people to keep a perfect metadata record and merge formats into it without also bringing in any additional tags, comments, etc.

Quote:
Well to be honest in the last thread I discussed this with you my reading of the responses was that everyone was waiting for a new merge dialog. I have a number of suggestions/ideas for possible approaches and even some slightly better Python/Qt skills to be able to contribute. I figured there was no point in pursuing it further if there was no real interest in some "partial" solutions even if they do address the main issues I face. I completely understand your lack of motivation to be too involved given you no longer have a need to do bulk imports yourself
I wrote a moderately long post urging you to feel free to write any partial solution you are interested in. Somehow it was lost before it got posted. I didn't want you to feel you were treading on my toes by changing/improving that area of code. There are several things standing in my way of doing much work there. The first is that I've got less time now. The second is that I'm beyond the bulk adding stage. The third is that Kovid has the fine-grain control of metadata downloading on the ToDo list. I suspect that it would help make things consistent if I wait for that dialog to be written and then hijack it to provide fine grained control over merging and importing of metadata. Merging metadata from an online source is very similar to merging it from another Calibre book record or during the import of a new book.

That said, however, it's quite possible that work is in the far future, and/or that it won't be all that useful, so if you are motivated to provide a partial solution to problems you face - feel free to work on that area and submit it to Kovid. We are in agreement that there's room for improvement there.
Old 01-25-2011, 11:21 AM   #24
kovidgoyal
creator of calibre
Posts: 26,121
Karma: 5101571
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Just so you know I'm working on a refactoring of the metadata download system right now, which will likely include a way to specify what metadata should be merged, both from individual sources and overall.
Old 01-25-2011, 07:03 PM   #25
kiwidude
calibre/Sigil Developer
Quote:
Originally Posted by Starson17
Did you notice the tiny change I made a while back to add a "Formats only manual merge"? That addition came from your comments. It allows people to keep a perfect metadata record and merge formats into it without also bringing in any additional tags, comments, etc.
I did indeed, many thanks for that. It certainly helps with the issue of merging duplicates once they are in Calibre, and is one less local Calibre patch I have to maintain. However, unless you intentionally/accidentally give files different names (or turn off automerge) you won't have two versions of the format to merge, because the automerge has already thrown the latest one away.

Appreciate the update from Kovid on the merge stuff etc, look forward to the results in future.

There were a couple of ideas I had a while ago in the absence of a full-on dialog. I think the dialog will be great for when you have multiple book records in Calibre and want to merge them.

However I am curious to see what its involvement may be in the process of actually adding books. IMHO I am not sure I *always* want to be interactively prompted when doing bulk adds. Importing can already be a fairly involved and time consuming process of cleaning up filenames, adding certain subdirectories of files, separating html folder imports of one per folder from multiple books per folder to use different add menus, deleting your input folders once in Calibre etc. If you were adding a lot of books with a lot of duplicates, any interactive dialog forcing you to make choices then and there might not be practical given how time consuming it can be to open each version up and decide a "winner".

Now if you are only adding a single or small number of books, an interactive choice might be desirable - don't put off until tomorrow what can be done today and all that.

But what if you need to stop/do something else partway through? What does Calibre do with all the "unresolved" conflicts? Any kind of "abort" in the process can leave you with a messy mish-mash of partially imported books from a subfolder tree, and an absolute nightmare to "continue on" from.

So one approach to this which I briefly mentioned in a previous thread would be an additional option for the automerge. Currently when you turn it on, any new formats for an existing book get merged, and any duplicate formats get thrown away. What I would like is the same behaviour for new formats, but that duplicate formats get created as a new book entry in Calibre, and that the two books then get marked as being duplicates. For instance just add a "Duplicate" tag to both entries.

Then on my rainy day when I finally get around to cleaning up my Calibre entries I can just do a search for the "Duplicate" tag. Sort by author/title to see the conflicts I need to resolve and go through a review/merge process in my own time with them. That way I have both formats safely stored in Calibre, can safely delete my source folders and can continue adding stuff in bulk. Additional duplications of the same format would create further "Duplicate" tagged books in Calibre.
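Roughly what I have in mind, as a Python sketch. To be clear, this is not existing Calibre code or its database API: the list-of-dicts "library", the function name `add_keeping_duplicates` and the matching key are all hypothetical, just enough to show the idea.

```python
DUPLICATE_TAG = "Duplicate"


def add_keeping_duplicates(books, key, fmt, path):
    """Hypothetical 'keep everything' variant of automerge.

    `books` is a list of dicts with 'key', 'formats' (format -> path)
    and 'tags' (a set). `key` is whatever the matching logic produces
    for author/title. Returns the record the file ended up in.
    """
    matches = [book for book in books if book["key"] == key]

    # New format for an existing book: merge into the first matching record.
    if matches and fmt not in matches[0]["formats"]:
        matches[0]["formats"][fmt] = path
        return matches[0]

    # Either a brand new book, or a format the existing record already has.
    new_book = {"key": key, "formats": {fmt: path}, "tags": set()}
    if matches:
        # Same format again: keep it as a separate record and flag
        # every record involved so a tag search finds them later.
        new_book["tags"].add(DUPLICATE_TAG)
        for book in matches:
            book["tags"].add(DUPLICATE_TAG)
    books.append(new_book)
    return new_book


def books_to_review(books):
    """The 'rainy day' search: everything carrying the Duplicate tag."""
    return [book for book in books if DUPLICATE_TAG in book["tags"]]
```

Nothing gets thrown away at import time with this approach, so the source folders can be deleted safely and the decision about which copy wins is deferred until the review.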

Just random rambling thoughts. As I said in a previous post I think I am going to have to start again with a fresh library and change the way I add books, as the way Calibre handles this "today" isn't quite working for me. An additional option to the automerge such as I suggested above would dramatically improve things. Then the final icing would be an addition to your "merge formats only" menu option that pops up a dialog when formats conflict and lets me launch viewers for each duplicate format (a side-by-side mode in ebook-viewer.exe would be amazing but that's a pipe dream) and select/tick which versions to keep. I remove the "Duplicate" tag and job done...

Last edited by kiwidude; 01-25-2011 at 07:13 PM. Reason: typos
Old 01-26-2011, 10:16 AM   #26
Starson17
Wizard
Quote:
Originally Posted by kiwidude
I did indeed, many thanks for that. It certainly helps with the issue of merging duplicates once they are in Calibre, and is one less local Calibre patch I have to maintain. However, unless you intentionally/accidentally give files different names (or turn off automerge) you won't have two versions of the format to merge, because the automerge has already thrown the latest one away.
Yes, it's mostly useful to fix mistakes where automerge was off.

Quote:
IMHO I am not sure I *always* want to be interactively prompted when doing bulk adds.
There's always that balance between control and complexity. I'll also be interested to see what Kovid comes up with.
Quote:
Importing can already be a fairly involved and time consuming process of cleaning up filenames, adding certain subdirectories of files, separating html folder imports of one per folder from multiple books per folder to use different add menus, deleting your input folders once in Calibre etc. If you were adding a lot of books with a lot of duplicates, any interactive dialog forcing you to make choices then and there might not be practical given how time consuming it can be to open each version up and decide a "winner".
This mirrors my thoughts on the original automerge design. Prior to that I was getting questions about whether I wanted duplicate book titles added, and the question was asked without regard to whether the authors were the same. I wanted to answer "yes" for duplicate book titles with different authors, "no" for (nearly) duplicate titles with the same author, but if it was a new format for that author/title, add the new format to the existing record. I didn't want to have to answer a separate question for every case or look at book content. Automerge was basically just a way to automatically answer the "Do you want to add duplicate titles?" question the way I wanted it answered.

Quote:
Now if you are only adding a single or small number of books, an interactive choice might be desirable - don't put off until tomorrow what can be done today and all that.

But what if you need to stop/do something else partway through? What does Calibre do with all the "unresolved" conflicts? Any kind of "abort" in the process can leave you with a messy mish-mash of partially imported books from a subfolder tree, and an absolute nightmare to "continue on" from.

So one approach to this which I briefly mentioned in a previous thread would be an additional option for the automerge. Currently when you turn it on, any new formats for an existing book get merged, and any duplicate formats get thrown away. What I would like is the same behaviour for new formats, but that duplicate formats get created as a new book entry in Calibre, and that the two books then get marked as being duplicates. For instance just add a "Duplicate" tag to both entries.
Yes, I can see that option. Also it would be nice to have the "reverse" option of just overwriting the existing format with the new format for when I know I am adding better copies. I think I'd actually rather have improved duplicate finding options for all of Calibre instead of tagging during automerge with "duplicate" that I would have to manage. If I think the new formats might be better, but I'm not sure, I could turn on the "keep all formats, and create new book record for duplicate formats" option, then run the improved duplicate finder with viewer, which would let me view duplicates located and merge or throw away those I didn't want. An improved duplicate finder would be great for existing libraries that needed work.
Old 01-26-2011, 06:07 PM   #27
kiwidude
calibre/Sigil Developer
Quote:
Originally Posted by Starson17
Yes, I can see that option. Also it would be nice to have the "reverse" option of just overwriting the existing format with the new format for when I know I am adding better copies.
Yeah, I think you would want to have all three possibilities - discard duplicate formats (existing behaviour), overwrite duplicate formats and create new books for duplicate formats. Perhaps in the Preferences there is a top level "automerge" checkbox like you have now, with these three as the radio button sub-options.
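To make the three sub-options concrete, here is a rough sketch of how they might behave. Everything in it is hypothetical: `DuplicatePolicy`, `merge_formats` and the dict-based records are made up for illustration and are not Calibre's actual code or API.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    """Hypothetical names for the three proposed automerge sub-options."""
    DISCARD = "discard"      # existing behaviour: throw the duplicate away
    OVERWRITE = "overwrite"  # replace the existing format with the new one
    NEW_BOOK = "new_book"    # keep both by creating a separate book record


def merge_formats(existing, incoming, policy):
    """Merge incoming formats into an existing book record.

    existing and incoming map a format name (e.g. "EPUB") to a file path.
    Returns (merged, leftover): merged is the updated format map for the
    existing book, leftover holds duplicate formats that should go into
    a new book record.
    """
    merged = dict(existing)
    leftover = {}
    for fmt, path in incoming.items():
        if fmt not in merged:
            merged[fmt] = path       # new formats are always merged
        elif policy is DuplicatePolicy.OVERWRITE:
            merged[fmt] = path       # incoming copy wins
        elif policy is DuplicatePolicy.NEW_BOOK:
            leftover[fmt] = path     # caller creates a new record from this
        # DuplicatePolicy.DISCARD: the incoming duplicate is dropped
    return merged, leftover
```

With `NEW_BOOK`, the caller would create a second record from `leftover` and could then mark both records as duplicates of each other.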

One thing I don't understand (perhaps it is legacy code not yet addressed?) is why you would have different matching logic between having automerge turned on or off. Surely a duplicate is a duplicate - you either automatically merge it using the choice in Preferences, or you prompt the user what to do interactively (giving them the three choices)?

Quote:
Originally Posted by Starson17 View Post
I think I'd actually rather have improved duplicate finding options for all of Calibre instead of tagging during automerge with "duplicate" that I would have to manage. If I think the new formats might be better, but I'm not sure, I could turn on the "keep all formats, and create new book record for duplicate formats" option, then run the improved duplicate finder with viewer, which would let me view duplicates located and merge or throw away those I didn't want. An improved duplicate finder would be great for existing libraries that needed work.
Oh yes, totally agree that if there was a function in Calibre which could display books it considered as duplicates, it could be a huge improvement over a tag approach. Particularly, as you say, for picking up legacy books, and for that other case we discussed previously of a book not initially being seen as a duplicate due to bad naming but then "becoming one" after the automerge process when you clean the title up. The advantages of the tags were that they did not require analysing the entire database each time, and you did not have to worry about false positives. However, if the duplicate check was kept very fine-grained (the same logic as your automerge code uses) then it could work quite nicely. It won't resolve everyone's duplicate problems from wide title/author variations, but it would cover the "known duplicate" space.

I wonder if running a duplicates search could be done as a GUI plugin. However I am hesitant to start investigating down that plugin route unless Kovid agrees (after all we can just deprecate the plugin later) as it seems like a feature that he perhaps may want built into Calibre to give wider user exposure. Plus he could obviously write it way better than I would anyway, though he has to find the precious time to do it first. The sub-options within automerge such as "create new book for duplicate formats" would require Calibre source changes of course.
Old 01-27-2011, 08:37 PM   #28
jstavene
Junior Member
jstavene began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Jun 2010
Device: none
I am collecting books and other material for something similar to project alice (a massive upload of literature seems interesting to me for an AI, if based on vml), so my collection is large. I also spidered web pages and converted them to PDF. I just like having a private archive to search and sift through, as we often get connection problems and websplits up here. I sometimes play the ebooks as audio; which tools I use depends on the format. At home I use a WinXP machine with MS Reader's read-out-loud on .lit files, or PDF read-out-loud (Adobe Acrobat, though I have been considering trying some other stuff). My Linux machine, mainly Fedora 14, uses carnival/festival. I must also confess that the Linux machine is the one running my calibre. I initially had trouble at around 6500 books or items when using WinXP (even SP3), but with Linux, WOW! The speed and stability of calibre are vastly improved. When adding more than 10000 items at a time I do get a pile of errors, from DRM to interpreter stuff, but I have not crashed it yet (with Linux).

If I am adding a huge collection I break it into smaller chunks (10 gig seems nice), then run flint across them to detect dupes, then maybe fdupes, depending on my patience.

I like the idea of converting everything to epub; then if something happens I can just copy or move out all the epubs if I have to rebuild. (Plus this cleans out any DRM'ed mistakes which inadvertently got added; some tools for downloading or syncing do funny things, from signatures to read/write permissions, so... hahah.)

I really like calibre: an extremely ambitious, all-in-one Swiss army knife for documents, and over the last year or so stability has really spun up nicely (not sure if I would try a Windows server again, though). If I was going to suggest anything, it would be language translation (tough as heck and nearly impossible, but this would really round out my thoughts), and perhaps an option to delete the original files on import after a successful copy (just the option would be nice). But it is a great program, actually a server in its own right. I would love to know how others test this. I am sure a single file would be quick for testing import and conversion, but yes, even huge archives seem stable now to me :-) even if they seem pointless. And if an Artificial Intelligence was ever developed, hmm, some of these archives would be fun to upload or recode?
Old 01-28-2011, 04:12 PM   #29
Starson17
Wizard
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by kiwidude View Post
Yeah, I think you would want to have all three possibilities - discard duplicate formats (existing behaviour), overwrite duplicate formats and create new books for duplicate formats. Perhaps in the Preferences there is a top level "automerge" checkbox like you have now, with these three as the radio button sub-options.
I like that, but what seems most desired is some listing of "duplicates" with some way to view each and select between the three options.

Quote:
One thing I don't understand (perhaps it is legacy code not yet addressed?) is why you would have different matching logic between having automerge turned on or off.
Yes, it's legacy. Kovid's original code was to match on identical titles only, without any regard to author. It's useful where you don't even have an author for the incoming book (perhaps all you have is the filename and it's just the book title). Automerge is sort of the opposite: it does exact matches on author, but is flexible/fuzzy on the title. Automerge really needs both. An exact match on author alone isn't really very useful.

Quote:
Surely a duplicate is a duplicate - you either automatically merge it using the choice in Preferences, or you prompt the user what to do interactively (giving them the three choices)?
Yes, once you've got an option, you can do the "correct" thing. It's just that so far, the option to do different things hasn't been written. We've tried to set it up to do "bulk" handling of them all. With automerge off, you get a tiny option of adding all the "duplicate" titles as new books or not adding them at all, so you can consider duplicate titles to be "duplicates" even without an author, but you wouldn't want to consider them to be duplicates in the automerge setting without info on the author.

Quote:
I wonder if running a duplicates search could be done as a GUI plugin. However I am hesitant to start investigating down that plugin route unless Kovid agrees (after all we can just deprecate the plugin later) as it seems like a feature that he perhaps may want built into Calibre to give wider user exposure. Plus he could obviously write it way better than I would anyway, though he has to find the precious time to do it first. The sub-options within automerge such as "create new book for duplicate formats" would require Calibre source changes of course.
I'm comfortable in the code for creating records, deleting records, grabbing incoming files and associated metadata, moving metadata around, searching the library for dupes, etc. but haven't had time to look very far into user interface construction. My skill stops at adding an option box and storing/retrieving the option or adding a warning dialog and allowing that warning to be turned off.

That's part of why I never did any of the "option to do something with duplicates" or "option to control fine-grain of metadata during merging." One has to present the results, provide selection boxes for what to do, provide an option to view the book, or metadata, etc. and I just haven't had time to play with Qt enough to learn that stuff. Heck, I tried to just change the custom user recipes into alphabetical order, and couldn't get there, even with Kovid's sample code of alphabetical order built-in recipes. I ended up cheating and making the search produce an alphabetical order result dataset so I didn't have to sort the results in the GUI.
Old 01-28-2011, 09:51 PM   #30
kiwidude
calibre/Sigil Developer
 
Posts: 4,228
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Quote:
Originally Posted by Starson17 View Post
Yes, it's legacy. Kovid's original code was to match on identical titles only - without any regard to author...
Cool, hoped that was the case. Surely your exact-match-on-author logic, triggering the user to decide what to do, could easily be merged into that. It is simple enough to test whether a book has an author to decide whether to match only on title or on title and author.
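Something along these lines is what I have in mind, as a minimal sketch only. The function names and the dict-based book records are hypothetical, and the title normalisation is just a crude stand-in for whatever "fuzzy" title matching the automerge code really does.

```python
import re


def normalize_title(title):
    """Lower-case, strip punctuation and drop a leading article.

    A crude stand-in for fuzzy title matching, assumed for illustration.
    """
    words = re.sub(r"[^\w\s]", " ", title.lower()).split()
    if words and words[0] in ("a", "an", "the"):
        words = words[1:]
    return " ".join(words)


def is_duplicate(incoming, existing):
    """Decide whether an incoming book duplicates an existing one.

    Both arguments are dicts with a 'title' string and an 'authors' list.
    """
    if normalize_title(incoming["title"]) != normalize_title(existing["title"]):
        return False
    if not incoming.get("authors"):
        # No author known (e.g. only a filename): fall back to title-only.
        return True
    # Author is known: require an exact match on the set of authors.
    return set(incoming["authors"]) == set(existing.get("authors", []))
```

The same test could then drive either path: automerge applies the Preferences choice, and the interactive path prompts the user.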
Quote:
I'm comfortable in the code for creating records, deleting records, grabbing incoming files and associated metadata, moving metadata around, searching the library for dupes, etc. but haven't had time to look very far into user interface construction. My skill stops at adding an option box and storing/retrieving the option or adding a warning dialog and allowing that warning to be turned off.
Well maybe I can help with that - my Qt skills are still fairly newborn (or is that fairly stillborn?) but they have managed a little complexity with the configuration dialogs for Search the Internet for instance with grids, context menus, embedded widgets, signals etc.

There are two options to the GUI approach that I can think of. One is to modify the Calibre source code to reuse the library view. This would probably be the best long term option, but as it touches the very core of Calibre it would need pretty close Kovid supervision to gain any likelihood of patch acceptance if done by a Python muppet like myself.

Plan B would be to do it in a popup window as part of a GUI plugin. I reckon given enough time I could pretty much cope with writing that, though obviously you would be more "constrained" in functionality by not being on the official library view. The advantage is that you could happily add columns and right-clicks all related to just the task at hand (resolving duplicates) safely encapsulated within a plugin that Kovid doesn't have to worry about.

The downside is that there are a number of things you take for granted in library view that would likely involve considerable duplication of code to offer. So it might start off pretty crude and basic. But IF the intent is just to list books that are duplicates, allow you to view formats and then merge the results it might be feasible?

I would presume you must already be doing what to me is the "hard part" of using the Calibre model/db to identify duplicates for a given book. So presumably rather than iterating over a collection of "adding" books you instead iterate over "all" books. Could be very slow, but I imagine you could do a few things like snapshot the results of the last time you "searched" and work with that until the user "refreshes" the duplicate search again. Again just thinking out loud before prematurely optimizing.
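To sketch the "iterate over all books, then snapshot" idea: if each book is reduced to a match key, one pass over the library groups the duplicates without comparing every pair. The names below (`find_duplicate_groups`, `DuplicateSnapshot`) and the dict of book records are assumptions for illustration, not anything in Calibre's db layer.

```python
from collections import defaultdict


def find_duplicate_groups(books, key_func):
    """Group book ids whose key_func(book) values collide.

    books maps a book id to its metadata. A single pass fills the buckets,
    so the cost grows linearly with library size rather than pairwise.
    """
    buckets = defaultdict(list)
    for book_id, book in books.items():
        buckets[key_func(book)].append(book_id)
    return [sorted(ids) for ids in buckets.values() if len(ids) > 1]


class DuplicateSnapshot:
    """Keep the last search result until the user asks for a refresh."""

    def __init__(self, books, key_func):
        self.books = books
        self.key_func = key_func
        self._groups = None  # nothing searched yet

    def groups(self, refresh=False):
        # Only rescan the library on first use or on an explicit refresh.
        if refresh or self._groups is None:
            self._groups = find_duplicate_groups(self.books, self.key_func)
        return self._groups
```

The key function is where the matching rules would live, for example a lower-cased title plus the tuple of authors. The snapshot will go stale as books are edited, which is the trade-off for not rescanning each time.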

If I did all of that and it "worked" well enough, then the next step could be to "loosen the reins" of that automerge option by adding the three sub-options I proposed and hence allowing the duplicate rows to be created when formats are duplicated.

Anyone have any thoughts on this? Bad idea/waste of time/etc?