02-18-2010, 04:13 PM   #1
Starson17
Changing default Add Book behavior - Comments?

The default Add Book options seem to work like this:

Option 1 is to add books from a single directory. Each book is added as a separate book. You point and choose the books you want to add. Different formats of the same book are added as separate records.

Option 2 adds books from multiple directories. Each directory is assumed to be a single book, so different formats are added as the same book, regardless of file name.

Option 3 also adds books from multiple directories. Its label says it "assumes every ebook file is a different book." That doesn't actually seem to be true. If the filenames are the same, but the extensions differ, this option behaves like the first option. However, if a filename differs, it adds that file as a different book, and then adds every other file with the same name but a different format extension to that record.
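To make the observed Option 3 grouping concrete, here is a minimal sketch. It is not calibre's code: `group_by_stem` is a hypothetical helper, and grouping per directory (rather than across directories) is an assumption on top of what is described above.

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_stem(paths):
    """Group files that share a directory and a base filename.

    Hypothetical helper: each key stands for one book record, and the
    value lists the format extensions that would land in that record.
    """
    groups = defaultdict(list)
    for p in map(PurePath, paths):
        key = (str(p.parent), p.stem)
        groups[key].append(p.suffix.lstrip(".").lower())
    return dict(groups)


files = ["lib/Dune.txt", "lib/Dune.epub", "lib/Emma.txt"]
records = group_by_stem(files)
# Two records: one for "Dune" holding two formats, one for "Emma".
```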

That works well when adding a new book.

I've noticed, however, that it's less effective in the long term, when you try to add a new format for a book that's already in the database. For example, if you run option 3 a second time on the same directory, you get duplicate books. It nicely sorts the formats into the new records it creates, but it's not aware of the older records for the same books.

I've looked at the code, and basically, it looks at the title of the new books it is trying to add. If that title exists in the database, it's a duplicate, and you are asked if you want to add it as a duplicate record. It does not look at the author to see if this is an identical book in a different format that might logically be added as a new format for an existing record.
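As a rough illustration of that title-only check (a sketch, not the actual calibre source): the library is modelled as a plain list of dicts, `is_duplicate_by_title` is a made-up name, and the case-insensitive comparison is an assumption.

```python
def is_duplicate_by_title(title, library):
    """Flag a book as a duplicate when any existing record has the same title.

    Assumption: titles are compared case-insensitively. The author field
    is deliberately never consulted, which is the limitation described above.
    """
    wanted = title.strip().lower()
    return any(book["title"].strip().lower() == wanted for book in library)


library = [{"title": "Emma", "author": "Jane Austen"}]

# Same title, different author: still reported as a duplicate.
flagged = is_duplicate_by_title("Emma", library)
# A title not in the library is not flagged.
not_flagged = is_duplicate_by_title("Persuasion", library)
```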

I've decided I need that feature as I try to bring my ebook library of the last 20+ years into calibre. I'm past the halfway point and most of my books seem to already be in the library in text format (the first format I added). To address my personal needs, I've modified the code to function as follows:

When an attempt to add a book is made, it checks the database and finds all books by the same author. It then compares the title of the new book to the titles of those books. If the new title is sufficiently similar to an existing book title by the same author, it adds the book as another format of the existing book. "Sufficiently similar" means that it ignores case and any leading articles ("the", "a", "an", etc.).
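The matching rule could be sketched roughly like this. It is an illustration under assumptions, not the modified code itself: `normalize_title` and `find_existing` are hypothetical names, the library is a list of dicts, authors are compared exactly, and only the three articles named above are stripped.

```python
import re

# Leading article followed by whitespace, so "Another" or "Theory" are untouched.
_LEADING_ARTICLE = re.compile(r"^(the|a|an)\s+", re.IGNORECASE)


def normalize_title(title):
    """Lowercase the title and drop one leading article, if present."""
    return _LEADING_ARTICLE.sub("", title.strip().lower())


def find_existing(author, title, library):
    """Return the first record by the same author with a matching
    normalized title, or None if there is no such record."""
    wanted = normalize_title(title)
    for book in library:
        if book["author"] == author and normalize_title(book["title"]) == wanted:
            return book
    return None


library = [{"author": "J.R.R. Tolkien", "title": "The Hobbit", "formats": ["txt"]}]
match = find_existing("J.R.R. Tolkien", "hobbit", library)
if match is not None:
    match["formats"].append("epub")  # added as a new format, not a new record
```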

This meets my needs and has greatly improved the speed of adding my existing books into calibre. I can drop a few hundred books onto the main screen and it sorts them into the existing records, creating new records when the title/author is new, and warning me that it will create duplicates only when the title matches but the author does not. (That last case is the remnant of the current behavior. All the other true "duplicates", where both author and title match, are now caught and merged.)
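The three outcomes can be summarised in a small decision function. This is only a sketch: `classify` and the outcome labels are invented for illustration, and it assumes the title comparison ignores case and a leading article in every branch.

```python
def classify(author, title, library):
    """Return 'add_format', 'duplicate_warning', or 'new_record'."""

    def norm(t):
        words = t.lower().split()
        # Drop a leading article, but never reduce a title to nothing.
        if len(words) > 1 and words[0] in ("the", "a", "an"):
            words = words[1:]
        return " ".join(words)

    same_title = [b for b in library if norm(b["title"]) == norm(title)]
    if any(b["author"] == author for b in same_title):
        return "add_format"         # author and title match: merge formats
    if same_title:
        return "duplicate_warning"  # title matches, author differs: ask first
    return "new_record"             # nothing similar: create a new book


library = [{"author": "Jane Austen", "title": "Emma"}]
outcomes = [
    classify("Jane Austen", "emma", library),
    classify("Someone Else", "Emma", library),
    classify("Jane Austen", "Persuasion", library),
]
```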

My question is whether this would be useful for others, and if so, how should it be integrated into calibre? I overwrite an existing format whenever I add a new copy of the book in the same format. Some might hate that. Some may need to enter duplicate records of the same format.

I could certainly display a list of "identical books" the same way it currently displays a list of "duplicate books" and ask for permission first. I could leave existing formats alone and only add new ones. I could put in a variety of options that allow reversion to the old behavior, or that control format overwrite, but I know Kovid hates option clutter (for good reason).
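One way to picture the overwrite question is a single flag on the merge step. This is a hypothetical sketch, with formats modelled as a dict from format name to file path; `merge_formats` and its `overwrite` parameter are not existing calibre options.

```python
def merge_formats(existing, incoming, overwrite=True):
    """Merge incoming formats into a record's existing formats.

    overwrite=True replaces a format that is already present (what the
    modified code currently does); overwrite=False keeps the old file
    and only adds formats the record does not have yet.
    """
    merged = dict(existing)
    for fmt, path in incoming.items():
        if overwrite or fmt not in merged:
            merged[fmt] = path
    return merged


existing = {"txt": "old/book.txt"}
incoming = {"txt": "new/book.txt", "epub": "new/book.epub"}

replaced = merge_formats(existing, incoming, overwrite=True)
kept = merge_formats(existing, incoming, overwrite=False)
```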

I'm just not sure what optimal behavior is for other needs.

Currently the code has some minor oddities. For example, it produces multiple records when there are multiple formats of a new book that has the same title as a book by another author. That should be easy to fix when I've got time.

I'm going to be out of action for a while, but if anyone has any thoughts on this, I'd love to hear them. If no one has any interest, or this would break current expectations, I won't spend the time to clean up the code. (Which I would definitely need to do so I don't look too bad if I send it to Kovid - he's seen enough of my hack jobs already!)

Thanks for any suggestions or comments.

Last edited by Starson17; 02-18-2010 at 04:15 PM.