Quote:
Originally Posted by Tanjamuse
Is it possible to avoid the duplicate check when adding books to an empty library?
It only checks the title, not both title and author or any other columns.
Would it be possible to either override this duplicate check or make it compare more than one column when deciding whether a book is a duplicate?
I just tell calibredb to ignore the duplicate check via the command line (the --duplicates flag); I'm not sure how to do the same in the GUI.
I use the Find Duplicates plugin to check for duplicates after import. I've got plenty of fanfics whose author pseudonym or title has changed, so the URL identifier (e.g. https://archiveofourown.org/works/1234567) is the most reliable way of catching duplicates.
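To illustrate why the URL identifier works as a dedupe key: the numeric AO3 work id in the URL stays the same even when the author pseudonym or title changes. A minimal sketch (the helper names are mine, not calibre's or the plugin's):

```python
import re
from collections import defaultdict

# The numeric work id in an AO3 URL survives title/author renames.
AO3_WORK = re.compile(r"archiveofourown\.org/works/(\d+)")

def work_id(url):
    """Return the AO3 work id from a URL identifier, or None if it doesn't match."""
    m = AO3_WORK.search(url)
    return m.group(1) if m else None

def find_duplicates(books):
    """Group (title, url) pairs by work id; any group with >1 title is a duplicate."""
    groups = defaultdict(list)
    for title, url in books:
        wid = work_id(url)
        if wid:
            groups[wid].append(title)
    return {wid: titles for wid, titles in groups.items() if len(titles) > 1}

books = [
    ("Old Title", "https://archiveofourown.org/works/1234567"),
    ("New Title", "https://archiveofourown.org/works/1234567"),  # same work, renamed
    ("Other Fic", "https://archiveofourown.org/works/7654321"),
]
print(find_duplicates(books))  # the two renamed copies share work id 1234567
```

A title- or author-based comparison would miss the first pair entirely; the work id catches it.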
I was actually quite curious about performance so I ran some import tests using part of my AO3 fanfic library.
The SSD is a 500GB Samsung 840 with planar TLC NAND. It's 5-6 years old and 90% full, so probably quite slow by modern SSD standards, on top of normal performance degradation.
The HDD is a brand-new, empty 1TB 7200RPM Seagate Barracuda (found in my box of spare parts).
The flash drive is a 128GB Samsung BAR USB 3.1 (connected to a USB 3.0 port).
Code:
calibredb add --duplicates --recurse --library-path "X:\Calibre Portable\TestLibrary" "X:\ebooks\import"
Import Stats (source to destination)
               Time (mm:ss.00)   MB/min   books/min
SSD to SSD     13:54.10          178      355
SSD to HDD     16:55.52          147      291
HDD to HDD     20:19.20          122      243
Flash to HDD   17:37.79          141      280
Import Structure:
\Fandom\Authors\Authors - Title (id).ext
2.422 GB, 979 folders, 24,650 files
4,930 "unique" books (based on checksum)
* each book has epub, mobi, txt, opf & cover
2,411 unique titles (based on url identifier)
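For anyone checking my arithmetic: MB/min is the total import size (2.422 GB, treated as 1024-based) divided by elapsed minutes, and books/min is the 4,930 books divided the same way. A quick sketch:

```python
def rates(elapsed, size_gib=2.422, books=4930):
    """Compute (MB/min, books/min) from an 'mm:ss.xx' elapsed time."""
    minutes, seconds = elapsed.split(":")
    total_min = int(minutes) + float(seconds) / 60.0
    mb_per_min = size_gib * 1024 / total_min  # 1024-based, matching the table
    return round(mb_per_min), round(books / total_min)

for run, t in [("SSD to SSD", "13:54.10"), ("SSD to HDD", "16:55.52"),
               ("HDD to HDD", "20:19.20"), ("Flash to HDD", "17:37.79")]:
    # reproduces the 178/355, 147/291, 122/243, 141/280 figures from the table
    print(run, rates(t))
```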
Options:
-d, --duplicates Add books to database even if they already exist.
Comparison is done based on book titles.
-r, --recurse Process directories recursively
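Since --duplicates only compares titles, one way to catch identifier-level duplicates after an import is to dump the library with calibredb list --for-machine (which emits JSON) and group by the url identifier. A sketch, assuming the JSON shape shown in the hardcoded sample below; verify against what your calibre version actually outputs:

```python
import json
from collections import defaultdict

# Sample of what `calibredb list --for-machine --fields title,identifiers`
# emits (shape assumed here; check your calibre version's actual output).
sample = """[
  {"id": 1, "title": "Old Title", "identifiers": {"url": "https://archiveofourown.org/works/1234567"}},
  {"id": 2, "title": "New Title", "identifiers": {"url": "https://archiveofourown.org/works/1234567"}},
  {"id": 3, "title": "Other Fic", "identifiers": {"url": "https://archiveofourown.org/works/7654321"}}
]"""

def duplicate_ids(records):
    """Map each url identifier to the calibre book ids sharing it (>1 only)."""
    by_url = defaultdict(list)
    for rec in records:
        url = rec.get("identifiers", {}).get("url")
        if url:
            by_url[url].append(rec["id"])
    return {url: ids for url, ids in by_url.items() if len(ids) > 1}

print(duplicate_ids(json.loads(sample)))  # books 1 and 2 share the same work URL
```

The Find Duplicates plugin does this kind of identifier comparison inside the GUI; this is just the command-line equivalent of the idea.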