|  05-06-2010, 07:31 PM | #1 | 
| Groupie   Posts: 191 Karma: 134 Join Date: May 2010 Device: IREX DR1000 | 
				
				First Requests and POV of a Newcomer
			 
			
			Hello, this is my second day using calibre. Yesterday it crashed while importing my ebooks (over 1000). Today I split the library into 10 parts to avoid the crash. It worked! Here is what I learned from the first 36 hours of use:
1) There is no date-time of addition, only a date. This is a problem: if you add files to a huge DB and then want to rename those that have a bad import name, you have to scan the whole DB. With date and time together you can sort on this field, and all the latest entries will be at the top or at the bottom of the list. This would make finding bad book names easier, imho.
2) As I have written, the program crashed while importing 1041 books. This creates a dangerous situation. Calibre seems to recognize the same file by book name. You can get duplicates in two situations. The first: the filename is the same. The second: the tags of the files are unset, and this leads to duplicates. If the program crashes and/or you reimport an already imported directory, you will get both kinds of duplicates. Instead, the program should be able to tell files that are physically the same apart from files that merely have the same inferred title, and skip the former. Physical sameness should be taken into account when importing files. It is easy to detect that the same physical file has already been imported and so needs to be skipped: use a combination of CRC32 and SIZE. Add those fields to the DB. It will give 99.9999999% accuracy.
3) Auto tagging: many of us have already created some directory structure to categorize books: one dir = one or more tags, the same for all its files. During import it would be good to automatically select one or more tags to be set on the imported books.
Ok, that's all. Giuseppe Chillemi | 
|   |   | 
|  05-06-2010, 08:41 PM | #2 | |||
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | 
			
			Calibre stores the date and time; it just doesn't display the time. Sorting uses the time as well as the date. Quote: 
 Quote: 
 Quote: 
 | |||
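To illustrate the point about sorting, here is a minimal sketch, assuming each book record carries a full timestamp. The `added` field name and the sample records are hypothetical and are not calibre's actual schema. Even when only the date is shown, sorting on the full timestamp still puts the most recent imports first.

```python
from datetime import datetime

# Hypothetical records; the field names are illustrative, not calibre's schema.
books = [
    {"title": "A", "added": datetime(2010, 5, 6, 9, 15)},
    {"title": "B", "added": datetime(2010, 5, 6, 18, 40)},
    {"title": "C", "added": datetime(2010, 5, 5, 23, 59)},
]

# Sort on the full timestamp, newest first.
newest_first = sorted(books, key=lambda b: b["added"], reverse=True)

# Show only the date part, as a date-only column would.
rows = [(b["title"], b["added"].strftime("%Y-%m-%d")) for b in newest_first]
for title, shown in rows:
    print(title, shown)
```

Books A and B display the same date, yet B still sorts ahead of A because the hidden time component takes part in the comparison.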
|   |   | 
|  05-06-2010, 09:07 PM | #3 | 
| Wizard            Posts: 4,812 Karma: 26912940 Join Date: Apr 2010 Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet | 
			
			Just curious. How do you tell that a file is physically the same? File name and size, or scanning the whole book? If it is file name and size, that is not foolproof, but it is pretty easy for a person to do manually.
Calibre seems to have a much more sophisticated approach, which does not always find every identical book but does find a remarkable number. AFAIK it has not caught all duplicates (impossible, I think), but I imported 47,000 files and ended up with 21,000 ebooks. I had already sorted out the obvious duplicates based on file name/size (took about an hour). Needless to say, I was impressed at how much work Calibre did for me. And contrary to what you imply, it might have missed some, but it does not seem to have mismatched any. And Calibre never destroys your original copy. Nothing dangerous there that I can see.
I don't imagine it will ever perform magic tasks such as figuring out each individual user's file naming conventions and directory structures, but if you spend a little time using it, it will make it easier for you to do this yourself. For instance, you could add the files from one directory or group of directories and use bulk edit to put in the appropriate tags.
BTW, Calibre crashed on my first import try, but it did import my 47,000 files without crashing when I used my spare laptop solely for that purpose. It took about a day, but it did it. Helen | 
|   |   | 
|  05-06-2010, 09:45 PM | #4 | 
| Curmudgeon            Posts: 3,085 Karma: 722357 Join Date: Feb 2010 Device: PRS-505 | 
			
			I had a problem with crashes when I was importing about 1000 files, but that was with a build from a few months back. It's behaved itself ever since, for me at least.
To the OP, here's a way to get the right tags on the right files:
1. Mark all your existing books with a tag like the one I use: [processed]
2. Import your new books.
3. Search for those books that don't have a [processed] tag (I keep this as a saved search).
4. Control-A to select all the non-[processed] books you've found.
5. Bulk edit and put in your tag(s) of choice for that group.
Repeat 2-5 until you've imported them all.
Though ... thinking about it ... we have options to set the book title, etc., from the file name ... when importing books from a single folder, or a tree of folders, it would kind of be nice to have an option to automatically set tags based on the folder name(s), starting with the one you selected. So if you start your import in \fiction, which has below it \mystery, \fantasy, and \sf, and below \sf you have \retro and \military, every book imported in that batch would get a "fiction" tag, plus, if relevant, "mystery", "fantasy", or "sf", and some of the "sf" books would get "retro" or "military" too.
The idea of automatically assigning tags on import has been requested before. So how about options like:
--------------------------------------------------------------
IMPORT AUTO-TAGGING OPTIONS
[ ] strip all existing tags
[ ] assign the following tag string: [______________________]
[ ] assign tags by folder names
--------------------------------------------------------------
So if I'm importing a bunch of books from PG, I could pick the first and third options, and it would ditch PG's crappy LoC tags and assign each one a tag of gutenberg, so I know where it came from. I think Giuseppe's idea, and mine (and several other people's in the past) might be worth following up on. 
It's less of an issue for those of us who have our collections in calibre already, but it would sure make life easier on someone with a few thousand books to import, and make the transition from filesystem-as-metadata to tags-as-metadata much easier for newer users. (and the latter might qualify as a "fewer users bugging Kovid" class of improvement) | 
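The "assign tags by folder names" idea could look something like the sketch below. This is only an illustration of the proposal, not an existing calibre feature: the function name `tags_from_folders` is made up here, tags are lowercased as an arbitrary choice, and POSIX-style paths are used so the example behaves the same on any machine.

```python
from pathlib import PurePosixPath

def tags_from_folders(import_root, file_path):
    """Return one tag per folder, from the selected import folder down to the file.

    Hypothetical helper: the selected folder itself contributes the first tag.
    """
    root = PurePosixPath(import_root)
    # Make the path relative to the parent of the selected folder,
    # so the selected folder's own name is kept as a tag.
    rel = PurePosixPath(file_path).relative_to(root.parent)
    # Drop the file name; every remaining component is a folder.
    return [part.lower() for part in rel.parts[:-1]]

print(tags_from_folders("/books/fiction", "/books/fiction/sf/military/title.epub"))
```

With an import started in `fiction`, a book under `sf/military` would pick up three tags, while a book sitting directly in `fiction` would pick up only one.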
|   |   | 
|  05-07-2010, 07:59 AM | #5 | ||
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | Quote: 
 Quote: 
 | ||
|   |   | 
|  05-07-2010, 10:06 AM | #6 | 
| Curmudgeon            Posts: 3,085 Karma: 722357 Join Date: Feb 2010 Device: PRS-505 | 
			
			Hopefully a month or so from now, I'll have finished moving and be set up in my new place. Then I guess it'll be time to learn Python.
		 | 
|   |   | 
|  05-07-2010, 11:05 AM | #7 | 
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | |
|   |   | 
|  05-07-2010, 12:02 PM | #8 | |
| Groupie   Posts: 191 Karma: 134 Join Date: May 2010 Device: IREX DR1000 | Quote: 
 CRC32 and file size are what duplicate-finder software uses to find duplicates on hard drives (a good program for this is CSPY, "Clone Spy"). Giuseppe Chillemi | |
|   |   | 
|  05-07-2010, 12:11 PM | #9 | |||
| Groupie   Posts: 191 Karma: 134 Join Date: May 2010 Device: IREX DR1000 | Quote: 
 The fact that you were lucky doesn't mean there isn't a bug in the program :-) Quote: 
 CRC32 + FILESIZE gives a 100% accurate match on duplicates. It is very easy to implement: the file size is already there, and CRC32 can be 1) calculated with a simple function available in Python, or 2) inherited from the file system, which should have the CRC32 stored in the file header (if I am not wrong). Quote: 
 Giuseppe Chillemi | |||
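For reference, option 1 can be sketched with Python's standard `zlib.crc32` and `os.path.getsize`. The helper names `fingerprint` and `find_physical_duplicates` are invented for this example and do not come from calibre; the file is read in chunks so large books do not have to fit in memory.

```python
import os
import zlib

def fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, crc32) for the file at `path`."""
    crc = 0
    with open(path, "rb") as f:
        # Feed the running checksum one chunk at a time.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return os.path.getsize(path), crc & 0xFFFFFFFF

def find_physical_duplicates(paths):
    """Group paths that share the same (size, crc32) fingerprint."""
    groups = {}
    for p in paths:
        groups.setdefault(fingerprint(p), []).append(p)
    # Only groups with more than one member are duplicates.
    return [g for g in groups.values() if len(g) > 1]
```

Storing the fingerprint next to each imported book would let a re-import after a crash skip files that are already present, which is the scenario described above.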
|   |   | 
|  05-07-2010, 12:13 PM | #10 | |
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | Quote: 
 It seems more useful to me to identify duplicates based on title and/or author, then ask. Most of my duplicates weren't 100% CRC duplicates anyway. | |
|   |   | 
|  05-07-2010, 12:22 PM | #11 | ||||
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | Quote: 
 Quote: 
 Quote: 
 | ||||
|   |   | 
|  05-07-2010, 01:00 PM | #12 | |
| Groupie   Posts: 191 Karma: 134 Join Date: May 2010 Device: IREX DR1000 | Quote: 
 There is only a 0.00000001% chance that they are different, so "100%" here means 99.99999999%. Giuseppe Chillemi | |
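As a rough check on that figure, the arithmetic can be sketched as follows. It assumes CRC32 values behave as if uniformly distributed over unrelated files, which is an idealization; the function name `p_any_collision` is made up for this example, and the library-wide estimate ignores the extra file-size check, so it overstates the risk for the combined test.

```python
import math

# Chance that two unrelated files share a CRC32,
# under the idealized assumption of uniformly distributed checksums.
p_pair = 1 / 2**32
print(f"{p_pair * 100:.10f}%")

def p_any_collision(n):
    """Approximate chance of at least one CRC32 collision among n files
    (birthday bound; ignores the additional size comparison)."""
    return 1 - math.exp(-n * (n - 1) / (2 * 2**32))

print(f"{p_any_collision(1041):.6f}")
print(f"{p_any_collision(47000):.4f}")
```

The per-pair chance is of the same order as the figure quoted above. Across a whole library the chance that some pair collides on CRC32 alone grows with the square of the number of files, which is why comparing the size as well (or using a longer hash) helps.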
|   |   | 
|  05-07-2010, 01:22 PM | #13 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			The problem isn't that files with the same hash will be different, the problem is that files with different hashes may be the same. For example they may have slightly different metadata or have stored annotations, and still be logically the same book.
		 | 
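A tiny sketch of that situation, using made-up byte strings rather than real ebook files: the book text is identical, but a small difference in embedded metadata changes the size and checksum, so a physical comparison treats the two copies as unrelated.

```python
import zlib

# Two hypothetical copies of one book: identical text, different embedded metadata.
text = b"Chapter 1. It was a dark and stormy night."
copy_a = b"<title>My Book</title>" + text
copy_b = b"<title>My Book (2nd scan)</title>" + text

def physical_key(data):
    """(size, CRC32) fingerprint of a file's bytes."""
    return len(data), zlib.crc32(data) & 0xFFFFFFFF

same_book = copy_a.endswith(text) and copy_b.endswith(text)
same_fingerprint = physical_key(copy_a) == physical_key(copy_b)
print(same_book, same_fingerprint)
```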
|   |   | 
|  05-07-2010, 02:03 PM | #14 | |
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | Quote: 
 I'm not saying it's useless information to know which have the same hash, but Calibre can't use that information to automatically do anything for me. It will still have to ask what I want done. Sometimes if the hash matches, I want it added anyway (multiple author situations) and other times, even with hash differences, I don't want it added (it's the same book, but an earlier version without my bookmarks or with scanning errors not yet corrected). | |
|   |   | 
|  05-07-2010, 03:19 PM | #15 | |
| Groupie   Posts: 191 Karma: 134 Join Date: May 2010 Device: IREX DR1000 | Quote: 
 However, if you are in the early stage of book importing (for example, merging book collections) and you have not changed any metadata, CRC32 + SIZE gives you a 100% hit. Thanks to your POV I wish to change my request a little.
Here is the target scenario: Calibre crashes during import. Only part of the files has been imported. Some of these files have metadata identical to that of other files (I have found some CHM files that all have the same "Generated by Unregistered Version"). If you discard duplicates, you discard false duplicates too. It does actually happen; I ran into this problem the very first time I used Calibre.
Here is the proposal: a two-round check. The first round is CRC32 + SIZE; the second is the existing mechanism. This would give you 3 lists: 1) physical duplicates, 2) physical and metadata duplicates, 3) metadata duplicates. Then you ask the user: DUPLICATES FOUND, what do you want to delete? "Same Physical Files; Same Physical Files + Metadata; Only Metadata; None"
What do you think about this proposal? Giuseppe Chillemi | |
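The two-round check could be sketched roughly as below. This is an illustration of the proposal only: the record layout (`title`, `author`, `data`) is hypothetical, the "existing mechanism" is stood in for by a plain title/author comparison (an assumption, not calibre's actual matching logic), and the pairwise loop is written for clarity rather than speed.

```python
import zlib

def physical_key(data):
    """Round 1 key: (size, CRC32) of the file's bytes."""
    return len(data), zlib.crc32(data) & 0xFFFFFFFF

def classify_duplicates(books):
    """Return three lists of index pairs:
    physical only, physical + metadata, metadata only."""
    physical_only, both, metadata_only = [], [], []
    for i in range(len(books)):
        for j in range(i + 1, len(books)):
            a, b = books[i], books[j]
            same_file = physical_key(a["data"]) == physical_key(b["data"])
            # Round 2 stand-in: assume metadata matching means same title and author.
            same_meta = (a["title"], a["author"]) == (b["title"], b["author"])
            if same_file and same_meta:
                both.append((i, j))
            elif same_file:
                physical_only.append((i, j))
            elif same_meta:
                metadata_only.append((i, j))
    return physical_only, both, metadata_only

# Made-up sample library.
books = [
    {"title": "Dune", "author": "Herbert", "data": b"AAA"},
    {"title": "Dune", "author": "Herbert", "data": b"AAA"},
    {"title": "Generated by Unregistered Version", "author": "", "data": b"BBB"},
    {"title": "Generated by Unregistered Version", "author": "", "data": b"CCC"},
    {"title": "Dune (copy)", "author": "Herbert", "data": b"AAA"},
]
physical_only, both, metadata_only = classify_duplicates(books)
print(physical_only, both, metadata_only)
```

The two CHM-style records with identical placeholder metadata but different contents land in the metadata-only list, so they can be shown to the user as possible false duplicates instead of being discarded automatically.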
|   |   | 