#1 |
Junior Member
Posts: 9
Karma: 1234
Join Date: Jul 2012
Device: none
|
Renaming files using "download metadata" changes checksum
Hi. I have lots of downloaded books that I want to rename using Calibre's "download metadata" function. It seemed fine at first, but then I realized that downloading metadata changes the checksum of the files. Is it possible to just rename the files, without writing metadata to them and so without altering their checksums? As it is now, I may end up with a lot of duplicates without knowing it. The only way to check whether I already have a book is to compare its checksum with that of a file I haven't renamed yet, and since renaming through Calibre's download metadata changes the checksum, that no longer works. Can I rename files using Calibre without altering them, or do I have to do it manually: open the file, see who wrote it and what it's called, search on Google, go to Amazon's page for the book, select and copy the title, click to rename, paste the title, and click OK to deal with the characters Windows doesn't like (colons, question marks, etc.)?
|
#2 | |
Wizard
Posts: 2,077
Karma: 14079267
Join Date: Oct 2007
Location: Almere, The Netherlands
Device: Kobo Sage
|
Quote:
Could you explain what it is that you're trying to do?
|
#3 |
Handy Elephant
Posts: 1,737
Karma: 26785684
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Samsung Galaxy Tab S8 Ultra
|
Let Calibre work as it does. And use it! Either surrender your soul to the Calibre way of doing things, or wander around in the barren wastelands outside. There is a Calibre plugin that lets you search for duplicates, both by a binary compare of the actual book and by comparing the file names. You can install the plugin from within Calibre.
So before you fetch any metadata to rename your new books, remove any books that already have a binary match (with correct metadata) in the Calibre library. For Calibre to bulk fetch metadata, at least the book title and author have to be correct. You have to fix that somehow. Perhaps scan them from the filename when you import, unless they can be read from inside the file. Otherwise you will have to actually open the book and manually copy the title and author from inside it.
Last edited by Adoby; 08-31-2012 at 07:00 AM.
#4 | |
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
Quote:
You might also want to consider the Find Duplicates plugin. It does much more sophisticated checks than a simple checksum (although it does have a binary compare option), so it is much better at finding duplicates.
|
#5 |
Connoisseur
Posts: 55
Karma: 603120
Join Date: Aug 2012
Location: Monte Los Angeles (Califoggia)
Device: Android Tablet, Kindle Paperwhite
|
If the content of a file changes by even a single byte, the checksum will change; there's no way to stop that!
But if you only rename a file, the checksum doesn't change, and Calibre usually saves metadata and covers to separate files, so I think you can solve this issue.
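As an illustration of the point above, here is a minimal Python sketch (nothing Calibre itself does). It hashes a throwaway file, renames it and hashes it again, then appends a single byte and hashes once more. The file names and the placeholder bytes are made up for the demonstration, and SHA-256 simply stands in for whichever checksum tool you actually use.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents (the name is not hashed)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large books do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Return (rename_kept_checksum, edit_changed_checksum) for a throwaway file."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "Weight training guide.pdf"
        # Placeholder bytes only; this is not a real PDF.
        original.write_bytes(b"%PDF-1.4 placeholder bytes for the demo")
        before = sha256_of(original)

        # Rename only: the bytes inside the file are untouched.
        renamed = original.with_name("Author - Title.pdf")
        original.rename(renamed)
        after_rename = sha256_of(renamed)

        # Change the content by a single byte.
        with open(renamed, "ab") as handle:
            handle.write(b"\x00")
        after_edit = sha256_of(renamed)

    return before == after_rename, before != after_edit


if __name__ == "__main__":
    print(demo())
```

The first value reports whether the rename left the digest alone, the second whether the one-byte edit changed it.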
#6 |
Junior Member
Posts: 9
Karma: 1234
Join Date: Jul 2012
Device: none
|
Well, suppose I downloaded lots of PDF files (sometimes epubs) and I want to rename them all like {author} - {title}, but without altering their checksums. Basically, just use Calibre as a file renamer that does the hard work for me, without altering the files in any way by writing metadata. Can this be done?
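For the naming part alone, a small Python sketch of building an `{author} - {title}` file name that Windows will accept. The helper name `safe_filename` is hypothetical, and the author and title are assumed to come from somewhere else (typed in or looked up); this does not fetch metadata and does not touch the file contents.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(author: str, title: str, extension: str) -> str:
    """Build '{author} - {title}.{extension}' with Windows-illegal characters replaced by '_'."""
    stem = _ILLEGAL.sub("_", f"{author} - {title}")
    # Collapse runs of whitespace, then drop leading/trailing spaces and dots,
    # because Windows rejects names that end in a space or a dot.
    stem = re.sub(r"\s+", " ", stem).strip(" .")
    return f"{stem}.{extension.lstrip('.')}"


if __name__ == "__main__":
    print(safe_filename("Liz Neporent", "Weight Training: For Dummies?", "pdf"))
```

Reserved device names such as `CON` or `NUL` are not handled here, so treat this as a starting point, not a complete sanitizer.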
|
#7 |
Handy Elephant
Posts: 1,737
Karma: 26785684
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Samsung Galaxy Tab S8 Ultra
|
No.
But there are plenty of other utilities to rename files in bulk. My favorite is LibreOffice Calc. I create a file list in text format using ls (dir) and import that into Calc. Then I turn it into a shell script (batch file) that copies the files with new names to another directory. Calc has plenty of string functions, conditions and logic to build the new names, and only the filenames are changed. Why do you mind that the checksums change? Can't you just update them?
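The spreadsheet-to-script approach above can also be sketched in Python. This assumes the sheet was exported as a two-column CSV (old file name, new file name); that layout and the helper name `copy_with_new_names` are assumptions made for the example. Like the batch file described, it copies to another directory and leaves the originals alone.

```python
import csv
import shutil
from pathlib import Path


def copy_with_new_names(mapping_csv: Path, source_dir: Path, target_dir: Path) -> list:
    """Copy files from source_dir to target_dir under the new names listed in mapping_csv.

    Each CSV row is assumed to hold: old file name, new file name.
    Returns the list of destination paths that were actually written.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    with open(mapping_csv, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # blank or malformed row
            source = source_dir / row[0]
            destination = target_dir / row[1]
            if not source.is_file():
                continue  # listed file is missing; leave it for manual review
            if destination.exists():
                continue  # never overwrite an existing file
            # copy2 copies the bytes unchanged and keeps the timestamps.
            shutil.copy2(source, destination)
            copied.append(destination)
    return copied
```

Because only a byte-for-byte copy is made under a new name, the checksum of each copy matches its original.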
#8 | |
Junior Member
Posts: 9
Karma: 1234
Join Date: Jul 2012
Device: none
|
Quote:
If I have a folder like this

I Love You More.mp3
Can I Bitch.mp3
Bully.mp3
Come On In.mp3

I can use one such program to add "Eminem" before each title without affecting its checksum. Or use MP3tag. But with books this won't work. I have to open each PDF to see what it's called and who wrote it, and then tediously copy and paste that for every file and rename manually. I was hoping that Calibre might help avoid all that.

For instance, if I have something like this in my sorted folder:

Liz Neporent Suzanne Schlosberg - Weight Training for Dummies.pdf
http://www.amazon.com/Weight-Trainin.../dp/076455168X

But then I get a second bodybuilding pack from somewhere else and it includes the same book, but this time it's called Weight training guide.pdf (no author and wrong title). The duplicate checker won't be able to tell by name that it's the same book, and comparing the checksum won't work either if the former was renamed using Calibre, because its checksum will have changed.
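One possible way around the scenario above, sketched in Python under stated assumptions: note the checksum of each original file before it is renamed or rewritten, and keep it in a small ledger. A later download with a wrong name can then be checked against the ledger even if the library copy has since changed. The JSON ledger and the helpers `record_original` and `already_have` are hypothetical, not a Calibre feature.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _load(ledger_path: Path) -> dict:
    """Read the ledger, or return an empty one if it does not exist yet."""
    if ledger_path.exists():
        return json.loads(ledger_path.read_text(encoding="utf-8"))
    return {}


def record_original(ledger_path: Path, book: Path, new_name: str) -> None:
    """Store the checksum of the untouched file together with the name it will get."""
    ledger = _load(ledger_path)
    ledger[sha256_of(book)] = new_name
    ledger_path.write_text(json.dumps(ledger, indent=2), encoding="utf-8")


def already_have(ledger_path: Path, candidate: Path) -> Optional[str]:
    """Return the recorded name if this exact file was seen before, else None."""
    return _load(ledger_path).get(sha256_of(candidate))
```

This only catches byte-identical copies of a file that was recorded earlier; a different edition or a re-encoded copy of the same book will not match.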
|
#9 |
Handy Elephant
Posts: 1,737
Karma: 26785684
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Samsung Galaxy Tab S8 Ultra
|
If you keep *all* your books in Calibre, you don't have to bother with checksums. You use Calibre to check for duplications. And you can fix metadata, covers and all other stuff.
Instead, use the plugin "Find Duplicates" to look for binary matches before(!) you do a bulk metadata lookup. And remove duplicated books BEFORE you change the metadata. That way you never risk adding the same book twice.

After successfully adding metadata you can do another search for duplicates, this time for the same book in different editions or in different formats. Sometimes you may wish to keep the duplicates, sometimes not. Calibre allows you to decide which.

If you do things like this you can use Calibre to maintain your library of books. But you have to give up the checksums. I'm still not clear about why the checksums are so very important to you. You also have to give up your sorted folders and any structure you have now. Instead you can use the features in Calibre to locate different books, or even use the Content Server to browse your library using metadata like tags. If you wish, you can at any time "dump" the whole library to a folder tree, using metadata to create the tree structure and organize the tree as you like.

---

There are also some utilities out there that seem to do exactly what you want, like MP3Tag does for music. Not always free, though... If the author and book title aren't in the metadata, then it won't work. Nor will it work if the metadata is wrong, or has perhaps been modified earlier in Calibre by someone else. Google "epub metadata rename" or "pdf metadata rename" to find some examples.

Last edited by Adoby; 09-01-2012 at 10:25 AM.
#10 |
null operator (he/him)
Posts: 21,658
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Whether or not one believes thehawkman ought to be concerned about whether calibre does or doesn't change checksums, there is an anomaly here.

If I rename, move or copy a file in Windows or Solaris the checksums don't change, and I'd bet London to a brick that Linux and OS X are the same. The only things that change are the file modify date in the case of a rename, or the create date in the case of a move or copy.

When I add a book to calibre I thought I was simply doing a copy and possibly a rename. However, if I drop a 'new book' into calibre (I tested a PDF and a prc), the checksums of the book file change. But if I copy a book from a Calibre library into another folder, rename it, and drop that book into another Calibre library, the checksums of the book file don't change.

So why does calibre change the data stream of a file the first time it sees the file, and what does it change? The file name is held in the file system index and plays no part in computing checksums, which are computed from the file's data fork/stream.

BR
#11 |
creator of calibre
Posts: 45,247
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
calibre does not change checksums on add. The only exception to that is if you've installed third party plugins like DRM removal plugins.
|
#12 |
Handy Elephant
Posts: 1,737
Karma: 26785684
Join Date: Dec 2009
Location: Southern Sweden, far out in the quiet woods
Device: Samsung Galaxy Tab S8 Ultra
|
I believe this is what happens:
You add a book to Calibre, and some initial values are set for Author and Title, either from existing metadata or from the file name. The book is stored in a folder named after author and title, like this:

Calibre Library/{authors1}/{title1} ({id})/{title1} - {authors1}.epub

The contents of the file have not changed. Checksum unchanged. After a metadata download this may change to:

Calibre Library/{authors2}/{title2} ({id})/{title2} - {authors2}.epub

But there is still no change to the metadata inside the file. If you peek inside the Calibre Library at this stage and manually retrieve the book, you will find that it has been renamed, but the metadata inside the book has not been changed, so the checksum should be intact.

The new metadata is saved by Calibre in a separate database file, metadata.db, and it is this information that is shown in the Calibre GUI. The metadata embedded inside the book file may have completely different values from those shown by Calibre.

It is considered a bad(tm) thing to peek and rummage around inside the Calibre Library like this... People usually end up with missing books, a corrupt database or other problems. There are stickies here about the dangers. But it *is* possible.

Usually you write the books, with the updated metadata, to disk, or send them to a device. During this process the old metadata inside the file is updated. This is when the checksum changes. At the same time you can change the filename that is used.

Last edited by Adoby; 09-02-2012 at 05:00 AM.
#13 | |
null operator (he/him)
Posts: 21,658
Karma: 29711016
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
It was DRM. I just happened to grab a prc that I purchased from mobipocket, so it had DRM; it was in a pile of 200+ prc's and mobi's, of which no more than a dozen had DRM.

The PDF came from the Boston Consulting Group website. AFAIK the stuff you download from their website isn't protected. I don't have an account at BCG and I certainly didn't pay them any money.

I just tested a prc and a PDF from different sources and the checksums didn't change. I checked another BCG PDF and the checksum changed, so there's something in BCG PDFs that Alf thinks is DRM. So I disabled the two PDF-related plugins (didn't need them anyway) and added the same BCG PDF, and the checksum didn't change.

The OP mentioned "a lot of PDFs"; even if they weren't DRM'd, they could be something like these I have from BCG, which have something that apparently looks like DRM but isn't.

Last edited by BetterRed; 09-03-2012 at 07:02 AM.
|
#14 |
Addict
Posts: 309
Karma: 1645952
Join Date: Jun 2012
Device: none
|
I personally wouldn't use checksums to compare ebooks anyway. I'm less interested in whether a book is byte-for-byte identical to another, than if they are just similar. An epub of Gulliver's Travels downloaded from Project Gutenberg is likely going to have a different checksum than an epub of Gulliver's Travels downloaded from somewhere else, and yet by pretty much any definition they're the same book.
|
#15 | |
Junior Member
Posts: 9
Karma: 1234
Join Date: Jul 2012
Device: none
|
Quote:
[screenshot]

See this? Same file, different locations. And that's the lucky case where it has the same name in both locations. Since the checksums were identical I was able to find them, but if I had processed them using calibre, I would be unable to find them. That's two files, 17 MB wasted. But what if there are hundreds? To say nothing of the mess of having the same file under multiple names.

So, the way I am doing it is: drag the file to Calibre, click on it -> edit metadata -> download metadata individually -> click the "download metadata" button -> accept the proposed covers -> OK -> then "save to disk". Either writing the metadata to the file or the saving to disk part changes the file. See here:

[screenshot]

The first file is the one I renamed using Calibre. The second is the original. The size is different: 6.50 MB for the renamed file, 6.52 MB for the original. For all intents and purposes this makes them two different files, which I can't spot in any way (the checksum is different, the file name is different). Unless Calibre can rename my files without altering them, I will be forced to rename everything manually. And if it can, please tell me how.
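The kind of check described above (same bytes, different locations or names) can be sketched in Python as follows. This is only an illustration of checksum-based duplicate detection, not how Calibre or its Find Duplicates plugin works; the function name `find_duplicates` is hypothetical. As the post says, it only helps while the copies are still byte-identical, that is, before any of them has had metadata written into it.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(*roots: Path) -> dict:
    """Map each checksum shared by two or more files to the sorted list of those files."""
    seen = set()
    by_size = defaultdict(list)
    for root in roots:
        for path in root.rglob("*"):
            resolved = path.resolve()
            # Skip directories, and files already visited via an overlapping root.
            if not path.is_file() or resolved in seen:
                continue
            seen.add(resolved)
            by_size[path.stat().st_size].append(path)

    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a byte-identical twin
        for path in paths:
            by_digest[sha256_of(path)].append(path)

    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Grouping by size first avoids hashing files that cannot possibly match, which matters when a folder holds hundreds of large PDFs.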
|