MobileRead Forums - View Single Post

sethcohn · 12-31-2012, 02:51 PM

Ok, while the binary comparison in Calibre will eliminate true duplicate files, the use of different versions of Calibre (or other tools), plus UUIDs, means that even identical versions of an ebook prepared by two different people are not byte-for-byte identical.

https://github.com/takahashim/epubdiff

is a great tool that essentially unzips two epubs and runs diff on the results.
I've hacked on it (adding some command-line options to diff) to ignore .opf files (where the main trivial differences are), .otf files (font differences), and .ncx files (minor diffs), so that books which are otherwise identical will not show those differences and are thus considered the same.
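
For anyone who wants to try the same idea without patching epubdiff, here is a rough Python sketch. It is not epubdiff's code, just my approximation of the approach: open both epubs as zip archives, skip the .opf/.otf/.ncx members, and compare what is left byte for byte. The names `content_members` and `same_content` are made up for this example.

```python
import zipfile

# Extensions to skip, matching the ones ignored in the post above.
IGNORED_EXTS = (".opf", ".otf", ".ncx")

def content_members(epub_path, ignored_exts=IGNORED_EXTS):
    """Map member name -> bytes for every file in the epub we care about."""
    with zipfile.ZipFile(epub_path) as zf:
        return {
            info.filename: zf.read(info)
            for info in zf.infolist()
            if not info.is_dir()
            and not info.filename.lower().endswith(ignored_exts)
        }

def same_content(epub_a, epub_b):
    """True if both epubs hold the same non-ignored files with the same bytes."""
    return content_members(epub_a) == content_members(epub_b)
```

This is stricter than a human would be: two copies that store the same chapters under different internal paths, or with different whitespace in the XHTML, will still count as different.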

It would be _really_ nice if Calibre had a similar way to check whether a book's _actual_ content (and not metadata, etc.) is identical. A batch method to run through an entire library (or two) would be amazing.
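
To make the batch idea concrete, here is a minimal standalone sketch (plain Python, not a Calibre feature or plugin API). It assumes one fingerprint per epub is enough: a SHA-256 over the names and bytes of every member except the .opf/.otf/.ncx files, with epubs grouped by matching fingerprint. `content_fingerprint` and `find_duplicates` are hypothetical names for this example.

```python
import hashlib
import os
import zipfile
from collections import defaultdict

# Members with these extensions are left out of the fingerprint.
IGNORED_EXTS = (".opf", ".otf", ".ncx")

def content_fingerprint(epub_path):
    """SHA-256 over the names and bytes of all non-ignored members."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(epub_path) as zf:
        names = sorted(
            n for n in zf.namelist()
            if not n.endswith("/") and not n.lower().endswith(IGNORED_EXTS)
        )
        for name in names:
            data = zf.read(name)
            # Name and length are hashed too, so member boundaries are unambiguous.
            digest.update(name.encode("utf-8"))
            digest.update(b"\0")
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()

def find_duplicates(*library_dirs):
    """Walk one or more library folders; return groups of content-identical epubs."""
    groups = defaultdict(list)
    for root_dir in library_dirs:
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for filename in filenames:
                if not filename.lower().endswith(".epub"):
                    continue
                path = os.path.join(dirpath, filename)
                try:
                    groups[content_fingerprint(path)].append(path)
                except zipfile.BadZipFile:
                    continue  # skip files that are not valid zip archives
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("/path/to/library1", "/path/to/library2")` (placeholder paths) would return a list of lists, one per set of matching books. Hashing once per file keeps this linear in the library size, rather than diffing every pair.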

How are other people dealing with this issue?

12-31-2012, 02:51 PM	#1
sethcohn Junior Member Posts: 6 Karma: 10 Join Date: Jun 2005	Epub comparison tools?