07-31-2011, 08:22 AM   #114
rigolo
Device: prs-600
compare contents of epub files

I have been using your plugin to clean up a library that our kids pieced together. It contained a lot of duplicates, which I could find using the binary comparison option.

However, I am also finding duplicate epub files that are not binary equal.

Looking at the files shows that they are the same size, but inside the epub zip there are some differences in the opf file.

Here is an example:

Het loterijbriefje - Jules Verne.epub

This is an epub from Project Gutenberg (ebook #30929).

In the metadata section of the opf there is a small change:
<dc:creator role="aut" file-as="Verne, Jules">Jules Verne</dc:creator>
<dc:identifier scheme="ISBN"></dc:identifier>

These two lines have been switched, making it (from a binary standpoint) a different epub, but content-wise it is 100% identical.
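
To illustrate the kind of comparison meant here, this is a minimal Python sketch that treats the children of the opf metadata element as an unordered collection, so that swapping two lines no longer counts as a difference. It is not how the plugin works; the function name `metadata_fingerprint` and the two sample opf strings are made up for this example, and only the two metadata lines come from the epub above.

```python
import xml.etree.ElementTree as ET


def metadata_fingerprint(opf_xml):
    """Return the <metadata> children as a sorted list, ignoring their order."""
    root = ET.fromstring(opf_xml)
    # Find the metadata element whatever namespace it is in.
    meta = next(
        (el for el in root.iter() if el.tag.split("}")[-1] == "metadata"),
        None,
    )
    if meta is None:
        return []
    # Each child becomes (tag, sorted attributes, text); sorting removes ordering.
    return sorted(
        (child.tag, tuple(sorted(child.attrib.items())), (child.text or "").strip())
        for child in meta
    )


# Hypothetical minimal opf documents: same two lines, opposite order.
OPF_HEAD = (
    '<package xmlns="http://www.idpf.org/2007/opf" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/"><metadata>'
)
OPF_TAIL = "</metadata></package>"
CREATOR = '<dc:creator role="aut" file-as="Verne, Jules">Jules Verne</dc:creator>'
IDENTIFIER = '<dc:identifier scheme="ISBN"></dc:identifier>'

OPF_A = OPF_HEAD + CREATOR + IDENTIFIER + OPF_TAIL
OPF_B = OPF_HEAD + IDENTIFIER + CREATOR + OPF_TAIL

same_metadata = metadata_fingerprint(OPF_A) == metadata_fingerprint(OPF_B)
```

With these two strings the raw text differs while the fingerprints are equal, which is the situation described above. Note that this sketch only covers the metadata section; the manifest and spine of a real opf would need their own treatment (spine order, for instance, does matter).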

Is there a way to also find this kind of duplicate? Just looking at the metadata alone will not guarantee that the actual contents are the same.

For now I used a trial version of Altova DiffDog to compare the contents of the two epub files, but it must be possible to do this automatically from within the plugin.
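
As a rough idea of what an automatic check could look like, here is a small Python sketch that opens both epubs as zip archives, hashes every file inside, and lists the members whose contents differ. It uses only the standard library; the function names `member_digests` and `differing_members` are invented for this example and are not part of the plugin or of calibre.

```python
import hashlib
import zipfile


def member_digests(epub_path):
    """Map each file inside the epub zip to a SHA-256 digest of its contents."""
    with zipfile.ZipFile(epub_path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir()
        }


def differing_members(path_a, path_b):
    """Return the sorted names of members that are missing or differ between two epubs."""
    a = member_digests(path_a)
    b = member_digests(path_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

If the only name returned for two epubs is the opf file, then all the text, images, and stylesheets are byte-identical, and only the opf still needs a closer look.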

When doing the metadata compare, do you use the opf from the calibre library, or the opf contained inside the epub?