Highlight differences between books?
Is there any convenient way to show the difference between two editions of the same book?
For example, say I have one epub from MobileRead (with MR edits) and one from Project Gutenberg, and I want to see what was cleaned up, what was added, and so on.
To do that by hand, I would have to:
1) Unzip
2) Concatenate the XHTML files in the reading order given by the package (OPF) file
3) Strip out all the tags
4) Normalize quotes, punctuation and other characters that may appear as HTML entities in one file and as Unicode characters in the other
5) Use a diff program to see the difference in the text
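For what it is worth, the five steps above could be scripted roughly as follows. This is only a sketch under some assumptions: the epubs are DRM-free, the content files are UTF-8 XHTML, and the reading order is taken from the spine in the OPF file. The names `epub_text`, `diff_epubs`, `TextOnly` and `PUNCT` are made up for illustration, and the punctuation table is a starting point, not a complete list.

```python
import difflib
import posixpath
import re
import unicodedata
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote

# Step 4: map typographic punctuation to ASCII so cosmetic edits do not show as changes.
PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "--", "\u2026": "...", "\u00a0": " ",
})
BLOCK = {"p", "div", "br", "li", "tr", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP = {"head", "script", "style"}


class TextOnly(HTMLParser):
    """Step 3: keep text nodes only; entities are decoded by the parser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Source line wrapping is not meaningful, so collapse it to spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def _local(tag):
    """Drop the XML namespace from an element tag."""
    return tag.rsplit("}", 1)[-1]


def epub_text(path):
    """Return the book's text as a list of normalized paragraph lines."""
    parts = []
    with zipfile.ZipFile(path) as z:  # step 1: an epub is a zip archive
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = next(e.get("full-path") for e in container.iter()
                        if _local(e.tag) == "rootfile")
        opf = ET.fromstring(z.read(opf_path))
        manifest = {e.get("id"): e.get("href") for e in opf.iter()
                    if _local(e.tag) == "item"}
        spine = [e.get("idref") for e in opf.iter() if _local(e.tag) == "itemref"]
        base = posixpath.dirname(opf_path)
        for idref in spine:  # step 2: concatenate in reading order
            if idref not in manifest:
                continue
            name = posixpath.normpath(posixpath.join(base, unquote(manifest[idref])))
            parser = TextOnly()
            parser.feed(z.read(name).decode("utf-8", errors="replace"))
            parser.close()
            parts.extend(parser.parts)
            parts.append("\n")
    text = unicodedata.normalize("NFKC", "".join(parts).translate(PUNCT))
    lines = (" ".join(line.split()) for line in text.split("\n"))
    return [line for line in lines if line]


def diff_epubs(path_a, path_b):
    """Step 5: unified diff of the two books, one paragraph per line."""
    return difflib.unified_diff(epub_text(path_a), epub_text(path_b),
                                fromfile=path_a, tofile=path_b, lineterm="")
```

Something like `print("\n".join(diff_epubs("mr.epub", "pg.epub")))` (file names hypothetical) would then print the paragraph-level differences. Because each paragraph becomes one line, a single changed word flags the whole paragraph; a word-level diff tool would give finer output.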
I am trying to clean up my library, but when all I want is to decide which copy of a duplicated book to keep, this is a lot of work for a single title.
As another example, for some SF stories I have, I am unsure which file is the magazine version and which is the novel version.