Great detective work everyone.
Quote:
Originally Posted by Hitch
(Still thinking about what I'm going to smash. Hmmmm.)
|
*Dives and saves the Mattias Ergo Pro*
Quote:
Originally Posted by BetterRed
Suggestion : reduce the original and butchered epubs back to plain text and run them through the Beyond Compare washing machine to determine if the actual text of the book has been changed.
|
I am not a fan of the plain-text comparison (or at least, don't rely on it as your only comparison). While it will catch typo corrections and added/removed sentences, its biggest problem is that it is blind to formatting changes (such as changed bold/italic/blockquote/margins).
If you can't do a direct code comparison (in this case, the HTML was absolutely butchered), then I would rely on converting both Before/After documents using Calibre to an intermediate file type that retains some basic formatting (RTF/DOCX).
Then you run RTF/DOCX comparison tools on the two converted files.
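A rough sketch of the conversion step, assuming Calibre's `ebook-convert` command-line tool is installed and on your PATH. The helper names (`build_convert_command`, `convert`) and the file names are made up for illustration; the comparison itself is still left to whatever RTF/DOCX compare tool you prefer.

```python
import subprocess
from pathlib import Path


def build_convert_command(epub_path, out_format="docx"):
    """Build the ebook-convert call for one book.

    ebook-convert picks the output format from the output file's
    extension, so swapping the suffix is enough to get DOCX or RTF.
    """
    src = Path(epub_path)
    dst = src.with_suffix("." + out_format)
    return ["ebook-convert", str(src), str(dst)]


def convert(epub_path, out_format="docx"):
    """Run the conversion and return the path of the converted file.

    Raises CalledProcessError if ebook-convert exits with an error.
    """
    cmd = build_convert_command(epub_path, out_format)
    subprocess.run(cmd, check=True)
    return Path(cmd[2])
```

Calling `convert("before.epub")` and `convert("after.epub")` (hypothetical file names) would leave you with `before.docx` and `after.docx` to feed into the compare tool. Pass `out_format="rtf"` if your tool handles RTF better.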
Source: I work on a lot of books with multiple editions/versions floating out there with varying quality conversions. I do quite a few A/B/C compares to catch typos/formatting mistakes between them all + make sure mine beats out all the rest.
As you can imagine, the HTML is wildly different between editions, while the text is approximately the same.
Side Note: Depending on the book type, italics alone could account for a large number of changes with zero change to the text itself. Example: An author might go through and add lots of emphasis throughout a novel's dialogue.
I have also come across cases where some editions removed italics on foreign words ("coup d'état"), or removed italics on a portion of a word such as "unfavorable" (this style has heavily fallen out of favor in more modern typography).
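To make the italics point concrete, here is a small sketch using only Python's standard library. It flattens two HTML snippets twice: once to pure plain text, and once keeping simple markers for italic/bold. The two snippets, the marker choice, and the `flatten` helper are all invented for illustration; this is not a replacement for a real RTF/DOCX compare.

```python
from html.parser import HTMLParser

# Inline tags we want to keep visible, mapped to a plain-text marker.
INLINE_MARKS = {"i": "*", "em": "*", "b": "**", "strong": "**"}


class TextExtractor(HTMLParser):
    """Collects text, optionally keeping markers for italic/bold tags."""

    def __init__(self, keep_formatting):
        super().__init__(convert_charrefs=True)
        self.keep_formatting = keep_formatting
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.keep_formatting and tag in INLINE_MARKS:
            self.parts.append(INLINE_MARKS[tag])

    def handle_endtag(self, tag):
        if self.keep_formatting and tag in INLINE_MARKS:
            self.parts.append(INLINE_MARKS[tag])

    def handle_data(self, data):
        self.parts.append(data)


def flatten(html, keep_formatting):
    """Reduce HTML to one whitespace-normalized line of text."""
    parser = TextExtractor(keep_formatting)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Same sentence, wildly different markup, italics dropped in the second.
before = "<p>It was a <i>coup d'état</i>, he said.</p>"
after = "<div><p class='calibre1'>It was a coup d'état, he said.</p></div>"

# Plain text: the two editions look identical.
print(flatten(before, False) == flatten(after, False))
# With markers kept: the lost italics show up as a difference.
print(flatten(before, True))
print(flatten(after, True))
```

The plain-text pass treats the two editions as the same, while the marker-preserving pass exposes the dropped italics, which is the kind of change a formatting-aware comparison is meant to catch.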