10-20-2017, 03:17 PM | #1 |
Lector minore
Posts: 649
Karma: 1738720
Join Date: Jan 2008
Device: Aura One, Samsung Galaxy Tab S5e, Google Pixel Slate
|
Highlight differences between books?
Is there any convenient way to show the difference between two editions of the same book?
For example, say I have an epub from MobileRead (with MR edits) and one from Project Gutenberg, and I want to see what was cleaned up, added, and so on. To do that by hand, I would have to:
1) Unzip both files
2) Concatenate the XHTML files in order according to the manifest
3) Strip out all the tags
4) Normalize quotes, punctuation and other characters that might appear as HTML entities in one file and as Unicode characters in the other
5) Use a diff program to see the difference in the text
I am trying to clean up my library, but when all I want to do is decide which version of a duplicated book to keep, this is a lot of work for a single title. As another example, for some SF stories I have, I am unsure which file is the magazine version and which is the novel version.
|
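The five manual steps above could be scripted roughly as follows. This is only a minimal sketch, not a finished tool: it assumes an unencrypted epub whose content files are UTF-8, and it reads the files in the order given by the spine of the OPF package file (the spine, rather than the manifest, is what defines reading order). The names `TextExtractor`, `epub_lines` and `diff_epubs` are made up for this example.

```python
import difflib
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote

# XML namespaces used by the EPUB container and package (OPF) files.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
SKIP = {"head", "script", "style"}  # content that is not book text
BLOCK = {"p", "div", "br", "li", "tr", "blockquote",
         "h1", "h2", "h3", "h4", "h5", "h6"}  # tags that end a line


class TextExtractor(HTMLParser):
    """Collects text nodes; HTMLParser decodes entities such as &nbsp;."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skipping += 1
        elif tag in BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skipping = max(0, self.skipping - 1)
        elif tag in BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.skipping:
            # Source line wraps inside a paragraph are not paragraph breaks.
            self.chunks.append(data.replace("\n", " "))

    def lines(self):
        raw = "".join(self.chunks).split("\n")
        return [" ".join(line.split()) for line in raw if line.strip()]


def epub_lines(path):
    """Return the book text as a list of lines, one per block element."""
    out = []
    with zipfile.ZipFile(path) as z:
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))
        base = posixpath.dirname(opf_path)
        manifest = {item.get("id"): item.get("href")
                    for item in opf.findall(".//opf:manifest/opf:item", NS)}
        for ref in opf.findall(".//opf:spine/opf:itemref", NS):
            href = unquote(manifest[ref.get("idref")])
            name = posixpath.normpath(posixpath.join(base, href))
            parser = TextExtractor()
            parser.feed(z.read(name).decode("utf-8"))
            parser.close()
            out.extend(parser.lines())
    return out


def diff_epubs(path_a, path_b):
    """Unified diff of the extracted text of two epub files."""
    return list(difflib.unified_diff(epub_lines(path_a), epub_lines(path_b),
                                     "a", "b", lineterm=""))
```

Something like `print("\n".join(diff_epubs("mr.epub", "pg.epub")))` would then show the text changes (both file names are placeholders). Typographic normalization (step 4) is only partly covered here: entities are decoded and whitespace is collapsed, but curly versus straight quotes would still show up as differences.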
10-20-2017, 03:59 PM | #2 |
Grand Sorcerer
Posts: 6,206
Karma: 16228558
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
One thing you could try is to convert both epubs to TXT using calibre. Then use your Diff utility to compare the two TXTs. At least it would reduce the number of steps.
|
10-22-2017, 11:25 PM | #3 |
Obsessively Dedicated...
Posts: 3,200
Karma: 34977896
Join Date: May 2011
Location: JAPAN (US expatriate)
Device: Sony PRS-T2, ADE on PC
|
@radius --- take a look at this thread. WinMerge might be the answer.
https://www.mobileread.com/forums/sh...light=WINMERGE
All you should have to do is change the file extension from .epub to .zip. But for the kind of details you are examining, that might not be enough "pre-exam" preparation. Keep us posted on what works for you.
Last edited by GrannyGrump; 10-22-2017 at 11:27 PM.
|
10-23-2017, 11:21 AM | #5 | |
Lector minore
Posts: 649
Karma: 1738720
Join Date: Jan 2008
Device: Aura One, Samsung Galaxy Tab S5e, Google Pixel Slate
|
WinMerge would highlight all of the HTML markup and obscure the differences in the actual text of the book.
Edit: oh yeah, forgot to mention that my day-to-day machines are a Linux box and a Mac laptop, so Windows is an inconvenience too haha
Last edited by radius; 10-23-2017 at 11:25 AM.
|
|
10-23-2017, 12:03 PM | #6 | |
Grand Sorcerer
Posts: 5,583
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
If you prefer GUI tools, I'd recommend Meld and Beyond Compare. |
|
10-23-2017, 12:59 PM | #7 |
Resident Curmudgeon
Posts: 73,887
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
The only way to do this is to convert each book to text and compare the two text files.
|
11-01-2017, 10:57 PM | #8 | |
Lector minore
Posts: 649
Karma: 1738720
Join Date: Jan 2008
Device: Aura One, Samsung Galaxy Tab S5e, Google Pixel Slate
|
Quote:
|
|
11-01-2017, 10:58 PM | #9 |
Lector minore
Posts: 649
Karma: 1738720
Join Date: Jan 2008
Device: Aura One, Samsung Galaxy Tab S5e, Google Pixel Slate
|
Yes. I've been thinking that I will script this myself. After thinking about it, I realized that I don't need a 100% solution, but rather an 80% solution would work fine. After all, this is only for my personal convenience.
|
11-02-2017, 05:16 AM | #10 | |
Grand Sorcerer
Posts: 5,583
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
|
|
11-05-2017, 11:28 AM | #11 |
Wizard
Posts: 2,985
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
I can't remember if you (@radius) use Linux or not, but if so, you can use unzip to extract the contents of the epub file, and then use lynx -dump -nomargins to extract the text from the htm/html files. You can then use diff to compare the extracted text.
Last edited by rkomar; 11-05-2017 at 11:31 AM. |
11-09-2017, 11:54 AM | #12 |
Lector minore
Posts: 649
Karma: 1738720
Join Date: Jan 2008
Device: Aura One, Samsung Galaxy Tab S5e, Google Pixel Slate
|
Ah pandoc looks like an excellent general purpose tool! Thanks! I guess trying to normalize plain text is easier than html so that might be the way to go.
@rkomar, I'm actually an elinks user and I've tried the dump command before. My problem with it was that it tried to preserve too much formatting from the html. I don't think elinks supports a nomargin option unfortunately. That's a good tip. |
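Normalizing the plain text before diffing, as mentioned above, might look something like this in Python. It is just a sketch: the `PUNCT` table and the `normalize` function are made up for illustration, and the choice of which characters to fold (curly quotes, dashes, ellipsis, non-breaking space) is an assumption about what typically differs between two editions, not a complete list.

```python
import html
import unicodedata

# Illustrative mapping from typographic characters to plain ASCII.
PUNCT = str.maketrans({
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
})


def normalize(text):
    """Decode leftover HTML entities, then fold typographic punctuation."""
    text = html.unescape(text)                 # &ldquo; -> left double quote
    text = unicodedata.normalize("NFC", text)  # compose accents consistently
    return text.translate(PUNCT)
```

Running both text files through `normalize` line by line before the diff should make an entity-encoded quote, a curly quote and a straight quote all compare as equal.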
11-09-2017, 03:36 PM | #13 |
Wizard
Posts: 2,985
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
|
Have you tried using the "-b" option with diff? It suppresses differences in white space, so maybe it will be less sensitive to formatting between the text outputs.
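For anyone scripting the comparison instead of calling diff directly, a rough Python analogue of the idea is sketched below. It is not an exact reimplementation of `diff -b`: it collapses every run of whitespace and also strips leading and trailing whitespace, which is somewhat more aggressive. The function names are made up for this example.

```python
import difflib


def squash(line):
    """Collapse runs of whitespace and trim both ends of the line."""
    return " ".join(line.split())


def diff_ignoring_space(a_text, b_text):
    """Unified diff of two texts, ignoring whitespace-only changes in a line."""
    a = [squash(line) for line in a_text.splitlines()]
    b = [squash(line) for line in b_text.splitlines()]
    return list(difflib.unified_diff(a, b, "a", "b", lineterm=""))
```

Neither this nor `diff -b` helps when the two dumps wrap their lines at different points, since the comparison is still line by line.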
|