Old 10-20-2017, 03:17 PM   #1
radius
Lector minore
Posts: 649
Karma: 1738720
Join Date: Jan 2008
Device: Aura One, Samsung Galaxy Tab S5e, Google Pixel Slate
Highlight differences between books?

Is there any convenient way to show the difference between two editions of the same book?

For example, say I have an epub from MobileRead (with MR edits) and one from Project Gutenberg and I want to see what was cleaned up, or added and so on.

To do that by hand, I would have to:

1) Unzip
2) Concatenate the XHTML files in order according to manifest
3) Strip out all the tags
4) Normalize quotes, punctuation and other characters that might appear as HTML entities versus unicode characters and so on
5) Use a diff program to see the difference in the text
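The five steps above can be sketched in one Python script using only the standard library. This is a rough sketch, not a finished tool: the names `TextOnly`, `spine_paths`, `epub_text` and `diff_epubs` are made up for illustration, the list of block-level tags and the punctuation table are assumptions you would tune for your own library, and it assumes a DRM-free epub with UTF-8 XHTML files.

```python
import difflib
import posixpath
import re
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
# Assumption: closing one of these tags ends a paragraph-like unit.
BLOCK = {"p", "div", "br", "li", "tr", "blockquote", "title",
         "h1", "h2", "h3", "h4", "h5", "h6"}
# Assumption: a small, non-exhaustive table of typographic characters.
PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "--", "\u2026": "...", "\u00a0": " ",
})


class TextOnly(HTMLParser):
    """Step 3: keep text nodes only; entities are decoded by the parser."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1
        elif tag in BLOCK:
            self.parts.append("\n")  # one output line per block

    def handle_data(self, data):
        if not self.skip:
            # Source line breaks inside a paragraph become plain spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def spine_paths(zf):
    """Step 2: yield content files in reading order (manifest + spine)."""
    container = ET.fromstring(zf.read("META-INF/container.xml"))
    opf_path = container.find(".//c:rootfile", NS).get("full-path")
    opf = ET.fromstring(zf.read(opf_path))
    base = posixpath.dirname(opf_path)
    manifest = {item.get("id"): item.get("href")
                for item in opf.iterfind(".//opf:manifest/opf:item", NS)}
    for ref in opf.iterfind(".//opf:spine/opf:itemref", NS):
        href = manifest.get(ref.get("idref"))
        if href:
            yield posixpath.normpath(posixpath.join(base, unquote(href)))


def epub_text(path):
    """Steps 1-4: return the book as a list of normalized paragraphs."""
    chunks = []
    with zipfile.ZipFile(path) as zf:  # step 1: read the zip in place
        for name in spine_paths(zf):
            parser = TextOnly()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            chunks.append("".join(parser.parts))
    text = "\n".join(chunks).translate(PUNCT)  # step 4
    lines = (re.sub(r" +", " ", ln).strip() for ln in text.splitlines())
    return [ln for ln in lines if ln]


def diff_epubs(a, b):
    """Step 5: unified diff of the two paragraph lists, no context lines."""
    return list(difflib.unified_diff(epub_text(a), epub_text(b),
                                     "a", "b", n=0, lineterm=""))
```

Calling `print("\n".join(diff_epubs("first.epub", "second.epub")))` (both file names are placeholders) would list only the paragraphs that differ. Comparing paragraph by paragraph rather than line by line keeps differences in hard line wraps from showing up as changes.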

I am trying to clean up my library, and when all I want to do is decide which version of a duplicated book to keep, this is a lot of work for a single title.

Or as another example, for some SF stories I have, I am unsure which file is from the magazine version and which one is the novel version and so on.
Old 10-20-2017, 03:59 PM   #2
jackie_w
Grand Sorcerer
Posts: 6,198
Karma: 16228558
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
One thing you could try is to convert both epubs to TXT using calibre. Then use your Diff utility to compare the two TXTs. At least it would reduce the number of steps.
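For anyone who would rather script the conversion than click through the GUI, calibre ships a command-line tool, `ebook-convert`, that picks the output format from the output file's extension. Here is a minimal sketch of driving it from Python. The helper names `calibre_txt_command` and `convert_to_txt` are made up for illustration, and it assumes calibre is installed with `ebook-convert` on your PATH.

```python
import subprocess
from pathlib import Path


def calibre_txt_command(epub, out_dir="."):
    """Build the ebook-convert command line for one epub -> TXT conversion."""
    src = Path(epub)
    dst = Path(out_dir) / (src.stem + ".txt")
    # ebook-convert infers the output format from the .txt extension.
    return ["ebook-convert", str(src), str(dst)]


def convert_to_txt(epub, out_dir="."):
    """Run the conversion and return the path of the TXT file."""
    cmd = calibre_txt_command(epub, out_dir)
    subprocess.run(cmd, check=True)  # raises if calibre reports an error
    return Path(cmd[2])
```

Run `convert_to_txt` once per edition, then hand the two resulting TXT files to whichever diff utility you prefer.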
Old 10-22-2017, 11:25 PM   #3
GrannyGrump
Obsessively Dedicated...
Posts: 3,200
Karma: 34977556
Join Date: May 2011
Location: JAPAN (US expatriate)
Device: Sony PRS-T2, ADE on PC
@radius --- take a look at this thread. WinMerge might be the answer.

https://www.mobileread.com/forums/sh...light=WINMERGE

All you should have to do is change the file extension from epub to .ZIP.
But for the kind of details you are examining, that might not be enough "pre-exam" preparation.

Keep us posted on what works for you.

Last edited by GrannyGrump; 10-22-2017 at 11:27 PM.
Old 10-23-2017, 11:17 AM   #4
radius
Lector minore
Quote:
Originally Posted by jackie_w View Post
One thing you could try is to convert both epubs to TXT using calibre. Then use your Diff utility to compare the two TXTs. At least it would reduce the number of steps.
Ah, I didn't know you could dump text from calibre. I might install and try that.
Old 10-23-2017, 11:21 AM   #5
radius
Lector minore
Quote:
Originally Posted by GrannyGrump View Post
@radius --- take a look at this thread. WinMerge might be the answer.
Thanks for the suggestion. If I were comparing two edits of the same book, this would work great. But I am actually trying to compare "editions" of the same work: say, something I had marked up from plain text versus a new Distributed Proofreaders version or a GrannyGrump edition.

WinMerge would highlight all of the HTML markup and obscure the difference in the actual text of the book.

Edit: oh yeah, forgot to mention that my day to day machines are a Linux box and a Mac laptop so Windows is an inconvenience too haha

Last edited by radius; 10-23-2017 at 11:25 AM.
Old 10-23-2017, 12:03 PM   #6
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by radius View Post
Edit: oh yeah, forgot to mention that my day to day machines are a Linux box and a Mac laptop so Windows is an inconvenience too haha
In that case, have a look at this Linux post by SBT.

If you prefer GUI tools, I'd recommend Meld and Beyond Compare.
Old 10-23-2017, 12:59 PM   #7
JSWolf
Resident Curmudgeon
Posts: 73,841
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
The only way to do this is to convert each book to text and compare the two text files.
Old 11-01-2017, 10:57 PM   #8
radius
Lector minore
Quote:
Originally Posted by Doitsu View Post
In that case, have a look at this Linux post by SBT.

If you prefer GUI tools, I'd recommend Meld and Beyond Compare.
Good ideas. I like Diffmerge myself. The last time I tried Meld it felt slow to me, and AFAIK Beyond Compare is commercial. I don't think they solve the "look inside a zip archive" and "ignore HTML" issues though.
Old 11-01-2017, 10:58 PM   #9
radius
Lector minore
Quote:
Originally Posted by JSWolf View Post
The only way to do this is to convert each book to text and compare the two text files.
Yes. I've been thinking that I will script this myself. After thinking about it, I realized that I don't need a 100% solution, but rather an 80% solution would work fine. After all, this is only for my personal convenience.
Old 11-02-2017, 05:16 AM   #10
Doitsu
Grand Sorcerer
Quote:
Originally Posted by radius View Post
I don't think they solve the "look inside a zip archive" and "ignore HTML" issues though.
They don't. BTW, you also might be able to use pandoc to convert epubs to plain text files before comparing them. (Use -t plain to force plain text output.)
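A minimal sketch of the pandoc route, wrapped in Python so both editions are converted and compared in one go. The function names are made up for illustration. `-f epub` and `--wrap=none` are additions beyond the `-t plain` mentioned above: the first names the input format explicitly, and the second asks pandoc not to re-wrap paragraphs, which should cut down on spurious differences. It assumes pandoc is installed and on your PATH.

```python
import difflib
import subprocess


def pandoc_plain_command(epub):
    """Command line that makes pandoc print the epub as plain text on stdout."""
    return ["pandoc", "-f", "epub", "-t", "plain", "--wrap=none", str(epub)]


def plain_lines(epub):
    """Run pandoc and return the book's text as a list of lines."""
    result = subprocess.run(pandoc_plain_command(epub), capture_output=True,
                            encoding="utf-8", check=True)
    return result.stdout.splitlines()


def compare(epub_a, epub_b):
    """Unified diff of the plain-text renderings of two epubs."""
    return list(difflib.unified_diff(plain_lines(epub_a), plain_lines(epub_b),
                                     str(epub_a), str(epub_b), lineterm=""))
```

Something like `print("\n".join(compare("first.epub", "second.epub")))` would then show the textual differences, where the two file names are placeholders.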
Old 11-05-2017, 11:28 AM   #11
rkomar
Wizard
Posts: 2,981
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
I can't remember if you (@radius) use Linux or not, but if so, you can use unzip to extract the contents of the epub file, and then use lynx -dump -nomargins to extract the text from the htm/html files. You can then use diff to compare the extracted text.

Last edited by rkomar; 11-05-2017 at 11:31 AM.
Old 11-09-2017, 11:54 AM   #12
radius
Lector minore
Ah pandoc looks like an excellent general purpose tool! Thanks! I guess trying to normalize plain text is easier than html so that might be the way to go.

@rkomar, I'm actually an elinks user and I've tried the dump command before. My problem with it was that it tried to preserve too much formatting from the html. I don't think elinks supports a nomargin option unfortunately. That's a good tip.
Old 11-09-2017, 03:36 PM   #13
rkomar
Wizard
Quote:
Originally Posted by radius View Post
@rkomar, I'm actually an elinks user and I've tried the dump command before. My problem with it was that it tried to preserve too much formatting from the html. I don't think elinks supports a nomargin option unfortunately. That's a good tip.
Have you tried using the "-b" option with diff? It suppresses differences in white space, so maybe it will be less sensitive to formatting between the text outputs.
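To show roughly what "-b" buys you, here is a small Python sketch that approximates it: runs of spaces and tabs are squeezed to a single space and trailing whitespace is dropped before the lines are compared. It is an approximation written for illustration, not a reimplementation of diff, and the names `squeeze` and `diff_ignoring_space` are made up.

```python
import difflib
import re


def squeeze(line):
    """Collapse runs of spaces/tabs and drop trailing whitespace."""
    return re.sub(r"[ \t]+", " ", line).rstrip()


def diff_ignoring_space(a_lines, b_lines):
    """Return only the lines that still differ after squeezing whitespace."""
    a = [squeeze(line) for line in a_lines]
    b = [squeeze(line) for line in b_lines]
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines and "? " hint lines are filtered out here.
    return [d for d in difflib.ndiff(a, b) if d[:2] in ("- ", "+ ")]
```

Note that this only helps with spacing inside a line. If the two text dumps wrap their lines at different widths, whole paragraphs will still show up as changed.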