Compare ePubs??


carmenchu
09-01-2011, 11:08 AM
After a search on this topic that turned up only "dead" threads (sorry if I skipped a "live" one), I am posting a WISH SOFTWARE with a

WISH LIST OF FEATURES:


1. Pick some.epub and other.epub, which you know to be the SAME BOOK, although with different <file name>/<file size> (the latter defeats all the file comparison tools I have tried).

2. Show a comparison window of the internal files of both (see folder comparison tools for the kind of display I have in mind).

3. Allow comparing <selected file left | selected file right> (default: same extension, e.g. .ncx, or like extension, e.g. .html to .xhtml) with a file comparison tool (ExamDiff, WinMerge ...) -- ideally internally, but SendTo an external tool would be great.


3.1 Very Advanced: Compare both files stripped of <tags>: raw text option.

3.2 More Advanced Still: Pick <several files left | several files right> (which you suspect to be, e.g., Chapter One split differently), merge into <temp xhtml left | temp xhtml right> and file-compare.

4. Allow saving a .diff/.project file on exit.
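
The first wishes on the list (pick two ePubs, show their internal files side by side) can be roughed out with Python's standard zipfile module, since an ePub is a zip archive. This is only a minimal sketch: the function name compare_inventories is made up for illustration, and it treats two members as "the same" when the CRC-32 values stored in the archives match, which is a shortcut rather than a byte-by-byte check.

```python
import zipfile

def compare_inventories(left, right):
    """Return (common, only_left, only_right, differing) member names.

    left and right are paths or file-like objects for two ePub files.
    """
    with zipfile.ZipFile(left) as zl, zipfile.ZipFile(right) as zr:
        # Map each internal file name to the CRC-32 recorded in the archive.
        crc_l = {i.filename: i.CRC for i in zl.infolist() if not i.is_dir()}
        crc_r = {i.filename: i.CRC for i in zr.infolist() if not i.is_dir()}
    common = sorted(crc_l.keys() & crc_r.keys())
    only_left = sorted(crc_l.keys() - crc_r.keys())
    only_right = sorted(crc_r.keys() - crc_l.keys())
    # Same name on both sides, but the stored checksums disagree.
    differing = [name for name in common if crc_l[name] != crc_r[name]]
    return common, only_left, only_right, differing
```

The members listed in differing are the natural candidates to hand to an external file comparison tool.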


Maybe I am asking for the moon on a platter but ... all these separate features are available as single open source libraries!!

So, NEEDED: a programmer who would undertake to link them together in a GUI ...

dynabook
09-01-2011, 12:45 PM
I SO need something like this. Especially item 3.1
--MH

Toxaris
09-01-2011, 01:55 PM
3.2 could be difficult and result in broken files. Due to the nature of the format, the files can be quite different but render almost the same.

However, it could be useful to be able to create a diff script. Perhaps someone will pick up the gauntlet.

carmenchu
09-02-2011, 05:48 AM
3.1 and 3.2 are both intended for generated temporary files (txt/html) -- save only at your own risk, and not into ePub!
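
A raw-text comparison of the kind described for 3.1 could be sketched with the standard html.parser and difflib modules. The names raw_text and text_diff are hypothetical, and the word-by-word diff is just one possible choice of granularity. Note that this simple parser keeps all character data, so text inside <title> or <style> is included too.

```python
import difflib
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def raw_text(markup):
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    # Collapse whitespace so differently wrapped sources compare equal.
    return " ".join("".join(parser.chunks).split())

def text_diff(left_markup, right_markup):
    """Unified diff of the two texts, one word per diff line."""
    left = raw_text(left_markup).split()
    right = raw_text(right_markup).split()
    return list(difflib.unified_diff(left, right, "left", "right", n=2, lineterm=""))
```

Two files that differ only in markup give an empty diff; the result is meant for viewing or saving as a temporary text file, not for writing back into an ePub.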

My notion for 3.2 would be to merge everything from <body> to </body> (tags excluded!) and put it into an <html><body> ... </body></html> container, with an option to save as "plain html" -- never ePub.
As a matter of fact, I cannot conceive of an ePub Merge tool that would work ... maybe somebody else can??
My notion is only for an ePub diff -- and maybe saving bits for further editing with a dedicated tool.
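
The <body>-to-</body> merge described above might look roughly like this. It is a sketch under assumptions: body_inner and merge_bodies are made-up names, each input is assumed to hold a single <body> element, and a regular expression stands in for a real (X)HTML parser. The result is a plain html string for comparison only.

```python
import re

# Matches the content between <body ...> and </body>, tags excluded.
_BODY = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)

def body_inner(markup):
    """Return what sits between the body tags, or '' if there is no body."""
    match = _BODY.search(markup)
    return match.group(1).strip() if match else ""

def merge_bodies(documents):
    """Join the body contents of several documents into one plain html string."""
    parts = [body_inner(doc) for doc in documents]
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>\n"
```

For example, merge_bodies([left_part1, left_part2]) and merge_bodies([right_whole]) would give two temporary documents that can then be file-compared.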

carmenchu
09-04-2011, 06:56 PM
After some searching for references/playing with under-used tools:


7-zip (http://www.7-zip.org/) has a feature, View --> 2 panels, which lets you browse separately to 2 different compressed files and look at them as folders: and it recognizes .epub! (file header versus extension).
Thus, open each ePub in a different pane, open META-INF, OEBPS and subfolders, and do a first comparison.
Each single file can be double-clicked to open in its default associated program ... or sent to the default editor (notepad, SciTE...)

DRAWBACK: I tried to call either WinDiff or WinMerge (option: configure Diff program) and both rejected the path to the temp files -- maybe some further configuring is needed?

WinMerge (http://winmerge.org/), after some configuration (exportable, can be provided as an example), did the job as well ... through the 7-zip plugin (can be downloaded and installed from the program, if absent from the distribution).

ADVANTAGE: Automatic call to compare files present in both ePubs, with quite sophisticated comparison options -- also a Merge ... but use at your own risk!
DRAWBACK: nothing can be done with files present in only one folder, even if you suspect foo.xhtml to be an alias for bar.htm, or to hold the fragment missing from one of the two files just compared...
MORE: WinMerge has plugins (.dll) for MSOffice and OpenOffice documents: why not for ePub??
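
For the drawback above (files present in only one ePub, possibly the same content under another name), one possible approach is to score the leftover files against each other by content similarity. The sketch below uses difflib.SequenceMatcher from the Python standard library; likely_aliases is a made-up name, the 0.8 threshold is an arbitrary assumption, and the pairwise comparison gets slow on large files.

```python
import difflib

def likely_aliases(left_only, right_only, threshold=0.8):
    """Pair up files whose contents look alike.

    left_only and right_only map file names to their text content.
    Returns a list of (left_name, right_name, similarity) tuples.
    """
    pairs = []
    for lname, ltext in left_only.items():
        best_name, best_ratio = None, 0.0
        for rname, rtext in right_only.items():
            # autojunk=False avoids the heuristic that skews long texts.
            matcher = difflib.SequenceMatcher(None, ltext, rtext, autojunk=False)
            ratio = matcher.ratio()
            if ratio > best_ratio:
                best_name, best_ratio = rname, ratio
        if best_name is not None and best_ratio >= threshold:
            pairs.append((lname, best_name, round(best_ratio, 2)))
    return pairs
```

Any pair it reports is only a suggestion of what to compare next, not proof that the two files are the same chapter.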



Would a programming guru take the thing in hand?

Anyway, some level of comparison is available...

afv011
09-04-2011, 10:02 PM
You could try BeyondCompare. You may have to rename the ePubs to zip, as I am not sure the extension is recognized.
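
Whether a tool needs the rename depends on whether it looks at the extension or at the file content. As a small illustration (looks_like_epub is a hypothetical helper), Python's zipfile module can check the content directly, whatever the file is called:

```python
import zipfile

def looks_like_epub(path_or_file):
    """True if the file is a zip archive carrying the ePub mimetype entry."""
    # is_zipfile inspects the file content, not its extension.
    if not zipfile.is_zipfile(path_or_file):
        return False
    with zipfile.ZipFile(path_or_file) as zf:
        if "mimetype" not in zf.namelist():
            return False
        return zf.read("mimetype").strip() == b"application/epub+zip"
```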

carmenchu
09-05-2011, 11:07 AM
The extension is recognized all right by 7-zip, and therefore by WinMerge too.
Of course one can go further -- some manipulations, like replacing the cover with a new one (same name) and editing the appropriate file for the new sizes, are straightforward enough. Or editing the .css to change some value...
But other manipulations (badly needed, at times), such as merging two badly split pages (e.g., Chapter number in one, Chapter title and text in the other), are sure to mess up the ePub structure -- so, for the time being, open the thing in Sigil and do it there.
Or merge from both sources, save to a new file ... and see if Sigil can open it, correct mishaps and validate.
My opinion, at least.
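
The simpler manipulations mentioned above (swapping in a cover of the same name, changing a value in the .css) amount to replacing one member of the archive. A minimal sketch with Python's zipfile module follows; replace_member is a made-up name, and the approach of writing a new file rather than editing in place is an assumption of this example. The ePub container format expects the mimetype entry to come first and to be stored uncompressed, so the copy keeps each member's order and compression method.

```python
import zipfile

def replace_member(src, dst, name, new_data):
    """Copy the ePub at src to dst, swapping the bytes of one member."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        zin.getinfo(name)  # raises KeyError if the member does not exist
        for info in zin.infolist():
            data = new_data if info.filename == name else zin.read(info.filename)
            # Reusing the ZipInfo keeps the member's name, position and
            # compression method (so "mimetype" stays first and stored).
            zout.writestr(info, data)
```

It would still be wise to open the result in Sigil and validate it, as suggested above.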