Old 02-13-2014, 08:33 AM   #31
Moonraker
Addict
Posts: 314
Karma: 1002965
Join Date: Mar 2006
Location: UK
Device: ILiad. Gen 3, PocketBook 360, Kobo Aura HD, Kindle Oasis 2
Beyond Compare

Quote:
For digital books... if there are a massive amount of typos, I typically use a code comparison program (I personally use Beyond Compare). And I compare the before/after. Then I would email that to the publisher + your fixed version + an explanation.
I had never heard of Beyond Compare and reading your post I immediately wanted it. Not only would I like to compare html files but I have other uses for the programme as well. I have paid for it and downloaded the trial 30 day version and waiting for my registration code at the moment.

I am finding it rather overwhelming at present. I have managed to compare two folders, each containing thousands of files ok and was very pleased with the result.

But I don't know where to start with comparing two html files.

Is there any chance you could do a quick idiots guide howto on your method of working?

Thank you
Old 02-13-2014, 08:37 AM   #32
HarryT
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Quote:
Originally Posted by Moonraker View Post
But I don't know where to start with comparing two html files.
You can use Calibre's "Compare books" facility to identify changes between two books, or two versions of a book.
Old 02-13-2014, 11:19 PM   #33
Tex2002ans
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by Moonraker View Post
I had never heard of Beyond Compare and reading your post I immediately wanted it. Not only would I like to compare html files but I have other uses for the programme as well. I have paid for it and downloaded the trial 30 day version and waiting for my registration code at the moment.
Fantastic! I am a programmer, so I also use it for its original purpose (comparing/merging code). It is an extremely powerful comparison program, and has much more powerful capabilities (such as merging files, versioning, etc. etc.).

Definitely is extremely complex/powerful for your typical user.

Quote:
Originally Posted by Moonraker View Post
I am finding it rather overwhelming at present. I have managed to compare two folders, each containing thousands of files ok and was very pleased with the result.
I tend to do just this:
  • I open/save both files in Sigil, so that they have consistent folder schemes.
  • Unzip the EPUBs using your favorite zipping program (I personally use 7-zip).
  • Open up Beyond Compare, Choose Folder Compare.
  • Then press the little "Browse for Folder" button (highlighted in orange). I put the "Old" folder on the Left, and the "New" folder on the Right.

(Attached image: BeyondCompareStep1.png)
  • Right Click on the file you want to compare.
    • Or if you want to compare multiple files, do your typical Ctrl+Click or Shift+Click to highlight many
  • After you right click the files, you can press "File Compare Report..."

(Attached image: BeyondCompareStep2.png)

(Attached image: BeyondCompareStep3.png)
  • After you click on "File Compare Report", this window pops up. (These are my settings)
    • Important: Make sure you have a checkmark in "Ignore unimportant". (See Side Note for explanation).
  • HTML Report will generate a Report just like in my attached files.
  • Press "Save As..." and save the HTML Report in an easy to reach location. (Perhaps your Desktop)
    • Alternate: If you press the "View in Browser" button, it will just open a temporary HTML file in your default browser, so you can quickly compare files to each other.
  • If you save the Report, then you can easily attach it to an email, attach it to a post, send it to someone else, etc. etc.
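
For anyone without Beyond Compare, the unzip / folder compare / HTML report steps above can be roughly approximated with Python's standard library. This is only a sketch under assumptions: the function names (`unzip_epub`, `changed_files`, `write_html_report`) are made up for illustration, the files are assumed to be UTF-8 text, and `difflib` is a far simpler comparison engine than Beyond Compare, so the reports will not look the same.

```python
import difflib
import filecmp
import zipfile
from pathlib import Path


def unzip_epub(epub_path, dest_dir):
    """Extract an EPUB (an ordinary ZIP archive) into dest_dir."""
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(dest_dir)
    return Path(dest_dir)


def changed_files(old_dir, new_dir):
    """List files present in both folders whose contents differ."""
    changed = []

    def walk(cmp, prefix):
        # shallow=False compares file contents instead of trusting size/mtime.
        # Files that could not be compared are reported as well.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False
        )
        changed.extend((prefix / name).as_posix() for name in sorted(mismatch + errors))
        for name, sub in sorted(cmp.subdirs.items()):
            walk(sub, prefix / name)

    walk(filecmp.dircmp(old_dir, new_dir), Path(""))
    return changed


def write_html_report(old_file, new_file, report_path):
    """Write a side-by-side HTML report: old on the left, new on the right."""
    old_lines = Path(old_file).read_text(encoding="utf-8").splitlines()
    new_lines = Path(new_file).read_text(encoding="utf-8").splitlines()
    html = difflib.HtmlDiff().make_file(old_lines, new_lines, "Old", "New")
    Path(report_path).write_text(html, encoding="utf-8")
```

Typical use would be to call `unzip_epub` on the "Old" and "New" EPUBs, loop over `changed_files(old_dir, new_dir)`, and write one report per changed file.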

Side Note: Ignoring Whitespace means this:

Code:
   <p>This is a sample paragraph</p>
and this

Code:
<p>This is a sample paragraph</p>
will be treated the same. (Notice that in the first example, there are a few SPACES before "<p>". These spaces do NOTHING to the final output of the file. Whitespace just makes the code look nicer to human eyes when they are reading it.)
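
The idea can be illustrated in a few lines of Python. This is a simplified sketch, not Beyond Compare's actual rule set: the helper name `same_ignoring_indent` is hypothetical, and it only strips whitespace at the start and end of each line before comparing.

```python
def same_ignoring_indent(left_line, right_line):
    """Treat two lines as equal when they differ only in leading/trailing whitespace."""
    return left_line.strip() == right_line.strip()


indented = "   <p>This is a sample paragraph</p>"
flush = "<p>This is a sample paragraph</p>"

print(indented == flush)                      # False: a strict comparison sees the spaces
print(same_ignoring_indent(indented, flush))  # True: the indentation is ignored
```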

Quote:
Originally Posted by Moonraker View Post
But I don't know where to start with comparing two html files.

Is there any chance you could do a quick idiots guide howto on your method of working?
If you wanted to just compare two files against each other, press the "Text Compare" button in the main menu instead. Then you can click that little button up top (in a similar position to the "Browse for Folder" button in the first image above). Then you just choose the Left and the Right files. (I personally think it is easier if the "Older" file is on the left, and the "Newer" file is on the right.)
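
As a rough stand-in for a two-file text compare, here is a minimal Python sketch using the standard `difflib` module. The function name `diff_files` is made up for this example, and the files are assumed to be UTF-8 text. The "Older" file goes first (left) and the "Newer" file second (right), matching the convention above.

```python
import difflib
from pathlib import Path


def diff_files(old_path, new_path):
    """Return unified-diff lines describing how new_path differs from old_path."""
    old_lines = Path(old_path).read_text(encoding="utf-8").splitlines()
    new_lines = Path(new_path).read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=str(old_path),  # the "Older" file (left side)
            tofile=str(new_path),    # the "Newer" file (right side)
            lineterm="",
        )
    )
```

Lines starting with "-" exist only in the older file, and lines starting with "+" exist only in the newer one.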

Quote:
Originally Posted by Moonraker View Post
Thank you
You are welcome. Please let me know if you need any more explanation, I would be glad to help (Although I probably should then split it off into a Tutorial in the Workshop).

Quote:
Originally Posted by HarryT View Post
You can use Calibre's "Compare books" facility to identify changes between two books, or two versions of a book.
I will definitely have to check it out one of these days. (Comparing two versions to see what is different is EXTREMELY helpful, and it is fantastic to see this is added to Calibre's Editor).

Side Note: One code comparison case in particular that was very interesting was a book that I converted, "An Essay on the Nature and Significance of Economic Science" by Lionel Robbins. I accidentally EPUBed the First Edition of the book, and then I noticed that a scan of the Second Edition was ALSO available. So I tackled an EPUB of the Second Edition as well. Then I was able to easily code compare between the two to see EXACTLY what was added/changed in the Second Edition.

Attached is:
  • A ZIP of the Comparison Report generated from Beyond Compare.
    • This is where you can see EXACTLY what text was added in the Second Edition of the book

I tried a lot of other "code comparison" programs many moons ago, and many of them seemed to fail MISERABLY when there was a large amount of text in <p>, or there were new paragraphs inserted in between other paragraphs. Beyond Compare seemed to work the best for me (although it still messes up on a few large paragraphs, or when there are a HUGE amount of major changes).
Attached Files
File Type: zip First vs. Second.zip (168.1 KB, 121 views)

Last edited by Tex2002ans; 02-14-2014 at 06:32 AM.
Old 02-14-2014, 06:55 AM   #34
Moonraker
Addict
Quote:
Fantastic! I am a programmer, so I also use it for its original purpose (comparing/merging code). It is an extremely powerful comparison program, and has much more powerful capabilities (such as merging files, versioning, etc. etc.).

Definitely is extremely complex/powerful for your typical user.
Tex2002ans thank you for your brilliant instructions and the time you must have taken on this.

I am committed to playing at the bridge club this afternoon (although I would much rather be playing with Beyond Compare using your methods). But I shall get straight on it when I arrive home tonight.

Many thanks and much appreciation to you.

Old 02-16-2014, 04:36 PM   #35
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
@Tex:

I just wanted to say: when it comes to instructions, manuals, and toots, you really are the best. Not really concise, mind you: but as thorough as the day is long. I'll bet your books really rock. The books you make, that is. They must be a treat on the eyes and to the innards, when one looks at the guts.

I think you make an invaluable contribution here. I already K'ed you for this post, but wanted to say this where other people could see it.

Hitch
Old 02-17-2014, 06:47 AM   #36
Tex2002ans
Wizard
Quote:
Originally Posted by Moonraker View Post
Tex2002ans thank you for your brilliant instructions and the time you must have taken on this.


Quote:
Originally Posted by Hitch View Post
I just wanted to say: when it comes to instructions, manuals, and toots, you really are the best. Not really concise, mind you: but as thorough as the day is long.
Concise... BAH! Who needs that, when you need to learn HOW to use tools, WHAT they do, and WHY you would do it a certain way. (And also point out potential areas where you may be able to fork off/explore).

When you find some of these tutorials that just explain how to do something along the lines of "Step 1: type this in, Step 2: DONE", well then sure, you can do that ONE thing, but you don't know how it works or why, only that it DOES.

I guess that is part of the reason why a PDF -> EPUB tutorial would take me so long to write, it is an immense topic, and I want it to be thorough. I guess each post would have many forks/side notes/explanations... and that stuff takes time to write/organize.

I also spend about half an hour rereading the post AFTER it is posted, and iterate with little edits here and there.... and of course, after a shower/nap, you look back at your post and think of all this extra stuff to add. I would never be able to write a book...

Quote:
Originally Posted by Hitch View Post
I'll bet your books really rock. The books you make, that is. They must be a treat on the eyes and to the innards, when one looks at the guts.
Thanks, I take pride in the cleanliness/maintainability/readability of the code. Definitely much better for the long-run of the books (as I always stress, think of not just the EPUB/MOBI, but also for the formats BEYOND).

Another advantage of clean code is that it allows it to be easily code compared to other sources (like Project Gutenberg, pulling an HTML version off of a site, comparing it to a purchased edition, ...). I typically do A/B comparisons with other sources, and sometimes even C comparisons! For example, last week, I:
  • A: Pulled the Gutenberg version of the book
    • It was an early conversion which was riddled with errors
  • B: OCRed a PDF version derived from the Gutenberg source ~4 years ago
    • This company copyedited/removed a lot of mistakes

Now when I was on a final spellchecking pass, I noticed a lot of "typos". So I hunted down an archive.org scan of the original book... and I saw that these "typos" were actually French words that were missing their accents.

So now, hopefully I get the go-ahead to do a thorough C pass, to see what missing accents I can catch. I already fixed up a bunch of the blatant mistakes... but this is one type of TYPO that I just can't stand. I mean, missing accents in French words is a huge red flag in my book! So the accents should be re-added.
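
One way such a pass could be partly automated is sketched below in Python. This is an illustration under assumptions, not the workflow actually used here: the function names are hypothetical, and it presumes you already have a word list from the suspect text and one from a trusted reference. It only produces candidates for a human to check (English "role" would be flagged against French "rôle", for instance).

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks, e.g. 'café' -> 'cafe'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_only_differences(suspect_words, reference_words):
    """Pair each unaccented suspect word with the accented reference word it matches."""
    accented_by_plain = {}
    for word in reference_words:
        plain = strip_accents(word)
        if plain != word:  # keep only reference words that really carry accents
            accented_by_plain.setdefault(plain, word)
    return [
        (word, accented_by_plain[word])
        for word in suspect_words
        if word in accented_by_plain
    ]


print(accent_only_differences(["cafe", "book"], ["café", "book"]))
# [('cafe', 'café')]
```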

If you read a bunch of public domain stuff... you just can't trust that some of these conversions were done properly!

TYPOS MUST BE DESTROYED!

Quote:
Originally Posted by Hitch View Post
I think you make an invaluable contribution here. I already K'ed you for this post, but wanted to say this where other people could see it.
Thank you so much, I try to help where I can... I know that this information/discussion is also helpful to lurkers as well.

They don't post any questions, BUT, they do read/absorb. And maybe some of them were running into the same problems, or having the same sorts of ideas, and this discussion helps flesh them out. I know that many of my ideas were definitely affected by reading online debates as a third party lurker.

Quote:
Originally Posted by Hitch View Post
(Although, I don't think that plays into how I feel about the whole, "I'm gonna report a TYPO! And they'll run and FIX it!" thing. {Thinks}. Nope. I view that as a reader.)
You know what also popped into my head... Marvin, Mantano, Bluefire (programs that are used to read on internet-connected devices) could perhaps have a way to hash (CRC32? MD5?) a given EPUB/MOBI to get a Unique ID (of course, Amazon already has this centralized information when delivering files to Kindles).

Then, while the reader is reading, there could be a way to "tag" typos (just like highlighting). You should then be able to leave a comment, and maybe leave what you believe the fix should be. Then THIS gets submitted back to a database automatically. Sort of like submission of bug reports in a program.

Then this can be organized on a site somewhere for easy access (perhaps organize the hashes according to the metadata in the book as well. A given Book + Author, can have multiple hashes underneath).
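
To make the hashing idea concrete, here is a minimal Python sketch that computes both a CRC32 and an MD5 for a file. The function name `file_hashes` and the chunk size are arbitrary choices for illustration. The file is read in chunks so that large books do not have to fit in memory, and changing even a single byte of the file gives different hashes.

```python
import hashlib
import zlib


def file_hashes(path, chunk_size=65536):
    """Return (CRC32 as 8 uppercase hex digits, MD5 hex digest) for a file."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC carried across chunks
            md5.update(chunk)
    return f"{crc:08X}", md5.hexdigest()
```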

Something similar: I use a site called AniDB. (It is used to organize episodes of anime). People initially submit hashes of episodes + metadata for them (information can become verified through trusted mods/users).

(Attached image: AniDBBeforeExpansion.png)
(Attached image: AniDBExpansion.png)

Image #1: This specific anime has 12 episodes.
Image #2: Each episode can be expanded to see the multiple versions available (from all different release groups) + much more in depth information on each file.

As a user of AniDB, you can then hash your files. If there is a match in the database, you know that you have the EXACT same file, and then you can just pull all the metadata from the database automatically. You can set up the client on your end to do things like:
  • Rename the files:
    • Include the anime/episode title in (Japanese, English, ....)
    • Include CRC32 (or a few other hashes)
    • Include the episode #.
  • Organize all your files into a given folder structure.
    • Maybe you prefer "\Title of Show\## - Title of Episode.mkv"
    • Maybe I prefer "\Title of Show\Title_of_Show_-_##_[CRC32].mkv"

So let's say you downloaded a crappy file named "1.mkv". This file can be hashed, compared to the database, and renamed into something like this:

Before: 1.mkv
User Formula: Title_of_Anime_-_##_-_Title_of_Episode_in_English_[ReleaseGroup][CRC32].mkv
After Renaming: Spice_and_Wolf_2_-_01_-_Wolf_and_an_Inadvertent_Rift_[a-S][FFD19242].mkv
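
The renaming step can be sketched in Python like this. The template syntax below is plain Python string formatting and is purely hypothetical (AniDB clients have their own formula syntax), and the metadata dictionary stands in for whatever the database would return after a hash match.

```python
def build_filename(template, metadata):
    """Fill a naming template, replacing spaces in text fields with underscores."""
    cleaned = {
        key: value.replace(" ", "_") if isinstance(value, str) else value
        for key, value in metadata.items()
    }
    return template.format(**cleaned)


# Hypothetical template mirroring the user formula above.
TEMPLATE = "{title}_-_{episode:02d}_-_{episode_title}_[{group}][{crc32}].mkv"

# Metadata as it might come back from the database after a hash match.
metadata = {
    "title": "Spice and Wolf 2",
    "episode": 1,
    "episode_title": "Wolf and an Inadvertent Rift",
    "group": "a-S",
    "crc32": "FFD19242",
}

print(build_filename(TEMPLATE, metadata))
# Spice_and_Wolf_2_-_01_-_Wolf_and_an_Inadvertent_Rift_[a-S][FFD19242].mkv
```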

In the typo database I propose, you would hash the book to see if you match any other previous database submissions, and then push your typos TO the server.

Maybe on the client side, the reading program can be built so that IF it finds a matched hash, AND there are typo/fix submissions, you can have it automatically apply them to the book. (Maybe have some rating system, where people vote stars on if this "typo fix" is correct, you can decide to apply ONLY 4 stars and above, or maybe only if three or more people ran into this same typo, ...)
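
A toy version of that client-side filter, in Python, might look like the following. Everything here is an assumption for illustration: the `TypoFix` record, its field names, and the thresholds are invented, and a real system would need to store the exact location of each typo rather than doing a global search and replace.

```python
from dataclasses import dataclass


@dataclass
class TypoFix:
    wrong: str      # text as it appears in the book
    right: str      # proposed correction
    stars: float    # average community rating of the fix
    reporters: int  # how many readers reported the same typo


def accepted_fixes(fixes, min_stars=4.0, min_reporters=3):
    """Keep fixes that are rated highly enough OR reported by enough readers."""
    return [
        fix for fix in fixes
        if fix.stars >= min_stars or fix.reporters >= min_reporters
    ]


def apply_fixes(text, fixes):
    """Apply each accepted fix as a plain search-and-replace (a simplification)."""
    for fix in fixes:
        text = text.replace(fix.wrong, fix.right)
    return text
```

Whether the two thresholds should be combined with "or" (as here) or "and" is a design choice the reader would want to be able to configure.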

I think that would be pretty cool. Of course this is all stuff that popped into my head tonight, and it would need MAJOR fleshing out. One of the open source reading programs might be a fantastic place to start. Although, knowing how many people convert their books with Calibre, that would definitely mess up a hash system. It would mostly have to deal with ORIGINAL files (like untouched purchased files, ebooks DIRECTLY from Gutenberg/archive.org/Amazon/B&N, etc. etc.).

Last edited by Tex2002ans; 02-17-2014 at 06:55 AM.
Old 02-17-2014, 03:49 PM   #37
Hitch
Bookmaker & Cat Slave
I'm trying to decide if you're a mad genius, or if I should just hunt you down and kill you now, to save the world.

Hitch
All times are GMT -4. The time now is 08:21 PM.

