#31
Addict
Posts: 314
Karma: 1002965
Join Date: Mar 2006
Location: UK
Device: ILiad. Gen 3, PocketBook 360, Kobo Aura HD, Kindle Oasis 2
Beyond Compare
I am finding it rather overwhelming at present. I have managed to compare two folders, each containing thousands of files, OK and was very pleased with the result. But I don't know where to start with comparing two HTML files. Is there any chance you could do a quick idiot's guide how-to on your method of working? Thank you.
#32
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
#33
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
It definitely is extremely complex/powerful for your typical user.
Side Note: Ignoring Whitespace means this:
Code:
<p>This is a sample paragraph</p>
Code:
<p>This is a sample paragraph</p>
You are welcome. Please let me know if you need any more explanation; I would be glad to help (although I probably should then split it off into a Tutorial in the Workshop).
Side Note: One code comparison case in particular that was very interesting is a book that I converted, "An Essay on the Nature and Significance of Economic Science" by Lionel Robbins. I accidentally EPUBed the First Edition of the book, and then I noticed that a scan of the Second Edition was ALSO available. So I tackled an EPUB of the Second Edition as well. Then I was able to easily code compare between the two to see EXACTLY what was added/changed in the Second Edition.
Attached is:
I tried a lot of other "code comparison" programs many moons ago, and many of them seemed to fail MISERABLY when there was a large amount of text in <p>, or when new paragraphs were inserted in between other paragraphs. Beyond Compare seemed to work the best for me (although it still messes up on a few large paragraphs, or when there is a HUGE number of major changes).
Last edited by Tex2002ans; 02-14-2014 at 06:32 AM.
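For readers who want to see the idea outside of Beyond Compare, here is a minimal Python sketch of a paragraph-level comparison that ignores whitespace. It is not how Beyond Compare works internally; it uses the standard library's `difflib`, and the two "editions" below are made-up sample paragraphs (the extra spaces and line break are an assumption, added only for illustration).

```python
import difflib
import re


def normalize(paragraph):
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", paragraph).strip()


def compare_paragraphs(old, new):
    """List paragraph-level changes, ignoring whitespace-only differences."""
    a = [normalize(p) for p in old]
    b = [normalize(p) for p in new]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            report.append(f"{tag}: old[{i1}:{i2}] -> new[{j1}:{j2}]")
    return report


# Hypothetical sample data: two "editions" of the same chapter.
first_edition = [
    "<p>This is a sample paragraph</p>",
    "<p>The closing paragraph.</p>",
]
second_edition = [
    "<p>This   is a sample\n    paragraph</p>",  # whitespace-only change
    "<p>A paragraph added in the new edition.</p>",
    "<p>The closing paragraph.</p>",
]

changes = compare_paragraphs(first_edition, second_edition)
print(changes)
```

Only the inserted paragraph is reported; the paragraph that differs purely in spacing counts as unchanged. Comparing whole paragraphs as units is one way to keep an inserted paragraph from throwing off everything after it, though very heavily rewritten paragraphs will still show up as one big "replace".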
#34
Addict
Posts: 314
Karma: 1002965
Join Date: Mar 2006
Location: UK
Device: ILiad. Gen 3, PocketBook 360, Kobo Aura HD, Kindle Oasis 2
I am committed to playing at the bridge club this afternoon (although I would much rather be playing with Beyond Compare using your methods). But I shall get straight on it when I arrive home tonight. Many thanks and much appreciation to you.
#35
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
@Tex:
I just wanted to say: when it comes to instructions, manuals, and toots, you really are the best. Not really concise, mind you, but as thorough as the day is long. I'll bet your books really rock. The books you make, that is. They must be a treat on the eyes and to the innards, when one looks at the guts.
I think you make an invaluable contribution here. I already K'ed you for this post, but wanted to say this where other people could see it.
Hitch
#36
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
Some of these tutorials just explain how to do something along the lines of "Step 1: type this in, Step 2: DONE." Well then sure, you can do that ONE thing, but you don't know how it works or why, only that it DOES.
I guess that is part of the reason why a PDF -> EPUB tutorial would take me so long to write: it is an immense topic, and I want it to be thorough. I guess each post would have many forks/side notes/explanations... and that stuff takes time to write/organize.
I also spend about half an hour rereading the post AFTER it is posted, and iterate with little edits here and there... and of course, after a shower/nap, you look back at your post and think of all this extra stuff to add. I would never be able to write a book...
Another advantage of clean code is that it allows it to be easily code compared to other sources (like Project Gutenberg, pulling an HTML version off of a site, comparing it to a purchased edition, ...). I typically do A/B comparisons with other sources, and sometimes even C comparisons! For example, last week, I:
Now when I was on a final spellchecking pass, I noticed a lot of "typos". So I hunted down an archive.org scan of the original book... and I saw that these "typos" were actually French words that were missing their accents. So now, hopefully I get the go-ahead to do a thorough C pass, to see what missing accents I can catch.
I already fixed up a bunch of the blatant mistakes... but this is one type of TYPO that I just can't stand. I mean, missing accents in French words are a huge red flag in my book! So the accents should be re-added.
If you read a bunch of public domain stuff... you just can't trust that some of these conversions were done properly! TYPOS MUST BE DESTROYED!
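A rough sketch of how such a "missing accents" pass could be partly automated in Python, assuming you have plain text from a trusted source (for example, a corrected scan) and plain text from the ebook. The function names and sample sentences are hypothetical, and the output is only a list of suspects for a human to review, not a list of safe automatic fixes.

```python
import re
import unicodedata

WORD = re.compile(r"[^\W\d_]+")  # runs of letters (Unicode-aware)


def strip_accents(word):
    """Return the word with combining accent marks removed."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_missing_accents(reference_text, ebook_text):
    """Map unaccented ebook words to the accented form seen in the reference."""
    reference_text = unicodedata.normalize("NFC", reference_text)
    ebook_text = unicodedata.normalize("NFC", ebook_text)
    accented = {}   # bare lowercase form -> accented word from the reference
    plain = set()   # words the reference itself spells without accents
    for word in WORD.findall(reference_text):
        bare = strip_accents(word)
        if bare == word:
            plain.add(word.lower())
        else:
            accented[bare.lower()] = word
    suspects = {}
    for word in WORD.findall(ebook_text):
        key = word.lower()
        # Skip words like "a" that are also valid without an accent.
        if key in accented and key not in plain:
            suspects[word] = accented[key]
    return suspects


# Hypothetical sample sentences.
scan_text = "Il a un rôle à jouer, dit l'élève."
epub_text = "Il a un role a jouer, dit l'eleve."
suspects = find_missing_accents(scan_text, epub_text)
print(suspects)
```

Words that are legitimate both with and without an accent (like "a" and "à") are skipped whenever the reference also contains the plain form, so real mistakes of that kind still need a manual pass.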
They don't post any questions, BUT they do read/absorb. And maybe some of them were running into the same problems, or having the same sorts of ideas, and this discussion helps flesh them out. I know that many of my ideas were definitely affected by reading online debates as a third-party lurker.
Now, while the reader is reading, perhaps there could be a way to "tag" typos (just like highlighting). You should then be able to leave a comment, and maybe leave what you believe the fix should be. Then THIS gets submitted back to a database automatically, sort of like the submission of bug reports in a program. Then this can be organized on a site somewhere for easy access (perhaps organizing the hashes according to the metadata in the book as well: a given Book + Author can have multiple hashes underneath).
Something similar: I use a site called AniDB (it is used to organize episodes of anime). People initially submit hashes of episodes + metadata for them (information can become verified through trusted mods/users).
Image #1: This specific anime has 12 episodes.
Image #2: Each episode can be expanded to see the multiple versions available (from all different release groups) + much more in-depth information on each file.
As a user of AniDB, you can then hash your files. If there is a match in the database, you know that you have the EXACT same file, and then you can just pull all the metadata from the database automatically. You can set up the client on your end to do things like:
So let's say you downloaded a crappy file named "1.mkv". This file can be hashed, compared to the database, and renamed into something like this:
Before: 1.mkv
User Formula: Title_of_Anime_-_##_-_Title_of_Episode_in_English_[ReleaseGroup][CRC32].mkv
After Renaming: Spice_and_Wolf_2_-_01_-_Wolf_and_an_Inadvertent_Rift_[a-S][FFD19242].mkv
In the typo database I propose, you would hash the book to see if you match any previous database submissions, and then push your typos TO the server. Maybe on the client side, the reading program can be built so that IF it finds a matched hash, AND there are typo/fix submissions, you can have it automatically apply them to the book. (Maybe have some rating system, where people vote stars on whether this "typo fix" is correct; you can decide to apply ONLY 4 stars and above, or maybe only if three or more people ran into this same typo, ...)
I think that would be pretty cool. Of course this is all stuff that popped into my head tonight; it would need MAJOR fleshing out. One of the open source reading programs might be a fantastic place to start. Although, knowing that people Calibre-convert their books (which would definitely mess up a hash system), it would mostly have to deal with ORIGINAL files (like untouched purchased files, ebooks DIRECTLY from Gutenberg/archive.org/Amazon/B&N, etc.).
Last edited by Tex2002ans; 02-17-2014 at 06:55 AM.
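To make the proposal a little more concrete, here is a minimal Python sketch of the client side: hash the book, look the hash up, and apply only the fixes that pass the rating rule. Everything here is hypothetical: the `TypoFix` record, the in-memory `database` dictionary standing in for a server, and the sample text. The only real API used is `zlib.crc32` from the standard library.

```python
import zlib
from dataclasses import dataclass


@dataclass
class TypoFix:
    wrong: str       # text as it appears in the book
    right: str       # proposed correction
    stars: float     # average rating from other readers
    reporters: int   # how many readers reported this typo


def crc32_hex(data):
    """CRC32 of the raw file bytes as 8 uppercase hex digits."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08X}"


def trusted_fixes(fixes, min_stars=4.0, min_reporters=3):
    """Keep fixes with a high rating OR enough independent reports."""
    return [f for f in fixes
            if f.stars >= min_stars or f.reporters >= min_reporters]


def apply_fixes(text, fixes):
    """Naive substring replacement; a real system would store positions."""
    for fix in fixes:
        text = text.replace(fix.wrong, fix.right)
    return text


# Hypothetical "original" book file and server-side submissions for it.
book_bytes = "It was teh best of times, it was the wurst of times.".encode("utf-8")
database = {
    crc32_hex(book_bytes): [
        TypoFix("teh", "the", stars=4.8, reporters=5),
        TypoFix("wurst", "worst", stars=3.0, reporters=4),
        TypoFix("times", "thymes", stars=1.5, reporters=1),  # bad submission
    ]
}

# Client side: hash the local file, fetch matching fixes, filter, apply.
submissions = database.get(crc32_hex(book_bytes), [])
trusted = trusted_fixes(submissions)
fixed_text = apply_fixes(book_bytes.decode("utf-8"), trusted)
print(fixed_text)
```

Because the lookup key is a hash of the exact bytes, any re-conversion of the file changes the key and nothing matches, which is the limitation noted above. CRC32 is used here only because it is the hash in the file-name example; a real database would likely want a stronger hash to avoid accidental collisions.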
#37
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
I'm trying to decide if you're a mad genius, or if I should just hunt you down and kill you now, to save the world.
Hitch