Quote:
Originally Posted by Moonraker
Tex2002ans thank you for your brilliant instructions and the time you must have taken on this.
|
Quote:
Originally Posted by Hitch
I just wanted to say: when it comes to instructions, manuals, and toots, you really are the best. Not really concise, mind you: but as thorough as the day is long.
|
Concise... BAH! Who needs that, when you need to learn HOW to use tools, WHAT they do, and WHY you would do it a certain way. (And also point out potential areas where you may be able to fork off/explore).
When you find one of these tutorials that just explains how to do something along the lines of "Step 1: type this in, Step 2: DONE", well then sure, you can do that ONE thing, but you don't know how it works or why, only that it DOES.
I guess that is part of the reason why a PDF -> EPUB tutorial would take me so long to write, it is an immense topic, and I want it to be thorough. I guess each post would have many forks/side notes/explanations... and that stuff takes time to write/organize.
I also spend about half an hour rereading the post AFTER it is posted, and iterate with little edits here and there.... and of course, after a shower/nap, you look back at your post and think of all this extra stuff to add. I would never be able to write a book...
Quote:
Originally Posted by Hitch
I'll bet your books really rock. The books you make, that is. They must be a treat on the eyes and to the innards, when one looks at the guts.
|
Thanks, I take pride in the cleanliness/maintainability/readability of the code. Definitely much better for the long-run of the books (as I always stress, think of not just the EPUB/MOBI, but also for the formats BEYOND).
Another advantage of clean code is that it can easily be compared against other sources (like Project Gutenberg, pulling an HTML version off of a site, comparing it to a purchased edition, ...). I typically do A/B comparisons with other sources, and sometimes even C comparisons! For example, last week, I:
- A: Pulled the Gutenberg version of the book
  - It was an early conversion which was riddled with errors
- B: OCRed a PDF version derived from the Gutenberg source ~4 years ago
  - This company copyedited/removed a lot of mistakes
Now when I was on a final spellchecking pass, I noticed a lot of "typos". So I hunted down an archive.org scan of the original book... and I saw that these typos were actually French words, that were missing the accents.
So now, hopefully I get the go-ahead to do a thorough C pass, to see what missing accents I can catch. I already fixed up a bunch of the blatant mistakes... but this is one type of TYPO that I just can't stand. I mean, missing accents in French words is a huge red flag in my book! So accents should be re-added.
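For anyone who wants to automate the first sweep of a pass like that, here is a rough Python sketch. It is NOT the workflow I actually used, just one way it could be done: it assumes you already have plain text for the suspect version and for a trusted reference (for example, a proofread transcription of the scan), and the function names are made up for illustration. It only produces candidates, a human still has to look at each one.

```python
import re
import unicodedata


def strip_accents(word):
    """Remove combining marks, so 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def missing_accent_candidates(suspect_text, reference_text):
    """Map unaccented words in suspect_text to accented forms in reference_text."""
    # Normalize to NFC first so each accented letter is a single character.
    reference_words = re.findall(r"\w+", unicodedata.normalize("NFC", reference_text))

    plain = set()   # words the reference spells WITHOUT accents
    accented = {}   # bare lowercase form -> accented spellings seen in the reference
    for word in reference_words:
        bare = strip_accents(word)
        if bare == word:
            plain.add(word.lower())
        else:
            accented.setdefault(bare.lower(), set()).add(word)

    hits = {}
    for word in re.findall(r"\w+", unicodedata.normalize("NFC", suspect_text)):
        key = word.lower()
        # Skip ambiguous words that the reference also uses unaccented
        # (French "a" vs "à", for example).
        if key in accented and key not in plain:
            hits[word] = sorted(accented[key])
    return hits
```

Calling missing_accent_candidates("The protege sat in the cafe.", "The protégé sat in the café.") returns {"protege": ["protégé"], "cafe": ["café"]}. Skipping the ambiguous words keeps the noise down, at the cost of missing a genuine "a" that should have been "à".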
If you read a bunch of public domain stuff... you just can't trust that some of these conversions were done properly!
TYPOS MUST BE DESTROYED!
Quote:
Originally Posted by Hitch
I think you make an invaluable contribution here. I already K'ed you for this post, but wanted to say this where other people could see it.
|
Thank you so much, I try to help where I can... I know that this information/discussion is also helpful to lurkers as well.
They don't post any questions, BUT, they do read/absorb. And maybe some of them were running into the same problems, or having the same sorts of ideas, and this discussion helps flesh them out. I know that many of my ideas were definitely affected by reading online debates as a third-party lurker.
Quote:
Originally Posted by Hitch
(Although, I don't think that plays into how I feel about the whole, "I'm gonna report a TYPO! And they'll run and FIX it!" thing. {Thinks}. Nope. I view that as a reader.)
|
You know what also popped into my head... Marvin, Mantano, Bluefire, (programs that are used to read on internet connected devices).... perhaps having a way to hash (CRC32? MD5?) a given EPUB/MOBI to get a Unique ID (of course, Amazon already has this centralized information when delivering files to Kindles).
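Just to make the hashing part concrete, here is a minimal Python sketch using only the standard library (zlib for CRC32, hashlib for MD5). The function name is my own invention, and it simply hashes the raw bytes of whatever file you point it at. Which hash a real system should use is still an open question.

```python
import hashlib
import zlib


def file_fingerprints(path, chunk_size=65536):
    """Return (CRC32, MD5) hex strings for the raw bytes of one file."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large books do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
            md5.update(chunk)
    # CRC32 is formatted as eight uppercase hex digits, like the tags in file names.
    return f"{crc:08X}", md5.hexdigest()
```

CRC32 is short and handy for display, but it is only 32 bits, so for a big database a longer hash would be the safer ID.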
Now, while the reader is reading, perhaps having a way to "tag" typos (just like highlighting). You should then be able to leave a comment, maybe be able to leave what you believe the fix should be. Then THIS gets submitted back to a database automatically. Sort of like submission of bug reports in a program.
Then this can be organized on a site somewhere for easy access (perhaps organize the hashes according to the metadata in the book as well. A given Book + Author, can have multiple hashes underneath).
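Here is roughly how I picture the data, as a Python sketch. Everything in it is hypothetical (the class names, the fields, the "location" string), and the in-memory dictionaries are just a stand-in for a real server and database. The point is only the shape: one Book + Author on top, multiple file hashes underneath, and typo reports attached to each hash.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TypoReport:
    file_hash: str      # fingerprint of the exact file the reader has
    location: str       # hypothetical locator, e.g. chapter file plus offset
    found: str          # the text the reader tagged
    suggested: str      # what the reader believes the fix should be
    comment: str = ""   # optional free-form note


class TypoDatabase:
    """In-memory stand-in for the central server."""

    def __init__(self):
        self.hashes_by_book = defaultdict(set)    # (title, author) -> file hashes
        self.reports_by_hash = defaultdict(list)  # file hash -> reports

    def register_file(self, title, author, file_hash):
        self.hashes_by_book[(title, author)].add(file_hash)

    def submit(self, report):
        self.reports_by_hash[report.file_hash].append(report)

    def reports_for_book(self, title, author):
        """Collect reports across every known file of one book."""
        collected = []
        for file_hash in sorted(self.hashes_by_book.get((title, author), ())):
            collected.extend(self.reports_by_hash.get(file_hash, ()))
        return collected
```

Keeping the reports keyed by hash (and not by title) means a fix is only ever offered to people holding the exact same file, while the Book + Author layer still lets you browse everything reported for a title.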
Something similar: I use a site called AniDB. (It is used to organize episodes of anime). People initially submit hashes of episodes + metadata for them (information can become verified through trusted mods/users).
Image #1: This specific anime has 12 episodes.
Image #2: Each episode can be expanded to see the multiple versions available (from all different release groups) + much more in depth information on each file.
As a user of AniDB, you can then hash your files. If there is a match in the database, you know that you have the EXACT same file, and then you can just pull all the metadata from the database automatically. You can set up the client on your end to do things like:
- Rename the files:
  - Include the anime/episode title in (Japanese, English, ....)
  - Include CRC32 (or a few other hashes)
  - Include the episode #.
- Organize all your files into a given folder structure.
  - Maybe you prefer "\Title of Show\## - Title of Episode.mkv"
  - Maybe I prefer "\Title of Show\Title_of_Show_-_##_[CRC32].mkv"
So let's say you downloaded a crappy file named "1.mkv". This file can be hashed, compared to the database, and renamed into something like this:
Before: 1.mkv
User Formula: Title_of_Anime_-_##_-_Title_of_Episode_in_English_[ReleaseGroup][CRC32].mkv
After Renaming: Spice_and_Wolf_2_-_01_-_Wolf_and_an_Inadvertent_Rift_[a-S][FFD19242].mkv
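If you want to see the hash -> lookup -> rename idea in code, here is a small Python sketch. To be clear, this is NOT how the actual AniDB client works internally (I am not reproducing its API or its real matching hash here). The "database" is just a plain dictionary keyed by CRC32, and the formula syntax and function names are my own assumptions for illustration.

```python
import zlib
from pathlib import Path

# Hypothetical user formula, written as a Python format string.
FORMULA = "{title}_-_{episode:02d}_-_{episode_title}_[{group}][{crc32}].mkv"


def crc32_hex(path):
    """CRC32 of a file's bytes as eight uppercase hex digits."""
    return f"{zlib.crc32(Path(path).read_bytes()):08X}"


def build_name(formula, metadata):
    """Fill in the user's formula, using underscores instead of spaces."""
    return formula.format(**metadata).replace(" ", "_")


def rename_from_database(path, database, formula=FORMULA):
    """Rename the file if its hash is known; return the new path, else None."""
    path = Path(path)
    crc = crc32_hex(path)
    metadata = database.get(crc)
    if metadata is None:
        return None  # unknown file: leave it alone
    target = path.with_name(build_name(formula, {**metadata, "crc32": crc}))
    path.rename(target)
    return target
```

With metadata of title "Spice and Wolf 2", episode 1, episode_title "Wolf and an Inadvertent Rift", group "a-S" and crc32 "FFD19242", build_name(FORMULA, ...) produces exactly the "After Renaming" name above. Reading the whole file into memory is fine for a sketch, for multi-gigabyte video you would hash it in chunks.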
In the typo database I propose, you would hash the book to see if you match any other previous database submissions, and then push your typos TO the server.
Maybe on the client side, the reading program can be built so that IF it finds a matched hash, AND there are typo/fix submissions, you can have it automatically apply them to the book. (Maybe have some rating system, where people vote stars on if this "typo fix" is correct, you can decide to apply ONLY 4 stars and above, or maybe only if three or more people ran into this same typo, ...)
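A quick Python sketch of that client-side filter, under some loud assumptions: the Fix fields and the thresholds are made up, a fix is accepted if it passes EITHER the star rule or the "enough people reported it" rule, and the fix is applied with a naive find/replace on plain text. A real reading app would have to target the exact location inside the book's markup and not replace every occurrence.

```python
from dataclasses import dataclass


@dataclass
class Fix:
    found: str            # text as it appears in the book
    replacement: str      # proposed correction
    average_stars: float  # community rating of this fix, 0 to 5
    reporters: int        # how many readers tagged the same typo


def trusted(fixes, min_stars=4.0, min_reporters=3):
    """Keep fixes that are rated highly OR were reported by enough readers."""
    return [
        fix for fix in fixes
        if fix.average_stars >= min_stars or fix.reporters >= min_reporters
    ]


def apply_fixes(text, fixes):
    """Naively apply each fix to every occurrence in the text."""
    for fix in fixes:
        text = text.replace(fix.found, fix.replacement)
    return text
```

For example, with one fix for "teh" rated 4.5 stars, one for "recieve" rated 2 stars but reported by 5 people, and one for "colour" rated 3 stars by a single person, only the first two get applied. The "colour" one is exactly the kind of "fix" you would want the rating system to keep out.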
I think that would be pretty cool. Of course this is all stuff that popped into my head tonight, it would need MAJOR fleshing out. One of the open source reading programs might be a fantastic place to start. The catch is that a lot of people convert their books with Calibre, and that would definitely mess up a hash system. So it would mostly have to deal with ORIGINAL files (like untouched purchased files, ebooks DIRECTLY from Gutenberg/archive.org/Amazon/B&N, etc. etc.).
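To show why touched files are such a problem, here is a tiny Python demonstration. An EPUB is a ZIP container, so this sketch builds two ZIPs in memory with IDENTICAL contents and only a different stored timestamp, then hashes the raw bytes. The helper names are made up, and this is not what Calibre does (a real conversion rewrites far more than a timestamp), it is just the mildest possible case.

```python
import hashlib
import io
import zipfile


def make_zip(files, timestamp):
    """Build an in-memory ZIP where every member gets the given timestamp."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            info = zipfile.ZipInfo(name, date_time=timestamp)
            archive.writestr(info, data)
    return buffer.getvalue()


def container_hash(zip_bytes):
    """MD5 of the raw container bytes, as a whole-file hash would see them."""
    return hashlib.md5(zip_bytes).hexdigest()


book = {"chapter1.xhtml": b"<p>Same text in both files.</p>"}
original = make_zip(book, (2014, 1, 1, 0, 0, 0))
repacked = make_zip(book, (2015, 6, 1, 12, 0, 0))

# Same book content, different container bytes, so the hashes do not match.
print(container_hash(original) == container_hash(repacked))
```

That prints False. If even a changed timestamp breaks the match, a full conversion certainly will, which is why the database would have to be built around untouched files.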