Originally Posted by cjallan
I clean up with Word, and that works well most of the time, and I prefer to "clean up", rather than resort to "clear all" -- which gets rid of all clutter, but also cleans out needed codes -- but I had a file some time ago with errors that eluded detection.
I wondered how you might have handled it... how would you have found, and how would you correct errors such as this:
I "attempted" to attach a scrap of the file, showing the error, but am not sure it stuck.
It is copied below... although of course the different text does not show in this message, but it does describe the problem.
Text from file:
The day Angel came for him, Walter Winkler was standing with Leggs outside the Mary Magdalene Shelter for Homeless Youth. They were watching Medic One pull away with Abra inside.
The word “outside” in the sentence above was a correction entered by the author, in a different font (11 pt Times) from that of the existing text (11 pt Calibri).
The font of the word “outside” is actually slightly smaller than the rest of the sentence, but being “almost invisible”, it was not noticed until after the book was published… then was found by a purchaser of the book.
How would you find errors like this?
One error, of course, is not a problem... but how would you handle a file that has many errors such as this?
Hey, Ceej: ;-)
Actually--for us, it would never have been an issue. (n.b.: when working from Word, for most folks it would be--I'm not sure how you'd know easily that it was there, but before I was working purely in html, I used to put up the Styles Box and use Options-->display styles in use--> and then set it to display fonts, to find out if there were other mystery fonts in play. You can also do that for paragraph-level formatting, which helps spot special text for formatting, like newspaper clippings, etc. If you do that on your sample, you'll see the 4 font level styles, and you can see where it's used, and whether it's intentional.)
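For anyone who would rather check a .docx without opening the Styles box, here is a minimal Python sketch of the same idea. It is not something described in this thread: it assumes the file is a standard .docx (a zip archive containing word/document.xml), and the function name fonts_in_docx is made up for illustration. It only reports fonts applied directly to text runs, so a font sitting on an empty paragraph mark or coming from a named style would not show up.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict

# WordprocessingML namespace used inside word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def fonts_in_docx(docx):
    """Map each directly applied font name to the text of the runs using it.

    `docx` may be a file path or a file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    usage = defaultdict(list)
    for run in root.iter(f"{{{W}}}r"):
        rfonts = run.find(f"{{{W}}}rPr/{{{W}}}rFonts")
        if rfonts is None:
            continue  # no direct font on this run; it inherits from a style
        name = rfonts.get(f"{{{W}}}ascii")
        if name is None:
            continue
        text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        usage[name].append(text)
    return dict(usage)
```

On a file like the sample, a call such as fonts_in_docx("sample.docx") (file name assumed) would list each stray font next to the words it was applied to, which makes a one-word correction in a different font easy to spot.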
Anyway: we scan initially (in html, of course) for fonts, to determine if there's a potential problem--we search for font tags--because of course, if an author does not know that K7 doesn't support multiple fonts, and has used same, we have to discuss it before it goes to my Crews. We would have seen it off the bat. If the font screw-ups are inadvertent, we run macros (actually: these are NoteTabPro clips and PERL scripts) that "mark" or "tag" all the font STYLES, i.e., italics, bold, etc., but nuke everything else and set the paragraph style to "normal," which we then style however we intend. These also do a lot of "other" cleanup: they search for broken paras and repair them; remove all the span tags that, Lord love a duck, show up around EVERY word when we get one of those "We Can Convert Your PDF To Word, No Problem!" end-products, as well as the usual span tags we get from either Word, Adobe Acrobat, or INDD PDFs; and clean out tabs and replace them with whatever is needed (or naught). I honestly forget now everything they search for and clean--quote marks, apostrophes (to named entities), etc., but we added each, item-by-item, over the last 3 years, as we all got tired of having to clean them manually (or worse--forgetting about them!) or edit post-prod. OH! Of course--optional hyphens, bye-bye. ;-)
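As a rough illustration of that first font scan, here is a minimal Python sketch. It is not the NoteTab clips or Perl scripts mentioned above, just one assumed way to do it: it counts fonts requested through font tags with a face attribute or through inline font-family styles, and the names FontScanner and scan_fonts are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Matches a font-family declaration inside an inline style attribute.
FONT_FAMILY = re.compile(r"font-family\s*:\s*([^;]+)", re.IGNORECASE)


class FontScanner(HTMLParser):
    """Counts how many tags request each font (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fonts = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Old-style <font face="..."> tags.
        if tag == "font" and attrs.get("face"):
            self.fonts[attrs["face"].strip()] += 1
        # Inline CSS such as style="font-family: Calibri".
        match = FONT_FAMILY.search(attrs.get("style") or "")
        if match:
            self.fonts[match.group(1).strip().strip("'\"")] += 1


def scan_fonts(html):
    """Return a Counter of font names found in the given HTML string."""
    scanner = FontScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.fonts
```

If the Counter comes back with more than one font name, or with a font used only once or twice, that is the cue to look closer before the file goes any further.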
On intake, for quoting--before the Crews ever see the book, I pop it open and scan it for the usual bizarreness. For fiction/plain paragraphical narration: Broken paras, line breaks instead of paras, any type of additional paragraph type that needs annotation (song lyrics, poems, journal entries, newspaper clippings, what-have-you), fonts, multiple instances in sequence of tab characters, multiple instances of empty paragraphs (4 in a row or more), any sort of fleurons, other graphics, yadda. I assess it at quoting and I make fairly thorough notes for each title. I do this +/- 20 times a day, about this time of night, in fact, when the crews are gone for the day, the phones are quiet and I can hear myself think.
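Two of the checks in that list, runs of tab characters and runs of empty paragraphs, are easy to automate. The following is only a sketch under stated assumptions: it treats each line of a plain-text export as one paragraph, and the function name intake_scan and its report keys are invented for the example.

```python
import re


def intake_scan(text, empty_run=4):
    """Count tab runs and long runs of empty paragraphs (hypothetical helper).

    Assumes one paragraph per line in `text`.
    """
    report = {
        # Two or more tab characters in sequence.
        "tab_runs": len(re.findall(r"\t{2,}", text)),
        "empty_para_runs": 0,
    }

    run = 0
    # The trailing sentinel flushes a run of empty lines at the end of the text.
    for line in text.split("\n") + ["sentinel"]:
        if line.strip() == "":
            run += 1
        else:
            if run >= empty_run:
                report["empty_para_runs"] += 1
            run = 0
    return report
```

The threshold of 4 matches the "4 in a row or more" rule above; the other items on the list (fleurons, special paragraph types, and so on) still need a human eye.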
For moderate or complex formatting, that's a whole other scan; I look for percentages, mostly; does the book have more or less than 20% or 40% of pages that have design elements, i.e., bulleted lists, charts, graphics, tables, columns (oh, joy), worksheets (ditto), video, audio, equations, recipes, inline images, FONTS, any non-supported characters (for example: we just did a book in Romanian; we had to test the K7, to ensure that the Latin-extended font we embedded would show up, which mercifully, it did, and hell, no, don't ask me how that worked!), outline-numbered items (multi-level indents, in other words), and all the other mysterious stuff that can show up in print books and is neck-breaking to put into eBooks, particularly MOBI format.
Does that help at all? Did the stuff I told you about using the Styles box in Word make sense to you, to find the fonts and also to see the number of paragraph styles?
(ETA: Oddly enough, of all the weird things, this very thing happened tonight; I had a file in, clean as a WHISTLE, turned on the Styles and lo, there were 3 odd fonts showing up, which had been applied to one character, 3 characters and two empty paras, respectively. Haven't seen it in ages. Synchronicity is a very, very