01-12-2014, 03:38 PM   #40
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by skreutzer View Post
Well, I would at least still like to know the following:



You don't have to answer if you don't like, but with the experience and the working environment you're in, I would indeed consider some hints about it valuable.
I don't seem to be able to communicate very effectively with you. I don't know if that's my failure, or your obduracy. I'm truly not trying to be argumentative, but I don't much like wasting my time, either. You've pretty cavalierly blown off my five years of experience in dealing with the very marketplace you say you're aiming at, and, in another post, blew off the 90% of users on Windows software. I'm just not sure, at this point, how my input can be useful.

And when you say, "For the topic of my initial question, I would really like to hear how you clean the input from authors in terms of structure, and if/how the added structural information is used for a later automated processing," I'm not really sure what you're asking me. Are you asking me, generally, what do we do, or proprietarily, what do we do?

Generally, we do what everyone else does, I presume, that's competent. The first step is an "either or."
  1. Depending upon the Word file, either I take a first pass at it IN Word, sometimes using Tox's ePUBTools, or we export it directly to HTML;
  2. Then we use proprietary clips/programs, to clean the HTML further.
  3. If it's Barb, who loves her some PERL, she uses a series of PERL clips that she's compiled, essentially, in the "accrued" sense of the word, to clean the HTML; if it's pretty much anyone else, we use a series of clips in NTPro to clean the HTML.
  4. Basically, either approach searches for "the usual": we search for garbage spans, eyeball them, remove them if indeed garbage, or replace them with inline italic/em tags if they are the typical "span class = italic."
  5. We search for broken dialogues and broken paragraphs, and all the usual crap that everybody searches for. Section breaks, page breaks, multiple uses of the enter key to create vertical whitespace, all the normal stuff. Places where the user used "Normal 18pt Bold" for a header, instead of a header class.
  6. Then, once the base file is cleaned up, we do the eyeball portions.
  7. The eyeball portions are looking for those styles that the user may or may not have created, and implementing them in the XHTML.
  8. For example, we get a LOT of books that have a style tag of "MSONormal," but in the actual manuscript, the user hit the "tab" key 3 times to create an indented text message. That won't export into HTML, about...75% of the time.
  9. We find those and manually "fix" them to have the correct CSS. If the user created a half-baked style for it, we do a simple F&R with a class of CSS.
  10. Once the file is basically cleaned and ready to go, we import it into Sigil, and finish it there. Divide the chapters at the chapter markers (those are all put in with our clips/PERL during pre-processing); eyeball the NCX; eyeball anything else; and then finish up the file.
  11. Once that phase is done, and the client has approved the ePUB, we tweak it some more (proprietary) and prep it to be fed to KindleGen to make a MOBI.
  12. That's it. Nothing more or less. Just boring, repetitive, tedious human work.
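
To make steps 4 and 5 a bit more concrete, here is a minimal Python sketch of that kind of clean-up pass. It is not the proprietary clips or Barb's PERL; the class name "italic", the function name clean_html, and the decision to simply drop runs of empty paragraphs are all assumptions for illustration. Real Word exports vary from file to file, which is why the spans get eyeballed before anything is removed.

```python
import re

# Assumed class name; a real export might use "italic", "Italic1", "c7", etc.
# Non-greedy match, so this sketch only handles spans that are NOT nested.
ITALIC_SPAN = re.compile(
    r'<span\s+class\s*=\s*"italic"\s*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)

# One or more consecutive paragraphs that hold only whitespace or &nbsp;
# (the "hit enter a few times for vertical space" habit).
EMPTY_PARAS = re.compile(
    r'(?:<p\b[^>]*>(?:\s|&nbsp;)*</p>\s*)+',
    re.IGNORECASE,
)


def clean_html(html: str) -> str:
    """Swap italic spans for <em> and drop runs of empty paragraphs."""
    html = ITALIC_SPAN.sub(r'<em>\1</em>', html)
    # Assumption: the run is just deleted here. In practice you would likely
    # look at it first and decide whether it was meant as a scene break.
    html = EMPTY_PARAS.sub('', html)
    return html


if __name__ == "__main__":
    sample = (
        '<p>He said <span class="italic">no</span>.</p>'
        '<p>&nbsp;</p><p> </p>'
        '<p>Next</p>'
    )
    print(clean_html(sample))
```

A regex pass like this only catches the patterns you already know about; anything it doesn't recognize still has to go through the eyeball portions.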

It's not rocket science, it's just repetitive tedium. The problem is, as I see it, no two authors make the SAME mistake in the SAME way each time. Sometimes, errata is "MSONormal center bold" for a header; sometimes it's "MSONormal 18pt Bold" for that same header. Sometimes, we get files that are simply inexplicable, as to how styling that's in there doesn't show up in Styles, and doesn't show up in exported HTML (very common with any Pages-->Word "conversions" output).
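
As a rough illustration of why this resists full automation, here is a hedged Python sketch that only FLAGS paragraphs that look like fake headers, for a human to review. It assumes simplified markup where the formatting sits in an inline style attribute on the paragraph; actual Word HTML is usually messier (nested b and span tags, align attributes), and the names looks_like_header, flag_fake_headers, and the 16pt threshold are all made up for the example.

```python
import re

# Capture each paragraph's attribute string and its body.
PARA = re.compile(r'<p\b([^>]*)>(.*?)</p>', re.IGNORECASE | re.DOTALL)
FONT_SIZE = re.compile(r'font-size\s*:\s*(\d+(?:\.\d+)?)pt', re.IGNORECASE)
TAG = re.compile(r'<[^>]+>')


def looks_like_header(attrs: str, min_pt: float = 16.0) -> bool:
    """Heuristic: bold AND (centered OR large font). Threshold is arbitrary."""
    lowered = attrs.lower()
    compact = lowered.replace(' ', '')
    bold = 'font-weight:bold' in compact
    centered = 'text-align:center' in compact
    size = FONT_SIZE.search(lowered)
    big = size is not None and float(size.group(1)) >= min_pt
    return bold and (centered or big)


def flag_fake_headers(html: str) -> list[str]:
    """Return the text of paragraphs that are probably headers in disguise."""
    return [
        TAG.sub('', body).strip()
        for attrs, body in PARA.findall(html)
        if looks_like_header(attrs)
    ]


if __name__ == "__main__":
    sample = (
        '<p class="MsoNormal" style="text-align:center; font-weight:bold">Chapter One</p>'
        '<p class="MsoNormal" style="font-size:18pt; font-weight: bold">Chapter Two</p>'
        '<p class="MsoNormal">Plain text.</p>'
    )
    print(flag_fake_headers(sample))
```

Both header variants get caught here, but only because the heuristic was written with those two in mind; the next author's third variant would need another rule, which is the whole problem.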

It would all be swell and good if authors had 5 styles to pick from, used 'em, and that was that. You could automate the process and Bob's yer uncle. But that's not what they are accustomed to, and that's not what they want. As Tox points out better than I could in his post, there are SCADS of tools out there already, that would work better (from a conversion standpoint), that authors already don't want to use. {shrug}.

I just think that you are expecting right-brainers to somehow magically see the advantages of working in a left-brained environment, and my experience, for what it's worth, is that that ain't ever gonna happen. Not only are they utterly uninterested in what is going on behind the scenes, they don't WANT to know, don't CARE to know, and somehow, think it makes them less creative if they understand the "how." This is my experience. Feel free to ignore it. However, of all the phone calls I take, I cannot tell you how many tell me either that "I'm not good with computers," or, even worse, "I'm really very techie but I need help with this," the latter of which means that when the time comes for that person to download a file from a browser interface, the s**t will hit the fan. Nor do they know where their downloads folder is, or how to drag-and-drop. That's what that last sentence means. I'm not disparaging them, but your idea just seems to utterly ignore the reality of a writer's inner world. That's how I interpret what you've said thus far. Sort of, "well, I'm making this tool for myself (which is fine; that doesn't faze me), and for writers, and if they don't want to learn it, the hell with them."

And if that's the entire gist--that you'll make it for yourself, and if anyone else wants to use it, they can--then great. But if you're asking everyone here for input and assistance and feedback, for a tool to be used widely, that would, purportedly, make OUR jobs easier, then you need to ALSO be open to the fact that maybe some of us might have a little more experience in the real-world environment in which you expect this to function. Just an idea.

Quote:
Mixing ideology with functionality is almost always a recipe for disaster--whether in open or proprietary projects. I, myself, am not really interested in coding projects that can't (relatively) easily be compiled and/or run on the three major platforms. That's why I've always appreciated Sigil so much.
+100.

Hitch