Old 08-10-2016, 01:00 AM   #2
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by AlexBell
I've been sent a doc file to turn into an ebook. My usual practice is to run the doc file through Atlantis Word Processor to turn it into an HTML file, and go on from there.

But the author, for reasons best known to her, has ended every line of text with a carriage return, so Atlantis turns every line of text into a paragraph.

Also, the author has not separated 'real' paragraphs in the text.

Can anyone suggest a way to get rid of the excess carriage returns/paragraphs? I vaguely remember a tool with which one could select a block of text, then press a key combination to remove all the <p> and </p> tags except the ones starting and finishing the block. But I can't remember which software it was in. Any suggestions?
Oh, dear.

We see that quite often. I just saw several like that. By the time I finish explaining about broken paragraphs, what it takes to clean them, etc., the prospective client's eyes have glazed over, and they usually leave to find some other bookmaker that doesn't bother them with that codswallop (to quote one rather infamous near-miss client).

nb: I'm actually afraid to ask you about the nature/topic of the book. I'm afraid one of those that came through my door has ended up at yours!!!

We use in-house regex. It's the best way. Do one pass for the breaks that come two in a row (the last line of a real para, followed by an empty para), then a pass for the single ones, and then, sadly, you have to do the rest by hand/eye. Particularly surrounding those that break across pages, of course (if this was a scan). Can't really be done automatically.
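
To make the two passes concrete, here is a minimal sketch in Python. This is not our in-house regex, just an illustration of the idea; the function name unwrap_hard_breaks is made up, and it assumes the text has been saved out as plain text, with real paragraph ends showing up as a break plus an empty line. It knows nothing about words hyphenated across lines or paragraphs that break across pages, so the by-hand/eye pass still has to follow.

```python
import re

def unwrap_hard_breaks(text):
    """Hypothetical helper: join hard-wrapped lines into paragraphs."""
    # Treat Windows and old Mac line endings the same as "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Pass 1: two or more breaks in a row (last line of a real para,
    # then an empty para) mark a genuine paragraph boundary.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text)

    # Pass 2: any single break left inside a block is just a hard wrap,
    # so replace it (and stray spaces around it) with one space.
    joined = [re.sub(r"[ \t]*\n[ \t]*", " ", b).strip() for b in blocks]

    # Drop empty blocks and separate real paragraphs with a blank line.
    return "\n\n".join(b for b in joined if b)


sample = "The quick\r\nbrown fox.\r\n\r\nA new\r\nparagraph."
print(unwrap_hard_breaks(sample))
```

If the file has no empty paras at all (as in the case described in the quote), pass 1 finds nothing and everything collapses into one block, which is exactly why the manual pass can't be skipped.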

On a commercial note, I hope that a) you asked them what crappy "auto-convert" program they used to give you this utterly FAKE Word file ($5 says it is an export from Adobe Acrobat--"save as Word"), or it's the output from a scan, or some bollocks like that, and b) that you are CHARGING to do all this extra work. That stuff is total nonsense. Your rates, presumably, are like ours--from a CLEAN source file, if using a word-processing file, right?

Seriously--if you're like us, you charge one rate for "from Word" and something a lot more expensive "from PDF" and so on. We frequently get these "faux-Word files," with prospectives thinking that we can't TELL that it was a PDF five minutes ago. Ask for the actual source--probably easier for you, and more expensive for them, but you should be paid for the actual time you're putting in.

Sheesh.

Hitch