02-24-2018, 05:01 PM   #15
sjfan
Quote:
Originally Posted by JSWolf
If you are converting from PDF, then the solution to that is don't do it. It's not worth the hassle.

The problem is that if you have split dialog, you have other split lines. The only way to sort of fix some of it is to check for lack of punctuation at the end of the line and combine that with the next line. But the only way to fix it is to find a good PDF or pBook source and s/b compare. That's how it will be fixed.
I’m not sure why you’re hung up on PDFs in particular; creating a clean marked-up copy (be it EPUB, HTML, LaTeX, whatever) from a PDF is no worse than doing so from a paper copy or scanned source. It may be better, depending on where the letterforms came from and whether the alternative requires OCR. It might even be better than starting from a plain text file, depending on how the latter handles line and page breaks and such.

It’s certainly true that you need to check things manually to get everything right, but it’s still worth automating what you can. There are cases like:
Quote:
“Call the general,” he said.

“We’re moving forward with the plan immediately.”
that are virtually impossible to automate; whether to merge those two lines depends on whether they’re meant to be spoken by the same person. And some paragraphs will happen to be split incorrectly right at a sentence boundary, where checking for end-of-line punctuation can't catch the break; those you need to check by hand as well.

So automate what you can; it saves a lot of work and reduces error rates. If you take care of 90% of the cases mechanically, you’re mentally freer as you read through and proof things. You can focus your energy on the cases that really need some thought, and you’re less likely to miss something.
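
For what it's worth, here is a rough sketch in Python of the kind of mechanical pass I mean. It applies the punctuation check from the quoted post (join a line to the next one when it doesn't end in sentence punctuation) and, rather than guessing, just flags the split-dialog boundaries for a human to look at. The function names, the punctuation sets, and the "ends on a dialog tag" test are all my own assumptions for illustration, not anything from a particular tool, and they assume English prose with one candidate paragraph per line.

```python
# Assumed punctuation sets for English prose; adjust for your source text.
TERMINAL = ".!?…"
CLOSERS = "\"'”’)]"
OPENERS = ("\"", "“")


def ends_sentence(line):
    """True if the line ends in terminal punctuation, ignoring trailing quotes/brackets."""
    core = line.rstrip().rstrip(CLOSERS)
    return bool(core) and core[-1] in TERMINAL


def looks_like_split_dialog(prev, nxt):
    """Heuristic: prev has quoted speech but ends on a tag like 'he said.',
    and nxt opens with a quotation mark. Only a human can decide the merge."""
    prev = prev.rstrip()
    has_quote = any(q in prev for q in "\"”")
    ends_on_tag = bool(prev) and prev[-1] not in CLOSERS
    return has_quote and ends_on_tag and nxt.lstrip().startswith(OPENERS)


def merge_lines(lines):
    """Join lines that lack terminal punctuation with the line that follows.

    Returns (paragraphs, review); review holds the index of each paragraph
    whose boundary with the next paragraph should be checked by hand.
    """
    paragraphs, review = [], []
    buffer = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information in this sketch
        if buffer and not ends_sentence(buffer):
            buffer = buffer + " " + line  # mechanical case: mid-sentence break
            continue
        if buffer:
            if looks_like_split_dialog(buffer, line):
                review.append(len(paragraphs))  # index buffer is about to get
            paragraphs.append(buffer)
        buffer = line
    if buffer:
        paragraphs.append(buffer)
    return paragraphs, review
```

You'd call it as `paragraphs, review = merge_lines(text.splitlines())` and then only read closely around the indices in `review`. It won't catch a paragraph that happened to break exactly at the end of a sentence, so the proofreading pass is still needed.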
sjfan is offline   Reply With Quote