Old 04-07-2010, 03:14 PM   #11
AbominableDavid
Enthusiast
Posts: 25
Karma: 10
Join Date: Sep 2009
Location: Tennessee
Device: Kobo Aura HD
Quote:
Originally Posted by tyche View Post
Another big one is attaching broken paragraphs. The obvious sign is a hard return followed by a lowercase letter (or a lowercase letter followed by a hard return). Using a regex in Word, you can find these and either mass-change the results or find and fix them one at a time. Be careful with the mass change, though: things like lyrics or special messages would also get caught in this find, and you wouldn't want to attach those lines.

For example, a regex search for a hard return (^13) followed by a lowercase letter ([a-z]) would be ^13([a-z])

With the parentheses, you can reuse what the search found in the replacement. The replacement would be ' \1' (without the quotes), i.e. a space and then \1. This replaces the return with a space, and the lowercase letter it found is added back so the lines join up.

Doing the search the other way, ([a-z])^13, helps find broken lines as well as missing endings like periods. Its replacement would be '\1 ' (without the quotes).
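
For anyone working outside Word, the two search-and-replace pairs above can be roughly sketched in Python. This is only an illustration under assumptions: it uses Python's re module, "\n" stands in for Word's ^13, and the function names are made up for the example. As noted above, lyrics and similar deliberately broken lines would be joined too, so review the result.

```python
import re

def join_before_lowercase(text: str) -> str:
    """Hard return followed by a lowercase letter: ^13([a-z]) -> ' \\1'."""
    # Replace the newline with a space and put the captured letter back.
    return re.sub(r"\n([a-z])", r" \1", text)

def join_after_lowercase(text: str) -> str:
    """Lowercase letter followed by a hard return: ([a-z])^13 -> '\\1 '."""
    # Keep the captured letter and replace the newline with a space.
    return re.sub(r"([a-z])\n", r"\1 ", text)

# Hypothetical sample: the first line is broken mid-sentence.
sample = "The rain had stopped and\nthe street was quiet.\nShe opened the door."
print(join_before_lowercase(sample))
```

The second line break in the sample is left alone because it is followed by a capital letter, which is the behavior the Word search relies on.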
I've often found badly formatted books that break paragraphs on commas, semicolons, and other punctuation marks. I use a regex something like this in Notepad++: [^."!?]</p>

This finds paragraphs that end in anything other than a period, a straight double quote mark, an exclamation point, or a question mark. Of course, it has to be modified if the text uses curly quote marks, single quotes, or some other tag (like </span>, for instance) between the end of the text and the </p>.
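
Here is a minimal Python sketch of that kind of modified search, in case it helps. It is not the exact Notepad++ expression: the list of accepted ending characters (straight and curly quotes included), the handling of closing inline tags such as </span>, and the names TERMINATORS, SUSPECT_END and find_suspect_paragraph_ends are all assumptions made for the example.

```python
import re

# Characters accepted as a proper paragraph ending (an assumption; adjust
# to the book): . ! ? straight quotes, and right curly quotes.
TERMINATORS = ".!?\"'\u201d\u2019"

# One character that is not a terminator, not ">" and not whitespace,
# then any closing inline tags (e.g. </span>), then the closing </p>.
SUSPECT_END = re.compile(
    r"[^" + re.escape(TERMINATORS) + r">\s]"
    r"(?:\s*</(?!p>)\w+>)*"
    r"\s*</p>"
)

def find_suspect_paragraph_ends(html: str) -> list:
    """Return the matched tail of each paragraph that looks broken."""
    return [m.group(0) for m in SUSPECT_END.finditer(html)]

# Hypothetical sample markup.
sample = (
    "<p>He paused,</p>\n"
    "<p>and went on.</p>\n"
    "<p>\u201cStop!\u201d</p>\n"
    "<p><span>wait;</span></p>"
)
for hit in find_suspect_paragraph_ends(sample):
    print(hit)
```

On the sample, only the paragraphs ending in a comma and a semicolon are reported. The search only flags candidates, so each hit still needs a look before joining anything.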