MobileRead Forums - View Single Post

kiwidude · 01-06-2011, 02:44 AM

Quote:

Originally Posted by Vanguard3000

Hi, all. I've found out through a few other threads how to fix broken sentences left by conversions from PDF to ePub formats. Currently, I'm using:

Find: ([a-z])\s+
Replace: \1_

(The _ being a space)

I was wondering if there was a way to add something to skip over breaks where the first letter of the second line is a capital?

For example, I'd like to find this:

...blahblah

blahblah...

But not this:

...blahblah

Blahblah...

Basically, this would help me a lot while trying to fix things like scripts or screenplays, or books with multi-line chapter titles, such as:

CHAPTER 6: The Plot Thickens
Ottawa

Any help would be much appreciated. Thanks in advance.

No problem. Try this:
Find: ([a-z])\s+([a-z])
Replace: \1_\2
(Replace the underscore with a space.) Make sure case-sensitive matching is turned on, of course; otherwise [a-z] will also match capitals and the check does nothing.
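
If you want to try the expression outside your editor first, here is a minimal sketch in Python. It assumes Python's standard re module, which is case-sensitive by default; the function name join_broken_lines and the sample strings are just made up for illustration.

```python
import re

# The Find expression from above: a lowercase letter, a run of whitespace
# (including line breaks), then another lowercase letter.
BROKEN_LINE = re.compile(r"([a-z])\s+([a-z])")


def join_broken_lines(text: str) -> str:
    """Join a line break only when both sides of it are lowercase letters."""
    # r"\1 \2" is the Replace expression, with the underscore written as a space.
    return BROKEN_LINE.sub(r"\1 \2", text)


if __name__ == "__main__":
    # Lowercase on both sides of the break: the two lines are joined.
    print(join_broken_lines("...blahblah\n\nblahblah..."))
    # Second line starts with a capital: left alone.
    print(join_broken_lines("...blahblah\n\nBlahblah..."))
    # Multi-line chapter title: left alone, since "Ottawa" starts with a capital.
    print(join_broken_lines("CHAPTER 6: The Plot Thickens\nOttawa"))
```

One side effect to be aware of: because \s+ also matches ordinary spaces, a run of several spaces between two lowercase letters gets collapsed to a single space.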

Note that you may also want to join lines that end in commas, colons, etc. That is why some of the other expressions in threads here are more complex than just looking for paragraphs ending with [a-z].
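
As a rough illustration of that kind of extension, here is a sketch in Python that also treats a comma, semicolon, or colon as a valid end for the first line. The character class [a-z,;:] is my own guess at a reasonable starting point, not one of the expressions from those other threads, and join_after_punctuation is a made-up name.

```python
import re

# Assumption: the first line may end in a lowercase letter OR one of , ; :
# The second line must still start with a lowercase letter.
BROKEN_LINE_PUNCT = re.compile(r"([a-z,;:])\s+([a-z])")


def join_after_punctuation(text: str) -> str:
    """Join breaks that follow a lowercase letter, comma, semicolon, or colon."""
    return BROKEN_LINE_PUNCT.sub(r"\1 \2", text)


if __name__ == "__main__":
    # Break after a comma, next line lowercase: joined.
    print(join_after_punctuation("He paused,\nthen spoke."))
    # Break after a full stop, next line capitalised: left alone.
    print(join_after_punctuation("The end.\nNext chapter"))
```

Full stops are deliberately left out of the class, since a line ending in one is most likely a real paragraph end.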