Quote:
Originally Posted by Vanguard3000
Hi, all. I've found out through a few other threads how to fix broken sentences left by conversions from PDF to ePub formats. Currently, I'm using:
Find: ([a-z])</p>\s+<p class="calibre2">
Replace: \1_
(The _ being a space)
I was wondering if there was a way to add something to skip over breaks where the first letter of the second line is a capital?
For example, I'd like to find this:
...blahblah</p>
<p class="calibre2">blahblah...
But not this:
...blahblah</p>
<p class="calibre2">Blahblah...
Basically, this would help me a lot while trying to fix things like scripts or screenplays, or books with multi-line chapter titles, such as:
CHAPTER 6: The Plot Thickens
Ottawa
Any help would be much appreciated. Thanks in advance.
|
That's easy to do. Use this:
Find: ([a-z])</p>\s+<p class="calibre2">([a-z])
Replace: \1_\2
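(Replace the underscore with a space.) Make sure case-sensitive matching is turned on, of course; otherwise [a-z] will also match capital letters and the second group won't filter anything out.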
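If you want to try the expression outside your editor first, here is a rough sketch using Python's re module. The function name and the sample strings are made up for illustration, and your editor's regex engine may differ from Python's in small ways, so treat this as a sanity check rather than the definitive behaviour.

```python
import re

# Same expression as above, written for Python's re module.
# No re.IGNORECASE flag is passed, so [a-z] matches lowercase letters only.
BROKEN_PARAGRAPH = re.compile(r'([a-z])</p>\s+<p class="calibre2">([a-z])')


def join_broken_paragraphs(html: str) -> str:
    """Join a paragraph break only when a lowercase letter sits on both sides."""
    # \1 and \2 put back the two letters the groups consumed, with a space between.
    return BROKEN_PARAGRAPH.sub(r"\1 \2", html)


# Hypothetical sample input: one broken sentence, one multi-line chapter title.
sample_broken = 'the plot</p>\n<p class="calibre2">thickens</p>'
sample_title = 'The Plot Thickens</p>\n<p class="calibre2">Ottawa</p>'

print(join_broken_paragraphs(sample_broken))  # the two halves are joined
print(join_broken_paragraphs(sample_title))   # left alone: next line starts with a capital
```

The second group is what does the skipping: a break is only removed when the next paragraph starts with a lowercase letter, and that letter is written back by \2 so nothing is lost.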
Note that you may also want to join sentences ending in commas, colons, and so on. That is why some of the other expressions in threads here are more complex than just looking for paragraphs ending with [a-z].
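As one possible way to cover those cases, the first character class can be widened to include some punctuation. This is only a sketch: the choice of comma, semicolon, and colon is my assumption, not taken from the other threads, and the function name and samples are hypothetical.

```python
import re

# Assumed extension: a paragraph ending in a lowercase letter, comma,
# semicolon, or colon is treated as a broken sentence, but only if the
# next paragraph starts with a lowercase letter.
BROKEN_WITH_PUNCTUATION = re.compile(r'([a-z,;:])</p>\s+<p class="calibre2">([a-z])')


def join_with_punctuation(html: str) -> str:
    """Join breaks after a lowercase letter or selected punctuation."""
    return BROKEN_WITH_PUNCTUATION.sub(r"\1 \2", html)


# Hypothetical samples.
after_comma = 'he said,</p>\n<p class="calibre2">and left</p>'
after_period = 'The end.</p>\n<p class="calibre2">next</p>'

print(join_with_punctuation(after_comma))   # joined after the comma
print(join_with_punctuation(after_period))  # unchanged: a period is not in the class
```

Leaving the period out of the class is deliberate here, since a paragraph that ends in a full stop is usually a real paragraph end.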