Quote:
Originally Posted by remltr
I have been converting a pdf book series and the only thing left to do to clean it up properly, without using Sigil line by line, would be an expression that would find the line endings that are in the middle of a sentence, thus not having punctuation, except for hyphens, usually caused by a page break in the pdf.
An example of this would be:
The line ends here
but there was a page break or something else that caused the sentence to be split.
I'd like an expression that skips lines ending in punctuation that is a natural line ending, or at least looks natural (excepting hyphens of course), and then a replacement with a word space that closes the line up.
Any ideas?
Set the line UnWrap factor to a slightly lower number and try again.
I cheat and just fix problems like that in Sigil.

REGEX in Code view
Code:
([a-z])</p>\s+<p[^>]*>
(set Case sensitive)
Matches a lowercase letter just before a closing P tag, followed by whitespace (newlines included) and an opening P tag.
Does not work when the paragraph ends with a closing quote mark or a closing SPAN or DIV tag.
The replacement has a trailing space, so the joined words stay separated.
Not perfect; you need to tune it to what you see in your Code view.
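
If you'd rather test the idea outside Sigil first, here is a rough Python sketch of the same search and replace. It is only an illustration: the function name, the sample markup and the "calibre1" class are made up, the replacement (\1 plus a space) is my assumption about what goes in the Replace field, and the opening tag is matched with [^>]* so the match stops at the end of that tag.

```python
import re

# Lowercase letter right before </p>, then whitespace, then an opening <p ...> tag.
# [^>]* keeps the match inside the opening tag (an assumption for this sketch).
SPLIT_PARAGRAPH = re.compile(r'([a-z])</p>\s+<p[^>]*>')


def join_split_paragraphs(html: str) -> str:
    """Join paragraphs that were split mid-sentence (hypothetical helper)."""
    # Keep the captured letter and add one space in place of the tag pair.
    return SPLIT_PARAGRAPH.sub(r'\1 ', html)


# Made-up sample; real converted books will have different classes.
sample = (
    '<p class="calibre1">The line ends here</p>\n'
    '<p class="calibre1">but there was a page break.</p>\n'
    '<p class="calibre1">A new paragraph.</p>'
)

print(join_split_paragraphs(sample))
```

The first two paragraphs get joined because the first ends in a lowercase letter; the second ends in a full stop, so it is left alone. Python's re is case sensitive by default, which matches the "set Case sensitive" setting above.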