|06-22-2011, 11:35 PM||#1|
Join Date: Jun 2011
Closing up line endings that occur in the middle of a sentence
I have been converting a PDF book series, and the only cleanup left, short of going through it line by line in Sigil, is closing up line endings that fall in the middle of a sentence. I am looking for an expression that finds line endings with no punctuation before them (hyphens excepted); these are usually caused by a page break in the PDF.
An example of this would be:
The line ends here
but there was a page break or something else that caused the sentence to be split.
What I need is an expression that skips any line ending preceded by punctuation that marks a natural, or at least natural-looking, end of line (excepting hyphens, of course), plus a replacement that closes the line up with a word space.
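For plain text, the idea can be sketched in Python roughly as follows. This is a minimal sketch and not a Sigil or calibre feature: the pattern name `MID_SENTENCE_BREAK` and the function `close_up_lines` are made up for illustration, and the rule "a line that ends in a letter, digit, or comma is mid-sentence" is an assumption. Lines ending in a hyphen are deliberately left alone, since whether to keep the hyphen is a judgment call, and blank lines between paragraphs are preserved.

```python
import re

# Assumption: a break is "mid-sentence" when the last character before it is
# a letter, digit, or comma, and the next line starts with non-whitespace.
# Hyphen-terminated lines are not matched and stay as they are.
MID_SENTENCE_BREAK = re.compile(r"(?<=[A-Za-z0-9,])[ \t]*\r?\n[ \t]*(?=\S)")

def close_up_lines(text: str) -> str:
    """Replace each mid-sentence line break with a single word space."""
    return MID_SENTENCE_BREAK.sub(" ", text)

if __name__ == "__main__":
    sample = (
        "The line ends here\n"
        "but there was a page break.\n"
        "\n"
        "A new paragraph starts here.\n"
    )
    print(close_up_lines(sample))
```

Headings and other short lines without closing punctuation would also be joined to the following line, so the character class needs tuning per book.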
Last edited by remltr; 06-22-2011 at 11:42 PM.
|06-23-2011, 01:22 AM||#2|
Join Date: Apr 2009
Device: PRS-650, iPhone
Try reading the sticky at the top of this sub-forum; it covers this and many other points.
PDF conversion already does this for you, controlled by a setting called the line unwrap factor in the PDF conversion options. For some books the unwrap factor isn't aggressive enough; in that case just reduce the number a bit.
|06-23-2011, 01:25 AM||#3|
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
I cheat and just fix problems like that in Sigil, with a regex in Code view.
It matches a lowercase letter just before a closing P tag, followed by whitespace (including newlines) and an opening P tag.
It does not work with closing quote marks or closing SPAN or DIV tags.
It's not perfect; you need to tune it to what you see in your Code view.
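The expression itself did not survive in this post, so here is a rough Python reconstruction from the description only. The pattern `PARA_BREAK` and the function `join_split_paragraphs` are assumptions for illustration, not the poster's actual Sigil regex, and Sigil's regex flavor may differ in details.

```python
import re

# Assumed pattern: a lowercase letter, a closing </p>, any whitespace
# (newlines included), then an opening <p> with optional attributes.
PARA_BREAK = re.compile(r"([a-z])</p>\s*<p(?:\s[^>]*)?>")

def join_split_paragraphs(html: str) -> str:
    """Merge a paragraph ending in a lowercase letter with the next one."""
    # Keep the captured letter and put a single word space where the tags were.
    return PARA_BREAK.sub(r"\1 ", html)

if __name__ == "__main__":
    sample = (
        "<p>The line ends here</p>\n"
        "<p>but there was a page break.</p>\n"
        "<p>Next paragraph.</p>"
    )
    print(join_split_paragraphs(sample))
```

As noted above, paragraphs that end in a closing quote mark or in a closing span or div tag are not matched, so the pattern has to be tuned to the markup actually in the file.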