Old 06-22-2011, 11:35 PM   #1
remltr
Closing up line endings that occur in the middle of a sentence

I have been converting a PDF book series, and the only thing left to clean up properly, without going through it line by line in Sigil, is line endings that fall in the middle of a sentence. These lines end without punctuation (other than a hyphen) and are usually caused by a page break in the PDF. I am looking for a regular expression that finds them.

An example of this would be:

The line ends here

but there was a page break or something else that caused the sentence to be split.


What I need is an expression that skips lines ending in punctuation that marks a natural line ending, or at least looks like one (hyphens excepted, of course), together with a replacement that closes the line up with a single word space.

Any ideas?

Last edited by remltr; 06-22-2011 at 11:42 PM.
Old 06-23-2011, 01:22 AM   #2
ldolse
Try reading the sticky at the top of this sub-forum - it covers this and many other points.

PDF conversion already does this for you, but there is a setting called the line unwrap factor in the PDF conversion options. For some books the default unwrap factor isn't aggressive enough; just reduce the number a bit.
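
For anyone running the conversion from a script rather than the GUI, here is a minimal sketch of passing a lower unwrap factor to calibre's `ebook-convert` command via its `--unwrap-factor` PDF input option. The helper name `build_convert_command`, the file names, and the value 0.40 are illustrative assumptions, not anything from this thread. The 0.45 default is what I believe the calibre documentation lists, so check `ebook-convert --help` for your version.

```python
def build_convert_command(pdf_path, out_path, unwrap_factor=0.45):
    """Build an ebook-convert command line with a custom line unwrap factor.

    build_convert_command is a hypothetical helper for illustration only.
    The unwrap factor is a decimal between 0 and 1; lower values make
    calibre unwrap more lines.
    """
    if not 0 <= unwrap_factor <= 1:
        raise ValueError("unwrap_factor must be between 0 and 1")
    return [
        "ebook-convert",
        pdf_path,
        out_path,
        "--unwrap-factor",
        str(unwrap_factor),
    ]


# Example: slightly more aggressive than the assumed default of 0.45.
cmd = build_convert_command("book.pdf", "book.epub", unwrap_factor=0.40)
print(" ".join(cmd))

# To actually run it (requires calibre to be installed and on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Lowering the value in small steps and re-converting is probably the safest approach, since too low a value can start joining lines that should stay separate.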
Old 06-23-2011, 01:25 AM   #3
theducks
Set the line unwrap factor to a slightly lower number and try again.

I cheat and just fix problems like that in Sigil, using a regex search in Code view.

Find (set Case sensitive):
Code:
([a-z])</p>\s+<p[^>]*>
This matches a lowercase letter just before a closing P tag, followed by whitespace (newlines included) and an opening P tag, with or without attributes.
It does not match when the paragraph ends with a closing quote mark or a closing SPAN or DIV tag.

Replace:
Code:
\1 
There is a trailing space after \1.

It's not perfect; you need to tune it to what you see in your Code view.
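
If you would rather run the same idea over the HTML outside Sigil, the search and replace above can be sketched in Python with the standard `re` module. This is only a rough sketch: the function name `join_split_paragraphs` and the sample markup are made up for illustration, and the hyphen rule (dropping the hyphen and joining the two word halves) is my own assumption about what the original poster wants. It will wrongly merge genuine compounds such as "well-known" if they happen to break at a paragraph end.

```python
import re

# Lowercase letter, closing </p>, whitespace, then an opening <p> tag
# with optional attributes. \b keeps tags such as <pre> from matching.
SPLIT_PARAGRAPH = re.compile(r'([a-z])</p>\s+<p\b[^>]*>')

# Assumed extra rule: a word hyphenated across the break, where the
# next paragraph continues with a lowercase letter.
SPLIT_HYPHEN = re.compile(r'([a-z])-</p>\s+<p\b[^>]*>(?=[a-z])')


def join_split_paragraphs(html):
    """Merge paragraphs that were split in the middle of a sentence."""
    # Rejoin hyphenated words first, dropping the hyphen.
    html = SPLIT_HYPHEN.sub(r'\1', html)
    # Then close up the remaining mid-sentence breaks with one space.
    return SPLIT_PARAGRAPH.sub(r'\1 ', html)


sample = (
    '<p>The line ends here</p>\n\n'
    '<p class="calibre1">but there was a page break.</p>\n'
    '<p>A new sentence starts here.</p>'
)
print(join_split_paragraphs(sample))
```

As with the Sigil version, paragraphs that end in a closing quote mark or a closing SPAN or DIV tag are left alone, so expect to adjust the patterns to match your own markup.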
All times are GMT -4.