MobileRead Forums > E-Book Software > Calibre > Conversion
06-22-2011, 10:35 PM   #1
remltr
Junior Member
Posts: 7
Karma: 10
Join Date: Jun 2011
Device: Kindle
Closing up line endings that occur in the middle of a sentence

I have been converting a PDF book series, and the only cleanup left (short of going through it line by line in Sigil) is an expression that finds line endings in the middle of a sentence. Those lines end without punctuation, or with only a hyphen, and are usually caused by a page break in the PDF.

An example of this would be:

The line ends here

but there was a page break or something else that caused the sentence to be split.


What I'd like is an expression that skips lines ending in punctuation that is (or at least looks like) a natural line ending, hyphens excepted, and replaces the remaining line breaks with a word space to close the lines up.

Any ideas?

Last edited by remltr; 06-22-2011 at 10:42 PM.
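A minimal Python sketch of the kind of expression the question asks for, applied to plain text rather than Sigil's HTML. The rule is an assumption for illustration: a break (plus any blank lines) counts as mid-sentence when the character before it is a letter, digit, comma or hyphen and the next non-blank line starts with a lowercase letter. A trailing hyphen stays attached to the next word; any other break becomes one space. `close_up_breaks` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Assumed rule: the break (and any blank lines after it) is mid-sentence when
# the character before it is a letter, digit, comma or hyphen and the next
# non-blank line starts with a lowercase letter.
MID_SENTENCE_BREAK = re.compile(r"([A-Za-z0-9,-])[ \t]*\n(?:[ \t]*\n)*[ \t]*(?=[a-z])")


def close_up_breaks(text: str) -> str:
    """Hypothetical helper: join lines that were split mid-sentence."""

    def join(match: re.Match) -> str:
        last = match.group(1)
        # Keep a trailing hyphen attached to the next word; otherwise use one space.
        return last if last == "-" else last + " "

    return MID_SENTENCE_BREAK.sub(join, text)


sample = "The line ends here\n\nbut there was a page break.\n\nA new paragraph."
print(close_up_breaks(sample))
```

Requiring a lowercase start on the next line is a deliberate trade-off: it leaves headings and new paragraphs alone, but it will miss breaks that fall before a proper noun or "I".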
06-23-2011, 12:22 AM   #2
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Try reading the sticky at the top of this sub-forum; it covers this and many other points.

PDF conversion already does this for you. There is a setting called the line unwrap factor in the PDF conversion options; for some books the default isn't aggressive enough, so just reduce the number a bit.
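Why a lower number unwraps more aggressively can be illustrated with a toy model in Python. The rule below is an assumption for illustration, not calibre's actual algorithm: take the distinct line lengths in sorted order, pick the one `factor` of the way through the list as a threshold, and join any line at least that long that does not end in sentence punctuation. `threshold` and `unwrap` are hypothetical names.

```python
from __future__ import annotations

# Toy model only: an assumed percentile rule, not calibre's implementation.
SENTENCE_END = (".", "!", "?", '"', ":")


def threshold(lines: list[str], factor: float) -> int:
    """Pick the length `factor` of the way through the sorted distinct line lengths."""
    lengths = sorted({len(line) for line in lines})
    index = min(int(len(lengths) * factor), len(lengths) - 1)
    return lengths[index]


def unwrap(lines: list[str], factor: float) -> list[str]:
    """Join a line to the next when it is at least the threshold length
    and does not end in sentence punctuation (hypothetical helper)."""
    if not lines:
        return []
    limit = threshold(lines, factor)
    out: list[str] = []
    previous = None
    for line in lines:
        wrapped = (
            previous is not None
            and len(previous) >= limit
            and not previous.endswith(SENTENCE_END)
        )
        if wrapped:
            out[-1] += " " + line
        else:
            out.append(line)
        previous = line
    return out


page = ["aaaa bbbb cccc", "dddd eeee", "ffff."]
print(unwrap(page, 0.9))  # → ['aaaa bbbb cccc dddd eeee', 'ffff.']
print(unwrap(page, 0.3))  # → ['aaaa bbbb cccc dddd eeee ffff.']
```

With the high factor only the 14-character line reaches the threshold, so the 9-character line is left alone; lowering the factor drops the threshold to 5 characters and that line is joined too.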
06-23-2011, 12:25 AM   #3
theducks
Well trained by Cats
Posts: 29,791
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Set the line unwrap factor to a slightly lower number and try again.

I cheat and just fix problems like that in Sigil, using a regex in Code view:
Code:
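([a-z])</p>\s+<p[^>]*>
(Set the search to case sensitive.)
This matches a lowercase letter just before a closing </p> tag, followed by whitespace (newlines included) and an opening <p> tag with any attributes. The [^>]* stops at the end of the opening tag, so the match cannot run on into the next paragraph's text.
It does not catch breaks after closing quote marks or closing span or div tags.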

Replace with:
Code:
\1 
(There is a trailing space after \1.)

It's not perfect; you need to tune it to what you see in your Code view.
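The same find/replace can be tried outside Sigil with Python's `re` module, which treats a simple pattern like this the same way (an assumption worth checking against Sigil's own engine for anything fancier). This sketch writes the opening tag as `<p[^>]*>`, which cannot run past the first `>`; for contrast it also runs a greedy `<p.+>`, which can run on to the last `>` on the next line and delete that paragraph's text. The sample markup and the `close_up` name are made up for illustration.

```python
from __future__ import annotations

import re

# [^>]* stops at the end of the opening tag.
SAFE = re.compile(r"([a-z])</p>\s+<p[^>]*>")
# For contrast: greedy .+ runs to the LAST ">" on the line (. skips newlines).
GREEDY = re.compile(r"([a-z])</p>\s+<p.+>")

sample = (
    '<p class="calibre1">The line ends here</p>\n'
    '<p class="calibre1">but there was a page break.</p>\n'
    '<p class="calibre1">A new paragraph.</p>'
)


def close_up(html: str, pattern: re.Pattern = SAFE) -> str:
    # Python's re is case sensitive by default, like the Sigil setting above.
    # r"\1 " puts back the captured lowercase letter, then the trailing space.
    return pattern.sub(r"\1 ", html)


print(close_up(sample))
print(close_up(sample, GREEDY))  # "but there was a page break." is gone
```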


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.