Old 05-17-2009, 07:30 AM   #1
Halk
Fanatic
Posts: 513
Karma: 469999
Join Date: Feb 2008
Location: Scotland
Device: Sony PRS-650 (PRS+ alpha - thanks Kartu!)
Removing excess carriage returns

I have some old txt files that I'm trying to switch to ebooks.

Many of them have sentences broken by carriage returns.
E.g.
"The sentence is fine, and in most cases paragraphs are intact, but perhaps one
in every five sentences contains a carriage return in the middle, which is mildly annoying when reading on my Cybook."

The common factor is that there's no punctuation before the carriage return. Is there any way to sort this out? I was thinking perhaps I could get Calibre to delete any carriage returns that are not preceded by .!? or ." !" ?"
Old 05-17-2009, 07:54 AM   #2
comtrjl
Connoisseur
Posts: 75
Karma: 204999
Join Date: Aug 2006
Location: London
The 'common factor' is probably that these misplaced carriage returns are followed by lowercase letters (not necessarily every single time - but mostly).
If you have MS Word or similar, you could try doing a search, or search and replace, for ^13[a-z] (in Word this needs "Use wildcards" turned on).
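If you'd rather not use Word, the same "line break followed by a lowercase letter" test can be sketched in Python. This is only an illustration: the function name join_before_lowercase is made up for the example, and it assumes the text is already loaded as a string.

```python
import re

def join_before_lowercase(text):
    # Replace a line break (CRLF or bare LF) with a single space, but only
    # when the next character is a lowercase ASCII letter.
    return re.sub(r'\r?\n(?=[a-z])', ' ', text)

sample = "perhaps one\nin every five sentences.\nThe next sentence."
print(join_before_lowercase(sample))
```

Breaks followed by a capital letter, a quote mark or a digit are left alone, so some mid-sentence breaks (before a name, say) will survive.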
bob
Old 05-17-2009, 08:44 AM   #3
user_none
Sigil & calibre developer
Posts: 2,438
Karma: 950001
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
This bit of Python code should work for what you want:

Code:
>>> f = open('test', 'rb+')  # read and write the same file, as raw bytes
>>> text = f.read()
>>> text = text.replace(b'\r\n', b'\n')  # hide the Windows newlines
>>> text = text.replace(b'\r', b' ')     # any \r left is a bare one
>>> text = text.replace(b'\n', b'\r\n')  # put the Windows newlines back
>>> f.seek(0)
>>> f.truncate(0)
>>> f.write(text)
>>> f.close()
\r\n is the standard newline indicator on Windows systems. We replace it with \n, which is used on Unix systems. This is so we can replace all single occurrences of \r with a single space. Then we put the \n's back to \r\n's. Note that this only joins lines that are broken by a bare \r.
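To see what those three replacements do without touching a file, here is a small sketch on an in-memory byte string. The function name replace_bare_cr is hypothetical, and it assumes Windows line endings are the byte pair \r\n.

```python
def replace_bare_cr(data: bytes) -> bytes:
    data = data.replace(b'\r\n', b'\n')   # protect Windows line endings
    data = data.replace(b'\r', b' ')      # any CR still present is a bare one
    return data.replace(b'\n', b'\r\n')   # restore Windows line endings

print(replace_bare_cr(b'one\rtwo\r\nthree'))
```

Only the bare \r between "one" and "two" becomes a space; the \r\n before "three" is untouched. A side effect is that any line that ended in a plain \n comes out ending in \r\n.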
Old 05-17-2009, 08:55 AM   #4
gwynevans
Wizard
Posts: 1,343
Karma: 1065246
Join Date: Nov 2007
Location: UK
Device: Sony 505 (retired), iPad2, iPhone 3GS & Nexus 7 3G
Quote:
Originally Posted by Halk View Post
The common factor is that there's no punctuation before the carriage return. Is there any way to sort this out? I was thinking perhaps I could get Calibre to delete any carriage returns that are not preceded by .!? or ." !" ?"
Frankly, a decent text editor is all you'd need for this - something like UltraEdit or TextPad or similar that can handle regular expression replacements. In UE, it'd be something like Replace "\r\n([a-z])" with " \1".
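For comparison, the rule in the quoted question (drop a line break unless it follows . ! ? or one of those plus a closing double quote) might look like this in Python. It's a rough sketch, not something any particular editor does: the names are invented for the example, and keeping blank lines as paragraph breaks is an assumption added here.

```python
import re

# Match a line break that is NOT preceded by . ! ? (optionally followed by a
# closing double quote) and is not part of a blank line.
_SOFT_BREAK = re.compile(r'(?<![.!?\r\n])(?<![.!?]")\r?\n(?!\r?\n)')

def join_unpunctuated(text):
    # Each matching break is replaced by a single space.
    return _SOFT_BREAK.sub(' ', text)

print(join_unpunctuated("perhaps one\nin every five.\nNew sentence"))
```

Lines that end in a comma, a colon or a plain word get joined to the next line; lines that end a sentence stay as they are.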
Old 05-17-2009, 12:47 PM   #5
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
Don't forget to add a space! Or you'll be spell checking for days because you stuck two words together at each join.

It's easy to switch all occurrences of multiple spaces to one space, though, if you happen to double up. So first...

Find:
Code:
\r\n([a-z])
Replace:
Code:
\s$1
which will concatenate your lines, then...

Find:
Code:
([a-z])\s+
Replace:
Code:
$1\s
The second regex above should preserve punctuation that has two (or more) spaces following it. It also won't catch a doubled space after a hyphen, colon, semi-colon or any other character that isn't a lower-case letter, should one somehow be at the end of a line that's joined.

Try it on a copy first.
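If you want to test the two passes outside an editor, here is a minimal Python sketch of the same idea. The function name is made up, and two things differ from the editor syntax above: the replacement uses a literal space and \1 because Python's re does not treat \s as a space in a replacement, and the second pass matches runs of plain spaces rather than \s+ so that the remaining line breaks are left alone.

```python
import re

def join_then_collapse(text):
    # Pass 1: a CRLF followed by a lowercase letter becomes space + letter.
    text = re.sub(r'\r\n([a-z])', r' \1', text)
    # Pass 2: a lowercase letter followed by one or more spaces becomes
    # that letter plus exactly one space.
    return re.sub(r'([a-z]) +', r'\1 ', text)

print(repr(join_then_collapse("one \r\nin every.  Two\r\nThree")))
```

The doubled space created at the join is collapsed, while the two spaces after the full stop and the break before "Three" are kept.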

m a r
Old 05-17-2009, 02:35 PM   #6
Halk
Thanks folks!