|05-17-2009, 08:30 AM||#1|
Join Date: Feb 2008
Device: Sony PRS-650 (PRS+ alpha - thanks Kartu!)
Removing excess carriage returns
I have some old txt files that I'm trying to switch to ebooks.
Many of them have sentences broken by carriage returns.
The text is fine, and in most cases paragraphs are intact, but perhaps one in every five sentences contains a carriage return in the middle, which is mildly annoying when reading on my Cybook.
The common factor is that there's no punctuation before the carriage return. Is there any way to sort this out? I was thinking perhaps I could get Calibre to delete any carriage returns that are not preceded by . ! ? or ." !" ?"
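For anyone comfortable running a script, the rule described above can be sketched as a regular expression. This is a minimal Python sketch, assuming plain-text input that uses straight double quotes; the function name `join_broken_lines` is made up for illustration. It treats a blank line as a genuine paragraph break and never joins across one, and it replaces each removed break with a space so words don't run together.

```python
import re

# A single line break that should be joined: not preceded by . ! ? or
# another line break, not preceded by . ! ? plus a closing double quote,
# and not followed by another line break (i.e. not a blank-line paragraph break).
JOINABLE_BREAK = re.compile(r'(?<![.!?\n])(?<![.!?]")\n(?!\n)')


def join_broken_lines(text: str) -> str:
    """Hypothetical helper: join lines that don't end in sentence punctuation."""
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace('\r\n', '\n').replace('\r', '\n')
    # Drop trailing spaces/tabs so the punctuation is the last character on the line.
    text = re.sub(r'[ \t]+\n', '\n', text)
    # Replace each joinable break with a space.
    return JOINABLE_BREAK.sub(' ', text)
```

To use it on a file, read the whole file into a string, pass it through the function and write the result to a new file, keeping the original as a backup.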
|05-17-2009, 08:54 AM||#2|
Join Date: Aug 2006
The 'common factor' is probably that these misplaced carriage returns are followed by lowercase letters (not necessarily every single time - but mostly).
If you have MS Word or similar, you could try a search, or a search and replace, for ^13[a-z] (with wildcards turned on).
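The same idea in Python, as a rough sketch (the function name `join_before_lowercase` is hypothetical): replace any line break that is directly followed by a lowercase letter with a space. `[a-z]` only matches unaccented ASCII letters, and breaks followed by a capital (names, "I") are left alone, so expect this to catch most of the broken lines, not all of them.

```python
import re


def join_before_lowercase(text: str) -> str:
    """Replace a line break directly followed by a lowercase letter with a space."""
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n first.
    text = text.replace('\r\n', '\n').replace('\r', '\n')
    # The lookahead checks the next letter without consuming it.
    return re.sub(r'\n(?=[a-z])', ' ', text)
```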
|05-17-2009, 09:44 AM||#3|
Sigil & calibre developer
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
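This bit of Python code should work for what you want:

```python
# Assumes real paragraph breaks are stored as "\n\r" and the stray
# mid-sentence breaks are lone "\r" characters; check your file first.
with open('test', 'r+', newline='') as f:  # newline='' leaves \r and \n untouched
    text = f.read()
    text = text.replace('\n\r', '\n')  # protect the real paragraph breaks
    text = text.replace('\r', ' ')     # turn stray carriage returns into spaces
    text = text.replace('\n', '\n\r')  # restore the paragraph breaks
    f.seek(0)
    f.truncate(0)
    f.write(text)
```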
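Those replacements only do the right thing if the file uses the exact line-ending characters they expect, so it may be worth checking what the file actually contains before running them. A small sketch that counts the different line-ending sequences in the raw bytes (the function name `count_line_endings` is made up for illustration; sequences are matched greedily from left to right):

```python
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, LFCR, lone CR and lone LF sequences in raw file bytes."""
    counts = Counter()
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair == b'\r\n':          # Windows-style ending
            counts['CRLF'] += 1
            i += 2
        elif pair == b'\n\r':        # reversed pair
            counts['LFCR'] += 1
            i += 2
        elif data[i:i + 1] == b'\r':  # lone carriage return
            counts['CR'] += 1
            i += 1
        elif data[i:i + 1] == b'\n':  # lone line feed
            counts['LF'] += 1
            i += 1
        else:
            i += 1
    return counts
```

Read the file in binary mode (`'rb'`) and pass its contents in; the counts show which kinds of break the file mixes.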
|05-17-2009, 09:55 AM||#4|
Join Date: Nov 2007
Device: Sony 505 (retired), iPad2, iPhone 3GS & Nexus 7 3G
|05-17-2009, 01:47 PM||#5|
Join Date: Sep 2008
Device: Nokia 770 (fbreader)
Don't forget to add a space! Or you'll be spell checking for days because you stuck two words together at each join.
It's easy to switch all occurrences of multiple spaces to one space, though, if you happen to double up. So first...
Try it on a copy first.
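Collapsing the doubled-up spaces, and keeping a backup before touching the file, could look something like this in Python. It's a sketch: `clean_file` and the `.bak` suffix are hypothetical choices, and it assumes the file is UTF-8 (older .txt files may well be Windows-1252 instead).

```python
import re
import shutil


def collapse_spaces(text: str) -> str:
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r' {2,}', ' ', text)


def clean_file(path: str) -> None:
    """Copy the file to path + '.bak', then collapse doubled spaces in place."""
    shutil.copy2(path, path + '.bak')  # keep the original, just in case
    # newline='' stops Python from changing the file's line endings.
    with open(path, encoding='utf-8', newline='') as f:
        text = f.read()
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(collapse_spaces(text))
```

Note that ` {2,}` also shrinks any run of spaces used for indentation down to a single space.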
m a r
|Thread||Thread Starter||Forum||Replies||Last Post|
|The Prodigal Returns||Prince Hal||Amazon Kindle||2||03-03-2010 03:45 PM|
|PRS-500 Returns!||squeakywheel||Sony Reader||7||02-02-2010 04:27 PM|
|Removing Returns, Preserving Paragraphs||Gideon||Workshop||41||06-19-2009 06:07 AM|
|Forcing carriage returns||KindleHog||Amazon Kindle||3||05-01-2009 02:14 PM|
|iRex returns||imagitronics||iRex||0||01-03-2009 09:56 AM|