#1 |
Fanatic
Posts: 538
Karma: 469999
Join Date: Feb 2008
Location: Scotland
Device: Sony PRS-650 (PRS+ alpha - thanks Kartu!)
|
Removing excess carriage returns
I have some old txt files that I'm trying to switch to ebooks.
Many of them have sentences broken by carriage returns. E.g. "The sentence is fine, and in most cases paragraphs are intact, but perhaps one in every five sentences contains a carriage return in the middle, which is mildly annoying when reading on my Cybook." The common factor is that there's no punctuation before the carriage return. Is there any way to sort this out? I was thinking perhaps I could get Calibre to delete any carriage returns that are not preceded by . ! ? or ." !" ?"
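The rule described above (drop a line break unless the text before it ends in . ! or ?, optionally followed by a closing double quote) could be sketched in Python roughly like this. It is only an illustration and not a Calibre feature: the name `join_broken_lines` is made up, and it assumes that blank lines separate paragraphs and should be left alone.

```python
import re

# Hypothetical helper for illustration only.
# A line "ends a sentence" if it finishes with . ! or ?,
# optionally followed by a closing double quote and trailing whitespace.
ENDS_SENTENCE = re.compile(r'[.!?]"?\s*$')

def join_broken_lines(text):
    out = []
    # splitlines() copes with \r\n, \n and bare \r line breaks.
    for line in text.splitlines():
        previous_is_open = (
            out
            and out[-1].strip()                    # previous line is not blank
            and not ENDS_SENTENCE.search(out[-1])  # and does not end a sentence
        )
        if previous_is_open and line.strip():
            # The break fell mid-sentence: glue this line onto the previous one.
            out[-1] = out[-1].rstrip() + ' ' + line.lstrip()
        else:
            out.append(line)
    # Lines are written back with plain \n endings.
    return '\n'.join(out)
```

Bear in mind that chapter titles and other lines that legitimately end without punctuation would be joined to the following line as well, so the result still needs a read-through.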
#2 |
Connoisseur
Posts: 75
Karma: 204999
Join Date: Aug 2006
Location: London
|
The 'common factor' is probably that these misplaced carriage returns are followed by lowercase letters (not necessarily every single time - but mostly).
If you have MSWord or similar, you could try doing a search, or search and replace, for ^13[a-z].
bob
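Before replacing anything, the lowercase heuristic can be checked by listing the candidate breaks and eyeballing them. A rough Python sketch of that search step follows; `find_lowercase_breaks` is an invented name, and only ASCII a-z counts as lowercase here, mirroring the `[a-z]` in the search above.

```python
import re

# A line break (CRLF, bare LF or bare CR) followed directly by a-z.
CANDIDATE = re.compile(r'(?:\r\n|\n|\r)(?=[a-z])')

def find_lowercase_breaks(text, context=15):
    """Return a short snippet around each suspicious break for manual review."""
    snippets = []
    for m in CANDIDATE.finditer(text):
        before = text[max(0, m.start() - context):m.start()]
        after = text[m.end():m.end() + context]
        # ' | ' marks where the line break sits.
        snippets.append(before + ' | ' + after)
    return snippets
```

If most of the snippets look like genuine mid-sentence breaks, the same pattern can then be used for the replace.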
#3 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
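This bit of Python code should work for what you want:
Code:
>>> # Assumes real line endings are '\r\n' and the unwanted breaks are bare '\r'.
>>> f = open('test', 'rb+')                  # read and write, binary mode
>>> text = f.read()
>>> text = text.replace(b'\r\n', b'\n')      # protect the real line endings
>>> text = text.replace(b'\r', b' ')         # stray carriage returns become spaces
>>> text = text.replace(b'\n', b'\r\n')      # restore the real line endings
>>> f.seek(0)
>>> f.truncate(0)
>>> f.write(text)
>>> f.close()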
#4 |
Wizzard
Posts: 1,402
Karma: 2000000
Join Date: Nov 2007
Location: UK
Device: iPad 2, iPhone 6s, Kindle Voyage & Kindle PaperWhite
|
Frankly, a decent text editor is all you'd need for this - something like UltraEdit or TextPad or similar that can handle regular expression replacements. In UE, it'd be something like Replace "\r\n([a-z])" with " \1".
#5 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
Don't forget to add a space! Or you'll be spell checking for days because you stuck two words together at each join.
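It's easy to switch all occurrences of multiple spaces to one space, though, if you happen to double up.
So first...
Find:
Code:
\r\n([a-z])
Replace with (one space, then the captured letter; quoted so the space is visible):
Code:
" $1"
Then...
Find:
Code:
([a-z]) +
Replace with (the captured letter, then one space):
Code:
"$1 "
Try it on a copy first.
m a r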
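For anyone without a regex-capable editor, the same two passes can be sketched with Python's `re` module. A few assumptions apply: the file uses `\r\n` line endings, Python writes the backreference as `\1` rather than `$1`, and the replacement uses a literal space because `\s` is a search-side class. `rejoin` is just an illustrative name.

```python
import re

def rejoin(text):
    # Pass 1: a CRLF followed by a lowercase letter becomes
    # a single space plus that letter.
    text = re.sub(r'\r\n([a-z])', r' \1', text)
    # Pass 2: collapse any run of two or more spaces into one,
    # in case the line already ended with a space.
    text = re.sub(r' {2,}', ' ', text)
    return text
```

As above, run it on a copy of the text first.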
#6 |
Fanatic
Posts: 538
Karma: 469999
Join Date: Feb 2008
Location: Scotland
Device: Sony PRS-650 (PRS+ alpha - thanks Kartu!)
|
Thanks folks!