|05-19-2014, 07:56 PM||#1|
Lovin' the e-life!!!
Join Date: Jul 2009
Location: Pacific Northwet
Device: iPad Mini and Samsung Galaxy Tab 2
cleaning up extra returns formatting mess
I know this is a frequently discussed problem: extra returns, and returns in the middle of paragraphs.
I'm a speed reader and choppy pages are a pet peeve!
I searched and found several good suggestions for cleaning these up (e.g. using Find & Replace in Word, with ^p to find paragraph returns).
This works fantastically in most cases; however, what "replace code" can I use when the paragraph return symbol is a little crooked down arrow and not the normal paragraph symbol used in Word?
I converted an .lrf to an .rtf file in Calibre, and all the returns, including the extra ones, use the arrow symbol. I tried cut&paste, but it doesn't register.
Is there a secret code for the arrow, like ^p for paragraph?
|05-19-2014, 11:57 PM||#2|
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Why not try the Edit Book feature in calibre? Once you get the hang of it, you will never look back. It is infinitely more controllable.
|05-20-2014, 01:58 AM||#3|
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
|05-20-2014, 02:44 AM||#4|
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
You are not looking for MSWord, you are looking for "Regular Expressions". Word has only limited abilities compared to other tools, like Calibre. With regular expressions you can say:
Begin group one '\('
find one character that is not from this list: [.?!"']
End of group one '\)'
followed by an end-of-line-symbol \n
Replace with the contents of the group one followed by a space. '\1 '
In regular expression syntax that is something like
Search: \([^.?!"']\)\n
Replace: '\1 ' (without the quotes; note the trailing space)
Unfortunately there are several dialects of RE, so you will have to look it up in the documentation. For example, "begin a group that I will later refer to as '\1'" (or '\2' and so on if it is a second or third group) is sometimes '\(' and sometimes just '('.
You see, most of the line breaks that do not follow one of [.?!"'] are not at the end of a paragraph.
This is very quick and dirty, but it can clean an OCRed book of unwanted line breaks with 99% accuracy.
Regular expressions can look very intimidating if you just look at a complex one, but they are well worth learning. Calibre and many other advanced tools support them, and you can start with very simple ones and gradually write more and more complex REs. They will still be relatively difficult to read, because the metacharacter set is very dense so that they can fit inside "search" and "replace" fields, but they become much easier to write after a bit of practice.
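To make the recipe above concrete, here is a minimal sketch in Python, whose dialect uses plain '(' and ')' for groups. The function name and the sample text are only illustrative, and the extra \n inside the character class is an addition that is not part of the recipe: it keeps blank lines between paragraphs from being collapsed. Treat it as a starting point to adapt, not a finished cleaner.

```python
import re

# Group 1: one character that is NOT sentence-ending punctuation or a quote.
# The "\n" inside the class is an assumption added here so that blank lines
# (a newline directly followed by another newline) are left untouched.
BROKEN_LINE = re.compile(r"([^.?!\"'\n])\n")


def join_broken_lines(text: str) -> str:
    """Replace line breaks that fall mid-sentence with a single space."""
    # r"\1 " puts back the captured character, followed by a space.
    return BROKEN_LINE.sub(r"\1 ", text)


sample = (
    "It was a dark and\n"
    "stormy night.\n"
    "\"Who goes\n"
    "there?\"\n"
)
cleaned = join_broken_lines(sample)
print(cleaned)
```

Line breaks after "and" and "goes" are joined, while the ones after the full stop and the closing quote are kept. A paragraph that genuinely ends without punctuation (a heading, for instance) would be joined to the next line too, which is why the approach is quick and dirty.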
|05-20-2014, 08:23 AM||#6|
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-300, PRS-T1
@momtodogs: If you want to clean up the OCR mess in Word documents, I can advise you to look at my add-in (see signature). It will catch a lot of OCR mistakes and either repair them automatically or let you fix them manually. It has various steps; one of them is a large list of S&R requests (an example list is available). It also handles things like smartening punctuation and missing dialogue marks (and many other things). As a bonus, you can even export the result to ePUB.