#1 |
Hmm.
Posts: 124
Karma: 2016606
Join Date: Oct 2015
Device: Android 4.2 Google Play Reader
|
How to make regex to replace 2 spaces between words, with one space?
Sigil 0.8.7 on Windows 8.
When I paste a text file into Sigil, it does a lot of formatting for me, which is great. But I often end up with 2 spaces between words, and in the Sigil preview window these 2 spaces are not compressed into 1 space, which I thought XHTML would do, just like HTML. So I want to replace 2 (or more) spaces between words with a single space. Example, where _ is a space:
Code:
Good__morning_today.
should become:
Code:
Good_morning_today.
while leading indentation like this is left alone:
Code:
_____<p>This is the beginning of a paragraph.</p>
#2 |
Grand Sorcerer
Posts: 28,361
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
First off: Sigil Preview should render multiple space characters as a single space. That's the way all (X)HTML works. If it doesn't, it means that 1) the double spaces are inside a <pre> tag, which indicates that all whitespace is to be preserved; 2) same as #1, but "pre" is assigned through CSS (Adobe products are notorious for this); or 3) the space characters are special non-breaking Unicode characters. A fourth scenario is that spaces are being converted to entities when pasting with formatting. Look in Code View to check.
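For anyone who would rather script the "look in Code View" check, here is a rough Python sketch that counts the four suspects listed above in a chunk of XHTML source. The helper name `find_space_culprits` is made up for illustration and is not part of Sigil; the checks are simple text searches, not a real CSS or XML parse.

```python
import re

# Hypothetical helper (not part of Sigil): count likely reasons why runs of
# spaces are not collapsing when the XHTML is rendered.
def find_space_culprits(xhtml: str) -> dict:
    return {
        # Literal non-breaking space characters (U+00A0).
        "nbsp_chars": xhtml.count("\u00a0"),
        # Spaces written as entities: &nbsp; &#160; &#xA0;
        "space_entities": len(
            re.findall(r"&(?:nbsp|#160|#x0*a0);", xhtml, re.IGNORECASE)
        ),
        # <pre> elements preserve all whitespace.
        "pre_tags": len(re.findall(r"<pre\b", xhtml, re.IGNORECASE)),
        # CSS that preserves whitespace, e.g. white-space: pre or pre-wrap.
        "css_pre": len(re.findall(r"white-space\s*:\s*pre", xhtml, re.IGNORECASE)),
    }

sample = '<p style="white-space: pre-wrap">Good&nbsp;&#160;morning\u00a0 today.</p>'
print(find_space_culprits(sample))
```

If every count comes back zero, the extra spaces are most likely plain spaces and one of the regex approaches in this thread should apply.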
But all that aside ... the Captain Overkill in me would use something like:
Code:
(*UCP)\b[^\S\p{Zl}\p{Zp}\n\r\t]{2,}\b
But that's just me. And even that's still not going to work for situations like:
Code:
Sometimes_punctuation_like_this,__will_screw_things_up.
Code:
(*UCP)(\b|\p{P})[^\S\p{Zl}\p{Zp}\n\r\t]{2,}(\b|\p{P})
Though that may not always achieve the desired result, depending on the text. The bottom line is: don't blindly do a Replace All. Step through each instance and verify the replace.
The expression basically looks for word boundaries (\b, made Unicode-aware by (*UCP)) and for two or more consecutive whitespace characters (not including any newlines, returns, tabs, or Unicode paragraph/line separators) between them.
** And yes ... that's "NOT not whitespace" logic in there.
People who think they don't have to worry about any possible Unicode characters or punctuation issues could probably get away with:
Code:
\b[^\S\n\r\t]{2,}\b
Last edited by DiapDealer; 10-29-2015 at 09:11 AM.
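To try the last, simple pattern outside Sigil, here is a minimal Python sketch. Sigil uses PCRE, while Python's built-in `re` module does not support `(*UCP)` or `\p{...}`, so only the simple variant is shown. The function name `collapse_spaces` is just for illustration.

```python
import re

# The simple pattern from the post: a word boundary, then two or more
# whitespace characters that are not newline/return/tab, then a word boundary.
DOUBLE_SPACE = re.compile(r"\b[^\S\n\r\t]{2,}\b")

def collapse_spaces(text: str) -> str:
    """Replace runs of 2+ spaces between word characters with one space."""
    return DOUBLE_SPACE.sub(" ", text)

# Double space between words is collapsed.
print(collapse_spaces("Good  morning today."))
# Leading indentation is untouched: there is no word boundary around it.
print(repr(collapse_spaces("     <p>This is the beginning of a paragraph.</p>")))
# The punctuation case is NOT fixed, as the post warns: no \b after the comma.
print(collapse_spaces("Sometimes punctuation like this,  will screw things up."))
```

The punctuation-aware pattern captures whatever sits on each side of the spaces in groups 1 and 2, so its replacement would presumably need to put those back, for example `\1 \2` rather than a bare space.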
#3 |
A Hairy Wizard
Posts: 3,313
Karma: 20171571
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 15/11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Then there is the very simplified version:
Search: "  " (two spaces)
Replace: " " (one space)
When you save, it'll put all the organizing whitespace back in there so your paragraphs all line up.
#4 |
Grand Sorcerer
Posts: 28,361
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
But that doesn't meet the OP's requirement that the solution must leave indentation alone. Yes, pretty-print will put indentation back when you save, but what if the user has unchecked the Clean on Save/Open option(s) or disabled pretty-print altogether? (Did I mention Captain Overkill?)
Last edited by DiapDealer; 10-29-2015 at 11:46 AM.
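A small Python sketch of what the plain two-spaces-to-one replace does to an indented line, to illustrate the indentation concern. The function name `naive_collapse` is invented for this example, and the loop stands in for running Replace All repeatedly.

```python
def naive_collapse(text: str) -> str:
    """Repeat the plain two-spaces-to-one replace until nothing changes."""
    while "  " in text:
        text = text.replace("  ", " ")
    return text

line = "     <p>Good  morning today.</p>"

# One pass: five leading spaces shrink to three, and the double space is fixed.
print(repr(line.replace("  ", " ")))
# Repeated passes: the indentation collapses all the way to one space.
print(repr(naive_collapse(line)))
```

So the simple search works for the text itself, but it only leaves indentation alone if something (such as pretty-print on save) restores it afterwards.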
#5 |
Well trained by Cats
Posts: 30,912
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Right-click, Reformat HTML: clean source
will put back any missing code indentation. The space before opening block tags (<p <div <h# ...) is for humans, just like the blank lines between:
Code:
<p>stuff</p>

<p>more stuff</p>
Tags |
regex, space |