Old 03-11-2014, 05:28 PM   #12
arspr
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
Quote:
Originally Posted by kovidgoyal
@arspr: You should never use initial and ending spaces. For one thing, how would you know how many space characters there are at the start, or if there are space characters at the end? Use \s+ or, if for some reason you want to match only the single space character and not tabs or other white space characters, use [ ]
Why, Kovid? Why are spaces "forbidden"? Why are they different from any other character?

Let me explain my usage and you'll see that avoiding spaces is pretty weird, while using them is simple and straightforward. Moreover, spaces DO work perfectly. What doesn't work is the history of my strings in the Find and Replace text boxes, because those leading and trailing spaces are dropped when the strings are saved.

In Spanish books, dialogue sentences like
Code:
<p>"Bla, Bla, Bla," John said. "More bla, bla, bla."</p>
are usually written in this way:
Code:
<p>—Bla, Bla, Bla, —John said—. More bla, bla, bla.</p>
The problem is that between "—" and "John", between "said" and "—" (or even between "—" and "."), you can get horrible line wraps if you are unlucky. HTML renderers treat dashes as possible wrap points, not as indivisible punctuation marks. (I suppose HTML was designed with English grammar in mind, not Spanish.) It only takes a few pages to run into one of those horrible renderings.

So I systematically apply the following two search-and-replace operations to my Spanish ebooks (the surrounding quotes are only there to make the spaces visible):
  1. " (—[^ <]+)( |</p>)" replaced by " <span class="nw">\1</span>\2".
  2. " ([^ >]+—)(\.|\.\.\.|,|;|:|&hellip;|…)? " replaced by " <span class="nw">\1\2</span> ".
where the stylesheet defines .nw{white-space:nowrap;}
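As a rough illustration, the two operations can be tried outside the editor. This is only a sketch using Python's standard re module, which is an assumption on my part; the editor's own regex engine may differ in details. The helper name add_nowrap and the sample paragraph (taken from the dialogue example above) are just for demonstration.

```python
import re

# (find, replace) pairs copied from the two searches above.
# Note the literal leading space in both patterns and the trailing
# space in the second one: they are part of the match.
RULES = [
    (r' (—[^ <]+)( |</p>)',
     r' <span class="nw">\1</span>\2'),
    (r' ([^ >]+—)(\.|\.\.\.|,|;|:|&hellip;|…)? ',
     r' <span class="nw">\1\2</span> '),
]

def add_nowrap(html):
    """Apply both rules in order (hypothetical helper, not part of any tool)."""
    for find, replace in RULES:
        html = re.sub(find, replace, html)
    return html

sample = '<p>—Bla, Bla, Bla, —John said—. More bla, bla, bla.</p>'
result = add_nowrap(sample)
print(result)
```

On this sample, the first rule wraps "—John" and the second wraps "said—." so neither can be split from its dash at a line end. The opening "—Bla," is left alone because it follows the <p> tag, not a space.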

And it works, I promise you. My only trouble is that the strings saved in the history have lost their initial/final spaces, so I have to re-type or fix them every single time I use them.
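To show why the trimmed strings are not equivalent, here is a small sketch. It assumes the history simply strips whitespace from both ends of the Find and Replace strings, which is my guess at what happens, and it uses Python's re module as a stand-in for the editor's engine.

```python
import re

find = r' (—[^ <]+)( |</p>)'
replace = r' <span class="nw">\1</span>\2'
sample = '<p>—Bla, Bla, Bla, —John said—. More bla, bla, bla.</p>'

# What I typed: the leading space restricts matches to dashes that
# follow a space, i.e. the ones opening an inline remark.
intended = re.sub(find, replace, sample)

# Assumption: the saved history entry is the same string with the
# surrounding whitespace stripped.
trimmed = re.sub(find.strip(), replace.strip(), sample)

print(intended)
print(trimmed)
```

Without the leading space, the pattern also matches the dash that opens the paragraph and the dash before the full stop, so the restored string does a different job from the one I typed.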

Moreover, I can understand your trick for the Find string, where you suggest using "[ ]" instead of " ". But how do I type the replacement string? Are you suggesting:
  1. "([ ])(—[^ <]+)( |</p>)" replaced by "\1<span class="nw">\2</span>\3"
  2. "([ ])([^ >]+—)(\.|\.\.\.|,|;|:|&hellip;|…)?([ ])" replaced by "\1<span class="nw">\2\3</span>\4".
Why should I make it so complicated in order to avoid directly typing a space? What is so "dangerous" about spaces?
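For what it's worth, a quick check suggests that the bracketed variants give the same output as the plain-space versions, just with extra groups to carry the spaces across. This is a sketch with Python's re module (an assumption; the editor's engine may differ), run on the sample dialogue line from earlier in this post.

```python
import re

sample = '<p>—Bla, Bla, Bla, —John said—. More bla, bla, bla.</p>'

# Plain-space versions: spaces typed literally in find and replace.
plain = [
    (r' (—[^ <]+)( |</p>)',
     r' <span class="nw">\1</span>\2'),
    (r' ([^ >]+—)(\.|\.\.\.|,|;|:|&hellip;|…)? ',
     r' <span class="nw">\1\2</span> '),
]

# Bracketed versions: each boundary space is captured with ([ ]) and
# re-emitted through a backreference, so no string starts or ends
# with a bare space.
bracketed = [
    (r'([ ])(—[^ <]+)( |</p>)',
     r'\1<span class="nw">\2</span>\3'),
    (r'([ ])([^ >]+—)(\.|\.\.\.|,|;|:|&hellip;|…)?([ ])',
     r'\1<span class="nw">\2\3</span>\4'),
]

def apply_rules(rules, html):
    """Run each (find, replace) pair in order (hypothetical helper)."""
    for find, replace in rules:
        html = re.sub(find, replace, html)
    return html

plain_result = apply_rules(plain, sample)
bracketed_result = apply_rules(bracketed, sample)
print(plain_result == bracketed_result)
```

So the bracketed form would survive trimming, at the cost of renumbering every group in the replacement.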