Old 12-04-2017, 09:37 PM   #1
roger64
Wizard
Writing spaces for regex

Hi

I am looking for advice on this.

\x20 Plain space

Recently a friend of mine told me it could be convenient to use \x20 in regex to refer to plain spaces.

The \x20 code is simply the plain space (U+0020) written in hexadecimal. It is more restrictive than \s: \x20 matches only the plain space, while \s also matches tabs, line breaks and other whitespace.
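A minimal sketch of that difference, assuming Python's standard re module (this is not Sigil's PCRE engine or the Calibre editor itself, and flavours can disagree on whether \s covers the no-break space):

```python
import re

# Sample containing a plain space, a tab, a newline and a no-break space.
sample = "a b\tc\nd\u00a0e"

plain_spaces = re.findall(r"\x20", sample)    # matches U+0020 only
any_whitespace = re.findall(r"\s", sample)    # matches every whitespace character

print(plain_spaces)
print(any_whitespace)
```

With Python's re on ordinary (Unicode) strings, \s also picks up the no-break space, which is one more reason to write \x20 when only the plain space is meant.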
In regex mode you can use \x20 literally in both the search and replacement fields, which means that you can publish a regex exactly as you use it, whereas:
- using \s does not work in the replacement field
- mimicking a space (for example with an underscore _) when you publish a regex means that you must add a warning about it
- a plain blank space can easily be overlooked, especially if it sits at the end of a regex
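The last point can be sketched in Python's re module. The patterns and the sample text are hypothetical, and the rstrip() call is only an assumption about how a trailing space gets lost when a regex is copied from a post:

```python
import re

text = "one two"

literal_pattern = "one "      # ends with a real, invisible space
hex_pattern = r"one\x20"      # same meaning, but the space is visible

# Simulate a copy/paste that trims trailing whitespace.
trimmed_literal = literal_pattern.rstrip()
trimmed_hex = hex_pattern.rstrip()

literal_match = re.search(trimmed_literal, text).group()
hex_match = re.search(trimmed_hex, text).group()

print(repr(literal_match))    # the trailing space is gone from the match
print(repr(hex_match))        # the trailing space is still matched
```

The pattern written with \x20 survives the trimming unchanged, while the literal version silently turns into a different regex.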

The \x20 code is recognized by Sigil (PCRE) and the Calibre editor (Python) in regex mode, and you can test it on the regex101 site (https://regex101.com/) in both flavours.

\xa0 No-break space

The hexadecimal \xa0 code refers to the no-break space. Sigil has a problem with it.

It works in the Calibre editor in both the search and replacement fields in regex mode. Replacing &#_160; with \xa0 using the PCRE flavour also works on regex101.
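A rough Python equivalent of that replacement, as a sketch only. It assumes that the entity meant above is the numeric no-break space entity, and it uses Python's standard re module rather than the Calibre editor's replacement field. Note that the replacement here is an ordinary Python string, so the escape is resolved by Python before the regex engine sees it:

```python
import re

NBSP = "\u00a0"  # no-break space, resolved by Python itself

# Assumption: the entity to replace is the numeric no-break space entity.
source = "10&#160;km and 20&#160;km"

result = re.sub(r"&#160;", NBSP, source)

print(repr(result))
print(result.count(NBSP), "no-break spaces,", result.count(" "), "plain spaces")
```

Printing with repr() makes the no-break spaces visible as \xa0, so they can be told apart from the plain spaces around "and".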

Unfortunately, it becomes just a plain space when used in the replacement field of a regex in Sigil. This is quite dangerous because a single replacement could make all your no-break spaces disappear (Oldest users have been there before...).
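One way to guard against that kind of silent loss is to count the no-break spaces before and after a replacement. The sketch below is in Python with a hypothetical helper name (sub_keeping_nbsp); it does not reproduce Sigil's behaviour, it only shows the check:

```python
import re

NBSP = "\u00a0"

def sub_keeping_nbsp(pattern, replacement, text):
    """Run re.sub, but refuse the result if no-break spaces were lost."""
    result = re.sub(pattern, replacement, text)
    if result.count(NBSP) < text.count(NBSP):
        raise ValueError("replacement removed no-break spaces")
    return result

text = "10" + NBSP + "km"

# A harmless replacement passes through.
ok = sub_keeping_nbsp(r"km", "m", text)

# A replacement that turns the no-break space into a plain space is rejected.
rejected = False
try:
    sub_keeping_nbsp(r"\xa0", " ", text)
except ValueError:
    rejected = True

print(repr(ok), rejected)
```

The same idea works outside Python: compare the number of no-break spaces in the file before and after running a regex, and undo the change if the count drops unexpectedly.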

As \x20 (see above) seems to work quite well, I wonder why \xa0 does not play well with Sigil. Is this due to the infamous upstream Qt bug? Is it possible to make it work?

Last edited by roger64; 12-04-2017 at 09:47 PM. Reason: Oldest