11-11-2014, 06:28 PM   #1
Psymon
Chief Bohemian Misfit
Posts: 571
Karma: 462964
Join Date: May 2013
Device: iPad, ADE
Using regex for more elegant hyphenation and word wrap

Wow, I only just learned today what "regex" means -- I've seen it here and there in different programs, but never had a clue what exactly it was for until now (duh) -- and what a world of possibilities it might open up for me in simplifying a couple of things that I've been very laboriously doing "manually" so far. I've been reading up on regex all afternoon, though, and I'm still confused about how to do what I want to do, so I hope someone out there can help me come up with the right regular expressions to use.

Basically there are two separate things that I've been doing in order to make my books a little more "elegant," typographically.

PROBLEM #1 - More Selective Hyphenation

I'm primarily an iPad user (forgive me), but I hate the way that it automatically hyphenates words willy-nilly all over the place, even shorter words that don't need it. So what I did to counter that was initially turn hyphenation off in my book completely, by adding this to my styles (wherever I wanted hyphenation to be turned off)...

Code:
-webkit-hyphens:none;
-epub-hyphens:none;
-moz-hyphens:none;
adobe-hyphenate: none;
hyphens:none;
As far as I know, that covers "everything," i.e. whatever devices would allow me to turn hyphenation off to begin with. And then I created a class so that I could selectively turn hyphenation on wherever I would find that it's problematic (that is, where I would find that certain lines would have a ridiculous amount of white space between words, because longer words wouldn't hyphenate)...

Code:
.hyph {
  -webkit-hyphens: auto;
  -epub-hyphens: auto;
  -moz-hyphens: auto;
  adobe-hyphenate: auto;
  hyphens: auto;
}
And then, call me crazy, but I actually went through my whole, entire book(s) looking for "problematic words", and wrapping them with that class...

Code:
<p>Here's a paragraph with an <span class="hyph">unreasonably</span> long word.</p>
As you can imagine, this is an enormous amount of work to pore over the entire book, almost word-for-word, but now I'm thinking that there surely must be an easy way to do a simple search & replace using regex -- but after spending the whole afternoon trying to figure out how, I can't seem to come up with the right expression to use.

What I'd like to search for is something to the effect of this...

[space] + [a word with at least 8 characters] + [a space OR any number of punctuation marks]

...and then for the replace function I want to wrap <span class="hyph"></span> around the 8+ character word and -- if it's not too ridiculous a thing to ask -- ALSO around any punctuation marks that come right after it. If the word is followed by a space instead, the span should just close right after the word. If that latter part is getting too weird, then wrapping just the word would be fine, too. The point of searching for a [space] before the word is that a long word at the very beginning of a paragraph (<p>) obviously doesn't need to be hyphenated (unless the first word happened to be "supercalifragilisticexpialidocious" or something).

Does that make sense, what I'm trying to do here? I'm having some problems grasping this regex stuff more generally, just for starters, but the biggest thing I can't figure out is how to search for words that would be 8 characters or longer (and ignore all shorter words).
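
For illustration, here is a minimal sketch of that search and replace in Python. It is only one possible reading of the rule above, and it makes some assumptions that aren't in the description: a "word" is a run of plain ASCII letters, words containing apostrophes or internal hyphens are simply skipped, and the name add_hyph_spans is made up for this example. The `{8,}` quantifier is the part that means "8 or more", and the `(?<= )` lookbehind requires a space in front without including it in the match.

```python
import re

# Assumption: a "word" is a run of ASCII letters; accented letters would need a wider class.
LONG_WORD_RE = re.compile(
    r"(?<= )"                # preceded by a space, so never the first word after <p>
    r"([A-Za-z]{8,}"         # a word of 8 or more letters
    r"[^\sA-Za-z0-9<>&]*)"   # plus any punctuation directly after it
    r"(?=[\s<]|$)"           # followed by whitespace, a tag, or the end of the text
)


def add_hyph_spans(html: str) -> str:
    """Wrap each long word (and its trailing punctuation) in the .hyph span."""
    return LONG_WORD_RE.sub(r'<span class="hyph">\1</span>', html)


print(add_hyph_spans("<p>Here's a paragraph with an unreasonably long word.</p>"))
```

Note that this wraps "paragraph" as well as "unreasonably", since both have 8 or more letters. The pattern knows nothing about HTML, so anything long that follows a space inside a tag could be matched too; the result would still need a look over before saving.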

PROBLEM #2 - Selectively Preventing Word Wrap

Another "typographically-annoying" thing is whenever a line happens to end with the first word of a new sentence (or of a phrase after a punctuation mark) and that word is a single-letter word -- which, as far as I can think of, would be "I" or "A" or "a", or, in rarer instances, "O".

Here's a made-up example of an especially annoying paragraph...

Quote:
This is an example paragraph for you. I
hate having the "I" at the end of the line
and want it to wrap with the next word.
This should also take into account punc-
tuation, too, for example if I said, "O,
how nice this would be!", or if, say, I
was using a colon or semi-colon: a
phrase starting with "a" (and coming at
the end of a line, just like I just had here)
would be annoying, too.
So what I'd like is a regex that searches for...

[any punctuation mark] + [space] + "I" (or "A", "a" or "O") + [space OR punctuation mark + a space] + [word of 5 characters or less, but not longer]

...and then replace that by wrapping the "I" and the following word (if it's 5 characters or less) with <span class="nowrap"></span>, where the "nowrap" class is...

Code:
.nowrap {
white-space: nowrap;
}
The reason that I only want to include second words that are 5 characters or less is that if they're longer than that, well, then you're running into the same potential problem as with the hyphenation issue outlined above, and you'd be better off just leaving the "I" (or "a" or whatever) alone, and putting up with it being at the end of the line, if it turns out that way.
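
Again just as a sketch, the nowrap rule might look like this in Python. The assumptions here go beyond the description: the one-letter words are limited to "I", "A", "a" and "O", an opening quotation mark in front of the one-letter word is allowed and pulled into the span, apostrophes count toward the 5 characters (so "don't" qualifies), and add_nowrap_spans is a made-up name.

```python
import re

NOWRAP_RE = re.compile(
    r'(?<=[.,;:!?"”] )'        # a punctuation mark plus a space, left outside the span
    r'(["“]?[IAaO][,;:!?]? '   # optional opening quote, one-letter word, optional punctuation, space
    r"[A-Za-z'’]{1,5})"        # the following word, 1 to 5 characters
    r"(?![A-Za-z'’])"          # ...and that word must end here, so longer words never match
)


def add_nowrap_spans(html: str) -> str:
    """Wrap a sentence-initial one-letter word and the short word after it in .nowrap."""
    return NOWRAP_RE.sub(r'<span class="nowrap">\1</span>', html)


print(add_nowrap_spans("This is an example paragraph for you. I hate having it there."))
```

The `{1,5}` quantifier together with the negative lookahead `(?!...)` is what enforces "5 characters or less, but not longer": without the lookahead, the first five letters of a longer word would match.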

I hope you all don't think I'm crazy for nit-picking over hyphenation and word wrap like this, but, well, maybe I actually am crazy. Nevertheless, I've been doing this "manually" all along so far, and wow, what an enormous time saver it would be if I could come up with a regex that could do this with a simple search & replace instead! I spent the whole afternoon trying to figure this out, but I just can't seem to work out which expressions I would use.

Can anyone help?

Last edited by Psymon; 11-11-2014 at 06:31 PM.