MobileRead Forums - View Single Post

rogue_ronin · 05-17-2009, 01:31 PM

You sure did mention that. But I wasn't paying enough attention...

I was thinking about starting a regex thread -- but you should do it. I may have a couple to contribute, after seeing yours. For instance:

Quote:

([a-zI]+)’([a-z]+)

which I replace with:

Quote:

$1'$2

because it's an apostrophe, not a quote mark, in words like: I'm, we'll, could've...
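For anyone who wants to try the same replacement outside NoteTab, here is a minimal sketch in Python. It assumes the text is already a Unicode string and that a plain ASCII apostrophe is the desired result. The function name `straighten_apostrophes` is made up for illustration. Note that Python writes backreferences as \1 and \2 rather than $1 and $2.

```python
import re

# Same pattern as in the post: a run of lowercase letters (or a capital I),
# a right single quotation mark (U+2019), then more lowercase letters.
WORD_APOSTROPHE = re.compile("([a-zI]+)\u2019([a-z]+)")


def straighten_apostrophes(text: str) -> str:
    """Replace a curly right quote inside a word with a plain apostrophe.

    Hypothetical helper name; only quotes with letters on both sides are
    touched, so closing single quotes followed by a space are left alone.
    """
    return WORD_APOSTROPHE.sub(r"\1'\2", text)


if __name__ == "__main__":
    sample = "I\u2019m sure we\u2019ll say \u2018yes\u2019 to that."
    print(straighten_apostrophes(sample))
```

Because the pattern only accepts lowercase letters (plus I), text set in all capitals, such as DON’T, would not be matched and would need a wider character class.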

Is &apos; well-supported now?

I'm of the mind that you should write quotes, apostrophes, etc. either as named entities (in HTML) or as the ASCII/Unicode characters (in text), and not mix the two. But I don't find that much in real life. Since a blanket search/replace of ' with ’ does visually improve a text, I understand why it happens.

As for the '{space}" layout you mention, I do try to change things to that -- but the texts I find are not always so neat. Therefore, I have to do it the hard way sometimes.

Your regex was a little difficult to use on one text I did this afternoon: it used ’.” ’?” ’!” at the end of sentences, and both the ‘ ’ and “ ” Unicode characters. A simple search/replace on individual characters probably would have been smarter -- and in fact I had to do that at the end, then switch the rsquo and the punctuation. I've been rushing a bit to complete a goal, so I'm not taking enough time to figure things out ahead of time. (The goal is a set of 56 short stories [some are novellas] by a single author.)
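The character-by-character route described above can be sketched roughly as follows. This is only an illustration under assumptions: the entity names are the standard HTML ones, the function names are invented, and "switch the rsquo and the punctuation" is read here as turning ’.” into .’” (the punctuation moves inside the single quote). If the opposite order is wanted, the pattern would need to be reversed.

```python
import re

# Assumed mapping from curly quote characters to standard HTML entity names.
ENTITIES = {
    "\u2018": "&lsquo;",
    "\u2019": "&rsquo;",
    "\u201c": "&ldquo;",
    "\u201d": "&rdquo;",
}


def quotes_to_entities(text: str) -> str:
    """Plain search/replace, one character at a time (hypothetical helper)."""
    for char, entity in ENTITIES.items():
        text = text.replace(char, entity)
    return text


def swap_rsquo_and_punctuation(text: str) -> str:
    """Move . ? or ! in front of a closing &rsquo; that precedes &rdquo;.

    Assumption: '&rsquo;.&rdquo;' should become '.&rsquo;&rdquo;'.
    """
    return re.sub(r"&rsquo;([.?!])(?=&rdquo;)", r"\1&rsquo;", text)


if __name__ == "__main__":
    line = "\u201cHe said \u2018no\u2019.\u201d"
    print(swap_rsquo_and_punctuation(quotes_to_entities(line)))
```

Doing the entity conversion first keeps the second step simple, since the swap then only has to match ASCII text.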

But it might not be the regex -- I'm running Win2k in a virtual machine to support NoteTab, and who knows what that can lead to. At one point I was getting only part of what I would copy to the clipboard. (Restarted, of course.)

It's funny -- someone went to a lot of trouble to use curly quotes in this text, but did no work on em dashes vs. hyphens or on ellipses, didn't clean up the blockquotes, and didn't even spell-check thoroughly. Quite a haphazard use of italics, too. Weird.

m a r