Old 05-17-2009, 12:31 PM   #30
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
You sure did mention that. But I wasn't paying enough attention...

I was thinking about starting a regex thread -- but you should do it. I may have a couple to contribute, after seeing yours. For instance:
Quote:
([a-zI]+)’([a-z]+)
which I replace with:
Quote:
$1'$2
because it's an apostrophe, not a quote mark in words like: I'm, we'll, could've...
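
For anyone who wants to try that outside NoteTab, here is a minimal sketch of the same search/replace in Python. The function name and the sample text are made up for illustration, and Python writes the backreferences as \1 and \2 rather than $1 and $2; the pattern itself is the one quoted above.

```python
import re

RSQUO = "\u2019"  # the curly right single quote character

# Same pattern as in the post: letters (or capital I), a curly quote, then letters.
APOSTROPHE_IN_WORD = re.compile("([a-zI]+)" + RSQUO + "([a-z]+)")

def straighten_apostrophes(text: str) -> str:
    """Replace a curly quote between letters with a plain apostrophe.

    Hypothetical helper name; only quotes with letters on both sides are touched.
    """
    return APOSTROPHE_IN_WORD.sub(r"\1'\2", text)

# Illustrative sample: contractions plus a genuinely quoted phrase.
sample = "I\u2019m sure we\u2019ll manage. \u2018Could\u2019ve been worse.\u2019"
print(straighten_apostrophes(sample))
```

The opening and closing single quotes around the last sentence are left alone, because neither has letters on both sides. The character classes are lowercase apart from I, so an all-caps word with a curly apostrophe would not be matched, and neither would a plural possessive where the quote is followed by a space.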

Is &apos; well-supported now?

I'm of the mind that you should write quotes, apostrophes, etc. with either the named entities (in HTML) or the ASCII/Unicode characters (in text), and not mix them up. But I rarely find that in real-life texts. Since a blanket search/replace of ' with ’ does visually improve a text, I understand why it happens.

As for the '{space}" layout you mention, I do try to change things to that -- but the texts I find are not always so neat. Therefore, I have to do it the hard way sometimes.

Your regex was a little difficult to use on one text I did this afternoon: it used ’.” ’?” ’!” at the end of sentences and both ‘ ’ and “ ” unicode characters. A simple search/replace on individual characters probably would have been smarter -- and in fact I had to do that at the end, and then switch the rsquo and the punctuation. I've been rushing a bit to complete a goal, so I'm not taking enough time to figure it out ahead of time. (A set of 56 short stories [some are novellas] by a single author.)
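
A rough sketch of that last step in Python, assuming "switch" means moving the sentence punctuation inside the closing single quote, and assuming it runs on the unicode characters rather than on entities. The function name is hypothetical.

```python
import re

RSQUO = "\u2019"  # curly right single quote
RDQUO = "\u201d"  # curly right double quote

# A closing single quote, then . ? or !, then a closing double quote.
MISPLACED = re.compile(RSQUO + "([.?!])" + RDQUO)

def move_punctuation_inside(text: str) -> str:
    """Rewrite quote-punctuation-quote so the punctuation comes first."""
    return MISPLACED.sub("\\1" + RSQUO + RDQUO, text)

# Illustrative sample: a double-quoted sentence ending in a single-quoted word.
sample = "\u201cHe said \u2018go\u2019.\u201d"
print(move_punctuation_inside(sample))
```

Only the three-character endings are touched, so apostrophes inside words and single quotes elsewhere in the sentence stay as they are.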

But it might not be the regex -- I'm running Win2k in a virtual machine to support NoteTab, and who knows what that can lead to. At one point I was getting only part of what I would copy to the clipboard. (Restarted, of course.)

It's funny -- someone went to a lot of trouble to use curly quotes in this text, but did no work on em dashes vs. hyphens or on ellipses, didn't clean up the blockquotes, and didn't even spell-check thoroughly. The use of italics is quite haphazard, too. Weird.

m a r