06-14-2012, 08:32 AM   #7
DiapDealer
Quote:
Originally Posted by elibrarian
When I use regex to search for the full Danish alphabet, I usually use [a-zæøå] or [A-ZÆØÅ]. Which of course doesn't find any other characters, accented or not, but they would not be part of the Danish alphabet anyway ...
I find characters in English-language books that are not from the English alphabet all the time... does this never happen in Danish books?

Why not just use \p{L} and catch all potential Unicode letters? That's more than likely what people intend to catch when they use [A-Za-z] anyway (whether they consciously realize it or not). Or do people purposely mean to exclude the accented characters that occur in words like café or façade or naïve? Just a thought.
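
To make the difference concrete, here's a rough sketch in Python (my choice for illustration, not necessarily the engine your editor uses). One big assumption: Python's built-in re module doesn't understand \p{L}, so I'm standing in [^\W\d_] (word characters minus digits and underscore), which is only an approximation of "any Unicode letter". The sample sentence is made up.

```python
import re
import unicodedata

# Made-up sample text; normalized to NFC so each accented letter is one code point.
text = unicodedata.normalize("NFC", "The café façade looked naïve.")

# ASCII-only class: accented letters are not matched, so words get chopped up.
ascii_words = re.findall(r"[A-Za-z]+", text)

# ASSUMPTION: [^\W\d_] is a stand-in for \p{L}, which the stdlib re module
# does not support. It is close to, but not identical to, "all Unicode letters".
unicode_words = re.findall(r"[^\W\d_]+", text)

print(ascii_words)    # ['The', 'caf', 'fa', 'ade', 'looked', 'na', 've']
print(unicode_words)  # ['The', 'café', 'façade', 'looked', 'naïve']
```

In an engine that does support Unicode properties (PCRE-style engines generally do, and I believe the third-party regex module for Python does as well), the second pattern would simply be \p{L}+.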

I just know that when I use "letters" as search criteria in a regexp on an English-language text, thinking strictly in terms of "English letters" will often produce results I didn't really intend. The original topic of this thread is a perfect example of this. So I've learned to approach Regex Find & Replace from a "Unicode first" frame of mind when it comes to ebooks.

Last edited by DiapDealer; 06-14-2012 at 08:36 AM.