Quote:
Originally Posted by jackie_w
LOWERS = list('abcdefghijklmnopqrstuvwxyz')
UPPERS = uppercase equivalent of LOWERS
DIGITS = list('0123456789')
I think that should work OK for European, Greek, Cyrillic alphabet languages but probably not for CJK and other Eastern alphabets.
|
I don't see how that would work for Greek or Cyrillic, given that there's no Greek or Cyrillic in LOWERS. Unless you mean extending your definition to include other alphabets.
Quote:
You mentioned 'unicode properties'. I'm open to suggestions for a better simple algorithm to include a wider variety of languages.
|
If you use Python you could start here. That basically tells you, for any character, the same thing as your LOWERS, UPPERS and DIGITS lists. I haven't really used that stuff, but it looks pretty straightforward. Some additional thought might be needed to scramble non-ASCII characters to other non-ASCII characters in their same "group"; I think it's easier to just scramble anything into ASCII.
EDIT: Scrambling to non-ASCII characters will probably cause problems with fonts: a font may have a character for "é", but not for "þ" (even though they are in the same group). And any scrambling will cause problems with subset fonts, since a subset font only contains the glyphs the original text actually used.
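To make the "scramble anything into ASCII" idea concrete, here is a rough sketch. It assumes the Unicode general categories exposed by Python's standard unicodedata module are the kind of property meant above; the function name scramble and the choice to leave everything else untouched are just for illustration, not anything from the plugin being discussed.

```python
import random
import string
import unicodedata

def scramble(text, rng=None):
    """Replace letters and digits of any script with random ASCII ones."""
    rng = rng or random.Random()
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Ll":      # lowercase letter (Latin, Greek, Cyrillic, ...)
            out.append(rng.choice(string.ascii_lowercase))
        elif cat == "Lu":    # uppercase letter
            out.append(rng.choice(string.ascii_uppercase))
        elif cat == "Nd":    # decimal digit
            out.append(rng.choice(string.digits))
        else:                # spaces, punctuation, caseless letters ("Lo"), etc.
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(scramble("Élan Ωμέγα Привет 42!"))
```

Caseless scripts such as CJK come back as category "Lo" and are passed through unchanged in this sketch, so they would still need a separate rule.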