MobileRead Forums - View Single Post - Scrambling copyright ebooks to help troubleshoot problems ???

Jellby · 10-22-2015, 09:30 AM

Quote:

Originally Posted by jackie_w

LOWERS = list('abcdefghijklmnopqrstuvwxyz')
UPPERS = [c.upper() for c in LOWERS]  # uppercase equivalent of LOWERS
DIGITS = list('0123456789')

I think that should work OK for European, Greek, Cyrillic alphabet languages but probably not for CJK and other Eastern alphabets.

I don't see how that would work for Greek or Cyrillic, given that there's no Greek or Cyrillic in LOWERS. Unless you mean extending your definition to include other alphabets.

Quote:

You mentioned 'unicode properties'. I'm open to suggestions for a better simple algorithm to include a wider variety of languages.

If you use Python, you could start here. That basically tells you the same as your LOWERS, UPPERS and DIGITS. I haven't really used that stuff, but it looks pretty straightforward. Some additional thought might be needed to scramble non-ASCII characters to other non-ASCII characters in their same "group"; I think it's easier to just scramble anything into ASCII.

EDIT: Scrambling to non-ASCII characters will probably cause problems with fonts: a font may have a character for "é", but not for "þ" (even though they are in the same group). And any scrambling will cause problems with subset fonts.
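
For illustration, here is a rough sketch of the "scramble anything into ASCII" idea. It assumes Python's standard unicodedata module is used to classify characters by their Unicode general category (Ll for lowercase letters, Lu and Lt for uppercase and titlecase letters, Nd for decimal digits). The function name scramble and its rng parameter are made up for this example and are not taken from any existing plugin.

```python
import random
import string
import unicodedata


def scramble(text, rng=None):
    """Replace each cased letter or decimal digit with a random ASCII
    character of the same class; leave everything else untouched."""
    rng = rng or random.Random()
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Ll":                    # lowercase letter, any script
            out.append(rng.choice(string.ascii_lowercase))
        elif cat in ("Lu", "Lt"):          # uppercase or titlecase letter
            out.append(rng.choice(string.ascii_uppercase))
        elif cat == "Nd":                  # decimal digit, any script
            out.append(rng.choice(string.digits))
        else:                              # spaces, punctuation, marks, "Lo", ...
            out.append(ch)
    return "".join(out)


# Hypothetical usage: Greek and Cyrillic letters come out as ASCII letters,
# while spaces and punctuation keep their positions.
print(scramble("Ελληνικά, Привет 2015"))
```

Characters in other categories pass through unchanged. That includes category Lo, which covers CJK ideographs, so this sketch would not be enough for those scripts. Combining marks (category Mn) are also left alone, so decomposed accented text would keep its accents.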