MobileRead Forums - View Single Post - Scrambling copyright ebooks to help troubleshoot problems ???

jackie_w · 10-22-2015, 09:09 AM

Quote:

Originally Posted by Jellby

That may be OK for English (or even other languages with a Latin-based alphabet), but applying that rule to Arabic, Japanese or Greek books will mean almost nothing is scrambled.

I say scramble all letters and digits (use Unicode properties to determine what's a letter or digit), at least by default. If those characters were part of the problem, then the scrambled book will not show the problem, and that can be used for debugging.

The current logic for scrambling is:

if a char has different upper- and lower-case versions - scramble to a value from LOWERS, adjusting to retain case.
if a char is a digit - scramble to a value from DIGITS
otherwise leave as-is

where
LOWERS = list('abcdefghijklmnopqrstuvwxyz')
UPPERS = [c.upper() for c in LOWERS]
DIGITS = list('0123456789')
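
As a rough illustration, those rules could be written in Python something like this. It is only a sketch, not the actual code: the function names scramble_char and scramble_text are made up for this example, and using str.isdigit() as the digit test is an assumption.

```python
import random

LOWERS = list('abcdefghijklmnopqrstuvwxyz')
DIGITS = list('0123456789')

def scramble_char(ch, rng=random):
    """Scramble one character following the three rules described above."""
    # Rule 1: the char has different upper- and lower-case versions.
    if ch.upper() != ch.lower():
        new = rng.choice(LOWERS)
        # Adjust the replacement so the original case is retained.
        return new.upper() if ch.isupper() else new
    # Rule 2: the char is a digit (assumption: str.isdigit() is the test).
    if ch.isdigit():
        return rng.choice(DIGITS)
    # Rule 3: anything else is left as-is.
    return ch

def scramble_text(text, rng=random):
    """Apply scramble_char to every character of text."""
    return ''.join(scramble_char(ch, rng) for ch in text)

if __name__ == '__main__':
    print(scramble_text('Chapter 12: The End'))
```

Punctuation, spaces and any character without case (other than digits) pass through unchanged, so the length and layout of the text stay the same.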

I think that should work OK for languages written in the Latin, Greek or Cyrillic alphabets, but probably not for CJK and other scripts with no upper and lower case, since their characters would be left as-is. I don't have any detailed knowledge of non-Latin scripts.
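
A quick way to see which scripts the case test catches is to try it on one sample letter from each. This is just a sketch: has_case is a made-up helper name, and the sample characters are arbitrary picks.

```python
def has_case(ch):
    """True if ch has different upper- and lower-case versions."""
    return ch.upper() != ch.lower()

# One arbitrary sample letter per script.
samples = {
    'Latin': 'a',
    'Greek': 'λ',
    'Cyrillic': 'ж',
    'Arabic': 'ب',
    'Japanese': 'か',
    'Chinese': '書',
}

for script, ch in samples.items():
    print(script, ch, has_case(ch))
```

The Latin, Greek and Cyrillic samples report True. The Arabic, Japanese and Chinese samples report False, so a case-based rule would leave them unscrambled.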

If/when this is turned into a calibre plugin, I could envisage giving the user limited control over what's in the LOWERS list (e.g. single characters, words or phrases), if that might be useful. In that case, adding some special typeable chars of the user's choice should also be possible.

You mentioned 'Unicode properties'. I'm open to suggestions for a better, but still simple, algorithm that covers a wider variety of languages.
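
For what it's worth, one possible direction is sketched below. It is only an idea under stated assumptions, not a tested design: it uses Python's unicodedata.category() to decide what counts as a letter (general categories starting with 'L') or a decimal digit ('Nd'), and replaces each such character with a random one of the same category drawn from the characters that occur in the text itself. The names char_class and scramble_by_category are made up for this example.

```python
import random
import unicodedata
from collections import defaultdict

def char_class(ch):
    """Return the Unicode general category of ch if it is a letter
    (Lu, Ll, Lt, Lm, Lo) or a decimal digit (Nd), otherwise None."""
    cat = unicodedata.category(ch)
    if cat.startswith('L') or cat == 'Nd':
        return cat
    return None

def scramble_by_category(text, rng=random):
    """Replace each letter or digit with a random character of the same
    Unicode category, picked from those that occur in the text."""
    # Build one pool of distinct characters per category.
    pools = defaultdict(set)
    for ch in text:
        cls = char_class(ch)
        if cls is not None:
            pools[cls].add(ch)
    # Sort so that results are reproducible with a seeded rng.
    pools = {cls: sorted(chars) for cls, chars in pools.items()}

    out = []
    for ch in text:
        cls = char_class(ch)
        out.append(rng.choice(pools[cls]) if cls is not None else ch)
    return ''.join(out)

if __name__ == '__main__':
    print(scramble_by_category('Abc 12 書かλ'))
```

Drawing replacements from the text itself means no per-language alphabet list is needed, and upper/lower case is retained because Lu and Ll are separate categories. The obvious weaknesses are that a book mixing several caseless scripts would have them shuffled into each other (they all share category Lo), and that a very short text has small pools and so is barely scrambled.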