Unicode characters problem
I keep running into a problem with many of the txt files I try to run through txt2lrf: somewhere in the file is a Unicode character the program cannot process.
I usually get a message like this:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc6' in position 25014: ordinal not in range(128)
which I guess is a Python-related error? (I know very little about programming.)
I can sometimes track down the character if I can work out which character the Unicode code in the message refers to; I then use Character Map to find it and replace it in Word. But often with an error like this I don't know how to locate the character.
Does anyone have any tips for me on either a macro I can run in Word to remove characters txt2lrf cannot process, or how to use the error message to locate the offending character?
Are there any plans to "pre-screen" text files in txt2lrf to help remove those characters?
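To make the "pre-screen" idea concrete, here is a rough Python sketch of what such a step might look like. It is not anything txt2lrf actually does; the function name to_ascii, the placeholder "?" and the small REPLACEMENTS table are all made up for illustration. It swaps a few common characters for ASCII spellings, strips accents from letters, and marks anything else with the placeholder so it is easy to search for afterwards.

```python
import unicodedata

# Hypothetical hand-picked substitutions; extend as needed.
REPLACEMENTS = {
    "\u00c6": "AE",   # the character from the error message (u'\xc6')
    "\u00e6": "ae",
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
}


def to_ascii(text, placeholder="?"):
    """Return a copy of text containing only ASCII characters."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        if ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
            continue
        # Split accented letters into base letter + combining accent,
        # then keep only the ASCII parts of the result.
        kept = [c for c in unicodedata.normalize("NFKD", ch) if ord(c) < 128]
        out.append("".join(kept) if kept else placeholder)
    return "".join(out)


sample = "\u201cCaf\u00e9\u201d in the Encyclop\u00e6dia"
cleaned = to_ascii(sample)
```

Searching the cleaned text for the placeholder shows where something was lost, so those spots can still be fixed by hand.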
So far I have been finding them manually by deleting half the text, seeing if the error remains, and then repeating on whichever half still fails until I find the character, which is super slow.
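The halving could probably be replaced by a short script. Below is a minimal Python sketch that lists every non-ASCII character in a piece of text with its offset, line and column. The names find_non_ascii and context_at are hypothetical, and it assumes the file has been read in as text (for example as UTF-8). The "position 25014" in the error is an offset into whatever text the program was encoding at the time, which may or may not match the offset in the original file, so the line and column list is likely the more reliable guide.

```python
def find_non_ascii(text):
    """Return (offset, line, column, character) for each non-ASCII character."""
    hits = []
    line, column = 1, 1
    for offset, ch in enumerate(text):
        if ord(ch) > 127:
            hits.append((offset, line, column, ch))
        if ch == "\n":
            line += 1
            column = 1
        else:
            column += 1
    return hits


def context_at(text, position, width=30):
    """Return the text around a character offset, e.g. the one in the error."""
    start = max(0, position - width)
    return text[start:position + width + 1]


# Small stand-in for the contents of a txt file.
sample = "Line one\nEncyclop\u00c6dia\n"
for offset, line, column, ch in find_non_ascii(sample):
    print("offset %d, line %d, column %d: %r" % (offset, line, column, ch))
```

For a real file, the text would first be read with something like io.open(path, encoding="utf-8").read(), where the encoding is a guess that may need changing for files saved from Word.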
Any help would be appreciated!