MobileRead Forums - View Single Post

DiapDealer · 11-18-2020, 11:13 AM

The confusing part is that I'm reading the KindleUnpack OPF file in as UTF-8 and writing it back as UTF-8 with String.encode('utf-8'). Since encode defaults to "error" for characters it can't deal with, I don't understand how the bad characters are getting written to the file at all. Yet whenever I open the file in a text editor, there's the warning--bigger than life--that it contains characters that are incompatible with the encoding. .encoding(str, 'replace') has no effect, and .encoding(str, 'error') causes no abend. So I'm really at a loss as to how these illegal characters end up in the OPF.

I can replace the \0 and \1 characters easily enough (they seem to be the principal offenders), but that doesn't strike me as a very thorough or future-proof approach.
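
A possible explanation, sketched here on the assumption that the code is Python and that the offending characters really are control characters such as \0 and \1: U+0000 and U+0001 are perfectly valid in UTF-8, so encoding them never triggers an error handler, whatever handler is requested. They are, however, not allowed in XML 1.0, which may be what the editor is complaining about. The helper name strip_xml_illegal and the sample string below are hypothetical, made up for illustration only; the character ranges follow the XML 1.0 Char production.

```python
import re

# Code points that XML 1.0 forbids: the C0 controls other than tab (\x09),
# line feed (\x0a) and carriage return (\x0d), plus U+FFFE and U+FFFF.
# (Lone surrogates are also forbidden, but a strict UTF-8 encode would
# already reject those.)
_XML_ILLEGAL = re.compile(r'[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]')


def strip_xml_illegal(text, replacement=''):
    """Hypothetical helper: remove every character XML 1.0 does not allow."""
    return _XML_ILLEGAL.sub(replacement, text)


# Hypothetical sample standing in for OPF text with stray control characters.
sample = 'Some Title\x00 with\x01 stray controls\n'

# Encoding succeeds without raising: \x00 and \x01 are legal UTF-8.
encoded = sample.encode('utf-8')

# Filtering on the XML rules covers \0 and \1 and the other control
# characters in one pass, instead of replacing them one by one.
cleaned = strip_xml_illegal(sample)
```

Filtering against the whole forbidden range rather than just \0 and \1 is one way to make the cleanup less ad hoc, though it does not explain where the characters originate.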
