11-18-2020, 10:13 AM   #172
DiapDealer
Grand Sorcerer
The confusing part is that I'm reading the KindleUnpack OPF file in as utf-8, and I'm writing it back as utf-8 with String.encode('utf-8'). Since encode defaults to raising an error for characters it can't deal with, I don't understand how the bad characters are getting written to the file at all. Yet whenever I open the file in a text editor, there's the warning, bigger than life, that it contains characters that are incompatible with the encoding. .encoding(str, 'replace') has no effect, and .encoding(str, 'error') causes no abend. So I'm really at a loss as to how these illegal characters are getting written to the OPF in the first place.
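
A quick check along these lines may help narrow it down. This is only a minimal sketch, assuming Python 3 and a made-up sample string rather than the real OPF contents. It suggests that control characters such as U+0000 and U+0001 are perfectly encodable in UTF-8, so the error handler never gets a chance to fire for them:

```python
# Hypothetical sample text containing NUL (U+0000) and U+0001.
sample = "title\x00with\x01controls"

# The default error handling raises on unencodable characters, yet nothing
# is raised here: these code points have valid UTF-8 encodings.
encoded = sample.encode("utf-8")

# 'replace' only touches characters that cannot be encoded, so the result
# is byte-for-byte the same.
encoded_replace = sample.encode("utf-8", "replace")

# For contrast, a lone surrogate really is unencodable in UTF-8.
try:
    "\ud800".encode("utf-8")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
```

If that holds for the real data, the editor's warning would be about something other than a failed UTF-8 encode, though that is a guess from the symptoms described.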

I can replace the \0 and \1 characters easily enough (they seem to be the principal offenders), but that doesn't strike me as a very thorough or future-proof approach.
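
One possibly more general option, sketched here under the assumption that the offending characters are ones the XML 1.0 specification disallows (the OPF being an XML file), is to strip everything outside the XML 1.0 character range instead of listing individual characters. The function name `strip_xml_illegal` is hypothetical and not part of KindleUnpack:

```python
import re

# Matches any character NOT permitted by the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
_XML_ILLEGAL = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_xml_illegal(text: str) -> str:
    """Return text with characters that XML 1.0 disallows removed."""
    return _XML_ILLEGAL.sub("", text)

# Hypothetical usage: clean the OPF text before encoding and writing it.
cleaned = strip_xml_illegal("a\x00b\x01c\td\n")
```

This removes the characters silently. Substituting a visible placeholder instead would be a small change to the `sub` call.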