Old 06-29-2010, 02:41 PM   #17
nyrath
Addict
Posts: 281
Karma: 52007
Join Date: Jun 2010
Device: nook
I recently had another similar problem, which was not Sigil's fault.
I imported a text file, and it was full of bizarre characters. There was no BOM at the start of the text file, and converting the file to UTF-8 did nothing.

So I looked at one of the garbage characters with a hex editor, and did some research on Google.

As it turns out, the file was in an obsolete encoding called "Mac OS Roman", or "MacRoman".
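
For anyone trying the same kind of detective work without a hex editor, here is a rough sketch of how a suspect run of bytes could be tested against a few candidate encodings. The function name decode_candidates, the list of encodings, and the sample bytes are made up for illustration; they are not taken from the actual file.

```python
def decode_candidates(raw, encodings=("utf-8", "cp1252", "mac-roman")):
    """Decode the same bytes under several encodings for side-by-side comparison.

    Hypothetical helper: returns a dict mapping each encoding name to the
    decoded text, or None when the bytes are not valid in that encoding.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    # Illustrative sample: 0xD2 and 0xD3 are the curly double quotes in MacRoman.
    sample = b"\xd2Hello\xd3"
    for encoding, text in decode_candidates(sample).items():
        print(encoding, repr(text))
```

Whichever candidate turns the garbage into sensible punctuation and accented letters is probably the right one.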

Lucky for me, Python comes with a codec that understands MacRoman. So I wrote a quick script that opened the file with codecs.open('myfile.txt', 'r', 'mac-roman'), read it in, and wrote it out to an output file created with codecs.open('myfile_CLEAN.txt', 'w', 'utf-8'). Problem solved.
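
A minimal sketch of what such a script might look like is below. It is a reconstruction, not the original script: it uses io.open rather than codecs.open, and the function name convert_to_utf8 is invented for illustration. Only the two file names and the 'mac-roman' and 'utf-8' codec names come from the description above.

```python
import io


def convert_to_utf8(src_path, dst_path, src_encoding="mac-roman"):
    """Re-encode a text file from src_encoding to UTF-8.

    Hypothetical helper. newline="" leaves the line endings untouched,
    which matters because old Mac files often use bare carriage returns.
    """
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
    return text


if __name__ == "__main__":
    # File names taken from the post; adjust to the actual paths.
    convert_to_utf8("myfile.txt", "myfile_CLEAN.txt")
```

This reads the whole file into memory at once, which is fine for a typical book-length text file.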