MobileRead Forums - View Single Post

KIE18 · 02-28-2023, 12:54 AM

Quote:

Originally Posted by isarl

If you are on Linux then the iconv utility is likely already installed for you and can perform character encoding conversions. An outline of the process would probably be something like:

1. Explode/unzip your ebook into a temporary directory.

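2. For each file in the temporary directory that has the wrong encoding, convert it to UTF-8. NOTE: if you give iconv the wrong source encoding, it can introduce more encoding errors. Example follows. Note that the curly quotes I use are UTF-8 encoded characters; after running the echo command, “example” is UTF-8 encoded. Telling iconv that this file is cp1252 makes it read each three-byte UTF-8 quote as three separate cp1252 characters (â€œ), and the closing quote contains a byte (0x9D) that has no cp1252 mapping, so iconv stops with an error.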

Code:

# This example demonstrates an encoding error! Be careful not to do this by mistake.
$ echo '“Hello, world!”' >example
$ iconv -f cp1252 -t utf-8 example
â€œHello, world!â€iconv: illegal input sequence at position 18

Here is some bash-like shell code that might work, if you have chardetect available (on my system, this executable is provided by the python-chardet package):

Code:

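# Convert every file that chardetect identifies as Windows-1252 to UTF-8.
# find -print0 with read -d '' keeps file names containing spaces intact.
find . -type f -print0 | while IFS= read -r -d '' file ; do
    if [[ $(chardetect --minimal "$file") == "Windows-1252" ]] ; then
        iconv -f cp1252 -t utf-8 "$file" -o "$file.utf8" && mv "$file.utf8" "$file"
    fi
done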

3. Zip your book back up.

Character encodings can be a pain in the butt. I hope this helps.

I am using Windows 10.