Old 04-23-2017, 12:43 PM   #3
Doitsu
Grand Sorcerer
Posts: 5,739
Karma: 24031403
Join Date: Dec 2010
Device: Kindle PW2
@CalibUser: I tested the plugin with a UTF-8 text file and it didn't decode it correctly.
Since Sigil comes with bs4, I'd recommend using soup.original_encoding to detect the original encoding.

For example:

Code:
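from sigil_bs4 import BeautifulSoup

def run(bk):
    # more code...
    # fHandle comes from the elided code above; open the file in binary
    # mode so bs4 receives the raw bytes and detects the encoding itself.
    with open(fHandle.name, "rb") as binary_file:
        data = binary_file.read()
    soup = BeautifulSoup(data)
    # Encoding that bs4 detected for the input bytes
    print(soup.original_encoding)
    # Early exit, only for testing the detection
    return -1
    # more code...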
The above code correctly identified my UTF-8 test file. Of course, if you use bs4 as a filter, you might as well use str(soup) to convert an input file with unknown encoding to UTF-8.
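To illustrate the str(soup) idea, here is a minimal sketch, not tested inside Sigil itself. The helper name to_utf8 is made up for this example, the fallback import of the standalone bs4 package is my assumption for running it outside Sigil, and I'm assuming the "html.parser" parser is passed explicitly.

```python
try:
    from sigil_bs4 import BeautifulSoup  # bundled with Sigil, as in the post
except ImportError:
    from bs4 import BeautifulSoup  # assumption: standalone bs4 outside Sigil


def to_utf8(raw_bytes):
    """Hypothetical helper: return (detected_encoding, utf8_bytes)."""
    # Passing bytes lets bs4 detect the encoding on its own.
    soup = BeautifulSoup(raw_bytes, "html.parser")
    # str(soup) is a Unicode string; encode it explicitly as UTF-8.
    return soup.original_encoding, str(soup).encode("utf-8")
```

Inside run(bk) you could then call to_utf8(data) on the bytes you read from the file. Keep in mind that str(soup) re-serializes the input as markup, so in a plain text file characters like & will likely come back as entities.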