Old 11-19-2019, 10:33 AM   #16
KevinH
Sigil Developer
The very first time an epub file is loaded, we run each HTML file through the following code to detect its encoding, convert the text to Unicode, and normalize the line endings ...

See: Sigil/src/Misc/HTMLEncodingResolver.cpp

Code:
// Accepts a full path to an HTML file.
// Reads the file, detects the encoding
// and returns the text converted to Unicode.
QString HTMLEncodingResolver::ReadHTMLFile(const QString &fullfilepath)
{
    QFile file(fullfilepath);

    // Check if we can open the file
    if (!file.open(QFile::ReadOnly)) {
        std::string msg = file.fileName().toStdString() + ": " + file.errorString().toStdString();
        throw (CannotOpenFile(msg));
    }

    QByteArray data = file.readAll();

    if (IsValidUtf8(data)) {
        data.replace("\xC2\xA0", " ");
    }

    return Utility::ConvertLineEndings(GetCodecForHTML(data)->toUnicode(data));
}
I think this is the culprit: the data.replace() call on valid UTF-8 input is what special-cases the nbsp (bytes C2 A0). We could remove this manual conversion and instead pass the text through PreserveEntities here, so that files on first input always contain only the entities the user specified.

I think it is a holdover from an earlier time. We never noticed it because we used to always run Mend on every file to do the universal updates, and that always ran things through PreserveEntities.

How do you want to handle this? If we add the PreserveEntities code here instead of manually setting that one entity, at least the epub will present itself with the entities the user expects.
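
To make the idea concrete, here is a rough sketch of what "only the entities the user specified" could look like on first load. It is not Sigil's actual PreserveEntities code: the name PreserveEntitiesSketch, the std::set parameter, and the use of std::string instead of QByteArray/QString are all assumptions made so the example stands alone without Qt. It also assumes the input has already passed a check like IsValidUtf8.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical helper (NOT Sigil's real PreserveEntities).
// Walks UTF-8 text and rewrites every code point found in `preserved`
// as a numeric character reference such as "&#160;".
// All other bytes are copied through unchanged.
// Assumption: `utf8` is valid UTF-8.
std::string PreserveEntitiesSketch(const std::string &utf8,
                                   const std::set<std::uint32_t> &preserved)
{
    std::string out;
    out.reserve(utf8.size());
    std::size_t i = 0;

    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = 1;
        std::uint32_t cp = lead;

        // Work out the sequence length from the lead byte.
        if (lead >= 0xF0) {
            len = 4; cp = lead & 0x07;
        } else if (lead >= 0xE0) {
            len = 3; cp = lead & 0x0F;
        } else if (lead >= 0xC0) {
            len = 2; cp = lead & 0x1F;
        }

        // Stray continuation byte or truncated sequence: copy one byte as-is.
        const bool stray = (lead >= 0x80 && lead < 0xC0);
        if (stray || i + len > utf8.size()) {
            out.push_back(utf8[i]);
            ++i;
            continue;
        }

        // Fold the continuation bytes into the code point.
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }

        if (preserved.count(cp) != 0) {
            out += "&#" + std::to_string(cp) + ";";   // user wants this one kept as an entity
        } else {
            out.append(utf8, i, len);                  // leave the character alone
        }
        i += len;
    }
    return out;
}
```

With a preserved set containing only 0xA0, an nbsp in the input would come out as &#160; and everything else would be left untouched. With an empty set nothing is rewritten, which is the point: the nbsp stops being a hard-coded special case and simply follows the user's list.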

KevinH



Quote:
Originally Posted by DiapDealer
Has anyone else been able to duplicate this issue? Is the no-break-space being special-cased here?