The very first time an epub file is loaded, we run it through the following code to figure out its encoding, convert it to UTF-8, and change the line endings ...
See: Sigil/src/Misc/HTMLEncodingResolver.cpp
Code:
// Accepts a full path to an HTML file.
// Reads the file, detects the encoding
// and returns the text converted to Unicode.
QString HTMLEncodingResolver::ReadHTMLFile(const QString &fullfilepath)
{
    QFile file(fullfilepath);

    // Check if we can open the file
    if (!file.open(QFile::ReadOnly)) {
        std::string msg = file.fileName().toStdString() + ": " + file.errorString().toStdString();
        throw (CannotOpenFile(msg));
    }

    QByteArray data = file.readAll();

    // Only for valid UTF-8 input: rewrite every raw no-break space
    // (the UTF-8 bytes C2 A0) as the &nbsp; entity.
    if (IsValidUtf8(data)) {
        data.replace("\xC2\xA0", "&nbsp;");
    }

    // Decode with the detected codec, then normalize the line endings.
    return Utility::ConvertLineEndings(GetCodecForHTML(data)->toUnicode(data));
}
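To make the byte-level special case concrete, here is a minimal, Qt-free sketch of the same steps on a std::string, assuming the input is already UTF-8 (the codec detection done by GetCodecForHTML is left out). IsValidUtf8Sketch, ReplaceNbspIfUtf8 and NormalizeLineEndingsSketch are hypothetical names, not Sigil functions; the replacement text is taken as a parameter, and the "CRLF and lone CR become LF" rule is only an assumption about what Utility::ConvertLineEndings does. The UTF-8 gate matters because the bytes C2 A0 only mean U+00A0 in UTF-8; in a Latin-1 file they are two separate characters (Â followed by a no-break space).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: true if `data` is well-formed UTF-8
// (rejects overlong forms, UTF-16 surrogates and code points above U+10FFFF).
bool IsValidUtf8Sketch(const std::string &data)
{
    std::size_t i = 0;
    const std::size_t n = data.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(data[i]);
        if (c < 0x80) { ++i; continue; }              // plain ASCII byte
        std::size_t len;
        char32_t cp;
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;                            // stray continuation or bad lead byte
        assert(len >= 2 && len <= 4);
        if (i + len > n) return false;                // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(data[i + k]);
            if ((cc & 0xC0) != 0x80) return false;    // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        static const char32_t kMin[5] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < kMin[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}

// Replaces each UTF-8 no-break space (bytes C2 A0) with `replacement`,
// but only when the whole buffer is valid UTF-8 (mirrors the IsValidUtf8 gate).
std::string ReplaceNbspIfUtf8(std::string data, const std::string &replacement)
{
    if (!IsValidUtf8Sketch(data)) return data;
    const std::string nbsp = "\xC2\xA0";
    std::size_t pos = 0;
    while ((pos = data.find(nbsp, pos)) != std::string::npos) {
        data.replace(pos, nbsp.size(), replacement);
        pos += replacement.size();                    // continue after the inserted text
    }
    return data;
}

// Assumption: line endings are normalized by turning CRLF and lone CR into LF.
std::string NormalizeLineEndingsSketch(const std::string &text)
{
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            out += '\n';
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;  // skip the LF of a CRLF
        } else {
            out += text[i];
        }
    }
    return out;
}
```

For example, ReplaceNbspIfUtf8 applied to "a\xC2\xA0" "b" with "&nbsp;" as the replacement returns a&nbsp;b, while a Latin-1 buffer such as "caf\xE9\xC2\xA0" fails the UTF-8 check and comes back unchanged.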
I think the nbsp replacement in that code is the culprit: it is what special-cases the nbsp. We could remove this manual conversion and pass the text through PreserveEntities here instead, so that on first input every file contains only the entities the user specified.
I think it is a holdover from an earlier time that we never noticed, because we used to always run Mend on every file to do the universal updates, and that always ran everything through PreserveEntities.
How do you want to handle this? If we add the PreserveEntities code here instead of manually converting just the nbsp, the epub will at least present itself with the entities the user expects.
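The proposed alternative could look roughly like the following: a pass over the decoded text that writes entities only for the characters the user listed, so the no-break space gets no special treatment. This is a hypothetical sketch (UserEntity and ApplyUserEntitiesSketch are made-up names operating on UTF-8 std::string); Sigil's actual PreserveEntities signature and behaviour are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: one user-chosen entity, i.e. the UTF-8 encoded character
// and the text that should replace it (e.g. "&nbsp;" or "&#160;").
struct UserEntity {
    std::string utf8_char;
    std::string entity;
};

// Hypothetical stand-in for a PreserveEntities-style pass: rewrites only the
// listed characters and copies every other byte unchanged.
std::string ApplyUserEntitiesSketch(const std::string &text,
                                    const std::vector<UserEntity> &entities)
{
    std::string out;
    out.reserve(text.size());
    std::size_t i = 0;
    while (i < text.size()) {
        bool matched = false;
        for (const UserEntity &e : entities) {
            assert(!e.utf8_char.empty());             // an empty pattern would match everywhere
            if (text.compare(i, e.utf8_char.size(), e.utf8_char) == 0) {
                out += e.entity;                      // emit the entity the user asked for
                i += e.utf8_char.size();
                matched = true;
                break;
            }
        }
        if (!matched) out += text[i++];               // not a listed character: copy as-is
    }
    return out;
}
```

The single left-to-right scan is a deliberate choice: text that was just inserted for one entity is never rescanned, whereas repeated find-and-replace passes could turn an inserted &nbsp; into &amp;nbsp; if the user also listed &. With an empty list the text comes back untouched, which is the "only the entities the user specified" behaviour described above.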
KevinH
Quote:
Originally Posted by DiapDealer
Has anyone else been able to duplicate this issue? Is the no-break-space being special-cased here?