11-19-2019, 10:33 AM | #16 |
Sigil Developer
Posts: 7,608
Karma: 5433388
Join Date: Nov 2009
Device: many
The very first time an epub file is loaded, we run it through the following code to figure out its encoding, convert it to UTF-8, and normalize the line endings.
See: Sigil/src/Misc/HTMLEncodingResolver.cpp
Code:
// Accepts a full path to an HTML file.
// Reads the file, detects the encoding
// and returns the text converted to Unicode.
QString HTMLEncodingResolver::ReadHTMLFile(const QString &fullfilepath)
{
    QFile file(fullfilepath);

    // Check if we can open the file
    if (!file.open(QFile::ReadOnly)) {
        std::string msg = file.fileName().toStdString() + ": " + file.errorString().toStdString();
        throw (CannotOpenFile(msg));
    }

    QByteArray data = file.readAll();

    if (IsValidUtf8(data)) {
        data.replace("\xC2\xA0", "&nbsp;");
    }

    return Utility::ConvertLineEndings(GetCodecForHTML(data)->toUnicode(data));
}
I think this is a holdover from an earlier time that we never noticed, because we used to always run Mend on every file to do the universal updates, and that always ran things through PreserveEntities.
How do you want to handle this? If we add PreserveEntities code here instead of manually setting that one entity, at least the epub will present itself with the entities the user expects.
KevinH
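For anyone following along, here is a rough plain-C++ sketch (no Qt) of the byte-level steps ReadHTMLFile performs, leaving out the GetCodecForHTML detection. It assumes the replacement text is the `&nbsp;` entity, as the later "forced entity conversion" posts indicate, and that Utility::ConvertLineEndings maps CRLF and lone CR to LF. `isValidUtf8`, `replaceAll`, `convertLineEndings` and `loadSketch` are hypothetical stand-ins, not Sigil's actual functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for IsValidUtf8(): structural UTF-8 check that also
// rejects overlong forms, UTF-16 surrogates and code points above U+10FFFF.
bool isValidUtf8(const std::string &s) {
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned int cp;
        if (c < 0x80) { ++i; continue; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;
        if (i + len > n) return false;
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // must be a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// Replace every occurrence of `from` with `to` (like QByteArray::replace).
std::string replaceAll(std::string s, const std::string &from, const std::string &to) {
    if (from.empty()) return s;
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
    return s;
}

// Assumed ConvertLineEndings behaviour: CRLF and lone CR both become LF.
std::string convertLineEndings(const std::string &s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '\r') {
            out += '\n';
            if (i + 1 < s.size() && s[i + 1] == '\n') ++i;  // skip the LF of a CRLF
        } else {
            out += s[i];
        }
    }
    return out;
}

// Load-time steps; `forceNbspEntity` toggles the hard-coded NBSP -> &nbsp; step.
std::string loadSketch(std::string data, bool forceNbspEntity) {
    if (forceNbspEntity && isValidUtf8(data))
        data = replaceAll(data, "\xC2\xA0", "&nbsp;");  // U+00A0 as raw UTF-8 bytes
    return convertLineEndings(data);
}
```

With the forced step on, a file containing a literal non-breaking space comes back with `&nbsp;` text in it whether or not the user asked for that entity; with it off, the character is left alone and only the line endings change.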
11-19-2019, 10:43 AM | #17 |
Sigil Developer
Posts: 7,608
Karma: 5433388
Join Date: Nov 2009
Device: many
Actually, running it through PreserveEntities without Gumbo being involved will not help: anything that was already an entity will stay an entity.
So perhaps we should simply delete this forced entity conversion and let the user decide when to run Mend to get only the entities they want in every file.
KevinH
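To illustrate why a string-level pass cannot undo the conversion, here is a hypothetical plain-C++ sketch. `replaceAll`, `preserveEntitiesSketch` and `decodeNbspSketch` are illustrative names, not Sigil's PreserveEntities API. It assumes the pass only rewrites literal characters into the named entities the user wants; `decodeNbspSketch` stands in, for a single entity, for the decoding an HTML parser such as Gumbo does for all of them.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Replace every occurrence of `from` in `s` with `to`, scanning left to right.
std::string replaceAll(std::string s, const std::string &from, const std::string &to) {
    if (from.empty()) return s;
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();
    }
    return s;
}

// Hypothetical string-level "preserve entities" pass. `wanted` maps the UTF-8
// bytes of a character to the named entity the user wants for it. Only literal
// characters are rewritten; text that is already an entity is never touched.
std::string preserveEntitiesSketch(std::string text,
                                   const std::map<std::string, std::string> &wanted) {
    for (const auto &kv : wanted)
        text = replaceAll(text, kv.first, kv.second);
    return text;
}

// Hypothetical one-entity stand-in for parser decoding: "&nbsp;" -> U+00A0.
std::string decodeNbspSketch(const std::string &text) {
    return replaceAll(text, "&nbsp;", "\xC2\xA0");
}
```

With an empty `wanted` map, `a&nbsp;b` comes back unchanged; only after the entity has been decoded does the same call leave a plain U+00A0 character. That is the sense in which a parser has to be in the loop.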
11-19-2019, 11:01 AM | #18 | |
Bibliophagist
Posts: 35,238
Karma: 145277352
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
What? This isn't a democracy???
11-19-2019, 11:43 AM | #19 |
Grand Sorcerer
Posts: 27,542
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
11-19-2019, 11:43 AM | #20 |
Grand Sorcerer
Posts: 27,542
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
11-19-2019, 11:53 AM | #21 |
Sigil Developer
Posts: 7,608
Karma: 5433388
Join Date: Nov 2009
Device: many
will do
11-19-2019, 12:47 PM | #22 |
Sigil Developer
Posts: 7,608
Karma: 5433388
Join Date: Nov 2009
Device: many
|
change pushed to master