07-19-2014, 04:30 PM   #38
KevinH
Sigil Developer
Hi DiapDealer

The problem is that QWebView automatically converts all named and numeric entities internally, on the fly, as soon as the document is loaded. So in WYSIWYG mode you will never see an nbsp entity, as it will appear as just a space.

After pulling the HTML back out of QWebView you will have lost all entities (i.e. they were converted to their Unicode character counterparts).

To partially deal with that, Sigil uses the CleanSource class, which can invoke Tidy, pretty-print, etc. One method of that class is given below:

Code:
QString CleanSource::NbspToEntity(const QString &source)
{
    QString new_source = source;
    // Replace each non-breaking space (U+00A0) with its numeric entity.
    new_source.replace(QChar(160), "&#160;");
    return new_source;
}
I think you can easily change that to the named entity if you want, but I will have to look closer to be sure.

My idea would be to create a Prefs dialog listing the 10 or so most common named entities, which the user can enable or disable, and to expand this routine to replace the selected QChar values with the desired named entities (i.e. it would only preserve those entities as named entities).
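To make that idea concrete, here is a rough sketch of what the expanded routine might look like. It is not Sigil code: it uses std::u16string as a stand-in for QString so it compiles without Qt, and the names EntityTable and CharsToEntities are made up for illustration. The table would be filled from whatever the user enabled in the (hypothetical) Prefs dialog.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical table: UTF-16 code unit -> entity text to write out.
// Assumption: it only holds the entities the user has enabled.
using EntityTable = std::unordered_map<char16_t, std::u16string>;

// Walk the source once and replace every character that has an entry in
// `enabled` with its entity. All other characters are copied unchanged.
std::u16string CharsToEntities(const std::u16string &source,
                               const EntityTable &enabled)
{
    std::u16string result;
    result.reserve(source.size());
    for (char16_t ch : source) {
        auto it = enabled.find(ch);
        if (it != enabled.end()) {
            result += it->second;   // enabled: emit the entity
        } else {
            result += ch;           // not enabled: keep the character
        }
    }
    return result;
}
```

With a table containing only U+00A0 mapped to the nbsp named entity, this would behave like NbspToEntity but emit the named form; an empty table leaves the text untouched. A single pass over the source should also be cheaper than calling replace() once per entity.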

A similar approach could be used for numeric entities as well.

The key is that this routine is invoked a lot and so must run quickly, which makes supporting all possible entities hard.

I am thinking of a hash table (dictionary) that maps each char value to its replacement value, or a simple range check followed by subtracting a base value to get an offset into a table of replacement values.

That should make things fast even for 10 to 20 named entities.
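The range check variant could be sketched roughly like this. Again this is only an illustration, not Sigil code: std::u16string stands in for QString, and kBase, kEntities and RangeToEntities are made-up names. It relies on the fact that the Latin-1 characters U+00A0 through U+00A9 are contiguous, so one comparison and one subtraction give the index into the table.

```cpp
#include <cstddef>
#include <string>

// First code unit covered by the table (U+00A0, the non-breaking space).
static const char16_t kBase = 0x00A0;

// Named entities for U+00A0..U+00A9, in code point order.
static const char16_t *const kEntities[] = {
    u"&nbsp;", u"&iexcl;", u"&cent;", u"&pound;", u"&curren;",
    u"&yen;", u"&brvbar;", u"&sect;", u"&uml;", u"&copy;"
};
static const std::size_t kCount = sizeof(kEntities) / sizeof(kEntities[0]);

// Replace characters inside [kBase, kBase + kCount) with the entity at
// offset (ch - kBase). Everything outside the range is copied unchanged.
std::u16string RangeToEntities(const std::u16string &source)
{
    std::u16string result;
    result.reserve(source.size());
    for (char16_t ch : source) {
        if (ch >= kBase && ch < kBase + kCount) {
            result += kEntities[ch - kBase];
        } else {
            result += ch;
        }
    }
    return result;
}
```

The trade-off is that this only works for entities whose characters sit in one contiguous block. Entities scattered across Unicode (dashes, curly quotes and so on) would need either several ranges or the hash table approach.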


What do you think? Again, I have just eyeballed this briefly, so I could be all wet here.

Kevin



Quote:
Originally Posted by DiapDealer
Hey Kevin,

If you do look into preventing entity replacement, take a peek at this thread that documents the steps to reproduce 0.7.4 eating markup (regarding the &nbsp; entity). Post #13 has the steps to duplicate.

I've just not been able to get my head around the codebase (C++ and Qt double-whammy). The problem originated in the changes from 0.7.3 to 0.7.4, where it was decided that &nbsp; would be replaced with &#160; to fix the problem of 0.7.3 barking about missing DOCTYPEs (because Sigil would replace the Unicode non-breaking space character with &nbsp; in documents it was opening/importing when Tidy was turned off).

That bug is one of the main reasons I've held off on using 0.7.4. It's easy to make sure my epubs have a doctype before editing ... not so easy figuring out whether markup got silently eaten.

Maybe your work will make it go away.

Last edited by KevinH; 07-19-2014 at 04:39 PM.