02-16-2014, 08:07 AM | #1 |
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
|
Stress tests with unicode "special" characters
Just for fun, I've started playing a little with unicode characters and I think I've found some "bugs"/possible enhancements.
First one. It really seems to be a true bug. Example given here (in Wikipedia>Unicode in Spanish). Some characters, like accented letters or the Spanish "ñ", can be built either as a single character ("ñ" is U+00F1) or as a combination of "n" (U+006E) plus a combining tilde (U+0303). OK, the problem is that if you use the second option, the next character is also rendered ABOVE that compound "ñ" in the Editor. Look at the attached screenshot, where the "<" from "</p>" is rendered over the last "ñ". The first "ñ" is a true single one (U+00F1).
Second one. 50% chance of being a bug. If I type "ñ" in the search text box, Find/Replace gives me one match: the true "ñ" (U+00F1), but not the combined one. BUT if I use Count All/Replace All, I get both matches. An inconsistency here?
Third one. Clearly a feature request. Imagine you've solved issue #1. Even so, this kind of situation (or the presence of a soft hyphen (U+00AD), and possibly of other "hidden" characters) is a PITA for editing, because you get no visual clue of their presence. For example, in the starting "cubierta" word I've added a soft hyphen which you cannot see. So if I search for "cubierta" I don't get any match...
If possible, I would like to have a toggle option much like the "paragraph icon" button in MS-Word. This button/preference would cause all these "hidden" symbols to be explicitly rendered with arbitrary but related characters. (I say arbitrary because their "true" rendering is invisible.) As possible examples:
What do you think about these issues/possible enhancements? |
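For anyone who wants to reproduce the effect outside the editor, here is a minimal Python sketch. It only illustrates the Unicode behaviour described above; the sample string is made up, and nothing here claims to be how the calibre editor searches internally.

```python
import unicodedata

composed = "\u00f1"      # "ñ" as a single code point
decomposed = "n\u0303"   # "n" followed by COMBINING TILDE

# They render alike but are different strings of different lengths.
same_as_typed = composed == decomposed

# NFC normalization folds the two-code-point form into the single one.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed

# Hypothetical sample: a soft hyphen (U+00AD) hidden inside "cubierta".
sample = "cu\u00adbierta"
found_raw = "cubierta" in sample
found_stripped = "cubierta" in sample.replace("\u00ad", "")

print(same_as_typed, same_after_nfc, found_raw, found_stripped)
```

A literal search fails in both cases for the same reason: the text that is displayed is not the text that is stored.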
02-16-2014, 08:38 AM | #2 |
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Don't ever type combining symbols like letters and their accents separately. Just because it is possible in unicode does not mean it is a good idea. It causes all the problems you just discovered. In fact, the editor automatically normalizes all such cases into their combined form when first loading the text.
As for invisible characters, there are lots and lots of those. The problem with replacing them with visible characters is that you then have to keep track of when those visible characters come from the replacement and when they are part of the original document. That's not really do-able robustly. |
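The tracking problem can be seen in a small Python sketch. The placeholder character (a middle dot) and the helper names `show` and `hide` are assumptions chosen for illustration, not anything the editor actually does.

```python
SOFT_HYPHEN = "\u00ad"
PLACEHOLDER = "\u00b7"   # MIDDLE DOT, a hypothetical visible stand-in

def show(text: str) -> str:
    """Replace every soft hyphen with the visible placeholder."""
    return text.replace(SOFT_HYPHEN, PLACEHOLDER)

def hide(text: str) -> str:
    """Naively undo show() by turning every placeholder back into a soft hyphen."""
    return text.replace(PLACEHOLDER, SOFT_HYPHEN)

# The document already contains a genuine middle dot of its own.
original = "cu\u00adbierta \u00b7 fin"
round_tripped = hide(show(original))

# The genuine middle dot has been turned into a second soft hyphen.
print(round_tripped == original)
```

Without a record of which placeholders came from replacements, the reverse step cannot tell them apart from characters that were in the document all along.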
02-16-2014, 01:28 PM | #3 | ||
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
|
Quote:
Quote:
Please remember that I don't want a full replacement, just one for viewing. I mean that the real HTML text would always contain the soft hyphen (or any other such character), regardless of how it is rendered in the editor. Even just ONE symbol telling the user "invisible character here" would be great. That symbol plus the current identification tool we have in the lower right corner would be more than enough.
And maybe you cannot keep track of all the possible invisible items, but maybe you could create a "list" with the ones you currently know and update it as best you can. If the editor catches a symbol that is in the "list", the "invisible character here" replacement would be rendered (if selected by the user in the related preference or by whichever other method). Of course, there's a risk of missing some or lots of them. But showing just some of them is many times better (about infinity) than showing none. Just thoughts (possibly nonsense...) |
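As a rough illustration of the "list" idea, here is a Python sketch that only reports where known invisible characters sit, without touching the text. The set `KNOWN_INVISIBLE` and the function `find_invisibles` are hypothetical names, and the list is deliberately incomplete.

```python
import unicodedata

# A hand-maintained, incomplete list of characters with no visible glyph.
KNOWN_INVISIBLE = {
    "\u00ad",  # soft hyphen
    "\u200b",  # zero width space
    "\u200c",  # zero width non-joiner
    "\u200d",  # zero width joiner
    "\u2060",  # word joiner
}

def find_invisibles(text):
    """Return (index, code point, Unicode name) for each listed character found."""
    hits = []
    for index, ch in enumerate(text):
        if ch in KNOWN_INVISIBLE:
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return hits

report = find_invisibles("cu\u00adbierta")
print(report)
```

A report like this could at least be run over a file to learn that something hidden is there, even if the editor itself cannot draw it.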
||
02-16-2014, 02:26 PM | #4 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
I thought of that too, but it could be pretty heavily colored with some of the junk I work with. It is worthy of some consideration though.
|
02-16-2014, 07:34 PM | #5 |
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There is no "view" in the editor. The only way to get the editor to display hidden characters is to actually replace the hidden characters with whatever symbolic character you need. At that point you have to keep track of every such replacement, so you can undo them when saving the text.
Or re-implement the entire QPlainTextEdit control from scratch to support a special rendering mode for non-visible characters. One hackish way that you *may* be able to achieve this is to use a special font that has a symbol glyph for hidden characters. However, that will only work if the Qt text layout engine allows the font to determine the rendering of non-visible characters, which may not be the case. |
02-17-2014, 04:29 PM | #6 |
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
|
Continuing noob brainstorming mode, so expect more nonsense per square inch than in your favourite sitcom...
(But just in case some of the stupid things I'm going to say fire up some of your brain gears by sheer luck.)
I start with a question and a needed hypothesis. Whenever any change is made to the text, do you know what the modifications are in terms of "absolute positions" in the text? I mean, if I type and insert "whatever", are you able to know that "whatever" was inserted at the 98th position of the whole text? Or if I copy/paste "whatever" over "whatever other text", are you able to detect that positions 101 to 120 of the text are being erased and substituted by a new set of just 8 characters?
Because if the answer is "Yes, I (Kovid) can easily know the positions of the modifications that are constantly being made through any of the possible editing actions", then maybe the approach to implement this useful feature could be the following:
In this way you wouldn't need to keep track of the exact substitutions made between "real" and "visibilized" copies. You just have to be careful about replicating exactly the same modifications on both copies of the text. Please do not be too harsh with your favourite buffoon... |
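To make the two-copies idea concrete, here is a Python sketch. It assumes the editor can report every change as a position, a number of removed characters and the inserted text, which is exactly the hypothesis in question. The names `make_display` and `apply_edit` and the marker character are invented for illustration.

```python
INVISIBLE = {"\u00ad", "\u200b"}   # assumed short list of hidden characters
MARKER = "\u00b7"                  # hypothetical "invisible character here" symbol

def make_display(real: str) -> str:
    """Build the visible copy: one marker per hidden character, same length."""
    return "".join(MARKER if ch in INVISIBLE else ch for ch in real)

def apply_edit(real: str, shown: str, pos: int, removed: int, inserted: str):
    """Apply one edit to both copies; positions match because lengths match."""
    new_real = real[:pos] + inserted + real[pos + removed:]
    new_shown = shown[:pos] + make_display(inserted) + shown[pos + removed:]
    return new_real, new_shown

real = "cu\u00adbierta roja"
shown = make_display(real)

# Replace the 4 characters at position 10 ("roja") with text holding a soft hyphen.
real, shown = apply_edit(real, shown, 10, 4, "az\u00adul")
print(shown == make_display(real))
```

Because each hidden character maps to exactly one marker, both copies stay the same length, so no mapping between "real" and "visibilized" positions is needed. The price is holding the text twice.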
02-17-2014, 09:54 PM | #7 |
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No, QPlainTextEdit does not expose detailed change information. You know when the text changes, not exactly what the change was. And in any case, maintaining two copies of the text that would need to be synchronized would double memory consumption and be very slow.
The only workable solution for this is to change the rendering code to simply render invisible characters visibly. But that requires modifying the guts of Qt's text rendering system and is waaaay too much work. |
02-19-2014, 10:08 AM | #8 |
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
|
Thx for your explanations Kovid.
A pity, it's so impossible... |