MobileRead Forums > E-Book Software > Calibre > Editor
02-16-2014, 08:07 AM   #1
arspr
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
Stress tests with unicode "special" characters

Just for fun, I've started playing a little with unicode characters and I think I've found some "bugs"/possible enhancements.



First one. This really seems to be a true bug. An example is given here (Wikipedia > Unicode, in Spanish). Some characters, such as accented letters or the Spanish "ñ", can be encoded either as a single character ("ñ" is U+00F1) or as a combination of "n" (U+006E) plus a combining tilde (U+0303).

OK, the problem is that if you use the second option, the Editor also renders the next character ABOVE that composed "ñ". Look at the attached screenshot, where the "<" from "</p>" is rendered over the last "ñ". The first "ñ" is a true single character (U+00F1).
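
For anyone who wants to check which form a file contains: the two spellings of "ñ" are different code point sequences even though they look identical. The short sketch below (Python only because calibre is written in it; nothing here is calibre API) prints the code points of each form using the standard unicodedata module. It does not reproduce the rendering bug itself.

```python
import unicodedata

precomposed = "\u00f1"   # LATIN SMALL LETTER N WITH TILDE
decomposed = "n\u0303"   # LATIN SMALL LETTER N + COMBINING TILDE

for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
    code_points = " ".join(f"U+{ord(c):04X}" for c in s)
    print(f"{label}: {len(s)} code point(s): {code_points}")

# Both render as "ñ", yet they are different strings.
print(precomposed == decomposed)
# A non-zero combining class means the mark attaches to the preceding base character.
print(unicodedata.name("\u0303"), unicodedata.combining("\u0303"))
```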



Second one. 50% chance of being a bug. If I type "ñ" in the search box, Find/Replace finds one match: the true "ñ" (U+00F1), but not the combined one. BUT if I use Count All/Replace All, I get both matches. Isn't that inconsistent?
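
One way such a discrepancy can arise (a guess; the thread does not say how calibre's two search paths differ) is that one path compares raw code points while the other normalizes the text first. The hypothetical helper count_matches below shows the same needle producing 1 or 2 matches depending on whether both strings are NFC-normalized before counting.

```python
import unicodedata

def count_matches(text: str, needle: str, normalize: bool) -> int:
    """Count non-overlapping occurrences of needle in text.

    With normalize=True both strings are converted to NFC first, so a
    decomposed "n" + U+0303 is matched by a precomposed "ñ".
    """
    if normalize:
        text = unicodedata.normalize("NFC", text)
        needle = unicodedata.normalize("NFC", needle)
    return text.count(needle)

# One precomposed "ñ" and one decomposed "n" + combining tilde.
sample = "ca\u00f1a y ca" + "n\u0303" + "a"
print(count_matches(sample, "\u00f1", normalize=False))  # 1
print(count_matches(sample, "\u00f1", normalize=True))   # 2
```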



Third one. Clearly a feature request. Suppose issue #1 is solved. Even so, situations like this (or the presence of a soft hyphen (U+00AD), and possibly of other "hidden" characters) are a PITA for editing, because you get no visual clue of their presence.

For example, in the word "cubierta" at the start of the text I've added a soft hyphen, which you cannot see. So if I search for "cubierta" I don't get any match...
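
In a regex-capable search, one possible workaround is to allow an optional soft hyphen between every pair of letters. The helper soft_hyphen_tolerant_pattern below is hypothetical, a minimal sketch of that idea using only the standard re module:

```python
import re

SOFT_HYPHEN = "\u00ad"

def soft_hyphen_tolerant_pattern(word: str) -> re.Pattern:
    """Compile a regex matching `word` even if soft hyphens sit between its letters."""
    optional_shy = re.escape(SOFT_HYPHEN) + "?"
    return re.compile(optional_shy.join(re.escape(ch) for ch in word))

text = "<p>cu" + SOFT_HYPHEN + "bierta de la obra</p>"
print("cubierta" in text)                                            # False
print(bool(soft_hyphen_tolerant_pattern("cubierta").search(text)))  # True
```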

If possible, I would like a toggle option much like the "paragraph mark" (¶) button in MS Word. This button/preference would cause all these "hidden" symbols to be rendered explicitly with arbitrary but related characters (arbitrary because their "true" rendering is invisible). Possible examples:
  • Soft hyphens replaced by · (as this is how they usually appear in printed dictionaries), but with a red background to distinguish them from a true ·, in the same way that non-breaking spaces and some other symbols are currently shown with a yellow background.
  • The combined "ñ" would be split into its components: a plain "n" followed by a ~ (also on a red background, to distinguish it from a true "~" symbol).
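
The mapping proposed in the two examples can be sketched as a display-only translation table. This is an illustration under assumptions: visualize and DISPLAY_MAP are made-up names, and the red background is a rendering concern that a plain string cannot express.

```python
# Hypothetical display-only mapping taken from the two examples in the post.
DISPLAY_MAP = str.maketrans({
    "\u00ad": "\u00b7",  # SOFT HYPHEN     -> MIDDLE DOT
    "\u0303": "~",       # COMBINING TILDE -> TILDE
})

def visualize(text: str) -> str:
    """Return a copy for display only; the stored text is never modified."""
    return text.translate(DISPLAY_MAP)

print(visualize("cu\u00adbierta, ca" + "n\u0303" + "a, ca\u00f1a"))
# cu·bierta, can~a, caña  (the precomposed "ñ" is left alone)
```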

What do you think about these issues/possible enhancements?
Attached screenshot: Combining tilde.jpg

02-16-2014, 08:38 AM   #2
kovidgoyal
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Don't ever type letters and their accents as separate combining characters. Just because it is possible in Unicode does not mean it is a good idea: it causes all the problems you just discovered. In fact, the editor automatically normalizes all such cases into their combined form when it first loads the text.
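
The thread does not say which normalization the editor applies. The standard way to obtain the composed ("combined") form is Unicode Normalization Form C (NFC); the sketch below assumes that is the kind of normalization meant, and normalize_on_load is a hypothetical name. Note that NFC only composes sequences for which a precomposed character exists; other letter plus combining-mark sequences stay decomposed.

```python
import unicodedata

def normalize_on_load(raw: str) -> str:
    """Compose letter + combining-mark pairs into single code points (NFC)."""
    return unicodedata.normalize("NFC", raw)

raw = "<p>ca" + "n\u0303" + "a</p>"   # decomposed "ñ", as typed
loaded = normalize_on_load(raw)
print(len(raw), len(loaded))          # 12 11
print(loaded == "<p>ca\u00f1a</p>")   # True
```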

As for invisible characters, there are lots and lots of them. The problem with replacing them with visible characters is that you then have to keep track of which visible characters come from the replacement and which are part of the original document. That's not really doable robustly.
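
A minimal illustration of that ambiguity, assuming a naive scheme that shows soft hyphens as "·" and maps every "·" back on save (the names are hypothetical):

```python
SOFT_HYPHEN, MIDDLE_DOT = "\u00ad", "\u00b7"

# A document with a soft hyphen AND a genuine middle dot.
original = "cu" + SOFT_HYPHEN + "bierta, 3" + MIDDLE_DOT + "4"

shown = original.replace(SOFT_HYPHEN, MIDDLE_DOT)   # naive visible stand-in
restored = shown.replace(MIDDLE_DOT, SOFT_HYPHEN)   # naive reverse mapping on save

print(restored == original)  # False: the genuine "·" was turned into a soft hyphen too
```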
02-16-2014, 01:28 PM   #3
arspr
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
Quote:
Originally Posted by kovidgoyal
Don't ever type letters and their accents as separate combining characters. Just because it is possible in Unicode does not mean it is a good idea: it causes all the problems you just discovered. In fact, the editor automatically normalizes all such cases into their combined form when it first loads the text.
Perfect. #1 and #2 are solved, as they become just a masochistic-geek issue.


Quote:
Originally Posted by kovidgoyal
As for invisible characters, there are lots and lots of them. The problem with replacing them with visible characters is that you then have to keep track of which visible characters come from the replacement and which are part of the original document. That's not really doable robustly.
Entering brainstorming noob mode.

Please remember that I don't want an actual replacement, just one for viewing. I mean, the real HTML text always contains the soft hyphen (or whatever other character), regardless of how it is rendered in the editor.

But even so, maybe just ONE symbol telling the user "invisible character here" would be great. That symbol plus the character identification tool we currently have in the lower right corner would be more than enough.

And maybe you cannot keep track of all possible invisible characters, but maybe you could create a "list" of the ones you currently know and update it as best you can. If the editor encounters a character that is in the "list", the "invisible character here" symbol would be rendered (if the user has enabled it via the related preference or some other method). Of course, there's a risk of missing some or many of them. But showing just some of them is far better (infinitely so) than showing none.
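
The list-based idea can be sketched like this. Everything here is an assumption for illustration: KNOWN_INVISIBLE is a small hand-picked sample, the ◇ marker is arbitrary, and also flagging any character of Unicode category Cf ("format") is an optional safety net beyond the hand-maintained list. Each hidden character is replaced by exactly one marker, so positions in the display copy still line up with the real text.

```python
import unicodedata

# Hypothetical, hand-maintained sample of characters that render invisibly.
KNOWN_INVISIBLE = {
    "\u00ad",  # SOFT HYPHEN
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE (byte order mark)
    "\u2800",  # BRAILLE PATTERN BLANK (not a format character, so only the list catches it)
}
MARKER = "\u25c7"  # WHITE DIAMOND, an arbitrary "invisible character here" symbol

def is_invisible(ch: str) -> bool:
    # The list comes first; category "Cf" (format) is an optional extra safety net.
    return ch in KNOWN_INVISIBLE or unicodedata.category(ch) == "Cf"

def mark_invisible(text: str) -> str:
    """Display copy with one marker per hidden character, so offsets stay aligned."""
    return "".join(MARKER if is_invisible(ch) else ch for ch in text)

print(mark_invisible("cu\u00adbier\u200bta"))  # cu◇bier◇ta
```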

Just thoughts, (possibly nonsense...)
02-16-2014, 02:26 PM   #4
mrmikel
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
I thought of that too, but with some of the junk I work with, the text could end up pretty heavily colored. It is worth some consideration, though.
02-16-2014, 07:34 PM   #5
kovidgoyal
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
There is no "view" in the editor. The only way to get the editor to display hidden characters is to actually replace the hidden characters with whatever symbolic character you need. At that point you have to keep track of every such replacement, so you can undo them when saving the text.
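
The bookkeeping described above can be sketched as follows (hypothetical names, soft hyphens only). Restoring from recorded positions works while nothing is edited, but a single insertion before a recorded position shifts everything, so every edit would also have to update the recorded positions:

```python
from __future__ import annotations

SOFT_HYPHEN, STAND_IN = "\u00ad", "\u00b7"

def substitute(text: str) -> tuple[str, list[int]]:
    """Show soft hyphens as a visible stand-in and record where each one was."""
    positions = [i for i, ch in enumerate(text) if ch == SOFT_HYPHEN]
    return text.replace(SOFT_HYPHEN, STAND_IN), positions

def restore(shown: str, positions: list[int]) -> str:
    """Undo the substitution on save using the recorded positions."""
    chars = list(shown)
    for i in positions:
        chars[i] = SOFT_HYPHEN
    return "".join(chars)

original = "cu\u00adbierta 3\u00b74"   # soft hyphen plus a genuine middle dot
shown, positions = substitute(original)
print(restore(shown, positions) == original)            # True: nothing was edited

edited = "La " + shown                                  # user types 3 characters at the start
print(restore(edited, positions) == "La " + original)   # False: positions are now stale
```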

Or re-implement the entire QPlainTextEdit control from scratch to support a special rendering mode for non-visible characters.

One hackish way that you *may* be able to achieve this is to use a special font that has a symbol glyph for hidden characters. However, that will only work if the Qt text layout engine lets the font determine how non-visible characters are rendered, which may not be the case.
02-17-2014, 04:29 PM   #6
arspr
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
Continuing noob brainstorming mode, so expect more nonsense per square inch than in your favourite sitcom...

(But just in case one of the stupid things I'm going to say sets some of your brain gears turning by sheer luck.)


I'll start with a question and a necessary hypothesis. Whenever any change is made to the text, do you know what modifications were made, expressed as "absolute positions" in the text? I mean, if I type "whatever", can you tell that "whatever" was inserted at the 98th position of the whole text? Or if I copy/paste "whatever" over "whatever other text", can you detect that positions 101 to 120 are being erased and replaced by a new set of just 8 characters?

Because if the answer is "Yes, I (Kovid) can easily know the positions of the modifications that are constantly being made through any of the possible editing actions", then maybe this useful feature could be implemented as follows:
  • Two synchronized copies of the text are needed: one is the "real" HTML text and the other is the "visibilized" one.
  • The "visibilized" copy is built just once from the "real" one, with the invisible characters substituted by some other convenient character (or characters).
  • The Editor only knows about the "visibilized" copy, but the important one is the "real" one.
  • Whenever the "visibilized" copy, which is managed through the Editor, changes, since you know the positions of those changes, the very same changes are replicated in the "real" copy at exactly those positions.
  • If any of the changes contains a new invisible character, it is automatically substituted, but only in the "visibilized" copy.
  • Any kind of search is always done on the "real" text, not on the "visibilized" one.
  • Any kind of replacement is first done on the "real" text and then replicated in the "visibilized" one through the very same absolute-position procedure (but in the opposite direction).
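
Under one extra assumption (each invisible character is shown as exactly one visible character, so offsets in both copies always match), the proposal can be sketched as a small class. MirroredText, edit_in_view and replace_in_real are hypothetical names, and the (position, removed, inserted) edit description is exactly the information the proposal assumes the editor can provide:

```python
from __future__ import annotations


class MirroredText:
    """Hypothetical sketch: a "real" text plus a "visibilized" copy kept in sync.

    Assumes every stand-in is exactly one character long, so a position in one
    copy always refers to the same character in the other.
    """

    def __init__(self, real: str, stand_ins: dict[str, str]) -> None:
        if any(len(v) != 1 for v in stand_ins.values()):
            raise ValueError("stand-ins must be single characters")
        self._table = str.maketrans(stand_ins)
        self.real = real
        self.view = real.translate(self._table)  # built once, as proposed

    def _apply(self, pos: int, removed: int, inserted: str) -> None:
        """Replay one edit at the same absolute position on both copies."""
        end = pos + removed
        self.real = self.real[:pos] + inserted + self.real[end:]
        # New invisible characters are substituted only in the view.
        self.view = self.view[:pos] + inserted.translate(self._table) + self.view[end:]

    def edit_in_view(self, pos: int, removed: int, inserted: str) -> None:
        """An edit made in the Editor: `removed` chars at `pos` replaced by `inserted`."""
        self._apply(pos, removed, inserted)

    def replace_in_real(self, old: str, new: str) -> int:
        """Search/replace on the real text, replaying each hit on the view."""
        if not old:
            return 0
        count, pos = 0, self.real.find(old)
        while pos != -1:
            self._apply(pos, len(old), new)
            count += 1
            pos = self.real.find(old, pos + len(new))
        return count


doc = MirroredText("cu\u00adbierta", {"\u00ad": "\u00b7"})
doc.edit_in_view(0, 0, "La ")
doc.replace_in_real("bierta", "bierto")
print(repr(doc.real), repr(doc.view))  # 'La cu\xadbierto' 'La cu·bierto'
```

Note that the sketch keeps two full copies of the text in memory.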

In this way you wouldn't need to keep track of the exact substitutions made between "real" and "visibilized" copies. You just have to be careful about replicating exactly the same modifications on both copies of the text.

Please do not be too harsh with your favourite buffoon...
02-17-2014, 09:54 PM   #7
kovidgoyal
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
No, QPlainTextEdit does not expose detailed change information. You know when the text changes, not exactly what the change was. And in any case, maintaining two copies of the text that need to be kept synchronized would double memory consumption and be very slow.

The only workable solution for this is to change the rendering code to simply render invisible characters visibly. But that requires modifying the guts of Qt's text rendering system and is waaaay too much work.
02-19-2014, 10:08 AM   #8
arspr
Dead account. Bye
Posts: 587
Karma: 668244
Join Date: Mar 2011
Device: none
Thanks for your explanations, Kovid.

A pity it's impossible...

Similar Threads
Thread Thread Starter Forum Replies Last Post
Support of Special Unicode Characters? gawl ePub 6 03-27-2013 02:41 PM
Support of Special Unicode Characters in EPUB? gawl PocketBook 1 03-24-2013 05:12 AM
Error: "can only concatenate list (not "unicode") to list" bmuesse Library Management 2 01-11-2013 03:50 PM
PDF to WORD/HTML conversion, "special characters and marks" errors chengyibo PDF 3 11-06-2010 12:43 AM
Request "Post-it" or sticker cannot anchor without selecting tests henry_moh enTourage Archive 0 04-14-2010 12:34 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.