#1 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Location: Italy
Device: Kindle
|
accented chars are weirdly managed
Could I ask for some help here?
I'm facing a problem when I edit an Italian EPUB in Sigil, which shows me text like "èacquistabile" and "accessibilitàalle". It seems that the accented characters break the rendering; the correct text should be "è acquistabile" and "accessibilità alle".
I have a recent Sony Vaio with an up-to-date NVIDIA GeForce card, Windows 7, and Sigil 0.7.4. Thanks in advance for any help. P.S. Sorry for my poor English, I did my best.
#2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
I suspect there is a character after the accent that is causing this. You might check the following in code view: position the cursor before the accented character, then press the right arrow key twice. Where is the cursor now? Did it appear to move past the accented character?
Also, how does it look in code view?
#3 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Location: Italy
Device: Kindle
|
Ciao Toxaris, sorry I did not explain it clearly.
I always use Sigil in code view; the example I posted came from code view (I did not know about book view, by the way). I can now add that in book view the characters are correct: there is a blank, and the cursor moves from left to right with the right arrow with no "gap". In code view, by contrast, I do not see the blank, and while moving from left to right I need to hit the arrow twice to pass the accented vowel. In this example, "èacquistabile" (code view), I need two arrow presses to move from "è" to "a"...
#4 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
That is what I expected. There is a hidden character between the accented letter and the non-accented letter. What is the source of the document?
It could be a thin space or a zero-width joiner.
#5 | |
Grand Sorcerer
Posts: 5,681
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Code:
e U+0065 LATIN SMALL LETTER E
Code:
̀ U+0300 COMBINING GRAVE ACCENT
Most likely your display issues are caused by these and other combining characters. Try replacing them with combined characters. For example, replace U+0065 & U+0300 with:
Code:
è U+00E8 LATIN SMALL LETTER E WITH GRAVE
EDIT: To identify the problematic Unicode characters, visit this website, paste a paragraph with accented characters and missing spaces into the Unicode Text box, click Convert and post the results here.
Last edited by Doitsu; 11-28-2013 at 10:36 AM.
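For anyone who prefers to check this locally rather than on a converter website, here is a minimal Python sketch that lists the code points in a string together with their Unicode names. The function name `describe` and the sample string are illustrative assumptions, not something taken from the posts above.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]


# "e" followed by a combining grave accent, written with an escape so that
# copy/paste cannot silently turn it into the precomposed character.
sample = "e\u0300 a"
for line in describe(sample):
    print(line)
```

If the accented letter is stored as two code points, the output has a separate COMBINING GRAVE ACCENT line; a precomposed letter shows up as a single LATIN SMALL LETTER E WITH GRAVE line.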
|
#6 |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
If you don't see a space in code view, I'd say that's a bug (probably in some Qt element). If the problem is what Doitsu says (è encoded as two characters), I would expect this to happen:
|è acquistabile
(press right arrow)
è| acquistabile
(press right arrow)
è| acquistabile (again!)
(press right arrow)
è |acquistabile
but:
|è acquistabile
(press right arrow)
è| acquistabile
(press "o" key)
eò| acquistabile
(press right arrow)
eò| acquistabile
(press "o" key)
eòo| acquistabile
which makes sense if you consider the encoding sequence is actually:
e` acquistabile
But the combining accent causing the following space to disappear doesn't look correct... unless there's some other catch, like the space being some kind of special space.
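The "o" key experiment can be modelled in a few lines of Python. This is only a sketch under the assumption that the editor moves the cursor one code point per arrow press; the helper `type_at` is a hypothetical name used for illustration, not part of Sigil or Qt.

```python
import unicodedata


def type_at(text, cursor, typed):
    """Insert `typed` at code point index `cursor`; return (new_text, new_cursor)."""
    return text[:cursor] + typed + text[cursor:], cursor + len(typed)


# Decomposed form: "e", combining grave accent, space, then the word.
text = "e\u0300 acquistabile"

cursor = 1  # one right-arrow press: after "e", before the combining accent
text, cursor = type_at(text, cursor, "o")
# The accent now follows the new "o", so the text renders as "eò".

cursor += 1  # second right-arrow press: the cursor steps over the accent
text, cursor = type_at(text, cursor, "o")

# NFC composes "o" + accent into one character, which makes the result
# easier to read when printed.
print(unicodedata.normalize("NFC", text))
```

Under that assumption the first "o" lands between the base letter and its accent, which is why the accent appears to jump from the "e" to the "o".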
#7 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
You can right-click the file, choose Open With, and open it in a hex viewer such as PSPad at this particular point to see what it shows. Look for a byte value there that does not appear between other characters in the document, delete it, and see what happens (after making sure you have a backup).
In the past, Sigil has not shown all characters in code view. This may have changed now that it uses the updated version of Qt. This has never happened to me with text created in Sigil, but it has happened when I scraped something off a web page.
#8 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Location: Italy
Device: Kindle
|
"If you don't see a space in code view"! My issue is only in code view! In book view the text is perfect...
- CODE VIEW
|èacquistabile
(press right arrow)
|èacquistabile (nothing happened)
(press right arrow)
è|acquistabile (with "|" it looks as if there were a blank, but the two vowels are actually joined together)
- HEX
65CC8020616371756973746162696C65 (continuous string, no special HTML inside)
"CC80"???? There are two of them, definitely; tea for two bloody chars?
- I visited this website
The HTML conversion of the string (taken from code view) is: "e$#768; acquistabile" (with $ used in place of &). BUT kindly note that the Unicode string *shows* the space when pasted into the box.
Thanks to you all for the help, ciao
Last edited by pgfiore; 11-29-2013 at 04:26 AM.
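The hex dump above can be decoded directly, which shows where the "CC80" comes from and that the space really is in the file. A minimal sketch in Python, assuming the bytes are UTF-8 (as the XHTML header of a calibre-generated file normally declares); the variable names are illustrative.

```python
# Hex string exactly as posted above.
raw = bytes.fromhex("65CC8020616371756973746162696C65")
text = raw.decode("utf-8")

# One entry per code point: the two bytes CC 80 decode to the single
# code point U+0300 (combining grave accent).
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points[:3])

# Same text with non-ASCII code points written as numeric character
# references, like the converter website does.
as_references = "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)
print(as_references)
```

The third code point is U+0020, an ordinary space, so the missing blank is a display problem and not missing data.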
#9 | |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
http://www.fileformat.info/info/unic...0300/index.htm
So the string is "e", "combining grave accent", "space", "a", "c", etc. Nothing wrong with that, and there's definitely a bug in code view if it's not showing the space. But it's not necessarily a bug in Sigil; it could be in one of the libraries it uses. As you have been told already, the easiest solution is to use the precomposed character "è" instead of "e" + "combining grave accent". That's:
C3 A8 20 61 63 71 75 69 73 74 61 62 69 6C 65
http://www.fileformat.info/info/unic...00e8/index.htm
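One way to get from the decomposed sequence to the precomposed one without editing each letter by hand is Unicode normalization form NFC. The sketch below uses Python's standard `unicodedata` module; whether to run this over a whole book is a judgment call, and the variable names are assumptions for illustration.

```python
import unicodedata

# Decomposed form as found in the file: e + U+0300, space, word.
decomposed = "e\u0300 acquistabile"

# NFC composes base letter + combining mark wherever a precomposed
# character exists in Unicode.
precomposed = unicodedata.normalize("NFC", decomposed)

# UTF-8 bytes of the result, formatted like the hex sequence above.
hex_bytes = " ".join(f"{b:02X}" for b in precomposed.encode("utf-8"))
print(hex_bytes)
```

The result is one code point shorter than the input, and its UTF-8 encoding starts with C3 A8, the precomposed "è".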
|
#10 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Location: Italy
Device: Kindle
|
Coding in COBOL, I never used more than... well, it must have been a couple of dozen characters, upper case only, it goes without saying! ;-)
64 ought to be enough for anybody!!! And you know what? I asked the author of "[...] implement it yourself and submit a pull request on GitHub.", but it seems they don't accept patches in COBOL.
#11 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
One question remains unanswered. What is the source of the ePUB or HTML used?
|
#12 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Location: Italy
Device: Kindle
|
Well Toxaris, I'm not sure I caught the real meaning of your question, but I'll give it a try.
The source is:
- a mobi converted directly to EPUB by one of the latest calibre versions.
- a piece of code like the following, with the affected words in yellow (please note I cannot guarantee the "è" is still encoded the same way after two copy/pastes; the real hex is above):
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Mio Titolo </title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<link href="stylesheet.css" rel="stylesheet" type="text/css"/>
<link href="page_styles.css" rel="stylesheet" type="text/css"/>
</head>
<body class="calibre">
<p id="filepos305" class="calibre1"> <span class="calibre2"> <span class="bold">MIO TITOLO </span> </span> </p>
<p class="calibre1"> <span class="calibre3">AUTHOR </span> </p>
<p class="calibre1"> <span class="calibre3">TRATTO DA UNA REALTÀ ATTUALE </span> </p>
<p class="calibre4" style="margin:0pt; border:0pt; height:1em">* </p>
<p class="calibre4" style="margin:0pt; border:0pt; height:1em">* </p>
<p class="calibre5"> <span class="calibre3"> Il romanzo, cartaceo e ebook (Epub e Mobi), è acquistabile sul sito* </span> <a href="www.miosito.it"> <span class="calibre3"> <span class="calibre6"> <span class="underline">www.miosito.it </span> </span> </span> </a> <span class="calibre3"> </span> </p>
<p class="calibre5"> <span class="calibre3"> Versione Epub per l’accessibilità alle persone ipovedenti e non vedenti (lettura audio, braille digitale, e a caratteri ingranditi). </span> </p>
<span class="calibre7"> </span> </p>
<p class="calibre1"> <span class="calibre3">Obi Wan</span> </p>
<p class="calibre4" style="margin:0pt; border:0pt; height:1em">* </p>
<div class="mbppagebreak" id="calibre_pb_0"> </div>
</body>
</html>
#13 |
frumious Bandersnatch
Posts: 7,543
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
|
#14 |
Enthusiast
Posts: 34
Karma: 10
Join Date: Dec 2012
Location: Italy
Device: Kindle
|
That sentence is by the editor, while the EPUB was generated by calibre from the mobi.
I never saw the original EPUB; it could be different, I suppose, couldn't it?
#15 | |
Grand Sorcerer
Posts: 5,681
Karma: 24031401
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Any word ending in a base character followed by a combining character, a space, and additional characters will be displayed without the space in Code View. The only known fix is to replace all composite characters with their combined Unicode equivalents. You can easily test this by looking at my test file. The first word, Thé, ends in e + 'COMBINING ACUTE ACCENT' (U+0301) followed by a space, which is not displayed in Code View mode. Most likely nobody noticed this bug because combining accents aren't really necessary anymore and most Unicode texts contain (pre-combined) accented characters.
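To see how many places in a text match the pattern described here (combining mark directly followed by a space), something like the following Python sketch could be used. The function name `hidden_space_positions` is a hypothetical helper, and the check only covers the plain space U+0020.

```python
import unicodedata


def hidden_space_positions(text):
    """Return indexes of spaces that directly follow a combining mark."""
    return [
        i
        for i in range(1, len(text))
        # combining() returns a non-zero class for combining marks
        if text[i] == " " and unicodedata.combining(text[i - 1]) != 0
    ]


# "The" + COMBINING ACUTE ACCENT (U+0301) + space, as in the test file's first word.
sample = "The\u0301 test"
print(hidden_space_positions(sample))

# After NFC normalization the accent is merged into the "e" and the
# pattern no longer occurs.
fixed = unicodedata.normalize("NFC", sample)
print(hidden_space_positions(fixed))
```

Note that NFC can only compose sequences for which Unicode defines a precomposed character, so a text using rarer base + accent combinations may still contain matches after normalization.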
|