04-14-2013, 11:27 AM | #1 |
curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
|
Understanding bidirectional Unicode control characters
When "studying" Unicode specifications I came across Carl Henderson's remarkably clear and concise illustration of understanding bidirectional text.
I extracted the .html from his page and created a .mobi and a .azw3 to test the capabilities of my Kindle Keyboard v3.4. The expected behaviour is explained in the text, along with coding and rendering examples that make it much easier to follow. The subject is relatively complex and certainly not trivial. (I think) I *do* understand it now (?)... So can you! I hope you find it useful for testing other browsers and ereaders.
I have identified the following rendering errors (I think), and I would appreciate it if a dedicated reader interested in this subject could confirm my understanding.
When Adobe Acrobat XI Pro v11.0, Firefox v20.0.1 or Microsoft Internet Explorer v10.0 render the .html: the "implicit marker, third case". Can all this good software be making the same error?!
After converting the .html to .mobi with Kindle Previewer v2.85 and to .azw3 with Calibre 0.9.27, and viewing them on my Kindle Keyboard v3.4: the "implicit marker, first case".
Open my test files below to better see what I mean: |
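For anyone who wants to see which characters are involved, here is a minimal Python sketch using only the standard `unicodedata` module. It looks up the official name and bidi class of the implicit markers (LRM/RLM) and the explicit embedding/override codes. The selection of code points and the helper name `describe` are my own assumptions, not something taken from the test files, and the newer isolate controls are left out.

```python
import unicodedata

# A selection of bidirectional control characters (assumption: the
# isolate controls U+2066..U+2069 are deliberately not covered here).
BIDI_CONTROLS = [
    "\u200e",  # implicit marker, left-to-right
    "\u200f",  # implicit marker, right-to-left
    "\u202a",  # explicit embedding, left-to-right
    "\u202b",  # explicit embedding, right-to-left
    "\u202c",  # terminates an embedding or override
    "\u202d",  # explicit override, left-to-right
    "\u202e",  # explicit override, right-to-left
]

def describe(ch):
    """Return (code point, official Unicode name, bidi class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

def describe_all():
    """Describe every control character in BIDI_CONTROLS, in order."""
    return [describe(ch) for ch in BIDI_CONTROLS]
```

Calling `describe_all()` and printing each tuple gives a quick reference table to keep next to the test files.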
04-14-2013, 01:25 PM | #2 |
Grand Sorcerer
Posts: 5,587
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
While Unicode experts love to discuss these issues, in real life there are hardly any problems, as long as you use the dir="rtl" attribute for predominantly RTL text. Even if you don't, RTL text will mostly be displayed correctly if it doesn't contain embedded LTR text, except for final punctuation characters at the end of a paragraph, which will always be displayed on the right-hand side, even though they should be displayed after the last word of the sentence on the left-hand side.
Also, most rendering engines automatically use RTL rendering & shaping when they encounter Unicode characters that need to be rendered RTL, and switch to LTR when they encounter Latin text. I.e., most of these fancy RTL/LTR embedding codes aren't really necessary in real-life applications (i.e., mostly RTL text with LTR segments and vice versa). Also, some of them aren't fully supported by rendering engines anyway.
Please find attached a simple test file that I just slapped together using a BBC Arabic article about the late Margaret Thatcher. On the first HTML page I inserted her English name in an Arabic paragraph, and on the third page I inserted her (transliterated) Arabic name in an English paragraph. Neither insert was enclosed in any special RTL or LTR embedding codes. (The second HTML page shows what happens if you don't use the dir="rtl" attribute: punctuation characters are always displayed on the right-hand side and inserted LTR text breaks up the paragraph.)
tl;dr RTL text rendering isn't as complicated as it looks |
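The automatic switching described above rests on each character's bidi class. As a rough illustration only, the Python sketch below guesses a paragraph's base direction from its first strongly directional character, which is roughly what the Unicode bidirectional algorithm does when no dir attribute is given. The function name `base_direction` is hypothetical, and the sketch ignores isolates and everything else a real rendering engine handles.

```python
import unicodedata

def base_direction(text, default="ltr"):
    """Guess the base direction from the first strong character.

    Bidi class 'L' means strong left-to-right; 'R' (e.g. Hebrew) and
    'AL' (Arabic letters) mean strong right-to-left. Digits, spaces and
    punctuation are not strong, so they are skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character found
```

A trailing full stop has no strong direction of its own, which is why it ends up on the wrong side when the paragraph direction is not set to RTL.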
04-14-2013, 01:55 PM | #3 |
curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
|
So true. Thanks for your explanations and real life examples. Your wise advice (and the rendering errors I pointed to) show how to stay out of trouble.
I am still just an apprentice, but it was dnatsrednu ot nuf hcum! Admittedly, I am spending far too much time on the theory. Again, sincere thanks for sharing your expertise. |
04-15-2013, 12:25 AM | #4 |
Connoisseur
Posts: 57
Karma: 1010
Join Date: Jul 2011
Device: Archos A70 eReader, Kindle Touch, Sony PRS-T2
|
Thank you! I am planning to work on a book mixing Latin and Hebrew characters, so this comes in handy.
|
04-15-2013, 03:33 AM | #5 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Does ADE support RTL languages these days? ISTR that earlier versions of it didn't.
|
04-15-2013, 04:28 AM | #6 | |
Grand Sorcerer
Posts: 5,587
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
1. You'll obviously need to embed an Arabic font.
2. You'll need to manually generate the Arabic context forms.
3. You'll need to reverse the order of the preshaped Arabic letters.
(The last two steps can be accomplished with this free tool.)
Since all of this is a major PITA, it's only practical for very short quotes, but it can be done. Please find attached an excerpt from a German book available at MR that demonstrates this technique.
Last edited by Doitsu; 04-15-2013 at 02:12 PM. Reason: Removed unused styles and fonts & added some comments |
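Step 3 can be sketched in a few lines of Python. This is not the free tool mentioned above, just a minimal illustration under the assumption that the text has already been preshaped (step 2) into presentation-form characters. The helper name `reverse_clusters` is hypothetical. It reverses the letters but keeps combining marks, such as vowel signs, attached to their base letter, which a plain string reversal would not do.

```python
import unicodedata

def reverse_clusters(text):
    """Reverse the order of base letters, keeping combining marks attached.

    Each character with a non-zero combining class (e.g. an Arabic vowel
    sign) is glued to the base character that precedes it, and the
    resulting clusters are then emitted in reverse order.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # attach the mark to its base letter
        else:
            clusters.append(ch)  # start a new cluster
    return "".join(reversed(clusters))
```

For text that starts with a base letter, applying the function twice gives back the original string.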
|
04-15-2013, 04:43 AM | #7 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Ouch - that does sound painful. And presumably, if you reverse the order of the letters, and then do a conversion to a format which does support RTL, such as KF8, you'd need to reverse the letters again to make it display correctly!
|
04-15-2013, 05:10 AM | #8 | |
Grand Sorcerer
Posts: 5,587
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
(Note the additional vowel signs on the first Arabic word in the screenshot of the Kindle version, which was generated without further modification from the .epub file attached to my previous post.) |
|
04-15-2013, 05:16 AM | #9 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Very cunning - thanks!
|
04-27-2013, 11:01 AM | #10 | |
curly᷂͓̫̙᷊̥̮̾ͯͤͭͬͦͨ ʎʌɹnɔ
Posts: 3,002
Karma: 50506927
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3₃.₄.₃ PW3&4₅.₁₃.₃
|
Quote:
Spoiler:
I thought that the <bdo dir="rtl"> tag in my example #9 could possibly ease step 3. Though I still prefer the elegance of simple Unicode ʇɟǝןʎdoƆ (ↄ) for reversing (c) Copyright. |
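For plain text, the counterpart of `<bdo dir="rtl">` is to wrap the run in U+202E (RIGHT-TO-LEFT OVERRIDE) and close it with U+202C (POP DIRECTIONAL FORMATTING). A minimal Python sketch follows. The helper names are hypothetical, and whether a given reader honours the override still has to be tested on the device.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def override_rtl(text):
    """Wrap text so that a bidi-aware renderer displays it right to left."""
    return f"{RLO}{text}{PDF}"

def strip_override(text):
    """Remove the override characters again, leaving the stored text."""
    return text.replace(RLO, "").replace(PDF, "")
```

The stored character order is untouched. Only the displayed order changes, so copying the text and stripping the two controls gives back the original string.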
|
|