Old 12-11-2009, 06:29 PM   #80
Jim Chapman
Addict
Posts: 310
Karma: 2025434
Join Date: Oct 2009
Device: Lumia 950 Phone
Quote:
Originally Posted by emai7s2
Hi Jim,

The talk of fonts inspired me to load a couple of files in text format that I thought might present a good challenge for Freda. One of the files contained Cyrillic characters (Здравствуйте!), the other Pinyin (e.g., Lǜchá) and Mandarin (e.g., 绿茶).

To my surprise, Freda rendered the Cyrillic quite well. It got about 80% of the Pinyin right, and none of the Mandarin.

Do you think Freda will be able to render all these character sets properly at some point? It's about 75% of the way there already.
Interesting. Freda really just relies on the Windows .NET Compact Framework (.NET CF) to handle character encoding. In the release version (imminent) I give you more options to tweak the encoding, but if the epub file is created with the right encoding information, it shouldn't need tweaking anyway. If your book is encoded in UTF-8 or UTF-16 it should basically work. My code does assume things like:
- words are separated by whitespace characters
- pages are read from top to bottom
- lines are read from left to right; this rule applies both to letters and numbers
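To make the first assumption concrete, here is a rough Python sketch (not Freda's actual code, which runs on .NET CF) of what "words are separated by whitespace" means for a line-breaking routine. The `split_words` helper and the sample strings are made up for illustration. Chinese text is normally written without spaces, so a whitespace-based splitter will treat a whole phrase as one word, which may matter when wrapping or justifying lines.

```python
def split_words(text):
    """Split text into words on runs of whitespace (hypothetical helper)."""
    # str.split() with no arguments splits on any Unicode whitespace
    # and discards empty strings.
    return text.split()


cyrillic = "Здравствуйте, как дела?"   # space-separated words
pinyin = "Wǒ xǐhuān lǜchá"             # space-separated words
mandarin = "我喜欢绿茶"                  # no spaces between words

for sample in (cyrillic, pinyin, mandarin):
    words = split_words(sample)
    print(len(words), words)
```

The Cyrillic and Pinyin samples each come out as three words, while the Mandarin sample comes out as a single unbroken token.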
I'd be interested in making sure that Freda properly handles any UTF-8 or UTF-16 encoded epub for a language that follows these rules (I think Cyrillic, Pinyin and Mandarin do, whereas Arabic, Hebrew and assorted South-East Asian languages do not). I'm not going to try to deal with all possible ANSI code pages, because (1) most of these code pages aren't installed on most devices, (2) I don't have any devices or images to test them on, and (3) if you really want to read a book that's been encoded with some funky ANSI variant, you have the option of re-encoding it in UTF-8 or UTF-16. Right-to-left languages and languages with unusual definitions of whitespace will have to wait a bit longer (my 'justification' code is already really ugly; my heart sinks at the thought of having to deal with it!).
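If you do have a text in a legacy code page, converting it to UTF-8 yourself is usually only a few lines. Here is a minimal Python sketch, assuming you already know the source encoding. The `recode_to_utf8` name and the choice of Windows-1251 (a common Cyrillic code page) are just for illustration, and this is something you would run on a desktop machine, not part of Freda.

```python
def recode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode as UTF-8.

    Hypothetical helper: source_encoding must be a codec name Python
    knows, e.g. "cp1251" for Windows-1251.
    """
    # Decoding raises UnicodeDecodeError if the bytes are not valid in
    # the claimed encoding, which is a useful early warning.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# Simulate a file saved in the Windows-1251 code page.
legacy_bytes = "Здравствуйте!".encode("cp1251")

utf8_bytes = recode_to_utf8(legacy_bytes, "cp1251")
print(utf8_bytes.decode("utf-8"))
```

For a real file you would read the bytes with `open(path, "rb")` and write the result back with `open(new_path, "wb")`. The hard part is usually knowing which code page the original used.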

Anyone with a language they'd like to see working on Freda is invited to send me a sample UTF-8 or UTF-16 epub text and some indication of what it should look like when properly rendered. I will put in some effort on it once I've got release 1.0 shipped.

Thanks,

Jim