Well, I was very surprised by the results I wrote about in my last post. Could they really use a different encoding depending on which folder the text is stored in?
@reprep: I did not know about Turkish, so I am including a link here:
http://en.wikipedia.org/wiki/Wikiped...ish_characters
My view and hope has been that books and dictionaries would both handle UTF-8, which does not seem possible with the Turkish language. We are somewhat at the heart of the Duokan mess here, as they attempt a larger code base. 16 bits seem to be needed for Turkish, which is also the issue with Duokan (I am uncertain what they are attempting at the moment).
Turkish will need a separate character encoder for ISO 8859-9, or some exceptions will have to be included as described in the link. Maybe GB18030 will handle it in the future?
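
For anyone who wants to check the byte counts themselves, here is a minimal Python sketch. It is my own illustration and has nothing to do with how Duokan does it. It assumes Python's standard codecs `utf-8`, `iso8859_9` and `gb18030`; the helper name `byte_lengths` and the sample letters are made up for the example. It only reports how many bytes each letter takes in each encoding.

```python
# Turkish letters outside plain ASCII (sample set chosen for this example).
TURKISH_LETTERS = "çğıöşüÇĞİÖŞÜ"

# Codec names from the Python standard library.
CODECS = ("utf-8", "iso8859_9", "gb18030")


def byte_lengths(text, codecs=CODECS):
    """Map each character to {codec: number of bytes when encoded}."""
    return {ch: {c: len(ch.encode(c)) for c in codecs} for ch in text}


if __name__ == "__main__":
    # Print one line per letter with its size in each encoding.
    for ch, sizes in byte_lengths(TURKISH_LETTERS).items():
        print(ch, sizes)
```

If I read it right, each of these letters comes out as two bytes in UTF-8 and a single byte in ISO 8859-9, which may be where the 16 bits come from.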