MobileRead Forums - View Single Post

user_none · 02-12-2013, 07:58 PM

There is no good way to reliably detect a file's character encoding. Your best bet is to decode the file and check the result manually or, if you already know the encoding, to specify it explicitly.
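To make the "decode and check manually" part concrete, here is a rough Python sketch. The function name `preview_decodings` and the list of candidate encodings are my own assumptions for illustration, not anything built into calibre or Sigil. It decodes the same bytes under a few encodings so you can eyeball which result reads correctly.

```python
def preview_decodings(data, encodings=("utf-8", "cp1250", "latin-1"), width=60):
    """Decode `data` under each candidate encoding for manual inspection.

    Hypothetical helper: the candidate list is an assumption and should be
    replaced with the encodings that are plausible for your files.
    Returns a dict mapping encoding name -> first `width` characters,
    or None when the bytes are not valid in that encoding.
    """
    previews = {}
    for enc in encodings:
        try:
            previews[enc] = data.decode(enc)[:width]
        except UnicodeDecodeError:
            # Strict decoding failed, so this encoding can be ruled out.
            previews[enc] = None
    return previews


if __name__ == "__main__":
    sample = "très bien".encode("utf-8")
    for enc, text in preview_decodings(sample).items():
        print(f"{enc:>8}: {text!r}")
```

A failed strict decode rules an encoding out, but a successful one does not prove it is right: single-byte encodings such as latin-1 accept any byte sequence, so a human still has to read the output.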

In your case you could check for characters like Ă and ¨ in the decoded text and treat their presence as a sign that the wrong encoding was used. However, these are valid characters in that encoding, so this technique only works when you know they will not legitimately appear in the text. If this is a novel in a specific language, it will work the majority of the time. But it is not foolproof, and it is not a good general-purpose method.
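A minimal sketch of that trigger in Python, assuming the file was first decoded as cp1250 and that UTF-8 is the likely real encoding. Both of those choices, the `SUSPECT_CHARS` set, and the function names are assumptions for illustration only.

```python
# Characters that should not occur in the expected text but do show up when
# UTF-8 bytes are decoded as cp1250 (assumed set; adjust for your language).
SUSPECT_CHARS = {"Ă", "¨"}


def looks_misdecoded(text, suspects=SUSPECT_CHARS):
    """Return True if any suspect character appears in the decoded text."""
    return any(ch in text for ch in suspects)


def decode_with_fallback(data, guess="cp1250", fallback="utf-8"):
    """Decode with `guess`; if the result looks wrong, try `fallback`.

    Returns (text, encoding_used). This is a heuristic, not detection.
    """
    text = data.decode(guess, errors="replace")
    if looks_misdecoded(text):
        try:
            return data.decode(fallback), fallback
        except UnicodeDecodeError:
            # The bytes are not valid in the fallback, so keep the guess.
            pass
    return text, guess


if __name__ == "__main__":
    print(decode_with_fallback("très".encode("utf-8")))
    # The caveat in action: text that really contains Ă followed by ¨ in
    # cp1250 happens to be valid UTF-8 too, so it is wrongly "corrected".
    print(decode_with_fallback("Ă¨".encode("cp1250")))
```

The second call shows why this is not a general-purpose method: when the suspect characters are genuinely part of the text, the heuristic can switch to the wrong encoding.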
