Unicode files converted to .txt?
Hi,
I just tried to convert some German-language EPub files to .txt (for the reason, see the bottom of this message) and found that most of them came out of the conversion more or less garbled. I suppose they contained Unicode characters. For interest's sake, I also tried some English-language files, and some of them showed the same behaviour: inverted commas, apostrophes and the like were replaced by two-character combinations.
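To illustrate what I think is going on, here is a minimal Python sketch. It assumes (this is only my guess, not something I have confirmed for the converter) that the .txt output is written as UTF-8 and then read by a program that expects Windows-1252. The function names `garble` and `repair` are made up for this example.

```python
# Sketch of the suspected cause: UTF-8 bytes interpreted as Windows-1252.
# Assumption: this encoding mismatch is what produces the garbled text;
# the converter's real behaviour may differ.

def garble(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Reverse the mistake: recover the bytes, then decode them as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for sample in ("Grüße", "it’s"):
        damaged = garble(sample)
        # Each umlaut turns into two characters, the curly apostrophe into three.
        print(sample, "->", damaged, "->", repair(damaged))
```

If the guess is right, opening the converted file in an editor that lets you choose UTF-8 as the encoding should show the text correctly without any repair step.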
I found a workaround - convert to .rtf, load into an Office program and save as .txt.
Still, I'm curious: is there a fundamental problem with converting such files straight to .txt, or has the .txt converter simply never been updated to address this issue?
Regards,
Jochen
--------------------------------------------------------------
Why .txt, when I had EPub?
I came across a source of electronic versions of old (19th century) German-language books and recognised many of the titles - I had read them as a kid in my great-grandfather's library.
Since the 19th century, German has gone through two spelling reforms - a minor one in 2006 and a very major one in (I think) 1901 (this one even changed the spelling of a number of place names). Upwards of 90% of the changes made then can easily be implemented with Find/Replace, so I wanted to make the texts more legible by doing just that. On one of my machines I have a plain-text editor that is very good indeed for this kind of work.
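The Find/Replace idea could also be scripted. The following is only a rough Python sketch: the word pairs are a handful of illustrative examples of old and modern spellings, not a complete or checked list, and the names `REPLACEMENTS` and `modernize` are made up for this example.

```python
# Rough sketch of batch Find/Replace for old German spellings.
# Assumption: the pairs below are illustrative only; a real list would be
# much longer and would need checking against the actual texts.
REPLACEMENTS = {
    "Thür": "Tür",
    "Thal": "Tal",
    "Thier": "Tier",
    "giebt": "gibt",
}


def modernize(text: str) -> str:
    """Apply every old -> new pair as a plain substring replacement."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
        # Also catch the lowercase form inside compounds, e.g. "Hausthür".
        text = text.replace(old.lower(), new.lower())
    return text


if __name__ == "__main__":
    print(modernize("Es giebt eine Hausthür im Thal."))
```

Plain substring replacement is blunt and can hit words it should not, so the result still needs proofreading.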
Last edited by Jochen K.; 01-29-2013 at 05:20 AM.