#1 |
Member
Posts: 15
Karma: 10
Join Date: Jan 2013
Location: Cologne, Germany
Device: Galaxy S4 (no second device to lug around)
|
Hi,
I just tried to convert some German language EPub files to .txt (for the reason, see the bottom of this message) and found that most of them came out of the conversion more or less garbled - I suppose they may have had unicode characters inside. For interest's sake, I also tried some English language files, and some of them showed the same behaviour - inverted commas, apostrophes and such being replaced by two-character combinations.

I found a workaround - convert to .rtf, load into an Office program and save as .txt. Still, I'm curious: is there a basic problem with converting such files straight to .txt, or has the .txt converter simply never been updated to address this issue?

Regards, Jochen

--------------------------------------------------------------

Why .txt, when I had EPub? I came across a source of electronic versions of old (19th century) German language books, and remembered many titles - I had read them as a kid in my great grandfather's library. Since the 19th century, German has gone through two spelling reforms - a minor one in 2006 and a very major one in (I think) 1904 (this one even changed the spelling of a number of place names). Upwards of 90% of the changes made then can easily be implemented with Find/Replace, so I wanted to increase the legibility of the texts by doing just that, and on one of my machines I have an editor for plain text that is very good indeed.

Last edited by Jochen K.; 01-29-2013 at 05:20 AM.
#2 |
creator of calibre
Posts: 45,201
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The .txt files will be fine; you just need to tell whatever program you are using to open them that they are encoded in UTF-8.
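To illustrate why the text looks garbled, here is a minimal Python sketch. It assumes the program opening the file falls back to Windows-1252 (cp1252), a common legacy default; the thread does not say which program or encoding was actually involved, and the sample string is made up for the example.

```python
# Sample text containing umlauts, a sharp s and a curly apostrophe.
original = "Größe, Tür, don’t"

# The bytes as they would be stored in a UTF-8 encoded .txt file.
utf8_bytes = original.encode("utf-8")

# Assumption: the viewer wrongly interprets those bytes as Windows-1252.
# Each non-ASCII character then shows up as two or three junk characters,
# e.g. "ö" becomes "Ã¶".
misread = utf8_bytes.decode("cp1252")

# Telling the program the file is UTF-8 recovers the text unchanged.
correct = utf8_bytes.decode("utf-8")

print(misread)
print(correct)
```

The file on disk is the same in both cases; only the encoding assumed by the reader differs.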
|
#3 |
|
Weird - do you have any supernatural powers? ;-)
I just ran Calibre again, didn't change anything except reset the output format back from .rtf to .txt - and the test file I used translated flawlessly into .txt - and when I looked through all the options I found one that said text output was in utf-8!

Thanks a million, Jochen
#4 |
|
Hi,
it seems that utf-8 does NOT solve all my problems, after all. As an example, I downloaded from Amazon's kindle store a (free, non DRM) book by Karl May - Old Surehand 1.

Converting this to rtf gives perfect text, but the empty lines between paragraphs have vanished. [EDIT: I just looked at the original text in the KindleForPC reader software, and apparently Calibre's txt conversion actually improves on the original where paragraph spacing is concerned - the original doesn't have any spaces between paragraphs either, so the rtf conversion is really 1:1.]

Converting to txt (the only settings on the Txt Output page being utf-8/system/plain) gives me spaces between paragraphs, but unicode junk instead of umlauts, apostrophes and the like.

Is there a setting for either of those conversions that I have overlooked and that will solve my problem?

Regards, Jochen

Last edited by Jochen K.; 02-18-2013 at 08:06 AM. Reason: additional info
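If the program used to view the .txt file cannot be told the encoding, one possible workaround is to re-encode the file after conversion. The sketch below is not a Calibre feature: `reencode` and the file names are hypothetical, and whether a given editor honours a UTF-8 byte order mark or expects Windows-1252 is an assumption that has to be checked for that editor.

```python
from pathlib import Path


def reencode(src, dst, target_encoding="utf-8-sig"):
    """Read a UTF-8 text file and write it back in another encoding.

    "utf-8-sig" writes UTF-8 with a leading byte order mark, which some
    editors use to detect the encoding. "cp1252" writes Windows-1252;
    characters it cannot represent are replaced with "?".
    """
    # Work on bytes so that line endings are passed through unchanged.
    text = Path(src).read_bytes().decode("utf-8")
    Path(dst).write_bytes(text.encode(target_encoding, errors="replace"))


# Hypothetical usage; the file names are placeholders:
# reencode("old_surehand.txt", "old_surehand_bom.txt")
# reencode("old_surehand.txt", "old_surehand_cp1252.txt", "cp1252")
```

Writing to a new file keeps the original conversion output available in case the chosen target encoding loses characters.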