Old 01-29-2013, 05:15 AM   #1
Jochen K.
Member
Posts: 15
Karma: 10
Join Date: Jan 2013
Location: Cologne, Germany
Device: Galaxy S4 (no second device to lug around)
Unicode files converted to .txt?

Hi,

I just tried to convert some German-language EPub files to .txt (for the reason, see the bottom of this message) and found that most of them came out of the conversion more or less garbled. I suppose they contain Unicode characters. Out of interest, I also tried some English-language files, and some of them showed the same behaviour: inverted commas, apostrophes and the like were replaced by two-character combinations.

I found a workaround - convert to .rtf, load into an Office program and save as .txt.

Still, I'm curious: is there a basic problem with converting such files straight to .txt, or has the .txt converter simply never been updated to address this issue?

Regards,

Jochen

--------------------------------------------------------------
Why .txt, when I had EPub?
I came across a source of electronic versions of old (19th century) German language books, and remembered many titles - I had read them as a kid in my great grandfather's library.

Since the 19th century, German has gone through two spelling reforms - a minor one in 2006 and a very major one in (I think) 1904 (this one even changed the spelling of a number of place names). Upwards of 90% of the changes made then can easily be implemented with Find/Replace, so I wanted to increase the legibility of the texts by doing just that. On one of my machines I have a plain-text editor that is very good indeed.

Last edited by Jochen K.; 01-29-2013 at 05:20 AM.
Old 01-29-2013, 05:22 AM   #2
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The .txt files will be fine; you just need to tell whatever program you are using to open them that they are encoded in UTF-8.
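
To illustrate the point about encodings: the two-character combinations described in the first post are what UTF-8 bytes typically look like when a program decodes them with a single-byte Windows code page. Below is a minimal Python sketch, assuming the viewer falls back to Windows-1252 (cp1252). The sample sentence and the file name book.txt are made up for illustration and do not come from the thread.

```python
import tempfile
from pathlib import Path

# Hypothetical sample text with umlauts, a sharp s and German quotation marks.
sample = "Grüße aus Köln: \u201eOld Surehand\u201c"

with tempfile.TemporaryDirectory() as tmp:
    # Write the text the way a UTF-8 .txt output would be stored on disk.
    path = Path(tmp) / "book.txt"  # hypothetical file name
    path.write_bytes(sample.encode("utf-8"))

    # Wrong: read the UTF-8 bytes as Windows-1252 (assumed viewer default).
    garbled = path.read_text(encoding="cp1252", errors="replace")

    # Right: tell the reader explicitly that the file is UTF-8.
    restored = path.read_text(encoding="utf-8")
```

Here `garbled` contains sequences such as "Ã¼" where the original has "ü", while `restored` equals the original text. The file on disk is identical in both cases; only the decoding chosen by the reading program differs.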
Old 01-29-2013, 10:11 AM   #3
Jochen K.
Weird - do you have any supernatural powers? ;-)

I just ran Calibre again and didn't change anything except resetting the output format from .rtf back to .txt - and the test file I used converted flawlessly into .txt. When I looked through all the options, I found one that said text output was in utf-8!

Thanks a million,

Jochen
Old 02-18-2013, 07:54 AM   #4
Jochen K.
Hi,

it seems that utf-8 does NOT solve all my problems, after all.

As an example, I downloaded a (free, non-DRM) book by Karl May from Amazon's Kindle store - Old Surehand 1.

Converting this to rtf gives perfect text, but the empty lines between paragraphs have vanished.

[EDIT: I just looked at the original text in the KindleForPC reader software, and apparently Calibre's txt conversion actually improves on the original where paragraph spacing is concerned - the original doesn't have any spaces between paragraphs either, so the rtf conversion is really 1:1.]

Converting to txt (the only settings on the Txt Output page being utf-8/system/plain) gives me spaces between paragraphs, but Unicode junk instead of umlauts, apostrophes and suchlike.

Is there a setting for either of those conversions that I have overlooked and that will solve my problem?

Regards,

Jochen

Last edited by Jochen K.; 02-18-2013 at 08:06 AM. Reason: additional info
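
The thread does not say why this particular file still looks garbled. One thing that can be checked independently of Calibre is whether the .txt output really is valid UTF-8, and whether the viewing program simply fails to detect it. Below is a minimal Python sketch under those assumptions. The helper names `is_valid_utf8` and `add_bom` are hypothetical, and prepending a byte order mark only helps with editors that rely on it to recognise UTF-8.

```python
import codecs


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def add_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 byte order mark unless it is already present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data
```

To use it, read the converted file with `open(path, "rb").read()`, pass the bytes to `is_valid_utf8`, and if the result is True, write `add_bom(data)` to a copy and open that copy in the editor. If the check returns False, the file is not UTF-8 at all and the problem lies elsewhere.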