#1 |
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
How to tell the character encoding?
I have problems with files in Calibre from time to time. Sometimes they're PRC files that I "convert" to MOBI to embed my new metadata. Sometimes they're EPUB or .pdb files which I convert to MOBI.
Often the em dashes and/or the apostrophes, and sometimes even the quotation marks, are replaced with squares in the converted file. After doing some digging here I gather that this may be an "input character encoding" problem and that I need to put the appropriate encoding into my preferences. What I cannot understand is how I'm supposed to determine what my character encoding is. I tried cp1252, which I gather is common. That didn't help, so I guess it's a different codec, but I have no idea how to figure out which one. Can anyone help?
#2 |
creator of calibre
Posts: 45,169
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The easiest solution is to simply use the transliterate Unicode characters option, which will replace these special characters with their plain ASCII equivalents.
|
#3 |
Zealot
|
I actually tried that...but it simply REMOVED all of the apostrophes in the document.
|
#4 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Most people don't do it that way, however. They just try reasonable options until one seems to work. Here are the ones I usually try: cp1252, cp1251, latin1, utf-8.
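For anyone who would rather script the trial-and-error approach than repeat conversions, here is a minimal Python sketch. It is only an illustration, not what calibre does internally: the helper name `preview_decodings` and the sample bytes are made up for the example. It decodes the same raw bytes with each candidate encoding so the results can be compared by eye.

```python
# Candidate encodings from the post above, tried in a fixed order.
CANDIDATES = ["utf-8", "cp1252", "cp1251", "latin1"]


def preview_decodings(raw, encodings=CANDIDATES):
    """Return {encoding: decoded text, or None if the bytes are invalid}."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding at all.
            results[enc] = None
    return results


# Hypothetical sample: an apostrophe, curly quotes and an em dash
# stored as single bytes in the 0x80-0x9F range.
sample = b"It\x92s \x93quoted\x94 \x97 done"

for enc, text in preview_decodings(sample).items():
    print(f"{enc:8} -> {text!r}")
```

A codec that decodes without raising an error is not necessarily the right one. latin1 accepts every possible byte, so it never fails, but here it turns the punctuation bytes into invisible control characters. The candidate whose output actually reads correctly is the one to enter as the input encoding.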
|
#5 | |
Well trained by Cats
Posts: 30,891
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Curly quotes (0x93, 0x94) and apostrophes (0x92) were DELETED when converting to EPUB with Transliterate enabled. This is a simple TXT file, so nothing inside it declares the charset that was used.
|
#6 |
Zealot
|
It seems that the book might have broken at the input stage.
I bought the book, ran it through ereader2html, then input the HTML into calibre. The output of ereader2html looks fine, but when I click V in calibre it shows me a WinZip folder (it imported as ZIP). If I open the book file in there, the em dashes and apostrophes are replaced with squares. At that point no amount of converting will help.
So is there a way to import HTML without losing em dashes and apostrophes? For now I found the workaround of opening the HTML file in Mobipocket Creator and outputting a PRC file, but that adds extra steps to what is already a fairly arduous process. Is there a way to streamline it?
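Unlike a plain TXT file, an HTML file can declare its own charset, so one thing worth checking before import is what the file claims to be. The following Python sketch is a rough check under stated assumptions: `declared_charset` is a hypothetical helper, it only scans the first few kilobytes for a `charset=` declaration, and it says nothing about how calibre itself reads the file.

```python
import re

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
CHARSET_RE = re.compile(
    rb"""charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def declared_charset(html_bytes, scan_limit=4096):
    """Return the charset named near the top of an HTML file, or None."""
    match = CHARSET_RE.search(html_bytes[:scan_limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


# Hypothetical input; real use would read the file in binary mode first.
head = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head>'
print(declared_charset(head))
```

If the function returns None, the file declares nothing and the encoding has to be supplied by hand. If it returns a name that does not match how the bytes were really saved, that mismatch is a likely source of the squares.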
#7 |
creator of calibre
|
|
#8 |
Wizard
|
I found the same thing going txt->txt. I expected it to convert curly double quotes (0x93, 0x94) to the ordinary double quote (0x22), and curly single quotes/apostrophes (0x91, 0x92) to the ordinary single quote (0x27). Bug?
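The mapping described above can be written out in a few lines of Python. This is a sketch of the expected behaviour, not calibre's transliteration code: `straighten_quotes` and `ASCII_PUNCT` are hypothetical names, and it assumes the input bytes really are cp1252. The bytes have to be decoded with the right encoding first, otherwise there are no curly-quote characters to map.

```python
# Curly punctuation mapped to plain ASCII equivalents.
# In cp1252 these characters are stored as bytes 0x91-0x94.
ASCII_PUNCT = str.maketrans({
    "\u2018": "'",   # left single quote   (cp1252 byte 0x91)
    "\u2019": "'",   # right single quote  (cp1252 byte 0x92)
    "\u201c": '"',   # left double quote   (cp1252 byte 0x93)
    "\u201d": '"',   # right double quote  (cp1252 byte 0x94)
})


def straighten_quotes(raw, encoding="cp1252"):
    """Decode raw bytes, then replace curly quotes with ASCII quotes."""
    return raw.decode(encoding).translate(ASCII_PUNCT)


# Hypothetical sample bytes: curly double quotes around a word
# containing a curly apostrophe.
print(straighten_quotes(b"\x93Don\x92t\x94"))
```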
|
#9 |
creator of calibre
|
Did you specify the correct encoding for the TXT file in input encoding?
|
#10 | |
Wizard
|
Quote:
Edit: Yes, adding the correct encoding CP1252 caused it to convert as expected.
Last edited by Starson17; 06-16-2010 at 02:55 PM.
|
#11 | |||
Wizard
|
Quote:
Quote:
Quote:
Last edited by Starson17; 06-16-2010 at 03:09 PM. |
|||
#12 | |
Well trained by Cats
|
Quote:
V .7.2 |
|
#13 |
Wizard
|
|
#14 |
Well trained by Cats
|
|
#15 | |
Zealot
|
Quote:
Or is there something I'm missing??? |
|