06-16-2010, 10:38 AM | #1 |
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
how to tell the character encoding???
I have problems with files in Calibre from time to time. Sometimes they're prc's that I "convert" to mobi to embed my new metadata. Sometimes they're epub or .pdb files which I convert to mobi.
Often the em dashes and/or the apostrophes, and sometimes even the quotation marks, are replaced with squares in the converted file. After doing some digging here I gather that this may be an "input character encoding" problem and that I need to put the appropriate encoding into my preferences. But how am I supposed to determine what my character encoding is? I tried cp1252, which I gather is common. That didn't help, so I guess it's a different codec, but I have no idea how to figure out which one. Can anyone help? |
06-16-2010, 10:55 AM | #2 |
creator of calibre
Posts: 44,336
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The easiest solution is simply to use the "Transliterate unicode characters" option, which replaces these special characters with their plain ASCII equivalents.
|
06-16-2010, 10:59 AM | #3 |
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
I actually tried that...but it simply REMOVED all of the apostrophes in the document.
|
06-16-2010, 11:04 AM | #4 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Most people don't do it that way, however. They just try reasonable options until one seems to work. Here are the ones I usually try: cp1252 cp1251 latin1 utf-8 |
|
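The trial-and-error approach described above can be sketched in a few lines of Python. This is only an illustration, not anything calibre itself does: the helper name `decodable_with` and the sample bytes are made up, and the candidate list is the one from the post. It reports which encodings decode the bytes without error; it cannot say which one is actually right, so the decoded text still has to be checked by eye.

```python
# Candidate encodings from the post above, in a deliberate order:
# utf-8 first because it rejects most non-UTF-8 input, latin1 last
# because it accepts any byte sequence and therefore never fails.
CANDIDATES = ["utf-8", "cp1252", "cp1251", "latin1"]


def decodable_with(raw: bytes, candidates=CANDIDATES):
    """Return the candidate encodings that decode `raw` without error."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Hypothetical sample: "don’t" saved as cp1252 (0x92 is the curly apostrophe).
sample = b"don\x92t"
for name in decodable_with(sample):
    # Show each surviving candidate's decoding so it can be compared by eye.
    print(name, repr(sample.decode(name)))
```

If utf-8 is missing from the result, the file is almost certainly in a legacy single-byte encoding, and the printed decodings show which candidate produces sensible punctuation.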
06-16-2010, 11:10 AM | #5 | |
Well trained by Cats
Posts: 30,371
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Curly quotes (0x93, 0x94) and apostrophes (0x92) are DELETED when converting to EPUB with Transliterate enabled. This is a simple TXT file, so nothing inside it declares the charset that was used. |
|
06-16-2010, 11:12 AM | #6 |
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
It seems that the book might have broken at the input stage.
I bought the book, ran it through ereader2html, then input the html into calibre. The output of ereader2html looks fine, but when I click V in calibre it shows me a winzip folder (it imported as zip). If I open the book file in there, the em dashes and apostrophes are replaced with squares. At that point any converting I do won't help. So is there a way to import html without losing em dashes and apostrophes? For now I found the workaround of opening the html file in mobipocket creator and outputting a prc file, but that adds extra steps to what is already a fairly arduous process. Is there a way to streamline it? |
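One possible way to cut out the extra step, assuming the HTML really is in a legacy encoding such as cp1252 (the thread does not confirm this), is to re-encode the file to UTF-8 before importing it. The sketch below uses only the Python standard library; the function name and the file names in the usage comment are hypothetical. If the HTML contains a `<meta>` charset declaration, that declaration would also need updating to match.

```python
from pathlib import Path


def reencode_file_to_utf8(src, dst, source_encoding="cp1252"):
    """Read `src` as `source_encoding` and write the same text to `dst` as UTF-8.

    `source_encoding` is a guess (cp1252 is assumed here); a wrong guess
    either raises UnicodeDecodeError or silently produces wrong characters.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


# Hypothetical usage:
# reencode_file_to_utf8("book.html", "book-utf8.html")
```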
06-16-2010, 11:21 AM | #7 |
creator of calibre
Posts: 44,336
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
06-16-2010, 02:29 PM | #8 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I found the same thing going txt->txt. I expected it to convert curly double quotes (0x93, 0x94) to the ordinary double quote (0x22), and curly single quotes/apostrophes (0x91, 0x92) to the ordinary single quote (0x27). Bug?
|
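The byte values mentioned here are the cp1252 codes for the curly quotes. A minimal Python sketch of the expected mapping follows; it is an illustration of the idea, not calibre's transliteration code, and `straighten_quotes` is a made-up name. It also hints at why the characters might vanish with the wrong input encoding: decoded as latin1, the same bytes become invisible control characters rather than quotes.

```python
# Map the Unicode curly quotes to their plain ASCII counterparts.
STRAIGHTEN = str.maketrans({
    "\u2018": "'",  # left single quote, cp1252 byte 0x91
    "\u2019": "'",  # right single quote / apostrophe, cp1252 byte 0x92
    "\u201c": '"',  # left double quote, cp1252 byte 0x93
    "\u201d": '"',  # right double quote, cp1252 byte 0x94
})


def straighten_quotes(raw: bytes, encoding: str = "cp1252") -> str:
    """Decode `raw` with `encoding`, then replace curly quotes with ASCII ones."""
    return raw.decode(encoding).translate(STRAIGHTEN)


sample = b"\x93It\x92s fine,\x94 she said."
print(straighten_quotes(sample))            # "It's fine," she said.
print(repr(sample.decode("latin1")))        # same bytes become C1 control characters
```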
06-16-2010, 02:40 PM | #9 |
creator of calibre
Posts: 44,336
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Did you specify the correct encoding for the TXT file in input encoding?
|
06-16-2010, 02:45 PM | #10 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Edit: Yes, adding the correct encoding CP1252 caused it to convert as expected. Last edited by Starson17; 06-16-2010 at 02:55 PM. |
|
06-16-2010, 03:03 PM | #11 | |||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Last edited by Starson17; 06-16-2010 at 03:09 PM. |
|||
06-16-2010, 03:47 PM | #12 | |
Well trained by Cats
Posts: 30,371
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
V .7.2 |
|
06-16-2010, 04:20 PM | #13 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
06-16-2010, 04:57 PM | #14 |
Well trained by Cats
Posts: 30,371
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
06-16-2010, 07:36 PM | #15 | |
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
Quote:
Or is there something I'm missing??? |
|