|
|
#1 |
|
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
How to tell the character encoding?
I have problems with files in Calibre from time to time. Sometimes they're PRC files that I "convert" to MOBI to embed my new metadata. Sometimes they're EPUB or .pdb files which I convert to MOBI.
Often the em dashes and/or the apostrophes, and sometimes even the quotation marks, are replaced with squares in the converted file. After doing some digging here I gather that this may be an "input character encoding" problem and that I need to put the appropriate encoding into my preferences. But I can't work out how I'm supposed to determine what my character encoding is. I tried cp1252, which I gather is common. That didn't help, so I guess it's a different codec, but I have no idea how to figure out which one. Can anyone help? |
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,601
Karma: 28548974
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The easiest solution is simply to use the "Transliterate unicode characters" option, which will replace these special characters with their plain ASCII equivalents.
|
|
|
|
|
|
#3 |
|
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
I actually tried that...but it simply REMOVED all of the apostrophes in the document.
|
|
|
|
|
|
#4 | |
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Most people don't do it that way, however. They just try reasonable options until one seems to work. Here are the ones I usually try: cp1252, cp1251, latin1, utf-8 |
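The trial-and-error approach above can be roughed out in Python. This is only a sketch, not anything calibre does internally: `try_encodings` and `CANDIDATES` are names made up for illustration, and the candidate list is simply the one from this post. It decodes the raw bytes with each candidate and records which ones succeed.

```python
# Candidate encodings from the post above; utf-8 goes first because it is
# the strictest and fails on most text that is not really UTF-8.
CANDIDATES = ["utf-8", "cp1252", "cp1251", "latin1"]


def try_encodings(raw, candidates=CANDIDATES):
    """Return {encoding: decoded text, or None if decoding failed}."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    # Sample bytes: curly quotes, an apostrophe and an em dash
    # as cp1252 stores them.
    sample = b"\x93It\x92s fine\x94 \x97 she said"
    for name, text in try_encodings(sample).items():
        print(f"{name:8} {text!r}")
```

A successful decode is not proof of the right encoding: latin1 accepts any byte sequence at all, so you still have to look at the preview and see whether the quotes and dashes come out sensibly. To run it on a real book, read the file's bytes first (for example with `pathlib.Path("book.txt").read_bytes()`, where `book.txt` is a placeholder name).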
|
|
|
|
|
|
#5 | |
|
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Curly quotes (0x93, 0x94) and apostrophes (0x92) were DELETED when converted to EPUB with Transliterate enabled. This is a simple TXT file, so nothing inside it declares which charset was used. |
|
|
|
|
|
|
#6 |
|
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
It seems that the book might have broken at the input stage.
I bought the book, ran it through ereader2html, then input the HTML into calibre. The output of ereader2html looks fine, but when I click V in calibre it shows me a WinZip folder (it imported as ZIP). If I open the book file in there, the em dashes and apostrophes are replaced with squares. At that point no amount of converting will help. So is there a way to import HTML without losing em dashes and apostrophes? For now I found the workaround of opening the HTML file in Mobipocket Creator and outputting a PRC file. That adds extra steps to what is already a fairly arduous process. Is there a way to streamline it? |
|
|
|
|
|
#7 |
|
creator of calibre
Posts: 45,601
Karma: 28548974
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
|
|
|
|
|
#8 |
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I found the same thing going TXT to TXT. I expected it to convert curly double quotes (0x93, 0x94) to the ordinary double quote (0x22), and curly single quotes/apostrophes (0x91, 0x92) to the ordinary single quote (0x27). Bug?
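For reference, the byte values mentioned here are hexadecimal positions in cp1252. A small Python sketch of the conversion being described (decode as cp1252, then swap the curly characters for plain ones) follows. It is not calibre's transliteration code; the `STRAIGHTEN` table and the `straighten` function are invented for the example.

```python
# Map the curly characters to their plain ASCII counterparts.
STRAIGHTEN = str.maketrans({
    "\u2018": "'",   # left single quote   (cp1252 byte 0x91)
    "\u2019": "'",   # right single quote  (cp1252 byte 0x92)
    "\u201c": '"',   # left double quote   (cp1252 byte 0x93)
    "\u201d": '"',   # right double quote  (cp1252 byte 0x94)
})


def straighten(raw, encoding="cp1252"):
    """Decode raw bytes with the given encoding, then replace curly quotes."""
    return raw.decode(encoding).translate(STRAIGHTEN)


print(straighten(b"\x93Don\x92t panic,\x94 he said."))
```

If the same bytes are decoded as latin1 instead, they come out as invisible control characters (U+0091 to U+0094), which may be why a transliteration step finds nothing sensible to map them to when the input encoding is wrong.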
|
|
|
|
|
|
#9 |
|
creator of calibre
Posts: 45,601
Karma: 28548974
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Did you specify the correct encoding for the TXT file in input encoding?
|
|
|
|
|
|
#10 | |
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
I left it blank. In retrospect, it's obvious I needed to specify an encoding. Plus, I tried importing an HTML file with smart quotes. It kept all the smart quotes in the HTML, although it did convert the original file so that each byte is now followed by a 00 byte. Edit: Yes, specifying the correct encoding, cp1252, caused it to convert as expected. Last edited by Starson17; 06-16-2010 at 02:55 PM. |
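The "00 byte after each byte" observation is what UTF-16 looks like for plain ASCII text. A quick Python illustration, assuming (this is not confirmed anywhere in the thread) that the imported file was rewritten as UTF-16:

```python
text = "He said \u201chi\u201d"

utf16 = text.encode("utf-16-le")   # two bytes per character, low byte first
cp1252 = text.encode("cp1252")     # one byte per character

print(utf16[:8])    # each ASCII letter is followed by a 00 byte
print(cp1252)       # the curly quotes are the single bytes 0x93 and 0x94
```

In the UTF-16 form the curly quotes are no longer the bytes 0x93 and 0x94, so checking a hex dump for those values only makes sense on the original cp1252 file.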
|
|
|
|
|
|
#11 | |||
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Quote:
Quote:
Last edited by Starson17; 06-16-2010 at 03:09 PM. |
|||
|
|
|
|
|
#12 | |
|
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Version 0.7.2 |
|
|
|
|
|
|
#13 |
|
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
|
|
|
|
|
#14 |
|
Well trained by Cats
Posts: 31,241
Karma: 61360164
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
|
|
|
|
|
#15 | |
|
Zealot
Posts: 107
Karma: 591
Join Date: May 2008
Device: kindle, iOS, Blackberry, Sony DPT (pdfs)
|
Quote:
Or is there something I'm missing??? |
|
|
|
|