05-21-2009, 07:00 PM | #1 |
Fanatic
Posts: 514
Karma: 2954711
Join Date: May 2006
|
Vanishing punctuation
I converted an HTML file into an ePub book today, and discovered that most of its punctuation seems to be missing afterward: apostrophes, em dashes, etc.
What might be the cause of that? |
05-22-2009, 07:02 AM | #2 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Could you post the source files?
|
05-22-2009, 08:41 AM | #3 |
Junior Member
Posts: 6
Karma: 112
Join Date: May 2009
Device: Sony Reader
|
I'm having the same problem! I've tried changing the source encoding, converting epub to epub, etc. I either get the funny characters or no quotes or apostrophes at all! Otherwise, format/text looks great!
|
05-22-2009, 08:40 PM | #4 |
Guru
Posts: 869
Karma: 2676800
Join Date: Aug 2008
Location: Taranaki - NZ
Device: Kobo Aura H2O, Kobo Forma
|
If the source file is HTML, I've found that you may need to change the encoding of the source file from ANSI to UTF-8 (Unicode).
To do this, open your HTML file in your favourite text editor and change its encoding, either through "Save As" or through a dedicated menu item if the editor has one. |
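For anyone who would rather script the re-encoding than use an editor, here is a minimal Python sketch. It assumes the source really is cp1252 (what Windows editors usually mean by "ANSI" on Western systems); the function name and file names are made up for illustration.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Decode src using source_encoding and write the same text to dst as UTF-8.

    source_encoding is an assumption: if the guess is wrong, the output
    will contain the wrong characters even though no error is raised.
    """
    raw = Path(src).read_bytes()          # read the file as raw bytes
    text = raw.decode(source_encoding)    # interpret bytes in the old encoding
    Path(dst).write_bytes(text.encode("utf-8"))  # store the text as UTF-8
    return text


# Example call (hypothetical file names):
# reencode_to_utf8("book.html", "book-utf8.html")
```

If the HTML declares its old encoding in a meta tag, that declaration probably needs updating as well, otherwise a converter may still trust it.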
05-22-2009, 08:44 PM | #5 |
Fanatic
Posts: 514
Karma: 2954711
Join Date: May 2006
|
How would I do that in emacs?
|
05-23-2009, 11:50 AM | #6 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
This is definitely an encoding detection problem. You have to figure out what encoding the HTML file is in, then either tell calibre to use that encoding via the source encoding option, or convert the HTML file to UTF-8.
|
05-25-2009, 02:44 PM | #7 |
Fanatic
Posts: 514
Karma: 2954711
Join Date: May 2006
|
All right, how would I go about doing that?
|
05-25-2009, 02:50 PM | #8 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Encoding detection is a bit of a black art. I typically use programmers' tools to do it, so I'm not sure what would be an easy way for non-programmers. A few common encodings to try are cp1252, cp1251, and latin1.
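One rough way to try those candidates by hand is to decode a sample of the file with each one and look at which result shows sensible punctuation. The sketch below is only an illustration of that trial-and-error approach, not real detection; the function name and the candidate list (with utf-8 added) are assumptions.

```python
CANDIDATES = ("utf-8", "cp1252", "cp1251", "latin1")


def preview_decodings(raw, candidates=CANDIDATES):
    """Return {encoding name: decoded text, or None if decoding failed}."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding.
            results[name] = None
    return results


# Byte 0x92 is a curly apostrophe in cp1252 but is not valid UTF-8 here.
sample = b"It\x92s"
for name, text in preview_decodings(sample).items():
    print(name, repr(text))
```

Note that latin1 accepts every byte, so a successful decode proves nothing by itself; the decoded text still has to be checked by eye.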
|
05-25-2009, 03:07 PM | #9 |
Fanatic
Posts: 514
Karma: 2954711
Join Date: May 2006
|
And how do I tell Calibre what encoding to use? It doesn't seem to be in the meta information for the book.
|
05-25-2009, 03:41 PM | #10 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
There's a conversion option called source encoding.
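For command-line use, the sketch below builds an ebook-convert call that passes the encoding explicitly. It assumes a calibre release whose ebook-convert tool accepts --input-encoding (the option name may differ from the "source encoding" label in the 2009 GUI); the helper name and file names are made up.

```python
def build_convert_command(src, dst, input_encoding):
    """Build an argument list for calibre's ebook-convert command-line tool.

    Assumes ebook-convert is on the PATH and supports --input-encoding.
    """
    return ["ebook-convert", src, dst, "--input-encoding", input_encoding]


cmd = build_convert_command("book.html", "book.epub", "cp1252")
print(" ".join(cmd))

# To actually run the conversion (requires calibre to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```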
|
05-27-2009, 09:10 PM | #11 |
Fanatic
Posts: 514
Karma: 2954711
Join Date: May 2006
|
For what it's worth, I downloaded the text editor Notepad++ and found that the texts I wanted to convert were in ANSI encoding, which Calibre didn't even recognize when I tried to enter it in the "Encoding Type" box. Fortunately, Notepad++ had a menu option to let me re-encode to UTF-8, and once I did that the book converted like a charm.
|
05-27-2009, 09:56 PM | #12 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The encoding name is not ansi, it's ascii
|
05-27-2009, 10:14 PM | #13 |
Fanatic
Posts: 514
Karma: 2954711
Join Date: May 2006
|
That's not what Notepad++'s menu says.
|
05-27-2009, 11:38 PM | #14 |
Guru
Posts: 644
Karma: 1242364
Join Date: May 2009
Location: The Right Coast
Device: PC (Calibre), Nexus 7 2013 (Moon+ Pro), HTC HD2/Leo (Freda)
|
Actually it is kind of both.
ANSI is the American National Standards Institute, and it is relevant here because it standardized ASCII. ASCII proper uses seven bits per character, which covers only 128 characters. To accommodate other languages and extra punctuation, vendors later defined eight-bit extensions that keep the ASCII range and add another 128 characters. To oversimplify, these were an early step toward what Unicode does today.
On Windows, such code pages (for example Windows-1252) came to be called "ANSI" code pages, which is why Notepad++ reports the file as ANSI encoded. A file that contains only plain seven-bit ASCII characters reads the same under any of these encodings; the differences only show up for characters outside that range, such as curly quotes and em dashes.
Last edited by Sabardeyn; 05-27-2009 at 11:41 PM. |
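To make the ASCII versus "ANSI" distinction concrete, here is a small Python sketch. It assumes "ANSI" means cp1252, which is the usual case on Western Windows systems but not guaranteed; the helper name is made up.

```python
def is_pure_ascii(raw):
    """True if every byte is in the seven-bit ASCII range (0 to 127)."""
    return all(byte < 0x80 for byte in raw)


plain = b"dont"       # only seven-bit characters
curly = b"don\x92t"   # 0x92 is a curly apostrophe in cp1252

# Pure ASCII bytes decode to the same text under all three encodings.
same = plain.decode("ascii") == plain.decode("cp1252") == plain.decode("utf-8")
print(is_pure_ascii(plain), same)

# Bytes above 0x7F cannot be decoded as ASCII at all.
try:
    curly.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(is_pure_ascii(curly), ascii_ok)
```

So a file whose punctuation goes missing is probably not pure ASCII, and the bytes above 127 are exactly where the encoding choice matters.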
05-28-2009, 12:49 AM | #15 |
creator of calibre
Posts: 44,356
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
All I meant was that the encoding you have to tell calibre to use is "ascii"
|