MobileRead Forums

MacEachaidh 08-26-2010 06:41 AM

Typographic quotes broken after editing in Sigil
 
I have several epub files created with standard western font encoding, that I have opened in Sigil to embed cover graphics. After saving them, I find that typographic quotes have been converted to ... I'm not sure what. Unicode?

For instance, a typographic apostrophe now appears as â€™, an open double-quote as â€œ and a close double-quote as â€�.

Does this make sense to anyone? How do I fix it?

(I've seen it before in other contexts, and expect it has something to do with a specified encoding, but I don't know what's wrong and how to fix it.)

Thanks for any help.

Vintage Season 08-26-2010 08:02 AM

Here is the darn-near-universal (a.k.a. "unicode") way to fix it:

Use the following for a typographic apostrophe: &rsquo;
Use the following for an open double-quote: &ldquo;
Use the following for a close double-quote: &rdquo;
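
If a file has a lot of these characters, the substitution can be done mechanically. Here is a minimal Python sketch, assuming the text has already been decoded correctly. `to_char_refs` is a made-up helper name, and it emits numeric character references (`&#8217;` and friends), which stand for the same characters as the named entities.

```python
def to_char_refs(text: str) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    # The "xmlcharrefreplace" error handler writes &#NNNN; for any character
    # the target codec (plain ASCII here) cannot represent.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Escapes are used so this snippet itself stays pure ASCII.
print(to_char_refs("\u201cIt\u2019s fine,\u201d she said."))
# prints: &#8220;It&#8217;s fine,&#8221; she said.
```

The result is pure ASCII, so it reads the same under any of the usual encodings.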

- M.

MacEachaidh 08-26-2010 08:44 AM

Thanks VS. But let me get this straight: aren't they Unicode codes? Doesn't that require the original document to be saved specifically as Unicode for these to display correctly?

What are the other ones I quoted, then, and why would they show up? I've seen them before showing up in documents in Microsoft Word (for instance), and assumed they were something to do with coding, but other than doing a find-and-replace on them didn't really know how to fix them. What are they, exactly?

And why would editing the file in Sigil - not changing either the body copy or the CSS, but simply embedding a jpg file as the cover - make these weird codes show up?

Dave_S 08-26-2010 09:02 AM

Quote:

Originally Posted by MacEachaidh (Post 1075952)
For instance, a typographic apostrophe now appears as â€™, an open double-quote as â€œ and a close double-quote as â€�.

FWIW, I usually see that kind of mixup when encoding="xxxx" is in the beginning of the XHTML, but the text is actually encoded as "yyyy", where xxxx and yyyy are two different text encoding schemes. That usually seems to happen when MS tools are used, which generate win-1252 encoding which is badly incompatible with utf-8 for special characters.
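
To make the mismatch concrete, here is a small Python sketch. It is only an illustration, not anything Sigil or an MS tool actually runs. It encodes the three typographic quotes as UTF-8 and then decodes those same bytes as Windows-1252, which is one way the declared and actual encodings can disagree. `misread_as_cp1252` is a hypothetical helper name.

```python
# Typographic quotes, written as escapes so this snippet stays pure ASCII.
QUOTES = {
    "apostrophe": "\u2019",
    "open double quote": "\u201c",
    "close double quote": "\u201d",
}


def misread_as_cp1252(ch: str) -> str:
    """Encode as UTF-8, then decode the very same bytes as Windows-1252."""
    # errors="replace" is needed because byte 0x9D has no mapping in
    # Python's cp1252 codec; it comes out as U+FFFD (the replacement char).
    return ch.encode("utf-8").decode("cp1252", errors="replace")


for name, ch in QUOTES.items():
    raw = ch.encode("utf-8")
    print(f"{name}: {raw.hex(' ')} -> {misread_as_cp1252(ch)}")
```

Each quote is three bytes in UTF-8 (`e2 80 99`, `e2 80 9c`, `e2 80 9d`), and each byte gets shown as its own Windows-1252 character, so one quote mark turns into a run of three odd-looking characters starting with â€.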

Jellby 08-26-2010 10:03 AM

The &rsquo; et al. codes are not Unicode, they are HTML entities, and work whatever the real encoding of the file is.

You can write the needed character explicitly, but then it has to be in the correct declared encoding (and in ePUB, it must be utf-8 or utf-16). If you mix encodings, bad things happen.
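
A quick way to see the difference is to compare bytes. This is a rough Python sketch under my own assumptions: the sample string and the helper name `same_bytes_in_both` are invented for the illustration. Entity text is plain ASCII, so it comes out byte-for-byte identical in UTF-8 and Windows-1252, while the literal characters do not.

```python
import html


def same_bytes_in_both(text: str) -> bool:
    """True if text encodes to identical bytes in UTF-8 and Windows-1252."""
    return text.encode("utf-8") == text.encode("cp1252")


entity_text = "&ldquo;Hello&rdquo; it&#8217;s me"
literal_text = html.unescape(entity_text)  # entities resolved to real characters

print(same_bytes_in_both(entity_text))   # entities: ASCII only
print(same_bytes_in_both(literal_text))  # literal curly quotes

# Named and numeric references resolve to the same character (U+2019).
print(html.unescape("&rsquo;") == html.unescape("&#8217;"))
```

That is why the entity form survives a wrong encoding declaration and the literal form does not.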

A trick: when I need to differentiate between single quotes and apostrophes, I use &lsquo; &rsquo; for the quotes and & #8217; (without space) for the apostrophe. The character is exactly the same, but that allows for better search and replace in the future. I don't know if Sigil would keep this, though.

Valloric 08-26-2010 10:28 AM

Quote:

Originally Posted by MacEachaidh (Post 1075952)
I have several epub files created with standard western font encoding, that I have opened in Sigil to embed cover graphics. After saving them, I find that typographic quotes have been converted to ... I'm not sure what. Unicode?

For instance, a typographic apostrophe now appears as â€™, an open double-quote as â€œ and a close double-quote as â€�.

Quote:

Originally Posted by Dave_S (Post 1076095)
FWIW, I usually see that kind of mixup when encoding="xxxx" is in the beginning of the XHTML, but the text is actually encoded as "yyyy", where xxxx and yyyy are two different text encoding schemes. That usually seems to happen when MS tools are used, which generate win-1252 encoding which is badly incompatible with utf-8 for special characters.

Dave_S has it right. Your encodings are mixed up. The file is probably declaring the use of one encoding, but actually using another. Or it's declaring the use of two encodings (which is impossible).

Sigil looks at the encoding declared and converts the bytestream from that encoding into UTF-16. As long as the file is truthful about its encoding :), this works great.
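
For anyone who wants to check a file before opening it, here is a rough Python sketch of one heuristic. It is not Sigil's actual code. The function names, the regex, and the sample bytes are all assumptions made for the illustration. The idea: if the XML declaration names something other than UTF-8, yet the bytes contain non-ASCII data that decodes cleanly as UTF-8, the declaration is probably wrong.

```python
import re

# Pulls the encoding name out of an XML declaration at the start of the file.
DECL_RE = re.compile(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, lowercased."""
    m = DECL_RE.search(data[:200])
    return m.group(1).decode("ascii").lower() if m else default


def looks_mislabeled(data: bytes) -> bool:
    """Heuristic: declared as non-UTF-8, but the bytes are valid non-ASCII UTF-8."""
    if declared_encoding(data) in ("utf-8", "utf8"):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so the declaration may well be honest
    return any(b > 0x7F for b in data)


# Hypothetical sample: declares windows-1252 but contains a UTF-8 apostrophe.
sample = b'<?xml version="1.0" encoding="windows-1252"?>\n<p>It\xe2\x80\x99s</p>'
print(declared_encoding(sample), looks_mislabeled(sample))
```

It is only a heuristic, but a file that trips it is a good candidate for correcting the declaration (or re-saving as real UTF-8) before editing.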

Quote:

Originally Posted by Jellby (Post 1076205)
A trick: when I need to differentiate between single quotes and apostrophes, I use &lsquo; &rsquo; for the quotes and & #8217; (without space) for the apostrophe. The character is exactly the same, but that allows for better search and replace in the future. I don't know if Sigil would keep this, though.

It should.

Vintage Season 08-27-2010 10:31 AM

Quote:

Originally Posted by Jellby (Post 1076205)
A trick: when I need to differentiate between single quotes and apostrophes, I use &lsquo; &rsquo; for the quotes and & #8217; (without space) for the apostrophe. The character is exactly the same, but that allows for better search and replace in the future. I don't know if Sigil would keep this, though.

Thanks for that tip! (It's one of many "I should have thought of that" permutations I've encountered on these forums...)

- M.

Vintage Season 08-27-2010 10:33 AM

Quote:

Originally Posted by MacEachaidh (Post 1076064)
Thanks VS. But let me get this straight: aren't they Unicode codes? Doesn't that require the original document to be saved specifically as Unicode for these to display correctly?

As Jellby correctly pointed out, those are HTML entities, and not Unicode. I should not have used that word in my original description.

- M.


All times are GMT -4.