Smart quotes in XHTML?


MaggieScratch
03-26-2009, 06:43 PM
After some experimentation, I've determined that the easiest way (for me) to make an ePub book from plain text is to just mark it up myself. However, I can't figure out how to get "smart quotes" in an XHTML page and have it be valid. As far as I can tell, smart quotes aren't valid for XHTML, or at least I can't get any file that uses them to validate. It doesn't say why, just that it encountered an unexpected code or something like that. I took out the smart quotes and it validated just fine.

I've been unzipping ePub books to see how they work. I've found some that have the smart quotes. Are they just inserted and nobody worries if they don't validate?

zelda_pinwheel
03-26-2009, 06:48 PM
by "smart quotes" do you mean properly angled quotes ? in that case, try these codes :

double quotes :
left &ldquo; (“)
right &rdquo; (”)

single :
left &lsquo; (‘)
right &rsquo; (’)

angled quotes :
left &laquo; («)
right &raquo; (»)

for more html entities look at this reference table (http://www.w3schools.com/tags/ref_entities.asp).

pdurrant
03-26-2009, 07:02 PM
If you use a text editor that handles UTF-8 text, and specify the XHTML character set to be UTF-8 you can just include curly quotes and they'll verify OK. Otherwise you'll need to use the entities.

&ldquo;

etc
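
A minimal sketch of the UTF-8 route, in Python: write a small XHTML page that contains curly quotes as real characters, declare the encoding in the XML declaration, and check that an XML parser reads it back. The file name and page content are made up for illustration, and this only checks well-formedness, not validity against the XHTML DTD.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical page content. \u201c and \u201d are the left and right
# double curly quotes; they end up in the file as real characters.
PAGE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">\n'
    '<head><title>Sample</title></head>\n'
    '<body><p>\u201cHello,\u201d she said.</p></body>\n'
    '</html>\n'
)

def write_and_parse(path):
    # Save the bytes as UTF-8 so they match the declared encoding.
    with open(path, "w", encoding="utf-8") as f:
        f.write(PAGE)
    # Parsing succeeds only if the file is well-formed XML in that encoding.
    root = ET.parse(path).getroot()
    ns = "{http://www.w3.org/1999/xhtml}"
    return root.find(f"{ns}body/{ns}p").text

path = os.path.join(tempfile.mkdtemp(), "sample.xhtml")
text = write_and_parse(path)
```

If the declared encoding and the bytes on disk disagree, the parse step is where the mismatch would typically surface.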

cerement
03-26-2009, 09:34 PM
Or use the <q> tag and the stylesheet from this page: Language Specific Quotation Marks (http://www.witch.westfalen.de/csstest/quotes/quotes.html)

zelda_pinwheel
03-26-2009, 09:36 PM
thanks for that excellent resource !

jgray
03-26-2009, 11:09 PM
I have had trouble with the named character entities not displaying correctly in some situations. It is safer to use the numeric entities, which will always work:

Left double quote = &#8220;
Right double quote = &#8221;

Some text editors will allow you to enter the UTF-8 character, or the named entity and convert them to numeric on command.

The named entities that are safe to use in all cases are:

http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML
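
For anyone whose editor lacks a "convert to numeric" command, here is a rough Python sketch of the same conversion. It relies on the standard library's `html.entities.name2codepoint` table and the `xmlcharrefreplace` error handler; the function names and the sample string are made up for illustration. The five XML predefined entities are deliberately left as they are.

```python
import re
from html.entities import name2codepoint

# The five entities predefined in XML itself; these are left alone.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def named_to_numeric(markup):
    """Replace HTML named entities such as &ldquo; with numeric references."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names untouched
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, markup)

def chars_to_numeric(markup):
    """Replace every non-ASCII character with a decimal numeric reference."""
    return markup.encode("ascii", "xmlcharrefreplace").decode("ascii")

sample = "<p>&ldquo;Hi,&rdquo; she said. \u2018Really?\u2019 &amp; so on.</p>"
converted = chars_to_numeric(named_to_numeric(sample))
```

The result is plain ASCII, so it no longer depends on the file's encoding or on a DTD being declared.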

Jellby
03-27-2009, 07:13 AM
The proper "smart quotes" (i.e., not just oriented quotation marks, but with the ability of detecting the language and quotation level) rely, as far as I know, on the ":before" and ":after" pseudo-classes and "content" property of CSS. I'm not sure these are supported in the ePUB specification, I believe they aren't.

Besides, there is still the "issue" with multi-paragraph quotes, I don't think it's possible to define how they should behave with CSS... So, I decided to forget about the "smartiness" and instead just use the proper character (or entity) in each place.

MaggieScratch
03-27-2009, 01:41 PM
I used named entities in HTML docs that I converted via Mobipocket Creator and they didn't work; just showed the code rather than the character. I ended up inserting smart quotes directly in the original HTML document. (And yes, by "smart quotes" I mean angled quotation marks, single and double--that's what they're called in Word.) I guess that's why I didn't think of it when making an XHTML document for ePub.

I did use UTF-8. I wonder why the pages didn't verify, then? The error didn't name them specifically, just said that there was a character it couldn't read. I assumed it was the quotation marks, because when I took them out there was no problem. I used the verifier on the w3.org site, uploading the document. I just used regular Notepad to make the files.

I don't own a reader that can read ePub. Do most of the readers parse named entities properly? If so, I think that might be the way to go.

When I'm coding regular HTML, I just use the straight quotes. But for something I'm going to read on a dedicated reader, it should be as book-like as possible, in my opinion, especially when trying to convert the paper fetishists! :) Thanks for the advice.

Jellby
03-27-2009, 02:40 PM
I used named entities in HTML docs that I converted via Mobipocket Creator and they didn't work; just showed the code rather than the character.

That's strange, I'm sure it should work. Maybe you forgot the semicolon at the end of the entities or some other typo? Or maybe MPC didn't "know" you were importing an HTML file and then converted &lsquo; into &amp;lsquo;?

(And yes, by "smart quotes" I mean angled quotation marks, single and double--that's what they're called in Word.)

"Smart" in the smart quotes does not mean curled/angled quotes, but the ability (or whatever) of Word to properly guess which kind of quote mark (open or close) should be used when you just type " (straight quote mark).

I did use UTF-8. I wonder why the pages didn't verify, then? The error didn't name them specifically, just said that there was a character it couldn't read.

I'd guess the character encoding was not set to UTF-8 in the header.
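
A related possibility, offered only as an assumption since the thread never confirms it: the file was saved in a Windows code page (Notepad's "ANSI" option) while the page claimed UTF-8. The Python sketch below shows why a validator would then report an unreadable character; the sample text and helper name are made up.

```python
LEFT, RIGHT = "\u201c", "\u201d"
sample = f"{LEFT}Hello{RIGHT}"

# The same text saved two ways.
utf8_bytes = sample.encode("utf-8")      # three bytes per curly quote
cp1252_bytes = sample.encode("cp1252")   # one byte per curly quote (0x93, 0x94)

def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8, as a UTF-8 parser requires."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

utf8_ok = decodes_as_utf8(utf8_bytes)
cp1252_ok = decodes_as_utf8(cp1252_bytes)
```

Straight quotes are plain ASCII and have the same bytes in both encodings, which would explain why removing the curly quotes made the error go away.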

I don't own a reader that can read ePub. Do most of the readers parse named entities properly? If so, I think that might be the way to go.

Neither do I, but I can test ePUBs in Adobe Digital Editions or in a browser. Everything I've tried recognized entities (&rsquo;) or numbers (&#8217;), including the mobipocket reader in the Cybook.
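
To illustrate the "guessing" described above, here is a deliberately naive Python sketch that decides between opening and closing marks from the preceding character. It is not how Word does it, only an assumed rule of thumb for illustration, and it will get nested quotes and leading apostrophes wrong.

```python
import re

OPENERS = {'"': "\u201c", "'": "\u2018"}
CLOSERS = {'"': "\u201d", "'": "\u2019"}

def smarten(text):
    """Replace straight quotes with curly ones using a simple context rule."""
    def repl(match):
        i = match.start()
        prev = text[i - 1] if i > 0 else " "
        # A quote at the start, or after whitespace or an opening bracket, opens.
        if prev.isspace() or prev in "([{":
            return OPENERS[match.group(0)]
        # Anything else (after a letter, digit, or punctuation) closes;
        # this also turns apostrophes into right single quotes.
        return CLOSERS[match.group(0)]
    return re.sub(r"[\"']", repl, text)

result = smarten('"Hello," she said. "It\'s fine."')
```

Because the rule is only a guess, the output still needs proofreading, which fits the decision above to place the proper character by hand.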

DaleDe
03-27-2009, 04:53 PM
You really don't have to run off to other sites for this sort of thing. Look in our own wiki under Special characters (http://wiki.mobileread.com/wiki/Special_characters)

Dale

llasram
03-27-2009, 04:59 PM
I don't own a reader that can read ePub. Do most of the readers parse named entities properly? If so, I think that might be the way to go.

To be spec-correct when using named entities you need to declare one of the XHTML DTDs, which means that you cannot include any non-XHTML markup in the document (such as SVG or OPF namespace-case sections). These days there's no real reason to use entities anyway. Just use an editor which will allow you to insert the character directly and use an EPUB-valid Unicode encoding (UTF-8 or UTF-16).
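
The DTD point can be seen with a plain XML parser. In this Python sketch (standard library only; the helper name and sample strings are made up), a fragment with no DTD is rejected when it uses a named HTML entity, while numeric references, the XML predefined entities, and the characters themselves all parse.

```python
import xml.etree.ElementTree as ET

def parses(markup):
    """Return True if the markup is well-formed XML with no DTD available."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

named = "<p>&ldquo;Hi&rdquo;</p>"                       # HTML named entities
numeric = "<p>&#8220;Hi&#8221;</p>"                     # numeric references
predefined = "<p>&quot;Hi&quot; &amp; &lt;bye&gt;</p>"  # XML predefined entities
literal = "<p>\u201cHi\u201d</p>"                       # the characters themselves

results = {
    "named": parses(named),
    "numeric": parses(numeric),
    "predefined": parses(predefined),
    "literal": parses(literal),
}
```

Reading systems built on HTML-aware parsers may be more forgiving, so this shows the strict XML behaviour rather than what every reader does.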

jgray
03-28-2009, 06:08 PM
It seems that the forum software is being too smart. My numeric entities got converted to quote characters. By now, everyone should know where to go to find a table of extended characters, so it doesn't matter.

I don't remember the exact circumstances, but I did have display problems when using either the UTF-8 characters themselves, or the named entities. I have never had a problem using the numeric entities. It may simply be a matter of the various reading softwares not following spec.

cerement
03-28-2009, 06:53 PM
It may simply be a matter of the various reading softwares not following spec.
When you can't even get the major browsers to agree to follow spec it's no miracle that the electronic readers get "selective" about following spec ...

[And one of the biggest arguments from TeX users against XML/XHTML is the complete lack of compliance among programs, even among the major players - several people on these forums have commented that Adobe Digital Editions ePubs regularly do not pass epubcheck]