HTML entities being changed to actual glyphs

GrannyGrump · 09-09-2011, 04:35 AM

I have been trying to learn about HTML and CSS from various online tutorials and Mobile Read posts. I saw a post here on the forum about using HTML entities to help prevent the question mark/blank box that happens when a font can't support a special character.
This works for me for &nbsp;, but it seems that many others that I use get converted to the actual glyph as soon as I move from Code View to Book View and back again. For instance, &ldquo;, &mdash; and others.

Is this a normal behavior? Should I try to prevent it? If so, how? Is this something that can be set in the header of the HTML file?

Thanks for any enlightenment....
(I feel so stooooopid!)

charleski · 09-09-2011, 04:58 AM

Quote:

Originally Posted by GrannyGrump

I saw a post here on the forum about using HTML entities to help prevent the question mark/blank box that happens when a font can't support a special character.

That won't work. HTML entities only aid in the writing of the actual code (since keyboards generally don't have a key for em dashes or typographical quotes); they don't alter the requirement that the font contain the correct glyphs to show them.
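To illustrate the point with a minimal sketch (Python's standard `html` module is used here only for demonstration; it is not what Sigil or any reader uses internally): an entity is just another spelling of a character, so once it is decoded the font is asked for exactly the same code point either way.

```python
import html

# Three ways of writing an em dash in markup:
# a named entity, a decimal reference, and a hexadecimal reference.
forms = ["&mdash;", "&#8212;", "&#x2014;"]
decoded = [html.unescape(form) for form in forms]

# Every spelling decodes to the same single code point, U+2014.
assert decoded == ["\u2014"] * 3

# The same holds for the other entities mentioned in this thread.
left_quote = html.unescape("&ldquo;")  # U+201C, left double quotation mark
nbsp = html.unescape("&nbsp;")         # U+00A0, no-break space

print([hex(ord(ch)) for ch in decoded + [left_quote, nbsp]])
```

Whichever spelling is in the source, the renderer still has to find a glyph for U+2014 in the font.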

What you're seeing is normal behaviour.

You'll only run into problems with missing characters if your book contains fancy wingdings or uses characters that fall outside the narrow set used in Western European languages.

ADE- (or RMSDK-) based readers support a default character set that's defined in tables D1 and D3 given in this document. Apple's iBooks supports a much wider character set, though it's not properly documented in any open location.

If this is an issue, you can avoid the problem by embedding a font that supports the characters you need.

[Edit]: I see Adobe has updated its docs to reflect PDF's status as an ISO standard. The character set supported by basic ADE implementations can now be found in tables D2 and D5 of this document.

Jellby · 09-09-2011, 04:58 AM

Quote:

Originally Posted by GrannyGrump

I saw a post here on the forum about using HTML entities to help prevent the question mark/blank box that happens when a font can't support a special character.

Either the post was wrong or you misunderstood it. Using entities or literal characters makes no difference to the font; entities can only help with encoding problems, if you want to make sure there are none. If a font works with "&aacute;" and not with "á", it only means you are using the wrong encoding for the latter.
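A small sketch of that encoding problem, assuming the file was saved as UTF-8 but is wrongly read as Latin-1 (the choice of Latin-1 is only an example of a mismatched encoding; Python is used purely for illustration):

```python
import html

literal = "\u00e1"                    # the character a-acute, typed directly
utf8_bytes = literal.encode("utf-8")  # stored on disk as two bytes: C3 A1

# Reading those bytes with the wrong encoding yields two wrong characters.
misread = utf8_bytes.decode("latin-1")

# The entity is plain ASCII, so the same wrong guess does not damage it.
entity_bytes = "&aacute;".encode("utf-8")
recovered = html.unescape(entity_bytes.decode("latin-1"))

print(repr(misread), repr(recovered))
```

Here `misread` is no longer the original character, while `recovered` is. The font is the same in both cases; only the decoding step differs.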

user_none · 09-09-2011, 07:57 AM

All text pages within Sigil (XHTML, OPF...) are always UTF-8 encoded. Unless your reader doesn't understand UTF-8, or you are using a character that your chosen font doesn't support, you should have no issues with the entity being translated to the Unicode character it represents.
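For completeness, a hedged sketch of going the other way outside Sigil: if a plain-ASCII copy of some text were ever needed (for example, for a tool that does not handle UTF-8), Python's built-in `xmlcharrefreplace` error handler rewrites non-ASCII characters as numeric character references. This is generic Python, not a Sigil feature, and the sample string is made up for illustration.

```python
import html

# Made-up sample: curly quotes, an a-acute and an em dash, written as escapes.
text = "\u201cCaf\u00e1\u201d \u2014 fin"

# Encode to ASCII, replacing anything non-ASCII with &#NNNN; references.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
ascii_markup = ascii_bytes.decode("ascii")

# Decoding the references restores exactly the original text.
round_trip = html.unescape(ascii_markup)

print(ascii_markup)
```

The round trip loses nothing, which is another way of seeing that the reference and the character are the same thing to the reader and the font.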

GrannyGrump · 09-10-2011, 01:16 AM

Thanks guys, for clarifying all that. Like I said, I'm trying to learn this stuff by myself, and I guess I confuse myself often.

Something to add to my notebook of HTML factoids!
Thanks!!



