Thread: Typos in ebooks
Old 05-25-2011, 12:15 AM   #207
bizzybody
Addict
bizzybody ought to be getting tired of karma fortunes by now.
 
Posts: 286
Karma: 7742186
Join Date: Apr 2007
Location: Idaho, USA
Device: Various PalmOS PDAs, Android Phones, Sharper Image Literati
The attached file is a text file listing the UTF-8 codes and their extended ASCII (Windows-1252 or ISO 8859-1) equivalents. Note that the non-breaking space is given by its HTML "friendly" code, because it's an invisible character and can't be typed without using the Alt+nnn code. The HTML code works with every book conversion program I've used.
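As a rough illustration of the kind of pairing such a list contains, here is a minimal Python sketch. The three sample characters and the names `SAMPLES` and `code_table` are my own picks for illustration, not taken from the attached file.

```python
# Hypothetical sample characters; the attached list itself is not reproduced here.
SAMPLES = {
    "left double quote": "\u201c",
    "em dash": "\u2014",
    "non-breaking space": "\u00a0",
}

def code_table(samples):
    """Return (name, UTF-8 bytes as hex, Windows-1252 byte as hex) per character."""
    rows = []
    for name, ch in samples.items():
        rows.append((name, ch.encode("utf-8").hex(), ch.encode("cp1252").hex()))
    return rows

for name, utf8_hex, cp1252_hex in code_table(SAMPLES):
    print(f"{name:20} UTF-8 {utf8_hex:8} Windows-1252 {cp1252_hex}")
```

Each of these characters takes two or three bytes in UTF-8 but a single byte in Windows-1252.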

Any Unicode-supporting system should *not* need any of these characters' Unicode versions or UTF-8 codes in order to properly display them.

In fonts like Terminal, or in the ANSI set (of which Terminal is a monospaced TrueType clone), some of the characters are different, but you won't encounter that on PDAs or book readers.

If you want your book to reach the widest possible audience, without getting questions about why there are all those weird characters or boxes, or why the punctuation is missing and the words are jammed together... use the normal characters on this list instead of their Unicode versions, or in HTML their UTF-8 codes.
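One way to check a manuscript against this advice is sketched below in Python, assuming Windows-1252 is the target character set. The function names `not_in_cp1252` and `misread_as_cp1252` are hypothetical helpers made up for this example.

```python
def not_in_cp1252(text):
    """Return the distinct characters that Windows-1252 cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

def misread_as_cp1252(text):
    """Show how UTF-8 bytes look on a device that assumes Windows-1252."""
    return text.encode("utf-8").decode("cp1252", errors="replace")

# The curly apostrophe (U+2019) is in Windows-1252; the prime mark (U+2032) is not.
print(not_in_cp1252("It\u2019s 5\u2032 tall"))

# A UTF-8 left double quote read as Windows-1252 turns into three junk characters.
print(misread_as_cp1252("\u201cHi"))
```

The second function reproduces the "weird characters" effect: one punctuation mark stored as three UTF-8 bytes is shown as three unrelated single-byte characters.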

If the language you're using in your book has characters not in this list, then it's extremely likely the people reading it will have a device that supports Unicode or some other method of displaying those characters.

The blame for all these issues with character encoding lies mainly with America. Since the vast majority of personal computers are still based on Ye Olde IBM PC, which was originally designed by Americans for English speakers, support for "foreign" characters was pretty much an afterthought for MS-DOS and PC-DOS. A similar problem was built into the early Internet (which is *not* the World Wide Web), which in its early years was all American. All the characters required for English could be encoded using 7-bit words, so that's how it was done, leaving the eighth bit of each byte always assumed to be zero unless commands were sent to specifically initiate a binary file transfer.
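To see what a 7-bit channel does to 8-bit text, here is a small Python sketch. It assumes the channel simply clears the top bit of every byte; `strip_high_bit` is a hypothetical name for that behaviour, not a real network function.

```python
def strip_high_bit(data):
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)

plain = "cafe".encode("cp1252")      # pure 7-bit ASCII
accented = "caf\u00e9".encode("cp1252")  # the e-acute is byte 0xE9 in Windows-1252

print(strip_high_bit(plain))     # English text passes through unchanged
print(strip_high_bit(accented))  # 0xE9 loses its top bit and becomes 0x69, the letter "i"
```

Plain English text is untouched, which is why the limitation went unnoticed by the people who built the system.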

Remember that even mainframe computers 30+ years ago had memory measured in kilobytes. A system with a whole megabyte of RAM had a gigantic amount of memory to play with.

That's why the BinHex encoding format was created for sending Macintosh files across the Internet. Many of the early routing systems were set to ignore the leftmost bit, so all outgoing traffic had that bit set to zero no matter what it had been when it came in. BinHex uses only 7-bit text characters, so it survives transit through 7-bit routers. The MacBinary format used all 8 bits of every byte and was more compact, which was a big savings when a 3600 baud modem was "screaming fast" and there was no such thing as an unlimited data account.
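The trade-off can be sketched in Python. Note the assumptions: Base64 from the standard library stands in for BinHex here (it is a different encoding, but it likewise uses only 7-bit-safe text characters), and `strip_high_bit` is a hypothetical model of a 7-bit router.

```python
import base64

def strip_high_bit(data):
    """Simulate a 7-bit router by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)

raw = bytes(range(256))          # stand-in for an 8-bit binary file
encoded = base64.b64encode(raw)  # text made only of 7-bit ASCII characters

raw_survives = strip_high_bit(raw) == raw
encoded_survives = base64.b64decode(strip_high_bit(encoded)) == raw

print("raw binary survives:  ", raw_survives)
print("text encoding survives:", encoded_survives)
print("sizes:", len(raw), "bytes raw vs", len(encoded), "bytes encoded")
```

The text-encoded copy arrives intact but is larger than the original, which is the price paid for getting through a channel that drops the eighth bit.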

So when you see weird junk in your books, first blame the English-centric American pioneers of the microcomputer and the Internet, then blame the people at the company who made your reading device for not getting on the Unicode bandwagon from the start.

In other words, there's really no excuse for Palm OS (or any other PDA or book reader) not to have Unicode support, since the first standard for it was completed circa 1990-91 and the first Palm didn't go on sale until 1996!
Attached Files
File Type: txt all-the-codes.txt (2.6 KB, 109 views)

Last edited by bizzybody; 05-25-2011 at 12:19 AM.