05-20-2011, 09:02 AM | #196 |
Zealot
Posts: 111
Karma: 1003802
Join Date: Jan 2010
Location: NY
Device: Sony PRS-950
|
Yes, I think it does have something to do with Sony not supporting the Unicode characters used in the book. It translates some symbols correctly but not all of them. The book looks OK on my computer's screen. It is for these reasons that I listed that point as number four. The lack of images and the issues with their references are more serious. Still, if there is a Unicode compatibility issue, I don't see why publishers can't use the ASCII character codes so we don't have this problem.
EDIT: I should say that the images definitely aren't there because I've exploded the e-pub and looked for them. In the past I've discovered missing images this way. I too am interested in what Penguin will say... Last edited by raac; 05-20-2011 at 09:14 AM. |
05-20-2011, 01:26 PM | #197 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
What do you mean by "ASCII character codes"? I guess it's one of these:
1) Use named or numerical entities instead of the Unicode character. For instance, instead of "Peña" write "Pe&ntilde;a". This does not solve anything: no matter whether you use "ñ" or "&ntilde;", the font does not have the character, and the reader shows a box or a question mark instead.
2) "Downgrade" to some similar character that is in the ASCII set, removing diacritics etc. For instance, instead of "Peña" write "Pena". This risks being completely wrong and misleading: "peña" means rock, boulder, while "pena" means pain, sorrow. There's a reason why diacritics exist in most languages. |
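To make the two options above concrete, here is a small Python sketch. It is not from the thread, and the helper name `strip_diacritics` is made up for illustration. The first part shows that a named or numeric entity decodes to exactly the same character as the literal "ñ", so the font still needs that glyph. The second part shows how removing diacritics silently turns one word into another.

```python
import html
import unicodedata

# The same word written three ways: literal character, named entity, numeric entity.
literal = "Pe\u00f1a"
named = html.unescape("Pe&ntilde;a")
numeric = html.unescape("Pe&#241;a")

def strip_diacritics(text):
    """Hypothetical helper: decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# All three spellings end up as the same code points after parsing.
print(literal == named == numeric)   # True

# Dropping the tilde yields a different Spanish word.
print(strip_diacritics(literal))     # Pena
```

Either way the rendering engine ends up holding U+00F1, so only a font that contains the glyph changes what appears on screen.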
|
05-20-2011, 02:35 PM | #198 | |
Zealot
Posts: 111
Karma: 1003802
Join Date: Jan 2010
Location: NY
Device: Sony PRS-950
|
Quote:
|
|
05-20-2011, 02:58 PM | #199 | |
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Really, the culprit here is Sony (for not allowing custom fonts on their readers) and Adobe (for not providing a better Unicode coverage in the default font). There's no excuse for any of them. |
|
05-20-2011, 05:49 PM | #200 |
Addict
Posts: 286
Karma: 7742186
Join Date: Apr 2007
Location: Idaho, USA
Device: Various PalmOS PDAs, Android Phones, Sharper Image Literati
|
Here's a program that can read in a UTF-8 encoded HTML file and replace the UTF-8 HTML codes with the exact extended ASCII equivalent. https://www.mobileread.com/forums/sho...d.php?t=109996
It's not just for that, it can be used to process any text file and swap any specific string(s) with other text string(s). It's written in C# and needs a bit more debugging because if the replacement list is too long it does things it should not do. As is, it can handle enough to swap the most common accented characters used in English, as well as the punctuation characters. Debugged to handle any length swap list, it could be a very useful text file manipulation tool. It's already faster than any word processor or text editor for doing huge numbers of replacements.
With a full character set swap file (which it currently can't handle) one could use it for one time pad cipher codes. Could even run a file through several swaps to swap words for code words then totally scramble all the letters. The receiving person would need correctly formatted swap lists, used in the right order, to unscramble and decode.
WTH use UTF-8 for punctuation when ASCII and ordinary character encodings for Windows and other systems have characters like left and right quotes that produce exactly the same visible result? Unicode for standard characters when there's no need is text-bloat. Replacing a couple thousand left and right unicode double quote marks with the left and right ASCII versions can reduce the file size quite a bit! A UTF-8 code is up to 7 characters, if leading zeroes are used. &#nnnn; One could write a whole text file that way but it'd be six times larger than using plain characters.
Another method that mostly works on HTML source files is to Save As Filtered HTML from Microsoft Word, but that can introduce its own issues with Microsoft's 'additions'. |
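The swap-list idea in this post can be sketched in a few lines of Python. This is only an illustration, not the C# program linked above: the `SWAPS` table and the `swap_all` function are hypothetical names, and the table is a tiny sample rather than a full list. It assumes the input uses numeric character references of the `&#nnnn;` form.

```python
import re

# Hypothetical swap list: numeric character references on the left,
# plain ASCII stand-ins on the right.
SWAPS = {
    "&#8220;": '"',   # left double quotation mark
    "&#8221;": '"',   # right double quotation mark
    "&#8217;": "'",   # right single quotation mark
    "&#8212;": "--",  # em dash
}

def swap_all(text, swaps):
    """Replace every key of `swaps` found in `text` with its value, in one pass."""
    if not swaps:
        return text
    # Longest keys first, so a short key never pre-empts a longer one.
    keys = sorted(swaps, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: swaps[m.group(0)], text)

sample = "&#8220;Don&#8217;t wait&#8221; &#8212; she said."
result = swap_all(sample, SWAPS)
print(result)  # "Don't wait" -- she said.
```

Because the text is scanned once and replaced text is never rescanned, the outcome does not depend on the order or length of the swap list. The stand-ins here are straight quotes because the ASCII set itself has no curly quotes.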
|
05-21-2011, 03:31 AM | #201 | ||
frumious Bandersnatch
Posts: 7,516
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Quote:
Code:
recode utf8..html file.html
Quote:
Using Unicode characters means using some Unicode encoding to represent the character directly, not through entities like above, so I can just write "é" or "ñ". These, in UTF-8, take at most 4 bytes, and typically 2 bytes (for Latin, Cyrillic or Greek scripts) or 3 bytes (for some punctuation). But anyway, in ePUB all files are compressed, so the "bloat" introduced by the entities will be largely cancelled (since they are repetitive sequences, they can be more efficiently compressed). |
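The byte counts and the compression argument above can be checked with a short Python sketch. The sample sentence is invented and extremely repetitive, so the exact compressed sizes mean little. `zlib` is used here because it implements the same DEFLATE method that ZIP containers such as ePUB normally use.

```python
import zlib

# How many bytes a few characters take when encoded as UTF-8.
for ch in ["a", "\u00e9", "\u00f1", "\u201c", "\u2014", "\U0001d11e"]:
    print(f"U+{ord(ch):04X} -> {len(ch.encode('utf-8'))} byte(s) in UTF-8")

# The same made-up text with literal curly quotes and with numeric entities.
literal = ("\u201cHello,\u201d she said. " * 200).encode("utf-8")
entities = ("&#8220;Hello,&#8221; she said. " * 200).encode("ascii")

# Size difference before and after DEFLATE compression.
raw_gap = len(entities) - len(literal)
packed_gap = len(zlib.compress(entities)) - len(zlib.compress(literal))
print("extra bytes uncompressed:", raw_gap)
print("extra bytes compressed:  ", packed_gap)
```

On real book text the gap after compression will not vanish, but it should shrink considerably, since each entity is a short sequence that repeats many times.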
||
05-23-2011, 11:07 AM | #202 |
Zealot
Posts: 111
Karma: 1003802
Join Date: Jan 2010
Location: NY
Device: Sony PRS-950
|
Penguin have so far only sent me a stock reply, saying that they have forwarded my message on to the appropriate department and may contact me again. We'll see what happens...
|
05-24-2011, 09:11 AM | #203 |
Enthusiast
Posts: 38
Karma: 50000
Join Date: Mar 2010
Location: Lancashire, England
Device: none
|
I'm not a Kindle user, but I was surprised (shocked, dismayed) to see a full-page Kindle advert on the back cover of the Radio Times (a high-profile weekly magazine in the UK) that displayed a page from 'Ordinary Thunderstorms', where the em-dash, or even the shorter en-dash, has been superseded by a hyphen. At first I thought 'river-all' was some obscure feature of a river, until, in the same sentence, I came to 'no doubt-but let's wait'. The next paragraph starts with, 'There he is-look-stepping hesitantly down from a taxi'.
I found this so distracting, I couldn't read on - even though it was only a single-page advert. There is no way that I would buy a Kindle if all their books are edited in this way. Am I alone in finding this disturbing? Or is it common practice in e-readers, which regular customers accept without complaint? |
05-24-2011, 10:54 AM | #204 |
Books are brain food.
Posts: 2,950
Karma: 4836916
Join Date: Nov 2010
Location: U.S.
Device: Paperwhite · Fire HD6/HD8/HD10 · Galaxy Tab A7
|
I just downloaded a sample of Ordinary Thunderstorms so I could see what you are talking about. Actually, those aren't hyphens. There are en dashes where there should be the longer em dashes. (If you still have the advertisement, compare the en dashes you referred to with the hyphens in "pale-faced" and "even-featured" if they show there.)
I find it very difficult to read that way too. I'm not sure why the publisher did that. It's very easy to code in em dashes. It was certainly a very poor example for Amazon to use in their Kindle advert. I have to say that I have not seen that in a Kindle ebook before. I've usually seen the proper em dash used, two hyphens together, or space-hyphen-space. Edited to add: When I created my husband's ebook, I did use the proper em dash. But there is a drawback, on the Kindle anyway. Kindle attempts to justify text, but it cannot hyphenate. Text is reflowable, so a publisher cannot control this either. If a line break occurs at an em dash (or an en dash), the Kindle cannot break it right after the dash, as you would see in print. Instead, it treats the word-em dash-word as a block and carries it all to the next line. This can leave a very unsightly space at the end, where the line broke. There's nothing that can be done about that. That's one reason why some people use space-hyphen-space instead of em dash in ebooks. (And others probably don't know how to create the em dash.) This doesn't explain why the publisher used the en dash instead of the em dash in the book you cited, but I wanted to point out that there are some related difficulties with ebook formatting. Last edited by DreamWriter; 05-24-2011 at 11:23 AM. |
05-24-2011, 11:25 AM | #205 |
Can one read too much?
Posts: 2,015
Karma: 2487799
Join Date: Aug 2010
Location: Naples, FL
Device: Kindle PW 3, Sony 350 and 650
|
Speaking of em dashes -- my last ebook had those instead of a final ess-apostrophe, so I was faced with things like " ... my parents -- car, the neighbors -- children" etc.
|
05-24-2011, 11:31 AM | #206 |
Books are brain food.
Posts: 2,950
Karma: 4836916
Join Date: Nov 2010
Location: U.S.
Device: Paperwhite · Fire HD6/HD8/HD10 · Galaxy Tab A7
|
|
05-25-2011, 12:15 AM | #207 |
Addict
Posts: 286
Karma: 7742186
Join Date: Apr 2007
Location: Idaho, USA
Device: Various PalmOS PDAs, Android Phones, Sharper Image Literati
|
The attached file is a text file with the UTF-8 codes and their extended ASCII or Windows-1252 equivalents. (Or ISO 8859-1.) Note that the non-breaking space has the HTML "friendly" code because that's a non-printable character, also non-type-able without using the Alt+nnn code. The HTML code works with any book conversion software I've used.
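For readers who want to see which of these characters each encoding can actually hold, here is a small Python check. It is not the attached file, only a sketch; the `encodable` helper and the sample table are hypothetical. It tries a curly quote, an em dash and "ñ" against the ASCII, ISO 8859-1 and Windows-1252 codecs that ship with Python.

```python
# Sample characters, written as escapes so the source file itself stays ASCII.
samples = {
    "left double quote": "\u201c",
    "em dash": "\u2014",
    "n with tilde": "\u00f1",
}

def encodable(ch, codec):
    """Hypothetical helper: the encoded bytes, or None if the codec lacks the character."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

for name, ch in samples.items():
    for codec in ("ascii", "latin-1", "cp1252"):
        print(f"{name:18} {codec:8} {encodable(ch, codec)}")
```

In this check the curly quote and the em dash only fit in Windows-1252, while "ñ" fits in both ISO 8859-1 and Windows-1252 and none of the three fits in 7-bit ASCII. So the three names in the post are not interchangeable, and a device has to interpret the bytes with the right code page for a swap like this to display correctly.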
Any Unicode-supporting system should *not* need any of these characters' Unicode versions or UTF-8 codes in order to properly display them. In fonts like Terminal, or the ANSI set (which Terminal is a monospaced TrueType clone of), some of the characters are different, but you won't encounter that on PDAs or book readers.
If you want your book to reach the widest possible audience, without getting questions about why there's all those weird characters or boxes or why the punctuation is all missing or replaced with nothing and the words jammed together... use the normal characters on this list instead of their Unicode versions, or in HTML their UTF-8 codes. If the language you're using in your book has characters not in this list, then it's extremely likely the people reading it will have a device that supports Unicode or some other method of displaying those characters.
The main blame for all these issues with character encoding lies with America. Since the vast majority of personal computers are still based on Ye Olde IBM PC, which was originally designed by Americans for English speakers, support for "foreign" characters was pretty much an afterthought for MS-DOS and PC-DOS.
A similar problem was built into the early Internet (which is *not* the World Wide Web), which in its early years was all American. All the characters required for English could be encoded using 7-bit words, so that's how it was done, leaving the one bit always assumed to be zero unless commands were sent to specifically initiate a binary file transfer. Remember that even mainframe computers 30+ years ago had memory measured in kilobytes. A system with a whole megabyte of RAM had a gigantic amount of memory to play with.
That's why the BinHex encoding format was created for sending Macintosh files across the internet. Many of the early routing systems were set to ignore the leftmost bit so that all outgoing traffic had that bit set to zero, no matter what it had been when it came in. BinHex uses only 7-bit text characters, thus it would survive transits through 7-bit routers. The MacBinary format used 8-bit text characters and was up to 1/8th more compact, which was a big savings when a 3600 baud modem was "screaming fast" and there was no such thing as unlimited data accounts.
So when you see weird junk in your books, first blame the English-centric American pioneers of the microcomputer and the Internet, then blame the people at the company who made your reading device for not getting on the Unicode bandwagon from the start. In other words, there's really no excuse for Palm OS (or any other PDA or book reader) to not have Unicode support, since the first standard for it was completed circa 1990~91 and the first Palm didn't go on sale until 1996! Last edited by bizzybody; 05-25-2011 at 12:19 AM. |
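The effect of a link that zeroes the leftmost bit, as described above, can be simulated in a few lines of Python. This is a sketch under assumptions: `strip_high_bit` is a made-up name, and base64 stands in for BinHex, which is a different encoding built on the same idea of using only 7-bit-safe text characters.

```python
import base64

def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit link by forcing the top bit of every byte to zero."""
    return bytes(b & 0x7F for b in data)

# An 8-bit (ISO 8859-1) encoding of "Peña" loses its accented letter in transit.
original = "Pe\u00f1a".encode("latin-1")
damaged = strip_high_bit(original)
print(damaged)  # b'Peqa'

# Wrapping the same bytes in a 7-bit-safe text encoding survives the trip.
armored = base64.b64encode(original)
survived = strip_high_bit(armored)
restored = base64.b64decode(survived)
print(restored == original)  # True
```

The price of the 7-bit-safe wrapper is size: the armored form is longer than the raw bytes, which is the trade-off against 8-bit formats that the post mentions.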
05-25-2011, 11:05 AM | #208 |
Enthusiast
Posts: 38
Karma: 50000
Join Date: Mar 2010
Location: Lancashire, England
Device: none
|
Hi Dream Writer. So it isn't just me - I'm relieved to hear it. What I find unbelievable is that a company like Amazon didn't spot that for themselves when they agreed the advert. Talk about compounding an error.
One thing you mentioned that I'd like to pick up on is where you state 'Some people probably don't know how to create the em-dash'. You can count me in on that - I assume you're referring to Microsoft Word. How I do it is to copy an em-dash from the text and then paste it where I want it. Alternatively, I pick one out of 'Symbols', then copy it for further use. Cumbersome, I know, but it's far better than having hundreds of en-dashes to convert during editing. If you know how to activate a consistent em-dash in Word, I'd be delighted if you could let me in on the trick. |
05-25-2011, 01:07 PM | #209 | |
The Grand Mouse 高貴的老鼠
Posts: 71,511
Karma: 306214458
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
|
Quote:
On Windows you probably need to do something complicated with the numeric keypad. (Checked: Probably Alt+0150 for en-dash and Alt+0151 for em-dash) http://en.wikipedia.org/wiki/Dash |
|
05-25-2011, 01:39 PM | #210 |
Grand Sorcerer
Posts: 27,552
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Those are the correct alt codes for the different dashes on Windows.
If you don't feel like memorizing alt codes (or writing them down) just bring up the Windows Character Map utility (Programs->Accessories->System Tools). It will allow you to select and copy any of the special (or Unicode) characters so you can paste them into documents. |
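The numbers in those Alt codes can be cross-checked against the Windows-1252 code page with a few lines of Python. This sketch assumes a Western European Windows setup, where Alt+0nnn codes are looked up in Windows-1252; other system locales use other code pages.

```python
# Decode the byte values behind Alt+0150 and Alt+0151 using Windows-1252.
for alt_code in (150, 151):
    ch = bytes([alt_code]).decode("cp1252")
    print(f"Alt+0{alt_code} -> U+{ord(ch):04X}")

en_dash = bytes([150]).decode("cp1252")
em_dash = bytes([151]).decode("cp1252")

# The two dashes are distinct characters, and neither is the ASCII hyphen.
print(en_dash != em_dash and "-" not in (en_dash, em_dash))  # True
```

U+2013 is the en dash and U+2014 is the em dash, which matches the codes given above.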
|