Change code page


mota
11-09-2007, 06:56 AM
Hi.

Does anyone know if there is a way to change code page of txt or pdb files when viewing?

Thnx

NatCh
11-09-2007, 05:21 PM
Welcome to MobileRead, mota! :welcome:

I'm not sure what you mean by "code page" .... :headscratch:

jbenny
11-09-2007, 05:33 PM
He means the character set used to represent the binary data in the ebook. Some examples are "US-ASCII", "Windows-1252", "UTF-8", etc. Different code pages (character encodings) map the same byte value to different characters. This matters even more for displaying foreign languages (non-Western languages in particular).
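A quick way to see this is to decode one and the same byte under several code pages. The sketch below uses Python's built-in codecs; the byte 0xE9 and the list of code pages are arbitrary picks for illustration.

```python
def decode_one_byte(value: int, codecs: list[str]) -> dict[str, str]:
    """Decode a single byte under each code page; bytes a codec cannot map become U+FFFD."""
    raw = bytes([value])
    return {codec: raw.decode(codec, errors="replace") for codec in codecs}


if __name__ == "__main__":
    # Same byte, different characters: ASCII has no 0xE9 at all,
    # Windows-1252 reads it as a Latin letter, Windows-1251 as a Cyrillic one.
    for codec, char in decode_one_byte(0xE9, ["ascii", "cp1252", "cp1251"]).items():
        print(f"{codec:>6}: {char}")
```

Run as a script, this prints the replacement character for ASCII, "é" for Windows-1252 and "й" for Windows-1251.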

This issue came up recently with regard to FBReader not displaying curly quotes and em-dashes correctly: it defaulted to a character encoding that displayed them as little boxes. A later release of FBReader added a setting that lets the user choose a default character encoding to correct this.
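One plausible mechanism behind the little boxes (an assumption; the thread doesn't say which encoding FBReader defaulted to): Windows-1252 stores curly quotes and the em-dash at bytes 0x93, 0x94 and 0x97, but ISO-8859-1 (Latin-1) assigns those same bytes to invisible C1 control codes, which many renderers draw as boxes. The sample string below is made up.

```python
def c1_controls(s: str) -> list[str]:
    """List code points in the C1 control range U+0080..U+009F (often drawn as boxes)."""
    return [hex(ord(c)) for c in s if 0x80 <= ord(c) <= 0x9F]


# Hypothetical sample text: curly quotes around "Hello", then an em-dash (U+2014).
TEXT = "\u201cHello\u201d \u2014 world"

raw = TEXT.encode("cp1252")     # b'\x93Hello\x94 \x97 world'
right = raw.decode("cp1252")    # correct code page: round-trips to TEXT
wrong = raw.decode("latin-1")   # wrong code page: 0x93/0x94/0x97 become C1 controls

if __name__ == "__main__":
    print(c1_controls(right))   # []
    print(c1_controls(wrong))   # ['0x93', '0x94', '0x97']
```

Nothing is lost in the file itself; only the interpretation changes, which is why a user-selectable default encoding is enough to fix it.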

I noticed in someone's screenshot of a text ebook on the Cybook that some characters were not displayed correctly. This is most likely due to an incorrect character encoding, which means the Cybook does need a way for the user to change it.

NatCh
11-09-2007, 05:48 PM
That's what I suspected he might mean. I don't know of anyone getting nearly that far into the Reader's guts yet. :shrug:

DaleDe
11-09-2007, 06:33 PM
Actually, since you can load your own fonts, the code page can be disregarded. You simply pick a font whose character mapping contains the characters you want. The binary code of each character determines what gets displayed. This is true for all 8-bit codes, but if you support an internationalized character set (Unicode, etc.), things get a bit tricky. Most books, however, are not Unicode-ready.

Dale
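Dale's font trick relies on one byte mapping to one glyph. A short Python sketch of why that breaks with Unicode: in an 8-bit code page every character is one byte, but in UTF-8 a character takes one to four bytes, so drawing each byte as its own glyph splits one accented letter into two. The sample string is arbitrary, and `byte_per_glyph` is a hypothetical stand-in for a byte-oriented renderer.

```python
def bytes_per_char(text: str, codec: str) -> list[int]:
    """How many bytes each character occupies in the given encoding."""
    return [len(ch.encode(codec)) for ch in text]


def byte_per_glyph(raw: bytes) -> str:
    """Mimic a renderer that treats every byte as one 8-bit character (Windows-1252 here)."""
    return raw.decode("cp1252", errors="replace")


SAMPLE = "A\u00e9\u20ac"  # "A", "é", "€"

if __name__ == "__main__":
    print(bytes_per_char(SAMPLE, "cp1252"))          # [1, 1, 1]
    print(bytes_per_char(SAMPLE, "utf-8"))           # [1, 2, 3]
    print(byte_per_glyph("\u00e9".encode("utf-8")))  # Ã© (two glyphs for one letter)
```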

NatCh
11-09-2007, 06:50 PM
:smack: Never mind. :smack: I was talking about the wrong reading device. :smack: Please pay no attention to the man behind the curtain beating himself about the head and ears. :smack:

jbenny
11-09-2007, 10:14 PM
If Bookeen is going to support epub, then they will need to worry about Unicode. Here is an excerpt from the OPS spec:

"Publications may use the entire Unicode character set, using UTF-8 or UTF-16 encodings, as defined by Unicode (see http://www.unicode.org/unicode/standard/versions). The use of Unicode facilitates internationalization and multilingual documents. However, Reading Systems are not required to provide glyphs for all Unicode characters.

Reading Systems must parse all UTF-8 and UTF-16 characters properly (as required by XML). Reading Systems may decline to display some characters, but must be capable of signaling in some fashion that undisplayable characters are present. Reading Systems must not display Unicode characters merely as if they were 8-bit characters. For example, the biohazard symbol (0x2623) need not be supported by including the correct glyph, but must not be parsed or displayed as if its component bytes were the two characters "&#" (0x0026 0x0023).

To aid Reading Systems in implementing consistent searching and sorting behavior it is required that Unicode Normalization Form C (NFC) be used (See http://www.w3.org/TR/charmod-norm/)."
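Two concrete rules in that excerpt are easy to check in Python. The first part reproduces the spec's biohazard example: U+2623 in UTF-16 (big-endian) is exactly the bytes 0x26 0x23, which an 8-bit reader would show as "&#". The `render` function is a hypothetical reading-system step, not anything defined by the spec, showing one way to signal undisplayable characters (here with U+FFFD). The second part shows why NFC matters for searching: a precomposed "é" and "e" plus a combining accent look identical but compare unequal until normalized.

```python
import unicodedata

BIOHAZARD = "\u2623"


def render(text: str, glyphs: set[str], placeholder: str = "\ufffd") -> str:
    """Hypothetical renderer: draw characters the font has, flag the rest visibly."""
    return "".join(ch if ch in glyphs else placeholder for ch in text)


utf16 = BIOHAZARD.encode("utf-16-be")   # b'&#', i.e. 0x26 0x23
naive = utf16.decode("latin-1")         # WRONG: the two bytes read as 8-bit characters
proper = utf16.decode("utf-16-be")      # RIGHT: one character, U+2623

precomposed = "caf\u00e9"   # é as a single code point (U+00E9)
decomposed = "cafe\u0301"   # e + COMBINING ACUTE ACCENT (U+0301)

if __name__ == "__main__":
    print(naive)                                                    # &#
    print(render("Warning " + proper, set("Warning ")))             # "Warning " then U+FFFD
    print(precomposed == decomposed)                                # False
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```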

IceHand
11-10-2007, 05:51 AM
Bookeen's FAQ says: "Please note that for text files, you will have to set a local information which specifies which is the language of the document."
I'm not exactly sure what they mean by that, but I think you have to set the character encoding yourself; I have no idea how to do that, though. If it's not in the manual, you should ask them by e-mail.
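How Bookeen's setting works internally isn't known from the thread. As a rough sketch of what a reader-side "document language" setting could do, assuming a hypothetical language-to-code-page table: use an explicitly chosen encoding if there is one, otherwise try UTF-8 (with or without a byte-order mark), and fall back to the legacy Windows code page for the selected language.

```python
from pathlib import Path
from typing import Optional

# Hypothetical table: the usual Windows code page for a few languages.
LANGUAGE_CODEPAGES = {
    "en": "cp1252",
    "fr": "cp1252",
    "de": "cp1252",
    "pl": "cp1250",
    "ru": "cp1251",
}


def decode_txt(raw: bytes, encoding: Optional[str] = None, language: str = "en") -> str:
    """Decode a plain-text ebook the way a hypothetical reader setting might."""
    if encoding is not None:
        # The user picked an encoding explicitly: trust it.
        return raw.decode(encoding, errors="replace")
    try:
        # "utf-8-sig" also strips a UTF-8 byte-order mark if the file has one.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy code page for the chosen language.
        fallback = LANGUAGE_CODEPAGES.get(language, "cp1252")
        return raw.decode(fallback, errors="replace")


def load_txt(path: str, encoding: Optional[str] = None, language: str = "en") -> str:
    """Read a .txt file from disk and decode it with decode_txt."""
    return decode_txt(Path(path).read_bytes(), encoding, language)
```

For example, the single byte 0xE9 is not valid UTF-8, so `decode_txt(b"\xe9")` falls back to Windows-1252 and returns "é", while `decode_txt(b"\xe9", language="ru")` returns "й". Trying UTF-8 first is a common heuristic because accented text in a legacy code page rarely forms valid UTF-8 by accident.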