foreign characters not showing up?


sovre
07-22-2011, 05:42 PM
I have some books in epub format that contain foreign characters. The text looks fine in Calibre, and also when read on my iPod Touch, but on my Sony 950 all of the foreign characters show up as question marks, which becomes quite distracting when I am trying to read. Is there a way to fix this? Thanks in advance to anyone who can share a solution.

Toxaris
07-22-2011, 06:07 PM
The Sony should not have any issues with foreign characters. Are you sure that the encoding is UTF-8?

sovre
07-22-2011, 06:15 PM
I don't know, as I received the file already formatted. What tool would I use to verify this? And is there a way for me to convert it to UTF-8 if necessary?

roger64
07-23-2011, 01:20 AM
> I don't know, as I received the file already formatted. What tool would I use to verify this?

On Linux, you can use the command-line tool "file", which is intended to determine a file's type (see man file).
Change to the directory containing the file (using the cd command), then run:
file name_of_the_file
http://linux.die.net/man/1/file

There is also an online converter that uses this command:
http://www.cometdocs.com/file.htm
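As a rough local alternative, here is a minimal Python sketch (not part of the original advice) that guesses a text file's encoding from its raw bytes. Like "file", it is most useful on the HTML files inside the ePub, since the .epub itself is a zip archive. The function name sniff_encoding is hypothetical, and the check is heuristic: it can confirm valid UTF-8, but it cannot name any other encoding.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Rough guess at a text file's encoding from its raw bytes.

    Checks for a byte-order mark first, then tries strict UTF-8 decoding.
    Anything that is not valid UTF-8 is reported as 'unknown (not UTF-8)'.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown (not UTF-8)"
    # Pure ASCII is also valid UTF-8; report it separately so it's clear
    # the file contains no accented characters at all.
    return "ascii" if data.isascii() else "utf-8"
```

For example, sniff_encoding(open("chapter1.html", "rb").read()) should report "utf-8" for a UTF-8 file that contains accented text and no BOM (the file name is illustrative).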

charleski
07-23-2011, 04:21 AM
The most likely answer is that your ePub doesn't have an embedded font. ADE's default glyph set is very narrow and only really suitable for books in English.

If you can edit the ePub, then you can embed a font yourself. Or alternatively you could load kartu's PRS+ firmware into your 950 (see the Sony Dev sub-forum on this board), which allows you to upload a default font of your choice to the reader.

Toxaris
07-23-2011, 10:40 AM
Sorry, I don't agree. I have a character-test epub which contains all the characters used in Western European languages. All of them display on my old PRS-300 without a problem. I haven't tried Eastern European or Cyrillic characters, so that could be where the problem lies.

For me, embedding a font would be the last resort, because of the extra file size and possible copyright issues.

Sovre, what characters are you trying to display? Perhaps then we can identify the problem better.

charleski
07-23-2011, 03:06 PM
The default font covers enough of the Latin-1 Supplement to get by in many Western European languages, but there are gaps, and it has no support for Latin Extended-A.
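To see which characters in a text fall outside the Latin-1 Supplement, a short Python sketch can classify each character by its Unicode block. The block ranges below are from the Unicode standard; the function names are hypothetical. The sketch only looks at code points and does not inspect the reader's font, so it shows which characters are at risk, not which ones will actually fail.

```python
import unicodedata

# Code-point ranges of the Unicode blocks relevant to Latin-script text.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]


def block_of(ch: str) -> str:
    """Return the name of the Unicode block containing ch, or 'other'."""
    cp = ord(ch)
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "other"


def non_latin1(text: str) -> dict:
    """Map each character outside Basic Latin / Latin-1 Supplement to its block.

    The text is NFC-normalized first, so a base letter plus a combining
    accent is treated as the single precomposed character a font would need.
    """
    result = {}
    for ch in unicodedata.normalize("NFC", text):
        name = block_of(ch)
        if name not in ("Basic Latin", "Latin-1 Supplement"):
            result[ch] = name
    return result
```

For example, in romanized Pali or Sanskrit names such as "Ānanda Saṃyutta", the "Ā" falls in Latin Extended-A and the "ṃ" in Latin Extended Additional, both outside the Latin-1 Supplement.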

sovre
07-23-2011, 08:47 PM
I am trying to read an English translation of a Pali text. The proper names in Sanskrit which have diacritical marks are the ones which cannot be displayed.

I have PRS+ loaded as the firmware on my reader, but I don't know how to upload a new font.

If you could explain how to do this or send me a link to instructions, I'd appreciate it!

I notice in the PRS+ settings there is something called "User EPUB style (CSS file)" under Book Viewer Settings, but I'm not sure what this means or what its intended use is.

charleski
07-24-2011, 05:45 AM
Download the files I've attached. Then:

1. Hook your reader up to the computer via USB and open the READER drive.
2. Make a new directory called fonts in the top level of that drive, then open the CharisSIL zip file and copy the four .ttf files it contains into that directory.
3. Go back to the top level of the READER drive, go down to database/system/PRSPlus/epub, and copy the CharisSIL.css file into that directory.
4. Close the Sony Reader app if it auto-launched, then eject the READER drive from the computer and disconnect it when it says it's safe to do so.

If you now go to User EPUB Style, you'll have an option for CharisSIL. Select it, and the font will be applied to every book you open that doesn't have a font embedded itself. If the book you want to read is already open, open a different book, then go back to the home screen and reopen the book you were reading.
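The copying steps above can also be scripted. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, the mount point of the READER drive depends on your system (a drive letter on Windows, typically a folder under /media or /Volumes on Linux or macOS), and it assumes PRS+ has already created database/system/PRSPlus/epub. You still need to eject the drive safely afterwards.

```python
import shutil
import zipfile
from pathlib import Path


def install_prsplus_font(reader_root, font_zip, css_file):
    """Copy the .ttf files from font_zip and a style sheet onto the READER drive.

    reader_root is wherever the READER drive is mounted (an assumption you
    must fill in). Returns the list of files written.
    """
    root = Path(reader_root)
    css_dir = root / "database" / "system" / "PRSPlus" / "epub"
    if not css_dir.is_dir():
        # Missing folder: either the drive path is wrong or PRS+ isn't installed.
        raise FileNotFoundError(f"PRS+ style folder not found: {css_dir}")

    fonts_dir = root / "fonts"  # top-level fonts directory
    fonts_dir.mkdir(exist_ok=True)

    copied = []
    with zipfile.ZipFile(font_zip) as zf:
        for member in zf.namelist():
            if member.lower().endswith(".ttf"):
                # Flatten any folders inside the zip; only the font file is kept.
                target = fonts_dir / Path(member).name
                target.write_bytes(zf.read(member))
                copied.append(target)

    copied.append(Path(shutil.copy2(css_file, css_dir)))
    return copied
```

Failing loudly when the PRS+ folder is missing is a deliberate choice: creating it silently would hide a wrong drive path.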

sovre
07-24-2011, 09:23 PM
Hi Charleski,

Thanks for taking the time to give me these clear instructions. I followed all you said, and now have that font selected as my epub style.

Only one problem: I am still getting a "?" for the letters which have diacritical marks in that text.

Oddly enough, I did notice this: the font has changed, but it changed in texts which were fine and not giving me a problem. And the texts which use the new font do not appear to be as readable as those using the old font. The paragraphs look somehow "clumped together," instead of there being a bit of breathing space between the lines as before.

sovre
07-24-2011, 11:45 PM
While reading another thread on another subject I got an idea about this problem.

And then I tried this:

I copied the epub file and pasted it into MS Word, and then saved it as an rtf file. I then converted the rtf file to epub using calibre and sent it once again to my Sony.

Now all words show up fine, with proper accenting. No more question marks.

But the solution is not ideal, because some of the formatting was lost, and I have footnote numbers showing up in odd places and causing paragraph breaks where there should be none.

charleski
07-25-2011, 06:59 AM
> Only one problem: I am still getting a "?" for the letters which have diacritical marks in that text.

> I copied the epub file and pasted it into MS Word, and then saved it as an rtf file. I then converted the rtf file to epub using calibre and sent it once again to my Sony.
>
> Now all words show up fine, with proper accenting. No more question marks.

This suggests that the problem actually lies in the encoding, as others mentioned earlier in the thread. As you've discovered, calibre's automatic conversion can easily break if there's any problem in the source. Unzip the epub and extract one of the html files, then open that in Notepad++ (http://notepad-plus-plus.org/). Can you still see the proper diacritics in the text? If you click the Encoding menu, does it say 'Encode in UTF-8 without BOM'? What is the first line of the file? It should be something like
<?xml version="1.0" encoding="utf-8" standalone="no"?>
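Instead of opening each file by hand, a minimal Python sketch can report, for every (X)HTML file inside the ePub, the encoding named in its XML declaration, whether it starts with a UTF-8 BOM, and whether its bytes actually decode as UTF-8. The function name is hypothetical. XML allows either single or double quotes in the declaration, so encoding='utf-8' and encoding="utf-8" mean the same thing.

```python
import re
import zipfile

# Matches the encoding pseudo-attribute of an XML declaration, either quote style.
XML_DECL = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def check_epub_encodings(epub_path):
    """Report declared encoding, BOM, and UTF-8 validity for each (X)HTML file."""
    report = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith((".html", ".htm", ".xhtml")):
                continue
            data = zf.read(name)
            has_bom = data.startswith(b"\xef\xbb\xbf")
            body = data[3:] if has_bom else data
            m = XML_DECL.match(body)
            declared = m.group(1).decode("ascii").lower() if m else None
            try:
                body.decode("utf-8")
                valid = True
            except UnicodeDecodeError:
                valid = False
            report[name] = {"declared": declared, "bom": has_bom, "valid_utf8": valid}
    return report
```

A file that declares utf-8 but fails to decode as UTF-8 would point to a real encoding problem; if every file declares utf-8 and decodes cleanly, the encoding is probably not the culprit.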


> Oddly enough, I did notice this: the font has changed, but it changed in texts which were fine and not giving me a problem. And the texts which use the new font do not appear to be as readable as those using the old font. The paragraphs look somehow "clumped together," instead of there being a bit of breathing space between the lines as before.

Yeah, this is a problem with the font. Charis supports an extremely wide range of glyphs, has bold and italic variants, and is free, so I could post it here. But it was designed to be compact, and therefore doesn't have its height metrics set properly (they've released an even more compact version, which is worse). If you look around at the fonts installed on your system, you may well find one that works better. Just copy those over (you'll need the normal, italic, bold, and bold-italic variants, which are separate files) and edit the file names in the css file.

sovre
07-25-2011, 02:52 PM
OK, I downloaded Notepad++.

Yes, it says "Encode in UTF-8 without BOM" under the Encoding menu.

The first line is: <?xml version='1.0' encoding='utf-8'?>

But I don't know how to access the text itself using this text editor. I only seem to be able to see information about the encoding and fonts. When I open the file in another text viewer I have, which displays the entire text, I can see the diacritical marks fine.

Wouldn't this indicate the encoding of the text is ok? Or is there a problem with it?


Also, are there any Unicode fonts you'd recommend as being clear and readable on the Sony Reader? And is there some kind of basic template I can use to create a css file to go with the font? (I've never done this before and don't know how!)

Thanks.

Toxaris
07-25-2011, 04:10 PM
Can you give some examples of the characters that display as a question mark? A few, if you like: some simple, some exotic (your interpretation, of course). Perhaps we can give better advice then. I could add them to my character test epub and see what the result is on my Sony.

sovre
07-25-2011, 08:48 PM
I've uploaded the file. I think that will make it easier for you to diagnose the problem, because you can see how it displays on your own Reader.

charleski
07-26-2011, 04:10 AM
Hah - calibre strikes again.

It is a font issue. The problem lies in the book's css:

.calibre {
display: block;
font-family: "DejaVu", sans;
font-size: 1em;
margin-bottom: 0;
margin-left: 5pt;
margin-right: 5pt;
margin-top: 0;
page-break-before: always;
text-align: justify
}

Remove the line saying
font-family: "DejaVu", sans;
since this is forcing your reader to use its default sans-serif font rather than the new font, which was defined as the default serif. If you want a font that's a bit more readable you could try Gentium (http://scripts.sil.org/cms/scripts/page.php?item_id=Gentium_download#801ab246), though it doesn't have a bold weight.
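If you'd rather script the edit than use a GUI tool, here is a minimal Python sketch that removes a given font-family declaration from every CSS file in an ePub and writes a new copy. Assumptions: the function names are hypothetical, the CSS files are UTF-8, and the regular expression only handles simple declarations like the one above. It writes the mimetype entry first and uncompressed, as the EPUB container (OCF) specification requires; a careless re-zip that skips this can produce a file that readers reject.

```python
import re
import zipfile


def strip_font_family(css: str, family: str) -> str:
    """Remove font-family declarations whose first family is exactly `family`."""
    pattern = re.compile(
        r'[ \t]*font-family\s*:\s*(["\']?)' + re.escape(family)
        # Same closing quote as the opening one, then the family name must end.
        + r'\1(?=\s*[,;}\n])[^;}]*;?[ \t]*\n?',
        re.IGNORECASE,
    )
    return pattern.sub("", css)


def strip_font_family_in_epub(src, dst, family):
    """Copy the ePub at src to dst, stripping `family` from every .css file."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        # OCF: "mimetype" must be the first entry and stored uncompressed.
        zout.writestr("mimetype", zin.read("mimetype"), compress_type=zipfile.ZIP_STORED)
        for item in zin.infolist():
            if item.filename == "mimetype":
                continue
            data = zin.read(item.filename)
            if item.filename.lower().endswith(".css"):
                data = strip_font_family(data.decode("utf-8"), family).encode("utf-8")
            zout.writestr(item.filename, data, compress_type=zipfile.ZIP_DEFLATED)
```

Only declarations that start with the named family are removed, so something like font-family: monospace elsewhere in the CSS is left alone.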

sovre
07-26-2011, 04:33 PM
I used the epub tweaker to remove that line and the epub is indeed displaying correctly now. What a relief. Thanks a lot for your help and guidance.

I also downloaded the Gentium font, and would like to try it out.

How do I make a css file for Gentium or any other font I may want to use with my reader, and how do I know what should go in it? Is there a website with instructions that would walk me through the process? I don't have much technical knowledge about these things. This is the first time I have even opened and looked inside an epub file!

charleski
07-26-2011, 05:15 PM
Put the ttf files for Gentium in the same place as Charis, then make a copy of the css file, replace the file names of the fonts with the file names of the Gentium fonts (same path), and save it as Gentium.css.
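The copy-and-rename step can be sketched in Python: take the existing CharisSIL.css as the template and swap only the font file names, leaving the paths untouched. The function name and any file names you pass in are illustrative assumptions; use the names of the .ttf files you actually copied to the fonts directory.

```python
def make_font_css(template_css: str, renames: dict) -> str:
    """Return a copy of a font style sheet with font file names swapped.

    Only the file names change; the surrounding paths stay as they were,
    so the new fonts must sit in the same folder as the old ones.
    Raises ValueError if an old name isn't in the template, which usually
    means a typo in the mapping.
    """
    css = template_css
    for old, new in renames.items():
        if old not in css:
            raise ValueError(f"{old} not found in template")
        css = css.replace(old, new)
    return css
```

For example, Path("Gentium.css").write_text(make_font_css(Path("CharisSIL.css").read_text(), {"OLD-REGULAR.ttf": "NEW-REGULAR.ttf"})), with pathlib's Path imported and the placeholder names replaced by your real regular, italic, bold, and bold-italic file names.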

JSWolf
07-29-2011, 12:10 PM
The problem you now have is that you used Calibre to convert an ePub that contains code for an embedded font. That is a big mistake, as Calibre botches the code for embedded fonts in an ePub > ePub conversion: it moves the embedded-font code out of the CSS, where it belongs just once, into every XML file. Once you have an ePub, don't use Calibre to convert it to ePub yet again.

If I were going to edit my ePub, I'd use Notepad++, or you could use Sigil. Either way, don't convert ePub > ePub.
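To check whether an ePub has been through this kind of conversion, a minimal Python sketch can count the @font-face rules in each CSS and (X)HTML file. Assumption: it only searches for the literal text @font-face, so it won't catch every form of font code calibre might emit, and the function name is hypothetical. A hit in the CSS alone is what you'd expect; hits in many content files suggest the font code has been copied into each one.

```python
import re
import zipfile

# Literal @font-face at-rule, matched case-insensitively on raw bytes.
FONT_FACE = re.compile(rb"@font-face", re.IGNORECASE)


def font_face_counts(epub_path):
    """Return {file name: number of @font-face rules} for CSS and (X)HTML files."""
    counts = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".css", ".html", ".htm", ".xhtml")):
                n = len(FONT_FACE.findall(zf.read(name)))
                if n:
                    counts[name] = n
    return counts
```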