Creating epub books using Bangla fonts


Nabodita
07-26-2013, 09:55 AM
Hello

I'm working on this project to create ebooks in Bangla. I've decided to start with the epub format because most of the ereaders on the market support epub... besides, I had a chat with Amazon and they assured me that the Kindle does not support Bangla font and they have no workaround for this.... So, epub.

Now obviously, my first concern is that the font has to display correctly on (most) e-readers, otherwise this venture is a no-go from the start... From what I understand (and I hope I'm wrong) many e-readers don't support embedded fonts, and even if they do, the user can override the formatting specified in the css file.

So, a couple of queries which I haven't been able to solve by googling (although I may be asking the wrong questions...):

1. Is Bangla UTF-8 or UTF-16? UTF-8 appears to be declared by default in the content.opf file.. (I suppose this is a good time to mention that I don't know xml; I've signed up on Lynda to do an introductory course and I'm also trying to find a good ebook on creating and editing epubs from scratch)

2. Can I get an e-reader to display the font I will embed in the css file? I created an epub and tested it on an iPad 3 and it displayed perfectly. However, I took a look at the css file and found that the font was referenced but the font file itself was not included in the epub (I converted using calibre).. Now, I won't use the iPad as a benchmark because in my uninformed opinion, Apple products can do a lot of things that others can't!

After some research, I've decided to use Pages 2009 to create and convert the document to epub and Oxygen as the XML editor for tweaking the epub.

Any answers, comments, discussions or references would be sincerely appreciated. I will, of course, be asking for help very frequently since I'm really a noob at this! :rolleyes:

Thanks in advance
Nabodita

Toxaris
07-26-2013, 01:51 PM
If you want a book, get the book from Liz Castro. Not everything is still valid, but the core is good. You can also use the wiki of this site and the site of Jedisaber.

Adding fonts can be tricky, especially if you use iBooks. You need to add a file especially for iBooks to force it to display fonts other than the built-in ones. iBooks is one of the worst programs to test an ePUB, because it ignores quite a few things from the standard and does things its own way.

Nabodita
07-26-2013, 02:24 PM
Oh dear... I was actually quite excited when the book displayed correctly in the iPad. Thanks for the heads-up. BTW, Adobe Digital Editions doesn't seem to support Bangla language.

I'll get the book by Liz Castro right away and look up the wiki and Jedisaber's site. In the meantime, any information on forcing e-readers to read embedded fonts of complex scripts would be really appreciated...

Thanks, Toxaris, for the prompt reply!

Cheers
Nabodita

Toxaris
07-26-2013, 03:31 PM
If Bangla (form of Bengali?) is an LTR language (which I believe it is), it is supported. However, you need to add a font yourself as you have found out.

There are several topics here on MobileRead about adding fonts. In general it comes down to adding the font to the ePUB and adding references to it in the stylesheet. In the stylesheet you need to add an @font-face rule and, in your elements, a reference to the font-family.
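As a rough sketch of what that looks like: the font file name, the Fonts folder and the family name "MyBanglaFont" below are placeholders made up for illustration, not taken from any particular book. The rules are shown inside a style element so the example fits in one file; in a real ePUB they would normally live in the linked stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bn">
<head>
  <title>Font embedding sketch</title>
  <style type="text/css">
    /* Assumption: the font was added to the ePUB as Fonts/MyBanglaFont.ttf,
       one folder up from this XHTML file. */
    @font-face {
      font-family: "MyBanglaFont";
      font-weight: normal;
      font-style: normal;
      src: url("../Fonts/MyBanglaFont.ttf");
    }
    /* Elements then refer to the family name declared above. */
    body {
      font-family: "MyBanglaFont", serif;
    }
  </style>
</head>
<body>
  <p>বাংলা</p>
</body>
</html>
```

The font file also has to be listed in the manifest of the .opf file; if the font is added through an editor such as Sigil, that entry is usually created for you, but it is worth checking.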

For iBooks you need to add a 'com.apple.ibooks.display-options.xml' file to the ePUB.
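For reference, the file as it is commonly documented (for example on Liz Castro's blog) looks roughly like the sketch below and is placed in the META-INF folder next to container.xml. Treat the option name as something to verify against Apple's current documentation rather than as a guarantee.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<display_options>
  <!-- name="*" applies the option to all Apple devices -->
  <platform name="*">
    <!-- ask iBooks to honour the fonts specified in the book -->
    <option name="specified-fonts">true</option>
  </platform>
</display_options>
```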

Liz Castro also has a blog where she places info every now and then. There are also topics about fonts there.

Doitsu
07-26-2013, 04:00 PM
I had a chat with Amazon and they assured me that the Kindle does not support Bangla font and they have no workaround for this.... So, epub.

AFAIK, Amazon does not officially support Bengali Kindle books and you cannot embed custom Bengali fonts, but at least all current eInk Kindles (K3 and higher) should support them, because the Bengali Unicode range is covered by the system fallback font.

I just did a quick test with this newspaper article (http://www.ittefaq.com.bd/index.php?ref=MjBfMDdfMjdfMTNfMV83XzFfNTk1Mzg=) and it appears to display fine on my old Kindle K3. You may want to double-check the display on your Kindle Fire, though.

1. Is bangla UTF-8 or UTF-16? UTF-8 appears to be declared by default in the content.opf file.
You should declare the language code in the metadata section of the .opf file to ensure that readers with Indic language support render the Bengali text correctly. (All source files should be UTF-8 encoded.)

<dc:language>bn</dc:language>
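
To show where that line sits, here is a minimal sketch of an EPUB 2 content.opf. The title, the identifier and the file names are placeholders for illustration only. Note that UTF-8 in the first line is the file's encoding, while "bn" is the language code; the two are independent, and Bengali text can be stored in UTF-8 without any problem.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="BookId">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/"
            xmlns:opf="http://www.idpf.org/2007/opf">
    <!-- Placeholder title and identifier, for illustration only -->
    <dc:title>Sample Bangla Book</dc:title>
    <dc:identifier id="BookId">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <!-- Language code for Bengali -->
    <dc:language>bn</dc:language>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="chapter1" href="Text/chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1"/>
  </spine>
</package>
```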

You can embed custom fonts for readers that support them, but, AFAIK, you'll need to add an additional .xml file to epubs for iPads. For more information see this blog post (http://www.pigsgourdsandwikis.com/2011/04/embedding-fonts-in-epub-ipad-iphone-and.html).
Note that most available eInk epub readers are based on a mobile version of ADE (http://www.adobe.com/de/products/digital-editions.html) and cannot handle Indic vowel signs correctly. (You can test the display of your epubs on non-Apple devices by opening them with the ADE desktop edition.)

After some research, I've decided to use Pages 2009 to create and convert the document to epub and Oxygen as the XML editor for tweaking the epub.

Many ebook designers use Sigil (http://code.google.com/p/sigil/downloads/list). You might want to give it a try, too.

(I created the test .epub file with it in a couple of minutes.)

DomesticExtremis
07-26-2013, 09:31 PM
Sigil is the way.

If you are going to embed fonts, be sure to use open source ones or ones that have an open license like those from the Summer Institute of Linguistics (SIL (http://www.sil.org/resources/software_fonts)).

GNU FreeSerif/FreeSans/FreeMono seem to cover Bengali.

Nabodita
07-27-2013, 01:55 AM
Ok

I'm going to process all this information and get back to you...

Thanks so much!

Cheers
Nabodita

Jellby
07-27-2013, 02:49 AM
If Bangla (form of Bengali?) is an LTR language (which I believe it is), it is supported. However, you need to add a font yourself as you have found out.

According to Wikipedia, Bangla is just Bengali. As an Indic script, I guess it uses extensive ligature and glyph substitution, which are encoded in the font files themselves. Therefore support for this language (even if it's LTR) depends on the support for different font features, which may be different from reader to reader.

Toxaris
07-27-2013, 03:12 AM
You are right, I hadn't thought about the glyphs.

Doitsu
07-27-2013, 04:42 AM
@DomesticExtremis: Usually, SIL fonts are the way to go, but it seems for Bengali, other sites offer more free fonts. In my test file, I embedded the Mukti Narrow (http://www.nongnu.org/freebangfont/downloads.html) font, which has been released under the GPL license.

As an Indic script, I guess it uses extensive ligature and glyph substitution, which are encoded in the font files themselves.

The problem with Indic languages is that most of them have some vowel signs that logically follow the base letter but need to be displayed left of it (e.g. Bengali vowel I (http://www.fileformat.info/info/unicode/char/9bf/index.htm)); in Bengali some of these vowel signs apparently also change their shape.
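
A small illustration of that logical order, written with numeric character references so the stored sequence is visible. This is only a sketch for comparing renderers, not part of the test file mentioned above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bn">
<head>
  <title>Vowel sign order</title>
</head>
<body>
  <!-- U+0995 BENGALI LETTER KA followed by U+09BF BENGALI VOWEL SIGN I.
       The vowel sign is stored second, but a renderer with Indic shaping
       support draws it to the left of the consonant. -->
  <p>&#x0995;&#x09BF;</p>
  <!-- The same two characters typed directly, in the same stored order -->
  <p>কি</p>
</body>
</html>
```

A renderer without proper shaping support will typically draw the vowel sign after the consonant, or detached from it, in both paragraphs.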

You can easily see these rendering issues if you compare how both ADE and Webkit render the first random heading that I included in the test file. The ADE version is much longer, because ADE cannot handle vowel placement and shaping correctly.

Nabodita
07-28-2013, 04:47 AM
Bought the book by Liz Castro but haven't started on it yet.

@Doitsu
Looked at your sample epub; the text in the xhtml file is rendering perfectly but the Table of Contents is not (at least, not in Sigil)... the vowel signs have shifted position and the ligatures are not displayed at all; rather they're showing up as individual letters accented by the 'halant'... from what I've been reading, this is probably a problem with the rendering engine...

Will look into it and keep everyone updated.

Sincerely appreciate all the responses :thanks:

Nabodita

Nabodita
07-28-2013, 05:05 AM
@Doitsu
Just tried to create an epub in Sigil: Added the font I'm using (Vrinda) to the Fonts folder, created a basic stylesheet using your code and modified the content.opf to include the language tag. However, when I try to copy/paste some Bengali text from Word, it's not displaying in the xhtml file... just a lot of boxes... What am I doing wrong?

Doitsu
07-28-2013, 06:20 AM
Looked at your sample epub; the text in the xhtml file is rendering perfectly but the Table of Contents is not (at least, not in Sigil)... from what I've been reading, this is probably a problem with the rendering engine...

Sigil uses the Qt engine, which has some known issues with Indic languages. If the TOC displays fine on your iPad, you'll probably have to live with this limitation.
(You can ask the Sigil team whether this can be fixed, but unless it's an easy fix, they most likely won't fix this.)

@Doitsu
Just tried to create an epub in Sigil: Added the font I'm using (Vrinda) to the Fonts folder, created a basic stylesheet using your code and modified the content.opf to include the language tag. However, when I try to copy/paste some Bengali text from Word, it's not displaying in the xhtml file... just a lot of boxes...
If you don't see any Bengali text in Book View mode, you most likely forgot to link a stylesheet to your .html files. (Select the .html files in the Book Browser, right-click them and select Link Stylesheet from the popup menu.)

Nabodita
07-28-2013, 07:40 AM
You most likely forgot to link a stylesheet to your .html files.

:smack: Yes I forgot that, didn't I.

Well, onwards I go; will get back with updates.

Cheers
Nabodita

Nabodita
08-10-2013, 01:46 PM
Just wanted to post an update:

Using Liz Castro's book, I've finally got a test epub ready; I still need to tweak a couple of things though;

In the metadata section of the content.opf file I've declared the name and author of the book in Bangla; is that a bad idea? What I'm trying to ask is will that affect the searchability of the book when I finally upload it for distribution? My guess is yes, but I just want to be sure...

Another question: Is there any way I can use embedded fonts to display the TOC? So far I've tested the sample epub on iPad 3, Aldiko on my android phone and ADE on my laptop. There doesn't appear to be any problem with the book in iBooks but Aldiko is rendering the font in the TOC incorrectly and ADE is displaying boxes where the TOC entries should be. Also, ADE is displaying boxes for the title and author as well.... I take this to mean that the font embedding has worked but since ADE does not natively support Bangla (I confirmed this earlier by testing without embedded fonts but having the font installed on my system), the TOC is displaying incorrectly.

Any thoughts or ideas would be welcome!

Cheers
Nabodita

seanos
08-11-2013, 12:05 AM
Another question: Is there any way I can use embedded fonts to display the TOC? So far I've tested the sample epub on iPad 3, Aldiko on my android phone and ADE on my laptop. There doesn't appear to be any problem with the book in iBooks but Aldiko is rendering the font in the TOC incorrectly and ADE is displaying boxes where the TOC entries should be. Also, ADE is displaying boxes for the title and author as well.... I take this to mean that the font embedding has worked but since ADE does not natively support Bangla (I confirmed this earlier by testing without embedded fonts but having the font installed on my system), the TOC is displaying incorrectly.

It’s not ideal, but after you create your TOC you could add one in HTML. In Sigil: Tools | Table Of Contents | Create HTML Table Of Contents. This will insert a page of links into your epub which you can format like any other text.

The ‘virtual’ TOC depends on the system fonts on the device so there’s not much you can practically do about that.
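
For anyone building such a page by hand, a minimal sketch might look like the following. The file names and the stylesheet path are assumptions for illustration; the page Sigil generates will differ in its details.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bn">
<head>
  <title>Table of Contents</title>
  <!-- Assumed path: the stylesheet that declares the embedded font -->
  <link rel="stylesheet" type="text/css" href="../Styles/style.css"/>
</head>
<body>
  <!-- "Table of contents" in Bengali -->
  <h2>সূচিপত্র</h2>
  <!-- Ordinary links, so they pick up the same fonts as the body text -->
  <p><a href="chapter1.xhtml">অধ্যায় এক</a></p>
  <p><a href="chapter2.xhtml">অধ্যায় দুই</a></p>
</body>
</html>
```

Because this is ordinary content, it is styled by the book's own stylesheet, unlike the reader-generated TOC.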

Nabodita
08-11-2013, 01:16 AM
The ‘virtual’ TOC depends on the system fonts on the device so there’s not much you can practically do about that.

That being the case, if I can install the font, say on my android phone, then the TOC should display correctly? I know that's not a solution for mass distribution; my curiosity in this case is academic...

ADE definitely will not display the TOC correctly since I already have the font installed in my system...

Another question: When I view the epub in Aldiko or ADE I am unable to change the font size. Is this because I have declared the font size in px in my stylesheet? (I was reading somewhere on this forum that if you declare the font size in absolute terms, it will not resize...) Or am I back to font rendering issues?

seanos
08-11-2013, 01:23 AM
That being the case, if I can install the font, say on my android phone, then the TOC should display correctly?

If you can figure out what font the reading program on your phone is using and then replace that font, it would probably work, but it might cause problems for other apps depending on your replacement font.

You would also need root access to your phone’s font folder of course.

seanos
08-11-2013, 01:29 AM
Another question: When I view the epub in Aldiko or ADE I am unable to change the font size. Is this because I have declared the font size in px in my stylesheet? (I was reading somewhere on this forum that if you declare the font size in absolute terms, it will not resize...) Or am I back to font rendering issues?

Don’t think I’ve tried setting a size, but that’s probably the cause. If you’re worried about the default being too small or big you could also experiment with font-size: smaller or font-size: larger in your stylesheet.

I’ve recently come across a book where the text was all set font-size: small. For most reader devices this isn’t too much of an issue since you have a wide range of font sizes to choose from.

In general it’s probably best to avoid explicit font sizes though as readers will change them.
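
A sketch of what relative sizing could look like, shown in a style element so it fits in one file. The class name "note" and the specific values are arbitrary choices for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>Relative font sizes</title>
  <style type="text/css">
    /* Relative sizes scale with whatever base size the reader picks */
    h2 { font-size: 1.5em; }
    p  { font-size: 1em; }
    /* The keyword form is also relative to the parent element's size */
    p.note { font-size: smaller; }
    /* An absolute size such as "font-size: 16px" is what to avoid here */
  </style>
</head>
<body>
  <h2>Heading</h2>
  <p>Body text</p>
  <p class="note">A note in a smaller size</p>
</body>
</html>
```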

Nabodita
08-11-2013, 01:31 AM
If you can figure out what font the reading program on your phone is using and then replace that font, it would probably work, but it might cause problems for other apps depending on your replacement font.

You would also need root access to your phone’s font folder of course.

Hmm, no; I don't think I want to mess around with the fonts after all; I guess that means I have to go with the html TOC.

you could also experiment with font-size: smaller or font-size: larger in your stylesheet.

Ok, I'll give that a go and report back.

Nabodita
08-11-2013, 02:00 AM
Edited my stylesheet as follows:

Used font-size: larger for my chapter headings
Used font-size: medium for my text

The headings and text are displaying and resizing correctly in ADE and Aldiko. It seems that the chapter headings don't become *noticeably* bold when I increase the font size... wonder if that's a glitch or it is bold and I can't tell the difference!

Thanks
Nabodita

P.S.: Will get back with further updates.

Nabodita
08-11-2013, 03:55 AM
In the metadata section of the content.opf file I've declared the name and author of the book in Bangla; is that a bad idea? What I'm trying to ask is will that affect the searchability of the book when I finally upload it for distribution? My guess is yes, but I just want to be sure...

Could someone please respond to this? I don't think I can have multiple dc tags for title and contributor - one set in Bangla and one set in English... or can I?

Also, if I create an html TOC, I assume I will then have 2 TOCs in my ebook... is that right? Can I force ADE to display the html TOC instead of the toc.ncx in the side panel?

I also wanted to say thank you to everyone who's taken time to read and respond to this topic; I couldn't have come so far without all of you!

Nabodita

seanos
08-11-2013, 04:06 AM
Also, if I create an html TOC, I assume I will then have 2 TOCs in my ebook... is that right? Can I force ADE to display the html TOC instead of the toc.ncx in the side panel?

No, ADE will only display toc.ncx in that way.

There might be readers that would display it differently (since that’s the kind of TOC MOBI format has), but I wouldn’t count on it.

Could someone please respond to this? I don't think I can have multiple dc tags for title and contributor - one set in bangla and one set in English... or can I?


Well I think you can add multiples of some tags (e.g. Author) but I think there can only be one Title. If the book is in Bangla wouldn’t it make sense for these fields to be in Bangla too?

Hopefully, someone more knowledgeable will appear once the USA comes online! ;)

Nabodita
08-11-2013, 04:17 AM
If the book is in Bangla wouldn’t it make sense for these fields to be in Bangla too?

Exactly! But what happens when I upload the book for distribution? Can someone search for it? Me guesses not...

Will work on the html TOC and report back with queries and results. Thanks seanos!

Jellby
08-11-2013, 07:16 AM
Could someone please respond to this? I don't think I can have multiple dc tags for title and contributor - one set in bangla and one set in English... or can I?

You can, and it's valid (and somehow even recommended), but I doubt there's any software that supports that.

For example, in La divina commedia I included the title in several languages, because the book actually includes multiple translations:

<dc:title xml:lang="it">La divina commedia</dc:title>
<dc:title xml:lang="de">Göttliche Komödie</dc:title>
<dc:title xml:lang="en">The Divine Comedy</dc:title>
<dc:title xml:lang="pt">A divina comédia</dc:title>
<dc:title xml:lang="ru">Божественная комедия</dc:title>

but ebook readers and managers will usually display only the first or the last of them.

In your case, maybe what you want is not the title in two languages, but in two different writing systems; I don't know if you can mark this with any attribute. Something similar happens with the author: you can evidently have several authors, but I don't think there's a way to specify that several different instances refer to the same person.

Doitsu
08-11-2013, 02:43 PM
Another question: Is there any way I can use embedded fonts to display the TOC?

AFAIK, the NCX TOC font is reader specific and cannot be controlled by adding code to the epub. However, you can use the title attribute in Sigil to have Sigil use a number instead of the heading text when generating the NCX TOC. For example:

<h3 title="1">অধ্যায় এক</h3>
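
Assuming Sigil behaves as described, the generated entry in toc.ncx would come out roughly like this. The id, playOrder and file name are illustrative, not copied from a real file.

```xml
<navPoint id="navPoint-1" playOrder="1">
  <navLabel>
    <!-- taken from the title attribute, not from the Bengali heading text -->
    <text>1</text>
  </navLabel>
  <content src="Text/chapter1.xhtml"/>
</navPoint>
```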

The headings and text are displaying and resizing correctly in ADE and Aldiko. Seems that the Chapter Headings don't become *noticeably* bold when I increase the font size... wonder if that's a glitch or it is bold and I can't tell the difference!

Did you embed both regular and bold Bengali fonts? If not, you won't be able to display bold text.
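
A sketch of declaring both faces under one family name, so that bold text picks up the bold file. The file names and the family name are hypothetical, and the rules would normally go in the linked stylesheet rather than a style element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="bn">
<head>
  <title>Regular and bold faces</title>
  <style type="text/css">
    /* Hypothetical file names; both files must be present in the ePUB */
    @font-face {
      font-family: "MyBanglaFont";
      font-weight: normal;
      src: url("../Fonts/MyBanglaFont-Regular.ttf");
    }
    @font-face {
      font-family: "MyBanglaFont";
      font-weight: bold;
      src: url("../Fonts/MyBanglaFont-Bold.ttf");
    }
    body { font-family: "MyBanglaFont", serif; }
    h3   { font-weight: bold; }
  </style>
</head>
<body>
  <h3>অধ্যায় এক</h3>
  <p>বাংলা</p>
</body>
</html>
```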

BTW, I'm somewhat surprised that you're still testing your book with ADE. I thought that I had established in this post (http://www.mobileread.com/forums/showpost.php?p=2579563&postcount=10) that ADE does not support Bengali text.

Nabodita
08-12-2013, 02:22 AM
Did you embed both regular and bold Bengali fonts? If not, you won't be able to display bold text.

You're absolutely right, I did forget to embed the Bold font.

BTW, I'm somewhat surprised that you're still testing your book with ADE. I thought that I had established in this post that ADE does not support Bengali text.

True. In spite of having the font installed on my system, the TOC and the cover (where I only had the title and author in Bangla) were displaying boxes. However, by embedding the font and using "!important" with the font-family declaration in the stylesheet, it seems the Bangla font is displaying correctly, including glyphs, in the actual text of the book. Strange, I know! I've posted a screenshot of a preliminary test file... do take a look.

you can use the title attribute in Sigil to have Sigil use a number instead of the heading text when generating the NCX TOC.

This is an option I hadn't considered; I'll give it a shot. As seanos suggested, an html TOC is also an option, I suppose, although it's not a very tidy solution...

Another thing I've figured out is that I may have to write separate epub files for different readers. For example, Aldiko on my Android phone displays the ligatures incorrectly in the title and TOC... the text is fine if I use the font embedding trick I mentioned above. I assume this is because my HTC phone doesn't support Bangla natively. However, the same test file displays everything perfectly in Aldiko on my friend's Samsung Galaxy Tab 2.

The testing process is going to have to be long and extensive...

@Jellby: I did what you suggested. Since I want iBooks to display the title and author in Bangla on the top of each page, I tested and found that iBooks displayed the latter entry. Unfortunately, when I search for the book using iBooks' search function, it doesn't show up.

Could you elaborate about the writing system? I can google it if I know what I'm looking for!

Thanks for all the inputs...
Nabodita

Jellby
08-12-2013, 04:04 AM
In spite of having the font installed on my system the TOC and the cover (where I only had the title and author in Bangla) were displaying boxes. However, by embedding the font and using "!important" with the font-family declaration in the stylesheet, it seems the Bangla font is displaying correctly, including glyphs, in the actual text of the book. Strange, I know!

Not that strange. ADE simply doesn't use system fonts (as far as I know). For the book's content, it just uses its default font or the embedded fonts in the book (if correctly coded and if the planets are in the right places). For titles and TOCs, it's only the default font.

Could you elaborate about the writing system? I can google it if I know what I'm looking for!

Let's see... One thing is language, a different thing is writing system. English, Spanish, French, German... all use the same writing system (Latin script, mainly). Greek uses a different script, Russian uses yet another one, but Bulgarian uses the same as Russian (Cyrillic).

In Greek, we can say that Ὅμηρος wrote Ὀδύσσεια.
Still in Greek, but in Latin script (transcription/transliteration): Hómēros wrote Odýsseia
And translated into English: Homer wrote the Odyssey
Translated into Spanish: Homero wrote la Odisea
etc.

So, do you want to translate the title into English or keep it in Bengali, but written with Latin letters?

Nabodita
08-12-2013, 05:09 AM
There are two issues here.

One is making it possible for people to search for the book on the site where it's been uploaded. I don't know how the search engine (say on Amazon) will retrieve the book from its database if the title and author are meta tagged in Bangla. Or maybe the search engine uses the actual filename of the epub; that would be Bangla written in Latin script. I don't have any idea about this whatsoever.

The second is that once someone has loaded the epub onto their reader, they should be able to search for the book by title or author. That is not possible if the title and author are coded in Bangla script. (Already tried that).

Furthermore, in iBooks the name of the author is displayed at the top of the right side page while the title is displayed on top of the left side page on every page of the actual text. This will display in Bangla script if it is written in Bangla in the meta tag. However, you can no longer use the search function to find the book, either by title or by author. It simply doesn't come up in the results.

It looks like I'll have to write the meta tags in Bangla using the Latin script. Which is such a pity because I'm mildly obsessive and I really think that if it's a Bangla book, everything that I see on the screen of my ereader should be in Bangla. Nevertheless, unless someone has a brilliant workaround for this, I'm stuck with using the Latin script.

Thanks for the explanation, Jellby...don't need to google it after all! :p

Jellby
08-12-2013, 05:40 AM
It looks like I'll have to write the meta tags in Bangla using the Latin script. Which is such a pity because I'm mildly obsessive and I really think that if it's a Bangla book, everything that I see on the screen of my ereader should be in Bangla. Nevertheless, unless someone has a brilliant workaround for this, I'm stuck with using the Latin script.

What you need is an OS and reader that supports non-Latin scripts in general, and Bengali in particular. That alone is, I guess, not easy to find, but if you want to make it available in a web store, then you need their systems to have the same support, which is utopic at the moment, I'm afraid, unless it's a store located and developed in a Bengali-speaking area, maybe.

Nabodita
08-12-2013, 07:33 AM
What you need is an OS and reader that supports non-Latin scripts in general, and Bengali in particular.

As far as I can tell, the present iOS supports Bengali natively; my epub displays fine without font embedding. In fact, the default Bangla font installed on iPad 3 / iPhone 4 is "Bangla Sangam MN". Beyond that, there doesn't seem to be much hope at present.


:offtopic: Unfortunately, reading is a dying habit. It is fortuitous that books have received a new lease on life with the advent of the ebook. Because of its increasing popularity, publishers across the world are releasing digital editions of paperbacks. Many authors are also choosing to self-publish their work in digital form.

Bengali literature is vast and colourful but, possibly due to the technological obstacles, no one is publishing ebooks in Bangla. What I want to do is start the ball rolling; hence this thread. Maybe in 5 years' time, if there is sufficient demand for Bangla books, support for Bangla will become the norm. Till then, I'm afraid, I have to rely on extensive testing, workarounds and compromises. /end rant!

Anyway, I'm still experimenting and looking at workarounds... I'll keep posting my queries and keep everyone updated on my progress. I hope this thread helps others who want to work with non-Latin writing systems!

Cheers
Nabodita

mashru
08-13-2013, 06:29 AM
Some of us have been testing Indic ebooks on different devices. Most of the experiences reported here are in line with our experiences.

Here are a few suggestions:
1. iPad: Do not use embedded fonts; let it use its native font. iOS uses AAT for font definition and rendering, and most .TTF fonts don't have correct AAT tables for complex characters.
2. Android: Aldiko and Bluefire need embedded fonts, and they do a good job of displaying Indic characters. They work correctly under Android OS versions > 4.
3. Kindle devices: Although Indic script is not supported, basic Unicode characters are. However, numerous complex characters, as well as the left vowel sign i, render incorrectly. At the moment, Kindle devices are not usable. The Kindle apps on various OSes have a different set of issues. We have decided not to focus on Amazon apps and devices except to test for basic compatibility.
4. Sigil: The display is mostly correct but you cannot rely on it. You must test each book on different devices. The Calibre reader is good.
5. TOC: An HTML TOC can be made to use either native (iOS) or embedded (Android) fonts. The other TOC renders complex characters incorrectly on iOS; Android does not render any characters.
6. Metadata: Both the iBookstore and indie stores such as Smashwords use their own metadata, specified when you upload the book. They do not use the metadata embedded in the book for search. Hence you may use Indic script within the epub and Latin when uploading the book. Do not use Indic script when uploading the book because no one will be able to search for the book!
7. Since Android needs an embedded font and iOS does not want it, there are techniques for creating one epub file that can be used on both and can be used to create a MOBI file.
8. If you are embedding fonts, you could embed non-Unicode fonts (I hate it but many have done that because their original content was in a non-Unicode font).
9. We have the most experience with Gujarati ebooks and have tried some Devanagari scripts. I believe Bangla will have similar issues. You can see fully working epub books at the ekatrafoundation.org site or at Smashwords. They are free; we are following Project Gutenberg's model, as in running a non-profit organization.

I will be happy to discuss this further and share workarounds and templates that we have developed. We too are experimenting and learning as we create more books. You could write to me at mashru2 [at] gmail [dot] com.

Nabodita
08-14-2013, 03:32 AM
@mashru
I will be in touch as soon as I have finished the test file. You are a godsend! Thank you...

Nabodita
08-16-2013, 05:34 AM
2. Android. Aldiko and Bluefire need embedded fonts and they do a good job of displaying Indic characters. They work correctly under Android OS versions > 4.


Using Aldiko v2.2.3 and Android v4.1.2 for testing a file. In spite of having publisher formatting enabled, margin/padding and <br/> are being ignored. Font embedding and display are fine. See this post. (http://www.mobileread.com/forums/showthread.php?p=2595511#post2595511)

Any help appreciated! :rolleyes:

Thanks
Nabodita