Understanding foreign fonts and HTML conversion

06-21-2012, 04:19 PM
I was trying to convert an HTML page into an epub book, but the fonts in the output epub are messed up. The source HTML contains Tamil Unicode text. I saved the HTML page to my computer and then used Calibre to convert it to epub (dragged the .html file into Calibre and clicked on convert from zip to epub). I also embedded a Unicode font (Latha) in the epub, but I still can't get the Tamil words back (OS: macOS).

this is the page I converted (

On the other hand, the same HTML can be converted into an epub with Sigil without messing up the fonts, but none of the hyperlinks work in Sigil.

PDF fonts

I was also trying to convert/copy Tamil PDF files, but again the fonts get messed up.

Can anyone point me to a source, or give me an idea of where I can get more info about these encodings, like Unicode and TSCII? How can I find out whether my source (html/pdf) is Unicode (UTF-8/16/32), TSCII, or something else, and which one do I need for it to work on my Nook?

thanks m8

06-21-2012, 07:06 PM
You may want to PM Raja1205, since he seems to have figured out how to create valid Tamil epubs.

Here are some general hints:

- By definition, epubs need to contain Unicode files only. I.e. TSCII source files and TSCII fonts cannot be used.
- Calibre might not be a good choice for converting .html files to epubs. It's better to generate the epub in Sigil.
- Make sure to set the correct language code in the Metadata dialog box.
- You'll need to embed a Tamil Unicode font for pretty much all epub readers, except for iPads.
- Font embedding can be tricky. Even if the epub looks OK in Sigil, that doesn't mean the font was correctly embedded. Make sure to click the green checkmark button to check the epub for errors. This should also identify any link errors.
- AFAIK, Latha is a Microsoft Tamil Unicode font that you cannot legally embed in epubs that you intend to distribute, because Latha was licensed for Windows only. Try to find a similar Open Source Unicode font.
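
As for telling whether a saved page is Unicode or something else, here is a rough Python sketch (not a definitive detector). It looks for a byte-order mark, otherwise tries to decode the file as UTF-8, and then counts characters in the Unicode Tamil block (U+0B80 to U+0BFF), including ones written as HTML numeric entities. The function names `count_tamil` and `classify` are made up for this example. One caveat: a TSCII page that was re-saved as UTF-8 would decode fine and simply report no Tamil characters, so treat the result as a hint only.

```python
import codecs
import html

def count_tamil(text):
    """Count code points that fall in the Unicode Tamil block (U+0B80..U+0BFF)."""
    return sum(1 for ch in text if "\u0b80" <= ch <= "\u0bff")

def classify(raw):
    """Give a rough guess about how the bytes of an HTML file are encoded."""
    # Check UTF-32 first: its little-endian BOM starts with the UTF-16 LE BOM.
    boms = (
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    )
    for bom, enc in boms:
        if raw.startswith(bom):
            text = raw.decode(enc)  # the codec reads and strips the BOM
            break
    else:
        enc = "utf-8"
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            # Assumption: bytes that are not valid UTF-8 are some 8-bit legacy encoding.
            return "not valid UTF-8; possibly a legacy 8-bit encoding such as TSCII"
    # html.unescape turns entities such as &#2980; into real characters.
    n = count_tamil(html.unescape(text))
    if n:
        return f"Unicode ({enc}) with {n} Tamil characters"
    return f"Unicode ({enc}) but no Tamil characters found"

# Example use on a saved page (the file name is hypothetical):
#   with open("page.html", "rb") as f:
#       print(classify(f.read()))
```

If the result says Unicode with Tamil characters, the source should be usable for an epub as-is. If it says not valid UTF-8, the text would have to be converted to Unicode first.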