How to correct word breakage in ePUB (Tamil font embedded)


Raja1205
05-08-2012, 03:41 AM
Could you help me correct word breakage in an ePUB with an embedded Tamil font?

My problem is:
I have created an ePUB with the Tamil font "SHREE-TAM-0800.TTF" embedded and loaded it in iBooks on an iPad. It displays well, but some words are broken at the end of a line where they should not be, which makes for a bad reading experience. (Refer to the attached screenshots.)

So kindly help me solve this issue.


Thanks in advance for suggestions and help.

Doitsu
05-08-2012, 04:36 AM
Since Tamil is a relatively rare language, it'd help if you:

- posted a short ePub excerpt
- clearly indicated in a screen capture where unwanted line-breaks occur and where they should occur

Try the following:

- open the ePub in Sigil (http://code.google.com/p/sigil/downloads/list) and ADE (http://www.adobe.com/products/digitaleditions/#fp) and other ePub readers
- double-check the language metadata in Sigil or your authoring tool
- check (http://validator.idpf.org/) the validity and well-formedness of the ePub
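
The language metadata check can also be scripted. Below is a minimal Python sketch, assuming the package (.opf) file has already been extracted from the ePub and read into a string; the helper name `declared_languages` and the sample OPF are illustrative and not taken from the attached file. For Tamil, the expected value is the ISO 639-1 code `ta`.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used for <dc:language> in the OPF package file.
DC = "{http://purl.org/dc/elements/1.1/}"

def declared_languages(opf_xml):
    """Return every <dc:language> value found in an OPF document string."""
    root = ET.fromstring(opf_xml)
    return [el.text.strip() for el in root.iter(DC + "language") if el.text]

# Illustrative OPF fragment (an assumption, not the poster's actual file).
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Sample</dc:title>
    <dc:language>ta</dc:language>
  </metadata>
</package>
"""

print(declared_languages(SAMPLE_OPF))  # ['ta']
```

An empty list means the package declares no language at all, which is worth fixing before testing line breaking again.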

Toxaris
05-08-2012, 04:48 AM
Turn off hyphenation in iBooks.

Raja1205
05-08-2012, 09:17 AM
I have tried all the suggestions you mentioned, but the unwanted word/line breaks still occur.

I have also attached a sample ePUB for your reference.



Doitsu
05-08-2012, 12:40 PM
I had a quick look at your ePub and noticed that the source .html file is not Unicode-encoded and also lacks a language declaration.
The ePub standard, however, requires all source files to be encoded as UTF-8 or UTF-16.
You'll have to convert your source files to Unicode and embed a Unicode-compatible Tamil font.
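
A rough way to check both points on a single content file is sketched below in Python. It assumes the file's raw bytes are already in hand; the function name `inspect_xhtml` and the sample inputs are made up for illustration. Note that text typed with a legacy Tamil font often sits in the ASCII range, so it can pass a UTF-8 decode while containing no characters from the Unicode Tamil block (U+0B80 to U+0BFF). The character count is therefore the more telling signal.

```python
import re

# Matches lang="..." or xml:lang="..." on the root <html> element.
LANG_ATTR = re.compile(
    r'<html\b[^>]*?\b(?:xml:)?lang\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)

def inspect_xhtml(raw):
    """Report encoding validity, declared language and Tamil character count."""
    try:
        text = raw.decode("utf-8")
        is_utf8 = True
    except UnicodeDecodeError:
        # Fall back to a byte-preserving decode so the other checks still run.
        text = raw.decode("latin-1")
        is_utf8 = False
    match = LANG_ATTR.search(text)
    tamil = sum(1 for ch in text if "\u0b80" <= ch <= "\u0bff")
    return {
        "utf8": is_utf8,
        "lang": match.group(1) if match else None,
        "tamil_chars": tamil,
    }

# Properly encoded sample: declared language plus real Tamil code points.
good = '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ta"><body><p>தமிழ்</p></body></html>'
print(inspect_xhtml(good.encode("utf-8")))

# Legacy-style sample: plain ASCII that only looks like Tamil in a special font.
print(inspect_xhtml(b"<html><body><p>jkpo;</p></body></html>"))
```

A file that reports `tamil_chars` of 0 despite showing Tamil on screen is almost certainly relying on a font-specific encoding rather than Unicode.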

Raja1205
05-09-2012, 07:47 AM
Thanks for the reply.

I have updated the Unicode encoding and language declaration in the .xhtml file, but the unwanted word breaks remain. (Updated sample attached for your reference.)

I have also attached screenshots showing examples of correct and unwanted word breaks.

It would be very helpful if you could describe the procedure for converting source files to Unicode, or point me to a website that explains it.
Once again thank you so much for your reply.

Doitsu
05-09-2012, 08:30 AM
Simply adding encoding="utf-8" to the .html file does not work. You'll need to actually convert the text of the .html file to Unicode.
There seem to be several different Tamil code pages in use; you'll have to find out the encoding of your .html file and then use a converter to convert it to Unicode.
Simply google for "Tamil Unicode converter" and "Tamil Unicode fonts" and pick one that works.
Alternatively, save your source file as Unicode with your word processor/editor.
You could also try to copy the text to the clipboard, paste it into BabelPad (http://www.babelstone.co.uk/software/babelpad.html) and then save it as a Unicode text file.
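
To illustrate why the declaration alone changes nothing, here is a small Python sketch of what a converter does at its core: it replaces each font-specific character with the Unicode code point it was standing in for. The mapping table below is entirely hypothetical and covers five characters only; it is not the table for SHREE-TAM-0800 or any other real font, and real converters also have to handle multi-character glyphs and reorder vowel signs that legacy fonts place before the consonant.

```python
# HYPOTHETICAL mapping for illustration only. A real legacy Tamil font has
# its own, much larger table that must be looked up for that specific font.
LEGACY_TO_UNICODE = {
    "j": "\u0ba4",  # assumed slot -> TAMIL LETTER TA
    "k": "\u0bae",  # assumed slot -> TAMIL LETTER MA
    "p": "\u0bbf",  # assumed slot -> TAMIL VOWEL SIGN I
    "o": "\u0bb4",  # assumed slot -> TAMIL LETTER LLLA
    ";": "\u0bcd",  # assumed slot -> TAMIL SIGN VIRAMA
}

def tamil_chars(text):
    """Count characters that belong to the Unicode Tamil block."""
    return sum(1 for ch in text if "\u0b80" <= ch <= "\u0bff")

def convert(text, table=LEGACY_TO_UNICODE):
    """Replace each legacy character via the table; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)

legacy = "jkpo;"             # looks like Tamil only in the legacy font
converted = convert(legacy)  # real Tamil code points, font-independent

print(tamil_chars(legacy), tamil_chars(converted))  # 0 5
```

Before conversion the reading system sees only Latin letters and punctuation, so it has no reason to apply Tamil line-breaking rules to them.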

BTW, even if you manage to convert your source file to Unicode there's no guarantee that this will solve your problem.
I wouldn't bet on ADE, because the current version is pretty limited when it comes to non-Latin alphabets. However, there's a good chance that a properly encoded Tamil ePub with Unicode .html source files might work at least on the iPad.

Toxaris
05-09-2012, 08:30 AM
Search the code for the soft-hyphen HTML entity &shy; and remove every occurrence.
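
If there are many files, the search can be scripted. A minimal Python sketch, assuming the markup has been read into a string; the function name `strip_soft_hyphens` is made up here. Besides the named entity it also removes the numeric forms and the raw U+00AD character, which is an extension of the suggestion above.

```python
import re

# Named entity, decimal and hexadecimal references, and the raw character.
SOFT_HYPHEN = re.compile(r"&shy;|&#0*173;|&#x0*ad;|\u00ad", re.IGNORECASE)

def strip_soft_hyphens(markup):
    """Return the markup with every soft hyphen removed."""
    return SOFT_HYPHEN.sub("", markup)

sample = "<p>வண&shy;க்கம் and &#xAD;more &#173;text</p>"
print(strip_soft_hyphens(sample))
```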

Raja1205
06-18-2012, 08:18 AM
Hi Doitsu, you are right.

Following your advice, I have found a solution.

Converting the actual content (i.e. the .html/.xhtml files) to Unicode text with a Unicode converter solves the unwanted word break issue.

Note: Embedded fonts are not required for the Unicode-encoded ePub, and it looks good on all devices.

mrmikel
06-18-2012, 08:53 AM
Thanks, Raja1205, for posting your solution. I didn't even know such things existed, but when I Google I see many, many of them.

chittu
06-19-2012, 10:45 AM
Hi Raja, can you explain a bit more about your experience here? I have also been working on this, and I have had some success.

I converted HTML pages (from Project Madurai) to ePub using Sigil and then embedded the SUNDARAM Tamil font. This works great on my Nook.

I was also trying to convert a Tamil PDF to ePub using calibre, but the fonts are messed up and I can no longer see the font on my system. I still embedded the S08000F0.TTF font that you used in your test ePub, but with no success.

It would be great if you could explain this issue in a bit more detail.