Regional language (Kannada) EPUB conversion


anandudapudi
01-03-2013, 02:39 AM
Hi all,

I want to know how to get regional-language text copied exactly into an HTML or XHTML page.
I found two cases.
1. I copied web page content into an HTML page saved as UTF-8 without BOM. It looked like this:

ಪಾತ್ರೆಯಲ್ಲಿ ಎಣ್ಣೆ ಕಾದನಂತರ ಮೊದಲು ತೊಗರಿಬೇಳೆ ಹಾಕುವುದು. ಬೇಳೆ ಕೆಂಪಗಾಗುತ್ತಿದಂತೆ, ಮೆಣಸು,ಶುಂಠಿ,ಬೆಳ್ಳುಳ್ಳಿ,ಈರುಳ್ಳಿ ಒಂದರನಂತರ ಒಂದನ್ನು ಹಾಕಿ ಎಣ್ಣೆಯಲ್ಲಿ 3-4 ನಿಮಿಷ ಹುರಿಯಬೇಕು. ಈ ಮಿಶ್ರಣವನ್ನು ಮಿಕ್ಸರಿನಲ್ಲಿ ಹಾಕಿ,ಕಾಯಿತುರಿ,ಹುಣಸೆಹಣ್ಣು,ಉಪ್ಪಿನೊಂದಿಗೆ ಹದಕ್ಕೆ ಬೇಕಾಗುವಷ್ಟು ನೀರು (1 ಬಟ್ಟಲು) ಹಾಕಿ ನುಣ್ಣಗೆ ರುಬ್ಬಬೇಕು.ಕೊತ್ತಂಬರಿ/ಪುದೀನಾ ಸೊಪ್ಪು ಕೂಡ ರುಬ್ಬುವ ಮುಂಚೆ ಹಾಕಬಹುದು.

2. I copied some more text from a Kannada PDF, keeping the same encoding as above, but it looked like this:

~Kdg }}| ~tX~Kd[{ {Xgg }pm} v
~Kdg E. d uC} {p d Iyb yⰪ. {d}Q
U{ⷰ ~{ d. a{곪{ g ~Kd}Q A ~Kd{
I~곩gd"}p갩 e{d.

So I want to know why the second one didn't display in Kannada. Is there a difference between static PDF content and dynamic Kannada web content?
I want the PDF content to appear the same way in the HTML page.

Please help. Thanks in advance; any reply on this is highly appreciated.

Jellby
01-03-2013, 05:17 AM
It's probably because fonts in a PDF can have very exotic encodings. Actually, a PDF is a bit like those blackmail notes with cut-out letters (http://i281.photobucket.com/albums/kk224/gamezebo/000Gamezebo1/000Gamezebo2/The%20Lost%20Cases%20of%20Sherlock%20Holmes%202/LC2SH051.jpg): it simply contains information on which squiggle goes where, and it doesn't care much about encoding, as long as each squiggle looks like a letter.
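One quick way to confirm this is what happened is to check which Unicode code points the copied text actually contains. The following is only a rough sketch in Python: the function name `kannada_ratio` is made up for illustration, and the sample strings are short excerpts from the two cases in the first post. Real Kannada text lives in the Unicode block U+0C80 to U+0CFF, while the PDF copy consists of ordinary ASCII characters that only look like Kannada when drawn with the PDF's own font.

```python
import unicodedata

# Unicode block assigned to the Kannada script.
KANNADA_START, KANNADA_END = 0x0C80, 0x0CFF


def kannada_ratio(text: str) -> float:
    """Return the fraction of letters and combining marks that are in the Kannada block."""
    # Only look at letters (L*) and marks (M*); skip spaces, digits and punctuation.
    letters = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not letters:
        return 0.0
    in_block = sum(KANNADA_START <= ord(ch) <= KANNADA_END for ch in letters)
    return in_block / len(letters)


web_text = "ಪಾತ್ರೆಯಲ್ಲಿ ಎಣ್ಣೆ"   # excerpt of the text copied from the web page
pdf_text = "~Kdg }}| ~tX~Kd[{"  # excerpt of the text copied from the PDF

print(kannada_ratio(web_text))  # close to 1 means genuine Unicode Kannada
print(kannada_ratio(pdf_text))  # close to 0 means font-specific codes, not Kannada
```

If the ratio for the PDF text is near zero, saving the HTML as UTF-8 cannot help, because the characters that were copied were never Kannada code points in the first place.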

anandudapudi
01-03-2013, 06:17 AM
I agree with your points, but I want to know how I can copy text from a PDF to an HTML page exactly, without this encoding hassle. Please suggest any valid solution. Thanks in advance.

DaleDe
01-03-2013, 12:45 PM
To do what you asked above, you would need to make an image of the page and paste that image into the HTML. You cannot expect all the different formats to behave exactly the same, so there is always some encoding hassle.

Dale

Jellby
01-03-2013, 01:33 PM
There's no solution guaranteed to work, other than OCR, because a PDF is more concerned about its looks than about the underlying meaning. In some cases, if the PDF font uses some known/standard encoding, you may be able to copy and paste with the right settings, or do a conversion afterwards. You may be lucky, and the PDF encoding may be ISCII (http://en.wikipedia.org/wiki/Indian_Script_Code_for_Information_Interchange) (in that case it should be possible to find a converter), but it could be some ad-hoc encoding used only in that particular document.
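To give an idea of what "a conversion afterwards" could look like, here is a minimal Python sketch of a table-driven converter. Everything in it is an assumption for illustration: the function name `convert_legacy` and the tiny `DEMO_TABLE` are invented, and the table does not correspond to ISCII or to any real font. For an actual document you would have to find or build the mapping for that particular font, and legacy Indic fonts often also need vowel signs reordered, which this sketch does not attempt.

```python
from typing import Dict


def convert_legacy(text: str, table: Dict[str, str]) -> str:
    """Replace font-specific codes with Unicode strings, trying the longest code first.

    Keys in `table` must be non-empty strings.
    """
    keys = sorted(table, key=len, reverse=True)  # longest match wins
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No code matched here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# HYPOTHETICAL mapping, invented for this example only.
DEMO_TABLE = {
    "k": "\u0C95",         # KANNADA LETTER KA
    "kA": "\u0C95\u0CBE",  # KA followed by VOWEL SIGN AA
    "n": "\u0CA8",         # KANNADA LETTER NA
}

print(convert_legacy("kAn k?", DEMO_TABLE))
```

Matching the longest code first matters because some fonts use two or more codes for a single syllable; a plain character-by-character replacement would split those apart.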

anandudapudi
01-04-2013, 01:40 AM
@Dale: Your response is understood and accepted. I will have to deal with the encoding hassle; no qualms.

anandudapudi
01-04-2013, 01:44 AM
@Jellby: Your response is appreciated too. I understand that I need to cope with the way a PDF generates its text or codes; no qualms.