Old 09-29-2010, 10:40 AM   #1
sriniamble
Junior Member
sriniamble began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: itouch
Converting Sanskrit PDF to epub

I used Calibre to convert a Sanskrit PDF document to epub format. When I open the result in the Stanza app on my itouch, all the Sanskrit characters are rendered as gibberish. I suspect I have done something wrong. If any of you have done this successfully, I would appreciate your guidance. I can read the Sanskrit PDF on my itouch using various reader apps, but scrolling, sizing, etc. are a bit annoying.
Old 09-29-2010, 11:18 AM   #2
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Have you checked it in the Calibre reader? My suspicion is that the PDFs you are using have the characters/fonts embedded. Calibre is probably creating the correct output, but without an embedded font. Stanza probably doesn't have access to those characters, so you don't see anything. You would need to use Sigil to embed a font, and even then I'm not certain Stanza is compliant enough to use it.

Then again, you could be having encoding problems. It all depends on what you really mean by 'gibberish'.
Old 09-29-2010, 11:56 AM   #3
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
I suspect it's just images of the pages of Sanskrit, not an actual Sanskrit font. If so, this is doubly hard to fix. Images of pages have to be OCR'd, and that's hard enough even for a normal font. I have serious doubts that any OCR program will do Sanskrit. Now if it had been Linear A or Proto-Elamite, it would have been easy.
Old 09-29-2010, 11:58 AM   #4
sriniamble
Junior Member
sriniamble began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: itouch
Thank you very much. You are right about the font being embedded in the PDF. What I meant by gibberish is that the text comes up as Roman characters with diacritical marks. I will try your suggestions: checking in the Calibre reader and also trying Sigil.
Old 09-29-2010, 12:19 PM   #5
sriniamble
Junior Member
sriniamble began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: itouch
Here is how it appears in the Calibre reader.

┤╔¤SĎa╔╔´x╔mi╔z╔i╔¤

Sorry, I cannot paste the equivalent in Sanskrit here.
Old 09-29-2010, 01:32 PM   #6
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by sriniamble
Here is how it appears in the Calibre reader.

┤╔¤SĎa╔╔´x╔mi╔z╔i╔¤

Sorry, I cannot paste the equivalent in Sanskrit here.
That doesn't tell us much. My best guess is that it's OCR'd crud behind images of pages with Sanskrit text. Calibre doesn't do any OCR, so it would have happened previously in the original PDF.

Are you looking at images of text, or actual text? Can you select individual characters of Sanskrit text in your PDF? If so, what happens when you copy and paste that selected/copied text into something? If you can select a single word of text, try copying that text and pasting it into the search function of your PDF reader and searching for that selected word.

If the pasted text looks like the accented "crud" you posted, and you can search it and find the same Sanskrit in your PDF that you selected, then it's likely to be just OCR'd crud behind images of Sanskrit text.

If none of that made sense, feel free to post it here, or PM a copy to me, and I'll take a look.
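
For anyone who wants to make the copy-and-paste check above a bit more concrete, here is a minimal Python sketch. It only tests whether pasted text contains any code points from the Unicode Devanagari block (U+0900 to U+097F). The function names are made up for illustration, and the sample string is simply the text posted earlier in this thread. A result of False does not say why the text is wrong; it only shows that the file stores something other than Unicode Sanskrit.

```python
import unicodedata

# Unicode Devanagari block: U+0900 through U+097F.
DEVANAGARI = range(0x0900, 0x0980)


def contains_devanagari(text):
    """Return True if any character falls in the Devanagari block."""
    return any(ord(ch) in DEVANAGARI for ch in text)


def describe(text):
    """Return the Unicode name of each character, for eyeballing."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in text]


# Text as it appeared in the Calibre reader (copied from the post above).
pasted = "┤╔¤SĎa╔╔´x╔mi╔z╔i╔¤"

print(contains_devanagari(pasted))
for name in describe(pasted[:5]):
    print(name)
```

If the check comes back False while the PDF still displays Sanskrit on screen, the glyph shapes are coming from the font or from page images, not from the stored characters.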
Old 09-29-2010, 01:41 PM   #7
sriniamble
Junior Member
sriniamble began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: itouch
I would like to attach my PDF file to the post. Can you please let me know how to do it?
Old 09-29-2010, 04:56 PM   #8
sriniamble
Junior Member
sriniamble began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: itouch
Here is the PDF file I am using as the input.
Attached Files
File Type: pdf durgAsaptashati.san.pdf (556.3 KB, 235 views)
Old 09-29-2010, 06:41 PM   #9
thrawn_aj
quantum mechanic
thrawn_aj ought to be getting tired of karma fortunes by now.

Posts: 705
Karma: 483827
Join Date: Aug 2010
Location: NorCal
Device: Nook1, Samsung Transform, Nook2
There are subsets of Baraha fonts (BRH Devanagari Extra) embedded in the PDF. Luckily, these are free fonts. If you'd had embedded subsets that are proprietary fonts, there wouldn't have been much you could have done.

Just search for these fonts and embed them using Sigil and you should be good to go (assuming, as ldolse wrote, that Sigil is compliant enough to use embedded fonts). I think it's worth a shot. If you have trouble finding the fonts separately, just install Baraha (it's a free Devanagari word processor that I've used for Marathi in the past). These fonts come with it.

Note: just to be extra careful, open the PDF on your PC (or Mac) and go to its properties (the Fonts tab). Make a note of all the fonts used that are not standard (as far as I could tell, it's just the one I mentioned above). Download them all and embed them.
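
As a rough illustration of what "embedding" a font means inside the epub, here is a small Python sketch that builds the CSS an epub stylesheet would typically need: an @font-face rule that points at the font file, plus a rule that actually applies the family to the text. The family name comes from the font listed above; the path fonts/BRHDevanagariExtra.ttf and both function names are assumptions for illustration only, so use whatever file name your copy of the font really has.

```python
def font_face_rule(family, src_path):
    """Build a CSS @font-face rule for a font file stored inside the epub."""
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("{src_path}");\n'
        "}\n"
    )


def body_rule(family):
    """Build a CSS rule that applies the embedded family to the body text."""
    return f'body {{ font-family: "{family}", serif; }}\n'


# Hypothetical path: the font file must also be added to the epub itself.
css = font_face_rule("BRH Devanagari Extra", "fonts/BRHDevanagariExtra.ttf")
css += body_rule("BRH Devanagari Extra")
print(css)
```

Sigil can add the font file to the book for you; the rules above are only the stylesheet half of the job, and whether a given reader app honors them is a separate question.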
Old 09-29-2010, 08:21 PM   #10
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Sigil will definitely let you embed fonts - there are some threads over in the Sigil sub-forum on how to do it, and plenty of other discussions on MobileRead as well. Look for "Three Men and a Boat" to see some examples. What I'm less sure of is Stanza's support for embedded fonts - Google searches seem to show there is some level of support, but it's problematic. That said, I would expect Apple iBooks might be better in this respect.
Old 09-29-2010, 11:39 PM   #11
thrawn_aj
quantum mechanic
thrawn_aj ought to be getting tired of karma fortunes by now.

Posts: 705
Karma: 483827
Join Date: Aug 2010
Location: NorCal
Device: Nook1, Samsung Transform, Nook2
Quote:
Originally Posted by ldolse
Sigil will definitely let you embed fonts
Oh Lord, my mind's going on vacation. This is what I meant to say: "(assuming, as ldolse also noted, that Stanza (not Sigil) is compliant enough to use embedded fonts)". No idea how I ended up writing what I did.

Sorry 'bout that. Yes, Sigil does support embedded fonts - just found that out a couple days ago.
Old 09-30-2010, 10:00 AM   #12
sriniamble
Junior Member
sriniamble began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: itouch
I appreciate all the responses so far. Here is what I have done so far.

1. I generated the epub version of my PDF using Calibre.
2. When I use the Calibre reader to read the contents, the generated cover page keeps the fonts and I can read the text. The body of the book is mostly gibberish.
3. In Sigil I can also read the cover page, and the rest is the same as in #2.
4. The iBooks application on my itouch behaves the same as in #2 and #3.

I have not embedded the fonts yet. My question is: how come Calibre generates the cover page with the fonts intact, while the body of the book does not keep the fonts?
Attached Files
File Type: epub DurgAsaptashati - srini.epub (239.4 KB, 120 views)
Old 09-30-2010, 10:15 AM   #13
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
The cover page is generated by a PDF library that turns the front page into an image, which is why it looks right. That approach is not really acceptable for the actual book contents.

Embedding the fonts is key - it won't look like anything until you do that.
Old 09-30-2010, 03:26 PM   #14
sriniamble
Junior Member
sriniamble began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Sep 2010
Device: itouch
I was getting a bit frustrated trying to embed the fonts, so I used Atlantis to generate the epub with embedded fonts. The generated ebook rendered very well in both the Calibre reader and Sigil. The Stanza and iBooks apps were unable to handle the embedded font (the Sanskrit characters were all gibberish). I got in touch with the Atlantis support team, and they mentioned that they were able to read the generated ebook in 'Adobe Digital Editions' and on a 'Sony reader'. Please let me know if you have any other ideas.
Attached Files
File Type: epub durgasaptashati.test.epub (961.6 KB, 113 views)
Old 09-30-2010, 11:01 PM   #15
thrawn_aj
quantum mechanic
thrawn_aj ought to be getting tired of karma fortunes by now.

Posts: 705
Karma: 483827
Join Date: Aug 2010
Location: NorCal
Device: Nook1, Samsung Transform, Nook2
It looks like an Apple problem at this point (not being able to read embedded fonts in epub or something like that). Can you check with them (or the Stanza/iBooks devs) whether Stanza/iBooks supports embedded fonts? There is nothing you can do if they don't. Also, see if you can find any other epub readers for your itouch. I'm not an Apple user, so I have no idea if this is an OS problem (no fonts other than system fonts allowed) or if it's at the level of the app (i.e. Stanza and iBooks are just deficient in that regard).

Essentially, since the Atlantis team has confirmed that the fonts are embedded properly and the Sony reader can read the file, the issue is no longer with creating the epub but with being able to read it.
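
If you want to double-check the "fonts are embedded" part yourself, an epub is just a zip archive, so a few lines of Python can list any font files inside it. This is only a sketch: the helper name and the extension list are my own choices, and the demo builds a tiny stand-in archive in memory rather than reading the attached file.

```python
import io
import zipfile

# File extensions treated as fonts here (an assumption, not an exhaustive list).
FONT_EXTENSIONS = (".ttf", ".otf", ".woff")


def list_embedded_fonts(epub):
    """Return the sorted names of font files found inside an epub (zip) file.

    `epub` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.lower().endswith(FONT_EXTENSIONS)
        )


# Demo: an in-memory archive standing in for a real epub.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("OEBPS/content.xhtml", "<html/>")
    zf.writestr("OEBPS/fonts/example.ttf", b"")
buf.seek(0)

print(list_embedded_fonts(buf))
```

To check a real book, pass its path instead of the buffer. Finding a font file in the archive only shows that it was packaged; it does not tell you whether a particular reader app will use it.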
All times are GMT -4.