Calibre converts Chinese PDF to EPUB well, but not TXT and HTML


jimmyzou
03-23-2009, 04:17 PM
I know this is my second post about converting Chinese to EPUB, but I think a new thread is easier to follow...

Over the past two days I have been using Calibre 5.0 to convert Chinese books to EPUB format. It actually works well, but only with PDFs so far...
I did not change anything else on my PRS-505; I just used the universal flasher to flash in Chinese fonts, which work perfectly with the Chinese LRF files I made with Calibre.

The steps for making the EPUB are the same as for making an LRF, with a little extra:
1. Set the output format to EPUB, of course.
2. After clicking "Convert Ebooks", go to the "Look & Feel" options and paste the code below into "Override CSS":

@font-face {
    font-family: "Swis721 BT";
    /* font file on the Reader; the second URL is a fallback location */
    src: url(res:///Data/fonts/tt0003m_.ttf), url(res:///tt0003m_.ttf);
}

body {
    font-family: "Swis721 BT", serif;
}
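For reference, a sketch of the same settings passed on the command line. This assumes a current calibre install, where the `ebook-convert` tool accepts `--extra-css` (a path to a CSS file, or raw CSS) and `--input-encoding`; the 2009-era version discussed in this thread had different tools and GUI labels. The file names are placeholders, and `build_convert_command` is a hypothetical helper that only builds the argument list.

```python
import shlex


def build_convert_command(src, dst, input_encoding=None, extra_css=None):
    """Build an ebook-convert argument list (sketch; nothing is run here)."""
    cmd = ["ebook-convert", src, dst]
    if input_encoding:
        # Same setting as the encoding box in the GUI conversion dialog
        cmd += ["--input-encoding", input_encoding]
    if extra_css:
        # A .css file holding the @font-face override shown above
        cmd += ["--extra-css", extra_css]
    return cmd


cmd = build_convert_command("book.txt", "book.epub", "utf-8", "override.css")
print(shlex.join(cmd))
```

To actually convert, the list can be handed to `subprocess.run(cmd, check=True)`; building it separately makes it easy to check what will be run.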

The Chinese PDF books (without embedded fonts) convert to EPUB very nicely, but TXT and HTML files convert to EPUB as all "????". Any ideas?

kovidgoyal
03-23-2009, 05:03 PM
Use the encoding setting when converting TXT and HTML to specify the character encoding the original files are in.
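If you are not sure which encoding a TXT or HTML file is in, a rough guess can be made from its bytes. This is a heuristic sketch, not what calibre does internally: it checks for a byte-order mark, then tries strict UTF-8, and otherwise assumes GB18030 (a superset of GB2312/GBK). That last fallback is an assumption that only suits Simplified Chinese files.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Rough guess at a Chinese text file's encoding (heuristic sketch)."""
    # A byte-order mark settles it (Notepad's "Unicode" writes a UTF-16 BOM)
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # GBK-encoded Chinese text usually fails strict UTF-8 decoding
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumed fallback for Simplified Chinese: GB18030 covers GB2312 and GBK
        return "gb18030"
```

Pass it the raw bytes of the file, e.g. `open(path, "rb").read()`, and use the result as the encoding setting.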

jimmyzou
03-23-2009, 05:12 PM
Thanks! I will try it.
Question: what should I put in "Source Encoding"? "Unicode" or "ANSI"?

Another question: why are a few letters missing at the right edge?

jimmyzou
03-24-2009, 09:06 AM
OK, thanks to kovidgoyal's answer, I managed to convert a TXT file successfully. I don't know the code names for Unicode and ANSI: I saved the TXT in those two encodings and put the names on the command line, but it did not work.
So I tried a UTF-8 TXT, and it works!
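Windows Notepad's "Unicode" option saves UTF-16 (little-endian, with a byte-order mark), and "ANSI" means the system code page, which on Simplified Chinese Windows is usually code page 936 (GBK). That is probably why the names "Unicode" and "ANSI" were not recognised; codec names such as `utf-16` or `gbk` are more likely to work. Alternatively, convert the file to UTF-8 first. A minimal sketch, where the mapping table is an assumption about a Simplified Chinese system:

```python
from pathlib import Path

# Assumed mapping from Notepad's "Save as" encoding names to Python codecs.
# "ANSI" depends on the Windows locale; cp936 (GBK) assumes Simplified Chinese.
NOTEPAD_TO_CODEC = {
    "ANSI": "cp936",
    "Unicode": "utf-16",             # UTF-16 LE with a byte-order mark
    "Unicode big endian": "utf-16",  # the BOM tells the codec the byte order
    "UTF-8": "utf-8-sig",            # Notepad adds a UTF-8 BOM; utf-8-sig strips it
}


def convert_to_utf8(src: Path, dst: Path, notepad_name: str) -> None:
    """Re-encode a text file saved by Notepad as plain UTF-8."""
    text = src.read_bytes().decode(NOTEPAD_TO_CODEC[notepad_name])
    # Write bytes directly so line endings are left exactly as they were
    dst.write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8(Path("book.txt"), Path("book-utf8.txt"), "ANSI")`, then convert the new file with UTF-8 as the input encoding.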

But with HTML: I opened the TXT file in IE and saved it as UTF-8 HTML, with no luck in Calibre.
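One possible reason the UTF-8 HTML still failed (an assumption, not confirmed in this thread) is that the page does not declare its character set, or declares one that differs from the bytes on disk. A sketch of a hypothetical helper that adds a UTF-8 `<meta>` declaration when none is present; it deliberately leaves an existing charset alone, so check that one by hand.

```python
import re

META = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def ensure_utf8_meta(html: str) -> str:
    """Add a UTF-8 charset declaration if the page has none (sketch)."""
    if re.search(r"charset\s*=", html, re.IGNORECASE):
        return html  # an existing declaration is left for manual checking
    # Match <head> or <head ...> but not <header>
    head = re.search(r"<head(\s[^>]*)?>", html, re.IGNORECASE)
    if head:
        return html[:head.end()] + META + html[head.end():]
    return META + html
```

Read the file as UTF-8, pass the text through the function, and write it back as UTF-8 before converting.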

stewrat
02-25-2010, 01:13 AM
Hello and greetings. I've got a small issue I can't work out; here are the facts.
PRS-600, set up to run in Chinese (menus and books). It reads Unicode TXT files fine (standard simplified font set).

I'm using Calibre to load books and want to reformat them, as the TXT files need some cleanup: they have extra blank lines between lines, and so on.

I can reformat a standard Chinese TXT file with Calibre (using UTF-8 as the input encoding), and the output looks good in the Calibre viewer: Chinese, with no extra spacing between lines. It also looks good when I use Calibre to view the copy on the e-reader itself.

But the e-reader shows everything as ????? once I convert with Calibre, whether to LRF or EPUB. The e-reader displays the file fine if I leave it as a Unicode TXT file (just badly formatted). Calibre displays the converted file fine, even the copy on the e-reader.

I just want to convert to EPUB with Calibre, put it on the 600, and be able to read it in Chinese.

Any help appreciated. I've read a lot but haven't worked out how to fix it so far. Thanks in advance.

stewrat
02-25-2010, 01:15 AM
Yeah, the Sony does glare, and the Nook, though its OS is clunky to start with, has a better look and feel. Not related to the post above; just using the characters so I can note the OS version :) 6.27

kovidgoyal
02-25-2010, 02:11 AM
The fonts used to render EPUB files are different from the fonts used to render TXT files. Have you replaced both sets of fonts on the reader?

stewrat
02-25-2010, 07:16 AM
Thanks for the information and your help; I'm learning.
Given that the e-reader uses different font files for EPUB and TXT, you'd think that has to be the problem.
Before the problem appeared, I replaced two files on the Sony: tt0011m_.ttf and tt0003m_.ttf.

I'm not sure whether either of them is the EPUB one.
Is it one of these, or is it the third font in there?
I have read the solutions posted, with varying degrees of understanding.
Nothing says "here is the solution for applying Chinese fonts to a Sony Reader, converting .txt, and reading it on the e-reader".

FYI, Calibre is without a doubt the best package for manipulating and managing an e-book library. I'll be donating as soon as I prise my credit card out of my wife's purse *grin*.

kovidgoyal
02-25-2010, 03:09 PM
Both of those fonts are for the LRF/TXT/RTF renderer. I think there are a couple of threads in the Sony forum about replacing fonts for the EPUB renderer. You can also embed fonts in the EPUB; there is a thread about that in the EPUB forum.
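Since an EPUB is a ZIP archive, it is easy to check whether a converted book actually carries an embedded font. A small standard-library sketch; treating `.ttf`, `.otf` and `.woff` as the font extensions is an assumption, and `embedded_fonts` is a hypothetical helper name.

```python
import zipfile

# Assumed set of font file extensions to look for
FONT_EXTENSIONS = (".ttf", ".otf", ".woff")


def embedded_fonts(epub_path) -> list[str]:
    """Return the font files packed inside an EPUB (an EPUB is a ZIP archive)."""
    with zipfile.ZipFile(epub_path) as zf:
        return [name for name in zf.namelist() if name.lower().endswith(FONT_EXTENSIONS)]
```

An empty list means the reader must supply the glyphs from its own fonts.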

vho3000
07-05-2010, 05:26 PM
I could not get this to work on my PRS-600, even with a Chinese PDF. By the way, all PDFs created by Adobe Distiller automatically have fonts embedded. I don't know how you made a PDF without embedded fonts.

Please let me know: can you just put the tt0003m_.ttf font in both the root directory and the \Data\fonts directory on the Reader, and get this working just like that?

Or do you mean tt0011m_.ttf instead of tt0003m_.ttf?

Thanks.

vho3000
07-05-2010, 06:38 PM
I tried converting HTML files to EPUB using Calibre with the suggested UTF-8 encoding setting and the suggested extra CSS, but I cannot get them to display correctly on the PRS-600: all the Chinese characters show as ????????, while other characters such as numbers and letters display correctly.

Is this an encoding problem or a font problem? Please advise.
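One way to tell the two apart: if the EPUB's content files still contain Chinese characters, the conversion kept the text and the ???? comes from the reader lacking a font with those glyphs; if they contain literal question marks instead, the text was lost during conversion. A sketch that counts characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), assuming the content files are UTF-8 and that characters are not written as numeric entities such as `&#20013;`. The helper name is hypothetical.

```python
import zipfile


def count_cjk(epub_path) -> int:
    """Count CJK Unified Ideographs (U+4E00..U+9FFF) in an EPUB's (X)HTML files."""
    total = 0
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".html", ".htm", ".xhtml")):
                # Assumes UTF-8 content; undecodable bytes become U+FFFD and are not counted
                text = zf.read(name).decode("utf-8", errors="replace")
                total += sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    return total
```

A non-zero count points at fonts on the reader; zero for a book that should be Chinese points at the conversion.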

vho3000
07-09-2010, 10:52 AM
That is exactly what I am looking for on the PRS-600: being able to read Chinese EPUBs without embedded fonts. Instead I get ?????? for all the Chinese characters in the EPUB. The same EPUB files read well on an iPod Touch with Stanza installed, in Stanza Desktop, and in Calibre.

I guess the PRS-600 is looking for embedded fonts in the EPUB files and cannot find them, so it displays everything as ????????????. At the moment, I can read Chinese PDF files with embedded fonts, TXT, and RTF.

Could anyone show us how to get on top of this problem? Thanks.

stewrat
07-18-2010, 02:48 AM
Couldn't work it out, even with the advice the experts gave in this thread; it was just beyond me, sadly.

What I did: I took the Unicode text, converted it to RTF, edited it in Word using a macro I wrote (basically it just removes line breaks and spaces that do not follow a punctuation mark), then converted it back to Unicode.
That is, I wrote a macro and have to run it on every file I want to read on the 600, one by one. Dumb, huh?
Laborious, painful, and without doubt a dumb way to do it, but my wife can now read Unicode on the 600, and it looks passable.
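The macro's rule (remove a line break, and the spaces around it, unless the line ends in punctuation) can be sketched in a few lines of Python instead of Word. The punctuation set below is an assumption, and `join_broken_lines` is a hypothetical name. The sketch is meant for Chinese text: it joins lines without adding a space, which would glue English words together.

```python
# Chinese sentence-ending punctuation and closing marks after which a break is kept.
# This set is an assumption; extend it to taste.
KEEP_BREAK_AFTER = set("。！？；：…」』”’）")


def join_broken_lines(text: str) -> str:
    """Join hard-wrapped lines unless the text so far ends in punctuation."""
    paragraphs = []
    current = ""
    for line in text.splitlines():
        # strip() also removes the full-width ideographic space U+3000
        line = line.strip()
        if not line:
            continue  # drop the extra blank lines between lines
        current += line
        if current[-1] in KEEP_BREAK_AFTER:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n".join(paragraphs)
```

Run it on the UTF-8 text and save the result as UTF-8 before converting it with Calibre.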

If there is a better, quicker way that works with EPUBs, I'm all for it, but my brain is not the best with this stuff; once it goes beyond "do a, then b, then c", I give in.

vho3000
07-20-2010, 06:10 AM
I eventually got Chinese (and other) EPUBs to display on the PRS-600!

In 1.05d or 1.05f, Porkupan has already hacked the PRS-600 to read EPUB using three styles he defined: userStyle, userStyle.dflt and userStyle.droid. Just edit these styles to name whatever Chinese font you want, and put the font files in the \ePub\Font folder. Do not use extra CSS in Calibre.

It is that simple! But Porkupan never explains this in any manual. Those styles are designed to display Russian; to display Chinese, they all need to be replaced.

oldchum62
12-27-2013, 03:00 PM
Hi. I've been trying, without success, to convert a Simplified Chinese PDF book to FB2 format for the jetBook Mini. I tried the utf8, gb18030, gb2312 and ascii input character encodings in the 'Look and Feel' section, and when that didn't work I also pasted the following into the 'Extra CSS' box for each of them:

@font-face {
    font-family: "Swis721 BT";
    src: url(res:///Data/fonts/tt0003m_.ttf), url(res:///tt0003m_.ttf);
}

body {
    font-family: "Swis721 BT", serif;
}

Still no luck. Please advise.

Toxaris
12-27-2013, 04:02 PM
Again, a resurrected thread...

Are you sure the PDF contains text and is not just a collection of images? If it is images, OCR usually gives the best results, although with Chinese the results may be mediocre.
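A very rough way to check this without extra tools is to look for font dictionaries (`/Type /Font`) and image objects (`/Subtype /Image`) in the raw PDF bytes. This is a heuristic sketch with a hypothetical helper name. PDFs that keep their objects in compressed object streams can hide their font dictionaries, so a `True` result is a hint to try copying text out of the PDF in a viewer, not a verdict.

```python
import re


def pdf_looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: images present but no font dictionaries suggests a scanned PDF."""
    # \b keeps /FontDescriptor from counting as a font dictionary
    has_fonts = re.search(rb"/Type\s*/Font\b", pdf_bytes) is not None
    has_images = re.search(rb"/Subtype\s*/Image\b", pdf_bytes) is not None
    return has_images and not has_fonts
```

Call it with the file's raw bytes, e.g. `pdf_looks_image_only(open("book.pdf", "rb").read())`.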