Errors with diacritic characters

mosker · 08-17-2011, 03:58 AM

I'm trying to convert some PDF files containing diacritic characters, but the resulting EPUB has line breaks inserted at seemingly random places, splitting paragraphs.

Searching the documentation returns NO RESULTS:
http://manual.calibre-ebook.com/sear...s&area=default

Is there any guidance or setting to apply before converting files with these characters?

DoctorOhh · 08-17-2011, 07:54 AM

Quote:

Originally Posted by mosker

I'm trying to convert some PDF files containing diacritic characters, but the resulting EPUB has line breaks inserted at seemingly random places, splitting paragraphs.

I don't know the answer to your question, but have you read the sticky post "Read this before Posting PDF Questions"?

mosker · 08-21-2011, 03:25 AM

Yes. That FAQ is useless because it gives no information on diacritic characters or on conversion using one's own fonts.

When I convert the PDF to EPUB, breaks of one or more lines appear at many of the diacritic characters. Changing the heuristic options has no effect.

Is there any information in the Calibre documentation on diacritic characters?

ldolse · 08-21-2011, 05:01 AM

Your problem description isn't clear - is the diacritic character itself displayed correctly and the paragraph breaks on the character, or is the character not rendered correctly?

PDFs define diacritics in a lot of ways. Calibre handles some of the common occurrences, but taking care of the more obscure ones can be difficult. Beyond that, support for diacritics will depend on your reading system; most reading systems don't have comprehensive fonts that cover all languages.
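To make "defined in a lot of ways" concrete at the text level only: even in plain Unicode, the same visible letter can be stored as one precomposed code point or as a base letter plus a combining mark. The sketch below uses "ā" purely as an illustration (the choice of character is an assumption, not something taken from the files in this thread), and it says nothing about how a particular PDF positions its glyphs.

```python
import unicodedata

# Two ways of storing what looks like the same letter "a with macron".
precomposed = "\u0101"   # single code point
decomposed = "a\u0304"   # base letter followed by a combining macron


def describe(text):
    """Return a readable list of the code points that make up text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(describe(precomposed))
print(describe(decomposed))

# NFC normalization folds the two-code-point form into the precomposed one.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

A converter or reading system that expects one form and receives the other (or receives an accent glyph placed separately on the page) may show a wrong or missing character.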

mosker · 08-21-2011, 06:35 PM

No, the diacritic character is not displayed correctly, and there are also paragraph breaks at those characters.
However, I have some EPUB files downloaded from the internet that use diacritic characters, and I know they were converted with Calibre.

I'm using Windows XP and the Calibre viewer to check the result. When I unpack those files, all I see is UTF-8 encoding and the following CSS declaration:

font-family: "Times Ext Roman", "Indic Times", "Doulos SIL", Tahoma, "Arial Unicode MS", Gentium;

Then I tried:
1 - unpack the badly converted EPUB
2 - change its CSS to use the same font-family declaration as those EPUB files
3 - rebuild the EPUB.

But no success. In the unpacked bad EPUB, the paragraphs are already broken with <p></p> wherever there is a diacritic character, so rebuilding has no effect. I suppose these settings would have to be applied before converting the PDF.

How can one include one's own font to display diacritics? I cannot find instructions on how to do it.
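For anyone repeating the unpack / edit / rebuild steps, here is a minimal sketch using only Python's standard library. The function names `unpack_epub` and `rebuild_epub` are hypothetical, and this is not how Calibre itself packs books. It assumes the usual EPUB container convention that the `mimetype` file is the first entry in the archive and is stored uncompressed. As noted above, rebuilding this way only changes styling; it cannot repair paragraphs that were already split during conversion.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, dest_dir):
    """Extract every file of an EPUB (a ZIP archive) into dest_dir."""
    with zipfile.ZipFile(epub_path) as zf:
        zf.extractall(dest_dir)


def rebuild_epub(src_dir, out_path):
    """Zip an unpacked EPUB directory back into a single .epub file."""
    src = Path(src_dir)
    with zipfile.ZipFile(out_path, "w") as zf:
        # Write 'mimetype' first and without compression.
        zf.write(src / "mimetype", "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src).as_posix()
            if path.is_file() and rel != "mimetype":
                zf.write(path, rel, compress_type=zipfile.ZIP_DEFLATED)
```

Write the output file somewhere outside the source directory, otherwise the half-written archive would be picked up while the directory is being walked.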

thx,

ldolse · 08-21-2011, 10:38 PM

If the character itself is not displayed correctly then Calibre doesn't support the way the diacritics are defined. Messing with the fonts won't help.

The list of diacritic characters that Calibre has support for is here:
http://bazaar.launchpad.net/~kovid/c.../preprocess.py

Search for "# Fix Accents" - no quotes - to see the relevant part of the source. If the characters you're concerned about are already covered in that list then there isn't anything to be done; you just have a set of junk PDFs.

The only thing you could do is use the search and replace wizard to replace whatever garbage is being generated with the correct character, but whether that is practical depends on how many characters you need to do it for.
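As a rough illustration of that kind of search and replace, done outside Calibre on extracted text: the sketch below assumes the "garbage" is a standalone spacing accent followed by the letter it belongs to, which may or may not match what a given PDF produces. The mapping `SPACING_TO_COMBINING` and the function `fix_accents` are hypothetical names, and this is not Calibre's own accent-fixing code.

```python
import re
import unicodedata

# Assumed mapping: standalone (spacing) accent -> Unicode combining mark.
SPACING_TO_COMBINING = {
    "\u00AF": "\u0304",  # macron
    "\u00B4": "\u0301",  # acute accent
    "\u00A8": "\u0308",  # diaeresis
    "~": "\u0303",       # tilde
    "\u02D9": "\u0307",  # dot above
}

# A spacing accent, optional whitespace, then an unaccented Latin letter.
_ACCENT_THEN_LETTER = re.compile(
    "([" + re.escape("".join(SPACING_TO_COMBINING)) + r"])\s*([A-Za-z])"
)


def fix_accents(text):
    """Merge 'accent + letter' pairs into single precomposed characters."""
    def merge(match):
        accent, letter = match.group(1), match.group(2)
        combined = letter + SPACING_TO_COMBINING[accent]
        # NFC folds letter + combining mark into one code point if one exists.
        return unicodedata.normalize("NFC", combined)
    return _ACCENT_THEN_LETTER.sub(merge, text)
```

A pattern like this will also rewrite legitimate uses of "~" in front of a letter, so it is worth checking the matches on a sample before applying it to a whole book.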

mosker · 08-26-2011, 12:35 PM

The part of the Python code you cited should cover all the characters in my texts.

What do you mean by "junk PDF"? I don't know about the internal characteristics of PDFs, but my files seem to be fine. (As an example, here is one of them: http://www.archive.org/details/Cetasikas )

I don't know whether the PDF needs some internal definition already in place before it can be converted.

thanks for the help,



