Old 08-17-2011, 03:58 AM   #1
mosker
Junior Member
mosker began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Aug 2011
Device: Kobo Touch
Errors with diacritic characters

I'm trying to convert some PDF files that contain diacritic characters, but the resulting EPUB has many line breaks inserted at random, breaking up paragraphs.

Searching the documentation returns no results:
http://manual.calibre-ebook.com/sear...s&area=default


Is there any guidance or setting to apply before converting files with these characters?

Last edited by mosker; 08-21-2011 at 03:27 AM.
Old 08-17-2011, 07:54 AM   #2
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
Posts: 9,895
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by mosker View Post
I'm trying to convert some PDF files that contain diacritic characters, but the resulting EPUB has many line breaks inserted at random, breaking up paragraphs.
I don't know the answer to your question, but have you read the sticky post "Read this before Posting PDF Questions"?
Old 08-21-2011, 03:25 AM   #3
mosker
Yes. That FAQ doesn't help, because it gives no information on diacritic characters or on converting with one's own fonts.

When I convert the PDF to EPUB, one or more line breaks appear at many of the diacritic characters. Changing the heuristic options has no effect.

Is there any information in the Calibre documentation on diacritic characters?
Old 08-21-2011, 05:01 AM   #4
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Your problem description isn't clear - is the diacritic character itself displayed correctly and the paragraph breaks on the character, or is the character not rendered correctly?

PDFs define diacritics in many ways. Calibre handles some of the common cases, but taking care of the more obscure ones can be difficult. Beyond that, support for diacritics depends on your reading system: most reading systems don't have comprehensive fonts that cover all languages.
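
To illustrate what "defined in a lot of ways" can mean at the text level, here is a small Python sketch using only the standard library. It is not taken from Calibre; the three sample strings are assumptions chosen for illustration. The same visible "é" can be one precomposed code point, a letter plus a combining accent, or a standalone spacing accent next to a plain letter, and only the second form is merged by Unicode normalization.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
combining = "e\u0301"    # plain e followed by a combining acute accent
spacing = "\u00b4e"      # standalone (spacing) acute accent, then a plain e


def describe(text):
    """Return the Unicode name of every code point in text."""
    return [unicodedata.name(ch) for ch in text]


# NFC normalization merges a letter with a following combining accent...
nfc_combining = unicodedata.normalize("NFC", combining)
# ...but leaves a spacing accent and the letter next to it as two characters.
nfc_spacing = unicodedata.normalize("NFC", spacing)

for label, value in [("precomposed", precomposed),
                     ("combining", combining),
                     ("spacing", spacing)]:
    print(label, describe(value))
```

The third form is the kind of output that a converter has to repair with its own rules, since normalization alone will not do it.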
Old 08-21-2011, 06:35 PM   #5
mosker
No, the diacritic character is not displayed correctly, and there are also paragraph breaks at those characters.
However, I have some EPUB files downloaded from the internet that use diacritic characters, and I know they were converted with Calibre.

I'm using XP and the Calibre viewer to check the result. When I unzip those files, all I see is UTF-8 encoding and the following CSS declaration:

font-family: "Times Ext Roman", "Indic Times", "Doulos SIL", Tahoma, "Arial Unicode MS", Gentium;

Then I tried:
1 - unzip the badly converted EPUB
2 - change its CSS to use the same font-family declaration as those EPUB files
3 - rebuild the EPUB

but with no success. In the unzipped EPUB the paragraphs are already broken, with <p></p> wherever there is a diacritic character, so rebuilding has no effect. I suppose these settings would have to be applied before converting the PDF.
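
For reference, the rebuild step above can be sketched in Python with the standard zipfile module. This is a minimal sketch, not a Calibre tool: the function name `repack_epub` and its arguments are hypothetical. It relies on the EPUB container rule that the `mimetype` entry comes first in the archive and is stored uncompressed. It only repackages the files, so it does nothing about paragraphs that were already broken during conversion.

```python
import os
import zipfile


def repack_epub(src_dir, epub_path):
    """Zip an unpacked EPUB directory (src_dir) into epub_path.

    epub_path should be outside src_dir, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The 'mimetype' entry must be first and stored without compression.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                # Archive names always use forward slashes.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

A call would look like `repack_epub("unpacked_book", "rebuilt.epub")`, where both paths are placeholders.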

How can one include one's own font to display diacritics? I cannot find instructions on how to do it.

thx,

Last edited by mosker; 08-21-2011 at 06:38 PM.
Old 08-21-2011, 10:38 PM   #6
ldolse
If the character itself is not displayed correctly then Calibre doesn't support the way the diacritics are defined. Messing with the fonts won't help.

The list of diacritic characters that Calibre has support for is here:
http://bazaar.launchpad.net/~kovid/c.../preprocess.py

Search for "# Fix Accents" (without the quotes) to see the relevant part of the source. If the characters you're concerned about are already covered in that list, then there is nothing more to be done: you just have a set of junk PDFs.


The only thing you could do is use the search and replace wizard to replace whatever garbage is being generated with the correct character, though whether that is practical depends on how many characters need replacing.
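
As a rough illustration of that kind of replacement, here is a Python sketch that merges a standalone accent and the letter after it into one accented character. It is not Calibre's code and not the wizard itself. It assumes the garbage has the form "accent, optional whitespace, letter", which may not match what a given PDF produces, and the accent table is an illustrative subset.

```python
import re
import unicodedata

# Standalone (spacing) accents mapped to their combining counterparts.
# Illustrative subset only; extend it for the characters in your text.
SPACING_TO_COMBINING = {
    "\u00b4": "\u0301",  # acute accent
    "`": "\u0300",       # grave accent
    "\u00a8": "\u0308",  # diaeresis
    "\u02c6": "\u0302",  # circumflex
    "\u00af": "\u0304",  # macron
}

# A standalone accent, optional whitespace, then an ASCII letter.
ACCENT_THEN_LETTER = re.compile(
    "([" + "".join(re.escape(a) for a in SPACING_TO_COMBINING) + r"])\s*([A-Za-z])"
)


def fix_accents(text):
    """Merge 'accent + letter' pairs into single precomposed characters."""
    def merge(match):
        accent, letter = match.group(1), match.group(2)
        # Letter + combining accent, composed by NFC where a composed form exists.
        return unicodedata.normalize("NFC", letter + SPACING_TO_COMBINING[accent])
    return ACCENT_THEN_LETTER.sub(merge, text)


print(fix_accents("caf\u00b4e and P\u00af ali"))
```

A pattern like this will also rewrite text where the accent character is legitimate, for example a backtick used as a quotation mark, so check the result before applying it to a whole book.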
Old 08-26-2011, 12:35 PM   #7
mosker
The part of the Python code you cited should cover all the characters in my texts.

What do you mean by "junk PDF"? I don't know much about PDF internals, but my files seem to be fine. (As an example, here is one of them: http://www.archive.org/details/Cetasikas )

I don't know whether the PDF needs some internal definition in place before it can be converted.

thanks for the help,
All times are GMT -4.