01-12-2011, 05:30 AM | #1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jan 2011
Device: none
|
PDF->EPUB conversion splits paragraph
PDF->EPUB conversion incorrectly splits a paragraph when a line ends with a character that carries a diacritic. The debug folder shows that input/index.html contains this paragraph:
„Já jsem ráda, žes to neudělal, ale jsem ti vděčná, žes mě<br> před ním chránil.“<br>
which comes out in parsed/index.html as:
<p>„Já jsem ráda, žes to neudělal, ale jsem ti vděčná, žes mě</p> <p>před ním chránil.“</p>
This is wrong: one paragraph is turned into two because the first line ends with the character ě. Do you know how I can fix the paragraph splitting so that it also handles diacritics? Thanks
01-12-2011, 10:04 AM | #2 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
PDF conversion currently relies on detecting a lower-case character at the end of the initial line to decide whether the next line belongs to the same paragraph. Unfortunately there isn't any library which defines what those lower-case characters are across all languages. ě wasn't in the list of characters, but it will be added in one of the upcoming releases.
At some point in the future a new PDF engine will come out that uses other kinds of tests to decide when to unwrap a line, but for now the code sticks with lower-case characters without punctuation.
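To illustrate the heuristic described above, here is a minimal sketch in Python. It is not calibre's actual code: the function names `ends_with_lowercase` and `unwrap` are made up for this example, and instead of a hand-maintained character list it assumes that the Unicode category `Ll` (lowercase letter) from the standard `unicodedata` module is an acceptable stand-in for "lower-case character". Whether that is suitable for every language calibre has to handle is an open question.

```python
import re
import unicodedata
from typing import List

# Matches <br>, <br/>, <br /> in any letter case.
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)

def ends_with_lowercase(line: str) -> bool:
    """Return True if the last non-space character is a lowercase letter
    according to Unicode (category 'Ll'), which includes letters like ě."""
    stripped = line.rstrip()
    return bool(stripped) and unicodedata.category(stripped[-1]) == "Ll"

def unwrap(html_fragment: str) -> List[str]:
    """Hypothetical helper: split on <br> and join a line with the next one
    whenever it ends in a lowercase letter without trailing punctuation."""
    lines = [part.strip() for part in BR.split(html_fragment)]
    paragraphs = []
    current = ""
    for line in lines:
        if not line:
            continue  # skip empty pieces left over from the split
        current = f"{current} {line}" if current else line
        if not ends_with_lowercase(line):
            # Line ends with punctuation or similar: close the paragraph.
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)  # flush a trailing unfinished paragraph
    return paragraphs

sample = ("„Já jsem ráda, žes to neudělal, ale jsem ti vděčná, žes mě<br> "
          "před ním chránil.“<br>")
for paragraph in unwrap(sample):
    print(f"<p>{paragraph}</p>")
```

With the sample from the first post, the line ending in ě is joined to the following line, so only one paragraph is produced.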