Old 01-12-2011, 05:30 AM   #1
marekgregor
PDF-> EPUB conversion splits paragraph

PDF->EPUB conversion incorrectly splits a paragraph where a line ends with a character containing diacritics. A look into the debugging folder shows that input/index.html contains the paragraph:

„Já jsem ráda, žes to neudělal, ale jsem ti vděčná, žes mě<br>
před ním chránil.“<br>

which is processed in parsed/index.html as:

<p>„Já jsem ráda, žes to neudělal, ale jsem ti vděčná, žes mě</p>
<p>před ním chránil.“</p>

which is wrong: it creates two paragraphs from one because of the character ě.

Do you know how I can fix paragraph splitting so that it also handles diacritics?

Thanks
Old 01-12-2011, 10:04 AM   #2
ldolse
PDF conversion currently relies on detecting a lower-case character at the end of a line to decide whether that line continues onto the next one. Unfortunately there isn't any library which defines what those lower-case characters are across all languages. ě wasn't in the list of characters, but it will be added for one of the upcoming releases.

At some point in the future a new PDF engine will come out which uses other types of tests to decide when to unwrap a line, but for now the code is sticking with lines that end in a lower-case character without punctuation.
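
To make the rule above concrete, here is a minimal sketch of that kind of unwrapping. It is not calibre's actual code: the names `ends_with_lowercase` and `unwrap` are made up for illustration, and it assumes Python 3, where `str.islower()` follows Unicode case properties, so ě counts as lower-case without a hand-maintained character list.

```python
import re

# Matches a trailing <br>, <br/> or <br /> as seen in input/index.html.
BR = re.compile(r"<br\s*/?>\s*$", re.IGNORECASE)


def ends_with_lowercase(text):
    """Return True if the last character is a lower-case letter in any script."""
    text = text.rstrip()
    return bool(text) and text[-1].islower()


def unwrap(lines):
    """Join hard-wrapped lines into <p> paragraphs (hypothetical helper).

    A line ending in a lower-case letter, with no closing punctuation,
    is assumed to continue on the next line.
    """
    paragraphs = []
    current = []
    for raw in lines:
        text = BR.sub("", raw).strip()
        if not text:
            continue
        current.append(text)
        if not ends_with_lowercase(text):
            # Punctuation, a digit or an upper-case letter ends the paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return ["<p>%s</p>" % p for p in paragraphs]


sample = [
    "„Já jsem ráda, žes to neudělal, ale jsem ti vděčná, žes mě<br>",
    "před ním chránil.“<br>",
]
result = unwrap(sample)
print(result)
```

With the two lines from the first post this yields a single paragraph, whereas an ASCII-only test such as `[a-z]$` would not match the trailing ě and would split it in two.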