Old 03-19-2016, 05:00 AM   #1
pokeba
Converting a Chinese book: section markers based on punctuation cause an erroneous page break

Hi,

I have successfully converted many Chinese books from Word files to EPUB with Calibre. Recently I ran into a problem with one book: Calibre generates an extra page break.

The error occurs at the only place in the book where I have a short English phrase, as shown in the attached JPG file (I turned on "show all marks" in Word).

I checked the debug output. In the input/index.html file, the markup is correct:

<p class="block_33">*</p>
<p class="block_5 text_10">Mee Soto</p>
<p class="block_7">(词/曲:疏效平、李家欣)</p>

But in parsed/index.html, it became:
<p class="block_33">*</p>
<p class="block_5 text_10" style="page-break-before:always">Mee Soto</p>
<p class="block_7">(词/曲:疏效平、李家欣)</p>

Note that an extra style="page-break-before:always" was added to the "Mee Soto" paragraph.
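
For anyone who wants to reproduce the comparison, here is a minimal sketch that reports which paragraphs gained a page break between the two debug stages. The function names are hypothetical, and the two snippets are pasted in as strings; in practice they would be read from input/index.html and parsed/index.html in the debug folder. It uses a simple regular expression rather than a real HTML parser, so it assumes flat `<p>` markup like the lines above.

```python
import re

# Captures the attributes and the inner text of each <p ...>...</p> element.
# This is a simplification that assumes flat, non-nested paragraphs.
P_TAG = re.compile(r'<p\b([^>]*)>(.*?)</p>', re.DOTALL | re.IGNORECASE)


def paragraphs_with_page_break(html):
    """Return the text of every <p> whose attributes mention page-break-before."""
    return [text for attrs, text in P_TAG.findall(html)
            if 'page-break-before' in attrs]


def added_page_breaks(input_html, parsed_html):
    """Return paragraphs that have a page break in parsed_html but not in input_html."""
    before = set(paragraphs_with_page_break(input_html))
    return [text for text in paragraphs_with_page_break(parsed_html)
            if text not in before]


# The two snippets from the debug output quoted in this post.
input_html = '''
<p class="block_33">*</p>
<p class="block_5 text_10">Mee Soto</p>
<p class="block_7">(词/曲:疏效平、李家欣)</p>
'''

parsed_html = '''
<p class="block_33">*</p>
<p class="block_5 text_10" style="page-break-before:always">Mee Soto</p>
<p class="block_7">(词/曲:疏效平、李家欣)</p>
'''

print(added_page_breaks(input_html, parsed_html))
```

Run on the snippets above, this lists only the "Mee Soto" paragraph, which confirms that the break is introduced between the input and parsed stages rather than coming from the Word file.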

In the log file, it says:
...
Median line length is 135, calculated with html format
Looking for more split points based on punctuation, currently have 2
marked 3 section markers based on punctuation. - Mee Soto</p>
...

So somehow Calibre thinks there is punctuation in "Mee Soto</p>".
But I don't see any, and I have spent quite a few days trying to get rid of the extra page break.

I also found that if I change "Mee Soto" to other English text, the page break is still there. But if I change "Mee Soto" to Chinese characters, Calibre does not generate the extra page break.

I'd appreciate it if anyone could help or explain why Calibre sees punctuation in "Mee Soto</p>".

Thanks
Attachment: Mee_soto_in_Word_file.JPG (30.0 KB)