MobileRead Forums - Convert a Chinese book: markers based on punctuation caused an erroneous page break

pokeba · 03-19-2016, 06:00 AM

Hi,

I have successfully converted many Chinese books from Word files to EPUB with Calibre. Recently, with one book, Calibre inserted an extra page break.

The error happens at the only place in the book that has a short English phrase, as shown in the attached JPG (I turned on "show all marks" in Word).

I checked the debug output. In input/index.html the markup is correct:

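    <p class="block_33"></p>
    <p class="block_5 text_10">Mee Soto</p>
    <p class="block_7">（词／曲：疏效平、李家欣）</p>

(The last line is a song credit: "(Lyrics/Music: 疏效平, 李家欣)".)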

But in parsed/index.html it became:
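    <p class="block_33"></p>
    <p class="block_5 text_10" style="page-break-before:always">Mee Soto</p>
    <p class="block_7">（词／曲：疏效平、李家欣）</p>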

Note the extra style="page-break-before:always" that was added.
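A quick way to find every paragraph that gained a page break during parsing, rather than checking one spot by eye, is to compare the two debug files paragraph by paragraph. This is a minimal Python sketch, assuming both files contain the same `<p>` elements in the same order. The function name `added_page_breaks` is hypothetical and not part of Calibre; the sample strings are the snippets quoted above.

```python
import re

# Captures the attribute string and inner HTML of each <p>...</p> element.
# A regex is adequate for flat markup like the snippets above, not for arbitrary HTML.
P_ELEMENT = re.compile(r"<p\b([^>]*)>(.*?)</p>", re.DOTALL | re.IGNORECASE)
BREAK = "page-break-before:always"


def has_break(attrs: str) -> bool:
    # Ignore spaces and case so "Page-Break-Before: always" also counts.
    return BREAK in attrs.replace(" ", "").lower()


def added_page_breaks(before_html: str, after_html: str) -> list[str]:
    """Return the inner HTML of paragraphs that have page-break-before:always
    in after_html but not at the same position in before_html."""
    before = P_ELEMENT.findall(before_html)
    after = P_ELEMENT.findall(after_html)
    added = []
    # Assumption: parsing kept the paragraph sequence unchanged.
    for (attrs_b, _), (attrs_a, inner_a) in zip(before, after):
        if has_break(attrs_a) and not has_break(attrs_b):
            added.append(inner_a)
    return added


before = ('<p class="block_33"></p> <p class="block_5 text_10">Mee Soto</p> '
          '<p class="block_7">（词／曲：疏效平、李家欣）</p>')
after = ('<p class="block_33"></p> '
         '<p class="block_5 text_10" style="page-break-before:always">Mee Soto</p> '
         '<p class="block_7">（词／曲：疏效平、李家欣）</p>')
print(added_page_breaks(before, after))  # ['Mee Soto']
```

On a real conversion, one would read input/index.html and parsed/index.html with `encoding="utf-8"` and pass their contents in. If parsing added or removed paragraphs, the index-based pairing drifts and the result is unreliable.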

The conversion log says:
...
Median line length is 135, calculated with html format
Looking for more split points based on punctuation, currently have 2
marked 3 section markers based on punctuation. - Mee Soto</p>
...
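In a longer book, the punctuation pass may mark more than one paragraph. A simple filter over the log lists every paragraph it reports. This is a minimal Python sketch, assuming the log lines keep exactly the wording quoted above. Only one such line is shown in the post, so the pattern is an assumption, and `punctuation_markers` is a hypothetical helper name.

```python
import re

# Pattern based on the single log line quoted above; other Calibre versions
# may word it differently (assumption).
MARKER_LINE = re.compile(r"marked (\d+) section markers based on punctuation\. - (.*)")
TAG = re.compile(r"<[^>]+>")


def punctuation_markers(log_text: str) -> list[tuple[int, str]]:
    """Return (count reported on the line, paragraph text) for each marker line."""
    found = []
    for line in log_text.splitlines():
        m = MARKER_LINE.search(line)
        if m:
            # Strip HTML tags such as the trailing </p> from the logged snippet.
            found.append((int(m.group(1)), TAG.sub("", m.group(2)).strip()))
    return found


log = """Median line length is 135, calculated with html format
Looking for more split points based on punctuation, currently have 2
marked 3 section markers based on punctuation. - Mee Soto</p>"""
print(punctuation_markers(log))  # [(3, 'Mee Soto')]
```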

So Calibre somehow thinks there is punctuation in "Mee Soto</p>", but I don't see any, and I have spent quite a few days trying to get rid of the extra page break.

I also found that if I replace "Mee Soto" with other English text, the page break is still inserted. But if I replace "Mee Soto" with some Chinese characters, Calibre does not insert the extra page break.
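One hypothesis consistent with these observations is that the "punctuation" pass really matches short paragraphs made only of a few Latin-letter words, which Chinese text can never satisfy. Neither the log nor this post confirms this. The toy Python model below implements such a rule so the behaviour can be reasoned about. The regex, the five-word limit, and the name `mark_sections` are all assumptions, not Calibre's actual code.

```python
import re

# Hypothetical rule, NOT Calibre's actual regex: a paragraph whose entire text
# is one to five words made only of ASCII letters is treated as a section heading.
SHORT_LATIN_PARA = re.compile(
    r'<p(?P<attrs>[^>]*)>\s*(?P<text>(?:[A-Za-z]+\s*){1,5})</p>'
)


def mark_sections(html: str) -> str:
    """Add page-break-before:always to paragraphs matching the toy rule.
    (A paragraph that already has a style attribute would get a second one;
    acceptable for a model, not for real output.)"""
    def add_break(m):
        return '<p{} style="page-break-before:always">{}</p>'.format(
            m.group("attrs"), m.group("text"))
    return SHORT_LATIN_PARA.sub(add_break, html)


if __name__ == "__main__":
    sample = ('<p class="block_5 text_10">Mee Soto</p>\n'
              '<p class="block_7">（词／曲：疏效平、李家欣）</p>')
    # Only the English paragraph gains the page break under this toy rule.
    print(mark_sections(sample))
```

Under this model, any short English phrase triggers the break while Chinese text never does. This matches what was observed, but it remains a guess about the cause.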

I'd appreciate it if anyone could help or explain why Calibre sees punctuation in "Mee Soto</p>".

Thanks
