Old 01-20-2020, 07:51 AM   #39
geek1011
Wizard
Posts: 2,808
Karma: 7423683
Join Date: May 2016
Location: Ontario, Canada
Device: Kobo Mini, Aura Edition 2 v1, Clara HD
Quote:
Originally Posted by rtiangha
OK, I've got a question about this change:

If kepubify 3.0 better mimics Kobo's way of doing things, will that make the span fragmentation problem better or worse (I don't know how kepubify 2.0 did it instead)? Is there a way for a future version of kepubify to do a better job in reducing span fragmentation even if it means not following Kobo's conventions in applying spans during conversion (maybe by following jackie_w's hacked KTE algorithm)?
Well, a few of the main changes result in fewer spans. For one, kepubify doesn't split on colons anymore: I didn't find any recent books which did it, but I did find ones which didn't. Another is that kepubify no longer swallows whitespace under various conditions. It won't eat or wrap whitespace directly under the body tag (it used to eat it), and it won't eat multiple spaces between sentences or whitespace that is the only thing in a node. It also won't split after punctuation that has no space after it (doing so causes a bunch of layout issues and doesn't even match Kobo; see the test cases). In addition, it won't wrap whitespace sitting by itself, nor will it increment the paragraph counter for it (see the test cases).

A few other improvements to the span code: the counters aren't incremented if no spans were added to a node due to an exception, and the entire img tag is wrapped rather than just having a span put before it. Kepubify also wraps text in heading tags.
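
To make those rules concrete, here is a rough Python sketch. It is not kepubify's actual code: the function names are made up for illustration, and it assumes a simplified model in which each non-blank text node gets one paragraph number and its sentences get `kobo.<paragraph>.<segment>` ids. It only shows three of the behaviors above: no split on colons, no split after punctuation without a following space, and whitespace-only nodes left unwrapped and uncounted.

```python
import re
from html import escape

# Assumed splitting rule for this sketch: a segment ends at '.', '!' or '?'
# only when whitespace follows, and that whitespace stays with the segment.
# Colons are deliberately absent from the character class.
_SEGMENT = re.compile(r".+?[.!?]\s+|.+", re.DOTALL)


def split_sentences(text):
    """Split text into sentence-like segments without dropping whitespace."""
    return _SEGMENT.findall(text)


def wrap_text_nodes(nodes, para=0):
    """Wrap each text node's segments in koboSpan elements.

    nodes: list of text-node strings (hypothetical input model).
    Returns (list of output strings, final paragraph counter).
    """
    out = []
    for text in nodes:
        if not text.strip():
            # Whitespace on its own: passed through untouched,
            # and the paragraph counter is not incremented.
            out.append(text)
            continue
        para += 1
        for seg, sentence in enumerate(split_sentences(text), start=1):
            out.append(
                f'<span class="koboSpan" id="kobo.{para}.{seg}">'
                f"{escape(sentence, quote=False)}</span>"
            )
    return out, para
```

With this sketch, `split_sentences("One: two. Three!")` yields two segments (`"One: two. "` and `"Three!"`) rather than three, and `"See v1.2:now."` stays in one piece because neither the inner period nor the colon is followed by a space.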

Quite a few of these improvements (e.g. the whitespace behavior and not splitting on colons) aren't in the calibre extension yet, and I think they should be implemented there.

One behavior I haven't managed to replicate, though, is how Kobo sometimes also increments the paragraph counter on span and a tags. This only causes an offset of one or two in a few books, so I don't consider it a big issue.

You can test how kepubify differs from an official book (or even from a calibre-converted book) by running kobotest (it's in the repo) with an XHTML file containing spans on stdin.
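
If you just want a feel for that kind of comparison, the idea can be sketched in a few lines of Python. To be clear, this is not kobotest and makes no claim about how kobotest works or what it prints; the class and function names are hypothetical. It simply collects every `koboSpan` id and its text from two XHTML strings and reports which ids are missing, extra, or have different text.

```python
from html.parser import HTMLParser


class KoboSpanCollector(HTMLParser):
    """Collect [id, text] pairs for every span with class koboSpan."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.spans = []   # [id, text] in document order
        self._open = []   # stack: index into self.spans, or None for other spans

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attr = dict(attrs)
        if "koboSpan" in (attr.get("class") or "").split():
            self.spans.append([attr.get("id") or "", ""])
            self._open.append(len(self.spans) - 1)
        else:
            self._open.append(None)

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open koboSpan, if there is one.
        for idx in reversed(self._open):
            if idx is not None:
                self.spans[idx][1] += data
                break


def span_map(xhtml):
    collector = KoboSpanCollector()
    collector.feed(xhtml)
    collector.close()
    return {span_id: text for span_id, text in collector.spans}


def diff_spans(reference, candidate):
    """Compare koboSpan ids and text between two XHTML strings."""
    ref, cand = span_map(reference), span_map(candidate)
    return {
        "missing": sorted(ref.keys() - cand.keys()),
        "extra": sorted(cand.keys() - ref.keys()),
        "changed": sorted(k for k in ref.keys() & cand.keys() if ref[k] != cand[k]),
    }
```

Passing an official book's XHTML as `reference` and a converted one as `candidate` gives a quick list of span ids whose boundaries moved, which is usually where the fragmentation differences show up.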

P.S. Read the test suite in the kepubify code; it explains a bunch of this with examples.