Quote:
Originally Posted by chrisridd
What does the underlying HTML actually look like in the cases of strange spacing?
I don't think there's a simple answer to that. Someone feel free to correct me if the following isn't right.
The KoboTouchExtended plugin's format-shift from epub to kepub 'fragments' the HTML text content and wraps each fragment in a Kobo span, e.g. <span class="koboSpan" id="kobo.2.1">The cat sat on the mat.</span>.
Understandably, KTE tries to match the algorithm Kobo themselves use, which appears to be basically a simplistic attempt to fragment the text into sentences. Unfortunately there are many ways for this algorithm to fragment over-aggressively. For example, the Kobo-style algorithm would result in the (admittedly contrived) single sentence
Code:
<p>“It’s 1:05 p.m. on Friday ... already too late!”</p>
becoming
Code:
<p><span class="koboSpan" id="kobo.1.1">“It’s 1:</span><span class="koboSpan" id="kobo.1.2">05 p.</span><span class="koboSpan" id="kobo.1.3">m. </span><span class="koboSpan" id="kobo.1.4">on Friday .</span><span class="koboSpan" id="kobo.1.5">.</span><span class="koboSpan" id="kobo.1.6">. </span><span class="koboSpan" id="kobo.1.7">already too late!”</span></p>
It has been split into 7 fragments where 1 would probably be sufficient. If a couple of words in the original sentence had had italic tags around them the fragmentation would have been even worse.
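As a rough illustration, a rule along the following lines reproduces that split. This is only a sketch of the kind of logic that would behave this way, not the actual KTE or Kobo code: the pattern and the names FRAGMENT, fragment and wrap are made up for the example, and the rule itself (end a fragment at any '.', '!', '?' or ':') is an assumption inferred from the output above.

```python
import re
from html import escape

# Hypothetical rule (an assumption, not taken from KTE or Kobo): a fragment
# ends at '.', '!', '?' or ':', plus one optional quote character and any
# trailing whitespace. Leftover text with no terminator is the last fragment.
FRAGMENT = re.compile(r"[^.!?:]*[.!?:]['\"\u2018\u2019\u201c\u201d]?\s*|[^.!?:]+")


def fragment(text):
    """Split text into sentence-like fragments; joining them restores text."""
    return FRAGMENT.findall(text)


def wrap(text, para):
    """Wrap each fragment in a koboSpan with ids kobo.<para>.<n>."""
    return "".join(
        '<span class="koboSpan" id="kobo.{}.{}">{}</span>'.format(
            para, n, escape(frag, quote=False)
        )
        for n, frag in enumerate(fragment(text), 1)
    )


sample = "“It’s 1:05 p.m. on Friday ... already too late!”"
for frag in fragment(sample):
    print(repr(frag))
```

Run on the sample sentence it gives the same 7 pieces as above, because the colon in the time, the dots in "p.m." and each dot of the ellipsis all count as sentence ends.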
I don't think this aggressive fragmentation helps the kepub renderer's attempts at attractive full-justification. As GeoffR demo'd, it seems to be even less attractive when optimizeLegibility is enabled. In addition, the larger your preferred font-size, the worse it's likely to get. Standard epubs don't have any of these koboSpans getting in the way.
Just as an exercise (entirely non-scientific) I hacked my copy of KTE to try a simple, less aggressive fragmentation algorithm. You can see the results for a single page below: Kobo algorithm on the left, less aggressive algorithm on the right. optimizeLegibility is enabled on both of them.
As you can see there are 3 lines (first, last, 8th-from-bottom) where the letter-spacing of the first word is no longer quite so odd. A sample of one page doesn't prove anything, of course. In addition I have no idea whether it would have a really bad effect on other things, e.g. annotations, bookmarks, text selection etc.