Quote:
Originally Posted by CyberPaul
@jackie_w
This is the KEPUB code:
I think it is pretty much as you described, right?
Yes.
Quote:
Originally Posted by CyberPaul
What I don't understand is why the algorithm insists on that specific sequence (Ma...). Why isn't it adding spaces within other words? I think it may be related to the periods being interpreted as separate words, because a period usually separates words.
If you're asking how the kepub reading app decides where to "inject unwanted spaces", the simple answer is that I don't know, other than that it presumably decides they're needed to get a neatly justified right edge. Why it would think it's OK to create spaces within a word, rather than adding more space to the existing gaps between words, is anyone's guess.
If you're asking why the kepub creation algorithm fragments paragraphs the way it does, it's for koboSpan purposes. It tries to create (at least) one koboSpan per sentence. However, a koboSpan may only contain text, not other tags, so the algorithm has to close the current span and start a new one whenever it meets an inline tag such as <i>, <em>, <span>, etc. in the middle of a sentence. On top of that, what counts as 'end of sentence' is decided by a regex search over a somewhat simplistic list of punctuation characters (period, colon, ellipsis, etc.).
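To make the two causes of fragmentation concrete, here is a minimal Python sketch. It is NOT the real kepub code: the `SENTENCE_END` regex, `split_sentences`, `SpanSketch` and `kobo_span_texts` are all hypothetical names, and the pattern is a deliberately naive stand-in that treats every single period, !, ?, colon or ellipsis character as a sentence end. It only shows how "one span per sentence, but never across a tag" multiplies the spans in a paragraph.

```python
import re
from html.parser import HTMLParser

# Hypothetical, deliberately naive "end of sentence" pattern: any single
# period, !, ?, colon or ellipsis character, plus optional closing quotes
# and trailing whitespace. NOT the regex used by any real kepub tool.
SENTENCE_END = re.compile(r'[.!?:\u2026]["\u201d\u2019)]*\s*')


def split_sentences(text):
    """Cut a run of plain text after every SENTENCE_END match."""
    pieces, start = [], 0
    for m in SENTENCE_END.finditer(text):
        pieces.append(text[start:m.end()])
        start = m.end()
    if start < len(text):
        pieces.append(text[start:])
    return pieces


class SpanSketch(HTMLParser):
    """Collect the text each hypothetical koboSpan would hold."""

    def __init__(self):
        super().__init__()
        self.spans = []

    def handle_data(self, data):
        # handle_data only sees the text between two tags, so a span can
        # never cross an inline tag such as <i> or <em>: each text run is
        # split on its own, and the tag itself forces a span boundary.
        for piece in split_sentences(data):
            if piece.strip():
                self.spans.append(piece)


def kobo_span_texts(html):
    parser = SpanSketch()
    parser.feed(html)
    parser.close()
    return parser.spans


html = "<p>He paused. Then he said <i>no</i> and left.</p>"
for i, text in enumerate(kobo_span_texts(html), 1):
    print(i, repr(text))  # 2 sentences end up as 4 spans
```

The second sentence is cut into three spans purely because of the <i> tag, even though its wording never changes.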
With current tools, books that make heavy use of any of the following will get a lot of unnecessary (IMO) koboSpan fragmentation during kepub creation:
- 3 consecutive periods (...) instead of a single ellipsis (…)
- abbreviations: Mr., Mrs., Dr., U.S.A., U.K.
- times: A.M., P.M., 12:30
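A quick way to see why each item in that list hurts is to run sample sentences through a naive punctuation splitter. This is a hypothetical sketch, not the real kepub regex: `SENTENCE_END` and `fragments` are illustrative names, and the pattern assumes every single period, !, ?, colon or ellipsis character ends a "sentence" wherever it occurs.

```python
import re

# Hypothetical naive pattern (assumption, not a real tool's regex): one
# punctuation character, optional closing quotes, optional whitespace.
SENTENCE_END = re.compile(r'[.!?:\u2026]["\u201d\u2019)]*\s*')


def fragments(text):
    """Split text after every SENTENCE_END match, keeping all characters."""
    out, start = [], 0
    for m in SENTENCE_END.finditer(text):
        out.append(text[start:m.end()])
        start = m.end()
    if start < len(text):
        out.append(text[start:])
    return out


samples = [
    "Wait... what?",                               # 3 periods: 4 fragments
    "Wait\u2026 what?",                            # 1 ellipsis: 2 fragments
    "Mr. Smith met Dr. Jones.",                    # abbreviations: 3 fragments
    "They left the U.S.A. at 12:30 P.M. sharp.",   # 7 fragments
]
for s in samples:
    print(len(fragments(s)), fragments(s))
```

Joining the fragments always gives back the original text; only the span boundaries change, which is why the extra spans are harmless to the content but add clutter to the kepub markup.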