MobileRead Forums - View Single Post

jackie_w · 03-25-2019, 01:36 PM

Quote:

Originally Posted by Gorcsev

Unfortunately you are right, it was discussed earlier, but as I read through that post no solution was settled on. From what I found searching the net, a non-breaking space does not 100% mean "non-variable-length space". A better solution (I tried it in kepub) would be the fixed-width "1/4 em-space", but the device displays a □ when "text-rendering: optimizeLegibility/optimizeSpeed" is used in the body tag.

An interim solution is a fixed-width "en-space", but again, it means touching every book I sideload to the Kobo.

There are 3 (at least) different things to consider here when trying to "beautify" full-justified text:

Font used: The problem with using some of those 'special spaces' is that most freely available fonts don't actually contain those glyphs (e.g. U+2003 em space, U+2004 three-per-em space, U+2005 four-per-em space), so you see the generic 'glyph unavailable' character instead. Did you check that? I'd be surprised if any of the built-in fonts have them.

Even if you do find a suitable font to sideload, you may still suffer the 'stretchy spaces' problem. If a non-breaking space doesn't work, is there any reason to believe one of the other spaces will be any better?

Hyphenation: Hungarian isn't one of Kobo's officially supported languages, so they don't supply a hyph_hu.dic hyphenation dictionary. Have you been able to install a custom one? Do you see any hyphenation in your Hungarian books?

I know it's easy to replace the Kobo default English hyphenation dictionary with a better one, but I'm not sure whether there's more to it when adding a new language.

koboSpans algorithm: If you're comfortable hacking your own copy of KTE ... Rather than trying to decide "when is a sentence not a sentence", just don't split into sentences at all. Try this simple hack to greatly reduce the number of koboSpans added.

Inside the KoboTouchExtended.zip file, find container.py and change lines 540-544 from
Code:
```
                groups = re.split(
                    r'(.*?[\.\!\?\:][\'"\u201d\u2019“…]?\s*)',
                    text,
                    flags=re.UNICODE | re.MULTILINE,
                )
```
to
Code:
```
                groups = [text]
```
This is the first hack I was referring to in post #60. It has the benefit of being language independent, but I can't promise there won't be unintended consequences with annotations and bookmarking. You'll need to experiment.
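
If you want to see what the change does before touching KTE, here is a small standalone sketch. The regex is copied from the snippet above; the function names, the sample sentence and the filtering of empty strings are my own assumptions for illustration and are not taken from container.py. As far as I understand it, each piece in the list ends up wrapped in its own koboSpan, so fewer pieces should mean fewer spans.

```python
import re

# Pattern copied verbatim from the original snippet; everything else is illustrative.
SENTENCE_RE = r'(.*?[\.\!\?\:][\'"\u201d\u2019“…]?\s*)'


def split_like_original(text):
    """Split a text node the way the unmodified snippet does."""
    groups = re.split(SENTENCE_RE, text, flags=re.UNICODE | re.MULTILINE)
    # re.split leaves empty strings between adjacent matches; drop them
    # here only to make the printed result easier to read (assumption).
    return [g for g in groups if g]


def split_like_hack(text):
    """The hacked version: the whole text node stays in one piece."""
    return [text]


# Made-up sample: an abbreviation, a colon and a quoted exclamation.
sample = 'Mr. Smith said: "Stop!" Then he left'
print(split_like_original(sample))
print(split_like_hack(sample))
```

With the original regex the sample breaks after "Mr." and after the colon as well as after the quoted exclamation, which is exactly the "when is a sentence not a sentence" problem. With the hack it stays as one piece.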

I know you would prefer to use kepub, but did you actually test whether all the problems disappear if you use epub?

Is it possible you could post a 1-page test epub containing the kind of HTML you see when you buy a retail Hungarian book with a lot of dialogue? I'm having difficulty understanding whether your problems are with the original epub or with a manually edited version. There's not much likelihood of successfully hacking KTE to auto-edit books on the fly if no one knows what the original HTML looks like.
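
On the question of what the original HTML looks like: a rough way to report which 'special spaces' a chapter actually contains is to count them. This is only a sketch using the Python standard library. The function name, the list of code points and the sample line of dialogue are all made up for illustration, and it only looks at a string of HTML you hand it; it does not open an epub for you.

```python
import html
import unicodedata
from collections import Counter

# Code points counted as "special spaces" here. This selection is an
# assumption; add or remove entries as needed.
SPECIAL_SPACES = "\u00a0\u2002\u2003\u2004\u2005\u2009\u202f"


def count_special_spaces(markup):
    """Return {Unicode name: count} for special spaces in an HTML string."""
    # Decode entities such as &nbsp; or &#8197; so they are counted too.
    text = html.unescape(markup)
    counts = Counter(ch for ch in text if ch in SPECIAL_SPACES)
    return {unicodedata.name(ch): n for ch, n in counts.items()}


# Invented example line, not taken from a real book.
sample = "<p>\u2013&nbsp;Igen&#8197;\u2013 mondta.&emsp;Nem.</p>"
for name, n in sorted(count_special_spaces(sample).items()):
    print(f"{name}: {n}")
```

Running something like this over the untouched retail file and over the edited one would show whether the special spaces come from the publisher or from later editing.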