MobileRead Forums - View Single Post

colinsky · 12-02-2022, 07:58 PM

Quote:

Originally Posted by jhowell

I don’t think that MOBI supports Japanese.

KF8 (azw3) does but it relies on an included word boundary table (GESW records) that is generated during the publishing process. I do not think that there is any way to add that to a book that was not sold by Amazon.

I am not sure about KFX. It might work better in that format.

Update: Like KF8, KFX format also includes word boundary information, but only in published books.

Thanks. Almost everything I have played with came from an original EPUB source (converted by Calibre), so I'm not sure where the behavior I am seeing comes from.

Basically: MOBI doesn't seem to attempt any word segmentation (you can select any combination of characters). AZW/KFX only let you select along some notion of word boundaries (kanji + inflection). I'll have to do some more experimentation to see whether this derives from a source feature (like a RUBY tag) or some other process.
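For that experimentation, one rough way to check whether a Calibre-converted file carries any GESW records at all is to walk the PalmDB record list and look at the first four bytes of each record. This is only a minimal sketch: the 78-byte PalmDB header and 8-byte record entries are the standard container layout, but that a GESW record starts with the literal bytes "GESW" is my assumption from the record name, and the function names are made up for illustration.

```python
import struct


def record_tags(data: bytes) -> list[bytes]:
    """Return the first four bytes of every PalmDB record in a MOBI/AZW3 file."""
    # The PalmDB header is 78 bytes long; the record count is a big-endian
    # unsigned 16-bit integer stored at offset 76.
    (count,) = struct.unpack_from(">H", data, 76)
    # Each record-info entry is 8 bytes: a 4-byte big-endian file offset,
    # followed by 1 attribute byte and a 3-byte unique ID (ignored here).
    offsets = [struct.unpack_from(">I", data, 78 + 8 * i)[0] for i in range(count)]
    return [data[off:off + 4] for off in offsets]


def count_tag(data: bytes, tag: bytes = b"GESW") -> int:
    """Count records whose first four bytes equal `tag` (assumed record marker)."""
    return sum(1 for t in record_tags(data) if t == tag)
```

To use it, read the book with `open(path, "rb").read()` and pass the bytes to `count_tag`. A count of zero on a converted file, with segmentation still working on the device, would suggest the boundaries come from some other process.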

Are you aware of documentation (or reverse engineering) that describes the word boundary information in KF8 or KFX?