Quote:
Originally Posted by jhowell
I don’t think that MOBI supports Japanese.
KF8 (azw3) does but it relies on an included word boundary table (GESW records) that is generated during the publishing process. I do not think that there is any way to add that to a book that was not sold by Amazon.
I am not sure about KFX. It might work better in that format.
Update: Like KF8, KFX format also includes word boundary information, but only in published books.
Thanks. Almost everything I have played with has come from an original EPUB source (converted by Calibre), so I'm not sure how the behavior I am seeing arises.
Basically: MOBI doesn't seem to attempt any word segmentation (you can select any combination of characters). AZW/KFX only let you select along some notion of word boundaries (kanji + inflection). I'll have to do some more experimentation to see whether this derives from a source feature (like a RUBY tag) or from some other process.
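One experiment I could try is checking whether my Calibre-converted files contain any GESW records at all. Below is a rough sketch, not a tested tool. It relies on the standard PalmDB layout that MOBI/KF8 files use (78-byte header, record count at offset 76, 8-byte record-info entries). The part I am assuming is that a GESW record starts with the literal bytes `GESW`, the way other KF8 records such as FDST start with their tag. The function names are my own.

```python
import struct


def palmdb_record_magics(data: bytes) -> list[bytes]:
    """Return the first 4 bytes of every record in a PalmDB container."""
    # The PalmDB header is 78 bytes; the record count is a
    # big-endian uint16 stored at offset 76.
    (num_records,) = struct.unpack_from(">H", data, 76)
    # Each record-info entry is 8 bytes: a uint32 file offset,
    # one attribute byte, and a 3-byte unique ID.
    offsets = [
        struct.unpack_from(">I", data, 78 + 8 * i)[0]
        for i in range(num_records)
    ]
    return [data[off:off + 4] for off in offsets]


def has_gesw_records(data: bytes) -> bool:
    # ASSUMPTION: GESW records begin with the literal tag b"GESW".
    return b"GESW" in palmdb_record_magics(data)
```

Usage would be something like `has_gesw_records(open("book.azw3", "rb").read())`, where `book.azw3` is a placeholder filename. As far as I know KFX uses a different container, so this would only tell me something about MOBI/KF8 files.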
Are you aware of documentation (or reverse engineering) that describes the word boundary information in KF8 or KFX?