12-02-2022, 07:58 PM #17
colinsky
Addict
colinsky ought to be getting tired of karma fortunes by now.
Posts: 240
Karma: 3500000
Join Date: Sep 2009
Device: Sony PRS-300, PRS-T1, PRS-T3
Quote:
Originally Posted by jhowell
I don’t think that MOBI supports Japanese.

KF8 (azw3) does, but it relies on an included word boundary table (GESW records) that is generated during the publishing process. I do not think that there is any way to add that to a book that was not sold by Amazon.

I am not sure about KFX. It might work better in that format.

Update: Like KF8, KFX format also includes word boundary information, but only in published books.
Thanks. Almost everything I have played with has been from an original EPUB source (converted by Calibre), so I'm not sure where the behavior I am seeing comes from.

Basically: MOBI doesn't seem to attempt any word segmentation (you can select any combination of characters). AZW/KFX only let you select along some notion of word boundaries (kanji + inflection). I'll have to experiment more to see whether this derives from a source feature (like a RUBY tag) or from some other process.
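To make the "kanji + inflection" behavior easier to compare against what a reader actually allows, here is a minimal Python sketch of one possible heuristic. It is purely hypothetical: it is not how Kindle firmware segments text, and it does not read or model GESW records, whose layout is not described here. The function names `char_class`, `boundaries` and `snap_selection` are made up for illustration. The idea is to cut between script runs but keep hiragana glued to a preceding kanji run (treating it as okurigana), then widen any selection outward to the nearest cut.

```python
import bisect


def char_class(ch: str) -> str:
    """Coarse script class for one character (hypothetical heuristic)."""
    if "\u4e00" <= ch <= "\u9fff" or ch in "\u3005\u3006":  # CJK ideographs, 々, 〆
        return "kanji"
    if "\u3040" <= ch <= "\u309f":  # Hiragana block
        return "hiragana"
    if "\u30a0" <= ch <= "\u30ff":  # Katakana block (includes the long-vowel mark)
        return "katakana"
    return "other"


def boundaries(text: str) -> list[int]:
    """Return sorted offsets where a 'word' may start or end, including 0 and len(text)."""
    cuts = [0]
    prev = None
    for i, ch in enumerate(text):
        cls = char_class(ch)
        if prev is not None:
            # Same script continues the run; hiragana right after kanji is
            # treated as inflection and stays attached (no cut).
            glued = cls == prev or (prev == "kanji" and cls == "hiragana")
            if not glued:
                cuts.append(i)
        prev = cls
    if text:
        cuts.append(len(text))
    return cuts


def snap_selection(bounds: list[int], start: int, end: int) -> tuple[int, int]:
    """Widen the half-open selection [start, end) outward to the nearest boundaries."""
    s = bounds[bisect.bisect_right(bounds, start) - 1]  # largest boundary <= start
    e = bounds[bisect.bisect_left(bounds, end)]  # smallest boundary >= end
    return s, e


if __name__ == "__main__":
    text = "本を読んだ。"
    b = boundaries(text)
    s, e = snap_selection(b, 3, 4)  # user drags over part of 読んだ
    print(b, text[s:e])
```

Running it prints `[0, 2, 5, 6] 読んだ`, i.e. a partial selection expands to the whole inflected verb, much like the AZW/KFX behavior described above, whereas MOBI would keep the raw character range. The heuristic also glues particles such as を onto the preceding noun, which a real morphological analyzer (for example MeCab) would split off, so it is only a rough stand-in for whatever Amazon's publishing pipeline generates.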

Are you aware of documentation (or reverse engineering) that describes the word boundary information in KF8 or KFX?