03-01-2016, 03:04 PM   #1
Kest
Member
Posts: 10
Karma: 10
Join Date: Feb 2016
Device: Kindle PW3 (5.6.5 JB)
Word boundaries on Japanese text

Every time I convert a Japanese EPUB to AZW3 and try to select a word in the book, the Kindle can't detect the word boundaries and instead selects the whole sentence (everything between full stops or commas).

This only happens with books I've converted myself; the Japanese books downloaded from Amazon get the word boundaries without problems.

I've tried converting with Calibre, with KindleGen, and even by putting a working book from Amazon through an AZW3->EPUB->AZW3 cycle with KindleUnpack.

I'd like to ask whether anyone has experienced the same issue, or whether anyone has got a converted Japanese book working on the Kindle without this problem.

The books request the proper Japanese dictionary and display the text vertically. The book language is also set to Japanese (I checked that in Calibre and with MobiMetaEditor), so I'm really clueless as to whether this is an old issue or I'm doing something wrong.
03-02-2016, 03:14 AM   #2
HarryT
eBook Enthusiast
Posts: 85,560
Karma: 93980341
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Questions about ebook creation belong in the "Kindle formats" forum, to where I'm now moving this thread.
03-02-2016, 03:16 AM   #3
HarryT
eBook Enthusiast
Posts: 85,560
Karma: 93980341
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
Have you compared the coding of a book that does work correctly with one that doesn't, to see what the differences are? Perhaps the Kindle requires some sort of non-printing word separator character in order to delineate words for dictionary lookup?

Last edited by HarryT; 03-02-2016 at 04:12 AM.
03-02-2016, 10:10 AM   #4
KevinH
Sigil Developer
Posts: 9,070
Karma: 6361556
Join Date: Nov 2009
Device: many
Hi,
My guess is that the Japanese language uses a different whitespace character than English does. I found this in a Google search:

Quote:
in Japanese, there is another space character, commonly called a 'full-width space'. According to my Mac's Character Viewer utility, this is U+3000 "IDEOGRAPHIC SPACE". This is (usually) what results when a user presses the space bar while typing in Japanese input mode.
So you could try a global find and replace of \u3000 with a normal space " ", or you could try converting them to a numeric entity to see if that is really what is going on.

Code:
&#x3000;
In Sigil, you can edit your PreserveEntities Preferences in order to make these show up if that is what they are using.
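A minimal Python sketch of the find-and-replace described above, assuming the book has been unpacked (for example with KindleUnpack or Sigil) into a folder of UTF-8 (X)HTML files. The helper names are hypothetical, and the script rewrites files in place, so run it on a copy:

```python
import sys
from pathlib import Path

IDEOGRAPHIC_SPACE = "\u3000"  # U+3000, the "full-width space"


def expose_ideographic_spaces(text: str, to_entity: bool = True) -> tuple[str, int]:
    """Replace every U+3000 with the numeric entity &#x3000; (so it is visible
    when reading the source) or, if to_entity is False, with an ASCII space.
    Returns the new text and the number of replacements made."""
    count = text.count(IDEOGRAPHIC_SPACE)
    replacement = "&#x3000;" if to_entity else " "
    return text.replace(IDEOGRAPHIC_SPACE, replacement), count


def process_folder(folder: str, to_entity: bool = True) -> None:
    # The pattern matches .htm, .html and .xhtml files anywhere under the folder.
    for path in Path(folder).rglob("*.*htm*"):
        text = path.read_text(encoding="utf-8")
        new_text, count = expose_ideographic_spaces(text, to_entity)
        print(f"{path}: {count} ideographic space(s)")
        if count:
            path.write_text(new_text, encoding="utf-8")


if __name__ == "__main__":
    for folder in sys.argv[1:]:
        process_folder(folder)
```

A count of zero for every file would suggest that U+3000 is not what separates the words.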

Hope this helps,

KevinH
03-02-2016, 04:31 PM   #5
Kest
Member
Posts: 10
Karma: 10
Join Date: Feb 2016
Device: Kindle PW3 (5.6.5 JB)
Japanese words are usually not separated by spaces or anything else. It's true that there is a 'full-width space' for when a space is needed, but it is not used often, especially in books.

I'm not really sure how the Kindle detects the boundaries between Japanese words. I thought the Kindle itself, with some help from the dictionaries, was in charge of that, but I'm not sure anymore.
03-02-2016, 04:45 PM   #6
HarryT
eBook Enthusiast
Posts: 85,560
Karma: 93980341
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
That's why I suggested that you examine the coding of a Kindle book for which dictionary lookup does work, so you can see if there is any word delineation there.
03-02-2016, 05:28 PM   #7
Kest
Member
Posts: 10
Karma: 10
Join Date: Feb 2016
Device: Kindle PW3 (5.6.5 JB)
Yes, sorry, I forgot to say that. I did look at the HTML code and there isn't any delimiter between words. Just plain text.

That's at least true of the HTML file extracted with KindleUnpack, but I doubt that kind of thing would be lost in that step.
03-02-2016, 08:17 PM   #8
DiapDealer
Grand Sorcerer
Posts: 28,855
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by Kest View Post
Yes, sorry, I forgot to say that. I did look at the HTML code and there isn't any delimiter between words. Just plain text.

That's at least true of the HTML file extracted with KindleUnpack, but I doubt that kind of thing would be lost in that step.
Not lost, but quite possibly invisible. There are a host of zero-width characters that you'd never see by "looking at" the html code.
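To check for such characters without a hex editor, a short Python sketch can list every format character (Unicode category Cf, which includes U+200B ZERO WIDTH SPACE, U+200C, U+200D and U+FEFF) and every non-ASCII space separator (category Zs, such as U+3000) in an extracted HTML file. Treating these two categories as "possible hidden word separators" is an assumption for illustration, not a known Kindle mechanism:

```python
import sys
import unicodedata
from collections import Counter
from pathlib import Path


def find_invisible(text: str) -> Counter:
    """Count format characters (category Cf) and space separators (Zs)
    other than the ordinary ASCII space, labelled by code point and name."""
    hits: Counter = Counter()
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Cf" or (cat == "Zs" and ch != " "):
            hits[f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"] += 1
    return hits


if __name__ == "__main__":
    for name in sys.argv[1:]:
        found = find_invisible(Path(name).read_text(encoding="utf-8"))
        if not found:
            print(f"{name}: no invisible separators found")
        for label, n in found.most_common():
            print(f"{name}: {label} x{n}")
```

A single U+FEFF at the very start of a file is usually just a byte-order mark rather than a separator.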
03-02-2016, 09:16 PM   #9
Kest
Member
Posts: 10
Karma: 10
Join Date: Feb 2016
Device: Kindle PW3 (5.6.5 JB)
OK, I did an exhaustive search with a hex editor on the main HTML file extracted from a working book from Amazon. Checking some sentences against a UTF-8/hex table, I confirmed that there are indeed no hidden characters between words.

I doubt nobody has noticed this problem before, if it is indeed a problem, so I'd greatly appreciate hearing about anyone's past experience converting Japanese text for the Kindle.
03-02-2016, 10:14 PM   #10
DiapDealer
Grand Sorcerer
Posts: 28,855
Karma: 207000000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Have you tried using the DumpMobiHeader python script (included with the KindleUnpack archive) to compare the headers of a commercial (working) text and one you've converted yourself?
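Header dumps are long, so a plain text diff makes the comparison easier. A minimal sketch, assuming you have saved DumpMobiHeader's printed output for each book to a text file (for example by redirecting its output); the file names and the helper name are hypothetical:

```python
import difflib
import sys
from pathlib import Path


def diff_header_lines(working: list[str], mine: list[str],
                      working_name: str = "working",
                      mine_name: str = "mine") -> list[str]:
    """Return a unified diff of two header dumps given as lists of lines."""
    return list(difflib.unified_diff(working, mine,
                                     fromfile=working_name, tofile=mine_name,
                                     lineterm=""))


if __name__ == "__main__":
    # Usage: python diff_headers.py working_dump.txt my_dump.txt
    if len(sys.argv) == 3:
        a_name, b_name = sys.argv[1], sys.argv[2]
        a_lines = Path(a_name).read_text(encoding="utf-8", errors="replace").splitlines()
        b_lines = Path(b_name).read_text(encoding="utf-8", errors="replace").splitlines()
        print("\n".join(diff_header_lines(a_lines, b_lines, a_name, b_name)))
```

Expect plenty of noise from offsets, sizes and record counts; the lines worth a closer look are any that mention language settings or record types present in only one of the two books.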
03-03-2016, 10:13 AM   #11
Doitsu
Grand Sorcerer
Posts: 5,762
Karma: 24088559
Join Date: Dec 2010
Device: Kindle PW2
@Kest: IIRC, when Amazon introduced the Kindle in Japan they added XMDF support to KindleGen. Maybe all the books that work on your Kindle were converted from the XMDF format.
Try finding a free XMDF book on the Internet, unpacking the book folder and converting it with KindleGen, or create one from scratch following the instructions in Appendix D of the Kindle Publishing Guidelines.

It also couldn't hurt to test the working Japanese books and your home-made books with the Kindle for iOS/Android/PC apps to rule out issues specific to eInk Kindles. (I'm assuming that you've installed the latest firmware and no homebrew apps that might interfere with the rendering of Japanese characters.)

Also remove all stylesheets and inline styles from your home-made books before you compile them with KindleGen to find out if this makes any difference. If it does, post the relevant stylesheets.
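A rough Python sketch of that test, assuming the source files are well-formed XHTML. It uses regular expressions rather than a real parser, so treat it as a quick experiment on a copy of the files; the function name is hypothetical. Removing the CSS may also change the layout (for example, vertical writing set in the stylesheet may revert to horizontal), which is acceptable here because the only question is whether word selection changes:

```python
import re
import sys
from pathlib import Path

# <link ... rel="stylesheet" ...> elements, wherever rel appears in the tag.
LINK_CSS = re.compile(r'<link\b[^>]*rel=["\']stylesheet["\'][^>]*/?>\s*', re.IGNORECASE)
# Embedded <style>...</style> blocks, possibly spanning several lines.
STYLE_BLOCK = re.compile(r'<style\b[^>]*>.*?</style>\s*', re.IGNORECASE | re.DOTALL)
# Inline style="..." or style='...' attributes (not data-style etc.).
STYLE_ATTR = re.compile(r'\s+style\s*=\s*("[^"]*"|\'[^\']*\')', re.IGNORECASE)


def strip_styles(xhtml: str) -> str:
    """Remove linked stylesheets, <style> blocks and inline style attributes."""
    xhtml = LINK_CSS.sub("", xhtml)
    xhtml = STYLE_BLOCK.sub("", xhtml)
    return STYLE_ATTR.sub("", xhtml)


if __name__ == "__main__":
    # Rewrites each named file in place.
    for name in sys.argv[1:]:
        path = Path(name)
        path.write_text(strip_styles(path.read_text(encoding="utf-8")), encoding="utf-8")
```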
08-23-2016, 07:34 AM   #12
nidl
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2016
Device: Kindle Paperwhite
I have the same problem and still found no solution to it.

Surely the issue is with the book formatting, but I have no idea how to fix it or format it properly.

@Kest: did you find something else regarding this problem?
11-30-2022, 01:48 AM   #13
yoshinama
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2022
Device: Android
Sorry to revive this thread but I am facing the same issue and can't seem to find a solution. Has anyone managed to find a fix for this?

I can open a new thread if necroing is against the forum guidelines.
11-30-2022, 05:33 AM   #14
colinsky
Addict
Posts: 241
Karma: 3500000
Join Date: Sep 2009
Device: Sony PRS-300, PRS-T1, PRS-T3
Not sure if this will help you directly, but I have found that Japanese word-boundary segmentation for dictionary lookup works very differently depending on the book format (MOBI, AZW, KFX). You might want to try converting a book to all three and see whether any of them works the way you'd like it to.
11-30-2022, 07:34 PM   #15
jhowell
Grand Sorcerer
Posts: 7,155
Karma: 92500001
Join Date: Nov 2011
Location: Charlottesville, VA
Device: Kindles
I don’t think that MOBI supports Japanese.

KF8 (azw3) does but it relies on an included word boundary table (GESW records) that is generated during the publishing process. I do not think that there is any way to add that to a book that was not sold by Amazon.
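To see whether a given file carries such records, a Python sketch can walk the PalmDB record table (the 78-byte PalmDB header ends with a big-endian record count at offset 76, followed by 8-byte entries whose first 4 bytes are the record's file offset) and tally the first four bytes of each record. It assumes, without confirmation from this thread, that GESW records begin with the ASCII tag GESW, as records such as FLIS and FCIS begin with their own tags. Compressed text records occasionally start with printable bytes too, so expect some noise:

```python
import struct
import sys
from collections import Counter
from pathlib import Path

PDB_HEADER_LEN = 78       # fixed-size PalmDB header
RECORD_COUNT_OFFSET = 76  # big-endian uint16 inside that header
RECORD_ENTRY_LEN = 8      # 4-byte offset + 1-byte attributes + 3-byte unique ID


def record_tags(data: bytes) -> Counter:
    """Tally the first four bytes of each PalmDB record when they are visible ASCII."""
    (count,) = struct.unpack_from(">H", data, RECORD_COUNT_OFFSET)
    tags: Counter = Counter()
    for i in range(count):
        (offset,) = struct.unpack_from(">I", data, PDB_HEADER_LEN + RECORD_ENTRY_LEN * i)
        tag = data[offset:offset + 4]
        # Skip records that start with non-printable bytes (most compressed text).
        if len(tag) == 4 and all(0x20 < b < 0x7F for b in tag):
            tags[tag.decode("ascii")] += 1
    return tags


if __name__ == "__main__":
    # Usage: python record_tags.py book.azw3 [more files...]
    for name in sys.argv[1:]:
        tags = record_tags(Path(name).read_bytes())
        print(f"{name}: GESW records found: {tags.get('GESW', 0)}")
        print(f"  all tags: {dict(tags)}")
```

Running it on a purchased book and on a home-made conversion should show whether the tag appears only in the former; finding the table in a purchased book does not, of course, provide a way to generate one.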

I am not sure about KFX. It might work better in that format.

Update: Like KF8, KFX format also includes word boundary information, but only in published books.

Last edited by jhowell; 12-01-2022 at 09:38 AM. Reason: Missing ‘not’. Oops

All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.