|  03-01-2016, 03:04 PM | #1 | 
| Member  Posts: 10 Karma: 10 Join Date: Feb 2016 Device: Kindle PW3 (5.6.5 JB) | 
				
				Word boundaries on Japanese text
			 
			
			Every time I convert a Japanese EPUB to AZW3 and try to select a word in the book, the Kindle can't detect the word boundaries and instead selects the whole sentence (everything between periods or commas). This only happens with books I convert myself; the Japanese books downloaded from Amazon get the word boundaries right. I've tried converting with Calibre, with KindleGen, and even round-tripping a working Amazon book through AZW3->EPUB->AZW3 with KindleUnpack. I'd like to ask whether anyone has experienced the same issue, or whether anyone has got a converted Japanese book working on the Kindle without this problem. The books request the proper Japanese dictionary and display the text vertically. The book language is also set to Japanese (I checked in Calibre and with MobiMetaEditor), so I'm clueless as to whether this is an old issue or I'm doing something wrong. | 
|   |   | 
|  03-02-2016, 03:14 AM | #2 | 
| eBook Enthusiast            Posts: 85,560 Karma: 93980341 Join Date: Nov 2006 Location: UK Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6 | 
			
			Questions about ebook creation belong in the "Kindle formats" forum, to where I'm now moving this thread.
		 | 
|   |   | 
|  03-02-2016, 03:16 AM | #3 | 
| eBook Enthusiast            Posts: 85,560 Karma: 93980341 Join Date: Nov 2006 Location: UK Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6 | 
			
			Have you compared the coding of a book that does work correctly with one that doesn't, to see what the differences are? Perhaps the Kindle requires some sort of non-printing word separator character in order to delineate words for dictionary lookup?
		 Last edited by HarryT; 03-02-2016 at 04:12 AM. | 
|   |   | 
|  03-02-2016, 10:10 AM | #4 | |
| Sigil Developer            Posts: 9,070 Karma: 6361556 Join Date: Nov 2009 Device: many | 
			
			Hi, My guess is that the Japanese language uses a different whitespace character than English does. I found this in a Google search: Quote: 
 Code: &#x3000; Hope this helps, KevinH | |
|   |   | 
|  03-02-2016, 04:31 PM | #5 | 
| Member  Posts: 10 Karma: 10 Join Date: Feb 2016 Device: Kindle PW3 (5.6.5 JB) | 
			
			Japanese words are usually not separated by spaces or anything else. There is a 'full-width space' for the cases where a space is needed, but it is not used often, especially in books. I'm not really sure how the Kindle detects the boundaries between Japanese words. I thought the Kindle itself, with some help from the dictionaries, was in charge of that, but I'm not sure anymore. | 
|   |   | 
|  03-02-2016, 04:45 PM | #6 | 
| eBook Enthusiast            Posts: 85,560 Karma: 93980341 Join Date: Nov 2006 Location: UK Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6 | 
			
			That's why I suggested that you examine the coding of a Kindle book for which dictionary lookup does work, so you can see if there is any word delineation there.
		 | 
|   |   | 
|  03-02-2016, 05:28 PM | #7 | 
| Member  Posts: 10 Karma: 10 Join Date: Feb 2016 Device: Kindle PW3 (5.6.5 JB) | 
			
			Yes, sorry, I forgot to say that. I did look at the HTML code and there isn't any delimiter between words, just plain text. That is at least true of the HTML file extracted with KindleUnpack, but I doubt that kind of thing would be lost in that step. | 
|   |   | 
|  03-02-2016, 08:17 PM | #8 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Not lost, but quite possibly invisible. There are a host of zero-width characters that you'd never see by "looking at" the html code.
		 | 
|   |   | 
|  03-02-2016, 09:16 PM | #9 | 
| Member  Posts: 10 Karma: 10 Join Date: Feb 2016 Device: Kindle PW3 (5.6.5 JB) | 
			
			OK, I did an exhaustive search with a hex editor on the main HTML file extracted from a working Amazon book. Checking some sentences against a UTF-8/hex table, I confirmed that there are indeed no hidden characters between words. If this really is a problem, I doubt that nobody has noticed it before, so I'd greatly appreciate hearing about any past experience converting Japanese text for the Kindle. | 
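A manual hex-editor search like the one described above is easy to get wrong on a long file, so here is a minimal sketch of the same check in Python. It flags every character in Unicode category Cf (format characters, which includes the zero-width space and the zero-width joiners) plus the ideographic space U+3000 suggested earlier in the thread. The function name `find_invisible` is made up for illustration, and the choice of which characters to flag is an assumption, not a list of anything the Kindle is known to look for.

```python
import unicodedata

# Characters outside category Cf that are still worth flagging.
# Assumption: U+3000 is listed only because it was suggested in this thread.
EXTRA = {"\u3000"}


def find_invisible(text):
    """Return (index, code point, name) for each invisible or format character."""
    hits = []
    for index, char in enumerate(text):
        if unicodedata.category(char) == "Cf" or char in EXTRA:
            name = unicodedata.name(char, "UNKNOWN")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits
```

To use it on the HTML extracted by KindleUnpack, read the file as UTF-8 text and pass the contents to `find_invisible`. An empty result would agree with the hex-editor finding that the working book contains no hidden separators.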
|   |   | 
|  03-02-2016, 10:14 PM | #10 | 
| Grand Sorcerer            Posts: 28,855 Karma: 207000000 Join Date: Jan 2010 Device: Nexus 7, Kindle Fire HD | 
			
			Have you tried using the DumpMobiHeader python script (included with the KindleUnpack archive) to compare the headers of a commercial (working) text and one you've converted yourself?
		 | 
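If you save the header dump of each book to a text file, a short script can show just the lines that differ. This is only a sketch: the function name `diff_dumps` is hypothetical, it makes no assumption about what the dump actually contains, and it simply compares two blocks of text line by line with the standard `difflib` module.

```python
import difflib


def diff_dumps(working, converted):
    """Return only the removed (-) and added (+) lines between two text dumps."""
    diff = difflib.unified_diff(
        working.splitlines(),
        converted.splitlines(),
        "working",
        "converted",
        lineterm="",
        n=0,  # no context lines, only the differences
    )
    # Skip the two file-header lines, then drop the "@@" hunk markers.
    body = list(diff)[2:]
    return [line for line in body if not line.startswith("@@")]
```

Lines prefixed with `-` appear only in the working book's dump and lines prefixed with `+` appear only in the converted book's dump.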
|   |   | 
|  03-03-2016, 10:13 AM | #11 | 
| Grand Sorcerer            Posts: 5,762 Karma: 24088559 Join Date: Dec 2010 Device: Kindle PW2 | 
			
			@Kest: IIRC, when Amazon introduced the Kindle in Japan they added XMDF support to KindleGen. Maybe all the books that work on your Kindle were converted from the XMDF format. Try finding a free XMDF book on the Internet, unpack the book folder and convert it with KindleGen, or create one from scratch following the instructions in Appendix D of the Kindle Publishing Guidelines. It also couldn't hurt to test the working Japanese books and your home-made books with the Kindle for iOS/Android/PC apps to exclude eInk Kindle specific issues. (I'm assuming that you've installed the latest firmware and no homebrew apps that might interfere with the rendering of Japanese characters.) Also remove all stylesheets and inline styles from your home-made books before you compile them with KindleGen to find out if this makes any difference. If it does, post the relevant stylesheets. | 
|   |   | 
|  08-23-2016, 07:34 AM | #12 | 
| Junior Member  Posts: 1 Karma: 10 Join Date: Aug 2016 Device: Kindle Paperwhite | 
			
			I have the same problem and still haven't found a solution. The issue is surely with the book formatting, but I have no idea how to fix it or format the book properly. @Kest: did you find out anything else about this problem? | 
|   |   | 
|  11-30-2022, 01:48 AM | #13 | 
| Junior Member  Posts: 1 Karma: 10 Join Date: Nov 2022 Device: Android | 
			
			Sorry to revive this thread but I am facing the same issue and can't seem to find a solution. Has anyone managed to find a fix for this?  I can open a new thread if necroing is against the forum guidelines. | 
|   |   | 
|  11-30-2022, 05:33 AM | #14 | 
| Addict            Posts: 241 Karma: 3500000 Join Date: Sep 2009 Device: Sony PRS-300, PRS-T1, PRS-T3 | 
			
			Not sure if this will help you directly, but I have found Japanese word-boundary segmentation for dictionary lookup to work very differently depending on the book format (MOBI, AZW, KFX).  You might want to try converting a book to all three and see if any of them work in the way you'd like them to.
		 | 
|   |   | 
|  11-30-2022, 07:34 PM | #15 | 
| Grand Sorcerer            Posts: 7,155 Karma: 92500001 Join Date: Nov 2011 Location: Charlottesville, VA Device: Kindles | 
			
			I don’t think that MOBI supports Japanese. KF8 (azw3) does but it relies on an included word boundary table (GESW records) that is generated during the publishing process. I do not think that there is any way to add that to a book that was not sold by Amazon. I am not sure about KFX. It might work better in that format. Update: Like KF8, KFX format also includes word boundary information, but only in published books. Last edited by jhowell; 12-01-2022 at 09:38 AM. Reason: Missing ‘not’. Oops | 
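For anyone who wants to see whether a given file carries such records, here is a rough sketch that walks the PalmDB record table of a MOBI/AZW3 container and collects the first four bytes of each record. The PalmDB layout used here (record count at offset 76, eight-byte record entries starting at offset 78) is the standard one. The assumption that a word boundary record begins with the ASCII bytes `GESW` is only inferred from the name used above and has not been verified. The function names are made up for illustration. This can only detect such records; it does not help generate them.

```python
import struct


def record_magics(data):
    """Return the first four bytes of every PalmDB record in `data`."""
    # The 16-bit big-endian record count sits at offset 76 of the header.
    (count,) = struct.unpack_from(">H", data, 76)
    # Each 8-byte record entry starts with a 32-bit big-endian file offset.
    offsets = [struct.unpack_from(">L", data, 78 + 8 * i)[0] for i in range(count)]
    offsets.append(len(data))  # the last record runs to the end of the file
    magics = []
    for i in range(count):
        start, end = offsets[i], offsets[i + 1]
        magics.append(data[start:min(start + 4, end)])
    return magics


def has_gesw(data):
    """Assumption: a word boundary record starts with the ASCII bytes 'GESW'."""
    return b"GESW" in record_magics(data)
```

Read the book file in binary mode and pass the bytes to `has_gesw`. Comparing the result for an Amazon book against a home-converted one would show whether the difference described above is visible at the record level.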
|   |   | 