MobileRead Forums - View Single Post

kovidgoyal · 08-06-2022, 11:24 PM

There's no way to bypass it. Tokenization of text into words is done at indexing time, and once it's done, it's done. calibre uses the ICU library for this tokenization, and ICU applies language-sensitive rules for a number of languages, including European ones.
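Why it can't be bypassed can be sketched with a toy inverted index: the text is split into words once, when the index is built, and a search can only match the whole tokens that were stored. This is a minimal sketch under loud assumptions. It is not calibre's code, and the helper names `tokenize`, `build_index` and `search` are hypothetical. It also does not use ICU: a plain Unicode `\w+` regex is swapped in as a stand-in word breaker, so it has none of ICU's language-sensitive rules.

```python
import re
from collections import defaultdict

# Stand-in word breaker (assumption): a plain Unicode \w+ regex.
# ICU's word breaking is language sensitive; this regex is not.
WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Split text into lowercase word tokens (hypothetical stand-in for ICU)."""
    return [m.group(0).lower() for m in WORD_RE.finditer(text)]


def build_index(docs):
    """Indexing time: tokenize each document once and map token -> doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Query time: only whole tokens already stored in the index can match."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {1: "The bookshelf was full of e-books.", 2: "A shelf of books."}
index = build_index(docs)
print(sorted(search(index, "shelf")))  # [2]
print(sorted(search(index, "books")))  # [1, 2]
```

Here "shelf" does not find document 1, because "bookshelf" was stored as a single token when the index was built. Changing how queries are split cannot fix that; only re-indexing with different tokenization rules would.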
