Old 08-01-2025, 09:03 AM   #56
kovidgoyal
creator of calibre
Posts: 45,467
Karma: 27757440
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by arjaybe View Post
Isn't text-to-speech these days using AI to read?
As somebody who has implemented a TTS system in the calibre reader: no, it isn't. What happens is the text is converted to phonemes based on the language the text is in. Training data is thus a sequence of phonemes along with the waveform they generate. When you send text to the model to convert to speech, that text also gets converted to phonemes before being fed to the model.
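
To make the text-to-phoneme step concrete, here is a minimal sketch in Python. It is not calibre's code or any real TTS library: the tiny lexicon, the ARPAbet-style symbols, and the function names `text_to_phonemes` and `phonemes_to_ids` are all made up for illustration. A real system would use a full per-language pronunciation lexicon or a phonemizer.

```python
# Hypothetical toy lexicon with ARPAbet-style symbols (illustration only).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Phoneme vocabulary: id 0 is reserved for unknown words.
VOCAB = {p: i for i, p in enumerate(
    ["<unk>"] + sorted({p for ps in LEXICON.values() for p in ps}))}


def text_to_phonemes(text):
    """Convert text to a flat list of phonemes; unknown words become <unk>."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?;:")
        phonemes.extend(LEXICON.get(word, ["<unk>"]))
    return phonemes


def phonemes_to_ids(phonemes, vocab=VOCAB):
    """Map phoneme symbols to the integer ids a model would consume."""
    return [vocab[p] for p in phonemes]


# The same conversion applies to training text and to text sent for synthesis.
phonemes = text_to_phonemes("Hello, world!")
ids = phonemes_to_ids(phonemes)
print(phonemes)
print(ids)
```

In this sketch the model never sees the letters of the text, only the phoneme ids, which is the point being made above.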

As an aside, these TTS models run on exactly the same architecture as LLMs. Indeed, LLMs don't care whether they are being fed text or phonemes or pixel data or whatever; it's all just treated as sequences of bytes.
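
As a rough illustration of that last point, the sketch below flattens three different kinds of input into plain lists of integers. The helper names are hypothetical, and real models typically apply a tokenizer or embedding step on top, so treat this only as a picture of the idea that the input ends up as one sequence of numbers.

```python
def text_to_sequence(text):
    """UTF-8 bytes of the text as a list of integers (0-255)."""
    return list(text.encode("utf-8"))


def symbols_to_sequence(symbols):
    """Map arbitrary symbols (e.g. phonemes) to ids via a sorted vocabulary."""
    vocab = {s: i for i, s in enumerate(sorted(set(symbols)))}
    return [vocab[s] for s in symbols]


def pixels_to_sequence(rows):
    """Flatten a 2D grid of grayscale pixel values row by row."""
    return [value for row in rows for value in row]


# Three very different inputs, all reduced to a flat list of integers.
print(text_to_sequence("hi"))
print(symbols_to_sequence(["HH", "AY"]))
print(pixels_to_sequence([[0, 255], [128, 64]]))
```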