MobileRead Forums - View Single Post - Telling a text-to-speech reader how to pronounce things?

Tex2002ans · 11-13-2021, 02:55 PM

Quote:

Originally Posted by Quoth

But that's exactly what people were telling me in late 1980s to late 1990s. I don't hear much evidence that it's much better than state of the art then.

Again, this is absurd. (And I think we had this conversation years ago.)

Listen to that Witcher 3 video above. It sounds near-exact to the actual actor. That isn't him speaking in the video, it's the TTS-trained-on-his-voice.

Compare to the actual voice actor in the game.

Side Note: Another cool thing about a TTS trained this narrowly is that obscure words/terms/pronunciations come out correct automatically as well.

Like Geralt's name is actually pronounced with a hard "G" sound + the accent is on the second syllable:

- geh-RALT
--- "ralt" like "salt"

Not like Gerald:

- JEH-ruld

Ciri (one of the characters' names) will be spoken like:

- See-ree

not like:

- Ky-ree

"Kaer Morhen" (a made-up place within the books)? Well, it'll be spoken just like in the game.

All the modder had to do was feed it the text, and the neural network took it from there.
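
With a generic TTS voice that wasn't trained on the games, you'd have to do that part by hand. Here's a rough sketch in Python of swapping in phonetic respellings before the text goes to the TTS. The `RESPELLINGS` table and `respell()` are made up for illustration (they aren't part of any TTS engine), and whether a given voice reads "geh-RALT" the way you want is something you'd have to test by ear:

```python
import re

# Hypothetical respelling table, built from the pronunciations listed above.
RESPELLINGS = {
    "Geralt": "geh-RALT",
    "Ciri": "See-ree",
}

def respell(text, table=RESPELLINGS):
    """Replace whole-word matches with their phonetic respelling."""
    if not table:
        return text
    # Longest keys first, so longer names win over shorter overlapping ones.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)

print(respell("Geralt and Ciri rode to Kaer Morhen."))
```

Names that aren't in the table (like "Kaer Morhen" here) pass through untouched, which is exactly the maintenance burden the trained voice avoids.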

Quote:

Originally Posted by Quoth

I need to figure out Linux and Android Text to Speech for my friend who is now almost blind with Macular Degeneration.

PocketBook Reader is what I use on Android to read EPUBs. You can just press the TTS button, and it'll speak using the built-in Android TTS.

On Android OS itself, you'd enable TalkBack... but that takes over the full functionality of the phone. If you want to see some of that, see the recent Techmoan video: "An app that sees for those who can’t", especially at 19:04 where he covers TalkBack (and its iOS equivalent).

Quote:

Originally Posted by Quoth

Recognition has been slower and gone backwards, needing always on Internet.

No. Google Text-to-Speech is all on-device. No internet needed.

What happens is you may need internet + "the cloud"... if you want much more accurate speech (like you've been bringing up). But that's only because the computing power needed is enormous (sucking up a cellphone's battery for example) + the amount of data needed is staggering.

For more technical information on that, see Computerphile's fantastic video: "GPT3: An Even Bigger Language Model".

For example, GPT-3 was trained on roughly 570 GBs of text:

"How Large Language Models Will Transform Science, Society, and AI" (Stanford University)

Can you fit that on your cellphone? Will you spend enough CPU power on your cellphone, and wait around minutes/hours, trying to generate that audio? (It sure as hell won't happen in real-time or at a speed you'd like.)

Or, you can use the Google Text-to-Speech built into Android, around 250 MBs, and get yourself 95% of the way there in real-time.
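
Back-of-the-envelope on those two numbers, as a quick Python check. It assumes decimal units (1 GB = 1000 MB) and just takes the 570 GB and ~250 MB figures quoted above at face value:

```python
corpus_gb = 570   # text figure quoted above for GPT-3
tts_mb = 250      # approximate size quoted above for the on-device TTS

# Convert GB to MB (decimal units) and compare.
ratio = corpus_gb * 1000 / tts_mb
print(f"{ratio:.0f}x")  # 2280x
```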

Quote:

Originally Posted by Quoth

Something using someone else's server isn't a solution. Google's AI is also dumb pattern matching using misappropriated information. They have no AI.

Please, just stop. You're beginning to embarrass yourself.

Check out some of those videos if you're interested. Perhaps take a look at the past 30 years of advancements in the field.

I even just showed you enormous strides taken within the past 5 years!

Quote:

Originally Posted by Quoth

Best practice is use the speech patterns and accent the author intends OR to use the local dialect? Which?
Narration is hard work and needs skill. A sample in my sig.

But the TTS is getting you "good enough".

It also won't make professionally-produced audiobooks go away, but it's tackling completely different use-cases (or many things that would never be economically viable to produce in the first place... like the rare journal articles).

TTS is a much larger category—and books are just a small subset. (Completely dwarfed by the sheer amount of non-book content like forum posts, emails, documents, etc.)