Thread: Audiobooks & AI
05-02-2025, 04:41 PM   #26
salamanderjuice
Guru
Posts: 932
Karma: 13014268
Join Date: Jul 2017
Device: Boox Nova 2
Quote:
Originally Posted by Quoth
There are none that can read text properly without extra contextual commands put by humans. It's all a scam (AI). It just sounds better in demos. I was invited and used Google's bleeding edge to a book. I did just one chapter and while it sounded nice, it was a failure. You could script intonation, emotion, loudness and phonetic hints in the 1980s.

It's even worse if it's not a USA English text. In other news, AI powered self-service supermarket checkouts train humans to steal, but since it's cheaper than the Irish minimum wage including losses they don't care.
Did you listen to the example I linked? I didn't script any specific intonation, emotion, loudness or phonetic hints.

This is the entirety of what I fed Parler-TTS to make that snippet:

Code:
prompt = "A long line of light still lingered behind the purple hills upon which there stood out a few houses silhouetted against the primrose yellow of the sky. Within the nearer confines of the lawn the leafless trees looked dim and shadowy. The whirl of eddying leaves against the wall sounded ghostly, and the rattling of the vines against the side of the house suggested wintriness."

description = "Laura clamly reads a book to her child. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
The description just picks a voice and asks for a clean, close-up recording, and not much else.

I've heard much worse from real people.
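
For reference, here is a minimal sketch of how two strings like these are typically passed to Parler-TTS, following the usage pattern in the project's README. The checkpoint name `parler-tts/parler-tts-mini-v1`, the output filename and the `synthesize` helper are assumptions for illustration, not a record of the exact setup behind the linked snippet.

```python
def synthesize(prompt: str,
               description: str,
               out_path: str = "parler_tts_out.wav",
               model_name: str = "parler-tts/parler-tts-mini-v1") -> str:
    """Render `prompt` as speech in the style given by `description`.

    Hypothetical helper: the checkpoint and output path are assumed defaults.
    """
    # Heavy dependencies are imported here so that merely defining the
    # function does not require them (or a model download).
    import torch
    import soundfile as sf
    from parler_tts import ParlerTTSForConditionalGeneration
    from transformers import AutoTokenizer

    device = "cuda:0" if torch.cuda.is_available() else "cpu"

    model = ParlerTTSForConditionalGeneration.from_pretrained(model_name).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # The description conditions the voice and recording style;
    # the prompt is the text that actually gets spoken.
    input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
    prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    generation = model.generate(input_ids=input_ids,
                                prompt_input_ids=prompt_input_ids)
    audio = generation.cpu().numpy().squeeze()

    # Write a WAV file at the sampling rate the model reports.
    sf.write(out_path, audio, model.config.sampling_rate)
    return out_path
```

Calling `synthesize(prompt, description)` with the two strings above would write a WAV file. Nothing beyond those two strings goes in: no markup for intonation, emotion, loudness or phonetics.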