Quote:
Originally Posted by salamanderjuice
Code:
description = "Laura clamly reads a book to her child. The recording is of very high quality, with the speaker's voice sounding clear and very close up."
The description is specifying a specific voice and not much else.
|
How do you clamly read?
Quote:
Originally Posted by salamanderjuice
I've heard much worse from real people.
|
And so have I. OTOH, professional voice actors definitely do a better job than AI.
One author I know looked into using AI for audiobooks because it costs less than hiring professionals. What they found was that getting a decent result from AI pretty much required rewriting the book to add the pitch, tone, pace, etc. indicators that a voice actor does not need. Then there was the issue of getting invented words pronounced correctly. The human narrators were simply told the pronunciation. For AI, the entire book was converted to IPA. Most of that work was automated, and about 5% of the words needed to be converted manually (basically, look for words with * around them, which marked them as not found in the dictionary). It turned out to be a small group of words, mostly character and place names, so the manual work wasn't that bad.
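A rough sketch of what that automated pass might look like, in Python. This is not the tool the author used; the tiny `IPA_DICT`, the `to_ipa` function, and the made-up name "Zarnok" are all hypothetical and only illustrate the idea of converting known words and wrapping unknown ones in asterisks for manual follow-up.

```python
import re

# Hypothetical, tiny pronunciation dictionary (assumption for illustration);
# a real conversion would use a full pronouncing dictionary.
IPA_DICT = {
    "the": "ðə",
    "current": "ˈkʌrᵊnt",
    "state": "steɪt",
    "of": "ɒv",
}

WORD_RE = re.compile(r"[A-Za-z']+")


def to_ipa(text, lexicon=IPA_DICT):
    """Replace known words with IPA and wrap unknown words in asterisks.

    Returns the converted text and a sorted list of the words that
    were not found, so they can be converted by hand.
    """
    missing = set()

    def convert(match):
        word = match.group(0)
        ipa = lexicon.get(word.lower())
        if ipa is None:
            missing.add(word)
            return f"*{word}*"  # flag for manual conversion
        return ipa

    return WORD_RE.sub(convert, text), sorted(missing)


converted, missing = to_ipa("The current state of Zarnok.")
print(converted)
print(missing)
```

Searching the output for `*` then gives the list of words that still need a hand-written pronunciation.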
They also looked at using SSML and pronunciation lexicons, but that was a lot more work: it required multiple spans, and you need to add the IPA bits to the header. It did, however, give an ebook that could still be read in English.
<p><span ssml:alphabet="ipa" ssml:ph="ðə ˈkʌrᵊnt steɪt ɒv ði ɑːt ɪz nɒt ʌp tuː ðæt ˈlɛvᵊl ɒv səˌfɪstɪˈkeɪʃᵊn.">The current state of the art is not up to that level of sophistication.</span></p>
A sample paragraph using SSML and IPA.
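For the "multiple spans" part, here is a minimal Python sketch of how only the invented words could be wrapped while the rest of the text stays plain English. The helper names `ssml_span` and `mark_up`, the name "Zarnok", and its IPA are made up for illustration. As far as I know, the `ssml` prefix also has to be declared on the root element of the XHTML file (`xmlns:ssml="http://www.w3.org/2001/10/synthesis"`), which this sketch does not handle.

```python
import re
from xml.sax.saxutils import escape, quoteattr

WORD_RE = re.compile(r"[A-Za-z']+")


def ssml_span(text, ipa):
    """Wrap text in a span carrying its IPA pronunciation."""
    # quoteattr() adds the surrounding quotes and escapes the value.
    return f'<span ssml:alphabet="ipa" ssml:ph={quoteattr(ipa)}>{escape(text)}</span>'


def mark_up(sentence, lexicon):
    """Return a <p> where only words found in lexicon get an SSML span."""
    parts = []
    last = 0
    for match in WORD_RE.finditer(sentence):
        # Copy the text between words (spaces, punctuation) unchanged.
        parts.append(escape(sentence[last:match.start()]))
        word = match.group(0)
        ipa = lexicon.get(word)
        parts.append(ssml_span(word, ipa) if ipa else escape(word))
        last = match.end()
    parts.append(escape(sentence[last:]))
    return "<p>" + "".join(parts) + "</p>"


# Hypothetical invented name and pronunciation (assumption for illustration).
print(mark_up("Zarnok fell.", {"Zarnok": "ˈzɑːnɒk"}))
```

Wrapping only the handful of character and place names keeps the number of spans manageable compared with marking up every paragraph.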