MobileRead Forums - View Single Post - Understandability Text-to-speech

Markismus · 12-01-2019, 10:23 AM

@ezdiy Listening to the audio samples and especially the failures of Tacotron2, I do realize that there is rather a lot of room for improvement!

It seems NVIDIA published a Tacotron2 version without WaveNet. Would it be possible to couple it to a less computationally intensive synthesizer? They apparently have tensor cores dedicated to their WaveGlow synthesizer, so it seems infeasible to try to implement that on the PocketBook. Another possibility could be Mamah's implementation.

What about Polly? It seems Amazon charges a fee to use that. Are there ways around that? Or alternatives? How about using your own NAS as a server for the sound processing?
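To make the NAS idea a bit more concrete, here is a rough sketch of what the server side could look like, using only the Python standard library. Everything in it is an assumption for illustration: the names `synthesize_placeholder`, `TTSHandler`, and `serve`, the port 8080, and the 16 kHz sample rate are all made up, and the "synthesizer" just returns silence as a stand-in for whatever real engine (Tacotron2 or otherwise) would run on the NAS. The reader would POST plain text and get a WAV file back.

```python
import io
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer

SAMPLE_RATE = 16000     # assumed sample rate, not taken from the thread
FRAMES_PER_CHAR = 800   # 50 ms of audio per character at 16 kHz (arbitrary)


def synthesize_placeholder(text: str) -> bytes:
    """Stand-in for a real TTS engine.

    Returns a silent 16-bit mono WAV whose length grows with the text,
    so the request/response plumbing can be tried without a model.
    """
    n_frames = FRAMES_PER_CHAR * len(text)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


class TTSHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: POST body is UTF-8 text, response is WAV audio."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        audio = synthesize_placeholder(text)
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Run the sketch server until interrupted (port is an assumption)."""
    HTTPServer((host, port), TTSHandler).serve_forever()
```

Calling `serve()` on the NAS would start it. Swapping `synthesize_placeholder` for a real engine is the hard part, and the reader would still need network access and a way to play the returned WAV.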

@Tarana Good to hear that there is a reasonably small learning curve. Too bad fantasy is harder. I already have trouble understanding names in real life (no context, just an unintelligible sound), so I'll probably never understand them from the TTS system.

#5 · Last edited by Markismus; 12-01-2019 at 03:23 PM.