@ezdiy Listening to the audio samples and especially the failures of Tacotron2, I do realize that there is rather a lot of room for improvement!
It seems NVIDIA published a Tacotron2 version without WaveNet. Would it be possible to couple it to a less computationally intensive synthesizer? Their WaveGlow synthesizer apparently relies on dedicated tensor cores, so it seems unfeasible to try to implement that on the PocketBook. Another possibility could be Mamah's implementation.
What about Polly? It seems Amazon asks for a subscription fee to use that. Are there ways around that? Or alternatives? How about using your own NAS as a server for the sound processing?
@Tarana Good to hear that the learning curve is reasonably gentle. Too bad fantasy is harder. I already have trouble understanding names in real life (no context, just an unintelligible sound), so I'll probably never understand the TTS system.