Your complaint is EXTREMELY COMMON with PRIMITIVE synthesis like IVONA. Yes, it's the SAME GIRL who voices Alexa. She sounds SO PRETTY, but there is BAD NEWS. Turns out Ivona is CLUELESS, as she never understood PROSODY. Probably because it is never MARKED EXPLICITLY in books the way I'm doing it RIGHT NOW.
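To show what "marked explicitly" could look like in practice, here is a rough Python sketch that turns ALL-CAPS words into SSML emphasis tags. SSML and its <emphasis> element are a real W3C standard, but the function name caps_to_ssml and the "two or more capitals means stress" rule are my own assumptions, and whether a given engine actually honors the tag is up to that engine.

```python
import re
from xml.sax.saxutils import escape

# Heuristic (an assumption, not a standard): a word made of two or more
# capital letters is treated as stressed. Acronyms like "TTS" get caught too.
CAPS_WORD = re.compile(r"\b[A-Z]{2,}\b")


def caps_to_ssml(text: str) -> str:
    """Wrap ALL-CAPS words in SSML <emphasis> tags and lowercase them."""
    # Escape &, <, > first so the input cannot break the markup we add.
    escaped = escape(text)
    marked = CAPS_WORD.sub(
        lambda m: f'<emphasis level="strong">{m.group(0).lower()}</emphasis>',
        escaped,
    )
    return f"<speak>{marked}</speak>"


if __name__ == "__main__":
    print(caps_to_ssml("She sounds SO PRETTY, but there is BAD NEWS."))
```

Ordinary book text has no such markers, so the engine would have to guess the stress from the sentence itself, which is exactly the part the primitive voices get wrong.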
There are far more clever TTS systems; recent ones include Polly and Tacotron. Tacotron is open source, including (reasonably useful) pretrained models; I think Polly is cloud-only or something. TTS voices on PocketBook are pluggable, that is, each installed voice provides its own libttsengine.so exposing the ABI declared in
https://github.com/blchinezu/pocketb...de/ttsengine.h
The reader then calls into that library when reading with that installed voice. If you were to go and implement a state-of-the-art TTS like Tacotron, you'd need to write this wrapper library to glue it together. Currently, the WaveNet synthesizer is research grade (you need to run the whole of TensorFlow to evaluate the model), so I'm not sure a PB would have enough horsepower to run it.