Linux users could use something like this:
https://github.com/mozilla/TTS
Ideally you'd run a Docker container with a pre-trained model already loaded, and then have the calibre plugin talk to that.
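For what it's worth, here is a rough sketch of how the plugin side could talk to such a container over HTTP. It assumes the container exposes a local server on port 5002 with a `GET /api/tts?text=...` endpoint that returns WAV audio (this is how I remember the Mozilla TTS demo server working, so check it against whichever image you actually run). The function names `build_tts_url` and `synthesize` are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the TTS container listens here; adjust to your setup.
DEFAULT_BASE_URL = "http://localhost:5002"


def build_tts_url(text, base_url=DEFAULT_BASE_URL):
    """Build the request URL for the (assumed) /api/tts endpoint."""
    return f"{base_url}/api/tts?{urlencode({'text': text})}"


def synthesize(text, out_path, base_url=DEFAULT_BASE_URL):
    """Send text to the TTS server and write the returned audio to out_path."""
    with urlopen(build_tts_url(text, base_url), timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)


# Example call (needs the container to be running):
# synthesize("Chapter one.", "chapter1.wav")
```

A plugin would probably want to split the book into sentences or paragraphs and call `synthesize` per chunk, since long inputs tend to be slow or get rejected.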
Google's TTS is rather expensive at $16 per 1M characters, though the first 1M characters each month are free. Perhaps an implementation for that would be worthwhile as well.
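To get a feel for what that pricing means for a book, here is a quick back-of-the-envelope calculation. It uses only the figures quoted above ($16 per 1M characters, first 1M per month free) and assumes billing is linear per character, so treat it as an estimate rather than Google's actual billing logic. The function name is hypothetical.

```python
# Figures taken from the comment above; check current pricing before relying on them.
FREE_CHARS_PER_MONTH = 1_000_000
USD_PER_MILLION_CHARS = 16.0


def estimate_monthly_cost(chars):
    """Estimate the monthly cost in USD for a given number of characters."""
    billable = max(0, chars - FREE_CHARS_PER_MONTH)  # free tier comes off first
    return billable * USD_PER_MILLION_CHARS / 1_000_000


print(estimate_monthly_cost(500_000))    # 0.0, still inside the free tier
print(estimate_monthly_cost(3_000_000))  # 32.0, i.e. 2M billable characters
```

So a single novel-length book would likely fit in or near the free tier, but converting a whole library in one month would add up quickly.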