MobileRead Forums - View Single Post - [GUI Plugin] TTS to MP3: Create MP3 audiobook using Windows TTS

jackie_w · 10-23-2020, 02:43 PM

Quote:

Originally Posted by kovidgoyal

You don't actually need events, just ignore that.

call

speak('some text')

to speak it out loud.

WAV and MP3 creation is now working OK.

I do still have 3 (hopefully) small problems, though...

First, when testing a voice I can start a speak() event but I can't stop it mid-flow. I think this may be because the original speak method took 2 args, i.e. say('sometext', flag), where flag was:
- constants.SVSFDefault - when wanting to speak the whole text, i.e. create_recording_wav()
- constants.SVSFlagsAsync - for ad-hoc speech which was interruptible by pause(), i.e. the voice tester.
I see your winsapi.cpp contains 2 items, SPF_DEFAULT and SPF_ASYNC, which may be the 2 equivalent constants (but I could be wrong). The latter doesn't appear to be used anywhere.

Perhaps the flag in speak() could default to using flag=SPF_ASYNC and create_recording_wav() is OK as-is using SPF_DEFAULT.
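
A rough sketch of that idea follows. It uses a stand-in class rather than the real voice object, so SketchVoice, the flags parameter name and the create_recording_wav(text, path) signature are all assumptions made for illustration, as are the numeric values given to the two constants.

```python
# Hypothetical stand-in for the voice object. The class name, the parameter
# names and the constant values are assumptions, not calibre's winsapi API.
SPF_DEFAULT = 0  # synchronous: speak() returns only when the text is finished
SPF_ASYNC = 1    # asynchronous: speak() returns at once, so pause() can interrupt


class SketchVoice:
    """Records calls instead of producing audio, to show the flag plumbing."""

    def __init__(self):
        self.calls = []

    def speak(self, text, flags=SPF_ASYNC):
        # Ad-hoc speech (the voice tester) gets the interruptible default.
        self.calls.append((text, flags))

    def create_recording_wav(self, text, path):
        # Recording needs the whole text spoken, so it stays synchronous.
        self.calls.append((text, SPF_DEFAULT))
        return path


voice = SketchVoice()
voice.speak('some text')
voice.create_recording_wav('whole book text', 'book.wav')
```

With a default like this, existing callers of speak('some text') would not need to change, and only the recording path would pass the synchronous flag.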
Second, when the voice tester starts a speak() event and it is allowed to run to completion (i.e. the stop/pause button is not pressed), I need to be able to detect programmatically when the speech stream ends, so that various widget states can be updated in the same way as the existing plugin does.

I think the old event stuff I mentioned was handling this. How do I go about reproducing this with the new ISpVoice?
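
If no end-of-stream event is available, one possible workaround is to poll. The sketch below assumes some way of asking "is the voice still speaking?" exists. That query is passed in as a plain callable (is_speaking) because whether the new ISpVoice wrapper exposes one is not known here. The function name watch_for_speech_end and the simulated speech are hypothetical.

```python
import threading
import time


def watch_for_speech_end(is_speaking, on_done, interval=0.05):
    """Poll is_speaking() on a background thread; call on_done() once it returns False."""
    def poll():
        while is_speaking():
            time.sleep(interval)
        on_done()

    watcher = threading.Thread(target=poll, daemon=True)
    watcher.start()
    return watcher


# Simulated speech that "finishes" about 0.2 seconds from now.
end_time = time.monotonic() + 0.2
finished = threading.Event()

watcher = watch_for_speech_end(lambda: time.monotonic() < end_time, finished.set)
watcher.join(timeout=2)
```

In a GUI plugin, on_done would run on the watcher thread, so widget updates would need to be handed back to the GUI thread (for example via a signal) rather than made directly from the callback.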
Third, hopefully this may not be a problem at all, but I can't test it with the current 64-bit beta.

This plugin has had a problem since initial release when running on calibre 32-bit. I think it was a problem with the underlying win32xxx module and I did add some code so it failed gracefully. It occurred when trying to get a list of all voices.

Are you able to tell me what happens when the ISpVoice.get_all_voices() method is run in calibre 32-bit? I'd be happy to do the testing myself but I'd need a 32-bit beta which I'm aware is even more work for you.

What I'd like to happen, if possible, is this: if the underlying Windows code is still broken, then rather than returning an error or an empty list, get_all_voices() returns a list containing 1 dictionary (with the usual keys) for the system's default voice.
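
The same fallback could also be done on the plugin side. The sketch below is a minimal version of that, assuming the default voice can be obtained separately. Both lookups are passed in as callables, voices_with_fallback is a hypothetical helper name, and the 'name' key is only a placeholder for "the usual keys".

```python
def voices_with_fallback(get_all_voices, get_default_voice):
    """Return the full voice list, or a one-item list holding the default voice."""
    try:
        voices = list(get_all_voices())
    except Exception:
        # Enumeration failed (e.g. the 32-bit problem), so fall through to the default.
        voices = []
    return voices or [get_default_voice()]


def broken_enumeration():
    # Stand-in for a voice enumeration call that fails.
    raise OSError('voice enumeration failed')


default = {'name': 'Default voice'}  # placeholder key, not the real dictionary layout
result = voices_with_fallback(broken_enumeration, lambda: default)
```

Here result is a list holding just the default voice's dictionary, so code that expects a non-empty list of voices can carry on unchanged.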