MobileRead Forums

GregS · 12-06-2009, 11:39 PM

I have now had a chance to look more closely at voice synthesis as a whole, and what I have found is profoundly disappointing.

There is not the consistency between voices that would allow them to be used reliably by different engines. Proprietary engines predominate, and basic standards appear to be absent.

Even for marking up texts, SSML seems inadequate, and TEI Speak, though better, is unsupported.

I had already given up the idea of having a single text marked up for both reading and performance, in favour of an EPUB containing two versions. But I cannot see SSML providing enough tools for scoring a really good performance (I may be mistaken).
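For reference, the performance controls that SSML 1.1 does provide are mainly `<prosody>` (rate, pitch, volume), `<emphasis>` and `<break>`. The sketch below, using only Python's standard library, builds a small SSML document from a list of (text, rate, pause, emphasis) tuples. The `score_passage` helper and the tuple layout are hypothetical, for illustration only; whether this vocabulary is rich enough for a very good performance is exactly the open question.

```python
import xml.etree.ElementTree as ET

def score_passage(phrases, lang="en-AU"):
    """Build an SSML 1.1 <speak> document from (text, rate, pause_ms, emphasis) tuples.

    rate is an SSML prosody rate such as "slow" or "fast"; emphasis is an SSML
    level such as "strong", or None; pause_ms > 0 adds a <break> after the phrase.
    """
    speak = ET.Element("speak", {
        "version": "1.1",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    for text, rate, pause_ms, emphasis in phrases:
        # Every phrase gets its own speaking rate.
        prosody = ET.SubElement(speak, "prosody", {"rate": rate})
        if emphasis:
            ET.SubElement(prosody, "emphasis", {"level": emphasis}).text = text
        else:
            prosody.text = text
        if pause_ms:
            # An explicit pause after the phrase, e.g. time="600ms".
            ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")

print(score_passage([
    ("It was a dark and stormy night.", "slow", 600, None),
    ("Suddenly,", "medium", 0, "strong"),
    ("the door flew open.", "fast", 0, None),
]))
```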

MP3 seems simply too bulky, and in any case the problem lies not in the audio format but in getting a good digital voice performance in the first place.
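A rough calculation gives a sense of the scale. Assuming a constant bitrate of 64 kbps for spoken word and a 10-hour reading (both hypothetical figures), the audio comes to about 288 MB. A 100,000-word book as plain text, at an assumed average of 6 bytes per word, is about 0.6 MB:

```python
def mp3_size_mb(hours: float, kbps: int) -> float:
    """Approximate size of a constant-bitrate MP3, in megabytes (10**6 bytes)."""
    bits = hours * 3600 * kbps * 1000  # duration in seconds x bits per second
    return bits / 8 / 1_000_000

def text_size_mb(words: int, bytes_per_word: float = 6) -> float:
    """Approximate plain-text size; 6 bytes per word (letters plus a space) is an assumption."""
    return words * bytes_per_word / 1_000_000

audio = mp3_size_mb(10, 64)
text = text_size_mb(100_000)
print(f"audio {audio:.0f} MB, text {text:.1f} MB, ratio {audio / text:.0f}x")
```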

Other problems arise in predicting where TTS will have pronunciation trouble: words like “read”, which are pronounced differently in different contexts (/riːd/ in the present tense, /rɛd/ in the past), let alone foreign words and the like.

Potentially, all of that could be fixed with a very simple TTS engine, a large set of very good voices, and a transliteration of the whole text into IPA (the International Phonetic Alphabet).
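SSML already has a partial hook for the IPA route: the `<phoneme>` element with `alphabet="ipa"` and a `ph` attribute overrides the engine's own pronunciation of the enclosed word. The minimal Python sketch below assumes the text has already been split into words and that each problem word has been paired with a hand-chosen IPA string. The `to_ssml` helper and the token format are hypothetical. It shows the two readings of “read”:

```python
from xml.sax.saxutils import escape, quoteattr

SPEAK_OPEN = ('<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
              'xml:lang="en-AU">')

def to_ssml(tokens):
    """Join (word, ipa) pairs into SSML; words with an IPA string get a <phoneme> tag."""
    parts = []
    for word, ipa in tokens:
        if ipa is None:
            parts.append(escape(word))  # left to the engine's own dictionary
        else:
            # quoteattr adds the surrounding quotes and escapes the value
            parts.append(f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>')
    return SPEAK_OPEN + " ".join(parts) + "</speak>"

sentence = [("Yesterday", None), ("I", None), ("read", "rɛd"), ("it;", None),
            ("today", None), ("I", None), ("read", "riːd"), ("it", None), ("again.", None)]
print(to_ssml(sentence))
```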

Then two courses are open:

1) Use standard pronunciation dictionaries and look for exceptions, which are then listed for each publication.

2) Render the whole text into IPA and hand-transliterate problem words.
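Course 1 can be sketched as a dictionary lookup with a per-publication override table. Everything in this sketch is hypothetical: the `transliterate` function, the choice to key exceptions by word position (so that each occurrence of a heteronym like “read” can be fixed individually), and the sample entries. The `todo` list it returns is the set of problem words that would go to a person for hand transliteration, as in course 2.

```python
from typing import Optional

def transliterate(tokens, base_dict, heteronyms, exceptions):
    """Map each word of a publication to IPA.

    base_dict:  word -> IPA from a standard pronouncing dictionary
    heteronyms: words whose pronunciation depends on context
    exceptions: token index -> IPA, the per-publication exception list
    Returns (ipa per token, or None if unresolved; list of unresolved (index, word)).
    """
    result: list[Optional[str]] = []
    todo: list[tuple[int, str]] = []
    for i, word in enumerate(tokens):
        key = word.lower()
        if i in exceptions:
            result.append(exceptions[i])        # publication-specific override wins
        elif key in base_dict and key not in heteronyms:
            result.append(base_dict[key])       # unambiguous dictionary entry
        else:
            result.append(None)                 # missing or ambiguous: needs a human
            todo.append((i, word))
    return result, todo

base = {"i": "aɪ", "read": "riːd"}
print(transliterate(["I", "read", "Proust"], base, {"read"}, {}))
print(transliterate(["I", "read", "Proust"], base, {"read"}, {1: "rɛd", 2: "pʁust"}))
```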

The text would then need to be “scored” as a performance.

Then we might have something for the future. What exists now is simply not much better than the voice currently used on the EZReader.

#31 · TTS further thoughts · GregS, Perth, Australia · Device: EZ Reader 5", Iliad