06-10-2013, 08:11 PM
Does any sort of standardized HTML mark-up exist that can supply pronunciation hints to a speech-synthesis (TTS) engine? This is mainly useful for foreign or science-fiction-y names, or for things like stuttering in dialog.
(Using FBReader on Android with the Hyperionics TTS plugin and the Ivona engine, the text “but sh-she said…” is synthesized as “but ess aich she said.”)
Or even in plain English: I’d like to coax a piece of coax cable through a small hole.
Do any of the TTS-capable EPUB readers implement a non-standard method for achieving this?
06-10-2013, 10:54 PM
The only epub reader I had which could do TTS was the Pocket EZ reader, and it was pretty horrible. As for coax as noun vs coax as verb, there are lots of other examples in English; I'm not sure how many TTS systems try to parse the sentences before speaking. It's been 25 years since I worked seriously with TTS, so I'm not up to date. Ideally, one should be able to drop into the IPA character subset to force a particular pronunciation and/or set up a lookup dictionary. Maybe an addition to Festival?
You only have to wait for EPUB3 general acceptance – it's just around the corner now, y'know :p
From IDPF epub3 overview (http://www.idpf.org/epub/30/spec/#sec-tts):
EPUB 3 provides the following text-to-speech (TTS) facilities for controlling aspects of speech synthesis, such as pronunciation, prosody and voice characteristics:
Pronunciation Lexicons
The inclusion of generic pronunciation lexicons using the W3C PLS format [PLS] enables Authors to provide pronunciation rules that apply to the entire EPUB Publication. Refer to PLS Documents [ContentDocs30] for more information.
Inline SSML Phonemes
The incorporation of SSML phonemes functionality [SSML] directly into an EPUB Content Document [ContentDocs30] enables fine-grained pronunciation control, taking precedence over default pronunciation rules and/or referenced pronunciation lexicons (as provided by the PLS format mentioned above). Refer to SSML Attributes [ContentDocs30] for more information.
CSS Speech Features
The inclusion of a select set of features from the CSS 3 Speech Module [CSS3Speech] (previously known as CSS 2.1 Aural Stylesheets [CSS2.1]) enables Authors to control further speech synthesis characteristics. Refer to CSS 3.0 Speech [ContentDocs30] for more information.
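To make the first two facilities a bit more concrete, here is a rough Python sketch (standard library only) that generates a tiny PLS lexicon and a paragraph using the inline `ssml:ph` / `ssml:alphabet` attributes. The helper names (`build_pls`, `phoneme_span`) are made up for illustration, and the IPA strings are my own guesses, not taken from any spec or engine documentation. The `<p>` is a bare fragment; in a real EPUB it would sit inside an XHTML content document.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize PLS elements unprefixed and SSML attributes as ssml:...
ET.register_namespace("", PLS_NS)
ET.register_namespace("ssml", SSML_NS)


def build_pls(entries, lang="en"):
    """Build a PLS lexicon (as a string) from {grapheme: IPA phoneme} pairs."""
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, phoneme in entries.items():
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}phoneme").text = phoneme
    return ET.tostring(lexicon, encoding="unicode")


def phoneme_span(text, ipa, tail=""):
    """Return a <span> carrying an inline pronunciation override."""
    span = ET.Element(
        "span",
        {f"{{{SSML_NS}}}alphabet": "ipa", f"{{{SSML_NS}}}ph": ipa},
    )
    span.text = text
    span.tail = tail
    return span


# Book-wide rule for the stutter example (phoneme string is an assumption).
pls = build_pls({"sh-she": "ʃ ʃiː"})

# Inline overrides for the two readings of "coax" (IPA strings are assumptions).
p = ET.Element("p")
p.text = "I'd like to "
p.extend([
    phoneme_span("coax", "koʊks", " a piece of "),
    phoneme_span("coax", "ˈkoʊ.æks", " cable through a small hole."),
])
markup = ET.tostring(p, encoding="unicode")

print(pls)
print(markup)
```

As far as I understand the spec, the lexicon file would then be referenced from the content document's head with a `link` element using `rel="pronunciation"` and `type="application/pls+xml"`. Whether a given reader or TTS engine actually honors any of this is a separate question.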
On Android, I've been impressed with CoolReader+Ivona – not that it knows the difference between coax and coax, but even so, I think the combination gives LibriVox (http://librivox.org/) a good run for its money.
Edit: Just found out that Ivona has support for SSML. I haven't found any reader that supports it, but it might be a fun exercise to make an epub2speech converter with SSML support.
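A minimal sketch of what the core of such a converter might look like, assuming the chapter text has already been extracted from the EPUB and that pronunciations come from a plain Python dict (both assumptions on my part; `text_to_ssml` is a hypothetical name, not an existing tool). It wraps known words in SSML `<phoneme>` elements and leaves everything else alone.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def text_to_ssml(text, lexicon, lang="en-US"):
    """Wrap every lexicon word found in `text` in an SSML <phoneme> element.

    `lexicon` maps a written form to an IPA string (hypothetical data).
    """
    parts = []
    pos = 0
    if lexicon:
        # Longest entries first so a longer form wins over its own prefix.
        words = sorted(lexicon, key=len, reverse=True)
        pattern = re.compile(
            r"(?<!\w)(" + "|".join(re.escape(w) for w in words) + r")(?!\w)"
        )
        for match in pattern.finditer(text):
            word = match.group(1)
            parts.append(escape(text[pos:match.start()]))
            parts.append(
                f"<phoneme alphabet=\"ipa\" ph={quoteattr(lexicon[word])}>"
                f"{escape(word)}</phoneme>"
            )
            pos = match.end()
    parts.append(escape(text[pos:]))
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">' + "".join(parts) + "</speak>"
    )


# The stutter example from the first post; the IPA string is a guess.
ssml = text_to_ssml("but sh-she said", {"sh-she": "ʃ ʃiː"})
print(ssml)
```

Note that a flat lookup like this cannot tell coax (verb) from coax (cable), since both spellings are identical; that would need either sentence parsing or per-occurrence markup in the source.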
06-18-2013, 07:33 PM
And given the paucity of EPUB3 readers, there probably aren’t many EPUB books “in the wild” with PLS dictionaries or SSML phonemes either. I’d be interested to find any, actually.
06-19-2013, 12:47 PM
There is, however, a sample document using these techniques at https://code.google.com/p/epub-samples/wiki/SamplesListing#georgia.