Old 11-13-2021, 12:52 PM   #8
Tex2002ans
Quote:
Originally Posted by Quoth View Post
There is such a disconnect written & pronounced and so many exceptions to rules that really natural text to speech needs a separate file.

Just like grammar checking, you need a completely different level of parsing to break down words.

Language also changes over time, and new spellings/usages/accents/pronunciations constantly come into play.

Take this example:
  • The bow bowed back, then I shot across the bow. In awe, the servants bowed before me.

1 = bow, as in bow and arrow
2 = bowed, as in bending
3 = bow, as in the front of a ship (a "shot across the bow" is a warning shot)
4 = bowed, as in kneeling + lowering head

The first 2 are said with a 'b' + "OH" sound (rhymes with "go").

The next 2 are said with a 'b' + "OW" sound (rhymes with "cow").
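
To make that concrete, here is a rough sketch of what hand-marking those four words with SSML `<phoneme>` tags could look like, generated from Python. The pronunciation table and the "sense" labels are my own made-up helpers for illustration, and whether a given engine honors IPA in `ph` is something you would have to test per engine.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical hand-made table: (word, sense) -> IPA reading.
# A real system would need a parser to pick the sense automatically.
IPA = {
    ("bow", "weapon"): "boʊ",
    ("bowed", "bent"): "boʊd",
    ("bow", "ship"): "baʊ",
    ("bowed", "knelt"): "baʊd",
}


def phoneme(word, sense):
    """Wrap one word in an SSML <phoneme> element with an explicit IPA reading."""
    ipa = IPA[(word.lower(), sense)]
    return f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>{escape(word)}</phoneme>"


def build_ssml():
    """Build the example sentence with all four heteronyms disambiguated."""
    parts = [
        "The ", phoneme("bow", "weapon"), " ", phoneme("bowed", "bent"),
        " back, then I shot across the ", phoneme("bow", "ship"),
        ". In awe, the servants ", phoneme("bowed", "knelt"), " before me.",
    ]
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">' + "".join(parts) + "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml())
```

Four tags for two sentences is exactly the kind of manual labor that doesn't scale to a whole book.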

Another good example is:
  • The colonels popped kernels of popcorn in the microwave.

Both words are spoken exactly the same (in current-day English), but that's not how it always was.

For more information on this, I recommend the fantastic podcast, "Lexicon Valley" by John McWhorter.

Here's a few episodes covering:

Side Note: Just a few months ago, McWhorter handed the podcast off to two other people (so now the original podcast has confusingly been renamed to "Spectacular Vernacular").

But you can find him at the new "Lexicon Valley":

https://www.booksmartstudios.org/s/lexicon-valley

Here's the first episode from the new version:

where he explains where the heck "bee" in "spelling bee" comes from. (And other fascinating stuff.)

Quote:
Originally Posted by Quoth View Post
But now is an actual audio book better than that and simply doing NOTHING to the source text and leaving it up to a best effort speech engine better than CSS speech extensions or SSML rules?

And people not visually impaired now use audio books which was not the case 1899 to 1979.

The better TTS engines/networks get, the better these things can do with plaintext input. (Toss some samples into Google's Cloud Text-to-Speech and see how it sounds.)
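
If you would rather script it than use the web demo, something along these lines should work. It is a minimal sketch that assumes you have a Google Cloud project with the Text-to-Speech API enabled and an API key. The endpoint and field names are the v1 REST ones as I understand them, so check the current docs before relying on it. The function names and `sample.mp3` are just placeholders.

```python
import base64
import json
import urllib.request

# v1 REST endpoint (assumption: verify against the current documentation).
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request_body(text, language_code="en-US"):
    """Build the JSON body for a plain-text (no SSML) synthesis request."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def synthesize(text, api_key, out_path="sample.mp3"):
    """Send the text to the service and save the returned MP3 to out_path."""
    data = json.dumps(build_request_body(text)).encode("utf-8")
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The audio comes back base64-encoded in the "audioContent" field.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(payload["audioContent"]))
    return out_path
```

Feed it the "bow" sentence as-is, with zero markup, and judge for yourself how close the engine gets.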

The fantastic thing about Text-to-Speech is you don't need a human middleman to read the stuff.

Without it, 99%+ of written text wouldn't be accessible to the blind: think bills/letters/flyers/boxes/cans + dynamically generated content (phone numbers, addresses, dates, names, $ amounts, auto-translated text).

And many times, there's very personal information inside—think texts between spouses or emails between friends. (Are blind people supposed to have zero privacy?)

One of the best talks I ever saw on this topic was from 2013:

Definitely give it a listen.

Side Note: Personally, a lot of the journals/books I read are so obscure that there would never be a market for human-read audiobook versions. But with Text-to-Speech, I can listen to anything/everything while I work.

A "90% good" TTS version of the ebook is 100% better than 0% human-read.

And if you compare the quality of Android/Google's TTS vs. the robotic crap on Windows, Google's is pretty close to a human reading to me (besides wrongly pronouncing odd names, obscure words, and "bow" vs. "bow").

That high-quality, bleeding-edge TTS will trickle its way down into the OSes themselves, and if we stop back in another 10 years, you'll see all those breathing+mood+other enhancements make their way down to the free version sitting right inside your pocket.

And those who create ebooks can do their best to take reasonable measures with markup... like marking the proper language so "tacos" (English) + "tacos" (Spanish) can be pronounced correctly (at some near-future date!). That would be infinitely more helpful than manually trying to insert CSS Speech, and you can actually benefit from language markup now.
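
As a rough illustration of that language markup, here is a tiny Python helper that wraps a foreign phrase in a `<span>` carrying both `lang` and `xml:lang` (EPUB XHTML wants both). The helper name and the sample sentences are made up for this example, and `lang` is expected to be a valid BCP 47 tag like `es`.

```python
from xml.sax.saxutils import escape


def mark_language(text, lang):
    """Wrap a phrase in a span that declares its language for readers and TTS."""
    return f'<span lang="{lang}" xml:lang="{lang}">{escape(text)}</span>'


# English sentence: "tacos" inherits the book's default language (en).
english = "<p>We ordered tacos.</p>"

# Spanish phrase inside an English book: mark only the foreign span.
spanish = f"<p>She said {mark_language('¡Me encantan los tacos!', 'es')} and smiled.</p>"

if __name__ == "__main__":
    print(english)
    print(spanish)
```

One attribute pair per foreign phrase, and the engine gets to pick the right pronunciation rules on its own.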

Last edited by Tex2002ans; 11-13-2021 at 01:31 PM.