|03-30-2012, 02:22 PM||#1|
Join Date: Oct 2011
Device: kindle 3
Custom TextToSpeech "engine" as a way to create custom real audiobooks with texts
This is only an idea, but it would be great if someone could explain whether such a thing is possible...
As far as I know, there is no way to create real audiobooks, since .aa and .aax are not open formats and are protected by DRM.
Regular TTS voice is not so great.
It is possible to change the standard "voice" of the TTS mechanism. I guess that this is not really a "voice" but a special kind of program which knows how to pronounce each letter in a specific language.
Is it possible to write a TTS program which plays the proper mp3 file from the data store, according to information delivered in a customizable new file format, like .openaa (if a paragraph starts with "Once upon a time", play Cinderella/1.mp3, etc.)?
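To make the lookup idea concrete, here is a minimal sketch in Python. It is only an illustration of the mapping, not something that runs on the Kindle or hooks into its TTS; the cue table stands in for the contents of a hypothetical .openaa file, and the second cue and the function names are made up for the example.

```python
from typing import List, Optional, Tuple

# Stand-in for a hypothetical ".openaa" file: paragraph opening -> audio file.
# Only the first entry comes from the example above; the second is made up.
CUES: List[Tuple[str, str]] = [
    ("Once upon a time", "Cinderella/1.mp3"),
    ("The next morning", "Cinderella/2.mp3"),
]


def _normalize(text: str) -> str:
    """Collapse whitespace and ignore case so small layout differences don't matter."""
    return " ".join(text.split()).casefold()


def audio_for_paragraph(paragraph: str, cues: List[Tuple[str, str]] = CUES) -> Optional[str]:
    """Return the audio file whose cue matches the start of the paragraph, or None."""
    start = _normalize(paragraph)
    for prefix, audio in cues:
        if start.startswith(_normalize(prefix)):
            return audio
    return None
```

Matching on the opening words is fragile if two paragraphs start the same way, so a real format would probably need paragraph numbers or positions as well.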
The idea is to make it possible to create audiobooks associated with the real text of the whole book.
I can buy an audiobook, I can buy an ebook, but (on the Kindle) I can't conveniently listen to an audiobook while reading the book (which would be great for language learning).
|03-30-2012, 05:39 PM||#2|
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
If I correctly understood what you mean, this is indeed an interesting thought.
I'll try to use an analogy, even if it doesn't fit entirely: So what you have in mind is something similar to what "subtitles" are for movies, or lyrics for songs, but "the other way around", right? Like, you are reading a chapter, and the Kindle would speak along as you read.
In what follows, I'm only addressing the technical issues that come to my mind.
First, if you're going in the "text->sound" direction, it's (at least initially) pretty hard to match what you're seeing (a page of text) to what you're hearing (a stream of words). You can probably reproduce that by simply turning on TTS somewhere in a book. I always find myself scanning the page for up to 10 seconds to even find which passage is currently being read. Even assuming the audio output was perfect (audiobook quality), this problem would persist, because you have no visual cue of what is currently being read. This is less of a problem once you have found the current "audio" position and are then just following along reading.
Still, it suggests that you most probably want to synchronize the text output with the audio output, and not the other way around. Example:
"This is a sample text" (2 secs audio).
This would provide the advantage of always knowing what is currently being spoken, but has two major disadvantages: it would require an enormous amount of metadata (1 entry per word), which has to be manually created (quite simply impossible, unless you have loads of $$$ to throw out of the window), and it would actually be stressful for the reader. So a more reasonable version might be to associate sound chunks with paragraphs of text (pretty useful IMO) or even pages.
Another method might be to insert "synchronization marks" every x seconds. I'm not an audiobook aficionado (in fact I only ever listened to "audiobooks" by coincidence on the radio, while driving on the highway), but from what I quickly googled, the complete LOTR audiobook is about 55 hours (3300 mins) and ~ 1200 (physical) pages, so (very) roughly 3 mins/page.
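Spelled out as a quick calculation (a sketch using only the rough figures above; the function name is made up for illustration):

```python
AUDIO_HOURS = 55   # rough length of the complete audiobook
PAGES = 1200       # rough number of physical pages

audio_minutes = AUDIO_HOURS * 60            # 3300 minutes
minutes_per_page = audio_minutes / PAGES    # 2.75, i.e. roughly 3 mins/page


def marks_needed(audio_hours: float, interval_seconds: float) -> int:
    """Number of synchronization marks if one is placed every interval_seconds."""
    return int(audio_hours * 3600 // interval_seconds)
```

For instance, with x = 30 seconds, marks_needed(55, 30) gives 6600 marks for the whole book, or about 5.5 marks per page.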
I would personally think the "middle ground" is acceptable here, both in terms of "finding your way around", i.e., synchronizing what you're reading and hearing, and in terms of "picking up where you left off". It would still mean that to prepare such a book/audiobook combination, someone would have to listen AND read for 55 hours, clicking on the "I am here" word every 30 seconds. I've done similar (not identical) tasks before, and I can assure you it's extremely tedious: you are doing an extremely dumb job, yet you must be totally concentrated.
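Once such marks exist, using them is the easy part. A minimal sketch, assuming the marks are stored as (audio time in seconds, word index) pairs sorted by time; the values and function names are made up for illustration:

```python
from bisect import bisect_right

# Hypothetical sync marks: (audio time in seconds, index of the clicked word).
# Made-up values, roughly one mark every 30 seconds.
SYNC_MARKS = [(0.0, 0), (30.0, 82), (60.0, 171), (90.0, 256)]


def word_index_at(seconds, marks=SYNC_MARKS):
    """Audio -> text: word index of the last mark at or before the given time."""
    times = [t for t, _ in marks]
    i = bisect_right(times, seconds) - 1
    return marks[max(i, 0)][1]


def audio_time_for_word(word_index, marks=SYNC_MARKS):
    """Text -> audio: time of the last mark at or before the given word."""
    words = [w for _, w in marks]
    i = bisect_right(words, word_index) - 1
    return marks[max(i, 0)][0]
```

With marks every 30 seconds, either lookup is off by at most one interval, which is the "middle ground" trade-off: far less metadata than one entry per word, at the cost of coarser positioning.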
That put aside, a method of combining the above-mentioned media formats must be found. Is it an MP3 file, a MOBI file, and a (for instance) SYNC file? What if any one of these files is not in sync with the other two? Is a single container file (say, .EWA -- ebook with audio --) the better way to go? Does one have to invent such a format from the ground up, or could other existing formats (like subtitles, lyrics,...) be reused or at least be used for inspiration?
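As one illustration of reusing an existing idea, the SYNC file could borrow the timestamp syntax of LRC lyrics files, where each line looks like [mm:ss.xx]text. The sketch below is an assumption, not an existing format: the text after the timestamp is taken to be the opening words of a paragraph, and the sample content is made up.

```python
import re

# LRC-style line: [minutes:seconds]text, e.g. [01:02.25]At the ball
LINE_RE = re.compile(r"^\[(\d+):(\d{2}(?:\.\d+)?)\](.*)$")


def parse_sync(text):
    """Parse LRC-style lines into a sorted list of (seconds, label) pairs.

    Lines that do not match the timestamp pattern are skipped.
    """
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        minutes, seconds, label = m.groups()
        entries.append((int(minutes) * 60 + float(seconds), label.strip()))
    entries.sort()
    return entries


# Made-up sample content for a hypothetical SYNC file.
SAMPLE = """\
[00:00.00]Once upon a time
[00:31.50]The next morning
[01:02.25]At the ball
"""
```

Keeping the sync data in a small plain-text sidecar like this leaves the MP3 and the ebook untouched, but it does not solve the "files out of sync" question; a container format would have to address that separately.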
And finally: how could this be implemented, and integrated, on the Kindle at all? Would it be possible to write this once, then use everywhere (K2, K3, K5)* ? Or would one need to adapt it more or less heavily to every single model?
OK, I realize I wrote quite a bit of text. The purpose was not to intimidate you, or to slay your question. On the contrary -- as said, I do find this a very interesting topic. Otherwise, I wouldn't have spent more than an hour writing this and researching some of the background. I'm only trying to realistically answer your question about the feasibility of such a project, but I don't know how experienced you are in developing software.
So to wrap it up in one line, and from my perspective: It's indeed a very ambitious project, which will need a lot of time, a lot of smart ideas, and even more dedication. Conclusion? Go for it! Come up with ideas, proofs of concept, alpha versions etc. There are a lot of very smart people around in this area of mobileread, so I bet that you will find some talented folks who are interested, and willing to join in and contribute.
(*) K4 not considered because it doesn't have speakers (AFAIK). I may have gotten other models wrong as well.
|03-31-2012, 09:42 AM||#3|
Join Date: Oct 2011
Device: kindle 3
I think this would work better for shorter texts, and even for podcasts which provide a transcript.
I assume that the original TTS mechanism takes part of the text and keeps it in some kind of buffer. The question is how to write a program which could pretend to be the TTS engine and use the same kind of buffer.