Old 05-31-2016, 09:37 PM   #385
geekmaster
Carpe diem, c'est la vie.
Posts: 6,433
Karma: 10773670
Join Date: Nov 2011
Location: Multiverse 6627A
Device: K1 to PW3
Downloaded. I am tempted to emulate, in software, the opto-mechanical voice recognizer that so fascinated me in my youth. It used a bunch of glass fibers with one end glued to a glass microscope slide, all cut to different lengths so they resonated at different frequencies in the voice band. Training consisted of placing photographic film under the glass slide and speaking a digit (zero to nine) to it, so the free ends exposed the film to light as they vibrated. Or something similar to that -- those memories are fuzzy now (maybe the light source and film positions were swapped from what I recall). The developed film negative became a mask that blocked the light for the recognized digit, while the OTHER digit masks (each with its own film mask, fibers, and light source) passed more light as their fibers vibrated. The total light output triggered detection, with the LEAST light indicating the digit spoken. Or something like that.

These days we could replace the fiber-and-film-mask detectors with FFT pattern matching, and perhaps a simple time-delay neural network (TDNN) feeding a simple Markov chain detector for more complex sound groups (i.e. multi-word commands). Of course, the chosen spoken commands must be different enough to avoid command collisions.

The stuff I just described was from the 1950s -- Google's voice recognition (and CMU's, for that matter) is six decades beyond that. But still, for page-turn commands, the spoken equivalent of Palm Pilot "Graffiti" handwriting recognition (i.e. "specially adapted, almost but not quite text characters") would be great for handicapped-accessible page turning, and more. And it would be extremely simple.
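Just to make the idea concrete, here is a minimal sketch of that emulation (assuming NumPy, a fixed 8 kHz sample rate, and log-spaced bands; all names and numbers here are my own hypothetical choices, not from any real recognizer): each "glass fiber" becomes an FFT band energy, each "film mask" becomes a stored per-digit template, and "least light" becomes least distance.

```python
# Hypothetical software emulation of the opto-mechanical recognizer:
# fibers -> FFT band energies, film masks -> templates, least light -> least distance.
import numpy as np

N_BANDS = 16          # number of simulated resonant "fibers"
SAMPLE_RATE = 8000    # Hz, roughly telephone-quality voice band

def band_energies(signal):
    """Normalized energy in N_BANDS log-spaced bands across 100-3400 Hz."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)
    edges = np.geomspace(100, 3400, N_BANDS + 1)
    energies = np.array([
        spectrum[(freqs >= lo) & (freqs < hi)].sum()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    total = energies.sum()
    return energies / total if total > 0 else energies

def train(samples_by_digit):
    """One averaged band-energy template per digit (the 'film mask')."""
    return {digit: np.mean([band_energies(s) for s in samples], axis=0)
            for digit, samples in samples_by_digit.items()}

def recognize(templates, signal):
    """Pick the template closest to the input ('least light wins')."""
    e = band_energies(signal)
    return min(templates, key=lambda d: np.linalg.norm(templates[d] - e))
```

Real speech would of course need framing, windowing, and time alignment; this only captures the "bank of resonators plus per-digit mask" principle.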

EDIT: It appears that there are modern equivalents to that "pre-computer" opto-mechanical voice command recognizer that I attempted to describe above.

Or perhaps emulate that early TI chip from the "Speak and Spell" days (they had inexpensive chips for both text-to-speech and spoken-command recognition back in the late 1970s, or maybe the 1980s?). Anyway, those things did not need a database -- just simple pattern recognition, though they only accurately recognized a limited number of spoken commands that were sufficiently different to be told apart by such a simple device.
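On that "sufficiently different" point, a candidate command set could be sanity-checked up front before anyone trains on it. A hypothetical sketch (the templates are whatever feature vectors the detector uses, e.g. band energies; the threshold is something you would tune empirically):

```python
# Hypothetical check that a command set is "different enough":
# any two commands whose feature templates lie closer than a threshold
# are likely to collide in a simple nearest-template detector.
import itertools
import numpy as np

def collisions(templates, threshold):
    """Return pairs of commands whose templates are closer than threshold."""
    return [(a, b) for a, b in itertools.combinations(templates, 2)
            if np.linalg.norm(templates[a] - templates[b]) < threshold]
```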

All it would take for vocal operation of the page-turn keys in apps like "Librerator" (or other KPV forks) is a small, simple vocal-command detector such as I described -- no need for something as advanced as the CMU app.

Last edited by geekmaster; 05-31-2016 at 09:57 PM.