Wake Words Revisited in the Cloud
Sensory has always had a forte in wake words. We first developed what we called “voice triggers” back in the early 2000s as a way for Hallmark to pair storybooks with plush pets that would react as you spoke certain words aloud (see Interactive Story Buddies).
Sensory was able to get the accuracy up and the power consumption down, and we introduced wake words to the mobile phone and tablet vendors: Hey Siri to Apple, Hey Cortana to Microsoft, and OK Google to Google. Sensory made the first “Hi Galaxy” for Samsung’s phones, and Sensory was the “voice” in MotoVoice and the handsfree MotoX. We even helped Amazon with low-power wake words for their first handsfree Alexa tablets.
Sensory has always had the best-performing on-device wake words, and we have beaten out many others in statistically significant competitive shootouts by independent firms. (Just ask and we can provide detailed testing reports.)
Many big companies were able to get by with good (not great) on-device performance by having the audio go to the cloud for a secondary review. This happens, for example, when your Echo lights up but no response is given. The Echo thinks it heard the wake word, but upon reviewing what you said in the cloud, it decides you did not intend to talk to Alexa.
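To make the two-stage pattern concrete, here is a minimal Python sketch. The scoring functions are hypothetical stand-ins for real acoustic models (not any vendor’s actual API); the point is the shape of the pipeline: a permissive on-device check gates a stricter cloud check.

```python
import random

def on_device_score(audio: bytes) -> float:
    """Stand-in for a small, low-power on-device model (hypothetical)."""
    random.seed(len(audio))
    return random.random()

def cloud_score(audio: bytes) -> float:
    """Stand-in for a larger, more accurate cloud model (hypothetical)."""
    random.seed(len(audio) + 1)
    return random.random()

DEVICE_THRESHOLD = 0.5  # permissive, so real wake words are rarely missed
CLOUD_THRESHOLD = 0.9   # strict, so most first-stage false triggers die here

def wake_word_fired(audio: bytes) -> bool:
    # Stage 1: cheap on-device check; on a miss, nothing leaves the device.
    if on_device_score(audio) < DEVICE_THRESHOLD:
        return False
    # Stage 2: the audio is re-scored in the cloud. A stage-1 accept
    # followed by a stage-2 reject is the "Echo lights up but gives no
    # response" case described above.
    return cloud_score(audio) >= CLOUD_THRESHOLD
```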
This process compromises security and privacy by sending your data off at unpredictable times. One of the worst offenders is my Android phone, which seems to start listening whenever I talk about Google because it thinks I said “Hey Google.” Google does a reasonable job in the cloud of “revalidating” what was intended, but for many cloud implementations the revalidation includes listening to personally identifiable speech before and after the perceived wake word, once again at the cost of privacy!
Sensory has a new approach to using the cloud, and it’s a super accurate approach that doesn’t compromise on privacy. Sensory doesn’t need to take pre- and post-wake-word private data to make it more accurate. And the results of Sensory’s initial wake word revalidation tests are nothing short of amazing. We tested two models: “Hey” followed by a two-syllable word and “Hey” followed by a three-syllable word (so the full wake phrases were three and four syllables long).
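One way to picture the difference, purely as an illustrative sketch: if the on-device detector reports where the wake word starts and ends, the upload can be trimmed to exactly that window. The `Detection` structure and its fields below are assumptions made for this example, not Sensory’s actual implementation.

```python
from dataclasses import dataclass

SAMPLE_RATE = 16_000   # samples per second; 16-bit mono PCM assumed
BYTES_PER_SAMPLE = 2

@dataclass
class Detection:
    start_ms: int  # wake word start, as reported by the on-device detector
    end_ms: int    # wake word end

def wake_word_segment(pcm: bytes, det: Detection) -> bytes:
    """Return only the audio inside the detected wake word boundaries.

    Unlike revalidation schemes that also upload pre- and post-roll
    speech, nothing outside [start_ms, end_ms] is sent, so surrounding
    (potentially personally identifiable) speech never leaves the device.
    """
    start = det.start_ms * SAMPLE_RATE // 1000 * BYTES_PER_SAMPLE
    end = det.end_ms * SAMPLE_RATE // 1000 * BYTES_PER_SAMPLE
    return pcm[start:end]
```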
The improvements in accuracy were substantial and can be used to decrease false accepts (when the device mistakenly hears the wake word) or false rejects (when the device doesn’t recognize the correct word). In initial testing with cloud revalidation, Sensory held the False Reject (FR) rate constant at a low rate of less than 10% (fewer than 1 out of 10 correct wake words failed to elicit a response, across a variety of noise backgrounds and speaker voices).
Cloud revalidation substantially reduced the False Accept (FA) rate, from a few false accepts per week to a few false accepts per month! We subsequently held the FA rate constant and reduced FR via cloud revalidation, and found similar results. Cloud techniques can also be tuned to reduce both False Accepts and False Rejects simultaneously.
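The tuning described above amounts to choosing an operating point on the revalidator’s score distribution. The sketch below uses synthetic scores (not Sensory’s test data) to show the mechanics: fix a target FR, find the strictest threshold that still meets it, and read off the FA rate that threshold yields.

```python
import random

random.seed(0)
# Synthetic revalidation scores, for illustration only: true wake words
# tend to score high, confusable non-wake audio tends to score low.
true_wake = [random.gauss(0.85, 0.08) for _ in range(1000)]
impostors = [random.gauss(0.40, 0.15) for _ in range(1000)]

def fa_fr(threshold, positives, negatives):
    """FR: real wake words scored below threshold (device stays silent).
    FA: non-wake audio scored at/above threshold (device falsely fires)."""
    fr = sum(s < threshold for s in positives) / len(positives)
    fa = sum(s >= threshold for s in negatives) / len(negatives)
    return fa, fr

# Hold FR constant at <= 10% and pick the strictest threshold that meets
# it; that threshold gives the lowest FA achievable at this operating point.
best_t = max(t / 100 for t in range(101)
             if fa_fr(t / 100, true_wake, impostors)[1] <= 0.10)
fa, fr = fa_fr(best_t, true_wake, impostors)
print(f"threshold={best_t:.2f}  FR={fr:.1%}  FA={fa:.2%}")
```

Swapping the roles, holding FA constant and minimizing FR, is the symmetric search over the same trade-off curve.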