With the AI boom, smart assistants and voice-enabled speakers have never been more popular. According to Voicebot.ai, more than 45 million adults in the United States have access to a smart speaker, and over half of smartphone owners report using voice assistants on their mobile devices. The accuracy of the technology in the growing $20 billion global voice and speech recognition market, however, remains limited. Anyone who has tried using Cortana, Siri, or Alexa in a noisy environment can attest to the futility of the endeavor.
Inspired by a child with autism, Ken Sutton created Yobe, a voice technology company that makes smart assistants better listeners.
A self-professed unconventional tech founder, Sutton had no background in technology when he set out. It all started with the son of his friend James, an autistic boy who found it difficult to listen to music inside a car. Sutton and James did some research and discovered that the problem came down to frequency manipulation and the way the child perceived frequencies. They went to a studio with millions of dollars of equipment and began bending frequencies, starting with music, and along the way found a process that James’ son responded to.
“And so what it came down to was how his autistic brain perceives frequencies, which made it difficult and uncomfortable for him to listen to echoes and reverberations that you would have in a closed environment like a car,” Sutton said of how a child with autism inspired his groundbreaking technology.
Consequently, Sutton and James created sophisticated AI data processing algorithms that enhanced music in real time. They hired a lawyer, patented their creation, and founded Yobe in 2016. When the founders entered the industry, they found that the music market had bigger problems than fidelity and pivoted into voice. The company’s AI software, with its ability to track biometric markers such as voice, was uniquely suited to solving the ‘cocktail party problem’.
The cocktail party problem refers to how hard it is, for machines and humans alike, to pull out a particular voice when many others are speaking. Even with the commercialization of voice, there are instances where voice-driven systems don’t work all that well because of background noise or other voices. Yobe is using edge-based AI to unlock the potential of voice technologies for modern brands: its software is purpose-built to identify and decode human voices in live crowds and noisy environments.
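To make the problem concrete, the toy sketch below (plain Python and numpy, purely illustrative and not Yobe’s method; the signals and numbers are assumptions for demonstration) shows how two simultaneous “voices” collapse into a single waveform at the microphone, and why simply turning up the volume amplifies the interfering talker just as much as the target.

```python
import numpy as np

# Toy "cocktail party" mixture: two simultaneous talkers at one microphone.
# (Illustrative only -- real speech and real voice separation are far more complex.)
sr = 16_000                       # sample rate in Hz
t = np.arange(sr) / sr            # one second of samples

target  = 0.6 * np.sin(2 * np.pi * 220 * t)   # stand-in for the voice we want
talker2 = 0.6 * np.sin(2 * np.pi * 330 * t)   # a competing voice
mixture = target + talker2                    # the microphone hears only the sum

# Turning up the gain boosts both voices equally, so the ratio between them
# (and therefore intelligibility) does not improve at all.
ratio_before = np.sum(target**2) / np.sum(talker2**2)
ratio_after  = np.sum((2.0 * target)**2) / np.sum((2.0 * talker2)**2)
print(ratio_before, ratio_after)  # identical: gain alone cannot separate voices
```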
Yobe’s signal processing is modeled on human hearing, with techniques that substantially increase the signal-to-noise ratio (SNR) in noisy environments. Sound contains vast amounts of data, which the brain processes and decodes quickly, often unconsciously. Sutton’s avant-garde software gives machines the ability to decipher emotion, intent, mood, and other biological markers for an added layer of meaning. Yobe is ensuring that voice technology meets the human standard.
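For readers unfamiliar with the metric, SNR is usually expressed in decibels as ten times the base-10 logarithm of the ratio of signal power to noise power. The short sketch below is a minimal, hypothetical illustration of how an SNR improvement might be measured when a clean reference signal is available (a lab-bench scenario; the enhancer, signals, and numbers are assumptions, not Yobe’s published figures).

```python
import numpy as np

def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(signal power / noise power)."""
    return 10 * np.log10(np.sum(signal**2) / np.sum(noise**2))

rng = np.random.default_rng(0)
sr = 16_000
t = np.arange(sr) / sr

clean = np.sin(2 * np.pi * 220 * t)        # stand-in for the target voice
noise = 0.8 * rng.standard_normal(sr)      # stand-in for crowd/background noise

# A hypothetical enhancer that halves the residual noise raises SNR by about 6 dB.
enhanced_noise = 0.5 * noise
print(f"before: {snr_db(clean, noise):.1f} dB")
print(f"after:  {snr_db(clean, enhanced_noise):.1f} dB")
```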
“So much work in the voice recognition space is and has been done in controlled, sterile environments, which just isn’t where we as humans live, work, play, and talk. We took a different approach, and it has paid off,” Sutton said about how his innovation process yielded novel results.
Yobe has streamlined and optimized its technology for deployment in the market. The company has partnered with the Boston-based fast food chain PLNT Burger to create a voice solution that lets guests speak their order at a kiosk and pick it up a few minutes later. Not only does the self-service technology make ordering meals faster, but it also lets guests customize their order in their own words and have that order recognized in real time. Sutton’s voice technology thus enables smart assistants to forge a strong emotional connection with users.
So if you’re at a party one of these days and Alexa or Siri can hear your commands amid the noise, you will have a child with autism and Sutton’s ingenuity to thank. Yobe can pinpoint a voice based on biometric markers, aggressively enhance its volume, and then use AI to smooth it out. As the technology permeates the market, users will no longer have to shout into headphones or endure the frustration of an in-car voice assistant’s limited hearing.