Using Voice Assistants for Self-Service: A Vision for the Future

Kindra Cooper


Web-based self-service capabilities have made huge strides in allowing customers to autonomously handle routine tasks like initiating wire transfers, changing account details and buying items online.

However, completing a wire transfer through a mobile app or website still takes about 10-15 steps, from login to account overview to inputting the destination bank and transfer date – interactions that are sequential and “sterile.”

A voice assistant, on the other hand, can complete the transaction without the intervening steps by saving your preferences and personalizing the experience accordingly.

Therefore, when you say, “Wire $10,000 to my bank in England,” the device already knows which account to draw from and which to send to based on your past behavior, allowing you to get things done “at the speed of thought,” says Vijay Balasubramaniyan, CEO and co-founder of Pindrop, an authentication software provider.

This is what makes voice assistants so promising as the next user interface for self-service.

“Just imagine if you can do everything on the go without having to stop – for instance, while I’m driving I can just bark out a bunch of interactions and they get done.”

Given that the company just dropped a new voice authentication platform specific to voice-activated IoT devices, Balasubramaniyan envisions a future where voice assistants become what he calls “enterprise assistants” capable of not only playing Taylor Swift's music on command but buying 20 shares of Cisco stock or checking the status of an insurance claim.

Currently, most voice assistant skills and functions are for simple Q&A-type interactions, essentially delivering FAQs in audio format. Skills are used for a narrow purpose like controlling smart home devices or making brand-specific purchases, such as Domino’s Dom, a version of Siri developed by Nuance to enable voice ordering for the pizza chain.

Voice assistants for self-service: why aren’t we there yet?

Most current phone IVRs fail spectacularly at self-service for two reasons: 1) They aren’t hardwired for the right use cases; and 2) Natural language processing doesn’t recognize enough words to act on a command. On the other hand, NLP error rates are much lower for tech giants focused on developing voice assistants.


“Right now, Amazon, Google and Microsoft are at a 95 percent accuracy or 5 percent word error rate,” says Balasubramaniyan. “If I say 20 words, the system can identify 19 of those words and guess the 20th word with extremely high accuracy.”
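The “5 percent word error rate” Balasubramaniyan cites is simply word-level edit distance divided by the length of the reference transcript. A minimal sketch of that computation (an illustrative implementation, not any vendor’s actual scoring code):

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# One misrecognized word out of 20, as in the quote above:
ref = " ".join(f"w{i}" for i in range(20))
hyp = " ".join(f"w{i}" for i in range(19)) + " wrong"
print(word_error_rate(ref, hyp))  # 0.05, i.e. 5 percent
```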


However, even voice assistants marketed as “conversational” assistants don’t have the domain-independent NLP capabilities to conduct conversations on topics outside of the knowledge base, or react to non-linear commands.

“The issue with [assistants like] Siri is you can ask, ‘What’s the weather in Boston?’ and ‘Where does my sister live?’ but you can’t say ‘What’s the weather where my sister lives?’ because that’s a whole different use case,” Adam Cheyer, VP of research & development at Samsung, said at a recent AI conference in New York City.

How to authenticate seamlessly while using a voice assistant

Voice won’t become the predominant interface for self-service until it provides a customer experience that’s superior to a website or mobile app. To get there, organizations should focus on building domain-specific skills addressing the most frequent caller requests, such as replacing a lost credit card.

However, before consumers can trust voice assistants to buy things and transfer money on their behalf, these devices need authentication systems that go beyond voice biometrics alone, such as authenticating through a companion mobile app or a spoken four-digit PIN.

Makers of IoT devices tend to favor the latter, which Balasubramaniyan says is fundamentally unsafe and easily replicated.

“A PIN is a bad idea and a spoken four-digit PIN for everyone to hear is the worst of those ideas.”

Unveiled at CES 2019, Pindrop’s Voice Identity Platform is designed for voice assistants, smart homes and connected cars, and uses the company’s proprietary Deep Voice biometrics engine. Balasubramaniyan says that completing these tasks via voice assistant can actually be more secure, not less, in addition to providing a better user experience.


“Voice is the only interface that elegantly combines content, intent and identity, allowing frictionless authentication at the same time as communicating at the speed of thought.”

Like the random number generator in the security tokens used by banks, the multi-factor authentication technology randomly generates an audio tone which is played by the voice assistant.

“We use audio tones to establish that your mobile device is close to the voice assistant,” said Balasubramaniyan. “It then plays a one-time passcode and that passcode is heard by the microphone on your mobile device. Your mobile phone tells Pindrop it’s the same passcode that you sent to the Amazon Alexa, so you know these two devices are next to each other.”

The audio tones are used in conjunction with voice biometrics, which creates a distinct voice print for each user, to authenticate who is speaking and to further verify their identity via their mobile device in a “completely hands-free fashion.”
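The proximity check Balasubramaniyan describes can be thought of as a one-time-passcode exchange between three parties: a server issues a code, the assistant plays it as a tone, and the phone reports what its microphone heard. The sketch below is purely illustrative, with hypothetical names and flow; it is not Pindrop’s actual protocol, and it omits the audio encoding and the voice-biometric factor entirely.

```python
import hmac
import secrets

class AuthServer:
    """Toy server for a one-time audio-passcode proximity check."""

    def __init__(self):
        self.pending = {}  # session_id -> one-time passcode

    def issue_passcode(self, session_id):
        # Generate a fresh one-time passcode, like the random number
        # generator in a bank security token
        code = secrets.token_hex(4)
        self.pending[session_id] = code
        return code

    def verify(self, session_id, heard_code):
        # The phone reports what its microphone heard; a match implies
        # it was close enough to the assistant to hear the tone.
        # pop() makes the passcode single-use, so a replay fails.
        expected = self.pending.pop(session_id, None)
        return expected is not None and hmac.compare_digest(expected, heard_code)

server = AuthServer()
tone = server.issue_passcode("session-1")  # assistant plays this as an audio tone
heard = tone                               # the nearby phone's mic decodes it
print(server.verify("session-1", heard))   # True: devices are next to each other
print(server.verify("session-1", heard))   # False: one-time code, replay rejected
```

The constant-time comparison (`hmac.compare_digest`) and the single-use passcode are standard precautions for any challenge-response scheme, regardless of how the real system transmits the code.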