Amazon Alexa and Echo Chief Evangelist on the Future of Voice Assistants

Voice should be about speed, reducing effort and telling stories

Kindra Cooper
08/14/2019

Few companies can truly claim to offer an omnichannel customer experience. It’s a nebulous concept espoused by thought leaders as the gold standard to which all businesses should aspire. For a relatable, real-life example of what omnichannel looks like, look no further than Amazon Alexa.

First, what started as a device – a voice assistant – has evolved into a digital marketplace where brands and developers build and share skills. That has turned Alexa’s developer platform into a testing ground for voice assistant machine learning, where every interaction across every device and Alexa skill enriches its deep learning capabilities.

Secondly, the device is personalized and synced up with your purchase history and credentials on Amazon.com. Finally, Alexa has evolved into an operating system of sorts, licensed by third-party consumer electronics manufacturers to turn their smart home appliances, cameras, computers, headphones and even cars into voice-enabled devices.

At the VOICE Summit in Newark, Dave Isbitski, chief evangelist for Amazon’s Alexa and Echo, discussed what’s next for voice.

1. Voice is about speed and reducing friction

As brands start to explore voice as the next self-service interface, they’re recognizing its power to reduce friction for transactions, deliver a personal touch and empower their customers to get things done faster.

“If you’re going to create a conversation with your customer and you already have good engagement on your mobile app, don’t just create an Alexa skill because you can,” said Isbitski. “You want to create something that’s going to be faster to do.”

Browsing and general information queries are not suitable for voice, he warned. If you’ve ever asked “Alexa, what is…?” and listened to her dryly recite a Wikipedia entry, you understand the futility of it.

Also, when a user searches for a product, there needs to be a follow-up conversation pathway that lets them complete the purchase, such as having links to relevant products sent to them via email or SMS, or requesting a price comparison with other e-commerce sites.

“You have the ability to switch between all these different modalities, but you should base it on what is the fastest way you can get that information,” Isbitski added.

2. The focus should always be on completing tasks

Most of the tasks handled by voice assistants can be done on mobile – it’s simply a faster experience with voice. Logging medication intake, reordering milk, activating or deactivating a smart lock, transferring money and tracking calories are all activities that can be done with one simple command.

However, the problem is that to do each of these tasks, the user still has to specify which Alexa skill to use. Ideally, said Isbitski, the AI would get smarter over time, so that it remembers which music streaming service you intend to use when you say, “Alexa, play classical music,” or that your preferred banking service is Capital One after you have used voice to wire money.

3. Making branded experiences and conversations

The main issue on engineers’ minds is teaching voice assistants to understand intent. Traditionally, conversation pathways were built like corporate decision trees with “a lot of manual labeling and effort,” and the assistant could only understand predictable human responses.

For instance, if it asked a user whether they wanted to order a cup of coffee, and they responded, “Coffee keeps me awake,” it wouldn’t occur to the assistant to consider the time of day.
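To make the limitation concrete, here is a minimal sketch of a hand-labeled decision-tree pathway. It is an illustration only, not how Alexa is implemented: the node names, prompts and the `next_node` helper are all hypothetical. The tree only advances when the reply matches a label someone anticipated in advance.

```python
# Hypothetical hand-labeled dialogue tree (illustrative names and prompts).
# Each node lists the replies its author anticipated and where they lead.
TREE = {
    "offer_coffee": {
        "prompt": "Would you like to order a cup of coffee?",
        "replies": {"yes": "confirm_order", "no": "goodbye"},
    },
    "confirm_order": {"prompt": "Okay, ordering your coffee.", "replies": {}},
    "goodbye": {"prompt": "Okay, maybe next time.", "replies": {}},
}


def next_node(node, utterance):
    """Return the next node name, or None if the reply was not anticipated."""
    reply = utterance.strip().lower().rstrip(".!?")
    return TREE[node]["replies"].get(reply)


print(next_node("offer_coffee", "Yes"))
print(next_node("offer_coffee", "Coffee keeps me awake"))
```

The first call matches a labeled reply and moves on; the second falls outside the tree, so the assistant has nothing to go on, which is the gap that intent understanding is meant to close.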

Today’s conversation pathways are multi-turn and complex, but still haven’t reached the level of real human conversations, which Isbitski describes in AI terms as “multi-session and ambiguous.”

At the re:MARS conference in Las Vegas this June, Amazon unveiled Alexa Conversations, a deep learning-based approach to building Alexa skills with multi-turn dialogue that can interconnect with other Alexa skills on the platform, thereby shrinking the code it takes to create a voice app from 5,500 lines to 1,700.

4. Turning voice experiences into interactive stories

At VOICE, Isbitski announced the launch of the Amazon Skill Flow Builder, which enables brands to create story-based game skills faster, including interactive fiction, branching narratives and role-playing games. The interface is designed for content creators who don’t code, enabling them to design story trees structured as a series of connected scenes. Writers work directly in the editor and can test their stories themselves instead of having to go through IT.
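For readers curious what “a series of connected scenes” might look like as data, the sketch below models a tiny branching story in plain Python. It does not use or reproduce Skill Flow Builder’s own format; the `Scene` class, the `STORY` contents and the `play` helper are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """One scene: narration plus the spoken choices that lead elsewhere."""
    text: str
    choices: dict = field(default_factory=dict)  # spoken choice -> next scene name


# Hypothetical three-scene story tree.
STORY = {
    "start": Scene("You reach a fork in the road.", {"left": "forest", "right": "river"}),
    "forest": Scene("The forest is quiet. The end."),
    "river": Scene("You cross the river. The end."),
}


def play(story, start, spoken_choices):
    """Follow each spoken choice from scene to scene; return the narration heard."""
    name = start
    narration = [story[name].text]
    for choice in spoken_choices:
        nxt = story[name].choices.get(choice)
        if nxt is None:
            break  # unrecognized choice, or the scene is an ending
        name = nxt
        narration.append(story[name].text)
    return narration


print(play(STORY, "start", ["left"]))
```

Because each scene only names the scenes it connects to, a writer could add a branch by adding one entry, without touching any traversal logic.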