
Amazon Alexa and Echo Chief Evangelist on the Future of Voice Assistants

Voice should be about speed, reducing effort and telling stories

Kindra Cooper
08/14/2019

Few companies can truly claim to offer an omnichannel customer experience. It’s a nebulous concept espoused by thought leaders as the gold standard to which all businesses should aspire. For a relatable, real-life example of what omnichannel looks like, look no further than Amazon Alexa. 

What started as a single device – a voice assistant – has evolved along three fronts. First, it has become a digital marketplace where brands and developers build and share skills, turning Alexa’s developer platform into a testing ground for voice-assistant machine learning: every interaction, across every device and every Alexa skill, enriches its deep learning capabilities.

Second, the device is personalized, synced with your purchase history and credentials on Amazon.com. Finally, Alexa has evolved into an operating system of sorts, licensed by third-party consumer electronics manufacturers to turn their smart home appliances, cameras, computers, headphones and even cars into voice-enabled devices.

At the VOICE Summit in Newark, Dave Isbitski, chief evangelist for Amazon’s Alexa and Echo, discussed what’s next for voice.

1. Voice is about speed and reducing friction 

As brands start to explore voice as the next self-service interface, they’re recognizing its power to reduce friction for transactions, deliver a personal touch and empower their customers to get things done faster. 

“If you’re going to create a conversation with your customer and you already have good engagement on your mobile app, don’t just create an Alexa skill because you can,” said Isbitski. “You want to create something that’s going to be faster to do.” 

Browsing and general information queries are not suitable for voice, he warned. If you’ve ever asked “Alexa, what is…?” and listened to her dryly recite a Wikipedia entry, you understand the futility of it.

When a user searches for a product, there also needs to be a follow-up conversation pathway to complete the purchase, such as having links to relevant products sent via email or SMS, or requesting a price comparison with other e-commerce sites.

“You have the ability to switch between all these different modalities, but you should base it on what is the fastest way you can get that information,” Isbitski added.
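For concreteness, here is what a “fast path” skill turn might look like using the Alexa Skills Kit SDK for Python. The intent name, slot name and the offer to text a link are hypothetical stand-ins for a brand’s own catalog and messaging integration; the point is that the response is a short answer plus a follow-up pathway, not a recitation.

```python
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class ProductLookupHandler(AbstractRequestHandler):
    """Answers a product query quickly, then offers a follow-up channel."""

    def can_handle(self, handler_input):
        # "ProductLookupIntent" is a hypothetical intent name.
        return is_intent_name("ProductLookupIntent")(handler_input)

    def handle(self, handler_input):
        slots = handler_input.request_envelope.request.intent.slots
        product = slots["product"].value  # hypothetical slot name

        # Speak a short answer rather than reading out a full listing,
        # and keep the session open so the user can complete the task.
        speech = (f"The {product} you ordered last month is in stock. "
                  "Want me to text you a link to reorder it?")
        return handler_input.response_builder.speak(speech).ask(
            "Should I send you the link?").response


sb = SkillBuilder()
sb.add_request_handler(ProductLookupHandler())
handler = sb.lambda_handler()  # entry point for AWS Lambda
```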

2. The focus should always be on completing tasks

Most of the tasks handled by voice assistants can be done on mobile – it’s simply a faster experience with voice. Logging medication intake, reordering milk, activating or deactivating a smart lock, transferring money and tracking calories are all activities that can be done with one simple command.

However, the problem is that to do each of these tasks, the user still has to specify which Alexa skill to use. Ideally, said Isbitski, the AI would get smarter over time, remembering which music streaming service you intend when you say, “Alexa, play classical music,” or that your preferred banking service is Capital One after you’ve used voice to wire money.

Read more: Samsung VP of Research & Development on Voice as the Next UI

“It needs to understand the person, their preferences and the things that are happening over time,” he said. “It’s also multimodal, so whether it’s happening on my mobile device, in my car or on my TV or computer, it needs to understand that.” 

Customer data on Alexa use across a range of devices should contribute to maturing the machine learning capabilities for the platform as a whole.
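Skill developers can already approximate this kind of preference memory with the SDK’s persistent attributes. Below is a minimal sketch using the Python ASK SDK with a DynamoDB-backed store; the intent name, attribute key and table name are illustrative, not part of any Amazon default.

```python
from ask_sdk_core.skill_builder import CustomSkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name
from ask_sdk_dynamodb.adapter import DynamoDbAdapter


class PlayMusicHandler(AbstractRequestHandler):
    """Remembers which streaming service the user picked last time."""

    def can_handle(self, handler_input):
        return is_intent_name("PlayMusicIntent")(handler_input)  # hypothetical

    def handle(self, handler_input):
        mgr = handler_input.attributes_manager
        attrs = mgr.persistent_attributes
        # Fall back to a placeholder on first use; a real skill would
        # ask the user once and store their answer here.
        service = attrs.get("preferred_music_service", "your default service")
        attrs["preferred_music_service"] = service
        mgr.save_persistent_attributes()
        return handler_input.response_builder.speak(
            f"Playing classical music on {service}.").response


# Persist attributes across sessions in a DynamoDB table.
sb = CustomSkillBuilder(
    persistence_adapter=DynamoDbAdapter(table_name="UserPrefs",
                                        create_table=True))
sb.add_request_handler(PlayMusicHandler())
handler = sb.lambda_handler()
```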

3. Building branded experiences and conversations

The main issue on engineers’ minds is teaching voice assistants to understand intent. Traditionally, conversation pathways were built like corporate decision trees with “a lot of manual labeling and effort,” and the assistant could only understand predictable human responses.

For instance, if it asked a user if they wanted to order a cup of coffee, and they responded, “Coffee keeps me awake,” it wouldn’t occur to the assistant to consider the time of day – an implied “yes, please” in the morning, an implied “no, thanks” at night.

Today’s conversation pathways are multi-turn and complex, but still haven’t reached the level of real human conversations, which Isbitski describes in AI terms as “multi-session and ambiguous.”
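The limitation is easy to see in code. A hand-built dialogue turn like the sketch below (using the real built-in AMAZON.YesIntent and AMAZON.FallbackIntent in a hypothetical coffee skill) only matches the answers the developer anticipated; a reply like “Coffee keeps me awake” falls through to a generic fallback instead of being read as a refusal.

```python
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name


class CoffeeYesHandler(AbstractRequestHandler):
    """Handles only the answers the decision tree anticipated."""

    def can_handle(self, handler_input):
        return is_intent_name("AMAZON.YesIntent")(handler_input)

    def handle(self, handler_input):
        return handler_input.response_builder.speak(
            "Great, one coffee coming up.").response


class FallbackHandler(AbstractRequestHandler):
    """Anything unanticipated, like 'Coffee keeps me awake', lands here."""

    def can_handle(self, handler_input):
        return is_intent_name("AMAZON.FallbackIntent")(handler_input)

    def handle(self, handler_input):
        # The tree has no notion of time of day or implied refusal,
        # so all it can do is re-ask the scripted question.
        return handler_input.response_builder.speak(
            "Sorry, I didn't get that. Did you want a coffee?").ask(
            "Did you want a coffee?").response
```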

At the re:MARS conference in Las Vegas this June, Amazon unveiled Alexa Conversations, a deep learning-based way to build Alexa skills with multi-turn dialogue that can interconnect with other Alexa skills on the platform, shrinking the code it takes to create a voice app from roughly 5,500 lines to 1,700.

Read more: Google's Cathy Pearl on 4 Unexpected Use Cases for Voice

On the user side, it facilitates an omnichannel experience. Say your brand designs a skill for ordering movie tickets; you can set up Alexa to make follow-up suggestions based on the inference that someone who orders a movie ticket might also be planning for a night out. Alexa might then suggest the OpenTable skill to make dinner reservations or the Uber skill to reserve a ride.

4. Turning voice experiences into interactive stories

At VOICE, Isbitski announced the launch of the Amazon Skill Flow Builder, which enables brands to create story-based game skills faster, including interactive fiction, branching narratives and role-playing games. The interface is designed for content creators who don’t code, enabling them to design story trees structured as a series of connected scenes. Writers work directly in the editor and can test their work themselves instead of having to go through IT.
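Skill Flow Builder has its own text-based scene format; purely to illustrate the underlying idea of a story tree of connected scenes (and not Skill Flow Builder’s actual syntax), a branching narrative reduces to something like this toy Python sketch, with all scene names and text invented:

```python
# A toy branching narrative: each scene has narration and the
# choices that connect it to other scenes.
SCENES = {
    "start": {
        "say": "You wake in a dark forest. Go left or right?",
        "choices": {"left": "river", "right": "cave"},
    },
    "river": {
        "say": "You reach a river. The end.",
        "choices": {},
    },
    "cave": {
        "say": "A dragon sleeps in the cave. The end.",
        "choices": {},
    },
}


def play(scene_name, answers):
    """Walk the story tree with a fixed list of user answers."""
    scene = SCENES[scene_name]
    print(scene["say"])
    for answer in answers:
        next_scene = scene["choices"].get(answer)
        if next_scene is None:
            break
        scene = SCENES[next_scene]
        print(scene["say"])


play("start", ["left"])
```

A tool like Skill Flow Builder essentially lets writers author that graph in plain text and compiles it into a working skill, which is why no trip through IT is required.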
