Voice assistants are all over the news lately. Everybody who is anybody is starting to experiment with them, from news websites to e-commerce. But why, and what does the future hold?

Why voice assistants?

Ever since the beginning of the computer age, we have mostly depended on interfaces that require a screen. That was true for the console, for the first personal computers, and later for tablets and smartphones. Screens have an inherent problem, though: complex tasks and flows are hard to structure and design. There’s even a whole profession (User Experience, or UX for short) dedicated to creating good user interfaces. But apart from our hands, we have another big tool in our arsenal: our voice. We can express a lot with it, and it looks like we’re rapidly moving toward a world controlled by voice rather than by screen interaction alone.

Why now?

The question that comes up is: why now? There are multiple answers, but the big one is probably that we can handle vast amounts of data a lot better than before. Couple that with bigger and faster storage and you have the perfect conditions to learn from that data and to query it with spoken questions. Having played around a bit with the Amazon Echo Dot and Alexa, I find it apparent that the technology is still in the first phases of being useful.

I gave a presentation about creating Alexa Skills a while ago (you can find it here), explaining how to create them and what the options and limitations are. What is clear, though, is that voice assistants have huge potential to be groundbreaking in many ways, but that also depends on the following factors.

1. Ecosystem

One of those deciding factors is the ecosystem around the voice assistant. Amazon does this really well, especially with the way they are trying to get more devices hooked up through their AmazonBasics line. The hard thing in the home space is that a lot of voice assistants are dependent on others, for example hardware manufacturers, who create the integrations necessary to hook everything up to your assistant of choice. From the perspective of a hardware manufacturer, I’m curious what Samsung will do with Bixby, because they, like Amazon, have all the other hardware (TVs, fridges, microwaves, vacuum cleaners) to create an ecosystem of smart appliances hooked up to a particular voice assistant.

2. Dialogue, context and emotion

The hard thing to get right for all voice assistants is dialogue and context. A simple question like ‘What is the weather like in Los Angeles?’ is pretty easy to grasp, but a follow-up like ‘And in San Francisco?’ is already harder, because the assistant has to remember that the conversation is about the weather. Though some manufacturers are starting to crack that formula, it’s still a long way from real dialogue (or even things like small talk).
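To make the follow-up problem concrete, here is a minimal sketch of how context could be carried from one turn to the next. It is a toy, not how any real assistant works: the `Turn` class, the `interpret` function and the regular expressions are all made up for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """What the (hypothetical) assistant understood from one utterance."""
    intent: str
    city: str


def interpret(utterance: str, previous: Optional[Turn]) -> Optional[Turn]:
    """Toy interpreter: handles a full weather question or a short follow-up."""
    text = utterance.strip().rstrip("?").strip()

    # Full question: the intent is stated explicitly in the sentence.
    full = re.search(r"weather.* in (.+)$", text, flags=re.IGNORECASE)
    if full:
        return Turn(intent="get_weather", city=full.group(1))

    # Follow-up such as "And in San Francisco": the intent is missing,
    # so it has to be borrowed from the previous turn.
    follow_up = re.match(r"and in (.+)$", text, flags=re.IGNORECASE)
    if follow_up and previous is not None:
        return Turn(intent=previous.intent, city=follow_up.group(1))

    # Without context, the follow-up cannot be understood.
    return None


first = interpret("What is the weather like in Los Angeles?", None)
second = interpret("And in San Francisco?", first)
print(first)
print(second)
```

The follow-up only makes sense because `previous` is passed along; call `interpret` with `None` as the second argument and the same sentence is no longer understood. Real dialogue is much harder than this, since the context can be several turns old or only implied.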

Real dialogue between two people also carries small signs of emotion, like excitement or anger. For people these are pretty ‘easy’ to distinguish, but voice assistants really have a hard time getting them right. A lot of people have seen the demo Google did with Duplex, in which a voice assistant made a reservation and even used small interjections like ‘hm…’. These kinds of small signals make a dialogue feel more real.

3. Complex decision making

If we look at the application of voice assistants in, for example, e-commerce, we see that some businesses are already able to add simple products to your shopping cart. The hard part is the more complex buying decisions. Say you want to buy a laptop: there are thousands of options, ranging from cheap to expensive, with all kinds of different configurations. In a lot of the podcasts I’ve done, the complexity usually turns out to be in this area, and it is also what stops the rollout of voice interfaces in its tracks.

Another example of why decision making is hard comes from entertainment. Say you feel like watching something on Netflix that you haven’t seen yet, say, a comedy with a dark twist. Basing that decision on an algorithm doesn’t always give the right result (we already see that in normal interfaces).

This is where the interfaces and websites we use today come in handy, because you can just apply filters. Applying filters with your voice is a lot tougher, and if you want to see what something looks like, you can’t without a screen. That’s why the companies pushing for voice dominance often also provide a hardware option that combines voice with a screen. Maybe in the future we will have screens or holographic projections that we can interact with in combination with our voice, creating an even easier experience.
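As a rough illustration of why filtering by voice is tougher, the sketch below turns a spoken sentence into the kind of filters you would normally click on a website. Everything in it is an assumption made up for the example: the tiny catalog, the prices, the phrases it recognises and the function names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Laptop:
    name: str
    price: int   # made-up price in euros
    ram_gb: int  # made-up memory size


# Hypothetical three-item catalog; a real shop would have thousands.
CATALOG = [
    Laptop("Budget 14", 450, 8),
    Laptop("Allround 15", 800, 16),
    Laptop("Pro 16", 1900, 32),
]


def filters_from_utterance(utterance: str) -> dict:
    """Pick out the two phrasings this toy understands; ignore the rest."""
    filters = {}
    price = re.search(r"under (\d+)", utterance, flags=re.IGNORECASE)
    if price:
        filters["max_price"] = int(price.group(1))
    ram = re.search(r"at least (\d+) ?gb", utterance, flags=re.IGNORECASE)
    if ram:
        filters["min_ram_gb"] = int(ram.group(1))
    return filters


def apply_filters(catalog, filters):
    """Keep only the items that satisfy every recognised filter."""
    return [
        item for item in catalog
        if item.price < filters.get("max_price", float("inf"))
        and item.ram_gb >= filters.get("min_ram_gb", 0)
    ]


spoken = "Show me laptops under 1000 euros with at least 16 GB of memory"
for laptop in apply_filters(CATALOG, filters_from_utterance(spoken)):
    print(laptop.name)
```

The sketch only works because the sentence happens to use the exact phrasings the patterns expect. Say ‘cheaper than a thousand’ instead and the price filter silently disappears, which is exactly the kind of gap a screen with checkboxes never has.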

4. Language support

What language do you speak? Chances are the support is not there yet. At the time of writing, most major languages (English, Spanish, etc.) are supported. But if I look at the Netherlands, support for Dutch is coming to Google Home while other voice assistants are still behind. The only exception is maybe Siri, which does support Dutch. Though language support keeps getting better, the other challenge here is dialect. I recently saw a good video about this, in which multiple voice assistants were tested with dialects. Most of them did a good job of understanding what the user was saying (even with errors), but others failed in dramatic fashion. The future will tell how this support will come along, but for wider adoption, supporting more and more languages is a must.

Pros and cons

Drawbacks and factors aside, voice assistants are a great idea, because the voice is a powerful tool. You can condense a lot of information into just a few sentences. The AmazonBasics Microwave, for example, can even help you prepare simple foods. Take heating a potato: by saying ‘Alexa, microwave one potato’, you have it automatically heat the potato at the right wattage and for the right time. It is a very basic example and it sounds trivial, but imagine trying to fit every type of food into an interface on your microwave. In these situations, language and voice really shine. But there are other concerns.
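To show how much a single sentence can carry, here is a small sketch of how a command like the potato one could be mapped to settings. This is not how Alexa or the AmazonBasics Microwave actually work: the preset table, the wattages and times in it, and the idea of simply multiplying the time by the number of items are all invented for the illustration.

```python
import re

# Invented presets: food -> (watts, seconds per item). Not real cooking advice.
PRESETS = {"potato": (900, 300), "popcorn": (800, 150)}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}


def parse_microwave_command(utterance: str):
    """Turn 'Alexa, microwave one potato' into settings, or None if unclear."""
    match = re.search(r"microwave (\w+) (\w+)", utterance.lower())
    if not match:
        return None
    count_word, food = match.groups()
    count = NUMBER_WORDS.get(count_word)
    if count is None:
        return None

    # Accept simple plurals such as 'potatoes' by also trying shorter forms.
    candidates = [food]
    if food.endswith("es"):
        candidates.append(food[:-2])
    if food.endswith("s"):
        candidates.append(food[:-1])

    for candidate in candidates:
        if candidate in PRESETS:
            watts, seconds_per_item = PRESETS[candidate]
            # Assumption: cooking time scales linearly with the item count.
            return {"food": candidate, "watts": watts,
                    "seconds": seconds_per_item * count}
    return None


print(parse_microwave_command("Alexa, microwave one potato"))
```

One short sentence fills in the food, the quantity, the wattage and the time at once. Doing the same with buttons would mean a menu entry for every food in the table.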

Google Duplex is a project that has attracted a lot of attention lately. It showed that in a lot of cases we are already at the point where a machine can make a complete reservation without the human on the other end noticing. Google has also announced that it will be rolling out in a limited public beta for Google Pixel owners. A lot of people would sign up for the idea of a voice assistant making their life easier. Having your life arranged around you so you don’t need to think about it is really powerful, especially for people who are bad at multitasking. But it does raise some concerns, mostly in the ethical realm. Does a voice assistant that is almost human in its interaction identify itself as an assistant? And do we really want situations like these to be handled by a voice assistant?

My opinion

My opinion is that, of course, time will tell. Like everything, it all depends on the rules and regulations that are going to be set in place. What you do see is that it takes a long time to get those set up, and technology is not waiting for regulators to catch up. Next to the regulations there is also the never-ending discussion about data. Whose data is the data you are supplying to your assistant? Take the tech giants like Google and Amazon. If I’m not paying for all this data being available (outside of the purchase of appliances), what is the catch? It’s like the age-old saying: “If you’re not paying for the product, you are the product”.

Definitely a lot of pros and cons, but what about the future? Does it even have a future or is it just a fad that will die down as the hype starts to build for something new?

The future of voice assistants

There is a lot to be said for voice assistants, and they have come a long way in a few years. Of course they can feel like a gimmick, as a lot of early innovations do. For me personally, it’s still awkward to talk to something that isn’t alive. But if I look around me, the next generation is picking up the interface quickly. Like a sponge they just soak it up, using it for all kinds of tasks.

I think that our voice is maybe the strongest tool we have as human beings. Just think about how many interactions we have using our voice every day in the real world. What if we could take that same tool into a digital space where the possibilities are almost endless? That would be game-changing in every way. One thing is for sure: voice assistants are here, and they are here to stay.
