Voice assistants suck (empirically)

New research from legendary usability researchers the Nielsen Norman Group finds that voice assistants are basically a hot mess that people only use because they are marginally better than nothing.

The research report is anthropological/ethnographic rather than statistical: it follows a small number of voice assistant users to investigate their experiences with systems like Alexa, Siri and Google Assistant.

They find that voice assistants are used primarily when it's impossible to operate a device by hand, because your hands are busy (e.g. in the kitchen) or you're driving, and secondarily because mobile keyboards are even worse.

Nielsen Norman researchers Raluca Budiu and Page Laubheimer start by pointing out that voice assistants are trying for the holy grail of usability: "an interface that requires zero interaction cost." Alas, the services fall far short of this.

They point out that the systems struggle to transcribe the speech of non-native speakers, give inaccurate responses that are too long or too short, interrupt their users, sometimes vocalize sensitive personal information inappropriately, and don't tie in well to other systems around them.

The history of "AI" is littered with systems that were supposed to "just work," without requiring you to learn arcane commands, but which required their users to master all kinds of arcana and constrain their interactions in many ways. Voice assistants, the researchers find, are no better than these extinct evolutionary dead ends.

People knew that intelligent assistants are imperfect. So, even when the assistant provided an answer, they sometimes doubted that the answer is right – not knowing if the query was correctly understood in its entirety, or the assistant only matched part of it. As one user put it, “I don't trust that Siri will give me an answer that is good for me.”

For example, when asked for a recipe, Alexa provided a “top recipe” with the option for more. But it gave no information about what “top” meant and how the recipes were selected and ordered. Were these highly rated recipes? Recipes published by a reputed blog or cooking website? People had to trust the selections and ordering that Alexa made for them, without any supporting evidence in the form of ratings or number of reviews. Especially with Alexa, where users could not see the results and just listened to a list, the issue of how the list was assembled was important to several users.

However, even phone-based assistants elicited trust issues, even though they could use the screen for supporting evidence. For example, in one of the tasks, users asked Siri to find restaurants along the way to Moss Beach. Siri did return a list of restaurants with corresponding Yelp ratings (seemingly having answered the query), but there was no map to show that the restaurants did indeed satisfy the criterion specified by the user. Accessing the map with all the restaurants was also tedious: one had to pick a restaurant and click on its map; that map showed all the restaurants selected by Siri.

Intelligent Assistants Have Poor Usability: A User Study of Alexa, Google Assistant, and Siri [Raluca Budiu and Page Laubheimer/Nielsen Norman Group]