Chris Maury on voice-first design
The O’Reilly Design Podcast: Designing conversational experiences.
In this week’s Design Podcast, I sit down with designer Chris Maury. Maury is the founder of Conversant Labs, working on projects intended to help improve the lives of the blind. We talk about designing for the blind (as he loses his sight), how chatbots might just make us better listeners, and principles for designing the best conversational UIs.
Here are a few highlights from our conversation:
Designing for voice
I found out that I was going blind. I was diagnosed with a genetic disorder called Stargardt macular degeneration. I was going to lose my central vision, and I am currently losing it, over the course of 10 years or so. That was about four years ago.
Throughout this entire process, I started looking at the tools and the technologies that were available to the blind community to use, or the tools that I would be using to maintain my standard of living and keep being productive, and was really disappointed with the quality of those tools. I set out to try and build a better experience. That’s what got us started with Conversant Labs.
The realization that we had is the way that accessibility tools, especially for the blind, are constructed is fundamentally flawed. The core technology is called a screen reader. It takes what’s displayed visually on the screen and then it reads that aloud to you. All of the work that goes into designing and optimizing this visual experience has been, really, thrown out the door and forced into this single-dimensional audio stream for people who can’t see.
When you think of who the blind population is as a whole, the vast majority of people who are losing their vision are losing it from aging-related disorders. They have trouble with email, let alone trying to navigate an email inbox with a keyboard and moving this cursor that's then reading each item out individually. Rather than follow this model, we thought: what if we built applications for audio first? What would that look like? How would you interact with that? We got to this point of voice and conversation being the best way to do that. It's a much more natural experience. You talk to the product and the product speaks back to you. That's where the name of the company comes from, Conversant Labs. Our goal is to create a world where everything that I can do on the smartphone that's in my pocket, I should be able to do non-visually, without taking that phone out of my pocket.
The first thing that we did moving down this path was build a fully conversational shopping application, in partnership with Target, called SayShopping. It allows the blind and visually impaired, really anyone, to search for products, get reviews, compare those products, and then purchase them, all with your voice. It was the first app that ever allowed you to buy something with your voice. It was received really well, and we learned a lot about what it takes to build a voice-first, and a primarily non-visual, application and the different design challenges there. We've taken those learnings and are trying to make that available to everyone else through developer tools. We're releasing an SDK for Apple products so you can add voice to existing applications.
We're also looking at design tools, so that a designer can go in and build out a very rough version of the conversational interaction, then be able to sit down with a user, have them go through it, and test it. How does someone express what they want? For a search, for example, how does someone express searching for a product? Are you going to say, "Search for a toothbrush," or, "I'm looking for a toothbrush," or, "I need a toothbrush," or, "I ran out of toilet paper"? There are all of these different ways of expressing the idea of wanting to search for something that you might not necessarily think of, and you want to be able to get to those at the design phase, rather than once the voice app is out in the wild.
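To illustrate the kind of utterance variation Maury describes, here is a minimal sketch, not Conversant Labs' actual tooling, of how several phrasings can be mapped to a single "search" intent so they can be collected and tested during the design phase. The patterns and intent name are assumptions made for the example.

```python
import re

# A few phrasing patterns that all map to the same "search" intent.
# These patterns are illustrative assumptions, not part of any real SDK.
SEARCH_PATTERNS = [
    r"search for (?P<item>.+)",
    r"i(?:'m| am) looking for (?P<item>.+)",
    r"i need (?P<item>.+)",
    r"i ran out of (?P<item>.+)",
]

def parse_utterance(utterance: str):
    """Return an (intent, slots) pair, or (None, {}) if nothing matches."""
    text = utterance.strip().lower().rstrip(".!?")
    for pattern in SEARCH_PATTERNS:
        match = re.fullmatch(pattern, text)
        if match:
            return "search", {"item": match.group("item")}
    return None, {}

# All of these resolve to the same intent despite very different wording.
for phrase in ["Search for a toothbrush",
               "I'm looking for a toothbrush",
               "I need a toothbrush",
               "I ran out of toilet paper"]:
    print(phrase, "->", parse_utterance(phrase))
```

In a design tool like the one described, the interesting output is the set of user phrasings that fall through to `(None, {})`; those are the expressions the team did not anticipate.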
Deeper insights into designs, problems, and tools
It's definitely given me something to focus on. When I was in the Bay Area, I knew I wanted to start a company, and I had all these side projects I was working on, but I could never focus on something for more than a month or two before wanting to switch to the new, shiny idea that I had. It's definitely helped me to focus. In terms of the tools that I use, it's been a mixed bag. I do use a screen reader every day; I use the free, open source one called NVDA, with a mouse. I have a giant monitor with the colors inverted and the font blown up pretty big. Then I use the mouse, and whatever the mouse highlights, that text is spoken aloud. It works pretty well.
I think I'm as productive as I was before. I do a lot less mindless web browsing than I did before, because it's harder to navigate individual web pages, but when I do find something, I read so much more now because I've trained myself to listen to books at a much faster rate, since those books are being read to me using text-to-speech. I can listen to those at 650 words a minute.
It's one of these things… it's helped me to read a lot, but I think the realization I've had, going back to this question of how it's changed the way that I think about designs and problems and tools, is that yes, it's a disability in a sense, but in a broader sense, it's just a different way of consuming information, and there are pluses and minuses to that different context of interaction with a product. Yes, someone who has a visual impairment can't see the screen, and they're going to have to interact with it audibly. What impact does that have? Can you create the same level of intuition, and as efficient an experience, through audio as you did visually? When you think about things like being able to listen to something at a much faster rate, you can potentially have experiences that are even more efficient or more productive than what you would consider the normal or standard experience.
Lessons learned in voice design
The user should always know where they are. Being able to ask, "What can I say?" or "Where am I?" or "What can I do?" and those types of questions is really important. The other is talk time. Have the app speak to the user as little as possible, because the more it speaks, the more that feels like latency, like the loading time for a page. It just makes the app feel less responsive. The user should always be able to interrupt the app when it's speaking because, again, waiting for it to finish talking feels like an unresponsive application. Present only the most important information first, and then allow the user to ask for more detailed information. In the shopping example, that means saying just the title, the price, the star rating, and a brief description, and then allowing the user to prompt for reviews, product specifications, a more detailed description, and things like that.
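As a rough sketch of that last principle, here is a small, purely illustrative example, using made-up product fields rather than the SayShopping app's actual design, of a voice response that leads with only the most important details and defers the rest until the user asks.

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    title: str
    price: float
    stars: float
    summary: str
    reviews: list = field(default_factory=list)
    specs: dict = field(default_factory=dict)

def first_response(product: Product) -> str:
    """Lead with only the most important information, then offer follow-ups."""
    return (f"{product.title}, ${product.price:.2f}, rated {product.stars} stars. "
            f"{product.summary} You can ask for reviews, specifications, or more detail.")

def follow_up(product: Product, request: str) -> str:
    """Give more detail only when the user explicitly asks for it."""
    if "review" in request:
        return " ".join(product.reviews[:2])
    if "spec" in request:
        return ", ".join(f"{k}: {v}" for k, v in product.specs.items())
    return "You can ask for reviews or specifications."

toothbrush = Product(
    title="Electric toothbrush",
    price=39.99,
    stars=4.5,
    summary="A rechargeable brush with a two-minute timer.",
    reviews=["Works great.", "Battery lasts about two weeks."],
    specs={"battery life": "14 days", "brushing modes": "3"},
)
print(first_response(toothbrush))
print(follow_up(toothbrush, "read me the reviews"))
```

The same structure pairs naturally with the other principles: the short first response keeps talk time down, and because detail is delivered in small chunks on request, there is less for the user to interrupt.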