Voice-User Interface

A war for the family room has begun, and businesses intend to win it. But winning the war may not be as easy as it seems; in fact, few businesses are ready to begin the fight.

Growing numbers of organizations are using natural language processing, the technology that underlies chatbots as well as voice-enabled devices such as Amazon Echo and Google Home, to create a more natural and engaging interface with their customers and employees.

My colleague Jason Snook recently blogged about the race by businesses to create voice user interfaces (VUIs) that truly engage customers. He noted that 6.5 million voice-enabled home devices were shipped last year, and that analysts expect the number to nearly quadruple this year.

Chatbots, an early form of these more conversational modes of interaction, are also becoming more prevalent; they can be found on websites, on mobile phones, and even in Facebook's Messenger application.

Businesses view natural language processing and the technologies that leverage it as a way to reduce barriers between them and consumers. Many also believe that by leveraging technologies developed and maintained by larger companies such as Amazon and Google, they can reduce the costs of application and website development.

Unfortunately, it isn't that easy. There are three complicating factors that businesses must contend with before they can fight effectively for a place in the consumer's family room.

1. Customer experience/user experience (CX/UX).

The screen is getting smaller - from website to phone, and phone to watch, and finally to voice, where there is no screen. This has created a perception that the need for CX/UX design is also shrinking. On the contrary, as you lose visual cues and begin to use newer technologies that people have little experience with, you need more thoughtful interaction design.

One reason is that the lack of visual cues and physical navigation shortens the consumer's time to frustration. When a user asks Siri or Alexa a question and "she" misinterprets it, the user might repeat the question once or twice before giving up and turning to the web or a mobile app. Experience mapping helps you anticipate customer/user behavior.

The potential to worsen the relationship with the customer, instead of enhancing it, is very real. A bank that wants to allow customers to use VUIs to conduct basic transactions first must know how customers will use these interfaces. To transfer money from a savings account to a checking account, will the customer say, "I want to transfer money to checking," or "Transfer to checking," or "Transfer money," or "Transfer from savings," or some other utterance?

To understand what people will do when interacting with a voice-enabled device or a chatbot, it's helpful to observe and listen as people actually use VUIs to carry out transactions and request information. Knowing which utterances they are most likely to use will help businesses develop VUIs that connect with customers.

In the case of the bank, experience mapping will help the bank make decisions about what to do when the customer doesn't specify which account the money should be transferred to or from. And that will prevent the bank from asking customers to restate their requests in varied ways until they happen to alight on the precise phrasing that the VUI understands.
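As a rough sketch of what that looks like in practice, the utterance variants above can all be routed to a single "transfer" intent with optional account slots. The intent name, slot names, and patterns below are illustrative assumptions, not any real VUI platform's API:

```python
# Hypothetical sketch: map the utterance variants surfaced by experience
# mapping to one "TransferMoney" intent with optional account slots.
# The pattern, intent name, and slot names are illustrative assumptions.
import re
from typing import Optional

TRANSFER_PATTERN = re.compile(
    r"(?:i want to )?transfer(?: money)?"
    r"(?: from (?P<src>\w+))?(?: to (?P<dst>\w+))?$"
)

def parse_transfer(utterance: str) -> Optional[dict]:
    """Return the intent and any slots, or None if the utterance doesn't match."""
    text = utterance.lower().strip().rstrip(".")
    m = TRANSFER_PATTERN.match(text)
    if not m:
        return None
    # A missing slot (None) signals that a clarifying question is needed.
    return {"intent": "TransferMoney", "from": m.group("src"), "to": m.group("dst")}

for phrase in ("I want to transfer money to checking", "Transfer to checking",
               "Transfer money", "Transfer from savings"):
    print(phrase, "->", parse_transfer(phrase))
```

Each of the four phrasings resolves to the same intent; when a slot comes back empty, the VUI knows to ask which account the customer meant instead of forcing a rephrase.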

Another potential source of frustration worth mentioning: lack of transparency. If you're using chatbots to answer customers' questions, it's important that you be honest about that. When customers are misled, perceptions of your business quickly deteriorate. Once trust is lost, it's always hard to rebuild.

2. Services.

Most businesses already have developed many services for their web and mobile applications. Some assume that this means they already have what they need to leverage the voice channel. That may be true, but a good deal of service mapping is required at the middle tier.

That mapping will typically require additional services, because you will need to create more generic services that route requests to the finer-grained existing ones.

The reason for this is that people tend to ask questions that are more general than existing services may be able to answer. When that happens, the new services will need to handle these requests, decide on the appropriate action and, in many cases, aggregate several finer-grained services. For example, a bank customer who has multiple checking accounts may say, "Give me my checking account balance," but the bank's existing service won't be able to satisfy the request because it doesn't know which account the customer has in mind. What is the appropriate action to take? Should Alexa read back the balances for all accounts, or should Alexa ask a clarifying question? If the request is coming from a chatbot, does this increase the number of accounts the bank is willing to aggregate into a single response? New services are required to support these decisions as well as to make the multiple calls needed for aggregation.
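A minimal sketch of such a coarse-grained middle-tier service might look like the following. The account data, account IDs, and per-channel policy are invented for illustration; they stand in for a bank's real fine-grained services:

```python
# Hypothetical middle-tier sketch: a coarse-grained "balance" service that
# aggregates fine-grained per-account lookups and decides when to ask a
# clarifying question instead. All names and data are illustrative
# assumptions, not a real banking API.

# Stand-in for the bank's existing fine-grained services, keyed by account id.
FINE_GRAINED_BALANCES = {
    "checking-1": 1250.00,
    "checking-2": 300.50,
    "checking-3": 75.25,
    "savings-1": 8000.00,
}

# Channel policy: how many accounts to aggregate into one spoken response.
MAX_ACCOUNTS_IN_ONE_RESPONSE = 2

def get_balance(customer_accounts, account_type, account_id=None):
    """Aggregate fine-grained balance calls, or request clarification."""
    matches = [a for a in customer_accounts if a.startswith(account_type)]
    if account_id:
        matches = [a for a in matches if a == account_id]
    if not matches:
        return {"action": "error", "message": f"No {account_type} account found."}
    if len(matches) > MAX_ACCOUNTS_IN_ONE_RESPONSE:
        # Too many accounts to read back: ask a clarifying question.
        return {"action": "clarify",
                "message": "Which account did you mean?",
                "options": matches}
    # Aggregate: one fine-grained call per matching account.
    return {"action": "answer",
            "balances": {a: FINE_GRAINED_BALANCES[a] for a in matches}}
```

Raising the aggregation limit for the chatbot channel, where a longer answer is easier to scan than to listen to, is then a one-line policy change rather than a new service.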

In short, businesses must be able to tie together the intent of the customer's voice (or text, in the case of a chatbot) with the appropriate services at the back end, so that each request is satisfied appropriately.

3. Artificial intelligence and machine learning for sentiment and intent detection.

You need to understand what your customers want and how they feel - before you present them with a new channel.

That's key, because the amount of time it takes for a customer to become frustrated with a VUI or chatbot is less than the amount of time it takes for the same person to become frustrated with a web or mobile app.

You need to be able to determine whether a person who is sending text to a chatbot, for example, is becoming frustrated. Extensive use of upper-case letters, exclamation points, and obscenities is a clue. And if frustration is detected, you need to know how you'll deal with it. Is it time to move the customer from a chatbot to a real person?
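The clues above can be turned into a simple escalation heuristic. This is an illustrative sketch only; the thresholds and word list are assumptions, and a production system would lean on a trained sentiment model rather than hand rules:

```python
# Illustrative heuristic: flag possible frustration in chatbot text using
# the clues mentioned above (heavy upper-case, exclamation points,
# obscenities). Thresholds and the word list are assumptions.

OBSCENITIES = {"damn", "hell"}  # placeholder list

def looks_frustrated(message: str) -> bool:
    letters = [c for c in message if c.isalpha()]
    caps_ratio = (sum(c.isupper() for c in letters) / len(letters)
                  if letters else 0.0)
    exclamations = message.count("!")
    has_obscenity = any(word.strip("!?.,").lower() in OBSCENITIES
                        for word in message.split())
    # Escalate to a human agent if any strong signal fires.
    return caps_ratio > 0.7 or exclamations >= 2 or has_obscenity

print(looks_frustrated("THIS IS NOT WORKING!!"))  # True
print(looks_frustrated("What is my balance?"))    # False
```

When the function returns True, the conversation can be handed off to a live agent before the customer gives up on the channel entirely.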

These and other questions about consumer sentiment need to be explored before VUIs and chatbots are developed, so that you truly remove barriers rather than create new ones.

A related issue involves intent. You'll need to establish a way to determine whether voice-enabled devices and chatbots truly understand and deliver what consumers are asking for. If you find, for example, that 40 percent of voice requests aren't being answered, you will need to evaluate the unsatisfied requests, identify patterns, and develop appropriate solutions.
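Measuring this requires logging each request and whether it was satisfied. The sketch below assumes a hypothetical interaction log format; it computes the failure rate and surfaces the most common failing phrasings so patterns can be reviewed:

```python
# Hedged sketch: compute the share of unsatisfied requests from a
# hypothetical interaction log and surface the most frequent failing
# utterances. The log format and data are illustrative assumptions.
from collections import Counter

log = [
    {"utterance": "transfer to checking", "satisfied": True},
    {"utterance": "move my money", "satisfied": False},
    {"utterance": "move my money", "satisfied": False},
    {"utterance": "what's my balance", "satisfied": True},
    {"utterance": "send cash to my kid", "satisfied": False},
]

unsatisfied = [r["utterance"] for r in log if not r["satisfied"]]
failure_rate = len(unsatisfied) / len(log)
patterns = Counter(unsatisfied).most_common()

print(f"Failure rate: {failure_rate:.0%}")
print("Top failing utterances:", patterns)
```

A repeated failing phrasing like "move my money" points to a missing utterance mapping, while one-off failures may simply be out of scope.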

Many businesses are forging ahead with proofs of concept for natural language processing. Dealing with the three complicating factors I've noted will help them develop successful proofs of concept and prepare effectively to join the war for the family room.