Given how relatively young these technologies are, many different applications are yet to emerge and be developed for business and consumers alike. Let's take a look at a few applications our teams created as well as some other possible uses for NLP.
- Reminders: Interfacing with your calendar, an NLP-based Assistant can remind you of your next calendar appointment.
- Information seeking: A user can ask questions like "How high is the Eiffel Tower?", or "When does the next train to NY Penn Station leave?"
- Device Control: Many Digital Assistants include the capability to control physical devices, e.g. "Switch off the light in the dining room."
- Hospitality: Having checked into the hotel room, a guest can directly request room service that would typically be handled by the front desk staff: "I would like to order a BLT burger for a dinner in my room at 6pm". Additionally, staff can use NLP systems for tasks like alerting the front desk that a room is clean.
- Content Management System: An NLP-based system can monitor, change and delete content in the CMS via verbal interaction. A restaurant publishing its menu via a smartphone app could control the content with requests such as, "Add Grilled Chicken Salad to the lunch menu for $6.99".
- Scrum Assistant: Each morning, a software development Scrum Team has a stand-up meeting where each team member gives a status update and states his/her plans for the day. A Scrum Assistant can support this meeting for instance by changing the status of the tasks assigned to a developer, which can be done by verbal commands. Monitoring the progress of the meeting, the Scrum Assistant can detect missing action items and prompt the participants verbally, e.g. "Hey John, can you tell me the status of your task 'Change the color of the CLOSE button to RED'?"
- Self-Serve Business Analytics: Currently, self-serve analytics are typically available in the form of pre-defined diagrams and dashboards. NLP can help business users to simply describe their desired information: "What were the car sales in our Denver downtown location from January 1st through July 31st?" and "Compare those results to the same time-period last year".
- Text parsing: The applications above all operate via verbal interactions. However, another important field of NLP applications is the intelligent translation of unstructured text into structured information. As an example, an NLP agent could data-mine resumes to create a database containing the technical expertise of all employees. Using aspects of the Self-Serve Business Analytics example given above, a user could then query the database: "Who is an expert in NODE.JS?" and obtain a list of those software engineers who can help him/her with that particular technology.
Things to Know to Develop Your Own Successful NLP-driven Application
One of the key findings we made during our tech challenge was that scripting an Alexa skill for the sake of a demo is simple. The real difficulty lies in establishing a voice and a personality: combining user-experience design with predictive analytics and machine learning to create compelling experiences that go beyond a simple call and response.
Several vendors like Amazon and Google have opened their NLP systems to outside developers, who can use and extend the NLP model. They provide web interfaces and APIs to set up and interact with the underlying NLP engine. And while it is initially relatively easy to create a system that responds to verbal requests, experimentation quickly pushed us to the limits of the current systems and showed us some pitfalls. Some of the problems can be avoided through clever implementation. However, some issues are inherent limitations of current NLP systems.
Challenges with currently available NLP systems
Automated Speech Recognition
- Background Noise: A vital step in the NLP processing chain is the conversion of speech into digitally interpreted words. While human listeners can focus on the "main speaker", NLP systems need background noise kept to a minimum. In an open-office environment, for example, the system tends to pick up all kinds of conversational fragments, which makes reliable speech recognition difficult and sometimes impossible. One obvious solution is to avoid environments where multiple conversations are going on simultaneously. Where that is not possible, a high-quality, directional near-range microphone can help to focus on the principal user's voice.
- Homophones: These are a big source of problems: Similar sounding words (to / two, four / for, ate / eight, steal / steel, rite / write / right / wright), and false homophones (three / free / tree, faults / false, Betsy / Etsy) are difficult to recognize and translate correctly. This problem can be compounded when non-native English speakers with an accent interact with the NLP system. For instance, the words "App" and "Up" can easily be confused when not pronounced exactly the way the system expects them (training can help, more about that below).
- User Clarity: Clear enunciation is important. As an example, our system always mixed up the words "Betsy" and "Etsy" when the leading "B" was not pronounced in an exaggerated way. Lazy-lipped speakers will encounter more problems than those who enunciate clearly. (As an interesting side-effect, we found that the NLP system can serve as an excellent language training tool for foreign speakers because it forces the users to pronounce words correctly and enunciate them clearly.)
- Names: Names are another problem area. The permutations of spelling a particular name and its variants are seemingly endless (e.g. Kristin / Kristen / Christine, Stephan / Stefan / Stefane). Consequently, it is very difficult for the speech recognition engine to find the correct version. Further, a name usually does not stand in a contextual relationship with any other word in the particular utterance. Hence, we found that achieving correct recognition of names is very difficult. If your application caters only to a specific, limited group of users, back-end logic combined with pre-loaded lists of names might alleviate some of the problems. For an application used by the general public, this is a serious problem for which we have not found a solution yet.
- Input ≠ Output: For some utterances, the same spoken input can produce different text outputs. For example, the words "One Two Three" could be transcribed as "One Two Three", "123", "1 2 3", or "One Hundred and Twenty-Three". As with names, some back-end logic might help.
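The "back-end logic" mentioned for names and for digit sequences can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the guest list, the function names `resolve_name` and `spoken_digits_to_string`, and the similarity cutoff are all hypothetical, and the digit helper deliberately handles only digit-by-digit transcripts. A form such as "One Hundred and Twenty-Three" would need a full number parser, and whether that form means the same thing as "1 2 3" is an application decision.

```python
import difflib

# Hypothetical pre-loaded list of names; a real application would load
# this from its own user or guest database.
KNOWN_NAMES = ["Kristin Meyer", "Stephan Lang", "Christine Dubois"]

# Spoken digit words mapped to their digit characters.
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def resolve_name(heard, known_names=KNOWN_NAMES, cutoff=0.75):
    """Return the known name closest to the transcribed one, or None."""
    lookup = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        heard.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


def spoken_digits_to_string(transcript):
    """Collapse a digit-by-digit transcript into one digit string.

    Returns None when any token is not a single digit word or a run of
    digits, so the caller can fall back to other handling.
    """
    digits = []
    for token in transcript.lower().split():
        if token.isdigit():
            digits.append(token)
        elif token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
        else:
            return None
    return "".join(digits) if digits else None


if __name__ == "__main__":
    print(resolve_name("Kristen Meyer"))
    print(spoken_digits_to_string("One Two Three"))
    print(spoken_digits_to_string("1 2 3"))
```

As noted above, the name lookup only helps when the set of users is limited and known in advance; it does nothing for an application open to the general public.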
Once a (hopefully accurate) recognition has been achieved, the NLP engine tries to detect the meaning of those words and map them to an appropriate "intent" (which is essentially a software function invoked by the system for a specific meaning). Here we cannot emphasize enough that proper planning and design are vital for the creation of a well-functioning system.
All the systems we have worked with require mapping sample utterances to intents, which in turn trigger appropriate responses or actions. We found that precise mapping of "sample utterances" (e.g. "List tasks", "List all tasks", "List the first five tasks" etc.) to intents makes it easier for the system to "understand" what we meant. Further, overlapping intents should be avoided. Taking the Scrum Assistant as an example, defining one intent for "List all items" and another for "List all tasks" causes problems because "tasks" are also "items", so the two intents overlap. The more clearly separated the intents are, the more accurately the NLP system will react.
One way to create a well-functioning system is to establish levels of hierarchies: Once a user has stated interest in a particular topic, specific follow-up questions and answers mapped to specific intents can lead to a satisfying system response. It is also useful to establish the context of any answer the users give, thus steering the system in the right direction.
As an example, examine our hospitality use case: the request for room service "I would like to order a dinner" should immediately establish the fact that all subsequent input in this particular conversation is about room service / dinner / food and only related intents should be eligible to be triggered. In this example, asking a question about bringing a toothbrush to the customer's room would not be appropriate. The systems we worked with allow setting "context" flags, which enable sophisticated flow control of the NLP system.
As mentioned above, sample utterances must be mapped to intents and actions to train the system. As you work with the NLP system and (manually) flag wrongly interpreted utterances, the engine becomes more accurate over time. We found that for consumer tasks, that is, tasks that were already defined in the engine such as information seeking ("How high is the Eiffel Tower?"), and that can draw on a large training set, the NLP engine worked almost flawlessly.
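One way to catch overlapping intents before they reach the NLP engine is a simple design-time check over the sample utterances. The following Python sketch is an assumption-laden illustration, not how any vendor's engine works: the intent names, the stop-word list, the `DOMAIN_SYNONYMS` table (where the designer records that "items" and "tasks" mean the same thing in this application) and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical intents and sample utterances for the Scrum Assistant.
SAMPLE_UTTERANCES = {
    "ListTasks": ["list tasks", "list all tasks", "list the first five tasks"],
    "ListItems": ["list all items", "list items"],
    "UpdateStatus": ["set task status to done", "mark my task as done"],
}

# Words that carry little meaning for telling intents apart.
STOP_WORDS = {"the", "a", "all", "my", "to", "as"}

# Designer-supplied domain knowledge: terms that refer to the same thing.
DOMAIN_SYNONYMS = {"items": "tasks"}


def content_words(utterance):
    """Lower-case, drop stop words, and map synonyms to one canonical term."""
    words = (w for w in utterance.lower().split() if w not in STOP_WORDS)
    return {DOMAIN_SYNONYMS.get(w, w) for w in words}


def jaccard(a, b):
    """Share of words the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_overlaps(samples, threshold=0.5):
    """Return (intent_a, intent_b, score) for intent pairs that look alike."""
    flagged = []
    for (name_a, utts_a), (name_b, utts_b) in combinations(samples.items(), 2):
        score = max(
            jaccard(content_words(a), content_words(b))
            for a in utts_a
            for b in utts_b
        )
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


if __name__ == "__main__":
    for pair in find_overlaps(SAMPLE_UTTERANCES):
        print("Possible overlap:", pair)
```

With these sample definitions the check flags the "ListTasks" / "ListItems" pair, which is exactly the kind of overlap described above; the remedy is to merge the two intents or to separate their wording more clearly.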
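The effect of such context flags can be illustrated with a small Python sketch. Everything here is a hypothetical stand-in: the intent names, the `requires` / `sets` fields and the keyword matching are assumptions made for the example, and the keyword lookup merely takes the place of the engine's own intent detection.

```python
import re

# Hypothetical intent table. "requires" is the context that must be active
# for the intent to be eligible (None = only when no context is active);
# "sets" is the context that is active after the intent has fired.
INTENTS = {
    "OrderRoomService": {"requires": None, "sets": "room_service",
                         "keywords": {"order", "dinner"}},
    "ChooseDish": {"requires": "room_service", "sets": "room_service",
                   "keywords": {"burger", "salad"}},
    "SetDeliveryTime": {"requires": "room_service", "sets": None,
                        "keywords": {"am", "pm"}},
    "RequestToiletries": {"requires": None, "sets": None,
                          "keywords": {"toothbrush", "towels"}},
}


class Conversation:
    """Tracks the active context and restricts which intents may fire."""

    def __init__(self):
        self.context = None

    def eligible_intents(self):
        return [name for name, spec in INTENTS.items()
                if spec["requires"] == self.context]

    def handle(self, utterance):
        """Return the matched intent name, or None if nothing is eligible."""
        words = set(re.findall(r"[a-z0-9']+", utterance.lower()))
        for name in self.eligible_intents():
            if INTENTS[name]["keywords"] & words:
                self.context = INTENTS[name]["sets"]
                return name
        return None


if __name__ == "__main__":
    chat = Conversation()
    for line in ["I would like to order a dinner",
                 "Please bring a toothbrush",
                 "A BLT burger",
                 "At 6 pm",
                 "Please bring a toothbrush"]:
        print(repr(line), "->", chat.handle(line))
```

While the room-service context is active, the toothbrush request matches no eligible intent; once the delivery time has been given and the context is cleared, the same request is handled normally.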
For our own tasks, at first, the accuracy was moderate but improved significantly over time.
However, we encountered an interesting effect: At the outset, only one of our developers interacted with the NLP-driven application, and after a while working with the system a satisfactory performance was achieved. When other users started using the application, initially they had a very frustrating experience because very few of their requests were interpreted correctly. One reason for this was that they used slightly different verbiage than the first user. After manually adding that verbiage to the training set, the performance improved somewhat. However, only after spending considerable time with the application and allowing the NLP voice model to get trained on the new verbiage, and on the new voices, did the performance finally reach an acceptable level for them as well.
An NLP system can be perfectly trained for one group of users, but when a different group starts using the application, problems might surface. The solution is to train the NLP system with the largest and most diverse group of users practical, to cover the widest possible range of voices, nomenclatures and permutations of wordings.
Additionally, secure protocols must be used to pass data and information between the building blocks of the NLP-driven application. Openly available APIs like Amazon's Alexa and Google's Assistant interact with the back-office application logic. Depending on how those systems are set up, it might or might not be possible to establish a secure pipeline end-to-end, that is, all the way from voice input to back-end database and back. With NLP-driven systems becoming increasingly prevalent and turning into targets for illegitimate and criminal activity, it is imperative to include security considerations from the beginning.
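As one small illustration of what such a consideration might look like on the back-office side, the following Python sketch signs and verifies an intent payload with a shared secret using the standard library's `hmac` module. This is an assumption for the sake of example, not a description of how Alexa or Google Assistant secure their traffic: the payload fields, the function names and the shared-secret scheme are hypothetical, and such a check complements rather than replaces encrypted transport.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, secret: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over a canonical JSON encoding."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_payload(payload: dict, signature: str, secret: bytes) -> bool:
    """Check the signature in constant time before acting on the payload."""
    expected = sign_payload(payload, secret)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    # Hypothetical shared secret and intent request.
    secret = b"replace-with-a-real-secret"
    request = {"intent": "ChooseDish", "slots": {"dish": "BLT burger"}}

    signature = sign_payload(request, secret)
    print(verify_payload(request, signature, secret))

    tampered = {**request, "intent": "DeleteMenu"}
    print(verify_payload(tampered, signature, secret))
```

The back-office logic would refuse to act on any request whose signature does not verify, so a request altered in transit is rejected rather than executed.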
Where does this lead?
Speaking (and hearing) is the most intuitive way to communicate, so employing the power of NLP can make interactions with applications much easier and NLP-based products much more appealing to the user. A well-designed NLP user experience can attract positive attention, improve customer experience and increase customer loyalty. Over time, this technology will become even more mature.
But right now, we are still in the early stages of doing this well. We believe that the ability to infer intent and sentiment, and to provide valuable responses using predictive analytics and machine learning, will continue to grow and add to the customer experience. The fact is that today is the perfect time to create version 1 of your NLP app, avoiding these pitfalls while establishing this channel to your customers early. This is the beginning of the wave.