Blog

We believe there is something unique at every business that will ignite the fuse of innovation.

The Google Assistant was announced at Google I/O in May 2016 as Google's answer to other voice-assistant technologies such as Apple's Siri and Amazon's Alexa. Allowing you to have your own personal Google bot on your Android phone or on a Google Home device, the Google Assistant made completing common tasks a breeze with intuitive, personal voice-based conversations. Without having to remember key phrases or application names, users could perform tasks as if they were talking to another human being.

Recently at Google I/O 2017, Google announced support for the iPhone as well as for Android devices running the latest versions of the Android OS (Marshmallow 6.0 or Nougat 7.0). This brings the Google Assistant to 100 million devices.

The Google Assistant is extended through Actions on Google. These Actions allow your users to interact with your application via the Google Assistant using their voice. Voice inputs to the Google Assistant can be quick commands, such as turning on a light, or full conversations, like playing a trivia game, that may require multiple requests and responses.

API.ai is a conversational platform that features a tight integration with the Google Actions SDK and lets you design and build Actions using a simple IDE backed by natural language understanding (NLU) models in the cloud. We'll leverage API.ai to bring our Action to life on the Google Assistant.

What is Natural Language Understanding?

Natural Language Understanding is the process by which artificial intelligence agents take spoken or textual input and apply complex analysis to tag phrases and extract sentiment information. The benefit of this process is the ability for humans to interface with a computer as they would with another person, using natural spoken language. Unfortunately for computers, humans can sometimes speak with ambiguity, and human languages come with many rules that differ from one language to the next. But with advances in cloud computing, developers have access to proven NLP (Natural Language Processing) capabilities without having to learn the complex algorithms required for NLP functions such as semantic parsing, paraphrasing, and entity recognition.

What are the benefits?

Companies are starting to take notice of Natural Language Understanding. According to Oracle research, chatbots alone could save $174 billion across the Insurance, Financial Services, Sales, and Customer Service industries. With more and more people taking to social media and using mobile devices to interact with each other, companies can leverage NLP and "chatbot" technologies such as API.ai and Actions on Google to provide support and guidance for customers looking to learn about a product or service. Sales teams can take advantage of features such as contexts to hold deep two-way conversations between a potential customer and a computer, gather metrics, keep track of the subject of the conversation, and even provide relevant recommendations based on personalized real-time feedback from the customer.

Building our Action on Google

Actions on Google allows you to build and deploy applications for the Google Assistant. No specific tool or programming-language knowledge is required: Actions on Google provides a simple web-based console and is one of Google's many Platform-as-a-Service (PaaS) offerings. We will create our first Action on Google from the Actions on Google console. To start, we create a developer project by clicking the Add/Import Project tile. This project will contain properties, or metadata, of our Action on Google, such as our Action's name, description, and branding information, and it will allow us to manage our Action throughout the approval and deployment process.

Creating a new Action on Google developer project.

Next, we give our project a name; for this example, we will use "HelloWorld". Now that our project has been created, we have a few options for how to develop our Action. Google provides an Actions SDK that developers can download to create Action packages describing the requests and responses for a user's voice queries. The other option is to integrate your Action with API.ai. Implementing our Action with API.ai will allow us to leverage NLU in the cloud and take advantage of API.ai's flexible toolset to understand and respond to requests from our user.

Natural Language Understanding with API.ai

Acquired by Google in 2016, API.ai is a tool used to equip your application or chatbot with NLU capabilities. A simple user interface allows the developer to provide domain-specific knowledge and lets API.ai perform speech recognition, intent recognition, and phrase tagging. With this understanding of what your customer said, your application can focus on exposing the requested data or performing custom business logic, responding in a number of ways such as continuing the conversation or requesting more information while remembering the original context of the conversation.

At the core of API.ai's NLU engine are the complex machine learning algorithms used to map user phrases to specific Intents or Entities. Although this sounds simple enough, the true power of API.ai comes from your agent's ability to "learn" from the user phrasings you provide as well as from similar models developed and maintained by API.ai. Even with this combination, each agent is unique: based on the data we provide, it generates a custom model specific to our own use cases, with complex mappings of our specific Intents and Entities.

API.ai Intents and Entities

Intents help us map what the user can say to how our Action should respond. Entities are specific phrases or words that the user will say to help us extract useful data for use within our Intents.

Creating an intent to allow the user to ask about our name.

We can teach our agent which Intent maps to what the user is saying by giving it examples of what our user would say. In our example, we have an intent to help the user get information about our Action's name. Each user expression is written in natural language and any system or custom Entities are automatically annotated.
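To make this concrete, here is a minimal sketch (in Python, purely for illustration; the intent name and expected match are hypothetical, and this is not API.ai's export format) of the kind of training phrases we would enter and the match we would expect back:

# Illustrative only: example user expressions we might enter for a
# hypothetical "get-name" intent in the API.ai console.
training_phrases = [
    "What is your name?",
    "Who am I talking to?",
    "Tell me what you're called",
]

# After training, a matched query should resolve to the intent plus any
# tagged entity values (none are needed for this simple intent).
expected_match = {
    "intent": "get-name",   # hypothetical intent name
    "parameters": {},       # no entities to extract here
}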

System Entities are prebuilt entities provided by API.ai to represent common concepts in everyday speech. Dates, colors, and numbers are just a few things that generally don't change from one conversation to the next, and API.ai can automatically understand and tag these phrases for us. The full list of system entities can be found in API.ai's entity documentation.

We can create our own custom entities for API.ai to understand by giving it a list of words and phrases to listen for. Just as in natural human speech, some objects can be referred to by multiple names; API.ai takes care of this by allowing us to list possible synonyms for each Entity entry's value. By turning on Automated Expansion, API.ai can also recognize Entity values that weren't explicitly listed by inferring meaning from similar phrases provided to the Intent.

Creating an entity to manage different types of sports our user could say.
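As a rough sketch (Python, illustrative only; the values and synonyms below are made up), the sport entity boils down to a set of canonical values, each with the synonyms API.ai should listen for:

# Illustrative only: canonical values for a hypothetical "sport" entity,
# each mapped to the synonyms we would list in the API.ai console.
sport_entity = {
    "basketball": ["basketball", "hoops", "b-ball"],
    "football":   ["football", "gridiron"],
    "hockey":     ["hockey", "ice hockey"],
}

# With Automated Expansion turned on, a value we never listed (say,
# "lacrosse") can still be recognized as a sport from how it is phrased.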

Tagging Entities with API.ai

When providing user expressions for each Intent, API.ai will automatically tag and annotate Entity values that it detects in each phrase. These Entities are then available as parameters which can be sent to our custom business logic. By default, API.ai will send the Entity value, but a new parameter can be added to also send exactly what the user said. This can be helpful when our business logic needs to understand and pull out certain values from System Entities, such as an amount.

Our entity values have been annotated based on example user expressions given to this intent.

Note above that API.ai has automatically annotated the entity values within our user expressions. This entity is now available as a parameter value, which can be optional or required. When the user does not provide a required Entity, or API.ai could not understand it, API.ai can use prompts to follow up with a text response. In the example above, it wouldn't make much sense to tell the user our favorite sports team if we didn't know which sport they were talking about, so we could follow up with a prompt asking which sport they meant.

We can prompt the user for a missing entity.

Responding To Your User

Intents can respond with a Text Response, or with a Custom Payload if integrating with another voice assistant. When providing a Text Response, you can give a list of phrases and API.ai will randomly respond with one of them. If a response requires you to perform custom logic based on what the user has said, API.ai can be set up with a webhook to a custom URL endpoint.

A webhook is one of API.ai's fulfillment options, allowing you to pass information about a matched Intent to a web service and get a result back from it. Your information can be secured in one of two ways: basic authentication with a login and password, or additional authentication headers. If no authentication is required, these fields can be left blank. Deciding how to implement the web service is left entirely up to the developer; as long as it is available on the public web, API.ai will submit information to the endpoint, with the only limitations being a 5-second timeout and a maximum response size of 65K back to API.ai.
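As a small, hedged sketch of the header-based option (the header name and token below are made up; API.ai simply forwards whatever header name/value pairs you configure), the web service could check each incoming request like this:

# Illustrative only: validating the additional authentication header we
# configured in API.ai's fulfillment settings before trusting a request.
EXPECTED_TOKEN = "replace-with-a-long-random-string"  # hypothetical secret

def is_authorized(headers):
    # "X-Api-Ai-Token" is a header name of our own choosing, not an API.ai
    # requirement; reject the request if the shared secret doesn't match.
    return headers.get("X-Api-Ai-Token") == EXPECTED_TOKEN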

Enable the fulfillment webhook to perform custom logic on intents that API.ai has understood.

When the "Use webhook" fulfillment option is selected from an Intent, API.ai will make a POST request to your endpoint with a JSON payload matching an API.ai Response Object. By traversing through this Response JSON object, the application behind our endpoint can pick out values and apply our business logic to build out a response which is sent back as a JSON object with speech and displayText values.

{
	"speech": "Thank you for placing your order with us today!  It will be shipped shortly.",
	"displayText": "Order placed."
}

The speech value will be used by our Google Action when speaking to the user through a Google Home or similar device; the displayText value will be sent as text when using the Google Assistant on a phone.
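Putting this together, a minimal webhook could be sketched in Python with Flask (one possible choice; any publicly reachable web service works). The field names assume the v1-style request, where the matched intent and its parameters arrive under result, and the intent name and replies below are hypothetical:

# A minimal webhook sketch using Flask; not a production implementation.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def webhook():
    body = request.get_json(force=True, silent=True) or {}
    result = body.get("result", {})

    # The matched intent name; result.get("parameters", {}) would hold any
    # tagged entity values if our logic needed them.
    intent = result.get("metadata", {}).get("intentName", "")

    # Hypothetical business logic for a made-up "place-order" intent.
    if intent == "place-order":
        speech = "Thank you for placing your order with us today! It will be shipped shortly."
        display = "Order placed."
    else:
        speech = display = "Sorry, I didn't catch that."

    # API.ai reads the speech and displayText values from our JSON response.
    return jsonify({"speech": speech, "displayText": display})

if __name__ == "__main__":
    app.run(port=8080)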

Testing and Publishing Your Google Action

Once we have provided our API.ai agent with some Intents, we can test it through the Actions on Google Simulator. The Simulator is a browser-based console containing a sandbox Google Assistant that can be used to test your Action through voice or text input. If you are logged in with the same Google account on other devices, such as a Google Home or an Android phone, a test version of your Action will also be available to test on those devices. Simply invoke your Action by asking to talk to it by name.

Use the Actions on Google sandbox environment to test your action using voice or text input.

Once tested, your Action can be submitted for approval and, if accepted, will be available to all users. For full details on how to register and publish your Action, review the publishing guide.

About the Author

Kevin Jamieson
Kevin Jamieson is a Senior Consultant in CapTech's Charlotte office. As a Google Certified Professional, Kevin has brought his passion for clean and elegant full-stack web solutions to our clients in the banking and healthcare industries and hopes to leverage advances in cloud computing to integrate Natural Language Processing agents that change the way businesses and customers interact.