Welcome to our guest blog, courtesy of Martin Reddy, cofounder and CTO at PullString.
Conversational AI is a subfield of artificial intelligence focused on producing natural and seamless conversations between humans and computers. We’ve seen several amazing advances on this front in recent years, with significant improvements in automatic speech recognition (ASR), text-to-speech (TTS), and intent recognition. We’ve also seen the rocketship growth of voice assistant devices like the Amazon Echo and Google Home, with estimates of close to 100 million such devices in homes in 2018.
But we’re still a long way away from the fluent human-machine conversation promised in science fiction. Here are some key advances we should see over the next decade that could get us closer to that long-term vision.
Machine learning, and in particular deep learning, has become an extremely popular technique within the field of AI over the past few years. It has already fueled significant advances in domains such as facial recognition, speech recognition, and object recognition, leading many to believe it will solve all of the problems of conversational AI. However, in reality it will be only one valuable tool in our toolbox. We’ll need other techniques to manage all aspects of an effective human-computer conversation.
Machine learning is particularly well suited to problems that involve finding patterns in large corpora of data. Or, as Turing Award winner Judea Pearl pithily put it, machine learning essentially amounts to curve fitting. Several problems in conversational AI map well to this type of solution, such as speech recognition and speech synthesis.
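To make the curve-fitting analogy concrete, here is a minimal sketch, using NumPy purely for illustration and an entirely made-up dataset, of supervised learning as fitting a function to observed input-output pairs:

```python
import numpy as np

# Toy "training data": noisy samples of an underlying pattern.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

# "Learning" here is literally fitting a curve (a degree-7 polynomial)
# that minimizes the error against the observed data.
coeffs = np.polyfit(x, y, deg=7)
model = np.poly1d(coeffs)

# "Inference" is evaluating the fitted curve on an unseen input.
print(model(2.5), np.sin(2.5))  # prediction vs. the true underlying value
```

Speech recognition and synthesis fit this mold well because there are vast corpora of paired audio and text to fit against.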
The technique has also been applied with good success to intent recognition: taking a textual sentence of human language and converting it into a high-level description of the user’s intent or desire. However, there are limits to how well this technique can capture meaning from natural language, which is inherently stateful, sensitive to context, and often ambiguous.
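As a rough illustration of intent recognition as pattern finding, here is a minimal sketch using scikit-learn. The intents and example phrases are hypothetical, and real systems use far larger training sets and typically neural models; the point is only that the task reduces to fitting a classifier over text features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training phrases labeled with high-level intents.
phrases = [
    "what's the weather like tomorrow", "will it rain today",
    "set an alarm for 7 am", "wake me up at six thirty",
    "play some jazz", "put on my workout playlist",
]
intents = ["get_weather", "get_weather",
           "set_alarm", "set_alarm",
           "play_music", "play_music"]

# A classic pattern-matching pipeline: turn text into features, fit a classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(phrases, intents)

# Likely 'get_weather' with this toy data, but note the model sees only
# surface word patterns, not conversation state or context.
print(model.predict(["is it going to be sunny this weekend"]))
```

This captures the single-utterance mapping well, but it has no notion of what was said two turns ago, which is exactly where the statefulness and ambiguity of natural language bite.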
However, there are certainly problems in computer conversation that are not as well suited to machine learning. Think of human-machine conversation as being composed of two parts:

1. Understanding what the human said, using speech recognition and intent recognition to turn an utterance into a representation of what the user wants.
2. Generating the computer's reply, using dialog management to decide what to say next and natural language generation (NLG) to phrase it.
Much of the attention of late has been focused on that first part, but there are many challenges remaining on the generation side, and these tend not to be well suited to machine learning because response generation isn’t simply a product of collecting and analyzing lots of data. The challenge of maintaining a believable, ongoing, and stateful conversation will require more focus on these NLG and dialog management parts of the problem over the coming years.
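To illustrate why the generation side is not just data analysis, here is a minimal, hypothetical sketch of a dialog manager: it tracks explicit conversation state and chooses the next hand-authored response based on that state, which is logic a designer specifies rather than a pattern mined from a corpus.

```python
# A minimal, hypothetical dialog manager: responses depend on explicit
# conversation state, not just on the most recent user utterance.

state = {"step": "greet", "name": None}

def respond(intent: str, entities: dict) -> str:
    if state["step"] == "greet":
        state["step"] = "ask_name"
        return "Hi there! What's your name?"
    if state["step"] == "ask_name":
        state["name"] = entities.get("name", "friend")
        state["step"] = "chat"
        return f"Nice to meet you, {state['name']}. How can I help?"
    if intent == "get_weather":
        return f"Happy to check the forecast for you, {state['name']}."
    return "Tell me more."

print(respond("greet", {}))
print(respond("inform", {"name": "Ada"}))
print(respond("get_weather", {}))
```

Even this toy example shows the stateful, rule-like structure of an ongoing conversation, which is not something current machine learning techniques learn for free from data.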
Conversational experiences today can be quite simple and constrained. In order to move beyond these limitations, we will need to support higher-fidelity conversations, and there are several parts to achieving that.
As technologists, we are often driven to try to solve every problem computationally. However, it’s important to note that some domains, such as gaming and entertainment or sales and marketing, may always want to finely craft the voice and personality of the computer responses to match their brand.
Also, it has been noted recently that fully automated natural language generation may not be the best way forward. The most natural human conversations are not the result of rehashing lots of previous conversations; they are formed by considering the current context, the unique conversational history between the two parties, and a set of broader conversational skills and conventions.
These arguments suggest that keeping a human in the loop of initial dialog generation may actually be a good thing, rather than something we must seek to eradicate. When I worked at Pixar on Finding Nemo, one of the big technical challenges was simulating the appearance and behavior of water. But even more difficult than solving the underlying physics simulation problem was that the water had to be human-directable: The film’s director had to be able to request changes to how the water looked and reacted in a scene.
That same requirement will hold in the field of conversational AI: natural language generation solutions must allow for input by a human “creative director” who can control the tone, style, and personality of the synthetic character.
Today, these creative inputs necessarily take the form of a human writing individual responses for each context that the system can recognize, and defining how the conversation should flow on to the next question or topic. This is how practically all computer conversation experiences work at the moment.
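As a hypothetical sketch of what that hand-authoring looks like in practice, a conversation is often represented as a graph of contexts, each with a written response and explicit transitions to the next topic. The node names and lines below are invented for illustration.

```python
# Hypothetical hand-authored dialog flow: a writer supplies every response
# and explicitly wires up where the conversation goes next.
dialog_flow = {
    "welcome": {
        "response": "Welcome back! Want to hear what's new, or pick up where you left off?",
        "transitions": {"whats_new": "news", "resume": "last_topic"},
    },
    "news": {
        "response": "We just added a new story chapter. Shall I start it?",
        "transitions": {"yes": "chapter_one", "no": "welcome"},
    },
}

def next_node(current: str, intent: str) -> str:
    # Fall back to the welcome node if the writer didn't author a transition.
    return dialog_flow[current]["transitions"].get(intent, "welcome")

node = next_node("welcome", "whats_new")
print(dialog_flow[node]["response"])
```

Every response and every transition here is a piece of writing and design work, which is why this approach is expressive but hard to scale.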
It seems unlikely we will completely remove this human in the loop over the next few years. So as we look toward the future, we will want to build more scalable and broader mechanisms for defining the voice and tone of a computer's responses, for example by specifying its key characteristics at a more abstract level.
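One way to imagine that more abstract level, as a purely hypothetical sketch, is a persona configuration that writers tune once and that the generation system then applies to every response. The trait names and the styling logic here are invented stand-ins for what a real NLG system would condition on.

```python
# Hypothetical persona configuration: key characteristics defined once,
# at an abstract level, rather than rewritten into every individual line.
persona = {
    "tone": "playful",
    "formality": 0.2,     # 0 = casual, 1 = formal
    "verbosity": "short",
    "catchphrases": ["Just keep swimming!"],
}

def style_response(base_text: str, persona: dict) -> str:
    # A stand-in for a real NLG system that would condition on these traits.
    text = base_text if persona["formality"] < 0.5 else "Certainly. " + base_text
    if persona["tone"] == "playful" and persona["catchphrases"]:
        text += " " + persona["catchphrases"][0]
    return text

print(style_response("I found three results for you.", persona))
```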
The HBO series Westworld does a great job of presenting this view of the world. The artificial “hosts” are obviously very complex and often indistinguishable from flesh and blood humans in terms of their responses and behaviors. However, this is achieved by having many writers in the “narrative” department defining the content for each host and their various high-level personality traits. Creative designers can tweak these factors using powerful visual authoring tools.
Over the coming years, the field could benefit from the development of flexible authoring tools to empower conversation writers in much the same way that tools like Photoshop empowered artists or Final Cut Pro empowered video creators.
A combination of richer tools for language generation and dialog management, higher-fidelity experiences, and improved use of humans in the loop will produce better content and ultimately launch us forward into a world populated with delightful and seamless computer conversation experiences.
Learn more about how Conversica empowers Conversational AI.
Martin Reddy is cofounder and CTO at voice technology company PullString. Full article reposted with permission of the writer. View the originally published article on VentureBeat.
Conversation Automation tools, like Revenue Digital Assistants™, scale two-way, human-like outreach that engages every contact to find revenue opportunities faster. Learn how to turbocharge your revenue teams to generate more revenue, faster, in this eBook.
Let us show you how our Powerfully Human® digital assistants can help your team unlock revenue. Get the conversation started today.