From shopping to therapy and friendship, there seems to be a chatbot for everything. Now, researchers at the Information Sciences Institute (ISI) of the University of Southern California's school of engineering have developed a new way to assess the conversational skills of chatbots such as Cleverbot or Google's Meena. In a paper presented at the 2020 AAAI Conference on Artificial Intelligence, the USC researchers introduced a new metric for evaluating chatbot responses.
So why do we need a new metric in the first place? As one of the paper's authors explained, it is much harder to evaluate how well a chatbot communicates with a user when it is an open-domain dialogue system, in which the conversation is largely open-ended.
A dialogue system is, in essence, a computer system that uses text, speech, and other modalities to communicate with people. There are two main types. The first is task-oriented dialogue systems, which are useful when we want to achieve a specific goal, such as booking a hotel room, buying a ticket, or reserving a flight. The second is open-domain dialogue systems, such as chatbots, which focus on interacting with people at a deeper level by simulating conversation between humans.
Evaluating open-domain conversational systems is one of the most important steps in developing high-quality systems, the researchers emphasize. Compared to task-oriented dialogue systems, in which the user communicates to achieve a predetermined goal, evaluating open-domain dialogue systems is more complex. A user conversing with an open-domain system does not pursue any specific goal, so quality cannot be measured by whether the user has reached one.
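The contrast can be illustrated with a small sketch. The data and function names below are hypothetical, invented purely for illustration: a task-oriented system (say, flight booking) can be scored by its goal-completion rate, whereas an open-domain chat offers no such goal to count.

```python
# Hedged sketch with made-up data: a task-oriented dialogue can be scored
# by whether the user's goal was completed; an open-domain conversation
# has no predefined goal, so this style of metric does not apply to it.

def task_success_rate(dialogues):
    """Fraction of task-oriented dialogues in which the goal was reached."""
    return sum(d["goal_reached"] for d in dialogues) / len(dialogues)

# Hypothetical logs from a flight-booking assistant.
booking_logs = [
    {"goal_reached": True},   # flight booked
    {"goal_reached": True},
    {"goal_reached": False},  # user gave up before booking
    {"goal_reached": True},
]

print(task_success_rate(booking_logs))  # 0.75
```

For an open-domain chatbot there is no `goal_reached` field to count, which is exactly why per-response qualities such as relevance (and, as the researchers argue, engagement) have to stand in as the measure of quality.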
In their paper, the ISI researchers emphasized that evaluating open-domain conversational systems should not be limited to specific aspects such as relevance: the answers should also be genuinely interesting to the user.
Responses generated by an open-domain dialogue system are acceptable when they are both relevant and interesting to users, the researchers say. They were able to show that incorporating how interesting a response is, via what they call an engagement score, leads to a markedly more accurate evaluation of open-domain dialogue systems. Understanding this score will help improve chatbots and other similar conversational systems.
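The idea of folding engagement into the evaluation can be sketched as follows. This is an illustrative simplification, not the paper's actual method: the function name, the equal weighting, and the example scores are all assumptions made for the sake of the example.

```python
# Illustrative sketch only: blending a relevance score with a predicted
# engagement score into one overall quality score. The equal 50/50 weight
# is an assumption for illustration, not the USC paper's formula.

def overall_quality(relevance, engagement, weight=0.5):
    """Blend relevance and engagement (both in [0, 1]) into one score.

    `weight` controls how much the engagement score contributes.
    """
    if not (0.0 <= relevance <= 1.0 and 0.0 <= engagement <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    return (1.0 - weight) * relevance + weight * engagement

# A relevant but dull reply scores lower overall than a reply that is
# both relevant and engaging.
dull = overall_quality(relevance=0.9, engagement=0.2)    # 0.55
lively = overall_quality(relevance=0.9, engagement=0.8)  # 0.85
```

The point of the sketch is simply that two responses with identical relevance can end up with very different overall scores once engagement is taken into account, which is what a relevance-only metric misses.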
Chatbots such as Cleverbot, Meena, and XiaoIce can engage people in conversations that feel more like real-life exchanges than those with task-oriented systems.
For example, XiaoIce, a Microsoft chatbot with 660 million users in China, has a persona that mimics a smart teenage girl. Along with providing basic AI-assistant functions, it can also write original songs and poems, play games, read stories, and understand jokes. XiaoIce is described as an "empathic chatbot" because it tries to connect and build friendships with the people it interacts with.
These types of chatbots can be useful for people who struggle socially, helping them learn how to communicate and make new friends, the researchers emphasize.
Open-domain chatbots that engage people on a deeper level are not only gaining in popularity but also becoming more advanced. And users turn to them not only for entertainment but also to acquire general knowledge.
Open-domain chatbots can also be applied to more serious problems.
Some are designed to provide mental-health support for people experiencing depression or anxiety; patients can use these systems to receive free consultations whenever they need them. A study funded by the U.S. Defense Advanced Research Projects Agency (DARPA) found that it is easier for people to talk about their feelings and personal problems when they know they are talking to a chatbot, because they feel it will not judge them.
Open-domain chatbots are also extremely useful for people learning a foreign language. This is especially valuable for those who lack confidence in their language skills or are too shy to practice with real people.
A predictive engagement metric will help researchers better evaluate these types of chatbots, as well as open-domain dialogue systems as a whole.