
DeepMind introduces “Sparrow”, an AI-powered chatbot developed as a step toward safer machine learning systems

Researchers are striving to develop AI models that communicate more efficiently, accurately, and safely. Large Language Models (LLMs) have achieved remarkable success in recent years across a variety of tasks, including question answering, summarization, and dialogue. Dialogue particularly fascinates researchers because it allows flexible, open-ended communication. However, LLM-powered dialogue agents frequently present false or made-up information, use discriminatory language, or encourage unsafe behavior. One promising route to safer dialogue agents is to learn from human feedback: reinforcement learning based on the judgments of research participants offers a way to train systems that are safer by design.

In their most recent publication, DeepMind researchers present Sparrow, a useful dialogue agent that reduces the likelihood of unsafe and inappropriate responses. Sparrow’s goal is to show how to train dialogue agents to be more helpful, correct, and harmless. The agent can converse with a user, answer questions, and, when it needs information to support its claims, perform Google searches and cite the retrieved evidence. Sparrow improves our understanding of how to train agents to be safer and more useful, contributing to the longer-term goal of building safer and more useful artificial general intelligence (AGI).
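The evidence-supported answering flow can be pictured with a short sketch. The helper functions below (`web_search`, `needs_evidence`) are stand-ins invented purely for illustration; Sparrow itself learns when to search and how to condition its reply on the retrieved snippet.

```python
# Minimal sketch of evidence-conditioned answering, with toy stubs for search
# and generation; this is an assumption-laden illustration, not Sparrow's
# actual architecture.
from typing import List, Optional

def web_search(query: str) -> str:
    # Stand-in for a real search call; returns a fake snippet here.
    return f"[search snippet about: {query}]"

def needs_evidence(question: str) -> bool:
    # Crude heuristic stand-in for Sparrow's *learned* decision to search.
    return question.rstrip("?").lower().startswith(
        ("who", "what", "when", "where", "how many")
    )

def answer(question: str, history: List[str]) -> str:
    evidence: Optional[str] = web_search(question) if needs_evidence(question) else None
    if evidence:
        return f"Based on {evidence}, here is my answer to '{question}'."
    return f"Here is my answer to '{question}'."

print(answer("When was the James Webb telescope launched?", []))
```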

Because it can be difficult to pin down exactly what makes a conversation successful, training conversational AI is a complicated task. Reinforcement learning from human feedback helps here: participants’ preferences are used to train a reward model that scores how useful a given response is. The researchers collected this data by showing participants several model responses to the same question and asking them to pick the one they preferred. Since responses were shown both with and without evidence retrieved from the internet, the model could also learn when an answer should be supported by evidence.
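As a rough illustration of this preference-based training, the sketch below fits a small reward model with a pairwise (Bradley–Terry style) loss, where the participant-preferred response should receive the higher score. The architecture, embedding sizes, and data here are toy assumptions rather than the paper’s actual setup.

```python
# Minimal sketch of preference-based reward modeling; shapes and layers are
# illustrative assumptions, not DeepMind's implementation.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores an embedding of (dialogue context + candidate response)."""
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(response_embedding).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style loss: the participant-preferred response should score higher.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy usage with random embeddings standing in for encoded dialogues.
model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
chosen, rejected = torch.randn(8, 256), torch.randn(8, 256)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
optimizer.step()
```

The trained reward model can then be used as the reinforcement-learning signal when fine-tuning the dialogue agent itself.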

But improving usefulness solves only part of the problem. The researchers also focused on constraining the model’s behavior so that it acts safely. To that end, they established an initial set of rules for the model, such as “don’t make threatening statements” and “don’t make hateful or insulting comments”. Other rules prohibit giving potentially harmful advice and claiming to be a person. These rules were informed by existing research on language harms and by consultation with experts. Study participants were then asked to talk to the system and try to trick it into breaking these rules. These conversations were later used to train a separate “rule model” that flags when Sparrow’s behavior breaks a rule.
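A rule model of this kind can be pictured as a classifier over (dialogue, rule) pairs that outputs a probability of violation. The sketch below is a hypothetical stand-alone version with an invented rule list; in the paper the rule model is derived from the same underlying language model, so this is only a structural illustration.

```python
# Minimal sketch of a rule-violation classifier trained on adversarial
# dialogues; the rule list, architecture, and data are illustrative
# assumptions, not the paper's actual rule model.
import torch
import torch.nn as nn

RULES = [
    "Do not make threatening statements.",
    "Do not make hateful or insulting comments.",
    "Do not give potentially harmful advice.",
    "Do not claim to be a person.",
]

class RuleModel(nn.Module):
    """Predicts the probability that a dialogue turn violates a given rule."""
    def __init__(self, embed_dim: int = 256, num_rules: int = len(RULES)):
        super().__init__()
        self.rule_embeddings = nn.Embedding(num_rules, embed_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, dialogue_embedding: torch.Tensor, rule_id: torch.Tensor) -> torch.Tensor:
        rule_vec = self.rule_embeddings(rule_id)
        logits = self.classifier(torch.cat([dialogue_embedding, rule_vec], dim=-1))
        return torch.sigmoid(logits).squeeze(-1)  # violation probability per rule

# Toy usage: score one (randomly embedded) dialogue turn against every rule.
model = RuleModel()
dialogue = torch.randn(len(RULES), 256)   # same turn repeated once per rule
rule_ids = torch.arange(len(RULES))
violation_probs = model(dialogue, rule_ids)
print({rule: float(p) for rule, p in zip(RULES, violation_probs)})
```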

Even for experts, it is difficult to verify whether Sparrow’s answers are factually correct. Instead, for evaluation, participants judged whether Sparrow’s answers were plausible and whether the supporting evidence actually backed them up. When asked a factual question, Sparrow gave a plausible answer supported by evidence 78% of the time, a significant improvement over several baseline models. However, Sparrow isn’t perfect: it occasionally hallucinates facts and gives off-topic answers. It could also follow its rules more consistently. Under adversarial probing, Sparrow adheres to the rules better than simpler baselines, but participants could still trick it into breaking the rules 8% of the time after training.

Sparrow aims to build flexible mechanisms for enforcing rules and norms in dialogue agents. The model is currently trained on a draft set of rules; developing a more complete set would require input from experts as well as a wide range of users and affected groups. Sparrow represents a significant advance in our understanding of how to train dialogue agents to be more helpful and safer. To be genuinely useful, communication between people and dialogue agents must not only avoid harm but also be consistent with human values. The researchers also point out that a good agent will decline to answer in situations where it is appropriate to defer to humans or where refusing could deter harmful behavior. Further work is needed to ensure comparable results in other linguistic and cultural settings. The researchers envision a future in which interactions between people and machines improve the evaluation of AI behavior, allowing people to align and improve systems that might otherwise be too complex to understand.

This article is written as a research summary by Marktechpost staff based on the research paper 'Improving alignment of dialogue agents via targeted human judgements'. All credit for this research goes to the researchers on this project. Check out the paper and reference article.



Khushboo Gupta is an intern consultant at MarktechPost. She is currently pursuing her B.Tech from Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing and web development. She likes to learn more about the technical field by participating in several challenges.