LaMDA is built by fine-tuning a family of Transformer-based neural language models specialized for dialogue, with up to 137B model parameters, and by teaching the models to consult external knowledge sources. LaMDA pursues three key objectives:
• Quality, measured through Sensibleness, Specificity, and Interestingness (SSI). These metrics are evaluated by human raters. Sensibleness indicates whether a response makes sense in the dialogue context, for example that the model avoids absurd answers and contradictions with its earlier answers. Specificity shows whether the system's response is specific to the preceding dialogue rather than generic. Interestingness measures the emotional reaction of the interlocutor to the model's answers.
• Safety, so that the model's answers contain no offensive or dangerous statements.
• Groundedness: modern language models often generate statements that sound plausible but actually contradict facts established in external sources. Groundedness is defined as the percentage of responses containing claims about the external world that can be supported by authoritative external sources. A related metric, Informativeness, is defined as the percentage of responses carrying information about the external world that can be supported by known sources.
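The metrics above are all percentages computed over human rater labels. A minimal sketch of that aggregation, with an illustrative label schema (the field names are assumptions, not the paper's actual annotation format):

```python
# Hypothetical sketch: turning per-response binary rater labels into
# percentage metrics like Sensibleness, Safety, and Groundedness.
# The dict keys below are illustrative, not LaMDA's real schema.

def metric_percentages(ratings):
    """ratings: list of dicts mapping metric name -> bool rater label."""
    n = len(ratings)
    if n == 0:
        return {}
    keys = ["sensible", "specific", "interesting", "safe", "grounded", "informative"]
    return {k: 100.0 * sum(r.get(k, False) for r in ratings) / n for k in keys}

sample = [
    {"sensible": True, "specific": True, "interesting": False,
     "safe": True, "grounded": True, "informative": True},
    {"sensible": True, "specific": False, "interesting": False,
     "safe": True, "grounded": False, "informative": False},
]
print(metric_percentages(sample))
```

In the real evaluation protocol the SSI labels are conditional (e.g. specificity is only rated for sensible responses); this sketch treats them as independent for simplicity.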
LaMDA models undergo two-stage training: pre-training and fine-tuning. The first stage uses a dataset of 1.56T words drawn from public dialogue data and public web documents. After tokenization into 2.81T tokens, the model was trained to predict each next token in a sequence given the previous ones. The pre-trained LaMDA model has also been widely used for NLP research at Google, including program synthesis, zero-shot learning, and more.
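The pre-training objective above is standard next-token prediction: every prefix of a token sequence serves as the context and the following token is the target. A toy sketch of how those training pairs are formed (the token IDs are made up, not a real tokenizer's output):

```python
# Minimal sketch of the decoder-style pre-training objective:
# each prefix of the sequence becomes a context, the next token the target.

def next_token_pairs(token_ids):
    """Yield (context, target) pairs for next-token prediction."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]

tokens = [12, 7, 99, 3]  # toy token IDs
for ctx, tgt in next_token_pairs(tokens):
    print(ctx, "->", tgt)
```

A real training loop would feed these pairs through the Transformer and minimize cross-entropy between the predicted distribution and the target token; this sketch only shows the data shaping.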
At the fine-tuning stage, LaMDA learns to combine generative tasks, producing natural-language responses in given contexts, with classification tasks that judge the safety and quality of the model's output. The result is a single multitask model: the LaMDA generator is trained to predict the next token on a dialogue dataset, while the classifiers are trained to predict safety and quality ratings for a response in context, using annotated data.
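One way this generator-plus-classifier setup is used at inference time is generate-then-filter: candidate responses below a safety threshold are discarded, and the remainder are ranked by quality. A hedged sketch with stub scoring functions standing in for the trained classifiers:

```python
# Illustrative generate-then-classify loop. The two scoring functions are
# stand-ins for LaMDA's fine-tuned safety and SSI quality classifiers.

def safety_score(response):
    # Stub: a real classifier would score the response in dialogue context.
    return 0.1 if "unsafe" in response else 0.95

def quality_score(response):
    # Stub: toy proxy for specificity (count of distinct words).
    return len(set(response.split())) / 10.0

def pick_response(candidates, safety_threshold=0.5):
    """Filter candidates by safety, then return the highest-quality one."""
    safe = [c for c in candidates if safety_score(c) >= safety_threshold]
    if not safe:
        return None
    return max(safe, key=quality_score)

candidates = ["unsafe reply", "a short answer", "a longer more specific answer"]
print(pick_response(candidates))
```

The threshold value and the filter-then-rank ordering here are assumptions chosen for illustration; the point is that classification scores gate and rank the generator's outputs.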
Test results showed that LaMDA significantly outperforms the pre-trained model on every metric and at every scale. Quality metrics improve with the number of model parameters, both with fine-tuning and without it. Safety does not improve from model scale alone, but does benefit from fine-tuning. Groundedness improves as model size grows, thanks to the model's ability to memorize uncommon knowledge, while fine-tuning lets the model access external sources and effectively offload part of the burden of memorization onto them. Fine-tuning can narrow the quality gap to human levels, though the model's performance remains below human levels on safety and groundedness.
#machinelearning #artificialintelligence #ai #datascience #python #programming #technology