RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?
Authors:
Adrian de Wynter,
Ishaan Watts,
Nektar Ege Altıntoprak,
Tua Wongsangaroonsri,
Minghui Zhang,
Noura Farra,
Lena Baur,
Samantha Claudet,
Pavel Gajdusek,
Can Gören,
Qilong Gu,
Anna Kaminska,
Tomasz Kaminski,
Ruby Kuo,
Akiko Kyuba,
Jongho Lee,
Kartik Mathur,
Petter Merok,
Ivana Milovanović,
Nani Paananen,
Vesa-Matti Paananen,
Anna Pavlenko,
Bruno Pereira Vidal,
Luciano Strika,
Yueh Tsao, et al. (8 additional authors not shown)
Abstract:
Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is especially designed to detect culturally-specific toxic language. We evaluate seven S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when holistically judging the toxicity of a prompt, and have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g. microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.
Submitted 22 April, 2024;
originally announced April 2024.
Astrocytes mediate analogous memory in a multi-layer neuron-astrocytic network
Authors:
Yuliya Tsybina,
Innokentiy Kastalskiy,
Mikhail Krivonosov,
Alexey Zaikin,
Victor Kazantsev,
Alexander Gorban,
Susanna Gordleeva
Abstract:
Modeling the neuronal processes underlying short-term working memory remains the focus of many theoretical studies in neuroscience. Here we propose a mathematical model of a spiking neuron network (SNN) demonstrating how a piece of information can be maintained as a robust activity pattern for several seconds and then completely disappear if no other stimuli arrive. Such short-term memory traces are preserved due to the activation of astrocytes accompanying the SNN. The astrocytes exhibit calcium transients at a time scale of seconds. These transients further modulate the efficiency of synaptic transmission and, hence, the firing rate of neighboring neurons at diverse timescales through gliotransmitter release. We show how such transients continuously encode frequencies of neuronal discharges and provide robust short-term storage of analogous information. This kind of short-term memory can keep operative information for seconds, then completely forget it to avoid overlapping with forthcoming patterns. The SNN is inter-connected with the astrocytic layer by local inter-cellular diffusive connections. The astrocytes are activated only when the neighboring neurons fire quite synchronously, e.g. when an information pattern is loaded. For illustration, we took greyscale photos of people's faces where the grey level encoded the level of applied current stimulating the neurons. The astrocyte feedback modulates (facilitates) synaptic transmission by varying the frequency of neuronal firing. We show how arbitrary patterns can be loaded, then stored for a certain interval of time, and retrieved if the appropriate clue pattern is applied to the input.
Submitted 31 August, 2021;
originally announced August 2021.