
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

Authors: Adrian de Wynter, Ishaan Watts, Nektar Ege Altıntoprak, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, et al. (8 additional authors not shown)

Abstract: Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety still remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is especially designed to detect culturally-specific toxic language. We evaluate seven S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when judging holistically the toxicity of a prompt, and have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g., microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Work in progress

arXiv:cs/9907017 [pdf, ps, other]

A Bootstrap Approach to Automatically Generating Lexical Transfer Rules

Authors: Davide Turcato, Paul McFetridge, Fred Popowich, Janine Toole

Abstract: We describe a method for automatically generating Lexical Transfer Rules (LTRs) from word equivalences using transfer rule templates. Templates are skeletal LTRs, unspecified for words. New LTRs are created by instantiating a template with words, provided that the words belong to the appropriate lexical categories required by the template. We define two methods for creating an inventory of templates and using them to generate new LTRs. A simpler method consists of extracting a finite set of templates from a sample of hand coded LTRs and directly using them in the generation process. A further method consists of abstracting over the initial finite set of templates to define higher level templates, where bilingual equivalences are defined in terms of correspondences involving phrasal categories. Phrasal templates are then mapped onto sets of lexical templates with the aid of grammars. In this way an infinite set of lexical templates is recursively defined. New LTRs are created by parsing input words, matching a template at the phrasal level and using the corresponding lexical categories to instantiate the lexical template. The definition of an infinite set of templates enables the automatic creation of LTRs for multi-word, non-compositional word equivalences of any cardinality.

Submitted 9 July, 1999; originally announced July 1999.

Comments: 8 pages, 1 figure, to be presented at Machine Translation Summit VII, September 13-17, 1999, Singapore

ACM Class: I.2.7

arXiv:cs/9907008 [pdf, ps, other]

Explanation-based Learning for Machine Translation

Authors: Janine Toole, Fred Popowich, Devlan Nicholson, Davide Turcato, Paul McFetridge

Abstract: In this paper we present an application of explanation-based learning (EBL) in the parsing module of a real-time English-Spanish machine translation system designed to translate closed captions. We discuss the efficiency/coverage trade-offs available in EBL and introduce the techniques we use to increase coverage while maintaining a high level of space and time efficiency. Our performance results indicate that this approach is effective.

Submitted 6 July, 1999; originally announced July 1999.

Comments: 12 pages, 3 figures, To appear in Proceedings of the 8th International Conference on Theoretical and Methodological Issues in Machine Translation

ACM Class: J.5

arXiv:cs/9906034 [pdf, ps, other]

A Unified Example-Based and Lexicalist Approach to Machine Translation

Authors: Davide Turcato, Paul McFetridge, Fred Popowich, Janine Toole

Abstract: We present an approach to Machine Translation that combines the ideas and methodologies of the Example-Based and Lexicalist theoretical frameworks. The approach has been implemented in a multilingual Machine Translation system.

Submitted 30 June, 1999; originally announced June 1999.

Comments: 11 pages, to be presented at the 8th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-99)

ACM Class: I.2.7

arXiv:cmp-lg/9807010 [pdf, ps, other]

Automatically Creating Bilingual Lexicons for Machine Translation from Bilingual Text

Authors: Davide Turcato

Abstract: A method is presented for automatically augmenting the bilingual lexicon of an existing Machine Translation system, by extracting bilingual entries from aligned bilingual text. The proposed method only relies on the resources already available in the MT system itself. It is based on the use of bilingual lexical templates to match the terminal symbols in the parses of the aligned sentences.

Submitted 20 July, 1998; originally announced July 1998.

Comments: Latex file, uses colacl.sty file, 7 pages

Journal ref: Proceedings of COLING-ACL'98

arXiv:cmp-lg/9706024 [pdf, ps, other]

A Lexicalist Approach to the Translation of Colloquial Text

Authors: Fred Popowich, Davide Turcato, Olivier Laurens, Paul McFetridge, J. Devlan Nicholson, Patrick McGivern, Maricela Corzo Pena, Lisa Pidruchney, Scott MacDonald

Abstract: Colloquial English (CE) as found in television programs or typical conversations is different from text found in technical manuals, newspapers and books. Phrases tend to be shorter and less sophisticated. In this paper, we look at some of the theoretical and implementational issues involved in translating CE. We present a fully automatic large-scale multilingual natural language processing system for translation of CE input text, as found in the commercially transmitted closed-caption television signal, into simple target sentences. Our approach is based on Whitelock's Shake and Bake machine translation paradigm, which relies heavily on lexical resources. The system currently translates from English to Spanish, with the translation modules for Brazilian Portuguese under development.

Submitted 18 June, 1997; originally announced June 1997.

Comments: 11 pages, LaTeX, uses tmi.sty

Journal ref: Proceedings of the 7th International Conference on Theoretical Issues in Machine Translation (TMI '97), Santa Fe, NM, 23-25 July 1997.

Showing 1–6 of 6 results for author: Turcato, D