-
Evolving to be Your Soulmate: Personalized Dialogue Agents with Dynamically Adapted Personas
Authors:
Yi Cheng,
Wenge Liu,
Kaishuai Xu,
Wenjun Hou,
Yi Ouyang,
Chak Tou Leong,
Xian Wu,
Yefeng Zheng
Abstract:
Previous research on persona-based dialogue agents typically preset the agent's persona before deployment, which remains static thereafter. In this paper, we take a step further and explore a new paradigm called Self-evolving Personalized Dialogue Agents (SPDA), where the agent continuously evolves during the conversation to better align with the user's anticipation by dynamically adapting its per…
▽ More
Previous research on persona-based dialogue agents typically preset the agent's persona before deployment, which remains static thereafter. In this paper, we take a step further and explore a new paradigm called Self-evolving Personalized Dialogue Agents (SPDA), where the agent continuously evolves during the conversation to better align with the user's anticipation by dynamically adapting its persona. This paradigm could enable better personalization for each user, but also introduce unique challenges, which mainly lie in the process of persona adaptation. Two key issues include how to achieve persona alignment with the user and how to ensure smooth transition in the adaptation process. To address them, we propose a novel framework that refines the persona at hierarchical levels to progressively align better with the user in a controllable way. Experiments show that integrating the personas adapted by our framework consistently enhances personalization and overall dialogue performance across various base systems.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks
Authors:
Chak Tou Leong,
Yi Cheng,
Kaishuai Xu,
Jian Wang,
Hanlin Wang,
Wenjie Li
Abstract:
The existing safety alignment of Large Language Models (LLMs) is found fragile and could be easily attacked through different strategies, such as through fine-tuning on a few harmful examples or manipulating the prefix of the generation results. However, the attack mechanisms of these strategies are still underexplored. In this paper, we ask the following question: \textit{while these approaches c…
▽ More
The existing safety alignment of Large Language Models (LLMs) is found fragile and could be easily attacked through different strategies, such as through fine-tuning on a few harmful examples or manipulating the prefix of the generation results. However, the attack mechanisms of these strategies are still underexplored. In this paper, we ask the following question: \textit{while these approaches can all significantly compromise safety, do their attack mechanisms exhibit strong similarities?} To answer this question, we break down the safeguarding process of an LLM when encountered with harmful instructions into three stages: (1) recognizing harmful instructions, (2) generating an initial refusing tone, and (3) completing the refusal response. Accordingly, we investigate whether and how different attack strategies could influence each stage of this safeguarding process. We utilize techniques such as logit lens and activation patching to identify model components that drive specific behavior, and we apply cross-model probing to examine representation shifts after an attack. In particular, we analyze the two most representative types of attack approaches: Explicit Harmful Attack (EHA) and Identity-Shifting Attack (ISA). Surprisingly, we find that their attack mechanisms diverge dramatically. Unlike ISA, EHA tends to aggressively target the harmful recognition stage. While both EHA and ISA disrupt the latter two stages, the extent and mechanisms of their attacks differ significantly. Our findings underscore the importance of understanding LLMs' internal safeguarding process and suggest that diverse defense mechanisms are required to effectively cope with various types of attacks.
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
Enhancing Multi-Domain Automatic Short Answer Grading through an Explainable Neuro-Symbolic Pipeline
Authors:
Felix Künnecke,
Anna Filighera,
Colin Leong,
Tim Steuer
Abstract:
Grading short answer questions automatically with interpretable reasoning behind the grading decision is a challenging goal for current transformer approaches. Justification cue detection, in combination with logical reasoners, has shown a promising direction for neuro-symbolic architectures in ASAG. But, one of the main challenges is the requirement of annotated justification cues in the students…
▽ More
Grading short answer questions automatically with interpretable reasoning behind the grading decision is a challenging goal for current transformer approaches. Justification cue detection, in combination with logical reasoners, has shown a promising direction for neuro-symbolic architectures in ASAG. But, one of the main challenges is the requirement of annotated justification cues in the students' responses, which only exist for a few ASAG datasets. To overcome this challenge, we contribute (1) a weakly supervised annotation procedure for justification cues in ASAG datasets, and (2) a neuro-symbolic model for explainable ASAG based on justification cues. Our approach improves upon the RMSE by 0.24 to 0.3 compared to the state-of-the-art on the Short Answer Feedback dataset in a bilingual, multi-domain, and multi-question training setup. This result shows that our approach provides a promising direction for generating high-quality grades and accompanying explanations for future research in ASAG and educational NLP.
△ Less
Submitted 19 March, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue
Authors:
Jian Wang,
Chak Tou Leong,
Jiashuo Wang,
Dongding Lin,
Wenjie Li,
Xiao-Yong Wei
Abstract:
Tuning language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents. Yet, traditional tuning narrowly views dialogue generation as resembling other language generation tasks, ignoring the role disparities between two speakers and the multi-round interactive process that dialogues ought to be. Such a manner often leads to unsatisfactory chat consistency…
▽ More
Tuning language models for dialogue generation has been a prevalent paradigm for building capable dialogue agents. Yet, traditional tuning narrowly views dialogue generation as resembling other language generation tasks, ignoring the role disparities between two speakers and the multi-round interactive process that dialogues ought to be. Such a manner often leads to unsatisfactory chat consistency for the built agent. In this work, we emphasize the interactive, communicative nature of dialogue and argue that it is more feasible to model the speaker roles of agent and user separately, enabling the agent to adhere to its role consistently. With this in mind, we propose an efficient Multi-round Interactive Dialogue Tuning (Midi-Tuning) framework. It models the agent and user individually with two adapters built upon large language models. The adapters make use of respective utterances round by round in alternating order and they are tuned via a round-level memory caching mechanism. Extensive experiments demonstrate that, our framework performs superior to traditional fine-tuning and harbors the tremendous potential for improving dialogue consistency.
△ Less
Submitted 30 May, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback
Authors:
Jiashuo Wang,
Chunpu Xu,
Chak Tou Leong,
Wenjie Li,
**g Li
Abstract:
An emotional support conversation system aims to alleviate users' emotional distress and assist them in addressing their challenges. To generate supportive responses, it is critical to consider multiple factors such as empathy, support strategies, and response coherence, as established in prior methods. Nonetheless, previous models occasionally generate unhelpful responses, which intend to provide…
▽ More
An emotional support conversation system aims to alleviate users' emotional distress and assist them in addressing their challenges. To generate supportive responses, it is critical to consider multiple factors such as empathy, support strategies, and response coherence, as established in prior methods. Nonetheless, previous models occasionally generate unhelpful responses, which intend to provide support but display counterproductive effects. According to psychology and communication theories, poor performance in just one contributing factor might cause a response to be unhelpful. From the model training perspective, since these models have not been exposed to unhelpful responses during their training phase, they are unable to distinguish if the tokens they generate might result in unhelpful responses during inference. To address this issue, we introduce a novel model-agnostic framework named mitigating unhelpfulness with multifaceted AI feedback for emotional support (Muffin). Specifically, Muffin employs a multifaceted AI feedback module to assess the helpfulness of responses generated by a specific model with consideration of multiple factors. Using contrastive learning, it then reduces the likelihood of the model generating unhelpful responses compared to the helpful ones. Experimental results demonstrate that Muffin effectively mitigates the generation of unhelpful responses while slightly increasing response fluency and relevance.
△ Less
Submitted 17 June, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
COOPER: Coordinating Specialized Agents towards a Complex Dialogue Goal
Authors:
Yi Cheng,
Wenge Liu,
Jian Wang,
Chak Tou Leong,
Yi Ouyang,
Wenjie Li,
Xian Wu,
Yefeng Zheng
Abstract:
In recent years, there has been a growing interest in exploring dialogues with more complex goals, such as negotiation, persuasion, and emotional support, which go beyond traditional service-focused dialogue systems. Apart from the requirement for much more sophisticated strategic reasoning and communication skills, a significant challenge of these tasks lies in the difficulty of objectively measu…
▽ More
In recent years, there has been a growing interest in exploring dialogues with more complex goals, such as negotiation, persuasion, and emotional support, which go beyond traditional service-focused dialogue systems. Apart from the requirement for much more sophisticated strategic reasoning and communication skills, a significant challenge of these tasks lies in the difficulty of objectively measuring the achievement of their goals in a quantifiable way, making it difficult for existing research to directly optimize the dialogue procedure towards them. In our work, we emphasize the multifaceted nature of complex dialogue goals and argue that it is more feasible to accomplish them by comprehensively considering and jointly promoting their different aspects. To this end, we propose a novel dialogue framework, Cooper, which coordinates multiple specialized agents, each dedicated to a specific dialogue goal aspect separately, to approach the complex objective. Through this divide-and-conquer manner, we make complex dialogue goals more approachable and elicit greater intelligence via the collaboration of individual agents. Experiments on persuasion and emotional support dialogues demonstrate the superiority of our method over a set of competitive baselines.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
Adapting pretrained speech model for Mandarin lyrics transcription and alignment
Authors:
Jun-You Wang,
Chon-In Leong,
Yu-Chen Lin,
Li Su,
Jyh-Shing Roger Jang
Abstract:
The tasks of automatic lyrics transcription and lyrics alignment have witnessed significant performance improvements in the past few years. However, most of the previous works only focus on English in which large-scale datasets are available. In this paper, we address lyrics transcription and alignment of polyphonic Mandarin pop music in a low-resource setting. To deal with the data scarcity issue…
▽ More
The tasks of automatic lyrics transcription and lyrics alignment have witnessed significant performance improvements in the past few years. However, most of the previous works only focus on English in which large-scale datasets are available. In this paper, we address lyrics transcription and alignment of polyphonic Mandarin pop music in a low-resource setting. To deal with the data scarcity issue, we adapt pretrained Whisper model and fine-tune it on a monophonic Mandarin singing dataset. With the use of data augmentation and source separation model, results show that the proposed method achieves a character error rate of less than 18% on a Mandarin polyphonic dataset for lyrics transcription, and a mean absolute error of 0.071 seconds for lyrics alignment. Our results demonstrate the potential of adapting a pretrained speech model for lyrics transcription and alignment in low-resource scenarios.
△ Less
Submitted 21 November, 2023;
originally announced November 2023.
-
JWSign: A Highly Multilingual Corpus of Bible Translations for more Diversity in Sign Language Processing
Authors:
Shester Gueuwou,
Sophie Siake,
Colin Leong,
Mathias Müller
Abstract:
Advancements in sign language processing have been hindered by a lack of sufficient data, impeding progress in recognition, translation, and production tasks. The absence of comprehensive sign language datasets across the world's sign languages has widened the gap in this field, resulting in a few sign languages being studied more than others, making this research area extremely skewed mostly towa…
▽ More
Advancements in sign language processing have been hindered by a lack of sufficient data, impeding progress in recognition, translation, and production tasks. The absence of comprehensive sign language datasets across the world's sign languages has widened the gap in this field, resulting in a few sign languages being studied more than others, making this research area extremely skewed mostly towards sign languages from high-income countries. In this work we introduce a new large and highly multilingual dataset for sign language translation: JWSign. The dataset consists of 2,530 hours of Bible translations in 98 sign languages, featuring more than 1,500 individual signers. On this dataset, we report neural machine translation experiments. Apart from bilingual baseline systems, we also train multilingual systems, including some that take into account the typological relatedness of signed or spoken languages. Our experiments highlight that multilingual systems are superior to bilingual baselines, and that in higher-resource scenarios, clustering language pairs that are related improves translation quality.
△ Less
Submitted 16 November, 2023;
originally announced November 2023.
-
Moral consensus and divergence in partisan language use
Authors:
Nakwon Rim,
Marc G. Berman,
Yuan Chang Leong
Abstract:
Polarization has increased substantially in political discourse, contributing to a widening partisan divide. In this paper, we analyzed large-scale, real-world language use in Reddit communities (294,476,146 comments) and in news outlets (6,749,781 articles) to uncover psychological dimensions along which partisan language is divided. Using word embedding models that captured semantic associations…
▽ More
Polarization has increased substantially in political discourse, contributing to a widening partisan divide. In this paper, we analyzed large-scale, real-world language use in Reddit communities (294,476,146 comments) and in news outlets (6,749,781 articles) to uncover psychological dimensions along which partisan language is divided. Using word embedding models that captured semantic associations based on co-occurrences of words in vast textual corpora, we identified patterns of affective polarization present in natural political discourse. We then probed the semantic associations of words related to seven political topics (e.g., abortion, immigration) along the dimensions of morality (moral-to-immoral), threat (threatening-to-safe), and valence (pleasant-to-unpleasant). Across both Reddit communities and news outlets, we identified a small but systematic divergence in the moral associations of words between text sources with different partisan leanings. Moral associations of words were highly correlated between conservative and liberal text sources (average $ρ$ = 0.96), but the differences remained reliable to enable us to distinguish text sources along partisan lines with above 85% classification accuracy. These findings underscore that despite a shared moral understanding across the political spectrum, there are consistent differences that shape partisan language and potentially exacerbate political polarization. Our results, drawn from both informal interactions on social media and curated narratives in news outlets, indicate that these trends are widespread. Leveraging advanced computational techniques, this research offers a fresh perspective that complements traditional methods in political attitudes.
△ Less
Submitted 14 October, 2023;
originally announced October 2023.
-
Self-Detoxifying Language Models via Toxification Reversal
Authors:
Chak Tou Leong,
Yi Cheng,
Jiashuo Wang,
Jian Wang,
Wenjie Li
Abstract:
Language model detoxification aims to minimize the risk of generating offensive or harmful content in pretrained language models (PLMs) for safer deployment. Existing methods can be roughly categorized as finetuning-based and decoding-based. However, the former is often resource-intensive, while the latter relies on additional components and potentially compromises the generation fluency. In this…
▽ More
Language model detoxification aims to minimize the risk of generating offensive or harmful content in pretrained language models (PLMs) for safer deployment. Existing methods can be roughly categorized as finetuning-based and decoding-based. However, the former is often resource-intensive, while the latter relies on additional components and potentially compromises the generation fluency. In this paper, we propose a more lightweight approach that enables the PLM itself to achieve "self-detoxification". Our method is built upon the observation that prepending a negative steering prompt can effectively induce PLMs to generate toxic content. At the same time, we are inspired by the recent research in the interpretability field, which formulates the evolving contextualized representations within the PLM as an information stream facilitated by the attention layers. Drawing on this idea, we devise a method to identify the toxification direction from the normal generation process to the one prompted with the negative prefix, and then steer the generation to the reversed direction by manipulating the information movement within the attention layers. Experimental results show that our approach, without any fine-tuning or extra components, can achieve comparable performance with state-of-the-art methods.
△ Less
Submitted 14 October, 2023;
originally announced October 2023.
-
Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation
Authors:
Jian Wang,
Yi Cheng,
Dongding Lin,
Chak Tou Leong,
Wenjie Li
Abstract:
Target-oriented dialogue systems, designed to proactively steer conversations toward predefined targets or accomplish specific system-side goals, are an exciting area in conversational AI. In this work, by formulating a <dialogue act, topic> pair as the conversation target, we explore a novel problem of personalized target-oriented dialogue by considering personalization during the target accompli…
▽ More
Target-oriented dialogue systems, designed to proactively steer conversations toward predefined targets or accomplish specific system-side goals, are an exciting area in conversational AI. In this work, by formulating a <dialogue act, topic> pair as the conversation target, we explore a novel problem of personalized target-oriented dialogue by considering personalization during the target accomplishment process. However, there remains an emergent need for high-quality datasets, and building one from scratch requires tremendous human effort. To address this, we propose an automatic dataset curation framework using a role-playing approach. Based on this framework, we construct a large-scale personalized target-oriented dialogue dataset, TopDial, which comprises about 18K multi-turn dialogues. The experimental results show that this dataset is of high quality and could contribute to exploring personalized target-oriented dialogue.
△ Less
Submitted 13 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Language Models Can Learn Exceptions to Syntactic Rules
Authors:
Cara Su-Yi Leong,
Tal Linzen
Abstract:
Artificial neural networks can generalize productively to novel contexts. Can they also learn exceptions to those productive rules? We explore this question using the case of restrictions on English passivization (e.g., the fact that "The vacation lasted five days" is grammatical, but "*Five days was lasted by the vacation" is not). We collect human acceptability judgments for passive sentences wi…
▽ More
Artificial neural networks can generalize productively to novel contexts. Can they also learn exceptions to those productive rules? We explore this question using the case of restrictions on English passivization (e.g., the fact that "The vacation lasted five days" is grammatical, but "*Five days was lasted by the vacation" is not). We collect human acceptability judgments for passive sentences with a range of verbs, and show that the probability distribution defined by GPT-2, a language model, matches the human judgments with high correlation. We also show that the relative acceptability of a verb in the active vs. passive voice is positively correlated with the relative frequency of its occurrence in those voices. These results provide preliminary support for the entrenchment hypothesis, according to which learners track and uses the distributional properties of their input to learn negative exceptions to rules. At the same time, this hypothesis fails to explain the magnitude of unpassivizability demonstrated by certain individual verbs, suggesting that other cues to exceptionality are available in the linguistic input.
△ Less
Submitted 9 June, 2023;
originally announced June 2023.
-
The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages
Authors:
Vesa Akerman,
David Baines,
Damien Daspit,
Ulf Hermjakob,
Taeho Jang,
Colin Leong,
Michael Martin,
Joel Mathew,
Jonathan Robie,
Marcus Schwarting
Abstract:
Efficiently and accurately translating a corpus into a low-resource language remains a challenge, regardless of the strategies employed, whether manual, automated, or a combination of the two. Many Christian organizations are dedicated to the task of translating the Holy Bible into languages that lack a modern translation. Bible translation (BT) work is currently underway for over 3000 extremely l…
▽ More
Efficiently and accurately translating a corpus into a low-resource language remains a challenge, regardless of the strategies employed, whether manual, automated, or a combination of the two. Many Christian organizations are dedicated to the task of translating the Holy Bible into languages that lack a modern translation. Bible translation (BT) work is currently underway for over 3000 extremely low resource languages. We introduce the eBible corpus: a dataset containing 1009 translations of portions of the Bible with data in 833 different languages across 75 language families. In addition to a BT benchmarking dataset, we introduce model performance benchmarks built on the No Language Left Behind (NLLB) neural machine translation (NMT) models. Finally, we describe several problems specific to the domain of BT and consider how the established data and model benchmarks might be used for future translation efforts. For a BT task trained with NLLB, Austronesian and Trans-New Guinea language families achieve 35.1 and 31.6 BLEU scores respectively, which spurs future innovations for NMT for low-resource languages in Papua New Guinea.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Adapting to the Low-Resource Double-Bind: Investigating Low-Compute Methods on Low-Resource African Languages
Authors:
Colin Leong,
Herumb Shandilya,
Bonaventure F. P. Dossou,
Atnafu Lambebo Tonja,
Joel Mathew,
Abdul-Hakeem Omotayo,
Oreen Yousuf,
Zainab Akinjobi,
Chris Chinenye Emezue,
Shamsudeen Muhammad,
Steven Kolawole,
Younwoo Choi,
Tosin Adewumi
Abstract:
Many natural language processing (NLP) tasks make use of massively pre-trained language models, which are computationally expensive. However, access to high computational resources added to the issue of data scarcity of African languages constitutes a real barrier to research experiments on these languages. In this work, we explore the applicability of low-compute approaches such as language adapt…
▽ More
Many natural language processing (NLP) tasks make use of massively pre-trained language models, which are computationally expensive. However, access to high computational resources added to the issue of data scarcity of African languages constitutes a real barrier to research experiments on these languages. In this work, we explore the applicability of low-compute approaches such as language adapters in the context of this low-resource double-bind. We intend to answer the following question: do language adapters allow those who are doubly bound by data and compute to practically build useful models? Through fine-tuning experiments on African languages, we evaluate their effectiveness as cost-effective approaches to low-resource African NLP. Using solely free compute resources, our results show that language adapters achieve comparable performances to massive pre-trained language models which are heavy on computational resources. This opens the door to further experimentation and exploration on full-extent of language adapters capacities.
△ Less
Submitted 29 March, 2023;
originally announced March 2023.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Authors:
BigScience Workshop,
:,
Teven Le Scao,
Angela Fan,
Christopher Akiki,
Ellie Pavlick,
Suzana Ilić,
Daniel Hesslow,
Roman Castagné,
Alexandra Sasha Luccioni,
François Yvon,
Matthias Gallé,
Jonathan Tow,
Alexander M. Rush,
Stella Biderman,
Albert Webson,
Pawan Sasanka Ammanamanchi,
Thomas Wang,
Benoît Sagot,
Niklas Muennighoff,
Albert Villanova del Moral,
Olatunji Ruwase,
Rachel Bawden,
Stas Bekman,
Angelina McMillan-Major
, et al. (369 additional authors not shown)
Abstract:
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…
▽ More
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
△ Less
Submitted 27 June, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks
Authors:
Colin Leong,
Joshua Nemecek,
Jacob Mansdorfer,
Anna Filighera,
Abraham Owodunni,
Daniel Whitenack
Abstract:
We present Bloom Library, a linguistically diverse set of multimodal and multilingual datasets for language modeling, image captioning, visual storytelling, and speech synthesis/recognition. These datasets represent either the most, or among the most, multilingual datasets for each of the included downstream tasks. In total, the initial release of the Bloom Library datasets covers 363 languages ac…
▽ More
We present Bloom Library, a linguistically diverse set of multimodal and multilingual datasets for language modeling, image captioning, visual storytelling, and speech synthesis/recognition. These datasets represent either the most, or among the most, multilingual datasets for each of the included downstream tasks. In total, the initial release of the Bloom Library datasets covers 363 languages across 32 language families. We train downstream task models for various languages represented in the data, showing the viability of the data for future work in low-resource, multimodal NLP and establishing the first known baselines for these downstream tasks in certain languages (e.g., Bisu [bzi], with an estimated population of 700 users). Some of these first-of-their-kind baselines are comparable to state-of-the-art performance for higher-resourced languages. The Bloom Library datasets are released under Creative Commons licenses on the Hugging Face datasets hub to catalyze more linguistically diverse research in the included downstream tasks.
△ Less
Submitted 26 October, 2022;
originally announced October 2022.
-
Combined CNN Transformer Encoder for Enhanced Fine-grained Human Action Recognition
Authors:
Mei Chee Leong,
Haosong Zhang,
Hui Li Tan,
Liyuan Li,
Joo Hwee Lim
Abstract:
Fine-grained action recognition is a challenging task in computer vision. As fine-grained datasets have small inter-class variations in spatial and temporal space, fine-grained action recognition model requires good temporal reasoning and discrimination of attribute action semantics. Leveraging on CNN's ability in capturing high level spatial-temporal feature representations and Transformer's mode…
▽ More
Fine-grained action recognition is a challenging task in computer vision. As fine-grained datasets have small inter-class variations in spatial and temporal space, fine-grained action recognition model requires good temporal reasoning and discrimination of attribute action semantics. Leveraging on CNN's ability in capturing high level spatial-temporal feature representations and Transformer's modeling efficiency in capturing latent semantics and global dependencies, we investigate two frameworks that combine CNN vision backbone and Transformer Encoder to enhance fine-grained action recognition: 1) a vision-based encoder to learn latent temporal semantics, and 2) a multi-modal video-text cross encoder to exploit additional text input and learn cross association between visual and text semantics. Our experimental results show that both our Transformer encoder frameworks effectively learn latent temporal semantics and cross-modality association, with improved recognition performance over CNN vision model. We achieve new state-of-the-art performance on the FineGym benchmark dataset for both proposed architectures.
△ Less
Submitted 3 August, 2022;
originally announced August 2022.
-
BibleTTS: a large, high-fidelity, multilingual, and uniquely African speech corpus
Authors:
Josh Meyer,
David Ifeoluwa Adelani,
Edresson Casanova,
Alp Öktem,
Daniel Whitenack Julian Weber,
Salomon Kabongo,
Elizabeth Salesky,
Iroro Orife,
Colin Leong,
Perez Ogayo,
Chris Emezue,
Jonathan Mukiibi,
Salomey Osei,
Apelete Agbolo,
Victor Akinode,
Bernard Opoku,
Samuel Olanrewaju,
Jesujoba Alabi,
Shamsuddeen Muhammad
Abstract:
BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Africa. The corpus contains up to 86 hours of aligned, studio quality 48kHz single speaker recordings per language, enabling the development of high-quality text-to-speech models. The ten languages represented are: Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, and Yoruba.…
▽ More
BibleTTS is a large, high-quality, open speech dataset for ten languages spoken in Sub-Saharan Africa. The corpus contains up to 86 hours of aligned, studio quality 48kHz single speaker recordings per language, enabling the development of high-quality text-to-speech models. The ten languages represented are: Akuapem Twi, Asante Twi, Chichewa, Ewe, Hausa, Kikuyu, Lingala, Luganda, Luo, and Yoruba. This corpus is a derivative work of Bible recordings made and released by the Open.Bible project from Biblica. We have aligned, cleaned, and filtered the original recordings, and additionally hand-checked a subset of the alignments for each language. We present results for text-to-speech models with Coqui TTS. The data is released under a commercial-friendly CC-BY-SA license.
△ Less
Submitted 7 July, 2022;
originally announced July 2022.
-
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
Authors:
David Ifeoluwa Adelani,
Jesujoba Oluwadara Alabi,
Angela Fan,
Julia Kreutzer,
Xiaoyu Shen,
Machel Reid,
Dana Ruiter,
Dietrich Klakow,
Peter Nabende,
Ernie Chang,
Tajuddeen Gwadabe,
Freshia Sackey,
Bonaventure F. P. Dossou,
Chris Chinenye Emezue,
Colin Leong,
Michael Beukman,
Shamsuddeen Hassan Muhammad,
Guyo Dub Jarso,
Oreen Yousuf,
Andre Niyongabo Rubungo,
Gilles Hacheme,
Eric Peter Wairagala,
Muhammad Umair Nasir,
Benjamin Ayoade Ajibade,
Tunde Oluwaseyi Ajayi
, et al. (20 additional authors not shown)
Abstract:
Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models…
▽ More
Recent advances in the pre-training of language models leverage large-scale datasets to create multilingual models. However, low-resource languages are mostly left out in these datasets. This is primarily because many widely spoken languages are not well represented on the web and therefore excluded from the large-scale crawls used to create datasets. Furthermore, downstream users of these models are restricted to the selection of languages originally chosen for pre-training. This work investigates how to optimally leverage existing pre-trained models to create low-resource translation systems for 16 African languages. We focus on two questions: 1) How can pre-trained models be used for languages not included in the initial pre-training? and 2) How can the resulting translation models effectively transfer to new domains? To answer these questions, we create a new African news corpus covering 16 languages, of which eight languages are not part of any existing evaluation dataset. We demonstrate that the most effective strategy for transferring both to additional languages and to additional domains is to fine-tune large pre-trained models on small quantities of high-quality translation data.
△ Less
Submitted 22 August, 2022; v1 submitted 4 May, 2022;
originally announced May 2022.
-
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
Authors:
Angelina McMillan-Major,
Zaid Alyafeai,
Stella Biderman,
Kimbo Chen,
Francesco De Toni,
Gérard Dupont,
Hady Elsahar,
Chris Emezue,
Alham Fikri Aji,
Suzana Ilić,
Nurulaqilla Khamis,
Colin Leong,
Maraim Masoud,
Aitor Soroa,
Pedro Ortiz Suarez,
Zeerak Talat,
Daniel van Strien,
Yacine Jernite
Abstract:
In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models. This prioritization, however, has resulted in concerns with respect to the rights of data subjects represented in data collections, particularly when considering the difficulty in interrogating these collections due to insufficie…
▽ More
In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models. This prioritization, however, has resulted in concerns with respect to the rights of data subjects represented in data collections, particularly when considering the difficulty in interrogating these collections due to insufficient documentation and tools for analysis. Mindful of these pitfalls, we present our methodology for a documentation-first, human-centered data collection project as part of the BigScience initiative. We identified a geographically diverse set of target language groups (Arabic, Basque, Chinese, Catalan, English, French, Indic languages, Indonesian, Niger-Congo languages, Portuguese, Spanish, and Vietnamese, as well as programming languages) for which to collect metadata on potential data sources. To structure this effort, we developed our online catalogue as a supporting tool for gathering metadata through organized public hackathons. We present our development process; analyses of the resulting resource metadata, including distributions over languages, regions, and resource types; and our lessons learned in this endeavor.
△ Less
Submitted 24 January, 2022;
originally announced January 2022.
-
Joint Learning On The Hierarchy Representation for Fine-Grained Human Action Recognition
Authors:
Mei Chee Leong,
Hui Li Tan,
Haosong Zhang,
Liyuan Li,
Feng Lin,
Joo Hwee Lim
Abstract:
Fine-grained human action recognition is a core research topic in computer vision. Inspired by the recently proposed hierarchy representation of fine-grained actions in FineGym and SlowFast network for action recognition, we propose a novel multi-task network which exploits the FineGym hierarchy representation to achieve effective joint learning and prediction for fine-grained human action recogni…
▽ More
Fine-grained human action recognition is a core research topic in computer vision. Inspired by the recently proposed hierarchy representation of fine-grained actions in FineGym and SlowFast network for action recognition, we propose a novel multi-task network which exploits the FineGym hierarchy representation to achieve effective joint learning and prediction for fine-grained human action recognition. The multi-task network consists of three pathways of SlowOnly networks with gradually increased frame rates for events, sets and elements of fine-grained actions, followed by our proposed integration layers for joint learning and prediction. It is a two-stage approach, where it first learns deep feature representation at each hierarchical level, and is followed by feature encoding and fusion for multi-task learning. Our empirical results on the FineGym dataset achieve a new state-of-the-art performance, with 91.80% Top-1 accuracy and 88.46% mean accuracy for element actions, which are 3.40% and 7.26% higher than the previous best results.
△ Less
Submitted 12 October, 2021;
originally announced October 2021.
-
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Authors:
Julia Kreutzer,
Isaac Caswell,
Lisa Wang,
Ahsan Wahab,
Daan van Esch,
Nasanbayar Ulzii-Orshikh,
Allahsera Tapo,
Nishant Subramani,
Artem Sokolov,
Claytone Sikasote,
Monang Setyawan,
Supheakmungkol Sarin,
Sokhar Samb,
Benoît Sagot,
Clara Rivera,
Annette Rios,
Isabel Papadimitriou,
Salomey Osei,
Pedro Ortiz Suarez,
Iroro Orife,
Kelechi Ogueji,
Andre Niyongabo Rubungo,
Toan Q. Nguyen,
Mathias Müller,
André Müller
, et al. (27 additional authors not shown)
Abstract:
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have system…
▽ More
With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases.
△ Less
Submitted 21 February, 2022; v1 submitted 22 March, 2021;
originally announced March 2021.
-
To Trust, or Not to Trust? A Study of Human Bias in Automated Video Interview Assessments
Authors:
Chee Wee Leong,
Katrina Roohr,
Vikram Ramanarayanan,
Michelle P. Martin-Raugh,
Harrison Kell,
Rutuja Ubale,
Yao Qian,
Zydrune Mladineo,
Laura McCulla
Abstract:
Supervised systems require human labels for training. But, are humans themselves always impartial during the annotation process? We examine this question in the context of automated assessment of human behavioral tasks. Specifically, we investigate whether human ratings themselves can be trusted at their face value when scoring video-based structured interviews, and whether such ratings can impact…
▽ More
Supervised systems require human labels for training. But, are humans themselves always impartial during the annotation process? We examine this question in the context of automated assessment of human behavioral tasks. Specifically, we investigate whether human ratings themselves can be trusted at their face value when scoring video-based structured interviews, and whether such ratings can impact machine learning models that use them as training data. We present preliminary empirical evidence that indicates there might be biases in such annotations, most of which are visual in nature.
△ Less
Submitted 27 November, 2019;
originally announced November 2019.
-
Incorporating Epistemic Uncertainty into the Safety Assurance of Socio-Technical Systems
Authors:
Chris Leong,
Tim Kelly,
Rob Alexander
Abstract:
In system development, epistemic uncertainty is an ever-present possibility when reasoning about the causal factors during hazard analysis. Such uncertainty is common when complicated systems interact with one another, and it is dangerous because it impairs hazard analysis and thus increases the chance of overlooking unsafe situations. Uncertainty around causation thus needs to be managed well. Un…
▽ More
In system development, epistemic uncertainty is an ever-present possibility when reasoning about the causal factors during hazard analysis. Such uncertainty is common when complicated systems interact with one another, and it is dangerous because it impairs hazard analysis and thus increases the chance of overlooking unsafe situations. Uncertainty around causation thus needs to be managed well. Unfortunately, existing hazard analysis techniques tend to ignore unknown uncertainties, and system stakeholders rarely track known uncertainties well through the system lifecycle. In this paper, we outline an approach to managing epistemic uncertainty in existing hazard analysis techniques by focusing on known and unknown uncertainty. We have created a reference populated with a wide range of safety-critical causal relationships to recognise unknown uncertainty, and we have developed a model to systematically capture and track known uncertainty around such factors. We have also defined a process for using the reference and model to assess possible causal factors that are suspected during hazard analysis. To assess the applicability of our approach, we have analysed the widely-used MoDAF architectural model and determined that there is potential for our approach to identify additional causal factors that are not apparent from individual MoDAF views. We have also reviewed an existing safety assessment example (the ARP4761 Aircraft System analysis) and determined that our approach could indeed be incorporated into that process. We have also integrated our approach into the STPA hazard analysis technique to demonstrate its feasibility to incorporate into existing techniques. It is therefore plausible that our approach can increase safety assurance provided by hazard analysis in the face of epistemic uncertainty.
△ Less
Submitted 9 October, 2017;
originally announced October 2017.
-
Predicting proximity with ambient mobile sensors for non-invasive health diagnostics
Authors:
Sylvester Olubolu Orimaye,
Foo Chuan Leong,
Chen Hui Lee,
Eddy Cheng Han Ng
Abstract:
Modern smart phones are becoming helpful in the areas of Internet-Of-Things (IoT) and ambient health intelligence. By learning data from several mobile sensors, we detect nearness of the human body to a mobile device in a three-dimensional space with no physical contact with the device for non-invasive health diagnostics. We show that the human body generates wave patterns that interact with other…
▽ More
Modern smart phones are becoming helpful in the areas of Internet-Of-Things (IoT) and ambient health intelligence. By learning data from several mobile sensors, we detect nearness of the human body to a mobile device in a three-dimensional space with no physical contact with the device for non-invasive health diagnostics. We show that the human body generates wave patterns that interact with other naturally occurring ambient signals that could be measured by mobile sensors, such as, temperature, humidity, magnetic field, acceleration, gravity, and light. This interaction consequentially alters the patterns of the naturally occurring signals, and thus, exhibits characteristics that could be learned to predict the nearness of the human body to a mobile device, hence provide diagnostic information for medical practitioners. Our prediction technique achieved 88.75% accuracy and 88.3% specificity.
△ Less
Submitted 9 December, 2015;
originally announced December 2015.