-
Rethinking Response Evaluation from Interlocutor's Eye for Open-Domain Dialogue Systems
Authors:
Yuma Tsuta,
Naoki Yoshinaga,
Shoetsu Sato,
Masashi Toyoda
Abstract:
Open-domain dialogue systems have started to engage in continuous conversations with humans. Those dialogue systems are required to be adjusted to the human interlocutor and evaluated in terms of their perspective. However, it is questionable whether the current automatic evaluation methods can approximate the interlocutor's judgments. In this study, we analyzed and examined what features are need…
▽ More
Open-domain dialogue systems have started to engage in continuous conversations with humans. Those dialogue systems are required to be adjusted to the human interlocutor and evaluated in terms of their perspective. However, it is questionable whether the current automatic evaluation methods can approximate the interlocutor's judgments. In this study, we analyzed and examined what features are needed in an automatic response evaluator from the interlocutor's perspective. The first experiment on the Hazumi dataset revealed that interlocutor awareness plays a critical role in making automatic response evaluation correlate with the interlocutor's judgments. The second experiment using massive conversations on X (formerly Twitter) confirmed that dialogue continuity prediction can train an interlocutor-aware response evaluator without human feedback while revealing the difficulty in evaluating generated responses compared to human responses.
△ Less
Submitted 4 January, 2024;
originally announced January 2024.
-
PerPLM: Personalized Fine-tuning of Pretrained Language Models via Writer-specific Intermediate Learning and Prompts
Authors:
Daisuke Oba,
Naoki Yoshinaga,
Masashi Toyoda
Abstract:
The meanings of words and phrases depend not only on where they are used (contexts) but also on who use them (writers). Pretrained language models (PLMs) are powerful tools for capturing context, but they are typically pretrained and fine-tuned for universal use across different writers. This study aims to improve the accuracy of text understanding tasks by personalizing the fine-tuning of PLMs fo…
▽ More
The meanings of words and phrases depend not only on where they are used (contexts) but also on who use them (writers). Pretrained language models (PLMs) are powerful tools for capturing context, but they are typically pretrained and fine-tuned for universal use across different writers. This study aims to improve the accuracy of text understanding tasks by personalizing the fine-tuning of PLMs for specific writers. We focus on a general setting where only the plain text from target writers are available for personalization. To avoid the cost of fine-tuning and storing multiple copies of PLMs for different users, we exhaustively explore using writer-specific prompts to personalize a unified PLM. Since the design and evaluation of these prompts is an underdeveloped area, we introduce and compare different types of prompts that are possible in our setting. To maximize the potential of prompt-based personalized fine-tuning, we propose a personalized intermediate learning based on masked language modeling to extract task-independent traits of writers' text. Our experiments, using multiple tasks, datasets, and PLMs, reveal the nature of different prompts and the effectiveness of our intermediate learning approach.
△ Less
Submitted 14 September, 2023;
originally announced September 2023.
-
Early Discovery of Disappearing Entities in Microblogs
Authors:
Satoshi Akasaki,
Naoki Yoshinaga,
Masashi Toyoda
Abstract:
We make decisions by reacting to changes in the real world, in particular, the emergence and disappearance of impermanent entities such as events, restaurants, and services. Because we want to avoid missing out on opportunities or making fruitless actions after they have disappeared, it is important to know when entities disappear as early as possible. We thus tackle the task of detecting disappea…
▽ More
We make decisions by reacting to changes in the real world, in particular, the emergence and disappearance of impermanent entities such as events, restaurants, and services. Because we want to avoid missing out on opportunities or making fruitless actions after they have disappeared, it is important to know when entities disappear as early as possible. We thus tackle the task of detecting disappearing entities from microblogs, whose posts mention various entities, in a timely manner. The major challenge is detecting uncertain contexts of disappearing entities from noisy microblog posts. To collect these disappearing contexts, we design time-sensitive distant supervision, which utilizes entities from the knowledge base and time-series posts, for this task to build large-scale Twitter datasets\footnote{We will release the datasets (tweet IDs) used in the experiments to promote reproducibility.} for English and Japanese. To ensure robust detection in noisy environments, we refine pretrained word embeddings of the detection model on microblog streams of the target day. Experimental results on the Twitter datasets confirmed the effectiveness of the collected labeled data and refined word embeddings; more than 70\% of the detected disappearing entities in Wikipedia are discovered earlier than the update on Wikipedia, and the average lead-time is over one month.
△ Less
Submitted 13 October, 2022;
originally announced October 2022.
-
Evolution of the public opinion on COVID-19 vaccination in Japan
Authors:
Yuri Nakayama,
Yuka Takedomi,
Towa Suda,
Takeaki Uno,
Takako Hashimoto,
Masashi Toyoda,
Naoki Yoshinaga,
Masaru Kitsuregawa,
Luis E. C. Rocha,
Ryota Kobayashi
Abstract:
Vaccines are promising tools to control the spread of COVID-19. An effective vaccination campaign requires government policies and community engagement, sharing experiences for social support, and voicing concerns to vaccine safety and efficiency. The increasing use of online social platforms allows us to trace large-scale communication and infer public opinion in real-time. We collected more than…
▽ More
Vaccines are promising tools to control the spread of COVID-19. An effective vaccination campaign requires government policies and community engagement, sharing experiences for social support, and voicing concerns to vaccine safety and efficiency. The increasing use of online social platforms allows us to trace large-scale communication and infer public opinion in real-time. We collected more than 100 million vaccine-related tweets posted by 8 million users and used the Latent Dirichlet Allocation model to perform automated topic modeling of tweet texts during the vaccination campaign in Japan. We identified 15 topics grouped into 4 themes on Personal issue, Breaking news, Politics, and Conspiracy and humour. The evolution of the popularity of themes revealed a shift in public opinion, initially sharing the attention over personal issues (individual aspect), collecting information from the news (knowledge acquisition), and government criticisms, towards personal experiences once confidence in the vaccination campaign was established. An interrupted time series regression analysis showed that the Tokyo Olympic Games affected public opinion more than other critical events but not the course of the vaccination. Public opinion on politics was significantly affected by various events, positively shifting the attention in the early stages of the vaccination campaign and negatively later. Tweets about personal issues were mostly retweeted when the vaccination reached the younger population. The associations between the vaccination campaign stages and tweet themes suggest that the public engagement in the social platform contributed to speedup vaccine uptake by reducing anxiety via social learning and support.
△ Less
Submitted 22 July, 2022;
originally announced July 2022.
-
Learning Bidirectional Translation between Descriptions and Actions with Small Paired Data
Authors:
Minori Toyoda,
Kanata Suzuki,
Yoshihiko Hayashi,
Tetsuya Ogata
Abstract:
This study achieved bidirectional translation between descriptions and actions using small paired data from different modalities. The ability to mutually generate descriptions and actions is essential for robots to collaborate with humans in their daily lives, which generally requires a large dataset that maintains comprehensive pairs of both modality data. However, a paired dataset is expensive t…
▽ More
This study achieved bidirectional translation between descriptions and actions using small paired data from different modalities. The ability to mutually generate descriptions and actions is essential for robots to collaborate with humans in their daily lives, which generally requires a large dataset that maintains comprehensive pairs of both modality data. However, a paired dataset is expensive to construct and difficult to collect. To address this issue, this study proposes a two-stage training method for bidirectional translation. In the proposed method, we train recurrent autoencoders (RAEs) for descriptions and actions with a large amount of non-paired data. Then, we finetune the entire model to bind their intermediate representations using small paired data. Because the data used for pre-training do not require pairing, behavior-only data or a large language corpus can be used. We experimentally evaluated our method using a paired dataset consisting of motion-captured actions and descriptions. The results showed that our method performed well, even when the amount of paired data to train was small. The visualization of the intermediate representations of each RAE showed that similar actions were encoded in a clustered position and the corresponding feature vectors were well aligned.
△ Less
Submitted 24 September, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Embodying Pre-Trained Word Embeddings Through Robot Actions
Authors:
Minori Toyoda,
Kanata Suzuki,
Hiroki Mori,
Yoshihiko Hayashi,
Tetsuya Ogata
Abstract:
We propose a promising neural network model with which to acquire a grounded representation of robot actions and the linguistic descriptions thereof. Properly responding to various linguistic expressions, including polysemous words, is an important ability for robots that interact with people via linguistic dialogue. Previous studies have shown that robots can use words that are not included in th…
▽ More
We propose a promising neural network model with which to acquire a grounded representation of robot actions and the linguistic descriptions thereof. Properly responding to various linguistic expressions, including polysemous words, is an important ability for robots that interact with people via linguistic dialogue. Previous studies have shown that robots can use words that are not included in the action-description paired datasets by using pre-trained word embeddings. However, the word embeddings trained under the distributional hypothesis are not grounded, as they are derived purely from a text corpus. In this letter, we transform the pre-trained word embeddings to embodied ones by using the robot's sensory-motor experiences. We extend a bidirectional translation model for actions and descriptions by incorporating non-linear layers that retrofit the word embeddings. By training the retrofit layer and the bidirectional translation model alternately, our proposed model is able to transform the pre-trained word embeddings to adapt to a paired action-description dataset. Our results demonstrate that the embeddings of synonyms form a semantic cluster by reflecting the experiences (actions and environments) of a robot. These embeddings allow the robot to properly generate actions from unseen words that are not paired with actions in a dataset.
△ Less
Submitted 17 April, 2021;
originally announced April 2021.
-
A System for Worldwide COVID-19 Information Aggregation
Authors:
Akiko Aizawa,
Frederic Bergeron,
Junjie Chen,
Fei Cheng,
Katsuhiko Hayashi,
Kentaro Inui,
Hiroyoshi Ito,
Daisuke Kawahara,
Masaru Kitsuregawa,
Hirokazu Kiyomaru,
Masaki Kobayashi,
Takashi Kodama,
Sadao Kurohashi,
Qianying Liu,
Masaki Matsubara,
Yusuke Miyao,
Atsuyuki Morishima,
Yugo Murawaki,
Kazumasa Omura,
Haiyue Song,
Eiichiro Sumita,
Shinji Suzuki,
Ribeka Tanaka,
Yu Tanaka,
Masashi Toyoda
, et al. (4 additional authors not shown)
Abstract:
The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-…
▽ More
The global pandemic of COVID-19 has made the public pay close attention to related news, covering various domains, such as sanitation, treatment, and effects on education. Meanwhile, the COVID-19 condition is very different among the countries (e.g., policies and development of the epidemic), and thus citizens would be interested in news in foreign countries. We build a system for worldwide COVID-19 information aggregation containing reliable articles from 10 regions in 7 languages sorted by topics. Our reliable COVID-19 related website dataset collected through crowdsourcing ensures the quality of the articles. A neural machine translation module translates articles in other languages into Japanese and English. A BERT-based topic-classifier trained on our article-topic pair dataset helps users find their interested information efficiently by putting articles into different categories.
△ Less
Submitted 11 October, 2020; v1 submitted 27 July, 2020;
originally announced August 2020.
-
Vocabulary Adaptation for Distant Domain Adaptation in Neural Machine Translation
Authors:
Shoetsu Sato,
** Sakuma,
Naoki Yoshinaga,
Masashi Toyoda,
Masaru Kitsuregawa
Abstract:
Neural network methods exhibit strong performance only in a few resource-rich domains. Practitioners, therefore, employ domain adaptation from resource-rich domains that are, in most cases, distant from the target domain. Domain adaptation between distant domains (e.g., movie subtitles and research papers), however, cannot be performed effectively due to mismatches in vocabulary; it will encounter…
▽ More
Neural network methods exhibit strong performance only in a few resource-rich domains. Practitioners, therefore, employ domain adaptation from resource-rich domains that are, in most cases, distant from the target domain. Domain adaptation between distant domains (e.g., movie subtitles and research papers), however, cannot be performed effectively due to mismatches in vocabulary; it will encounter many domain-specific words (e.g., "angstrom") and words whose meanings shift across domains(e.g., "conductor"). In this study, aiming to solve these vocabulary mismatches in domain adaptation for neural machine translation (NMT), we propose vocabulary adaptation, a simple method for effective fine-tuning that adapts embedding layers in a given pre-trained NMT model to the target domain. Prior to fine-tuning, our method replaces the embedding layers of the NMT model by projecting general word embeddings induced from monolingual data in a target domain onto a source-domain embedding space. Experimental results indicate that our method improves the performance of conventional fine-tuning by 3.86 and 3.28 BLEU points in En-Ja and De-En translation, respectively.
△ Less
Submitted 31 October, 2020; v1 submitted 30 April, 2020;
originally announced April 2020.
-
Anomaly Detection with Inexact Labels
Authors:
Tomoharu Iwata,
Machiko Toyoda,
Shotaro Tora,
Naonori Ueda
Abstract:
We propose a supervised anomaly detection method for data with inexact anomaly labels, where each label, which is assigned to a set of instances, indicates that at least one instance in the set is anomalous. Although many anomaly detection methods have been proposed, they cannot handle inexact anomaly labels. To measure the performance with inexact anomaly labels, we define the inexact AUC, which…
▽ More
We propose a supervised anomaly detection method for data with inexact anomaly labels, where each label, which is assigned to a set of instances, indicates that at least one instance in the set is anomalous. Although many anomaly detection methods have been proposed, they cannot handle inexact anomaly labels. To measure the performance with inexact anomaly labels, we define the inexact AUC, which is our extension of the area under the ROC curve (AUC) for inexact labels. The proposed method trains an anomaly score function so that the smooth approximation of the inexact AUC increases while anomaly scores for non-anomalous instances become low. We model the anomaly score function by a neural network-based unsupervised anomaly detection method, e.g., autoencoders. The proposed method performs well even when only a small number of inexact labels are available by incorporating an unsupervised anomaly detection mechanism with inexact AUC maximization. Using various datasets, we experimentally demonstrate that our proposed method improves the anomaly detection performance with inexact anomaly labels, and outperforms existing unsupervised and supervised anomaly detection and multiple instance learning methods.
△ Less
Submitted 10 September, 2019;
originally announced September 2019.
-
Early Discovery of Emerging Entities in Microblogs
Authors:
Satoshi Akasaki,
Naoki Yoshinaga,
Masashi Toyoda
Abstract:
Kee** up to date on emerging entities that appear every day is indispensable for various applications, such as social-trend analysis and marketing research. Previous studies have attempted to detect unseen entities that are not registered in a particular knowledge base as emerging entities and consequently find non-emerging entities since the absence of entities in knowledge bases does not guara…
▽ More
Kee** up to date on emerging entities that appear every day is indispensable for various applications, such as social-trend analysis and marketing research. Previous studies have attempted to detect unseen entities that are not registered in a particular knowledge base as emerging entities and consequently find non-emerging entities since the absence of entities in knowledge bases does not guarantee their emergence. We therefore introduce a novel task of discovering truly emerging entities when they have just been introduced to the public through microblogs and propose an effective method based on time-sensitive distant supervision, which exploits distinctive early-stage contexts of emerging entities. Experimental results with a large-scale Twitter archive show that the proposed method achieves 83.2% precision of the top 500 discovered emerging entities, which outperforms baselines based on unseen entity recognition with burst detection. Besides notable emerging entities, our method can discover massive long-tail and homographic emerging entities. An evaluation of relative recall shows that the method detects 80.4% emerging entities newly registered in Wikipedia; 92.4% of them are discovered earlier than their registration in Wikipedia, and the average lead-time is more than one year (571 days).
△ Less
Submitted 8 July, 2019;
originally announced July 2019.
-
Learning to Describe Phrases with Local and Global Contexts
Authors:
Shonosuke Ishiwatari,
Hiroaki Hayashi,
Naoki Yoshinaga,
Graham Neubig,
Shoetsu Sato,
Masashi Toyoda,
Masaru Kitsuregawa
Abstract:
When reading a text, it is common to become stuck on unfamiliar words and phrases, such as polysemous words with novel senses, rarely used idioms, internet slang, or emerging entities. If we humans cannot figure out the meaning of those expressions from the immediate local context, we consult dictionaries for definitions or search documents or the web to find other global context to help in interp…
▽ More
When reading a text, it is common to become stuck on unfamiliar words and phrases, such as polysemous words with novel senses, rarely used idioms, internet slang, or emerging entities. If we humans cannot figure out the meaning of those expressions from the immediate local context, we consult dictionaries for definitions or search documents or the web to find other global context to help in interpretation. Can machines help us do this work? Which type of context is more important for machines to solve the problem? To answer these questions, we undertake a task of describing a given phrase in natural language based on its local and global contexts. To solve this task, we propose a neural description model that consists of two context encoders and a description decoder. In contrast to the existing methods for non-standard English explanation [Ni+ 2017] and definition generation [Noraset+ 2017; Gadetsky+ 2018], our model appropriately takes important clues from both local and global contexts. Experimental results on three existing datasets (including WordNet, Oxford and Urban Dictionaries) and a dataset newly created from Wikipedia demonstrate the effectiveness of our method over previous work.
△ Less
Submitted 10 April, 2019; v1 submitted 1 November, 2018;
originally announced November 2018.
-
Existence of outsiders as a characteristic of online communication networks
Authors:
Taro Takaguchi,
Takanori Maehara,
Masashi Toyoda,
Ken-ichi Kawarabayashi
Abstract:
Online social networking services (SNSs) involve communication activities between large number of individuals over the public Internet and their crawled records are often regarded as proxies of real (i.e., offline) interaction structure. However, structure observed in these records might differ from real counterparts because individuals may behave differently online and non-human accounts may even…
▽ More
Online social networking services (SNSs) involve communication activities between large number of individuals over the public Internet and their crawled records are often regarded as proxies of real (i.e., offline) interaction structure. However, structure observed in these records might differ from real counterparts because individuals may behave differently online and non-human accounts may even participate. To understand the difference between online and real social networks, we investigate an empirical communication network between users on Twitter, which is perhaps one of the largest SNSs. We define a network of user pairs that send reciprocal messages. Based on the mixing pattern observed in this network, we argue that this network differs from conventional understandings in the sense that there is a small number of distinctive users that we call outsiders. Outsiders do not belong to any user groups but they are connected with different groups, while not being well connected with each other. We identify outsiders by maximizing the degree assortativity coefficient of the network via node removal, thereby confirming that local structural properties of outsiders identified are consistent with our hypothesis. Our findings suggest that the existence of outsiders should be considered when using Twitter communication networks for social network analysis.
△ Less
Submitted 25 November, 2014;
originally announced November 2014.