doi 10.1145/3640794.3665549

Coaching Copilot: Blended Form of an LLM-Powered Chatbot and a Human Coach to Effectively Support Self-Reflection for Leadership Growth

Authors: Riku Arakawa, Hiromu Yakura

Abstract: Chatbots' role in fostering self-reflection is now widely recognized, especially in inducing users' behavior change. While the benefits of 24/7 availability, scalability, and consistent responses have been demonstrated in contexts such as healthcare and tutoring to help one form a new habit, their utilization in coaching necessitating deeper introspective dialogue to induce leadership growth remains unexplored. This paper explores the potential of such a chatbot powered by recent Large Language Models (LLMs) in collaboration with professional coaches in the field of executive coaching. Through a design workshop with them and two weeks of user study involving ten coach-client pairs, we explored the feasibility and nuances of integrating chatbots to complement human coaches. Our findings highlight the benefits of chatbots' ubiquity and reasoning capabilities enabled by LLMs while identifying their limitations and design necessities for effective collaboration between human coaches and chatbots. By doing so, this work contributes to the foundation for augmenting one's self-reflective process with prevalent conversational agents through the human-in-the-loop approach.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted by the International ACM Conversational User Interfaces Conference (CUI '24)

arXiv:2402.11145 [pdf, other]

Supporting Experts with a Multimodal Machine-Learning-Based Tool for Human Behavior Analysis of Conversational Videos

Authors: Riku Arakawa, Kiyosu Maeda, Hiromu Yakura

Abstract: Multimodal scene search of conversations is essential for unlocking valuable insights into social dynamics and enhancing our communication. While experts in conversational analysis have their own knowledge and skills to find key scenes, a lack of comprehensive, user-friendly tools that streamline the processing of diverse multimodal queries impedes efficiency and objectivity. To solve it, we developed Providence, a visual-programming-based tool based on design considerations derived from a formative study with experts. It enables experts to combine various machine learning algorithms to capture human behavioral cues without writing code. Our study showed its preferable usability and satisfactory output with less cognitive load imposed in accomplishing scene search tasks of conversations, verifying the importance of its customizability and transparency. Furthermore, through the in-the-wild trial, we confirmed the objectivity and reusability of the tool transform experts' workflow, suggesting the advantage of expert-AI teaming in a highly human-contextual domain.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2309.10744 [pdf, other]

Evaluating large language models' ability to understand metaphor and sarcasm using a screening test for Asperger syndrome

Authors: Hiromu Yakura

Abstract: Metaphors and sarcasm are precious fruits of our highly-evolved social communication skills. However, children with Asperger syndrome are known to have difficulties in comprehending sarcasm, even if they possess a certain level of verbal IQ sufficient for understanding metaphors. Given that, a screening test that scores the ability to understand metaphor and sarcasm has been used to differentiate Asperger syndrome from other symptoms exhibiting akin external behaviors (e.g., attention-deficit/hyperactivity disorder). This study uses the standardized test to examine the capability of recent large language models (LLMs) in understanding human nuanced communication. The results divulged that, whereas their ability to comprehend metaphors has been improved with the increase of the number of model parameters, the improvement in sarcasm understanding was not observed. This implies that an alternative approach is imperative to imbue LLMs with the capacity to grasp sarcasm, which has been associated with the amygdala, a pivotal cerebral region for emotional learning, in the case of humans.

Submitted 10 January, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

arXiv:2307.13005 [pdf, other]

IteraTTA: An interface for exploring both text prompts and audio priors in generating music with text-to-audio models

Authors: Hiromu Yakura, Masataka Goto

Abstract: Recent text-to-audio generation techniques have the potential to allow novice users to freely generate music audio. Even if they do not have musical knowledge, such as about chord progressions and instruments, users can try various text prompts to generate audio. However, compared to the image domain, gaining a clear understanding of the space of possible music audios is difficult because users cannot listen to the variations of the generated audios simultaneously. We therefore facilitate users in exploring not only text prompts but also audio priors that constrain the text-to-audio music generation process. This dual-sided exploration enables users to discern the impact of different text prompts and audio priors on the generation results through iterative comparison of them. Our developed interface, IteraTTA, is specifically designed to aid users in refining text prompts and selecting favorable audio priors from the generated audios. With this, users can progressively reach their loosely-specified goals while understanding and exploring the space of possible results. Our implementation and discussions highlight design considerations that are specifically required for text-to-audio models and how interaction techniques can contribute to their effectiveness.

Submitted 24 July, 2023; originally announced July 2023.

Comments: Accepted to the 24th International Society for Music Information Retrieval Conference (ISMIR 2023)

arXiv:2302.05678 [pdf, other]

doi 10.1145/3544548.3581133

CatAlyst: Domain-Extensible Intervention for Preventing Task Procrastination Using Large Generative Models

Authors: Riku Arakawa, Hiromu Yakura, Masataka Goto

Abstract: CatAlyst uses generative models to help workers' progress by influencing their task engagement instead of directly contributing to their task outputs. It prompts distracted workers to resume their tasks by generating a continuation of their work and presenting it as an intervention that is more context-aware than conventional (predetermined) feedback. The prompt can function by drawing their interest and lowering the hurdle for resumption even when the generated continuation is insufficient to substitute their work, while recent human-AI collaboration research aiming at work substitution depends on a stable high accuracy. This frees CatAlyst from domain-specific model-tuning and makes it applicable to various tasks. Our studies involving writing and slide-editing tasks demonstrated CatAlyst's effectiveness in helping workers swiftly resume tasks with a lowered cognitive load. The results suggest a new form of human-AI collaboration where large generative models publicly available but imperfect for each individual domain can contribute to workers' digital well-being.

Submitted 9 March, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: Accepted by ACM CHI Conference on Human Factors in Computing Systems (CHI '23)

arXiv:2211.03285 [pdf, other]

doi 10.1145/3564625.3564659

SLOPT: Bandit Optimization Framework for Mutation-Based Fuzzing

Authors: Yuki Koike, Hiroyuki Katsura, Hiromu Yakura, Yuma Kurogome

Abstract: Mutation-based fuzzing has become one of the most common vulnerability discovery solutions over the last decade. Fuzzing can be optimized when targeting specific programs, and given that, some studies have employed online optimization methods to do it automatically, i.e., tuning fuzzers for any given program in a program-agnostic manner. However, previous studies have neither fully explored mutation schemes suitable for online optimization methods, nor online optimization methods suitable for mutation schemes. In this study, we propose an optimization framework called SLOPT that encompasses both a bandit-friendly mutation scheme and mutation-scheme-friendly bandit algorithms. The advantage of SLOPT is that it can generally be incorporated into existing fuzzers, such as AFL and Honggfuzz. As a proof of concept, we implemented SLOPT-AFL++ by integrating SLOPT into AFL++ and showed that the program-agnostic optimization delivered by SLOPT enabled SLOPT-AFL++ to achieve higher code coverage than AFL++ in all of ten real-world FuzzBench programs. Moreover, we ran SLOPT-AFL++ against several real-world programs from OSS-Fuzz and successfully identified three previously unknown vulnerabilities, even though these programs have been fuzzed by AFL++ for a considerable number of CPU days on OSS-Fuzz.

Submitted 6 November, 2022; originally announced November 2022.

Comments: To appear in Proceedings of the 2022 Annual Computer Security Applications Conference (ACSAC '22)

arXiv:2206.10987 [pdf, other]

Human-AI communication for human-human communication: Applying interpretable unsupervised anomaly detection to executive coaching

Authors: Riku Arakawa, Hiromu Yakura

Abstract: In this paper, we discuss the potential of applying unsupervised anomaly detection in constructing AI-based interactive systems that deal with highly contextual situations, i.e., human-human communication, in collaboration with domain experts. We reached this approach of utilizing unsupervised anomaly detection through our experience of developing a computational support tool for executive coaching, which taught us the importance of providing interpretable results so that expert coaches can take both the results and contexts into account. The key idea behind this approach is to leave room for expert coaches to unleash their open-ended interpretations, rather than simplifying the nature of social interactions to well-defined problems that are tractable by conventional supervised algorithms. In addition, we found that this approach can be extended to nurturing novice coaches; by prompting them to interpret the results from the system, it can provide the coaches with educational opportunities. Although the applicability of this approach should be validated in other domains, we believe that the idea of leveraging unsupervised anomaly detection to construct AI-based interactive systems would shed light on another direction of human-AI communication.

Submitted 22 June, 2022; originally announced June 2022.

Comments: For the Communication in Human-AI Interaction Workshop at the 31st International Joint Conference on Artificial Intelligence

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline.
Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2204.08471 [pdf, other]

doi 10.1145/3544549.3573849

AI for human assessment: What do professional assessors need?

Authors: Riku Arakawa, Hiromu Yakura

Abstract: Recent organizations have started to adopt AI-based decision support tools to optimize human resource development practices, while facing various challenges of using AIs in highly contextual and sensitive domains. We present our case study that aims to help professional assessors make decisions in human assessment, in which they conduct interviews with assessees and evaluate their suitability for certain job roles. Our workshop with two industrial assessors elucidated troubles they face (i.e., maintaining stable and non-subjective observation of assessees' behaviors) and derived requirements of AI systems (i.e., extracting their nonverbal cues from interview videos in an interpretable manner). In response, we employed an unsupervised anomaly detection algorithm using multimodal behavioral features such as facial keypoints, body and head pose, and gaze. The algorithm extracts outlier scenes from the video based on behavioral features as well as informing which feature contributes to the outlierness. We first evaluated how the assessors would perceive the extracted cues and discovered that the algorithm is useful in suggesting scenes to which assessors would pay attention, thanks to its interpretability. Then, we developed an interface prototype incorporating the algorithm and had six assessors use it for their actual assessment.
Their comments revealed the effectiveness of introducing unsupervised anomaly detection to enhance their feeling of confidence and objectivity of the assessment along with potential use scenarios of such AI-based systems in human assessment. Our approach, which builds on top of the idea of separating observation and interpretation in human-AI collaboration, will facilitate human decision making in highly contextual domains, such as human assessment, while keeping their trust in the system.

Submitted 5 December, 2022; v1 submitted 17 April, 2022; originally announced April 2022.

Comments: To appear in Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (Case Study)

arXiv:2105.09207 [pdf, other]

Tool- and Domain-Agnostic Parameterization of Style Transfer Effects Leveraging Pretrained Perceptual Metrics

Authors: Hiromu Yakura, Yuki Koyama, Masataka Goto

Abstract: Current deep learning techniques for style transfer would not be optimal for design support since their "one-shot" transfer does not fit exploratory design processes. To overcome this gap, we propose parametric transcription, which transcribes an end-to-end style transfer effect into parameter values of specific transformations available in an existing content editing tool. With this approach, users can imitate the style of a reference sample in the tool that they are familiar with and thus can easily continue further exploration by manipulating the parameters. To enable this, we introduce a framework that utilizes an existing pretrained model for style transfer to calculate a perceptual style distance to the reference sample and uses black-box optimization to find the parameters that minimize this distance. Our experiments with various third-party tools, such as Instagram and Blender, show that our framework can effectively leverage deep learning techniques for computational design support.

Submitted 19 May, 2021; originally announced May 2021.

Comments: To appear in Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI 2021); Project page available at https://yumetaro.info/projects/parametric-transcription/

arXiv:2102.06422 [pdf, ps, other]

Reaction or Speculation: Building Computational Support for Users in Catching-Up Series Based on an Emerging Media Consumption Phenomenon

Authors: Riku Arakawa, Hiromu Yakura

Abstract: A growing number of people are using catch-up TV services rather than watching simultaneously with other audience members at the time of broadcast. However, computational support for such catching-up users has not been well explored. In particular, we are observing an emerging phenomenon in online media consumption experiences in which speculation plays a vital role. As the phenomenon of speculation implicitly assumes simultaneity in media consumption, there is a gap for catching-up users, who cannot directly appreciate the consumption experiences. This conversely suggests that there is potential for computational support to enhance the consumption experiences of catching-up users. Accordingly, we conducted a series of studies to pave the way for developing computational support for catching-up users. First, we conducted semi-structured interviews to understand how people are engaging with speculation during media consumption. As a result, we discovered the distinctive aspects of speculation-based consumption experiences in contrast to social viewing experiences sharing immediate reactions that have been discussed in previous studies. We then designed two prototypes for supporting catching-up users based on our quantitative analysis of Twitter data in regard to reaction- and speculation-based media consumption. Lastly, we evaluated the prototypes in a user experiment and, based on its results, discussed ways to empower catching-up users with computational supports in response to recent transformations in media consumption.

Submitted 12 February, 2021; originally announced February 2021.

Comments: To appear in Proceedings of the ACM on Human-Computer Interaction, Vol. 5, No. CSCW1 (151)

arXiv:2101.08621 [pdf, other]

doi 10.1145/3411764.3445339

Mindless Attractor: A False-Positive Resistant Intervention for Drawing Attention Using Auditory Perturbation

Authors: Riku Arakawa, Hiromu Yakura

Abstract: Explicitly alerting users is not always an optimal intervention, especially when they are not motivated to obey. For example, in video-based learning, learners who are distracted from the video would not follow an alert asking them to pay attention. Inspired by the concept of Mindless Computing, we propose a novel intervention approach, Mindless Attractor, that leverages the nature of human speech communication to help learners refocus their attention without relying on their motivation. Specifically, it perturbs the voice in the video to direct their attention without consuming their conscious awareness. Our experiments not only confirmed the validity of the proposed approach but also emphasized its advantages in combination with a machine learning-based sensing module. Namely, it would not frustrate users even though the intervention is activated by false-positive detection of their attentive state. Our intervention approach can be a reliable way to induce behavioral change in human-AI symbiosis.

Submitted 21 January, 2021; originally announced January 2021.

Comments: To appear in ACM CHI Conference on Human Factors in Computing Systems (CHI '21), May 8-13, 2021, Yokohama, Japan

arXiv:2101.07999 [pdf, other]

doi 10.1145/3411764.3445252

No More Handshaking: How have COVID-19 pushed the expansion of computer-mediated communication in Japanese idol culture?

Authors: Hiromu Yakura

Abstract: In Japanese idol culture, meet-and-greet events where fans were allowed to handshake with an idol member for several seconds were regarded as its essential component until the spread of COVID-19. Now, idol groups are struggling in the transition of such events to computer-mediated communication because these events had emphasized meeting face-to-face over communicating, as we can infer from their length of time. I anticipated that investigating this emerging transition would provide implications because their communication has a unique characteristic that is distinct from well-studied situations, such as workplace communication and intimate relationships. Therefore, I first conducted a quantitative survey to develop a precise understanding of the transition, and based on its results, had semi-structured interviews with idol fans about their perceptions of the transition. The survey revealed distinctive approaches, including one where fans gathered at a venue but were isolated from the idol member by an acrylic plate and talked via a video call. Then the interviews not only provided answers to why such an approach would be reasonable but also suggested the existence of a large gap between conventional offline events and emerging online events in their perceptions. Based on the results, I discussed how we can develop interaction techniques to support this transition and how we can apply it to other situations outside idol culture, such as computer-mediated performing arts.

Submitted 20 January, 2021; originally announced January 2021.

Comments: To appear in ACM CHI Conference on Human Factors in Computing Systems (CHI '21), May 8-13, 2021, Yokohama, Japan

arXiv:1911.08644 [pdf, other]

Generate (non-software) Bugs to Fool Classifiers

Authors: Hiromu Yakura, Youhei Akimoto, Jun Sakuma

Abstract: In adversarial attacks intended to confound deep learning models, most studies have focused on limiting the magnitude of the modification so that humans do not notice the attack. On the other hand, during an attack against autonomous cars, for example, most drivers would not find it strange if a small insect image were placed on a stop sign, or they may overlook it. In this paper, we present a systematic approach to generate natural adversarial examples against classification models by employing such natural-appearing perturbations that imitate a certain object or signal. We first show the feasibility of this approach in an attack against an image classifier by employing generative adversarial networks that produce image patches that have the appearance of a natural object to fool the target model. We also introduce an algorithm to optimize placement of the perturbation in accordance with the input image, which makes the generation of adversarial examples fast and likely to succeed. Moreover, we experimentally show that the proposed approach can be extended to the audio domain, for example, to generate perturbations that sound like the chirping of birds to fool a speech classifier.

Submitted 19 November, 2019; originally announced November 2019.

Comments: Accepted by AAAI 2020

arXiv:1903.11485 [pdf, other]

doi 10.1145/3290605.3300802

REsCUE: A framework for REal-time feedback on behavioral CUEs using multimodal anomaly detection

Authors: Riku Arakawa, Hiromu Yakura

Abstract: Executive coaching has been drawing more and more attention for developing corporate managers. While conversing with managers, coach practitioners are also required to understand internal states of coachees through objective observations. In this paper, we present REsCUE, an automated system to aid coach practitioners in detecting unconscious behaviors of their clients. Using an unsupervised anomaly detection algorithm applied to multimodal behavior data such as the subject's posture and gaze, REsCUE notifies behavioral cues for coaches via intuitive and interpretive feedback in real-time. Our evaluation with actual coaching scenes confirms that REsCUE provides the informative cues to understand internal states of coachees. Since REsCUE is based on the unsupervised method and does not assume any prior knowledge, further applications beside executive coaching are conceivable using our framework.

Submitted 27 March, 2019; originally announced March 2019.

Comments: ACM CHI 2019

arXiv:1810.11793 [pdf, ps, other]

doi 10.24963/ijcai.2019/741

Robust Audio Adversarial Example for a Physical Attack

Authors: Hiromu Yakura, Jun Sakuma

Abstract: We propose a method to generate audio adversarial examples that can attack a state-of-the-art speech recognition model in the physical world. Previous work assumes that generated adversarial examples are directly fed to the recognition model, and is not able to perform such a physical attack because of reverberation and noise from playback environments. In contrast, our method obtains robust adversarial examples by simulating transformations caused by playback or recording in the physical world and incorporating the transformations into the generation process. Evaluation and a listening experiment demonstrated that our adversarial examples are able to attack without being noticed by humans. This result suggests that audio adversarial examples generated by the proposed method may become a real threat.

Submitted 18 August, 2019; v1 submitted 28 October, 2018; originally announced October 2018.

Comments: Accepted to IJCAI 2019

Showing 1–16 of 16 results for author: Yakura, H