
Showing 1–50 of 69 results for author: Saito, Y

Searching in archive cs.
  1. arXiv:2406.17722  [pdf, other]

    cs.SD eess.AS

    Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals

    Authors: Kentaro Seki, Shinnosuke Takamichi, Norihiro Takamune, Yuki Saito, Kanami Imamura, Hiroshi Saruwatari

    Abstract: This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms, ignoring the stereo listening experience inherent in human hearing. Our baseline approach addresses this gap by integrating blind source separation (BSS), voice conve…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2406.07280  [pdf, ps, other]

    cs.SD eess.AS

    Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment

    Authors: Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, Hiroshi Saruwatari

    Abstract: We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning noisy-to-clean VC process. However, the naturalness of the converted speech is limited when the noise of the source speech is unseen during the training. To this end, our proposed…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 5 pages, accepted for INTERSPEECH 2024, audio samples: http://y-saito.sakura.ne.jp/sython/Corpus/SRC4VC/IS2024_CDT_supplementary/demo_cdt.html

  3. arXiv:2406.07254  [pdf, ps, other]

    cs.SD eess.AS

    SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark

    Authors: Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, Hiroshi Saruwatari

    Abstract: We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC when low-quality speech recording is given as the input. To this end, we first asked 100 crowdworkers to record their voice samples using smartphones. T…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted for INTERSPEECH 2024, corpus project page: https://y-saito.sakura.ne.jp/sython/Corpus/SRC4VC/index.html

  4. arXiv:2405.14522  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV stat.ML

    Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property

    Authors: Yuya Yoshikawa, Masanari Kimura, Ryotaro Shimizu, Yuki Saito

    Abstract: Techniques that explain the predictions of black-box machine learning models are crucial to make the models transparent, thereby increasing trust in AI systems. The input features to the models often have a nested structure that consists of high- and low-level features, and each high-level feature is decomposed into multiple low-level features. For such inputs, both high-level feature attributions…

    Submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2404.15691  [pdf, other]

    cs.LG stat.ML

    Long-term Off-Policy Evaluation and Learning

    Authors: Yuta Saito, Himan Abdollahpouri, Jesse Anderton, Ben Carterette, Mounia Lalmas

    Abstract: Short- and long-term outcomes of an algorithm often differ, with damaging downstream effects. A known example is a click-bait algorithm, which may increase short-term clicks but damage long-term user engagement. A possible solution to estimate the long-term outcome is to run an online experiment or A/B test for the potential algorithms, but it takes months or even longer to observe the long-term o…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: TheWebConference 2024

  6. arXiv:2404.15084  [pdf, other]

    cs.LG

    Hyperparameter Optimization Can Even be Harmful in Off-Policy Learning and How to Deal with It

    Authors: Yuta Saito, Masahiro Nomura

    Abstract: There has been a growing interest in off-policy evaluation in the literature such as recommender systems and personalized medicine. We have so far seen significant progress in developing estimators aimed at accurately estimating the effectiveness of counterfactual policies based on biased logged data. However, there are many cases where those estimators are used not only to evaluate the value of d…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: IJCAI'24

  7. arXiv:2403.17410  [pdf, other]

    cs.LG cs.AI stat.ML

    On permutation-invariant neural networks

    Authors: Masanari Kimura, Ryotaro Shimizu, Yuki Hirakawa, Ryosuke Goto, Yuki Saito

    Abstract: Conventional machine learning algorithms have traditionally been designed under the assumption that input data follows a vector-based format, with an emphasis on vector-centric paradigms. However, as the demand for tasks involving set-based inputs has grown, there has been a paradigm shift in the research community towards addressing these challenges. In recent years, the emergence of neural netwo…

    Submitted 28 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  8. arXiv:2403.13720  [pdf, other]

    cs.SD eess.AS

    UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge

    Authors: Wataru Nakata, Kazuki Yamauchi, Dong Yang, Hiroaki Hyodo, Yuki Saito

    Abstract: We present UTDUSS, the UTokyo-SaruLab system submitted to Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge. The challenge focuses on using discrete speech unit learned from large speech corpora for some tasks. We submitted our UTDUSS system to two text-to-speech tracks: Vocoder and Acoustic+Vocoder. Our system incorporates neural audio codec (NAC) pre-trained on only speech c…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures

  9. arXiv:2403.13353  [pdf, other]

    cs.SD eess.AS

    Building speech corpus with diverse voice characteristics for its prompt-based representation

    Authors: Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

    Abstract: In text-to-speech synthesis, the ability to control voice characteristics is vital for various applications. By leveraging thriving text prompt-based generation techniques, it should be possible to enhance the nuanced control of voice characteristics. While previous research has explored the prompt-based manipulation of voice characteristics, most studies have used pre-recorded speech, which limit…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. arXiv admin note: text overlap with arXiv:2309.13509

  10. Scalable and Provably Fair Exposure Control for Large-Scale Recommender Systems

    Authors: Riku Togashi, Kenshi Abe, Yuta Saito

    Abstract: Typical recommendation and ranking methods aim to optimize the satisfaction of users, but they are often oblivious to their impact on the items (e.g., products, jobs, news, video) and their providers. However, there has been a growing understanding that the latter is crucial to consider for a wide range of applications, since it determines the utility of those being recommended. Prior approaches t…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: accepted at WWW2024

  11. arXiv:2402.06151  [pdf, other]

    stat.ML cs.LG

    POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition

    Authors: Yuta Saito, Jihan Yao, Thorsten Joachims

    Abstract: We study off-policy learning (OPL) of contextual bandit policies in large discrete action spaces where existing methods -- most of which rely crucially on reward-regression models or importance-weighted policy gradients -- fail due to excessive bias or variance. To overcome these issues in OPL, we propose a novel two-stage algorithm, called Policy Optimization via Two-Stage Policy Decomposition (P…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.08062

  12. Off-Policy Evaluation of Slate Bandit Policies via Optimizing Abstraction

    Authors: Haruka Kiyohara, Masahiro Nomura, Yuta Saito

    Abstract: We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a policy selects multi-dimensional actions known as slates. This problem is widespread in recommender systems, search engines, marketing, to medical applications, however, the typical Inverse Propensity Scoring (IPS) estimator suffers from substantial variance due to large action spaces, making effective OPE a si…

    Submitted 17 February, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: WWW2024

  13. arXiv:2402.00288  [pdf, other]

    eess.AS cs.SD

    Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech

    Authors: Dong Yang, Tomoki Koriyama, Yuki Saito

    Abstract: Developing Text-to-Speech (TTS) systems that can synthesize natural breath is essential for human-like voice agents but requires extensive manual annotation of breath positions in training data. To this end, we propose a self-training method for training a breath detection model that can automatically detect breath positions in speech. Our method trains the model using a large speech corpus and in…

    Submitted 14 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted by INTERSPEECH2024

  14. arXiv:2311.18207  [pdf, other]

    cs.LG cs.AI

    Towards Assessing and Benchmarking Risk-Return Tradeoff of Off-Policy Evaluation

    Authors: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito

    Abstract: Off-Policy Evaluation (OPE) aims to assess the effectiveness of counterfactual policies using only offline logged data and is often used to identify the top-k promising policies for deployment in online A/B tests. Existing evaluation metrics for OPE estimators primarily focus on the "accuracy" of OPE or that of downstream policy selection, neglecting risk-return tradeoff in the subsequent online p…

    Submitted 10 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: ICLR2024

  15. arXiv:2311.18206  [pdf, other]

    cs.LG cs.AI

    SCOPE-RL: A Python Library for Offline Reinforcement Learning and Off-Policy Evaluation

    Authors: Haruka Kiyohara, Ren Kishimoto, Kosuke Kawakami, Ken Kobayashi, Kazuhide Nakata, Yuta Saito

    Abstract: This paper introduces SCOPE-RL, a comprehensive open-source Python software designed for offline reinforcement learning (offline RL), off-policy evaluation (OPE), and selection (OPS). Unlike most existing libraries that focus solely on either policy learning or evaluation, SCOPE-RL seamlessly integrates these two key aspects, facilitating flexible and complete implementations of both offline RL an…

    Submitted 10 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: preprint, open-source software: https://github.com/hakuhodo-technologies/scope-rl

  16. arXiv:2311.16630  [pdf, other]

    cs.LG

    Outfit Completion via Conditional Set Transformation

    Authors: Takuma Nakamura, Yuki Saito, Ryosuke Goto

    Abstract: In this paper, we formulate the outfit completion problem as a set retrieval task and propose a novel framework for solving this problem. The proposal includes a conditional set transformation architecture with deep neural networks and a compatibility-based regularization method. The proposed method utilizes a map with permutation-invariant for the input set and permutation-equivariant for the con…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 8 pages, 8 figures

  17. arXiv:2311.16509  [pdf, ps, other]

    cs.CL cs.LG cs.SD eess.AS

    StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models

    Authors: Kazuki Yamauchi, Yusuke Ijima, Yuki Saito

    Abstract: We propose StyleCap, a method to generate natural language descriptions of speaking styles appearing in speech. Although most of conventional techniques for para-/non-linguistic information recognition focus on the category classification or the intensity estimation of pre-defined labels, they cannot provide the reasoning of the recognition result in an interpretable manner. StyleCap is a first st…

    Submitted 27 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted for ICASSP2024

  18. arXiv:2310.14890  [pdf, other]

    stat.ML cs.AI cs.LG

    Boosting for Bounding the Worst-class Error

    Authors: Yuya Saito, Shinnosuke Matsuo, Seiichi Uchida, Daiki Suehiro

    Abstract: This paper tackles the problem of the worst-class error rate, instead of the standard error rate averaged over all classes. For example, a three-class classification task with class-wise error rates of 10%, 10%, and 40% has a worst-class error rate of 40%, whereas the average is 20% under the class-balanced condition. The worst-class error is important in many applications. For example, in a…

    Submitted 20 October, 2023; originally announced October 2023.

  19. JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions

    Authors: Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, Hiroshi Saruwatari

    Abstract: We present the JVNV, a Japanese emotional speech corpus with verbal content and nonverbal vocalizations whose scripts are generated by a large-scale language model. Existing emotional speech corpora lack not only proper emotional scripts but also nonverbal vocalizations (NVs) that are essential expressions in spoken language to express emotions. We propose an automatic script generation method to…

    Submitted 9 October, 2023; originally announced October 2023.

  20. arXiv:2309.13509  [pdf, other]

    cs.SD eess.AS

    Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

    Authors: Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

    Abstract: In text-to-speech, controlling voice characteristics is important in achieving various-purpose speech synthesis. Considering the success of text-conditioned generation, such as text-to-image, free-form text instruction should be useful for intuitive and complicated control of voice characteristics. A sufficiently large corpus of high-quality and diverse voice samples with corresponding free-form d…

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: Submitted to ASRU2023

  21. arXiv:2308.08785  [pdf, other]

    quant-ph cs.DS

    A Feasibility-Preserved Quantum Approximate Solver for the Capacitated Vehicle Routing Problem

    Authors: Ningyi Xie, Xinwei Lee, Dongsheng Cai, Yoshiyuki Saito, Nobuyoshi Asai, Hoong Chuin Lau

    Abstract: The Capacitated Vehicle Routing Problem (CVRP) is an NP-optimization problem (NPO) that arises in various fields including transportation and logistics. The CVRP extends from the Vehicle Routing Problem (VRP), aiming to determine the most efficient plan for a fleet of vehicles to deliver goods to a set of customers, subject to the limited carrying capacity of each vehicle. As the number of possibl…

    Submitted 21 April, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 10 pages, 10 figures, 1 table

  22. arXiv:2306.15098  [pdf, other]

    stat.ML cs.IR cs.LG

    Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

    Authors: Haruka Kiyohara, Masatoshi Uehara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito

    Abstract: Ranking interfaces are everywhere in online platforms. There is thus an ever growing interest in their Off-Policy Evaluation (OPE), aiming towards an accurate performance evaluation of ranking policies using logged data. A de-facto approach for OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, it becomes extremely inaccurate in the ranking…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: KDD2023 Research track

  23. arXiv:2306.12169  [pdf, other]

    cs.HC

    HumanDiffusion: diffusion model using perceptual gradients

    Authors: Yota Ueda, Shinnosuke Takamichi, Yuki Saito, Norihiro Takamune, Hiroshi Saruwatari

    Abstract: We propose HumanDiffusion, a diffusion model trained from humans' perceptual gradients to learn an acceptable range of data for humans (i.e., human-acceptable distribution). Conventional HumanGAN aims to model the human-acceptable distribution wider than the real-data distribution by training a neural network-based generator with human-based discriminators. However, HumanGAN training tends t…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: Proceedings of INTERSPEECH

  24. arXiv:2306.10656  [pdf, other]

    cs.LG cs.AI stat.ML

    Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

    Authors: Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Yoshiaki Ota, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito, Koki Tsuda, Hiroshi Maruyama, Kohei Hayashi

    Abstract: Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose Virtual Human Generative Model (VHGM), a machine learning model for estimating attributes about healt…

    Submitted 14 August, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

    Comments: 14 pages, 4 figures

  25. arXiv:2305.16807  [pdf, other]

    cs.CV

    Negative-prompt Inversion: Fast Image Inversion for Editing with Text-guided Diffusion Models

    Authors: Daiki Miyake, Akihiro Iohara, Yu Saito, Toshiyuki Tanaka

    Abstract: In image editing employing diffusion models, it is crucial to preserve the reconstruction quality of the original image while changing its style. Although existing methods ensure reconstruction quality through optimization, a drawback of these is the significant amount of time required for optimization. In this paper, we propose negative-prompt inversion, a method capable of achieving equivalent r…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 22 pages, 11 figures

  26. arXiv:2305.13724  [pdf, ps, other]

    cs.SD cs.CL cs.LG eess.AS

    ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings

    Authors: Yuki Saito, Shinnosuke Takamichi, Eiji Iimori, Kentaro Tachibana, Hiroshi Saruwatari

    Abstract: We propose ChatGPT-EDSS, an empathetic dialogue speech synthesis (EDSS) method using ChatGPT for extracting dialogue context. ChatGPT is a chatbot that can deeply understand the content and purpose of an input prompt and appropriately respond to the user's request. We focus on ChatGPT's reading comprehension and introduce it to EDSS, a task of synthesizing speech that can empathize with the interl…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 5 pages, accepted for INTERSPEECH 2023

  27. arXiv:2305.13713  [pdf, ps, other]

    cs.SD cs.CL cs.LG eess.AS

    CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center

    Authors: Yuki Saito, Eiji Iimori, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

    Abstract: We present CALLS, a Japanese speech corpus that considers phone calls in a customer center as a new domain of empathetic spoken dialogue. The existing STUDIES corpus covers only empathetic dialogue between a teacher and student in a school. To extend the application range of empathetic dialogue speech synthesis (EDSS), we designed our corpus to include the same female speaker as the STUDIES teache…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 5 pages, accepted for INTERSPEECH2023

  28. arXiv:2305.08062  [pdf, other]

    stat.ML cs.AI cs.LG

    Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

    Authors: Yuta Saito, Qingyang Ren, Thorsten Joachims

    Abstract: We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect. O…

    Submitted 2 June, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: accepted at ICML2023. arXiv admin note: text overlap with arXiv:2202.06317

  29. arXiv:2302.13652  [pdf, ps, other]

    eess.AS cs.CL cs.LG cs.SD

    Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

    Authors: Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari

    Abstract: Pause insertion, also known as phrase break prediction and phrasing, is an essential part of TTS systems because proper pauses with natural duration significantly enhance the rhythm and intelligibility of synthetic speech. However, conventional phrasing models ignore various speakers' different styles of inserting silent pauses, which can degrade the performance of the model trained on a multi-spe…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP2023

  30. arXiv:2211.13904  [pdf, other]

    cs.LG

    Policy-Adaptive Estimator Selection for Off-Policy Evaluation

    Authors: Takuma Udagawa, Haruka Kiyohara, Yusuke Narita, Yuta Saito, Kei Tateno

    Abstract: Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual policies using only offline logged data. Although many estimators have been developed, there is no single estimator that dominates the others, because the estimators' accuracy can vary greatly depending on a given OPE task such as the evaluation policy, number of actions, and noise level. Thus, the data-drive…

    Submitted 29 January, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: accepted at AAAI'23

  31. arXiv:2210.09916  [pdf, other]

    cs.SD eess.AS

    Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models

    Authors: Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Hiroshi Saruwatari

    Abstract: In this paper, we propose a method for intermediating multiple speakers' attributes and diversifying their voice characteristics in "speaker generation," an emerging task that aims to synthesize a nonexistent speaker's naturally sounding voice. The conventional TacoSpawn-based speaker generation method represents the distributions of speaker embeddings by Gaussian mixture models (GMMs) condition…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023. Demo: https://sarulab-speech.github.io/demo_mid-attribute-speaker-generation

  32. arXiv:2209.12549  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS

    Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech

    Authors: Yusuke Nakai, Yuki Saito, Kenta Udagawa, Hiroshi Saruwatari

    Abstract: We propose a novel training algorithm for a multi-speaker neural text-to-speech (TTS) model based on multi-task adversarial training. A conventional generative adversarial network (GAN)-based training algorithm significantly improves the quality of synthetic speech by reducing the statistical difference between natural and synthetic speech. However, the algorithm does not guarantee the generalizat…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 6 pages, 1 figure, Accepted for APSIPA ASC 2022

  33. arXiv:2206.10256  [pdf, other]

    cs.SD cs.HC cs.NE eess.AS

    Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS

    Authors: Kenta Udagawa, Yuki Saito, Hiroshi Saruwatari

    Abstract: This paper proposes a human-in-the-loop speaker-adaptation method for multi-speaker text-to-speech. With a conventional speaker-adaptation method, a target speaker's embedding vector is extracted from his/her reference speech using a speaker encoder trained on a speaker-discriminative task. However, this method cannot obtain an embedding vector for the target speaker when the reference speech is u…

    Submitted 21 June, 2022; originally announced June 2022.

    Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022

  34. arXiv:2206.08039  [pdf, ps, other]

    cs.SD cs.CL cs.LG eess.AS

    Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History

    Authors: Yuto Nishimura, Yuki Saito, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

    Abstract: We propose an end-to-end empathetic dialogue speech synthesis (DSS) model that considers both the linguistic and prosodic contexts of dialogue history. Empathy is the active attempt by humans to get inside the interlocutor in dialogue, and empathetic DSS is a technology to implement this act in spoken dialogue systems. Our model is conditioned by the history of linguistic and prosody features for…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022

  35. arXiv:2206.07247  [pdf, other]

    cs.IR cs.AI cs.LG

    Fair Ranking as Fair Division: Impact-Based Individual Fairness in Ranking

    Authors: Yuta Saito, Thorsten Joachims

    Abstract: Rankings have become the primary interface in two-sided online markets. Many have noted that the rankings not only affect the satisfaction of the users (e.g., customers, listeners, employers, travelers), but that the position in the ranking allocates exposure -- and thus economic opportunity -- to the ranked items (e.g., articles, products, songs, job seekers, restaurants, hotels). This has raised…

    Submitted 30 August, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: accepted at KDD2022, a few minor updates from the camera ready version

  36. arXiv:2203.14757  [pdf, ps, other]

    cs.SD cs.AI cs.CL cs.HC cs.LG

    STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent

    Authors: Yuki Saito, Yuto Nishimura, Shinnosuke Takamichi, Kentaro Tachibana, Hiroshi Saruwatari

    Abstract: We present STUDIES, a new speech corpus for developing a voice agent that can speak in a friendly manner. Humans naturally control their speech prosody to empathize with each other. By incorporating this "empathetic dialogue" behavior into a spoken dialogue system, we can develop a voice agent that can respond to a user more naturally. We designed the STUDIES corpus to include a speaker who speaks…

    Submitted 16 June, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: 5 pages, 2 figures, Accepted for INTERSPEECH2022, project page: http://sython.org/Corpus/STUDIES

  37. A Real-World Implementation of Unbiased Lift-based Bidding System

    Authors: Daisuke Moriwaki, Yuta Hayakawa, Akira Matsui, Yuta Saito, Isshu Munemasa, Masashi Shibata

    Abstract: In display ad auctions of Real-Time Bidding (RTB), a typical Demand-Side Platform (DSP) bids based on the predicted probability of click and conversion right after an ad impression. Recent studies find such a strategy is suboptimal and propose a better bidding strategy named lift-based bidding. Lift-based bidding simply bids the price according to the lift effect of the ad impression and achieves m…

    Submitted 22 February, 2022; originally announced February 2022.

    Comments: 2021 IEEE International Conference on Big Data (Big Data)

  38. arXiv:2202.06317  [pdf, other]

    cs.LG cs.AI stat.ML

    Off-Policy Evaluation for Large Action Spaces via Embeddings

    Authors: Yuta Saito, Thorsten Joachims

    Abstract: Off-policy evaluation (OPE) in contextual bandits has seen rapid adoption in real-world systems, since it enables offline evaluation of new policies using only historic log data. Unfortunately, when the number of actions is large, existing OPE estimators -- most of which are based on inverse propensity score weighting -- degrade severely and can suffer from extreme bias and variance. This foils th…

    Submitted 15 June, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

    Comments: accepted at ICML2022

  39. Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

    Authors: Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto

    Abstract: In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is critical. Off-policy evaluation (OPE) for ranking policies is thus gaining a growing interest because it enables performance estimation of new ranking policies using only logged data. Although OPE in contextual bandits has been studied extensively, its naive application…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: WSDM2022

  40. arXiv:2109.08621  [pdf, ps, other]

    cs.AI

    Data-Driven Off-Policy Estimator Selection: An Application in User Marketing on An Online Content Delivery Service

    Authors: Yuta Saito, Takuma Udagawa, Kei Tateno

    Abstract: Off-policy evaluation (OPE) is the method that attempts to estimate the performance of decision making policies using historical data generated by different policies without conducting costly online A/B tests. Accurate OPE is essential in domains such as healthcare, marketing or recommender systems to avoid deploying poor performing policies, as such policies may harm human lives or destroy the us…

    Submitted 17 September, 2021; originally announced September 2021.

    Comments: presented at REVEAL workshop, RecSys2020

  41. arXiv:2109.08331  [pdf, other]

    cs.LG

    Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation

    Authors: Haruka Kiyohara, Kosuke Kawakami, Yuta Saito

    Abstract: In recommender systems (RecSys) and real-time bidding (RTB) for online advertisements, we often try to optimize sequential decision making using bandit and reinforcement learning (RL) techniques. In these applications, offline reinforcement learning (offline RL) and off-policy evaluation (OPE) are beneficial because they enable safe policy optimization using only logged data without any risky onli…

    Submitted 16 September, 2021; originally announced September 2021.

    Comments: SimuRec workshop at RecSys2021

  42. arXiv:2108.13703  [pdf, other]

    stat.ML cs.AI cs.LG

    Evaluating the Robustness of Off-Policy Evaluation

    Authors: Yuta Saito, Takuma Udagawa, Haruka Kiyohara, Kazuki Mogi, Yusuke Narita, Kei Tateno

    Abstract: Off-policy evaluation (OPE), or offline evaluation in general, evaluates the performance of hypothetical policies leveraging only offline log data. It is particularly useful in applications where online interaction involves high-stakes and expensive settings, such as precision medicine and recommender systems. Since many OPE estimators have been proposed and some of them have hyperparameters to…

    Submitted 31 August, 2021; originally announced August 2021.

    Comments: Accepted at RecSys2021

  43. arXiv:2108.12992  [pdf, other]

    cs.LG cs.CV

    SHIFT15M: Fashion-specific dataset for set-to-set matching with several distribution shifts

    Authors: Masanari Kimura, Takuma Nakamura, Yuki Saito

    Abstract: This paper addresses the problem of set-to-set matching, which involves matching two different sets of items based on some criteria, especially in the case of high-dimensional items like images. Although neural networks have been applied to solve this problem, most machine learning-based approaches assume that the training and test data follow the same distribution, which is not always true in rea…

    Submitted 8 March, 2023; v1 submitted 30 August, 2021; originally announced August 2021.

  44. arXiv:2102.04051  [pdf, other]

    cs.HC cs.LG cs.SD eess.AS

    HumanACGAN: conditional generative adversarial network with human-based auxiliary classifier and its evaluation in phoneme perception

    Authors: Yota Ueda, Kazuki Fujii, Yuki Saito, Shinnosuke Takamichi, Yukino Baba, Hiroshi Saruwatari

    Abstract: We propose a conditional generative adversarial network (GAN) incorporating humans' perceptual evaluations. A deep neural network (DNN)-based generator of a GAN can represent a real-data distribution accurately but can never represent a human-acceptable distribution, which is the range of data in which humans accept the naturalness regardless of whether the data are real or not. A HumanGAN was propo…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: 5 pages, 6 figures, to be published in 2021 IEEE International Conference on Acoustics, Speech and Signal Processing

  45. arXiv:2010.11002  [pdf, other]

    cs.LG stat.ME stat.ML

    Optimal Off-Policy Evaluation from Multiple Logging Policies

    Authors: Nathan Kallus, Yuta Saito, Masatoshi Uehara

    Abstract: We study off-policy evaluation (OPE) from multiple logging policies, each generating a dataset of fixed size, i.e., stratified sampling. Previous work noted that in this setting the ordering of the variances of different importance sampling estimators is instance-dependent, which brings up a dilemma as to which importance sampling weights to use. In this paper, we resolve this dilemma by finding t…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Under Review

  46. arXiv:2008.07146  [pdf, other]

    cs.LG stat.ML

    Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation

    Authors: Yuta Saito, Shunsuke Aihara, Megumi Matsutani, Yusuke Narita

    Abstract: Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using data generated by a different policy. Because of its huge potential impact in practice, there has been growing research interest in this field. There is, however, no real-world public dataset that enables the evaluation of OPE, making its experimental studies unrealistic and irreproducible. With the goal of…

    Submitted 26 October, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

    Comments: Accepted at NeurIPS2021 Datasets and Benchmarks Track

  47. arXiv:2007.04002  [pdf, other]

    cs.LG cs.IR stat.ML

    Unbiased Lift-based Bidding System

    Authors: Daisuke Moriwaki, Yuta Hayakawa, Isshu Munemasa, Yuta Saito, Akira Matsui

    Abstract: Conventional bidding strategies for online display ad auctions rely heavily on observed performance indicators such as clicks or conversions. A bidding strategy naively pursuing these easily observable metrics, however, fails to optimize the profitability of the advertisers. Rather, the bidding strategy that leads to the maximum revenue is a strategy pursuing the performance lift of showing ads t…

    Submitted 8 July, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

  48. Efficient Hyperparameter Optimization under Multi-Source Covariate Shift

    Authors: Masahiro Nomura, Yuta Saito

    Abstract: A typical assumption in supervised machine learning is that the train (source) and test (target) datasets follow exactly the same distribution. This assumption is, however, often violated in uncertain real-world applications, which motivates the study of learning under covariate shift. In this setting, the naive use of adaptive hyperparameter optimization methods such as Bayesian optimization d…

    Submitted 16 August, 2021; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: equal contribution

  49. arXiv:2005.05618  [pdf]

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Machine Learning Guided Discovery of Gigantic Magnetocaloric Effect in HoB$_{2}$ Near Hydrogen Liquefaction Temperature

    Authors: Pedro Baptista de Castro, Kensei Terashima, Takafumi D Yamamoto, Zhufeng Hou, Suguru Iwasaki, Ryo Matsumoto, Shintaro Adachi, Yoshito Saito, Peng Song, Hiroyuki Takeya, Yoshihiko Takano

    Abstract: Magnetic refrigeration exploits the magnetocaloric effect, which is the entropy change upon application and removal of magnetic fields in materials, providing an alternate path for refrigeration other than the conventional gas cycles. While intensive research has uncovered a vast number of magnetic materials that exhibit a large magnetocaloric effect, these properties for a large number of compound…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: 12 pages including 3 figures and 1 table + 11 pages of supplementary information. Published version available at: https://rdcu.be/b36ep

    Journal ref: NPG Asia Materials 12:35 (2020)

  50. arXiv:2002.06778  [pdf, other]

    cs.SD eess.AS

    Lifter Training and Sub-band Modeling for Computationally Efficient and High-Quality Voice Conversion Using Spectral Differentials

    Authors: Takaaki Saeki, Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari

    Abstract: In this paper, we propose computationally efficient and high-quality methods for statistical voice conversion (VC) with direct waveform modification based on spectral differentials. The conventional method with a minimum-phase filter achieves high-quality conversion but requires heavy computation in filtering. This is because the minimum phase using a fixed lifter of the Hilbert transform often re…

    Submitted 17 February, 2020; originally announced February 2020.

    Comments: 5 pages, to appear in IEEE International Conference on Acoustics, Speech, and Signal Processing 2020 (ICASSP 2020)