Search | arXiv e-print repository

Filtered Direct Preference Optimization

Authors: Tetsuro Morimura, Mitsuki Sakamoto, Yuu **nai, Kenshi Abe, Kaito Ariu

Abstract: Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning language models with human preferences. While the significance of dataset quality is generally recognized, explicit investigations into its impact within the RLHF framework, to our knowledge, have been limited. This paper addresses the issue of text quality within the preference dataset by focusing on direct prefere… ▽ More Reinforcement learning from human feedback (RLHF) plays a crucial role in aligning language models with human preferences. While the significance of dataset quality is generally recognized, explicit investigations into its impact within the RLHF framework, to our knowledge, have been limited. This paper addresses the issue of text quality within the preference dataset by focusing on direct preference optimization (DPO), an increasingly adopted reward-model-free RLHF method. We confirm that text quality significantly influences the performance of models optimized with DPO more than those optimized with reward-model-based RLHF. Building on this new insight, we propose an extension of DPO, termed filtered direct preference optimization (fDPO). fDPO uses a trained reward model to monitor the quality of texts within the preference dataset during DPO training. Samples of lower quality are discarded based on comparisons with texts generated by the model being optimized, resulting in a more accurate dataset. Experimental results demonstrate that fDPO enhances the final model performance. Our code is available at https://github.com/CyberAgentAILab/filtered-dpo. △ Less

Submitted 4 July, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2306.03646 [pdf, other]

Dance Generation by Sound Symbolic Words

Authors: Miki Okamura, Naruya Kondo, Tatsuki Fushimi, Maki Sakamoto, Yoichi Ochiai

Abstract: This study introduces a novel approach to generate dance motions using onomatopoeia as input, with the aim of enhancing creativity and diversity in dance generation. Unlike text and music, onomatopoeia conveys rhythm and meaning through abstract word expressions without constraints on expression and without need for specialized knowledge. We adapt the AI Choreographer framework and employ the Saka… ▽ More This study introduces a novel approach to generate dance motions using onomatopoeia as input, with the aim of enhancing creativity and diversity in dance generation. Unlike text and music, onomatopoeia conveys rhythm and meaning through abstract word expressions without constraints on expression and without need for specialized knowledge. We adapt the AI Choreographer framework and employ the Sakamoto system, a feature extraction method for onomatopoeia focusing on phonemes and syllables. Additionally, we present a new dataset of 40 onomatopoeia-dance motion pairs collected through a user survey. Our results demonstrate that the proposed method enables more intuitive dance generation and can create dance motions using sound-symbolic words from a variety of languages, including those without onomatopoeia. This highlights the potential for diverse dance creation across different languages and cultures, accessible to a wider audience. Qualitative samples from our model can be found at: https://sites.google.com/view/onomatopoeia-dance/home/. △ Less

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2305.16610 [pdf, other]

Adaptively Perturbed Mirror Descent for Learning in Games

Authors: Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Atsushi Iwasaki

Abstract: This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space, potentially containing additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, successfully achieves {\it last-iterate} convergence in scenarios devoid of noise, leading the dynam… ▽ More This paper proposes a payoff perturbation technique for the Mirror Descent (MD) algorithm in games where the gradient of the payoff functions is monotone in the strategy profile space, potentially containing additive noise. The optimistic family of learning algorithms, exemplified by optimistic MD, successfully achieves {\it last-iterate} convergence in scenarios devoid of noise, leading the dynamics to a Nash equilibrium. A recent re-emerging trend underscores the promise of the perturbation approach, where payoff functions are perturbed based on the distance from an anchoring, or {\it slingshot}, strategy. In response, we propose {\it Adaptively Perturbed MD} (APMD), which adjusts the magnitude of the perturbation by repeatedly updating the slingshot strategy at a predefined interval. This innovation empowers us to find a Nash equilibrium of the underlying game with guaranteed rates. Empirical demonstrations affirm that our algorithm exhibits significantly accelerated convergence. △ Less

Submitted 24 June, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: Accepted at ICML 2024

arXiv:2208.09855 [pdf, other]

Last-Iterate Convergence with Full and Noisy Feedback in Two-Player Zero-Sum Games

Authors: Kenshi Abe, Kaito Ariu, Mitsuki Sakamoto, Kentaro Toyoshima, Atsushi Iwasaki

Abstract: This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings. In the former, players observe their exact gradient vectors of the utility functions. In the latter, they only observe the noisy gradient vectors. Eve… ▽ More This paper proposes Mutation-Driven Multiplicative Weights Update (M2WU) for learning an equilibrium in two-player zero-sum normal-form games and proves that it exhibits the last-iterate convergence property in both full and noisy feedback settings. In the former, players observe their exact gradient vectors of the utility functions. In the latter, they only observe the noisy gradient vectors. Even the celebrated Multiplicative Weights Update (MWU) and Optimistic MWU (OMWU) algorithms may not converge to a Nash equilibrium with noisy feedback. On the contrary, M2WU exhibits the last-iterate convergence to a stationary point near a Nash equilibrium in both feedback settings. We then prove that it converges to an exact Nash equilibrium by iteratively adapting the mutation term. We empirically confirm that M2WU outperforms MWU and OMWU in exploitability and convergence rates. △ Less

Submitted 26 May, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

Comments: Accepted in AISTATS 2023

arXiv:2206.09254 [pdf, other]

Mutation-Driven Follow the Regularized Leader for Last-Iterate Convergence in Zero-Sum Games

Authors: Kenshi Abe, Mitsuki Sakamoto, Atsushi Iwasaki

Abstract: In this study, we consider a variant of the Follow the Regularized Leader (FTRL) dynamics in two-player zero-sum games. FTRL is guaranteed to converge to a Nash equilibrium when time-averaging the strategies, while a lot of variants suffer from the issue of limit cycling behavior, i.e., lack the last-iterate convergence guarantee. To this end, we propose mutant FTRL (M-FTRL), an algorithm that int… ▽ More In this study, we consider a variant of the Follow the Regularized Leader (FTRL) dynamics in two-player zero-sum games. FTRL is guaranteed to converge to a Nash equilibrium when time-averaging the strategies, while a lot of variants suffer from the issue of limit cycling behavior, i.e., lack the last-iterate convergence guarantee. To this end, we propose mutant FTRL (M-FTRL), an algorithm that introduces mutation for the perturbation of action probabilities. We then investigate the continuous-time dynamics of M-FTRL and provide the strong convergence guarantees toward stationary points that approximate Nash equilibria under full-information feedback. Furthermore, our simulation demonstrates that M-FTRL can enjoy faster convergence rates than FTRL and optimistic FTRL under full-information feedback and surprisingly exhibits clear convergence under bandit feedback. △ Less

Submitted 18 June, 2022; originally announced June 2022.

Comments: Accepted in UAI 2022

arXiv:1906.11484 [pdf]

Automated Segmentation of Hip and Thigh Muscles in Metal Artifact-Contaminated CT using Convolutional Neural Network-Enhanced Normalized Metal Artifact Reduction

Authors: Mitsuki Sakamoto, Yuta Hiasa, Yoshito Otake, Masaki Takao, Yuki Suzuki, Nobuhiko Sugano, Yoshinobu Sato

Abstract: In total hip arthroplasty, analysis of postoperative medical images is important to evaluate surgical outcome. Since Computed Tomography (CT) is most prevalent modality in orthopedic surgery, we aimed at the analysis of CT image. In this work, we focus on the metal artifact in postoperative CT caused by the metallic implant, which reduces the accuracy of segmentation especially in the vicinity of… ▽ More In total hip arthroplasty, analysis of postoperative medical images is important to evaluate surgical outcome. Since Computed Tomography (CT) is most prevalent modality in orthopedic surgery, we aimed at the analysis of CT image. In this work, we focus on the metal artifact in postoperative CT caused by the metallic implant, which reduces the accuracy of segmentation especially in the vicinity of the implant. Our goal was to develop an automated segmentation method of the bones and muscles in the postoperative CT images. We propose a method that combines Normalized Metal Artifact Reduction (NMAR), which is one of the state-of-the-art metal artifact reduction methods, and a Convolutional Neural Network-based segmentation using two U-net architectures. The first U-net refines the result of NMAR and the muscle segmentation is performed by the second U-net. We conducted experiments using simulated images of 20 patients and real images of three patients to evaluate the segmentation accuracy of 19 muscles. In simulation study, the proposed method showed statistically significant improvement (p<0.05) in the average symmetric surface distance (ASD) metric for 14 muscles out of 19 muscles and the average ASD of all muscles from 1.17 +/- 0.543 mm (mean +/- std over all patients) to 1.10 +/- 0.509 mm over our previous method. The real image study using the manual trace of gluteus maximus and medius muscles showed ASD of 1.32 +/- 0.25 mm. Our future work includes training of a network in an end-to-end manner for both the metal artifact reduction and muscle segmentation. △ Less

Submitted 27 June, 2019; originally announced June 2019.

Comments: 7 pages, 5 figures

arXiv:1712.02198 [pdf, other]

Lung Nodule Classification by the Combination of Fusion Classifier and Cascaded Convolutional Neural Networks

Authors: Masaharu Sakamoto, Hiroki Nakano, Kun Zhao, Taro Sekiyama

Abstract: Lung nodule classification is a class imbalanced problem, as nodules are found with much lower frequency than non-nodules. In the class imbalanced problem, conventional classifiers tend to be overwhelmed by the majority class and ignore the minority class. We showed that cascaded convolutional neural networks can classify the nodule candidates precisely for a class imbalanced nodule candidate data… ▽ More Lung nodule classification is a class imbalanced problem, as nodules are found with much lower frequency than non-nodules. In the class imbalanced problem, conventional classifiers tend to be overwhelmed by the majority class and ignore the minority class. We showed that cascaded convolutional neural networks can classify the nodule candidates precisely for a class imbalanced nodule candidate data set in our previous study. In this paper, we propose Fusion classifier in conjunction with the cascaded convolutional neural network models. To fuse the models, nodule probabilities are calculated by using the convolutional neural network models at first. Then, Fusion classifier is trained and tested by the nodule probabilities. The proposed method achieved the sensitivity of 94.4% and 95.9% at 4 and 8 false positives per scan in Free Receiver Operating Characteristics (FROC) curve analysis, respectively. △ Less

Submitted 17 December, 2017; v1 submitted 19 November, 2017; originally announced December 2017.

Comments: Draft of ISBI2018. arXiv admin note: text overlap with arXiv:1703.00311

arXiv:1703.00311 [pdf, other]

Multi-stage Neural Networks with Single-sided Classifiers for False Positive Reduction and its Evaluation using Lung X-ray CT Images

Authors: Masaharu Sakamoto, Hiroki Nakano, Kun Zhao, Taro Sekiyama

Abstract: Lung nodule classification is a class imbalanced problem because nodules are found with much lower frequency than non-nodules. In the class imbalanced problem, conventional classifiers tend to be overwhelmed by the majority class and ignore the minority class. We therefore propose cascaded convolutional neural networks to cope with the class imbalanced problem. In the proposed approach, multi-stag… ▽ More Lung nodule classification is a class imbalanced problem because nodules are found with much lower frequency than non-nodules. In the class imbalanced problem, conventional classifiers tend to be overwhelmed by the majority class and ignore the minority class. We therefore propose cascaded convolutional neural networks to cope with the class imbalanced problem. In the proposed approach, multi-stage convolutional neural networks that perform as single-sided classifiers filter out obvious non-nodules. Successively, a convolutional neural network trained with a balanced data set calculates nodule probabilities. The proposed method achieved the sensitivity of 92.4\% and 94.5% at 4 and 8 false positives per scan in Free Receiver Operating Characteristics (FROC) curve analysis, respectively. △ Less

Submitted 21 March, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

Comments: arXiv admin note: text overlap with arXiv:1611.07136

arXiv:1611.07136 [pdf]

Cascaded Neural Networks with Selective Classifiers and its evaluation using Lung X-ray CT Images

Authors: Masaharu Sakamoto, Hiroki Nakano

Abstract: Lung nodule detection is a class imbalanced problem because nodules are found with much lower frequency than non-nodules. In the class imbalanced problem, conventional classifiers tend to be overwhelmed by the majority class and ignore the minority class. We therefore propose cascaded convolutional neural networks to cope with the class imbalanced problem. In the proposed approach, cascaded convol… ▽ More Lung nodule detection is a class imbalanced problem because nodules are found with much lower frequency than non-nodules. In the class imbalanced problem, conventional classifiers tend to be overwhelmed by the majority class and ignore the minority class. We therefore propose cascaded convolutional neural networks to cope with the class imbalanced problem. In the proposed approach, cascaded convolutional neural networks that perform as selective classifiers filter out obvious non-nodules. Successively, a convolutional neural network trained with a balanced data set calculates nodule probabilities. The proposed method achieved the detection sensitivity of 85.3% and 90.7% at 1 and 4 false positives per scan in FROC curve, respectively. △ Less

Submitted 21 November, 2016; originally announced November 2016.

Showing 1–9 of 9 results for author: Sakamoto, M