Search | arXiv e-print repository

arXiv:2406.07280 [pdf, ps, other]

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment

Authors: Takuto Igarashi, Yuki Saito, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, Hiroshi Saruwatari

Abstract: We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning noisy-to-clean VC process. However, the naturalness of the converted speech is limited when the noise of the source speech is unseen during the training. To this end, our proposed… ▽ More We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning noisy-to-clean VC process. However, the naturalness of the converted speech is limited when the noise of the source speech is unseen during the training. To this end, our proposed training conditions a VC model on two latent variables representing the recording quality and environment of the source speech. These latent variables are derived from deep neural networks pre-trained on recording quality assessment and acoustic scene classification and calculated in an utterance-wise or frame-wise manner. As a result, the trained VC model can explicitly learn information about speech degradation during the training. Objective and subjective evaluations show that our training improves the quality of the converted speech compared to the conventional training. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 5 pages, accepted for INTERSPEECH 2024, audio samples: http://y-saito.sakura.ne.jp/sython/Corpus/SRC4VC/IS2024_CDT_supplementary/demo_cdt.html

arXiv:2406.07254 [pdf, ps, other]

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark

Authors: Yuki Saito, Takuto Igarashi, Kentaro Seki, Shinnosuke Takamichi, Ryuichi Yamamoto, Kentaro Tachibana, Hiroshi Saruwatari

Abstract: We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC when low-quality speech recording is given as the input. To this end, we first asked 100 crowdworkers to record their voice samples using smartphones. T… ▽ More We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC when low-quality speech recording is given as the input. To this end, we first asked 100 crowdworkers to record their voice samples using smartphones. Then, we annotated the recorded samples with speaker-wise recording-quality scores and utterance-wise perceived emotion labels. We also benchmark SRC4VC on any-to-any VC, in which we trained a multi-speaker VC model on high-quality speech and used the SRC4VC speakers' voice samples as the source in VC. The results show that the recording quality mismatch between the training and evaluation data significantly degrades the VC performance, which can be improved by applying speech enhancement to the low-quality source speech samples. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted for INTERSPEECH 2024, corpus project page: https://y-saito.sakura.ne.jp/sython/Corpus/SRC4VC/index.html

arXiv:2104.04291 [pdf, other]

Brain Surface Reconstruction from MRI Images Based on Segmentation Networks Applying Signed Distance Maps

Authors: Heng Fang, Xi Yang, Taichi Kin, Takeo Igarashi

Abstract: Whole-brain surface extraction is an essential topic in medical imaging systems as it provides neurosurgeons with a broader view of surgical planning and abnormality detection. To solve the problem confronted in current deep learning skull strip** methods lacking prior shape information, we propose a new network architecture that incorporates knowledge of signed distance fields and introduce an… ▽ More Whole-brain surface extraction is an essential topic in medical imaging systems as it provides neurosurgeons with a broader view of surgical planning and abnormality detection. To solve the problem confronted in current deep learning skull strip** methods lacking prior shape information, we propose a new network architecture that incorporates knowledge of signed distance fields and introduce an additional Laplacian loss to ensure that the prediction results retain shape information. We validated our newly proposed method by conducting experiments on our brain magnetic resonance imaging dataset (111 patients). The evaluation results demonstrate that our approach achieves comparable dice scores and also reduces the Hausdorff distance and average symmetric surface distance, thus producing more stable and smooth brain isosurfaces. △ Less

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted by IEEE ISBI 2021 (International Symposium on Biomedical Imaging)

arXiv:2010.03190 [pdf, other]

Generative Melody Composition with Human-in-the-Loop Bayesian Optimization

Authors: Yijun Zhou, Yuki Koyama, Masataka Goto, Takeo Igarashi

Abstract: Deep generative models allow even novice composers to generate various melodies by sampling latent vectors. However, finding the desired melody is challenging since the latent space is unintuitive and high-dimensional. In this work, we present an interactive system that supports generative melody composition with human-in-the-loop Bayesian optimization (BO). This system takes a mixed-initiative ap… ▽ More Deep generative models allow even novice composers to generate various melodies by sampling latent vectors. However, finding the desired melody is challenging since the latent space is unintuitive and high-dimensional. In this work, we present an interactive system that supports generative melody composition with human-in-the-loop Bayesian optimization (BO). This system takes a mixed-initiative approach; the system generates candidate melodies to evaluate, and the user evaluates them and provides preferential feedback (i.e., picking the best melody among the candidates) to the system. This process is iteratively performed based on BO techniques until the user finds the desired melody. We conducted a pilot study using our prototype system, suggesting the potential of this approach. △ Less

Submitted 7 October, 2020; originally announced October 2020.

Comments: 10 pages, 2 figures, Proceedings of the 2020 Joint Conference on AI Music Creativity (CSMC-MuMe 2020)

ACM Class: J.5; H.5.5; H.5.2

arXiv:2006.16161 [pdf, other]

A Two-step Surface-based 3D Deep Learning Pipeline for Segmentation of Intracranial Aneurysms

Authors: Xi Yang, Ding Xia, Taichi Kin, Takeo Igarashi

Abstract: The exact shape of intracranial aneurysms is critical in medical diagnosis and surgical planning. While voxel-based deep learning frameworks have been proposed for this segmentation task, their performance remains limited. In this study, we offer a two-step surface-based deep learning pipeline that achieves significantly higher performance. Our proposed model takes a surface model of entire princi… ▽ More The exact shape of intracranial aneurysms is critical in medical diagnosis and surgical planning. While voxel-based deep learning frameworks have been proposed for this segmentation task, their performance remains limited. In this study, we offer a two-step surface-based deep learning pipeline that achieves significantly higher performance. Our proposed model takes a surface model of entire principal brain arteries containing aneurysms as input and returns aneurysms surfaces as output. A user first generates a surface model by manually specifying multiple thresholds for time-of-flight magnetic resonance angiography images. The system then samples small surface fragments from the entire brain arteries and classifies the surface fragments according to whether aneurysms are present using a point-based deep learning network (PointNet++). Finally, the system applies surface segmentation (SO-Net) to surface fragments containing aneurysms. We conduct a direct comparison of segmentation performance by counting voxels between the proposed surface-based framework and the existing voxel-based method, in which our framework achieves a much higher dice similarity coefficient score (72%) than the prior approach (46%). △ Less

Submitted 4 July, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: It is a pre-released version

arXiv:2003.02920 [pdf, other]

IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning

Authors: Xi Yang, Ding Xia, Taichi Kin, Takeo Igarashi

Abstract: Medicine is an important application area for deep learning models. Research in this field is a combination of medical expertise and data science knowledge. In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that makes the application of points-based and mesh-based classification and segmentation models available. Our dataset can be us… ▽ More Medicine is an important application area for deep learning models. Research in this field is a combination of medical expertise and data science knowledge. In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that makes the application of points-based and mesh-based classification and segmentation models available. Our dataset can be used to diagnose intracranial aneurysms and to extract the neck for a clip** operation in medicine and other areas of deep learning, such as normal estimation and surface reconstruction. We provide a large-scale benchmark of classification and part segmentation by testing state-of-the-art networks. We also discuss the performance of each method and demonstrate the challenges of our dataset. The published dataset can be accessed here: https://github.com/intra3d2019/IntrA. △ Less

Submitted 6 April, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: Accepted by cvpr2020, camera-ready version will be uploaded later

Showing 1–6 of 6 results for author: Igarashi, T