-
Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment
Authors:
Takuto Igarashi,
Yuki Saito,
Kentaro Seki,
Shinnosuke Takamichi,
Ryuichi Yamamoto,
Kentaro Tachibana,
Hiroshi Saruwatari
Abstract:
We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning noisy-to-clean VC process. However, the naturalness of the converted speech is limited when the noise of the source speech is unseen during the training. To this end, our proposed…
▽ More
We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning noisy-to-clean VC process. However, the naturalness of the converted speech is limited when the noise of the source speech is unseen during the training. To this end, our proposed training conditions a VC model on two latent variables representing the recording quality and environment of the source speech. These latent variables are derived from deep neural networks pre-trained on recording quality assessment and acoustic scene classification and calculated in an utterance-wise or frame-wise manner. As a result, the trained VC model can explicitly learn information about speech degradation during the training. Objective and subjective evaluations show that our training improves the quality of the converted speech compared to the conventional training.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark
Authors:
Yuki Saito,
Takuto Igarashi,
Kentaro Seki,
Shinnosuke Takamichi,
Ryuichi Yamamoto,
Kentaro Tachibana,
Hiroshi Saruwatari
Abstract:
We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC when low-quality speech recording is given as the input. To this end, we first asked 100 crowdworkers to record their voice samples using smartphones. T…
▽ More
We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC when low-quality speech recording is given as the input. To this end, we first asked 100 crowdworkers to record their voice samples using smartphones. Then, we annotated the recorded samples with speaker-wise recording-quality scores and utterance-wise perceived emotion labels. We also benchmark SRC4VC on any-to-any VC, in which we trained a multi-speaker VC model on high-quality speech and used the SRC4VC speakers' voice samples as the source in VC. The results show that the recording quality mismatch between the training and evaluation data significantly degrades the VC performance, which can be improved by applying speech enhancement to the low-quality source speech samples.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Brain Surface Reconstruction from MRI Images Based on Segmentation Networks Applying Signed Distance Maps
Authors:
Heng Fang,
Xi Yang,
Taichi Kin,
Takeo Igarashi
Abstract:
Whole-brain surface extraction is an essential topic in medical imaging systems as it provides neurosurgeons with a broader view of surgical planning and abnormality detection. To solve the problem confronted in current deep learning skull strip** methods lacking prior shape information, we propose a new network architecture that incorporates knowledge of signed distance fields and introduce an…
▽ More
Whole-brain surface extraction is an essential topic in medical imaging systems as it provides neurosurgeons with a broader view of surgical planning and abnormality detection. To solve the problem confronted in current deep learning skull strip** methods lacking prior shape information, we propose a new network architecture that incorporates knowledge of signed distance fields and introduce an additional Laplacian loss to ensure that the prediction results retain shape information. We validated our newly proposed method by conducting experiments on our brain magnetic resonance imaging dataset (111 patients). The evaluation results demonstrate that our approach achieves comparable dice scores and also reduces the Hausdorff distance and average symmetric surface distance, thus producing more stable and smooth brain isosurfaces.
△ Less
Submitted 9 April, 2021;
originally announced April 2021.
-
Generative Melody Composition with Human-in-the-Loop Bayesian Optimization
Authors:
Yijun Zhou,
Yuki Koyama,
Masataka Goto,
Takeo Igarashi
Abstract:
Deep generative models allow even novice composers to generate various melodies by sampling latent vectors. However, finding the desired melody is challenging since the latent space is unintuitive and high-dimensional. In this work, we present an interactive system that supports generative melody composition with human-in-the-loop Bayesian optimization (BO). This system takes a mixed-initiative ap…
▽ More
Deep generative models allow even novice composers to generate various melodies by sampling latent vectors. However, finding the desired melody is challenging since the latent space is unintuitive and high-dimensional. In this work, we present an interactive system that supports generative melody composition with human-in-the-loop Bayesian optimization (BO). This system takes a mixed-initiative approach; the system generates candidate melodies to evaluate, and the user evaluates them and provides preferential feedback (i.e., picking the best melody among the candidates) to the system. This process is iteratively performed based on BO techniques until the user finds the desired melody. We conducted a pilot study using our prototype system, suggesting the potential of this approach.
△ Less
Submitted 7 October, 2020;
originally announced October 2020.
-
A Two-step Surface-based 3D Deep Learning Pipeline for Segmentation of Intracranial Aneurysms
Authors:
Xi Yang,
Ding Xia,
Taichi Kin,
Takeo Igarashi
Abstract:
The exact shape of intracranial aneurysms is critical in medical diagnosis and surgical planning. While voxel-based deep learning frameworks have been proposed for this segmentation task, their performance remains limited. In this study, we offer a two-step surface-based deep learning pipeline that achieves significantly higher performance. Our proposed model takes a surface model of entire princi…
▽ More
The exact shape of intracranial aneurysms is critical in medical diagnosis and surgical planning. While voxel-based deep learning frameworks have been proposed for this segmentation task, their performance remains limited. In this study, we offer a two-step surface-based deep learning pipeline that achieves significantly higher performance. Our proposed model takes a surface model of entire principal brain arteries containing aneurysms as input and returns aneurysms surfaces as output. A user first generates a surface model by manually specifying multiple thresholds for time-of-flight magnetic resonance angiography images. The system then samples small surface fragments from the entire brain arteries and classifies the surface fragments according to whether aneurysms are present using a point-based deep learning network (PointNet++). Finally, the system applies surface segmentation (SO-Net) to surface fragments containing aneurysms. We conduct a direct comparison of segmentation performance by counting voxels between the proposed surface-based framework and the existing voxel-based method, in which our framework achieves a much higher dice similarity coefficient score (72%) than the prior approach (46%).
△ Less
Submitted 4 July, 2021; v1 submitted 29 June, 2020;
originally announced June 2020.
-
IntrA: 3D Intracranial Aneurysm Dataset for Deep Learning
Authors:
Xi Yang,
Ding Xia,
Taichi Kin,
Takeo Igarashi
Abstract:
Medicine is an important application area for deep learning models. Research in this field is a combination of medical expertise and data science knowledge. In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that makes the application of points-based and mesh-based classification and segmentation models available. Our dataset can be us…
▽ More
Medicine is an important application area for deep learning models. Research in this field is a combination of medical expertise and data science knowledge. In this paper, instead of 2D medical images, we introduce an open-access 3D intracranial aneurysm dataset, IntrA, that makes the application of points-based and mesh-based classification and segmentation models available. Our dataset can be used to diagnose intracranial aneurysms and to extract the neck for a clip** operation in medicine and other areas of deep learning, such as normal estimation and surface reconstruction. We provide a large-scale benchmark of classification and part segmentation by testing state-of-the-art networks. We also discuss the performance of each method and demonstrate the challenges of our dataset. The published dataset can be accessed here: https://github.com/intra3d2019/IntrA.
△ Less
Submitted 6 April, 2020; v1 submitted 2 March, 2020;
originally announced March 2020.