
Showing 1–15 of 15 results for author: Tan, D

Searching in archive eess.
  1. arXiv:2406.08989  [pdf, other]

    eess.AS cs.SD

    ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

    Authors: Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee

    Abstract: Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our preliminary experiments on Chinese speech synthesis reveal the issue of "tone shift", where a synthesized speech utterance contains correct base syllables but incorrec…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2212.03398

    eess.AS cs.CL cs.SD

    Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue

    Authors: Daxin Tan, Nikos Kargas, David McHardy, Constantinos Papayiannis, Antonio Bonafonte, Marek Strelec, Jonas Rohnke, Agis Oikonomou Filandras, Trevor Wood

    Abstract: Entrainment is the phenomenon by which an interlocutor adapts their speaking style to align with their partner in conversations. It has been found in different dimensions, such as acoustic, prosodic, lexical or syntactic. In this work, we explore and utilize the entrainment phenomenon to improve spoken dialogue systems for voice assistants. We first examine the existence of the entrainment phenomenon in…

    Submitted 6 December, 2022; originally announced December 2022.

    Comments: This version has been removed by arXiv administrators because the submitter did not have the right to assign a license at the time of submission

  3. arXiv:2209.13112  [pdf, other]

    eess.AS cs.SD

    Automated Sex Classification of Children's Voices and Changes in Differentiating Factors with Age

    Authors: Fuling Chen, Roberto Togneri, Murray Maybery, Diana Weiting Tan

    Abstract: Sex classification of children's voices allows for an investigation of the development of secondary sex characteristics which has been a key interest in the field of speech analysis. This research investigated a broad range of acoustic features from scripted and spontaneous speech and applied a hierarchical clustering-based machine learning model to distinguish the sex of children aged between 5 a…

    Submitted 26 September, 2022; originally announced September 2022.

  4. arXiv:2204.05460  [pdf, other]

    eess.AS cs.CL cs.SD

    CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

    Authors: Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario in which a recorded speech audio contains certain errors, e.g., inappropriate words, mispronunciations, that need to be corrected. The proposed system, named CorrectSpeech, performs the correction in three steps: recognizing the recorded speech and converting it into time-stamped s…

    Submitted 13 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted by ISCSLP 2022

  5. arXiv:2203.17190  [pdf, other]

    eess.AS cs.CL

    Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

    Authors: Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao

    Abstract: Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech (TTS) has drawn increasing attention. However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input. Pre-training only with phonemes as input can alleviate the input mismatch but lack the ability t…

    Submitted 19 July, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

    Comments: Accepted by Interspeech 2022

  6. arXiv:2201.01669  [pdf, other]

    eess.AS cs.LG cs.SD

    Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough

    Authors: Esin Darici Haritaoglu, Nicholas Rasmussen, Daniel C. H. Tan, Jennifer Ranjani J., Jaclyn Xiao, Gunvant Chaudhari, Akanksha Rajput, Praveen Govindan, Christian Canham, Wei Chen, Minami Yamaura, Laura Gomezjurado, Aaron Broukhim, Amil Khanzada, Mert Pilanci

    Abstract: The Covid-19 pandemic has been one of the most devastating events in recent history, claiming the lives of more than 5 million people worldwide. Even with the worldwide distribution of vaccines, there is an apparent need for affordable, reliable, and accessible screening techniques to serve parts of the World that do not have access to Western medicine. Artificial Intelligence can provide a soluti…

    Submitted 29 March, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

  7. arXiv:2110.03887  [pdf, other]

    eess.AS cs.SD

    Environment Aware Text-to-Speech Synthesis

    Authors: Daxin Tan, Guangyan Zhang, Tan Lee

    Abstract: This study aims at designing an environment-aware text-to-speech (TTS) system that can generate speech to suit specific acoustic environments. It is also motivated by the desire to leverage massive data of speech audio from heterogeneous sources in TTS system development. The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condi…

    Submitted 6 August, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted by Interspeech 2022

  8. arXiv:2110.03857  [pdf, other]

    eess.AS cs.CL cs.SD

    A study on the efficacy of model pre-training in developing neural text-to-speech system

    Authors: Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

    Abstract: In the development of neural text-to-speech systems, model pre-training with a large amount of non-target speakers' data is a common approach. However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data. This study aims to understand bet…

    Submitted 7 October, 2021; originally announced October 2021.

  9. arXiv:2108.02821  [pdf, other]

    eess.AS cs.SD

    Applying the Information Bottleneck Principle to Prosodic Representation Learning

    Authors: Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee

    Abstract: This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle. A modified VQ-VAE quantized layer is incorporated in the speech generation model to control the IB capacity and adjust the balance between reconstruction power and dise…

    Submitted 5 August, 2021; originally announced August 2021.

    Comments: To appear in Interspeech 2021

  10. arXiv:2107.01554  [pdf, other]

    eess.AS cs.SD

    EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

    Authors: Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness. The EditSpeech system is developed upon a neural text-to-speech (NTTS) synthesis framework. Partial inference and bi…

    Submitted 7 October, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted by ASRU 2021

  11. arXiv:2103.04699  [pdf, other]

    eess.AS cs.SD

    CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge

    Authors: Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee

    Abstract: This paper presents the CUHK-EE voice cloning system for the ICASSP 2021 M2VoC challenge. The challenge provides two Mandarin speech corpora: the AIShell-3 corpus of 218 speakers with noise and reverberation and the MST corpus including high-quality speech of one male and one female speaker. 100 and 5 utterances of 3 target speakers in different voice and style are provided in track 1 and 2 respectiv…

    Submitted 5 July, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

  12. arXiv:2102.07982  [pdf, other]

    cs.SD eess.AS

    Voice Gender Scoring and Independent Acoustic Characterization of Perceived Masculinity and Femininity

    Authors: Fuling Chen, Roberto Togneri, Murray Maybery, Diana Tan

    Abstract: Previous research has found that voices can provide reliable information to be used for gender classification with a high level of accuracy. In social psychology, perceived masculinity and femininity (masculinity and femininity rated by humans) has often been considered an important feature when investigating the influence of vocal features on social behaviours. While previous studies have charact…

    Submitted 4 August, 2022; v1 submitted 16 February, 2021; originally announced February 2021.

    Comments: 24 pages, 7 figures, journal

  13. Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement

    Authors: Daxin Tan, Tan Lee

    Abstract: This paper presents a novel design of neural network system for fine-grained style modeling, transfer and prediction in expressive text-to-speech (TTS) synthesis. Fine-grained modeling is realized by extracting style embeddings from the mel-spectrograms of phone-level speech segments. Collaborative learning and adversarial learning strategies are applied in order to achieve effective disentangleme…

    Submitted 7 October, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

    Comments: Accepted by Interspeech 2021

  14. arXiv:2008.07358  [pdf, other]

    cs.CV eess.IV

    SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: Point clouds are often the default choice for many applications as they exhibit more flexibility and efficiency than volumetric data. Nevertheless, their unorganized nature -- points are stored in an unordered way -- makes them less suited to be processed by deep learning pipelines. In this paper, we propose a method for 3D object completion and classification based on point clouds. We introduce a…

    Submitted 17 August, 2020; originally announced August 2020.

    Comments: Accepted in ECCV 2020 as oral

  15. arXiv:1909.01106  [pdf, other]

    cs.CV cs.AI cs.CG cs.LG eess.IV

    ForkNet: Multi-branch Volumetric Semantic Completion from a Single Depth Image

    Authors: Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

    Abstract: We propose a novel model for 3D semantic completion from a single depth image, based on a single encoder and three separate generators used to reconstruct different geometric and semantic representations of the original and completed scene, all sharing the same latent space. To transfer information between the geometric and semantic branches of the network, we introduce paths between them concaten…

    Submitted 3 September, 2019; originally announced September 2019.

    Comments: Accepted in International Conference on Computer Vision 2019