
Showing 1–24 of 24 results for author: Yeung, Y

Searching in archive cs.
  1. arXiv:2406.08989

    eess.AS cs.SD

    ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis

    Authors: Dehua Tao, Daxin Tan, Yu Ting Yeung, Xiao Chen, Tan Lee

    Abstract: Representing speech as discretized units has numerous benefits in supporting downstream spoken language processing tasks. However, the approach has been less explored in speech synthesis of tonal languages like Mandarin Chinese. Our preliminary experiments on Chinese speech synthesis reveal the issue of "tone shift", where a synthesized speech utterance contains correct base syllables but incorrec…

    Submitted 13 June, 2024; originally announced June 2024.

  2. arXiv:2404.06563

    cs.DB cs.LG cs.MM

    Demonstration of MaskSearch: Efficiently Querying Image Masks for Machine Learning Workflows

    Authors: Lindsey Linxi Wei, Chung Yik Edward Yeung, Hongjian Yu, **gchuan Zhou, Dong He, Magdalena Balazinska

    Abstract: We demonstrate MaskSearch, a system designed to accelerate queries over databases of image masks generated by machine learning models. MaskSearch formalizes and accelerates a new category of queries for retrieving images and their corresponding masks based on mask properties, which support various applications, from identifying spurious correlations learned by models to exploring discrepancies bet…

    Submitted 9 April, 2024; originally announced April 2024.

  3. arXiv:2310.05374

    cs.CL cs.LG cs.SD eess.AS

    Improving End-to-End Speech Processing by Efficient Text Data Utilization with Latent Synthesis

    Authors: Jianqiao Lu, Wenyong Huang, Nianzu Zheng, Xingshan Zeng, Yu Ting Yeung, Xiao Chen

    Abstract: Training a high performance end-to-end speech (E2E) processing model requires an enormous amount of labeled speech data, especially in the era of data-centric artificial intelligence. However, labeled speech data are usually scarcer and more expensive for collection, compared to textual data. We propose Latent Synthesis (LaSyn), an efficient textual data utilization framework for E2E speech proces…

    Submitted 24 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Comments: 15 pages, 8 figures, 8 tables, Accepted to EMNLP 2023 Findings

  4. arXiv:2309.11808

    physics.comp-ph cs.DC cs.MS

    Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package

    Authors: Marcin Rogowski, Brandon C. Y. Yeung, Oliver T. Schmidt, Romit Maulik, Lisandro Dalcin, Matteo Parsani, Gianmarco Mengaldo

    Abstract: We propose a parallel (distributed) version of the spectral proper orthogonal decomposition (SPOD) technique. The parallel SPOD algorithm distributes the spatial dimension of the dataset preserving time. This approach is adopted to preserve the non-distributed fast Fourier transform of the data in time, thereby avoiding the associated bottlenecks. The parallel SPOD algorithm is implemented in the…

    Submitted 21 September, 2023; originally announced September 2023.

  5. arXiv:2307.02572

    cs.LG math.AP physics.flu-dyn physics.geo-ph

    Conditional Korhunen-Loéve regression model with Basis Adaptation for high-dimensional problems: uncertainty quantification and inverse modeling

    Authors: Yu-Hong Yeung, Ramakrishna Tipireddy, David A. Barajas-Solano, Alexandre M. Tartakovsky

    Abstract: We propose a methodology for improving the accuracy of surrogate models of the observable response of physical systems as a function of the systems' spatially heterogeneous parameter fields with applications to uncertainty quantification and parameter estimation in high-dimensional problems. Practitioners often formulate finite-dimensional representations of spatially heterogeneous parameter field…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: 29 pages, 4 figures, 5 tables

  6. arXiv:2306.05436

    stat.AP cs.CY

    Remaining Useful Life Modelling with an Escalator Health Condition Analytic System

    Authors: Inez M. Zwetsloot, Yu Lin, Jiaqi Qiu, Lishuai Li, William Ka Fai Lee, Edmond Yin San Yeung, Colman Yiu Wah Yeung, Chris Chun Long Wong

    Abstract: The refurbishment of an escalator is usually linked with its design life as recommended by the manufacturer. However, the actual useful life of an escalator should be determined by its operating condition which is affected by the runtime, workload, maintenance quality, vibration, etc., rather than age only. The objective of this project is to develop a comprehensive health condition analytic syste…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 14 pages, 12 figures, 7 tables

  7. arXiv:2301.11279

    cs.LG math.AP physics.flu-dyn physics.geo-ph

    Gaussian process regression and conditional Karhunen-Loéve models for data assimilation in inverse problems

    Authors: Yu-Hong Yeung, David A. Barajas-Solano, Alexandre M. Tartakovsky

    Abstract: We present a model inversion algorithm, CKLEMAP, for data assimilation and parameter estimation in partial differential equation models of physical systems with spatially heterogeneous parameter fields. These fields are approximated using low-dimensional conditional Karhunen-Loéve expansions, which are constructed using Gaussian process regression models of these fields trained on the parameters'…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: 27 pages, 7 figures

  8. arXiv:2206.11250

    cs.CV

    Depth-aware Glass Surface Detection with Cross-modal Context Mining

    Authors: Jiaying Lin, Yuen Hei Yeung, Rynson W. H. Lau

    Abstract: Glass surfaces are becoming increasingly ubiquitous as modern buildings tend to use a lot of glass panels. This however poses substantial challenges on the operations of autonomous systems such as robots, self-driving cars and drones, as the glass panels can become transparent obstacles to the navigation. Existing works attempt to exploit various cues, including glass boundary context or reflection…

    Submitted 22 June, 2022; originally announced June 2022.

  9. arXiv:2204.05460

    eess.AS cs.CL cs.SD

    CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction

    Authors: Daxin Tan, Liqun Deng, Nianzu Zheng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This study proposes a fully automated system for speech correction and accent reduction. Consider the application scenario that a recorded speech audio contains certain errors, e.g., inappropriate words, mispronunciations, that need to be corrected. The proposed system, named CorrectSpeech, performs the correction in three steps: recognizing the recorded speech and converting it into time-stamped s…

    Submitted 13 October, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted by ISCSLP 2022

  10. arXiv:2201.12155

    cs.CL cs.SD eess.AS

    Reducing language context confusion for end-to-end code-switching automatic speech recognition

    Authors: Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jianhua Tao, Yu Ting Yeung, Liqun Deng

    Abstract: Code-switching deals with alternative languages in communication process. Training end-to-end (E2E) automatic speech recognition (ASR) systems for code-switching is especially challenging as code-switching training data are always insufficient to combat the increased multilingual context confusion due to the presence of more than one language. We propose a language-related attention mechanism to r…

    Submitted 29 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: text overlap with arXiv:2010.14798; the paper has been accepted by Interspeech 2022

  11. arXiv:2201.10207

    eess.AS cs.CL cs.LG cs.SD

    SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training

    Authors: Wenyong Huang, Zhenhe Zhang, Yu Ting Yeung, Xin Jiang, Qun Liu

    Abstract: We introduce a new approach for speech pre-training named SPIRAL which works by learning denoising representation of perturbed data in a teacher-student framework. Specifically, given a speech utterance, we first feed the utterance to a teacher network to obtain corresponding representation. Then the same utterance is perturbed and fed to a student network. The student network is trained to output…

    Submitted 6 March, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

    Comments: ICLR 2022

  12. arXiv:2111.08191

    cs.CL cs.SD eess.AS

    CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

    Authors: Nianzu Zheng, Liqun Deng, Wenyong Huang, Yu Ting Yeung, Baohua Xu, Yuanyuan Guo, Yasheng Wang, Xiao Chen, Xin Jiang, Qun Liu

    Abstract: Mispronunciation detection and diagnosis (MDD) is a popular research focus in computer-aided pronunciation training (CAPT) systems. End-to-end (e2e) approaches are becoming dominant in MDD. However, an e2e MDD model usually requires entire speech utterances as input context, which leads to significant time latency especially for long paragraphs. We propose a streaming e2e MDD model called CoCA-MDD.…

    Submitted 29 June, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

    Comments: 5 pages, 4 figures, Accepted by INTERSPEECH 2022

  13. arXiv:2108.00037

    cs.LG math.AP physics.flu-dyn physics.geo-ph

    Physics-Informed Machine Learning Method for Large-Scale Data Assimilation Problems

    Authors: Yu-Hong Yeung, David A. Barajas-Solano, Alexandre M. Tartakovsky

    Abstract: We develop a physics-informed machine learning approach for large-scale data assimilation and parameter estimation and apply it for estimating transmissivity and hydraulic head in the two-dimensional steady-state subsurface flow model of the Hanford Site given synthetic measurements of said variables. In our approach, we extend the physics-informed conditional Karhunen-Loéve expansion (PICKLE) met…

    Submitted 30 July, 2021; originally announced August 2021.

    Comments: 28 pages, 9 figures; submitted to Water Resources Research

  14. arXiv:2107.01554

    eess.AS cs.SD

    EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

    Authors: Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

    Abstract: This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness. The EditSpeech system is developed upon a neural text-to-speech (NTTS) synthesis framework. Partial inference and bi…

    Submitted 7 October, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted by ASRU 2021

  15. arXiv:2106.10132

    eess.AS cs.CL cs.MM cs.SD eess.SP

    VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion

    Authors: Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

    Abstract: One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally ignores the correlation between different speech representations during training, which causes leakage of content information into the speaker representation and t…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021. Code, pre-trained models and demo are available at https://github.com/Wendison/VQMIVC

  16. arXiv:2106.10127

    eess.AS cs.CL cs.SD eess.SP

    Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization

    Authors: Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

    Abstract: Dysarthric speech detection (DSD) systems aim to detect characteristics of the neuromotor disorder from speech. Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains respectively, but the two domains may differ in terms of speech stimuli, disease etiology, etc. It is hard to acquire labelled data in the target domai…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: Accepted to Interspeech 2021

  17. arXiv:2010.11657

    cs.SD cs.CL eess.AS

    The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge

    Authors: Renyu Wang, Ruilin Tong, Yu Ting Yeung, Xiao Chen

    Abstract: This paper describes the system setup of our submission to speaker diarisation track (Track 4) of VoxCeleb Speaker Recognition Challenge 2020. Our diarisation system consists of a well-trained neural network based speech enhancement model as pre-processing front-end of input speech signals. We replace conventional energy-based voice activity detection (VAD) with a neural network based VAD. The neural…

    Submitted 23 October, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 5 pages, 2 figures, A report about our diarisation system for VoxCeleb Challenge, Interspeech conference workshop

  18. arXiv:2008.05750

    eess.AS cs.CL cs.SD

    Conv-Transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-End Speech Recognition

    Authors: Wenyong Huang, Wenchao Hu, Yu Ting Yeung, Xiao Chen

    Abstract: Transformer has achieved competitive performance against state-of-the-art end-to-end models in automatic speech recognition (ASR), and requires significantly less training time than RNN-based models. The original Transformer, with encoder-decoder architecture, is only suitable for offline ASR. It relies on an attention mechanism to learn alignments, and encodes input audio bidirectionally. The hig…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: Accepted by INTERSPEECH 2020

  19. arXiv:2007.09028

    cs.LG cs.AI cs.HC stat.ML

    Sequential Explanations with Mental Model-Based Policies

    Authors: Arnold YS Yeung, Shalmali Joshi, Joseph Jay Williams, Frank Rudzicz

    Abstract: The act of explaining across two parties is a feedback loop, where one provides information on what needs to be explained and the other provides an explanation relevant to this information. We apply a reinforcement learning framework which emulates this format by providing explanations based on the explainee's current mental model. We conduct novel online human experiments where explanations gener…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Accepted into ICML 2020 Workshop on Human Interpretability in Machine Learning (Spotlight)

  20. arXiv:2005.11491

    cs.DC cs.PF

    Container Profiler: Profiling Resource Utilization of Containerized Big Data Pipelines

    Authors: Varik Hoang, Ling-Hong Hung, David Perez, Huazeng Deng, Raymond Schooley, Niharika Arumilli, Ka Yee Yeung, Wes Lloyd

    Abstract: This paper presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks collecting over fifty Linux operating system metrics at the virtual machine, container, and process levels. The Container Profiler supports performing time series profiling at a co…

    Submitted 7 February, 2023; v1 submitted 23 May, 2020; originally announced May 2020.

  21. arXiv:1812.10071

    cs.CV cs.LG

    Coupled Recurrent Network (CRN)

    Authors: Lin Sun, Kui Jia, Yuejia Shen, Silvio Savarese, Dit Yan Yeung, Bertram E. Shi

    Abstract: Many semantic video analysis tasks can benefit from multiple, heterogenous signals. For example, in addition to the original RGB input sequences, sequences of optical flow are usually used to boost the performance of human action recognition in videos. To learn from these heterogenous input sources, existing methods rely on two-stream architectural designs that contain independent, parallel strea…

    Submitted 25 March, 2019; v1 submitted 25 December, 2018; originally announced December 2018.

  22. arXiv:1811.00328

    cs.CE cs.GR math.NA

    AMPS: A Real-time Mesh Cutting Algorithm for Surgical Simulations

    Authors: Yu-Hong Yeung, Alex Pothen, Jessica Crouch

    Abstract: We present the AMPS algorithm, a finite element solution method that combines principal submatrix updates and Schur complement techniques, well-suited for interactive simulations of deformation and cutting of finite element meshes. Our approach features real-time solutions to the updated stiffness matrix systems to account for interactive changes in mesh connectivity and boundary conditions. Updat…

    Submitted 1 November, 2018; originally announced November 2018.

    Comments: 20 pages, 9 figures, 3 tables

  23. arXiv:1708.03958

    cs.CV

    Lattice Long Short-Term Memory for Human Action Recognition

    Authors: Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese

    Abstract: Human actions captured in video sequences are three-dimensional signals characterizing visual appearance and motion dynamics. To learn action patterns, existing methods adopt Convolutional and/or Recurrent Neural Networks (CNNs and RNNs). CNN based methods are effective in learning spatial appearances, but are limited in modeling long-term motion dynamics. RNNs, especially Long Short-Term Memory (…

    Submitted 13 August, 2017; originally announced August 2017.

    Comments: ICCV2017

  24. arXiv:1706.03147

    cs.CE math.NA

    AMPS: An Augmented Matrix Formulation for Principal Submatrix Updates with Application to Power Grids

    Authors: Yu-Hong Yeung, Alex Pothen, Mahantesh Halappanavar, Zhenyu Huang

    Abstract: We present AMPS, an augmented matrix approach to update the solution to a linear system of equations when the matrix is modified by a few elements within a principal submatrix. This problem arises in the dynamic security analysis of a power grid, where operators need to perform N - k contingency analysis, i.e., determine the state of the system when exactly k links from N fail. Our algorithms augm…

    Submitted 9 June, 2017; originally announced June 2017.

    Comments: 19 pages, 4 figures, 2 tables, SIAM Journal on Scientific Computing

    MSC Class: 65F50; 65F10; 65F05; 65Y20 ACM Class: G.1.3, G.1.10