Showing 1–37 of 37 results for author: Tan, K

Searching in archive eess.
  1. arXiv:2406.11809  [pdf, other]

    cs.LG cs.RO eess.SY

    Physics-Constrained Learning for PDE Systems with Uncertainty Quantified Port-Hamiltonian Models

    Authors: Kaiyuan Tan, Peilun Li, Thomas Beckers

    Abstract: Modeling the dynamics of flexible objects has become an emerging topic in the community as these objects become more present in many applications, e.g., soft robotics. Due to the properties of flexible materials, the movements of soft objects are often highly nonlinear and, thus, complex to predict. Data-driven approaches seem promising for modeling those complex dynamics but often neglect basic p…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.11619  [pdf, other]

    eess.AS cs.LG

    AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling

    Authors: Vahid Ahmadi Kalkhorani, Cheng Yu, Anurag Kumar, Ke Tan, Buye Xu, DeLiang Wang

    Abstract: Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by lever…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 10 pages, 4 Figures, and 4 Tables

  3. arXiv:2406.07390  [pdf, other]

    eess.SP cs.IT eess.IV

    DiffCom: Channel Received Signal is a Natural Condition to Guide Diffusion Posterior Sampling

    Authors: Sixian Wang, Jincheng Dai, Kailin Tan, Xiaoqi Qin, Kai Niu, Ping Zhang

    Abstract: End-to-end visual communication systems typically optimize a trade-off between channel bandwidth costs and signal-level distortion metrics. However, under challenging physical conditions, this traditional discriminative communication paradigm often results in unrealistic reconstructions with perceptible blurring and aliasing artifacts, despite the inclusion of perceptual or adversarial losses for…

    Submitted 11 June, 2024; originally announced June 2024.

  4. arXiv:2406.07069  [pdf, other]

    cs.RO eess.SY

    Optimal Gait Control for a Tendon-driven Soft Quadruped Robot by Model-based Reinforcement Learning

    Authors: Xuezhi Niu, Kaige Tan, Lei Feng

    Abstract: This study presents an innovative approach to optimal gait control for a soft quadruped robot enabled by four Compressible Tendon-driven Soft Actuators (CTSAs). Improving our previous studies of using model-free reinforcement learning for gait control, we employ model-based reinforcement learning (MBRL) to further enhance the performance of the gait controller. Compared to rigid robots, the propos…

    Submitted 11 June, 2024; originally announced June 2024.

  5. arXiv:2406.07065  [pdf, other]

    cs.RO eess.SY

    Optimal Gait Design for a Soft Quadruped Robot via Multi-fidelity Bayesian Optimization

    Authors: Kaige Tan, Xuezhi Niu, Qinglei Ji, Lei Feng, Martin Törngren

    Abstract: This study focuses on the locomotion capability improvement in a tendon-driven soft quadruped robot through an online adaptive learning approach. Leveraging the inverse kinematics model of the soft quadruped robot, we employ a central pattern generator to design a parametric gait pattern, and use Bayesian optimization (BO) to find the optimal parameters. Further, to address the challenges of model…

    Submitted 11 June, 2024; originally announced June 2024.

  6. arXiv:2403.01369  [pdf, other]

    eess.AS cs.AI cs.LG

    A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

    Authors: Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

    Abstract: Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech recognition and associated tasks, their utility in speech enhancement systems is yet to be firmly established, and perhaps not properly understood. In this paper, we…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 8 pages; Shorter form accepted in ICASSP 2024

  7. arXiv:2312.16884  [pdf]

    eess.AS cs.SD

    Binaural recording methods with analysis on inter-aural time, level, and phase differences

    Authors: Johann Kay Ann Tan

    Abstract: Binaural recordings are a form of stereophonic recording method that replicates how human ears perceive sound, these types of recordings create a 3D aural image around the listener and are extremely immersive when well recorded and listened to appropriately with headphones. It has wide applications in video, podcast, and gaming formats -- allowing the listener to feel like they are there. Although…

    Submitted 28 December, 2023; originally announced December 2023.

  8. arXiv:2311.15959  [pdf, other]

    cs.SD cs.AI eess.AS

    CheapNET: Improving Light-weight speech enhancement network by projected loss function

    Authors: Kaijun Tan, Benzhe Dai, Jiakui Li, Wenyu Mao

    Abstract: Noise suppression and echo cancellation are critical in speech enhancement and essential for smart devices and real-time communication. Deployed in voice processing front-ends and edge devices, these algorithms must ensure efficient real-time inference with low computational demands. Traditional edge-based noise suppression often uses MSE-based amplitude spectrum mask training, but this approach h…

    Submitted 27 November, 2023; originally announced November 2023.

  9. arXiv:2310.17116  [pdf, other]

    eess.AS cs.SD

    Real-time Neonatal Chest Sound Separation using Deep Learning

    Authors: Yang Yi Poh, Ethan Grooby, Kenneth Tan, Lindsay Zhou, Arrabella King, Ashwin Ramanathan, Atul Malhotra, Mehrtash Harandi, Faezeh Marzbanrad

    Abstract: Auscultation for neonates is a simple and non-invasive method of providing diagnosis for cardiovascular and respiratory disease. Such diagnosis often requires high-quality heart and lung sounds to be captured during auscultation. However, in most cases, obtaining such high-quality sounds is non-trivial due to the chest sounds containing a mixture of heart, lung, and noise sounds. As such, addition…

    Submitted 25 October, 2023; originally announced October 2023.

  10. arXiv:2310.07284  [pdf, other]

    eess.AS cs.CL

    Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction

    Authors: Xiang Hao, Jibin Wu, Jianwei Yu, Chenglin Xu, Kay Chen Tan

    Abstract: Humans possess an extraordinary ability to selectively focus on the sound source of interest amidst complex acoustic environments, commonly referred to as cocktail party scenarios. In an attempt to replicate this remarkable auditory attention capability in machines, target speaker extraction (TSE) models have been developed. These models leverage the pre-registered cues of the target speaker to ex…

    Submitted 14 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Under review, https://github.com/haoxiangsnr/llm-tse

  11. arXiv:2305.19557  [pdf, other]

    math.OC cs.LG eess.SP stat.ML

    Dictionary Learning under Symmetries via Group Representations

    Authors: Subhroshekhar Ghosh, Aaron Y. R. Low, Yong Sheng Soh, Zhuohang Feng, Brendan K. Y. Tan

    Abstract: The dictionary learning problem can be viewed as a data-driven process to learn a suitable transformation so that data is sparsely represented directly from example data. In this paper, we examine the problem of learning a dictionary that is invariant under a pre-specified group of transformations. Natural settings include Cryo-EM, multi-object tracking, synchronization, pose estimation, etc. We s…

    Submitted 25 July, 2023; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: 29 pages, 2 figures

  12. arXiv:2304.01448  [pdf, other]

    eess.AS

    TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio

    Authors: Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, Buye Xu

    Abstract: Measuring quality and intelligibility of a speech signal is usually a critical step in development of speech processing systems. To enable this, a variety of metrics to measure quality and intelligibility under different assumptions have been developed. Through this paper, we introduce tools and a set of models to estimate such known metrics using deep neural networks. These models are made availa…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: ICASSP 2023

  13. arXiv:2301.04320  [pdf, other]

    cs.SD cs.LG eess.AS

    Rethinking complex-valued deep neural networks for monaural speech enhancement

    Authors: Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong

    Abstract: Despite multiple efforts made towards adopting complex-valued deep neural networks (DNNs), it remains an open question whether complex-valued DNNs are generally more effective than real-valued DNNs for monaural speech enhancement. This work is devoted to presenting a critical assessment by systematically examining complex-valued DNNs against their real-valued counterparts. Specifically, we investi…

    Submitted 11 January, 2023; originally announced January 2023.

  14. arXiv:2211.08624  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-channel Speech Enhancement

    Authors: Kuan-Lin Chen, Daniel D. E. Wong, Ke Tan, Buye Xu, Anurag Kumar, Vamsi Krishna Ithapu

    Abstract: Most speech enhancement (SE) models learn a point estimate and do not make use of uncertainty estimation in the learning process. In this paper, we show that modeling heteroscedastic uncertainty by minimizing a multivariate Gaussian negative log-likelihood (NLL) improves SE performance at no extra cost. During training, our approach augments a model learning complex spectral mapping with a tempora…

    Submitted 8 March, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: 5 pages. Accepted at ICASSP 2023

  15. arXiv:2211.04339  [pdf, other]

    cs.IT cs.LG eess.SP

    Toward Adaptive Semantic Communications: Efficient Data Transmission via Online Learned Nonlinear Transform Source-Channel Coding

    Authors: Jincheng Dai, Sixian Wang, Ke Yang, Kailin Tan, Xiaoqi Qin, Zhongwei Si, Kai Niu, Ping Zhang

    Abstract: The emerging field semantic communication is driving the research of end-to-end data transmission. By utilizing the powerful representation ability of deep learning models, learned data transmission schemes have exhibited superior performance than the established source and channel coding methods. While, so far, research efforts mainly concentrated on architecture and model improvements toward a s…

    Submitted 24 May, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

    Comments: Accepted by IEEE JSAC

  16. arXiv:2210.06664  [pdf]

    eess.IV cs.AI cs.CV

    Are Macula or Optic Nerve Head Structures better at Diagnosing Glaucoma? An Answer using AI and Wide-Field Optical Coherence Tomography

    Authors: Charis Y. N. Chiang, Fabian Braeu, Thanadet Chuangsuwanich, Royston K. Y. Tan, Jacqueline Chua, Leopold Schmetterer, Alexandre Thiery, Martin Buist, Michaël J. A. Girard

    Abstract: Purpose: (1) To develop a deep learning algorithm to automatically segment structures of the optic nerve head (ONH) and macula in 3D wide-field optical coherence tomography (OCT) scans; (2) To assess whether 3D macula or ONH structures (or the combination of both) provide the best diagnostic power for glaucoma. Methods: A cross-sectional comparative study was performed which included wide-field sw…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: 23 pages, 5 figures

  17. arXiv:2201.10105  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP q-bio.QM

    Prediction of Neonatal Respiratory Distress in Term Babies at Birth from Digital Stethoscope Recorded Chest Sounds

    Authors: Ethan Grooby, Chiranjibi Sitaula, Kenneth Tan, Lindsay Zhou, Arrabella King, Ashwin Ramanathan, Atul Malhotra, Guy A. Dumont, Faezeh Marzbanrad

    Abstract: Neonatal respiratory distress is a common condition that if left untreated, can lead to short- and long-term complications. This paper investigates the usage of digital stethoscope recorded chest sounds taken within 1min post-delivery, to enable early detection and prediction of neonatal respiratory distress. Fifty-one term newborns were included in this study, 9 of whom developed respiratory dist…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: 4 pages, 2 figures, 1 table. Paper submitted for potential publication as conference paper at 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2022

    Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

  18. arXiv:2201.03211  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Noisy Neonatal Chest Sound Separation for High-Quality Heart and Lung Sounds

    Authors: Ethan Grooby, Chiranjibi Sitaula, Davood Fattahi, Reza Sameni, Kenneth Tan, Lindsay Zhou, Arrabella King, Ashwin Ramanathan, Atul Malhotra, Guy A. Dumont, Faezeh Marzbanrad

    Abstract: Stethoscope-recorded chest sounds provide the opportunity for remote cardio-respiratory health monitoring of neonates. However, reliable monitoring requires high-quality heart and lung sounds. This paper presents novel Non-negative Matrix Factorisation (NMF) and Non-negative Matrix Co-Factorisation (NMCF) methods for neonatal chest sound separation. To assess these methods and compare with existin…

    Submitted 10 January, 2022; originally announced January 2022.

    Comments: 12 pages, 4 figures, 3 tables. Paper submitted and under review for possible publication in IEEE

    Journal ref: IEEE Journal of Biomedical and Health Informatics, 2022

  19. Verifying Switched System Stability With Logic

    Authors: Yong Kiam Tan, Stefan Mitsch, André Platzer

    Abstract: Switched systems are known to exhibit subtle (in)stability behaviors requiring system designers to carefully analyze the stability of closed-loop systems that arise from their proposed switching control laws. This paper presents a formal approach for verifying switched system stability that blends classical ideas from the controls and verification literature using differential dynamic logic (dL),…

    Submitted 8 April, 2022; v1 submitted 2 November, 2021; originally announced November 2021.

    Comments: Long version of paper at HSCC 2022 (25th ACM International Conference on Hybrid Systems: Computation and Control, May 4-6, 2022)

    MSC Class: 03B70; 34A38; 93C30 ACM Class: F.3.1; F.4.1; G.1.7; I.2.3

  20. arXiv:2110.08664  [pdf, other]

    cs.SE cs.AI eess.SY

    Finding Critical Scenarios for Automated Driving Systems: A Systematic Literature Review

    Authors: Xinhai Zhang, Jianbo Tao, Kaige Tan, Martin Törngren, José Manuel Gaspar Sánchez, Muhammad Rusyadi Ramli, Xin Tao, Magnus Gyllenhammar, Franz Wotawa, Naveen Mohan, Mihai Nica, Hermann Felbinger

    Abstract: Scenario-based approaches have been receiving a huge amount of attention in research and engineering of automated driving systems. Due to the complexity and uncertainty of the driving environment, and the complexity of the driving task itself, the number of possible driving scenarios that an ADS or ADAS may encounter is virtually infinite. Therefore it is essential to be able to reason about the i…

    Submitted 16 October, 2021; originally announced October 2021.

    Comments: 37 pages, 24 figures

  21. arXiv:2110.04289  [pdf, other]

    eess.AS cs.SD

    Location-based training for multi-channel talker-independent speaker separation

    Authors: Hassan Taherian, Ke Tan, DeLiang Wang

    Abstract: Permutation-invariant training (PIT) is a dominant approach for addressing the permutation ambiguity problem in talker-independent speaker separation. Leveraging spatial information afforded by microphone arrays, we propose a new training approach to resolving permutation ambiguities for multi-channel speaker separation. The proposed approach, named location-based training (LBT), assigns speakers…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: submitted to ICASSP 22

  22. arXiv:2109.15127  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Real-Time Multi-Level Neonatal Heart and Lung Sound Quality Assessment for Telehealth Applications

    Authors: Ethan Grooby, Chiranjibi Sitaula, Davood Fattahi, Reza Sameni, Kenneth Tan, Lindsay Zhou, Arrabella King, Ashwin Ramanathan, Atul Malhotra, Guy A. Dumont, Faezeh Marzbanrad

    Abstract: Digital stethoscopes in combination with telehealth allow chest sounds to be easily collected and transmitted for remote monitoring and diagnosis. Chest sounds contain important information about a newborn's cardio-respiratory health. However, low-quality recordings complicate the remote monitoring and diagnosis. In this study, a new method is proposed to objectively and automatically assess heart…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: 13 pages, 8 figures, 3 tables. Paper submitted and under review in IEEE Access

    Journal ref: IEEE Access, 2022

  23. Switched Systems as Hybrid Programs

    Authors: Yong Kiam Tan, André Platzer

    Abstract: Real world systems of interest often feature interactions between discrete and continuous dynamics. Various hybrid system formalisms have been used to model and analyze this combination of dynamics, ranging from mathematical descriptions, e.g., using impulsive differential equations and switching, to automata-theoretic and language-based approaches. This paper bridges two such formalisms by showin…

    Submitted 29 April, 2021; v1 submitted 15 January, 2021; originally announced January 2021.

    Comments: Long version of paper at ADHS 2021 (7th IFAC Conference on Analysis and Design of Hybrid Systems, July 7-9, 2021)

    MSC Class: 03B70; 34A38; 93C30 ACM Class: F.3.1; F.4.1; G.1.7; I.2.3

  24. SAGRNN: Self-Attentive Gated RNN for Binaural Speaker Separation with Interaural Cue Preservation

    Authors: Ke Tan, Buye Xu, Anurag Kumar, Eliya Nachmani, Yossi Adi

    Abstract: Most existing deep learning based binaural speaker separation systems focus on producing a monaural estimate for each of the target speakers, and thus do not preserve the interaural cues, which are crucial for human listeners to perform sound localization and lateralization. In this study, we address talker-independent binaural speaker separation with interaural cues preserved in the estimated bin…

    Submitted 14 November, 2020; v1 submitted 2 September, 2020; originally announced September 2020.

    Comments: 5 pages, accepted by IEEE Signal Processing Letters

  25. arXiv:2008.08791  [pdf, other]

    cs.HC cs.CV eess.IV

    Facial movement synergies and Action Unit detection from distal wearable Electromyography and Computer Vision

    Authors: Monica Perusquia-Hernandez, Felix Dollack, Chun Kwang Tan, Shushi Namba, Saho Ayabe-Kanamura, Kenji Suzuki

    Abstract: Distal facial Electromyography (EMG) can be used to detect smiles and frowns with reasonable accuracy. It capitalizes on volume conduction to detect relevant muscle activity, even when the electrodes are not placed directly on the source muscle. The main advantage of this method is to prevent occlusion and obstruction of the facial expression production, whilst allowing EMG measurements. However,…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: 11 pages, 11 figures, 2 tables

  26. arXiv:2004.12599  [pdf, other]

    cs.CV eess.IV

    Deploying Image Deblurring across Mobile Devices: A Perspective of Quality and Latency

    Authors: Cheng-Ming Chiang, Yu Tseng, Yu-Syuan Xu, Hsien-Kai Kuo, Yi-Min Tsai, Guan-Yu Chen, Koan-Sin Tan, Wei-Ting Wang, Yu-Chieh Lin, Shou-Yao Roy Tseng, Wei-Shiang Lin, Chia-Lin Yu, BY Shen, Kloze Kao, Chia-Ming Cheng, Hung-Jen Chen

    Abstract: Recently, image enhancement and restoration have become important applications on mobile devices, such as super-resolution and image deblurring. However, most state-of-the-art networks present extremely high computational complexity. This makes them difficult to be deployed on mobile devices with acceptable latency. Moreover, when deploying to different mobile devices, there is a large latency var…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: CVPR 2020 Workshop on New Trends in Image Restoration and Enhancement (NTIRE)

  27. arXiv:1911.06294  [pdf, other]

    eess.SY cs.LG eess.SP

    Deep Reinforcement Learning for Adaptive Traffic Signal Control

    Authors: Kai Liang Tan, Subhadipto Poddar, Anuj Sharma, Soumik Sarkar

    Abstract: Many existing traffic signal controllers are either simple adaptive controllers based on sensors placed around traffic intersections, or optimized by traffic engineers on a fixed schedule. Optimizing traffic controllers is time consuming and usually require experienced traffic engineers. Recent research has demonstrated the potential of using deep reinforcement learning (DRL) in this context. Howe…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: ASME 2019 Dynamic Systems and Control Conference (DSCC), October 9-11, Park City, Utah, USA

  28. Deep Multi-Magnification Networks for Multi-Class Breast Cancer Image Segmentation

    Authors: David Joon Ho, Dig V. K. Yarlagadda, Timothy M. D'Alfonso, Matthew G. Hanna, Anne Grabenstetter, Peter Ntiamoah, Edi Brogi, Lee K. Tan, Thomas J. Fuchs

    Abstract: Pathologic analysis of surgical excision specimens for breast carcinoma is important to evaluate the completeness of surgical excision and has implications for future treatment. This analysis is performed manually by pathologists reviewing histologic slides prepared from formalin-fixed tissue. In this paper, we present Deep Multi-Magnification Network trained by partial annotation for automated mu…

    Submitted 4 January, 2021; v1 submitted 28 October, 2019; originally announced October 2019.

    Comments: Accepted at Computerized Medical Imaging and Graphics

  29. arXiv:1909.07352  [pdf, other]

    eess.AS cs.SD eess.SP

    Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network

    Authors: Ke Tan, Yong Xu, Shi-Xiong Zhang, Meng Yu, Dong Yu

    Abstract: Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments. In this study, we address joint speech separation and dereverberation, which aims to separate target speech from background noise, interfering speech and room reverberation. In order to tackle this fundamentally difficult problem, we propose a novel multimodal network that e…

    Submitted 10 April, 2020; v1 submitted 16 September, 2019; originally announced September 2019.

    Comments: 13 pages, accepted by IEEE JSTSP Special Issue on Deep Learning for Multi-modal Intelligence across Speech, Language, Vision, and Heterogeneous Signals

  30. arXiv:1906.05372  [pdf, other]

    cs.CV eess.IV

    The Herbarium Challenge 2019 Dataset

    Authors: Kiat Chuan Tan, Yulong Liu, Barbara Ambrose, Melissa Tulig, Serge Belongie

    Abstract: Herbarium sheets are invaluable for botanical research, and considerable time and effort is spent by experts to label and identify specimens on them. In view of recent advances in computer vision and deep learning, developing an automated approach to help experts identify specimens could significantly accelerate research in this area. Whereas most existing botanical datasets comprise photos of spe…

    Submitted 15 June, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Part of the 6th Fine-Grained Visual Categorization Workshop (FGVC6) at CVPR 2019. Dataset available at https://github.com/visipedia/herbarium_comp

  31. arXiv:1903.05073  [pdf, other]

    cs.RO cs.LO eess.SY

    A Formal Safety Net for Waypoint Following in Ground Robots

    Authors: Brandon Bohrer, Yong Kiam Tan, Stefan Mitsch, Andrew Sogokon, André Platzer

    Abstract: We present a reusable formally verified safety net that provides end-to-end safety and liveness guarantees for 2D waypoint-following of Dubins-type ground robots with tolerances and acceleration. We: i) Model a robot in differential dynamic logic (dL), and specify assumptions on the controller and robot kinematics, ii) Prove formal safety and liveness properties for waypoint-following with speed l…

    Submitted 18 June, 2019; v1 submitted 12 March, 2019; originally announced March 2019.

    MSC Class: 03B70; 34A38; 68Q60; 93C85 ACM Class: I.2.9; D.2.4; F.3.1; C.3

  32. arXiv:1903.04567  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling

    Authors: Peidong Wang, Ke Tan, DeLiang Wang

    Abstract: Monaural speech enhancement has made dramatic advances since the introduction of deep learning a few years ago. Although enhanced speech has been demonstrated to have better intelligibility and quality for human listeners, feeding it directly to automatic speech recognition (ASR) systems trained with noisy speech has not produced expected improvements in ASR performance. The lack of an enhancement…

    Submitted 12 March, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

  33. arXiv:1811.09010  [pdf]

    cs.SD cs.CL eess.AS

    Deep Learning Based Phase Reconstruction for Speaker Separation: A Trigonometric Perspective

    Authors: Zhong-Qiu Wang, Ke Tan, DeLiang Wang

    Abstract: This study investigates phase reconstruction for deep learning based monaural talker-independent speaker separation in the short-time Fourier transform (STFT) domain. The key observation is that, for a mixture of two sources, with their magnitudes accurately estimated and under a geometric constraint, the absolute phase difference between each source and the mixture can be uniquely determined; in…

    Submitted 21 November, 2018; originally announced November 2018.

    Comments: 5 pages, in submission to ICASSP-2019

  34. arXiv:1805.00367  [pdf, other]

    eess.SP cs.LG

    A Multi-State Diagnosis and Prognosis Framework with Feature Learning for Tool Condition Monitoring

    Authors: Chong Zhang, Geok Soon Hong, Jun-Hong Zhou, Kay Chen Tan, Haizhou Li, Huan Xu, Jihoon Hong, Hian-Leng Chan

    Abstract: In this paper, a multi-state diagnosis and prognosis (MDP) framework is proposed for tool condition monitoring via a deep belief network based multi-state approach (DBNMS). For fault diagnosis, a cost-sensitive deep belief network (namely ECS-DBN) is applied to deal with the imbalanced data problem for tool state estimation. An appropriate prognostic degradation model is then applied for tool wear…

    Submitted 30 April, 2018; originally announced May 2018.

    Comments: 14 pages, 12 figures, 10 tables, submitted to IEEE Transactions on Cybernetics

  35. arXiv:1305.6402  [pdf, other]

    eess.SY

    From Parametric Model-based Optimization to robust PID Gain Scheduling

    Authors: Minh Hoang-Tuan Nguyen, Kok Kiong Tan

    Abstract: In chemical process applications, model predictive control effectively deals with input and state constraints during transient operations. However, industrial PID controllers directly manipulates the actuators, so they play the key role in small perturbation robustness. This paper considers the problem of augmenting the commonplace PID with the constraint handling and optimization functionalities…

    Submitted 28 May, 2013; originally announced May 2013.

    Comments: 7 pages, 5 figures, submitted to JPC

  36. Enhanced Predictive Ratio Control of Interacting Systems

    Authors: Minh Hoang-Tuan Nguyen, Kok Kiong Tan, Sunan Huang

    Abstract: Ratio control for two interacting processes is proposed with a PID feedforward design based on model predictive control (MPC) scheme. At each sampling instant, the MPC control action minimizes a state-dependent performance index associated with a PID-type state vector, thus yielding a PID-type control structure. Compared to the standard MPC formulations with separated single-variable control, such…

    Submitted 28 May, 2013; originally announced May 2013.

    Comments: 9 pages, 7 figures, 1 table

    Journal ref: Journal of Process Control, Volume 21, Issue 7, August 2011, Pages 1115 to 1125

  37. arXiv:1305.6379  [pdf, other]

    eess.SY

    Robust Precision Positioning Control on Linear Ultrasonic Motor

    Authors: Minh H-T Nguyen, Kok Kiong Tan, Wenyu Liang, Chek Sing Teo

    Abstract: Ultrasonic motors used in high-precision mechatronics are characterized by strong frictional effects, which are among the main problems in precision motion control. The traditional methods apply model-based nonlinear feedforward to compensate the friction, thus requiring closed-loop stability and safety constraint considerations. Implementation of these methods requires complex designed experiment…

    Submitted 28 May, 2013; originally announced May 2013.

    Comments: 6 pages, 8 figures, conference