Showing 1–28 of 28 results for author: Lai, J

Searching in archive eess.
  1. arXiv:2406.14976  [pdf, other]

    eess.IV cs.CV

    CoCPF: Coordinate-based Continuous Projection Field for Ill-Posed Inverse Problem in Imaging

    Authors: Zixuan Chen, Lingxiao Yang, Jian-Huang Lai, Xiaohua Xie

    Abstract: Sparse-view computed tomography (SVCT) reconstruction aims to acquire CT images based on sparsely-sampled measurements. It allows the subjects exposed to less ionizing radiation, reducing the lifetime risk of developing cancers. Recent researches employ implicit neural representation (INR) techniques to reconstruct CT images from a single SV sinogram. However, due to ill-posedness, these INR-based…

    Submitted 21 June, 2024; originally announced June 2024.

  2. arXiv:2405.07037  [pdf, other]

    eess.SY math.OC

    Robust Online Convex Optimization for Disturbance Rejection

    Authors: Joyce Lai, Peter Seiler

    Abstract: Online convex optimization (OCO) is a powerful tool for learning sequential data, making it ideal for high precision control applications where the disturbances are arbitrary and unknown in advance. However, the ability of OCO-based controllers to accurately learn the disturbance while maintaining closed-loop stability relies on having an accurate model of the plant. This paper studies the perform…

    Submitted 11 May, 2024; originally announced May 2024.

  3. arXiv:2401.01755  [pdf, other]

    cs.SD cs.AI eess.AS

    Incremental FastPitch: Chunk-based High Quality Text to Speech

    Authors: Muyang Du, Chuan Liu, Junjie Lai

    Abstract: Parallel text-to-speech models have been widely applied for real-time speech synthesis, and they offer more controllability and a much faster synthesis process compared with conventional auto-regressive models. Although parallel models have benefits in many aspects, they become naturally unfit for incremental synthesis due to their fully parallel architecture such as transformer. In this work, we…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 1 table

  4. arXiv:2312.17508  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    Attention-based Interactive Disentangling Network for Instance-level Emotional Voice Conversion

    Authors: Yun Chen, Lingxiao Yang, Qi Chen, Jian-Huang Lai, Xiaohua Xie

    Abstract: Emotional Voice Conversion aims to manipulate a speech according to a given emotion while preserving non-emotion components. Existing approaches cannot well express fine-grained emotional attributes. In this paper, we propose an Attention-based Interactive diseNtangling Network (AINN) that leverages instance-wise emotional knowledge for voice conversion. We introduce a two-stage pipeline to effect…

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted by INTERSPEECH 2023

  5. arXiv:2310.13259  [pdf]

    eess.IV cs.CV

    Domain-specific optimization and diverse evaluation of self-supervised models for histopathology

    Authors: Jeremy Lai, Faruk Ahmed, Supriya Vijay, Tiam Jaroensri, Jessica Loo, Saurabh Vyawahare, Saloni Agarwal, Fayaz Jamil, Yossi Matias, Greg S. Corrado, Dale R. Webster, Jonathan Krause, Yun Liu, Po-Hsuan Cameron Chen, Ellery Wulczyn, David F. Steiner

    Abstract: Task-specific deep learning models in histopathology offer promising opportunities for improving diagnosis, clinical research, and precision medicine. However, development of such models is often limited by availability of high-quality data. Foundation models in histopathology that learn general representations across a wide range of tissue types, diagnoses, and magnifications offer the potential…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 4 main tables, 3 main figures, additional supplemental tables and figures

  6. arXiv:2310.07654  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Audio-Visual Neural Syntax Acquisition

    Authors: Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

    Abstract: We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without eve…

    Submitted 11 October, 2023; originally announced October 2023.

  7. arXiv:2309.09843  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Instruction-Following Speech Recognition

    Authors: Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

    Abstract: Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions. With the advent of Large Language Models (LLMs) in speech processing, more organic, text-prompt-based interactions have become possible. However, the mechanisms behind these models' speech understanding and "reasoning" capabilities remai…

    Submitted 18 September, 2023; originally announced September 2023.

  8. arXiv:2307.05270  [pdf, other]

    eess.IV cs.CV

    APRF: Anti-Aliasing Projection Representation Field for Inverse Problem in Imaging

    Authors: Zixuan Chen, Lingxiao Yang, Jianhuang Lai, Xiaohua Xie

    Abstract: Sparse-view Computed Tomography (SVCT) reconstruction is an ill-posed inverse problem in imaging that aims to acquire high-quality CT images based on sparsely-sampled measurements. Recent works use Implicit Neural Representations (INRs) to build the coordinate-based mapping between sinograms and CT images. However, these methods have not considered the correlation between adjacent projection views…

    Submitted 11 July, 2023; originally announced July 2023.

  9. arXiv:2305.11686  [pdf, other]

    eess.IV cs.CV cs.RO

    Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards Robot-assisted Intubation

    Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

    Abstract: Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual envi…

    Submitted 27 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Extended abstract in IEEE ICRA 2023 Workshop (New Evolutions in Surgical Robotics: Embracing Multimodal Imaging Guidance, Intelligence, and Bio-inspired Mechanisms). arXiv admin note: text overlap with arXiv:2305.10883

  10. arXiv:2305.10883  [pdf, other]

    cs.AI cs.CV eess.IV

    Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

    Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

    Abstract: Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real d…

    Submitted 27 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: The manuscript is accepted by Medical & Biological Engineering & Computing. Code and dataset: https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Release

  11. arXiv:2303.16242  [pdf, other]

    eess.IV cs.CV

    CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution

    Authors: Zixuan Chen, Jian-Huang Lai, Lingxiao Yang, Xiaohua Xie

    Abstract: Medical image arbitrary-scale super-resolution (MIASSR) has recently gained widespread attention, aiming to super sample medical volumes at arbitrary scales via a single model. However, existing MIASSR methods face two major limitations: (i) reliance on high-resolution (HR) volumes and (ii) limited generalization ability, which restricts their application in various scenarios. To overcome these li…

    Submitted 16 April, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: This paper is accepted by the International Conference on Computer Vision (ICCV) 2023

  12. arXiv:2303.14133  [pdf, other]

    eess.IV cs.CR cs.CV

    Adversarial Attack and Defense for Medical Image Analysis: Methods and Applications

    Authors: Junhao Dong, Junxi Chen, Xiaohua Xie, Jianhuang Lai, Hao Chen

    Abstract: Deep learning techniques have achieved superior performance in computer-aided medical image analysis, yet they are still vulnerable to imperceptible adversarial attacks, resulting in potential misdiagnosis in clinical practice. Oppositely, recent years have also witnessed remarkable progress in defense against these tailored adversarial examples in deep medical diagnosis systems. In this expositio…

    Submitted 24 March, 2023; originally announced March 2023.

  13. arXiv:2211.13939  [pdf, other]

    cs.SD cs.LG eess.AS

    Efficient Incremental Text-to-Speech on GPUs

    Authors: Muyang Du, Chuan Liu, Jiaxing Qi, Junjie Lai

    Abstract: Incremental text-to-speech, also known as streaming TTS, has been increasingly applied to online speech applications that require ultra-low response latency to provide an optimal user experience. However, most of the existing speech synthesis pipelines deployed on GPU are still non-incremental, which uncovers limitations in high-concurrency scenarios, especially when the pipeline is built with end…

    Submitted 5 December, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: 5 pages, 4 figures

  14. arXiv:2211.04717  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Noisy Student Training on Non-target Domain Data for Automatic Speech Recognition

    Authors: Yu Chen, Wen Ding, Junjie Lai

    Abstract: Noisy Student Training (NST) has recently demonstrated extremely strong performance in Automatic Speech Recognition (ASR). In this paper, we propose a data selection strategy named LM Filter to improve the performance of NST on non-target domain data in ASR tasks. Hypotheses with and without a Language Model are generated and the CER differences between them are utilized as a filter threshold. Resu…

    Submitted 1 March, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: This paper is accepted by the ICASSP 2023 conference

  15. arXiv:2207.00001  [pdf]

    cs.CV eess.IV

    MultiEarth 2022 -- The Champion Solution for Image-to-Image Translation Challenge via Generation Models

    Authors: Yuchuan Gou, Bo Peng, Hongchen Liu, Hang Zhou, Jui-Hsin Lai

    Abstract: The MultiEarth 2022 Image-to-Image Translation challenge provides a well-constrained test bed for generating the corresponding RGB Sentinel-2 imagery with the given Sentinel-1 VV & VH imagery. In this challenge, we designed various generation models and found the SPADE [1] and pix2pixHD [2] models could perform our best results. In our self-evaluation, the SPADE-2 model with L1-loss can achieve 0.…

    Submitted 17 June, 2022; originally announced July 2022.

    Comments: CVPR 2022, MultiEarth 2022, Image-to-Image translation, competition

  16. arXiv:2204.02524  [pdf, other]

    cs.SD cs.CL eess.AS

    Simple and Effective Unsupervised Speech Synthesis

    Authors: Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass

    Abstract: We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe. The framework leverages recent work in unsupervised speech recognition as well as existing neural-based speech synthesis. Using only unlabeled speech audio and unlabeled text as well as a lexicon, our method enables speech synthesis without the need for a human-labeled corpus. Experiments demonstra…

    Submitted 20 April, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: preprint, equal contribution from first two authors

  17. arXiv:2203.06849  [pdf, other]

    cs.CL cs.SD eess.AS

    SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

    Authors: Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards in…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: ACL 2022 main conference

  18. arXiv:2110.09784  [pdf, other]

    cs.SD cs.AI eess.AS

    SSAST: Self-Supervised Audio Spectrogram Transformer

    Authors: Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

    Abstract: Recently, neural networks based purely on self-attention, such as the Vision Transformer (ViT), have been shown to outperform deep learning models constructed with convolutional neural networks (CNNs) on various vision tasks, thus extending the success of Transformers, which were originally developed for language processing, to the vision domain. A recent study showed that a similar methodology ca…

    Submitted 10 February, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

    Comments: Accepted at AAAI2022. Code at https://github.com/YuanGongND/ssast

  19. arXiv:2110.01147  [pdf, other]

    cs.SD cs.CL eess.AS

    On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

    Authors: Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass

    Abstract: Are end-to-end text-to-speech (TTS) models over-parametrized? To what extent can these models be pruned, and what happens to their synthesis capabilities? This work serves as a starting point to explore pruning both spectrogram prediction networks and vocoders. We thoroughly investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech. Additionally, we explored several…

    Submitted 27 October, 2021; v1 submitted 3 October, 2021; originally announced October 2021.

  20. arXiv:2107.07873  [pdf]

    eess.SP physics.optics

    Metasurface-Enabled On-Chip Multiplexed Diffractive Neural Networks in the Visible

    Authors: Xuhao Luo, Yueqiang Hu, Xin Li, Xiangnian Ou, Jiajie Lai, Na Liu, Huigao Duan

    Abstract: Replacing electrons with photons is a compelling route towards light-speed, highly parallel, and low-power artificial intelligence computing. Recently, all-optical diffractive neural deep neural networks have been demonstrated. However, the existing architectures often comprise bulky components and, most critically, they cannot mimic the human brain for multitasking. Here, we demonstrate a multi-s…

    Submitted 13 July, 2021; originally announced July 2021.

  21. arXiv:2106.05933  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

    Authors: Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass

    Abstract: Self-supervised speech representation learning (speech SSL) has demonstrated the benefit of scale in learning rich representations for Automatic Speech Recognition (ASR) with limited paired data, such as wav2vec 2.0. We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results. However, directly applying widely adopted prunin…

    Submitted 26 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  22. arXiv:2105.01051  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    SUPERB: Speech processing Universal PERformance Benchmark

    Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge…

    Submitted 15 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear in Interspeech 2021

  23. arXiv:2104.07200  [pdf, other]

    eess.SY

    A Novel Unified Framework for Solving Reachability, Viability and Invariance Problems

    Authors: Wei Liao, Taotao Liang, Xiaohui Wei, Jizhou Lai

    Abstract: The level set method is a widely used tool for solving reachability and invariance problems. However, some shortcomings, such as the difficulties of handling dissipation function and constructing terminal conditions for solving the Hamilton-Jacobi partial differential equation, limit the application of the level set method in some problems with non-affine nonlinear systems and irregular target set…

    Submitted 29 November, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:2101.09646

  24. arXiv:2103.05576  [pdf, other]

    eess.SY cs.MA

    Distributed Frequency Restoration and SoC Balancing Control for AC Microgrids

    Authors: Chang Yu, Xiaoqing Lu, Jingang Lai, Li Chai

    Abstract: This paper develops an improved distributed finite-time control algorithm for multiagent-based ac microgrids with battery energy storage systems (BESSs) utilizing a low-width communication network. The proposed control algorithm can simultaneously coordinate BESSs to eliminate any deviation from the nominal frequency as well as solving the state of charge (SoC) balancing problem. The stability of…

    Submitted 9 March, 2021; originally announced March 2021.

  25. arXiv:2012.09131  [pdf, other]

    cs.HC cs.CV cs.CY cs.LG eess.SY

    Personal Mental Health Navigator: Harnessing the Power of Data, Personal Models, and Health Cybernetics to Promote Psychological Well-being

    Authors: Amir M. Rahmani, Jocelyn Lai, Salar Jafarlou, Asal Yunusova, Alex. P. Rivera, Sina Labbaf, Sirui Hu, Arman Anzanpour, Nikil Dutt, Ramesh Jain, Jessica L. Borelli

    Abstract: Traditionally, the regime of mental healthcare has followed an episodic psychotherapy model wherein patients seek care from a provider through a prescribed treatment plan developed over multiple provider visits. Recent advances in wearable and mobile technology have generated increased interest in digital mental healthcare that enables individuals to address episodic mental health symptoms. Howeve…

    Submitted 15 December, 2020; originally announced December 2020.

  26. arXiv:2011.06209  [pdf, other]

    eess.SY

    Recursive Regret Matching: A General Method for Solving Time-invariant Nonlinear Zero-sum Differential Games

    Authors: Wei Liao, Xiaohui Wei, Jizhou Lai

    Abstract: In this paper, a new method is proposed to compute the rolling Nash equilibrium of the time-invariant nonlinear two-person zero-sum differential games. The idea is to discretize the time to transform a differential game into a sequential game with several steps, and by introducing state-value function, transform the sequential game into a recursion consisting of several normal-form games, finally,…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: 18 pages, 9 figures

    MSC Class: 91-08; 93-08

  27. arXiv:2010.06236  [pdf, other]

    eess.SY cs.LG math.OC

    Average Cost Optimal Control of Stochastic Systems Using Reinforcement Learning

    Authors: Jing Lai, Junlin Xiong

    Abstract: This paper addresses the average cost minimization problem for discrete-time systems with multiplicative and additive noises via reinforcement learning. By using Q-function, we propose an online learning scheme to estimate the kernel matrix of Q-function and to update the control gain using the data along the system trajectories. The obtained control gain and kernel matrix are proved to converge t…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: 6 pages, 2 figures

  28. arXiv:2008.08734  [pdf, ps, other]

    eess.SY cs.LG math.OC stat.ML

    Model-free optimal control of discrete-time systems with additive and multiplicative noises

    Authors: Jing Lai, Junlin Xiong, Zhan Shu

    Abstract: This paper investigates the optimal control problem for a class of discrete-time stochastic systems subject to additive and multiplicative noises. A stochastic Lyapunov equation and a stochastic algebra Riccati equation are established for the existence of the optimal admissible control policy. A model-free reinforcement learning algorithm is proposed to learn the optimal admissible control policy…

    Submitted 19 August, 2020; originally announced August 2020.

    Comments: 8 pages, 3 figures