Showing 1–33 of 33 results for author: Lu, K

  1. arXiv:2406.18871  [pdf, other]

    eess.AS cs.CL

    DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

    Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee

    Abstract: Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural language descriptions, thereby fa…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  2. arXiv:2403.08164  [pdf, other]

    cs.SD cs.LG eess.AS

    EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

    Authors: Ziqi Liang, Haoxiang Shi, Jiawei Wang, Keda Lu

    Abstract: Recently, deep learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks have become a standard modeling technique for sequential data in TTS systems and are widely used. However, training a TTS model which includes RNN components requires powerful GPU performance and takes a long time. In contrast, CNN-based sequence synthesis techn…

    Submitted 17 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted by the 27th IEEE International Conference on Computer Supported Cooperative Work in Design (IEEE CSCWD 2024). arXiv admin note: substantial text overlap with arXiv:2211.01948

  3. arXiv:2401.17837  [pdf, ps, other]

    eess.SY

    Safe Reinforcement Learning-Based Eco-Driving Control for Mixed Traffic Flows With Disturbances

    Authors: Ke Lu, Dongjun Li, Qun Wang, Kaidi Yang, Lin Zhao, Ziyou Song

    Abstract: This paper presents a safe learning-based eco-driving framework tailored for mixed traffic flows, which aims to optimize energy efficiency while guaranteeing safety during real-system operations. Even though reinforcement learning (RL) is capable of optimizing energy efficiency in intricate environments, it is challenged by safety requirements during the training process. The lack of safety guaran…

    Submitted 31 January, 2024; originally announced January 2024.

  4. arXiv:2401.00273  [pdf, ps, other]

    eess.AS cs.CL

    Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision

    Authors: Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee

    Abstract: This work evaluated several cutting-edge large-scale foundation models based on self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2, and Whisper-large-v3, on three code-switched corpora. We found that self-supervised models can achieve performances close to the supervised model, indicating the effectiveness of multilingual self-supervised pre-training. We also observed that…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Submitted to ICASSP 2024 Self-supervision in Audio, Speech and Beyond workshop

  5. arXiv:2312.08610  [pdf, other]

    eess.AS cs.SD

    A computationally efficient semi-blind source separation based approach for nonlinear echo cancellation based on an element-wise iterative source steering

    Authors: Kunxing Lu, Xianrui Wang, Tetsuya Ueda, Shoji Makino, Jingdong Chen

    Abstract: While the semi-blind source separation-based acoustic echo cancellation (SBSS-AEC) has received much research attention due to its promising performance during double-talk compared to the traditional adaptive algorithms, it suffers from system latency and nonlinear distortions. To circumvent these drawbacks, the recently developed ideas on convolutive transfer function (CTF) approximation and nonl…

    Submitted 13 December, 2023; originally announced December 2023.

  6. arXiv:2311.05282  [pdf, other]

    physics.optics eess.SP

    Empowering high-dimensional optical fiber communications with integrated photonic processors

    Authors: Kaihang Lu, Zengqi Chen, Hao Chen, Wu Zhou, Zunyue Zhang, Hon Ki Tsang, Yeyu Tong

    Abstract: Mode division multiplexing (MDM) in optical fibers enables multichannel capabilities for various applications, including data transmission, quantum networks, imaging, and sensing. However, MDM optical fiber systems usually necessitate bulk-optics approaches for launching different orthogonal fiber modes into the multimode optical fiber, and multiple-input multiple-output digital electronic signal…

    Submitted 9 November, 2023; originally announced November 2023.

  7. arXiv:2309.09838  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    HypR: A comprehensive study for ASR hypothesis revising with a reference corpus

    Authors: Yi-Wei Wang, Ke-Han Lu, Kuan-Yu Chen

    Abstract: With the development of deep learning, automatic speech recognition (ASR) has made significant progress. To further enhance the performance of ASR, revising recognition results is a lightweight but efficient approach. Various methods can be roughly classified into N-best reranking modeling and error correction modeling. The former aims to select the hypothesis with the lowest error rate fr…

    Submitted 13 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to Interspeech 2024

  8. arXiv:2309.09510  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

    Authors: Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee

    Abstract: Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for bui…

    Submitted 22 March, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: To appear in the proceedings of ICASSP 2024

  9. arXiv:2308.05756  [pdf, other]

    eess.SP cs.LG

    WeldMon: A Cost-effective Ultrasonic Welding Machine Condition Monitoring System

    Authors: Beitong Tian, Kuan-Chieh Lu, Ahmadreza Eslaminia, Yaohui Wang, Chenhui Shao, Klara Nahrstedt

    Abstract: Ultrasonic welding machines play a critical role in the lithium battery industry, facilitating the bonding of batteries with conductors. Ensuring high-quality welding is vital, making tool condition monitoring systems essential for early-stage quality control. However, existing monitoring methods face challenges in cost, downtime, and adaptability. In this paper, we present WeldMon, an affordable…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 9 pages, 5 figures

  10. arXiv:2305.04021  [pdf, other]

    cs.CV eess.SY

    A Sea-Land Clutter Classification Framework for Over-the-Horizon-Radar Based on Weighted Loss Semi-supervised GAN

    Authors: Xiaoxuan Zhang, Zengfu Wang, Kun Lu, Quan Pan, Yang Li

    Abstract: Deep convolutional neural networks have made great achievements in sea-land clutter classification for over-the-horizon-radar (OTHR). The premise is that a large number of labeled training samples must be provided for a sea-land clutter classifier. In practical engineering applications, it is relatively easy to obtain label-free sea-land clutter samples. However, the labeling process is extremely cu…

    Submitted 6 May, 2023; originally announced May 2023.

    Comments: 9 pages

  11. arXiv:2304.04760  [pdf, other]

    cs.CV eess.IV

    SAR2EO: A High-resolution Image Translation Framework with Denoising Enhancement

    Authors: Jun Yu, Shenshen Du, Guochen Xie, Renjie Lu, Pengwei Li, Zhongpeng Cai, Keda Lu

    Abstract: Synthetic Aperture Radar (SAR) to electro-optical (EO) image translation is a fundamental task in remote sensing that can enrich the dataset by fusing information from different sources. Recently, many methods have been proposed to tackle this task, but they still struggle to complete the conversion from low-resolution images to high-resolution images. Thus, we propose a framework, SAR2EO, ai…

    Submitted 25 August, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

  12. Data Augmentation and Classification of Sea-Land Clutter for Over-the-Horizon Radar Using AC-VAEGAN

    Authors: Xiaoxuan Zhang, Zengfu Wang, Kun Lu, Quan Pan

    Abstract: In the sea-land clutter classification of sky-wave over-the-horizon-radar (OTHR), the imbalanced and scarce data leads to a poor performance of the deep learning-based classification model. To solve this problem, this paper proposes an improved auxiliary classifier generative adversarial network (AC-GAN) architecture, namely auxiliary classifier variational autoencoder generative adversarial netwo…

    Submitted 3 January, 2023; originally announced January 2023.

    Comments: 13 pages, 16 figures

  13. arXiv:2212.08487  [pdf, other]

    cs.HC cs.AI cs.NI eess.SP

    Semantics-Empowered Communication: A Tutorial-cum-Survey

    Authors: Zhilin Lu, Rongpeng Li, Kun Lu, Xianfu Chen, Ekram Hossain, Zhifeng Zhao, Honggang Zhang

    Abstract: Along with the springing up of the semantics-empowered communication (SemCom) research, it is now witnessing an unprecedentedly growing interest towards a wide range of aspects (e.g., theories, applications, metrics and implementations) in both academia and industry. In this work, we primarily aim to provide a comprehensive survey on both the background and research taxonomy, as well as a detailed…

    Submitted 11 November, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: This paper has been accepted for publication in the IEEE Communications Surveys and Tutorials

  14. arXiv:2211.05256  [pdf, other]

    eess.IV cs.CV

    Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

    Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

    Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this prob…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

  15. arXiv:2210.06244  [pdf, other]

    cs.CL cs.SD eess.AS

    A context-aware knowledge transferring strategy for CTC-based ASR

    Authors: Ke-Han Lu, Kuan-Yu Chen

    Abstract: Non-autoregressive automatic speech recognition (ASR) modeling has received increasing attention recently because of its fast decoding speed and superior performance. Among representatives, methods based on the connectionist temporal classification (CTC) are still a dominating stream. However, the theoretically inherent flaw, the assumption of independence between tokens, creates a performance bar…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted by SLT 2022

  16. arXiv:2206.13511  [pdf]

    eess.SY physics.app-ph

    Design and control analysis of a deployable clustered hyperbolic paraboloid cable net

    Authors: Shuo Ma, Kai Lu, Muhao Chen, Robert E. Skelton

    Abstract: This paper presents an analytical and experimental design and deployment control analysis of a hyperbolic paraboloid cable net based on clustering actuation strategies. First, the dynamics and statics for clustered tensegrity structures (CTS) are given. Then, we propose the topology design of the deployable hyperbolic paraboloid cable net. The deployability of the cable net is achieved by using cl…

    Submitted 26 June, 2022; originally announced June 2022.

    Comments: 20 pages, 24 figures

  17. Rethinking Modern Communication from Semantic Coding to Semantic Communication

    Authors: Kun Lu, Qingyang Zhou, Rongpeng Li, Zhifeng Zhao, Xianfu Chen, Jianjun Wu, Honggang Zhang

    Abstract: Modern communications are usually designed to pursue a higher bit-level precision and fewer bits while transmitting a message. This article rethinks these two major features and introduces the concept and advantage of semantics that characterizes a new kind of semantics-aware communication framework, incorporating both the semantic encoding and the semantic communication problem. After analyzing t…

    Submitted 8 June, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: Accepted by IEEE Wireless Communications

  18. arXiv:2108.01998  [pdf, other]

    eess.SP cs.AI cs.CV cs.LG

    Adversarial Energy Disaggregation for Non-intrusive Load Monitoring

    Authors: Zhekai Du, Jingjing Li, Lei Zhu, Ke Lu, Heng Tao Shen

    Abstract: Energy disaggregation, also known as non-intrusive load monitoring (NILM), challenges the problem of separating the whole-home electricity usage into appliance-specific individual consumptions, which is a typical application of data analysis. NILM aims to help households understand how the energy is used and consequently tell them how to effectively manage the energy, thus allowing energy efficie…

    Submitted 1 August, 2021; originally announced August 2021.

    Comments: Accepted to ACM/IMS Trans. on Data Science, codes can be found at https://github.com/li**118/AED

  19. arXiv:2104.07539  [pdf, other]

    cs.LG eess.SY

    Multi-Agent Reinforcement Learning Based Coded Computation for Mobile Ad Hoc Computing

    Authors: Baoqian Wang, Junfei Xie, Kejie Lu, Yan Wan, Shengli Fu

    Abstract: Mobile ad hoc computing (MAHC), which allows mobile devices to directly share their computing resources, is a promising solution to address the growing demands for computing resources required by mobile devices. However, offloading a computation task from a mobile device to other mobile devices is a challenging task due to frequent topology changes and link failures because of node mobility, unsta…

    Submitted 15 April, 2021; originally announced April 2021.

  20. DCT and DST Filtering with Sparse Graph Operators

    Authors: Keng-Shih Lu, Antonio Ortega, Debargha Mukherjee, Yue Chen

    Abstract: Graph filtering is a fundamental tool in graph signal processing. Polynomial graph filters (PGFs), defined as polynomials of a fundamental graph operator, can be implemented in the vertex domain, and usually have a lower complexity than frequency domain filter implementations. In this paper, we focus on the design of filters for graphs with graph Fourier transform (GFT) corresponding to a discrete…

    Submitted 21 March, 2021; originally announced March 2021.

    Comments: 16 pages, 11 figures, 5 tables

  21. arXiv:2101.04859  [pdf]

    cs.LG eess.SP

    A*HAR: A New Benchmark towards Semi-supervised learning for Class-imbalanced Human Activity Recognition

    Authors: Govind Narasimman, Kangkang Lu, Arun Raja, Chuan Sheng Foo, Mohamed Sabry Aly, Jie Lin, Vijay Chandrasekhar

    Abstract: Despite the vast literature on Human Activity Recognition (HAR) with wearable inertial sensor data, it is perhaps surprising that there are few studies investigating semisupervised learning for HAR, particularly in a challenging scenario with class imbalance problem. In this work, we present a new benchmark, called A*HAR, towards semisupervised learning for class-imbalanced HAR. We evaluate state-…

    Submitted 12 January, 2021; originally announced January 2021.

    Comments: 5 pages, 3 figures

  22. arXiv:2012.14704  [pdf]

    cs.CV eess.IV

    Advances in deep learning methods for pavement surface crack detection and identification with visible light visual images

    Authors: Kailiang Lu

    Abstract: Compared to NDT and health monitoring methods for cracks in engineering structures, surface crack detection or identification based on visible light images is non-contact, with the advantages of fast speed, low cost and high precision. Firstly, typical pavement (concrete also) crack public data sets were collected, and the characteristics of sample images as well as the random variable factors, inc…

    Submitted 2 December, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: 15 pages, 14 figures, 11 tables

    ACM Class: I.5.4

    Journal ref: Computer Engineering and Science 2022

  23. arXiv:2007.15778  [pdf, other]

    cs.CV cs.LG eess.IV

    Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies

    Authors: Leo K. Tam, Xiaosong Wang, Evrim Turkbey, Kevin Lu, Yuhong Wen, Daguang Xu

    Abstract: Detecting clinically relevant objects in medical images is a challenge despite large datasets due to the lack of detailed labels. To address the label issue, we utilize the scene-level labels with a detection architecture that incorporates natural language information. We present a challenging new set of radiologist paired bounding box and natural language annotations on the publicly available MIM…

    Submitted 30 July, 2020; originally announced July 2020.

    Comments: Accepted at Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2020

  24. arXiv:2005.02079  [pdf, other]

    eess.SY eess.SP

    OTHR multitarget tracking with a GMRF model of ionospheric parameters

    Authors: Zhen Guo, Zengfu Wang, Hua Lan, Quan Pan, Kun Lu

    Abstract: The ionosphere is the propagation medium for radio waves transmitted by an over-the-horizon radar (OTHR). Ionospheric parameters, typically, virtual ionospheric heights (VIHs), are required to perform coordinate registration for OTHR multitarget tracking and localization. The inaccuracy of ionospheric parameters has a significant deleterious effect on the target localization of OTHR. Therefore, to…

    Submitted 5 May, 2020; originally announced May 2020.

    Comments: 16 pages

  25. arXiv:2003.09984  [pdf, other]

    eess.SP eess.SY

    Measurement-Level Fusion for OTHR Network Using Message Passing

    Authors: Hua Lan, Zengfu Wang, Xianglong Bai, Quan Pan, Kun Lu

    Abstract: Tracking an unknown number of targets based on multipath measurements provided by an over-the-horizon radar (OTHR) network with a statistical ionospheric model is complicated, which requires solving four subproblems: target detection, target tracking, multipath data association and ionospheric height identification. A joint solution is desired since the four subproblems are highly correlated, but…

    Submitted 3 April, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

    Comments: 40 pages, 23 figures

  26. arXiv:2002.08558  [pdf, other]

    eess.IV

    Perceptually inspired weighted MSE optimization using irregularity-aware graph Fourier transform

    Authors: Keng-Shih Lu, Antonio Ortega, Debargha Mukherjee, Yue Chen

    Abstract: In image and video coding applications, distortion has been traditionally measured using mean square error (MSE), which suggests the use of orthogonal transforms, such as the discrete cosine transform (DCT). Perceptual metrics such as Structural Similarity (SSIM) are typically used after encoding, but not tied to the encoding process. In this paper, we consider an alternative framework where the g…

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 5 pages, 6 figures, submitted to International Conference of Image Processing (ICIP) 2020

  27. Fast Graph Fourier Transforms Based on Graph Symmetry and Bipartition

    Authors: Keng-Shih Lu, Antonio Ortega

    Abstract: The graph Fourier transform (GFT) is an important tool for graph signal processing, with applications ranging from graph-based image processing to spectral clustering. However, unlike the discrete Fourier transform, the GFT typically does not have a fast algorithm. In this work, we develop new approaches to accelerate the GFT computation. In particular, we show that Haar units (Givens rotations wi…

    Submitted 18 July, 2019; originally announced July 2019.

    Comments: 14 pages, 15 figures

  28. arXiv:1904.01509  [pdf, other]

    cs.LG cs.CV cs.GR eess.IV stat.ML

    FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

    Authors: Yanfu Yan, Ke Lu, Jian Xue, Pengcheng Gao, Jiayi Lyu

    Abstract: Facial expression analysis based on machine learning requires a large number of well-annotated data to reflect different changes in facial motion. Publicly available datasets truly help to accelerate research in this area by providing a benchmark resource, but all of these datasets, to the best of our knowledge, are limited to rough annotations for action units, including only their absence, presenc…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: 9 pages, 7 figures

    Journal ref: 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

  29. arXiv:1811.03853  [pdf, other]

    cs.LG cs.AI eess.SY

    Sample-Efficient Policy Learning based on Completely Behavior Cloning

    Authors: Qiming Zou, Ling Wang, Ke Lu, Yu Li

    Abstract: Direct policy search is one of the most important algorithms of reinforcement learning. However, learning from scratch needs a large amount of experience data and can be easily prone to poor local optima. In addition to that, a partially trained policy tends to perform actions dangerous to the agent and environment. In order to overcome these challenges, this paper proposed a policy initialization algor…

    Submitted 9 November, 2018; originally announced November 2018.

  30. arXiv:1811.03846  [pdf, other]

    eess.SY

    Computation Load Balancing Real-Time Model Predictive Control in Urban Traffic Networks

    Authors: Qiming Zou, Ke Lu, Yu Li

    Abstract: Owing to the rapidly growing number of vehicles, urban traffic congestion has become more and more severe in the last decades. As an effective approach, Model Predictive Control (MPC) has been applied to urban traffic signal control system. However, the potentially high online computation burden may limit its further application for real scenarios. In this paper, a new approach based on online active…

    Submitted 9 November, 2018; originally announced November 2018.

  31. arXiv:1711.00213  [pdf, other]

    eess.SP

    Closed Form Solutions of Combinatorial Graph Laplacian Estimation under Acyclic Topology Constraints

    Authors: Keng-Shih Lu, Antonio Ortega

    Abstract: How to obtain a graph from data samples is an important problem in graph signal processing. One way to formulate this graph learning problem is based on Gaussian maximum likelihood estimation, possibly under particular topology constraints. To solve this problem, we typically require iterative convex optimization solvers. In this paper, we show that when the target graph topology does not contain…

    Submitted 1 November, 2017; originally announced November 2017.

  32. arXiv:1612.04913  [pdf, ps, other]

    eess.SY

    Distributed Algorithms for Solving a Class of Convex Feasibility Problems

    Authors: Kaihong Lu, Gangshan Jing, Long Wang

    Abstract: In this paper, a class of convex feasibility problems (CFPs) is studied for multi-agent systems through local interactions. The objective is to search for a feasible solution to the convex inequalities with some set constraints in a distributed manner. The distributed control algorithms, involving subgradient and projection, are proposed for both continuous- and discrete-time systems, respectively. C…

    Submitted 14 December, 2016; originally announced December 2016.

    Comments: 29 pages

  33. arXiv:1609.03161  [pdf, ps, other]

    eess.SY

    Distributed algorithms for solving convex inequalities

    Authors: Kaihong Lu, Gangshan Jing, Long Wang

    Abstract: In this paper, a distributed subgradient-based algorithm is proposed for continuous-time multi-agent systems to search for a feasible solution to convex inequalities. The algorithm involves each agent achieving a state constrained by its own inequalities while exchanging local information with other agents under a time-varying directed communication graph. With the validity of a mild connectivity cond…

    Submitted 12 June, 2017; v1 submitted 11 September, 2016; originally announced September 2016.