DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee

Abstract: Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural language descriptions, thereby facilitating the capability to understand both linguistic and non-linguistic features in speech. Enhanced with the proposed approach, our model demonstrates superior performance on the Dynamic-SUPERB benchmark, particularly in generalizing to unseen tasks. Moreover, we discover that the aligned model exhibits a zero-shot instruction-following capability without explicit speech instruction tuning. These findings highlight the potential to reshape instruction-following SLMs by incorporating rich, descriptive speech captions.

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted to Interspeech 2024

arXiv:2403.08164 [pdf, other]

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

Authors: Ziqi Liang, Haoxiang Shi, Jiawei Wang, Keda Lu

Abstract: Recently, deep learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks have become a standard modeling technique for sequential data in TTS systems and are widely used. However, training a TTS model which includes RNN components requires powerful GPU performance and takes a long time. In contrast, CNN-based sequence synthesis techniques can significantly reduce the parameters and training time of a TTS model while guaranteeing a certain performance due to their high parallelism, which alleviates these economic costs of training. In this paper, we propose a lightweight TTS system based on deep convolutional neural networks, which is a two-stage training end-to-end TTS model and does not employ any recurrent units. Our model consists of two stages: Text2Spectrum and SSRN. The former is used to encode phonemes into a coarse mel spectrogram and the latter is used to synthesize the complete spectrum from the coarse mel spectrogram. Meanwhile, we improve the robustness of our model by a series of data augmentations, such as noise suppression, time warping, frequency masking and time masking, for solving the low-resource Mongolian problem. Experiments show that our model can reduce the training time and parameters while ensuring the quality and naturalness of the synthesized speech compared to using mainstream TTS models. Our method uses NCMMSC2022-MTTSC Challenge dataset for validation, which significantly reduces training time while maintaining a certain accuracy.

Submitted 17 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted by the 27th IEEE International Conference on Computer Supported Cooperative Work in Design (IEEE CSCWD 2024). arXiv admin note: substantial text overlap with arXiv:2211.01948

arXiv:2401.17837 [pdf, ps, other]

Safe Reinforcement Learning-Based Eco-Driving Control for Mixed Traffic Flows With Disturbances

Authors: Ke Lu, Dongjun Li, Qun Wang, Kaidi Yang, Lin Zhao, Ziyou Song

Abstract: This paper presents a safe learning-based eco-driving framework tailored for mixed traffic flows, which aims to optimize energy efficiency while guaranteeing safety during real-system operations. Even though reinforcement learning (RL) is capable of optimizing energy efficiency in intricate environments, it is challenged by safety requirements during the training process. The lack of safety guarantees is the other concern when deploying a trained policy in real-world applications. Compared with RL, model predictive control (MPC) can handle constrained dynamics systems, ensuring safe driving. However, the major challenges lie in complicated eco-driving tasks and the presence of disturbances, which respectively challenge the MPC design and the satisfaction of constraints. To address these limitations, the proposed framework incorporates the tube-based enhanced MPC (RMPC) to ensure the safe execution of the RL policy under disturbances, thereby improving the control robustness. RL not only optimizes the energy efficiency of the connected and automated vehicle in mixed traffic but also handles more uncertain scenarios, in which the energy consumption of the human-driven vehicle and its diverse and stochastic driving behaviors are considered in the optimization framework. Simulation results demonstrate that the proposed algorithm, compared with RMPC technique, shows an average improvement of 10.88% in holistic energy efficiency, while compared with RL algorithm, it effectively prevents inter-vehicle collisions.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.00273 [pdf, ps, other]

Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision

Authors: Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee

Abstract: This work evaluated several cutting-edge large-scale foundation models based on self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2, and Whisper-large-v3, on three code-switched corpora. We found that self-supervised models can achieve performances close to the supervised model, indicating the effectiveness of multilingual self-supervised pre-training. We also observed that these models still have room for improvement as they kept making similar mistakes and had unsatisfactory performances on modeling intra-sentential code-switching. In addition, the validity of several variants of Whisper was explored, and we concluded that they remained effective in a code-switching scenario, and similar techniques for self-supervised models are worth studying to boost the performance of code-switched tasks.

Submitted 30 December, 2023; originally announced January 2024.

Comments: Submitted to ICASSP 2024 Self-supervision in Audio, Speech and Beyond workshop

arXiv:2312.08610 [pdf, other]

A computationally efficient semi-blind source separation based approach for nonlinear echo cancellation based on an element-wise iterative source steering

Authors: Kunxing Lu, Xianrui Wang, Tetsuya Ueda, Shoji Makino, Jingdong Chen

Abstract: While the semi-blind source separation-based acoustic echo cancellation (SBSS-AEC) has received much research attention due to its promising performance during double-talk compared to the traditional adaptive algorithms, it suffers from system latency and nonlinear distortions. To circumvent these drawbacks, the recently developed ideas on convolutive transfer function (CTF) approximation and nonlinear expansion have been used in the iterative projection (IP)-based semi-blind source separation (SBSS) algorithm. However, because of the introduction of CTF approximation and nonlinear expansion, this algorithm becomes computationally very expensive, which makes it difficult to implement in embedded systems. Thus, we attempt in this paper to improve this IP-based algorithm, thereby developing an element-wise iterative source steering (EISS) algorithm. In comparison with the IP-based SBSS algorithm, the proposed algorithm is computationally much more efficient, especially when the nonlinear expansion order is high and the length of the CTF filter is long. Meanwhile, its AEC performance is as good as that of IP-based SBSS.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2311.05282 [pdf, other]

Empowering high-dimensional optical fiber communications with integrated photonic processors

Authors: Kaihang Lu, Zengqi Chen, Hao Chen, Wu Zhou, Zunyue Zhang, Hon Ki Tsang, Yeyu Tong

Abstract: Mode division multiplexing (MDM) in optical fibers enables multichannel capabilities for various applications, including data transmission, quantum networks, imaging, and sensing. However, MDM optical fiber systems usually necessitate bulk-optics approaches for launching different orthogonal fiber modes into the multimode optical fiber, and multiple-input multiple-output digital electronic signal processing at the receiver side to undo the arbitrary mode scrambling in a circular-core optical fiber. Here we show that a high-dimensional optical fiber communication system can be entirely implemented by a reconfigurable integrated photonic processor, featuring kernels of multichannel mode multiplexing transmitter and all-optical descrambling receiver. High-speed and inter-chip communications involving six spatial- and polarization modes have been experimentally demonstrated with high efficiency and high-quality eye diagrams, despite the presence of random mode scrambling and polarization rotation in a circular-core few-mode fiber. The proposed photonic integration approach holds promising prospects for future space-division multiplexing applications.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2309.09838 [pdf, ps, other]

HypR: A comprehensive study for ASR hypothesis revising with a reference corpus

Authors: Yi-Wei Wang, Ke-Han Lu, Kuan-Yu Chen

Abstract: With the development of deep learning, automatic speech recognition (ASR) has made significant progress. To further enhance the performance of ASR, revising recognition results is one of the lightweight but efficient manners. Various methods can be roughly classified into N-best reranking modeling and error correction modeling. The former aims to select the hypothesis with the lowest error rate from a set of candidates generated by ASR for a given input speech. The latter focuses on detecting recognition errors in a given hypothesis and correcting these errors to obtain an enhanced result. However, we observe that these studies are hardly comparable to each other, as they are usually evaluated on different corpora, paired with different ASR models, and even use different datasets to train the models. Accordingly, we first concentrate on providing an ASR hypothesis revising (HypR) dataset in this study. HypR contains several commonly used corpora (AISHELL-1, TED-LIUM 2, and LibriSpeech) and provides 50 recognition hypotheses for each speech utterance. The checkpoint models of ASR are also published. In addition, we implement and compare several classic and representative methods, showing the recent research progress in revising speech recognition results. We hope that the publicly available HypR dataset can become a reference benchmark for subsequent research and promote this field of research to an advanced level.

Submitted 13 June, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted to Interspeech 2024

arXiv:2309.09510 [pdf, ps, other]

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

Authors: Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-yi Lee

Abstract: Text language models have shown remarkable zero-shot capability in generalizing to unseen tasks when provided with well-formulated instructions. However, existing studies in speech processing primarily focus on limited or specific tasks. Moreover, the lack of standardized benchmarks hinders a fair comparison across different approaches. Thus, we present Dynamic-SUPERB, a benchmark designed for building universal speech models capable of leveraging instruction tuning to perform multiple tasks in a zero-shot fashion. To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark. To initiate, Dynamic-SUPERB features 55 evaluation instances by combining 33 tasks and 22 datasets. This spans a broad spectrum of dimensions, providing a comprehensive platform for evaluation. Additionally, we propose several approaches to establish benchmark baselines. These include the utilization of speech models, text language models, and the multimodal encoder. Evaluation results indicate that while these baselines perform reasonably on seen tasks, they struggle with unseen ones. We release all materials to the public and welcome researchers to collaborate on the project, advancing technologies in the field together.

Submitted 22 March, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: To appear in the proceedings of ICASSP 2024

arXiv:2308.05756 [pdf, other]

WeldMon: A Cost-effective Ultrasonic Welding Machine Condition Monitoring System

Authors: Beitong Tian, Kuan-Chieh Lu, Ahmadreza Eslaminia, Yaohui Wang, Chenhui Shao, Klara Nahrstedt

Abstract: Ultrasonic welding machines play a critical role in the lithium battery industry, facilitating the bonding of batteries with conductors. Ensuring high-quality welding is vital, making tool condition monitoring systems essential for early-stage quality control. However, existing monitoring methods face challenges in cost, downtime, and adaptability. In this paper, we present WeldMon, an affordable ultrasonic welding machine condition monitoring system that utilizes a custom data acquisition system and a data analysis pipeline designed for real-time analysis. Our classification algorithm combines auto-generated features and hand-crafted features, achieving superior cross-validation accuracy (95.8% on average over all testing tasks) compared to the state-of-the-art method (92.5%) in condition classification tasks. Our data augmentation approach alleviates the concept drift problem, enhancing tool condition classification accuracy by 8.3%. All algorithms run locally, requiring only 385 milliseconds to process data for each welding cycle. We deploy WeldMon and a commercial system on an actual ultrasonic welding machine, performing a comprehensive comparison. Our findings highlight the potential for developing cost-effective, high-performance, and reliable tool condition monitoring systems.

Submitted 4 August, 2023; originally announced August 2023.

Comments: 9 pages, 5 figures

arXiv:2305.04021 [pdf, other]

A Sea-Land Clutter Classification Framework for Over-the-Horizon-Radar Based on Weighted Loss Semi-supervised GAN

Authors: Xiaoxuan Zhang, Zengfu Wang, Kun Lu, Quan Pan, Yang Li

Abstract: Deep convolutional neural networks have made great achievements in sea-land clutter classification for over-the-horizon-radar (OTHR). The premise is that a large number of labeled training samples must be provided for a sea-land clutter classifier. In practical engineering applications, it is relatively easy to obtain label-free sea-land clutter samples. However, the labeling process is extremely cumbersome and requires expertise in the field of OTHR. To solve this problem, we propose an improved generative adversarial network, namely weighted loss semi-supervised generative adversarial network (WL-SSGAN). Specifically, we propose a joint feature matching loss by weighting the middle layer features of the discriminator of semi-supervised generative adversarial network. Furthermore, we propose the weighted loss of WL-SSGAN by linearly weighting standard adversarial loss and joint feature matching loss. The semi-supervised classification performance of WL-SSGAN is evaluated on a sea-land clutter dataset. The experimental results show that WL-SSGAN can improve the performance of the fully supervised classifier with only a small number of labeled samples by utilizing a large number of unlabeled sea-land clutter samples. Further, the proposed weighted loss is superior to both the adversarial loss and the feature matching loss. Additionally, we compare WL-SSGAN with conventional semi-supervised classification methods and demonstrate that WL-SSGAN achieves the highest classification accuracy.

Submitted 6 May, 2023; originally announced May 2023.

Comments: 9 pages

arXiv:2304.04760 [pdf, other]

SAR2EO: A High-resolution Image Translation Framework with Denoising Enhancement

Authors: Jun Yu, Shenshen Du, Guochen Xie, Renjie Lu, Pengwei Li, Zhongpeng Cai, Keda Lu

Abstract: Synthetic Aperture Radar (SAR) to electro-optical (EO) image translation is a fundamental task in remote sensing that can enrich the dataset by fusing information from different sources. Recently, many methods have been proposed to tackle this task, but they still struggle to complete the conversion from low-resolution images to high-resolution images. Thus, we propose a framework, SAR2EO, aiming at addressing this challenge. Firstly, to generate high-quality EO images, we adopt the coarse-to-fine generator, multi-scale discriminators, and improved adversarial loss in the pix2pixHD model to increase the synthesis quality. Secondly, we introduce a denoising module to remove the noise in SAR images, which helps to suppress the noise while preserving the structural information of the images. To validate the effectiveness of the proposed framework, we conduct experiments on the dataset of the Multi-modal Aerial View Imagery Challenge (MAVIC), which consists of large-scale SAR and EO image pairs. The experimental results demonstrate the superiority of our proposed framework, and we won first place in the MAVIC held at CVPR PBVS 2023.

Submitted 25 August, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

arXiv:2301.00947 [pdf, other]

doi 10.1109/TGRS.2023.3274296

Data Augmentation and Classification of Sea-Land Clutter for Over-the-Horizon Radar Using AC-VAEGAN

Authors: Xiaoxuan Zhang, Zengfu Wang, Kun Lu, Quan Pan

Abstract: In the sea-land clutter classification of sky-wave over-the-horizon-radar (OTHR), imbalanced and scarce data lead to poor performance of the deep learning-based classification model. To solve this problem, this paper proposes an improved auxiliary classifier generative adversarial network (AC-GAN) architecture, namely auxiliary classifier variational autoencoder generative adversarial network (AC-VAEGAN). AC-VAEGAN can synthesize higher quality sea-land clutter samples than AC-GAN and serve as an effective tool for data augmentation. Specifically, a 1-dimensional convolutional AC-VAEGAN architecture is designed to synthesize sea-land clutter samples. Additionally, an evaluation method combining both traditional evaluation of GAN domain and statistical evaluation of signal domain is proposed to evaluate the quality of synthetic samples. Using a dataset of OTHR sea-land clutter, both the quality of the synthetic samples and the performance of data augmentation of AC-VAEGAN are verified. Further, the effect of AC-VAEGAN as a data augmentation method on the classification performance of imbalanced and scarce sea-land clutter samples is validated. The experiment results show that the quality of samples synthesized by AC-VAEGAN is better than that of AC-GAN, and the data augmentation method with AC-VAEGAN is able to improve the classification performance in the case of imbalanced and scarce sea-land clutter samples.

Submitted 3 January, 2023; originally announced January 2023.

Comments: 13 pages, 16 figures

arXiv:2212.08487 [pdf, other]

Semantics-Empowered Communication: A Tutorial-cum-Survey

Authors: Zhilin Lu, Rongpeng Li, Kun Lu, Xianfu Chen, Ekram Hossain, Zhifeng Zhao, Honggang Zhang

Abstract: Along with the springing up of the semantics-empowered communication (SemCom) research, it is now witnessing an unprecedentedly growing interest towards a wide range of aspects (e.g., theories, applications, metrics and implementations) in both academia and industry. In this work, we primarily aim to provide a comprehensive survey on both the background and research taxonomy, as well as a detailed technical tutorial. Specifically, we start by reviewing the literature and answering the "what" and "why" questions in semantic transmissions. Afterwards, we present the ecosystems of SemCom, including history, theories, metrics, datasets and toolkits, on top of which the taxonomy for research directions is presented. Furthermore, we propose to categorize the critical enabling techniques by explicit and implicit reasoning-based methods, and elaborate on how they evolve and contribute to modern content & channel semantics-empowered communications. Besides reviewing and summarizing the latest efforts in SemCom, we discuss the relations with other communication levels (e.g., conventional communications) from a holistic and unified viewpoint. Subsequently, in order to facilitate future developments and industrial applications, we also highlight advanced practical techniques for boosting semantic accuracy, robustness, and large-scale scalability, just to mention a few. Finally, we discuss the technical challenges that shed light on future research opportunities.

Submitted 11 November, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

Comments: This paper has been accepted for publication in the IEEE Communications Surveys and Tutorials

arXiv:2211.05256 [pdf, other]

Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

arXiv:2210.06244 [pdf, other]

A context-aware knowledge transferring strategy for CTC-based ASR

Authors: Ke-Han Lu, Kuan-Yu Chen

Abstract: Non-autoregressive automatic speech recognition (ASR) modeling has received increasing attention recently because of its fast decoding speed and superior performance. Among representatives, methods based on the connectionist temporal classification (CTC) are still a dominating stream. However, the theoretically inherent flaw, the assumption of independence between tokens, creates a performance barrier for the school of works. To mitigate the challenge, we propose a context-aware knowledge transferring strategy, consisting of a knowledge transferring module and a context-aware training strategy, for CTC-based ASR. The former is designed to distill linguistic information from a pre-trained language model, and the latter is framed to modulate the limitations caused by the conditional independence assumption. As a result, a knowledge-injected context-aware CTC-based ASR built upon the wav2vec2.0 is presented in this paper. A series of experiments on the AISHELL-1 and AISHELL-2 datasets demonstrate the effectiveness of the proposed method.

Submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted by SLT 2022

arXiv:2206.13511 [pdf]

Design and control analysis of a deployable clustered hyperbolic paraboloid cable net

Authors: Shuo Ma, Kai Lu, Muhao Chen, Robert E. Skelton

Abstract: This paper presents an analytical and experimental design and deployment control analysis of a hyperbolic paraboloid cable net based on clustering actuation strategies. First, the dynamics and statics for clustered tensegrity structures (CTS) are given. Then, we propose the topology design of the deployable hyperbolic paraboloid cable net. The deployability of the cable net is achieved by using clustered cables. It is shown that the clustered cables significantly reduce the number of actuators required for control. The deployment trajectory and actuation prestress in the cables are designed to ensure the tensions are feasible during the deployment process. Then, we compare the deployment analysis's open-loop and closed-loop control strategies. Finally, a lab-scale model is constructed to validate the actuation laws. We test the static performance and deployment process of the experimental model. Results show that the closed-loop control approach is more stable and smoother than the open-loop one in the deployment process. The approaches developed in this paper can also be used for various deployable tensegrity structures.

Submitted 26 June, 2022; originally announced June 2022.

Comments: 20 pages, 24 figures

arXiv:2110.08496 [pdf, other]

doi 10.1109/MWC.013.2100642

Rethinking Modern Communication from Semantic Coding to Semantic Communication

Authors: Kun Lu, Qingyang Zhou, Rongpeng Li, Zhifeng Zhao, Xianfu Chen, Jianjun Wu, Honggang Zhang

Abstract: Modern communications are usually designed to pursue a higher bit-level precision and fewer bits while transmitting a message. This article rethinks these two major features and introduces the concept and advantage of semantics that characterizes a new kind of semantics-aware communication framework, incorporating both the semantic encoding and the semantic communication problem. After analyzing the underlying defects of existing semantics-aware techniques, we establish a confidence-based distillation mechanism for the joint semantics-noise coding (JSNC) problem and a reinforcement learning (RL)-powered semantic communication paradigm that endows a system the ability to convey the semantics instead of pursuing the bit level accuracy. On top of these technical contributions, this work provides a new insight to understand how the semantics are processed and represented in a semantics-aware coding and communication system, and verifies the significant benefits of doing so. Targeted on the next generation's semantics-aware communication, some critical concerns and open challenges such as the information overhead, semantic security and implementation cost are also discussed and envisioned.

Submitted 8 June, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

Comments: Accepted by IEEE Wireless Communications

arXiv:2108.01998 [pdf, other]

Adversarial Energy Disaggregation for Non-intrusive Load Monitoring

Authors: Zhekai Du, Jingjing Li, Lei Zhu, Ke Lu, Heng Tao Shen

Abstract: Energy disaggregation, also known as non-intrusive load monitoring (NILM), challenges the problem of separating the whole-home electricity usage into appliance-specific individual consumptions, which is a typical application of data analysis. NILM aims to help households understand how the energy is used and consequently tell them how to effectively manage the energy, thus allowing energy efficiency which is considered as one of the twin pillars of sustainable energy policy (i.e., energy efficiency and renewable energy). Although NILM is unidentifiable, it is widely believed that the NILM problem can be addressed by data science. Most of the existing approaches address the energy disaggregation problem by conventional techniques such as sparse coding, non-negative matrix factorization, and hidden Markov model. Recent advances reveal that deep neural networks (DNNs) can get favorable performance for NILM since DNNs can inherently learn the discriminative signatures of the different appliances. In this paper, we propose a novel method named adversarial energy disaggregation (AED) based on DNNs. We introduce the idea of adversarial learning into NILM, which is new for the energy disaggregation task. Our method trains a generator and multiple discriminators via an adversarial fashion. The proposed method not only learns shared representations for different appliances, but captures the specific multimode structures of each appliance. Extensive experiments on real-world datasets verify that our method can achieve new state-of-the-art performance.

Submitted 1 August, 2021; originally announced August 2021.

Comments: Accepted to ACM/IMS Trans. on Data Science, codes can be found at https://github.com/li**118/AED

arXiv:2104.07539 [pdf, other]

Multi-Agent Reinforcement Learning Based Coded Computation for Mobile Ad Hoc Computing

Authors: Baoqian Wang, Junfei Xie, Kejie Lu, Yan Wan, Shengli Fu

Abstract: Mobile ad hoc computing (MAHC), which allows mobile devices to directly share their computing resources, is a promising solution to address the growing demands for computing resources required by mobile devices. However, offloading a computation task from a mobile device to other mobile devices is a challenging task due to frequent topology changes and link failures because of node mobility, unstable and unknown communication environments, and the heterogeneous nature of these devices. To address these challenges, in this paper, we introduce a novel coded computation scheme based on multi-agent reinforcement learning (MARL), which has many promising features such as adaptability to network changes, high efficiency and robustness to uncertain system disturbances, consideration of node heterogeneity, and decentralized load allocation. Comprehensive simulation studies demonstrate that the proposed approach can outperform state-of-the-art distributed computing schemes.

Submitted 15 April, 2021; originally announced April 2021.

arXiv:2103.11529 [pdf, other]

doi 10.1109/TSP.2022.3160003

DCT and DST Filtering with Sparse Graph Operators

Authors: Keng-Shih Lu, Antonio Ortega, Debargha Mukherjee, Yue Chen

Abstract: Graph filtering is a fundamental tool in graph signal processing. Polynomial graph filters (PGFs), defined as polynomials of a fundamental graph operator, can be implemented in the vertex domain, and usually have a lower complexity than frequency domain filter implementations. In this paper, we focus on the design of filters for graphs with graph Fourier transform (GFT) corresponding to a discrete trigonometric transform (DTT), i.e., one of 8 types of discrete cosine transforms (DCT) and 8 discrete sine transforms (DST). In this case, we show that multiple sparse graph operators can be identified, which allows us to propose a generalization of PGF design: multivariate polynomial graph filter (MPGF). First, for the widely used DCT-II (type-2 DCT), we characterize a set of sparse graph operators that share the DCT-II matrix as their common eigenvector matrix. This set contains the well-known connected line graph. These sparse operators can be viewed as graph filters operating in the DCT domain, which allows us to approximate any DCT graph filter by a MPGF, leading to a design with more degrees of freedom than the conventional PGF approach. Then, we extend those results to all of the 16 DTTs as well as their 2D versions, and show how their associated sets of multiple graph operators can be determined. We demonstrate experimentally that ideal low-pass and exponential DCT/DST filters can be approximated with higher accuracy with similar runtime complexity. Finally, we apply our method to transform-type selection in a video codec, AV1, where we demonstrate significant encoding time savings, with a negligible compression loss.

Submitted 21 March, 2021; originally announced March 2021.

Comments: 16 pages, 11 figures, 5 tables

arXiv:2101.04859 [pdf]

A*HAR: A New Benchmark towards Semi-supervised learning for Class-imbalanced Human Activity Recognition

Authors: Govind Narasimman, Kangkang Lu, Arun Raja, Chuan Sheng Foo, Mohamed Sabry Aly, Jie Lin, Vijay Chandrasekhar

Abstract: Despite the vast literature on Human Activity Recognition (HAR) with wearable inertial sensor data, it is perhaps surprising that there are few studies investigating semisupervised learning for HAR, particularly in a challenging scenario with class imbalance problem. In this work, we present a new benchmark, called A*HAR, towards semisupervised learning for class-imbalanced HAR. We evaluate state-of-the-art semi-supervised learning method on A*HAR, by combining Mean Teacher and Convolutional Neural Network. Interestingly, we find that Mean Teacher boosts the overall performance when training the classifier with fewer labelled samples and a large amount of unlabeled samples, but the classifier falls short in handling unbalanced activities. These findings lead to an interesting open problem, i.e., development of semi-supervised HAR algorithms that are class-imbalance aware without any prior knowledge on the class distribution for unlabeled samples. The dataset and benchmark evaluation are released at https://github.com/I2RDL2/ASTAR-HAR for future research.

Submitted 12 January, 2021; originally announced January 2021.

Comments: 5 pages, 3 figures

arXiv:2012.14704 [pdf]

Advances in deep learning methods for pavement surface crack detection and identification with visible light visual images

Authors: Kailiang Lu

Abstract: Compared to NDT and health monitoring method for cracks in engineering structures, surface crack detection or identification based on visible light images is non-contact, with the advantages of fast speed, low cost and high precision. Firstly, typical pavement (concrete also) crack public data sets were collected, and the characteristics of sample images as well as the random variable factors, including environmental, noise and interference etc., were summarized. Subsequently, the advantages and disadvantages of three main crack identification methods (i.e., hand-crafted feature engineering, machine learning, deep learning) were compared. Finally, from the aspects of model architecture, testing performance and predicting effectiveness, the development and progress of typical deep learning models, including self-built CNN, transfer learning(TL) and encoder-decoder(ED), which can be easily deployed on embedded platform, were reviewed. The benchmark test shows that: 1) It has been able to realize real-time pixel-level crack identification on embedded platform: the entire crack detection average time cost of an image sample is less than 100ms, either using the ED method (i.e., FPCNet) or the TL method based on InceptionV3. It can be reduced to less than 10ms with TL method based on MobileNet (a lightweight backbone base network). 2) In terms of accuracy, it can reach over 99.8% on CCIC which is easily identified by human eyes. On SDNET2018, some samples of which are difficult to be identified, FPCNet can reach 97.5%, while TL method is close to 96.1%. To the best of our knowledge, this paper for the first time comprehensively summarizes the pavement crack public data sets, and the performance and effectiveness of surface crack detection and identification deep learning methods for embedded platform, are reviewed and evaluated.

Submitted 2 December, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

Comments: 15 pages, 14 figures, 11 tables

ACM Class: I.5.4

Journal ref: Computer Engineering and Science 2022

arXiv:2007.15778 [pdf, other]

Weakly supervised one-stage vision and language disease detection using large scale pneumonia and pneumothorax studies

Authors: Leo K. Tam, Xiaosong Wang, Evrim Turkbey, Kevin Lu, Yuhong Wen, Daguang Xu

Abstract: Detecting clinically relevant objects in medical images is a challenge despite large datasets due to the lack of detailed labels. To address the label issue, we utilize the scene-level labels with a detection architecture that incorporates natural language information. We present a challenging new set of radiologist paired bounding box and natural language annotations on the publicly available MIMIC-CXR dataset especially focussed on pneumonia and pneumothorax. Along with the dataset, we present a joint vision language weakly supervised transformer layer-selected one-stage dual head detection architecture (LITERATI) alongside strong baseline comparisons with class activation mapping (CAM), gradient CAM, and relevant implementations on the NIH ChestXray-14 and MIMIC-CXR dataset. Borrowing from advances in vision language architectures, the LITERATI method demonstrates joint image and referring expression (objects localized in the image using natural language) input for detection that scales in a purely weakly supervised fashion. The architectural modifications address three obstacles -- implementing a supervised vision and language detection method in a weakly supervised fashion, incorporating clinical referring expression natural language information, and generating high fidelity detections with map probabilities. Nevertheless, the challenging clinical nature of the radiologist annotations including subtle references, multi-instance specifications, and relatively verbose underlying medical reports, ensures the vision language detection task at scale remains stimulating for future investigation.

Submitted 30 July, 2020; originally announced July 2020.

Comments: Accepted at Medical Image Computing and Computer-Assisted Intervention -- MICCAI 2020

arXiv:2005.02079 [pdf, other]

OTHR multitarget tracking with a GMRF model of ionospheric parameters

Authors: Zhen Guo, Zengfu Wang, Hua Lan, Quan Pan, Kun Lu

Abstract: The ionosphere is the propagation medium for radio waves transmitted by an over-the-horizon radar (OTHR). Ionospheric parameters, typically, virtual ionospheric heights (VIHs), are required to perform coordinate registration for OTHR multitarget tracking and localization. The inaccuracy of ionospheric parameters has a significant deleterious effect on the target localization of OTHR. Therefore, to improve the localization accuracy of OTHR, it is important to develop accurate models and estimation methods of ionospheric parameters and the corresponding target tracking algorithms. In this paper, we consider the variation of the ionosphere with location and the spatial correlation of the ionosphere in OTHR target tracking. We use a Gaussian Markov random field (GMRF) to model the VIHs, providing a more accurate representation of the VIHs for OTHR target tracking. Based on expectation-conditional maximization and GMRF modeling of the VIHs, we propose a novel joint optimization solution, called ECM-GMRF, to perform target state estimation, multipath data association and VIHs estimation simultaneously. In ECM-GMRF, the measurements from both ionosondes and OTHR are exploited to estimate the VIHs, leading to a better estimation of the VIHs which improves the accuracy of data association and target state estimation, and vice versa. The simulation indicates the effectiveness of the proposed algorithm.

Submitted 5 May, 2020; originally announced May 2020.

Comments: 16 pages

arXiv:2003.09984 [pdf, other]

Measurement-Level Fusion for OTHR Network Using Message Passing

Authors: Hua Lan, Zengfu Wang, Xianglong Bai, Quan Pan, Kun Lu

Abstract: Tracking an unknown number of targets based on multipath measurements provided by an over-the-horizon radar (OTHR) network with a statistical ionospheric model is complicated, which requires solving four subproblems: target detection, target tracking, multipath data association and ionospheric height identification. A joint solution is desired since the four subproblems are highly correlated, but it suffers from the intractable inference problem of high-dimensional latent variables. In this paper, a unified message passing approach, combining belief propagation (BP) and mean-field (MF) approximation, is developed for simplifying the intractable inference. Based upon the factor graph corresponding to a factorization of the joint probability distribution function (PDF) of the latent variables and a choice for a separation of this factorization into BP region and MF region, the posterior PDFs of continuous latent variables including target kinematic state, target visibility state, and ionospheric height, are approximated by MF due to its simple MP update rules for conjugate-exponential models. With regard to discrete multipath data association which contains one-to-one frame (hard) constraints, its PDF is approximated by loopy BP. Finally, the approximated posterior PDFs are updated iteratively in a closed-loop manner, which is effective for dealing with the coupling issue among target detection, target tracking, multipath data association, and ionospheric height identification. Meanwhile, the proposed approach has the measurement-level fusion architecture due to the direct processing of the raw multipath measurements from an OTHR network, which is beneficial for improving target tracking performance. Its performance is demonstrated on a simulated OTHR network multitarget tracking scenario.

Submitted 3 April, 2020; v1 submitted 22 March, 2020; originally announced March 2020.

Comments: 40 pages, 23 figures

arXiv:2002.08558 [pdf, other]

Perceptually inspired weighted MSE optimization using irregularity-aware graph Fourier transform

Authors: Keng-Shih Lu, Antonio Ortega, Debargha Mukherjee, Yue Chen

Abstract: In image and video coding applications, distortion has been traditionally measured using mean square error (MSE), which suggests the use of orthogonal transforms, such as the discrete cosine transform (DCT). Perceptual metrics such as Structural Similarity (SSIM) are typically used after encoding, but not tied to the encoding process. In this paper, we consider an alternative framework where the goal is to optimize a weighted MSE metric, where different weights can be assigned to each pixel so as to reflect their relative importance in terms of perceptual image quality. For this purpose, we propose a novel transform coding scheme based on irregularity-aware graph Fourier transform (IAGFT), where the induced IAGFT is orthogonal, but the orthogonality is defined with respect to an inner product corresponding to the weighted MSE. We propose to use weights derived from local variances of the input image, such that the weighted MSE aligns with SSIM. In this way, the associated IAGFT can achieve a coding efficiency improvement in SSIM with respect to conventional transform coding based on DCT. Our experimental results show a compression gain in terms of multi-scale SSIM on test images.

Submitted 19 February, 2020; originally announced February 2020.

Comments: 5 pages, 6 figures, submitted to International Conference of Image Processing (ICIP) 2020

arXiv:1907.07875 [pdf, other]

doi 10.1109/TSP.2019.2932882

Fast Graph Fourier Transforms Based on Graph Symmetry and Bipartition

Authors: Keng-Shih Lu, Antonio Ortega

Abstract: The graph Fourier transform (GFT) is an important tool for graph signal processing, with applications ranging from graph-based image processing to spectral clustering. However, unlike the discrete Fourier transform, the GFT typically does not have a fast algorithm. In this work, we develop new approaches to accelerate the GFT computation. In particular, we show that Haar units (Givens rotations with angle $\pi/4$) can be used to reduce GFT computation cost when the graph is bipartite or satisfies certain symmetry properties based on node pairing. We also propose a graph decomposition method based on graph topological symmetry, which allows us to identify and exploit butterfly structures in stages. This method is particularly useful for graphs that are nearly regular or have some specific structures, e.g., line graphs, cycle graphs, grid graphs, and human skeletal graphs. Though butterfly stages based on graph topological symmetry cannot be used for general graphs, they are useful in applications, including video compression and human action analysis, where symmetric graphs, such as symmetric line graphs and human skeletal graphs, are used. Our proposed fast GFT implementations are shown to reduce computation costs significantly, in terms of both number of operations and empirical runtimes.

Submitted 18 July, 2019; originally announced July 2019.

Comments: 14 pages, 15 figures

arXiv:1904.01509 [pdf, other]

doi 10.1109/ICMEW.2019.0-104

FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

Authors: Yanfu Yan, Ke Lu, Jian Xue, Pengcheng Gao, Jiayi Lyu

Abstract: Facial expression analysis based on machine learning requires a large number of well-annotated data to reflect different changes in facial motion. Publicly available datasets truly help to accelerate research in this area by providing a benchmark resource, but all of these datasets, to the best of our knowledge, are limited to rough annotations for action units, including only their absence, presence, or a five-level intensity according to the Facial Action Coding System. To meet the need for videos labeled in great detail, we present a well-annotated dataset named FEAFA for Facial Expression Analysis and 3D Facial Animation. One hundred and twenty-two participants, including children, young adults and elderly people, were recorded in real-world conditions. In addition, 99,356 frames were manually labeled using Expression Quantitative Tool developed by us to quantify 9 symmetrical FACS action units, 10 asymmetrical (unilateral) FACS action units, 2 symmetrical FACS action descriptors and 2 asymmetrical FACS action descriptors, and each action unit or action descriptor is well-annotated with a floating point number between 0 and 1. To provide a baseline for use in future research, a benchmark for the regression of action unit values based on Convolutional Neural Networks is presented. We also demonstrate the potential of our FEAFA dataset for 3D facial animation. Almost all state-of-the-art algorithms for facial animation are achieved based on 3D face reconstruction. We hence propose a novel method that drives virtual characters only based on action unit value regression of the 2D video frames of source actors.

Submitted 2 April, 2019; originally announced April 2019.

Comments: 9 pages, 7 figures

Journal ref: 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW)

arXiv:1811.03853 [pdf, other]

Sample-Efficient Policy Learning based on Completely Behavior Cloning

Authors: Qiming Zou, Ling Wang, Ke Lu, Yu Li

Abstract: Direct policy search is one of the most important algorithms of reinforcement learning. However, learning from scratch needs a large amount of experience data and can be easily prone to poor local optima. In addition to that, a partially trained policy tends to perform dangerous actions to the agent and environment. In order to overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and is totally training-free. The experiments show that this initialization strategy can help the agent learn at the high reward state region, and converge faster and better.

Submitted 9 November, 2018; originally announced November 2018.

arXiv:1811.03846 [pdf, other]

Computation Load Balancing Real-Time Model Predictive Control in Urban Traffic Networks

Authors: Qiming Zou, Ke Lu, Yu Li

Abstract: Owing to the rapid growth in the number of vehicles, urban traffic congestion has become more and more severe in the last decades. As an effective approach, Model Predictive Control (MPC) has been applied to urban traffic signal control systems. However, the potentially high online computation burden may limit its further application in real scenarios. In this paper, a new approach based on an online active set strategy is proposed to improve the real-time performance of MPC-based traffic controllers by reducing the online computing time. This approach divides one control cycle into several sequential sampling intervals. In each interval, the online active set method is applied to solve the quadratic programming (QP) problem of the traffic signal control model, by searching for the optimal solution starting at the optimal solution of the previous interval in the feasible region. The most appealing property of this approach is that it can distribute the computational complexity over several sample intervals, instead of imposing a heavy computation burden at the end of each control cycle. The simulation experiments show that this approach can reduce the online computational complexity and increase the applicability of MPC in real-life traffic networks.

Submitted 9 November, 2018; originally announced November 2018.

arXiv:1711.00213 [pdf, other]

Closed Form Solutions of Combinatorial Graph Laplacian Estimation under Acyclic Topology Constraints

Authors: Keng-Shih Lu, Antonio Ortega

Abstract: How to obtain a graph from data samples is an important problem in graph signal processing. One way to formulate this graph learning problem is based on Gaussian maximum likelihood estimation, possibly under particular topology constraints. To solve this problem, we typically require iterative convex optimization solvers. In this paper, we show that when the target graph topology does not contain any cycle, then the solution has a closed form in terms of the empirical covariance matrix. This enables us to efficiently construct a tree graph from data, even if there is only a single data sample available. We also provide an error bound of the objective function when we use the same solution to approximate a cyclic graph. As an example, we consider an image denoising problem, in which for each input image we construct a graph based on the theoretical result. We then apply low-pass graph filters based on this graph. Experimental results show that the weights given by the graph learning solution lead to better denoising results than the bilateral weights under some conditions.

Submitted 1 November, 2017; originally announced November 2017.

arXiv:1612.04913 [pdf, ps, other]

Distributed Algorithms for Solving a Class of Convex Feasibility Problems

Authors: Kaihong Lu, Gangshan Jing, Long Wang

Abstract: In this paper, a class of convex feasibility problems (CFPs) are studied for multi-agent systems through local interactions. The objective is to search a feasible solution to the convex inequalities with some set constraints in a distributed manner. The distributed control algorithms, involving subgradient and projection, are proposed for both continuous- and discrete-time systems, respectively. Conditions associated with connectivity of the directed communication graph are given to ensure convergence of the algorithms. It is shown that under mild conditions, the states of all agents reach consensus asymptotically and the consensus state is located in the solution set of the CFP. Simulation examples are presented to demonstrate the effectiveness of the theoretical results.

Submitted 14 December, 2016; originally announced December 2016.

Comments: 29 pages

arXiv:1609.03161 [pdf, ps, other]

Distributed algorithms for solving convex inequalities

Authors: Kaihong Lu, Gangshan Jing, Long Wang

Abstract: In this paper, a distributed subgradient-based algorithm is proposed for continuous-time multi-agent systems to search a feasible solution to convex inequalities. The algorithm involves each agent achieving a state constrained by its own inequalities while exchanging local information with other agents under a time-varying directed communication graph. With the validity of a mild connectivity condition associated with the communication graph, it is shown that all agents will reach agreement asymptotically and the consensus state is in the solution set of the inequalities. Furthermore, the method is also extended to solving the distributed optimization problem of minimizing the sum of local objective functions subject to convex inequalities. A simulation example is presented to demonstrate the effectiveness of the theoretical results.

Submitted 12 June, 2017; v1 submitted 11 September, 2016; originally announced September 2016.

Showing 1–33 of 33 results for author: Lu, K