Search | arXiv e-print repository

Speech-based Clinical Depression Screening: An Empirical Study

Authors: Yangbin Chen, Chenyang Xu, Chunfeng Liang, Yanbao Tao, Chuan Shi

Abstract: This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios, including psychiatric interviews, chatbot conversations, and text readings. Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital and control group members from the community, all diagnosed by psychiatrists following standardized diagnostic protocols. We extracted acoustic and deep speech features from each participant's segmented recordings. Classifications were made using neural networks or SVMs, with aggregated clip outcomes determining final assessments. Our analysis across interaction scenarios, speech processing techniques, and feature types confirms speech as a crucial marker for depression screening. Specifically, human-computer interaction matches clinical interview efficacy, surpassing reading tasks. Segment duration and quantity significantly affect model performance, with deep speech features substantially outperforming traditional acoustic features.

Submitted 12 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures
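
The abstract above says that aggregated clip outcomes determine the final assessment but does not name the aggregation rule. A minimal sketch, assuming a simple majority vote over per-clip labels; the function name `aggregate_clips` and the tie rule are hypothetical and not taken from the paper:

```python
from collections import Counter


def aggregate_clips(clip_labels):
    """Majority vote over per-clip labels (1 = depressed, 0 = control).

    Hypothetical helper: ties go to the positive class so that a
    screening tool errs toward referral. That tie rule is an assumption.
    """
    if not clip_labels:
        raise ValueError("at least one clip prediction is required")
    counts = Counter(clip_labels)
    return 1 if counts[1] >= counts[0] else 0


# Three of five clips are positive, so the participant is flagged.
print(aggregate_clips([1, 0, 1, 1, 0]))  # 1
```

Other rules (averaging clip probabilities, for instance) would fit the same description; majority voting is used here only because it is the simplest to state.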

arXiv:2401.00124 [pdf, other]

Generative AI-driven Semantic Communication Networks: Architecture, Technologies and Applications

Authors: Chengsi Liang, Hongyang Du, Yao Sun, Dusit Niyato, Jiawen Kang, Dezong Zhao, Muhammad Ali Imran

Abstract: Generative artificial intelligence (GAI) has emerged as a rapidly burgeoning field demonstrating significant potential in creating diverse contents intelligently and automatically. To support such artificial intelligence-generated content (AIGC) services, future communication systems should fulfill much more stringent requirements (including data rate, throughput, latency, etc.) with limited yet precious spectrum resources. To tackle this challenge, semantic communication (SemCom), which dramatically reduces resource consumption by extracting and transmitting semantics, has been deemed a revolutionary communication scheme. Advanced GAI algorithms give SemCom sophisticated intelligence for model training, knowledge base construction and channel adaption. Furthermore, GAI algorithms also play an important role in the management of SemCom networks. In this survey, we first overview the basics of GAI and SemCom as well as the synergies of the two technologies. In particular, the GAI-driven SemCom framework is presented, where many GAI models for information creation, SemCom-enabled information transmission and information effectiveness for AIGC are discussed separately. We then delve into GAI-driven SemCom network management, which involves novel management layers, knowledge management, and resource allocation. Finally, we envision several promising use cases, i.e., autonomous driving, smart city, and the Metaverse, for a more comprehensive exploration.

Submitted 7 January, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

arXiv:2312.15373 [pdf, other]

A Multi-day Needs-based Modeling Approach for Activity and Travel Demand Analysis

Authors: Kexin Chen, **** Guan, Ravi Seshadri, Varun Pattabhiraman, Youssef Medhat Aboutaleb, Ali Shamshiripour, Chen Liang, Xiaochun Zhang, Moshe Ben-Akiva

Abstract: This paper proposes a multi-day needs-based model for activity and travel demand analysis. The model captures the multi-day dynamics in activity generation, which enables the modeling of activities with increased flexibility in time and space (e.g., e-commerce and remote working). As an enhancement to activity-based models, the proposed model captures the underlying decision-making process of activity generation by accounting for psychological needs as the drivers of activities. The level of need satisfaction is modeled as a psychological inventory, whose utility is optimized via decisions on activity participation, location, and duration. The utility includes both the benefit in the inventory gained and the cost in time, monetary expense as well as maintenance of safety stock. The model includes two sub-models, a Deterministic Model that optimizes the utility of the inventory, and an Empirical Model that accounts for heterogeneity and stochasticity. Numerical experiments are conducted to demonstrate model scalability. A maximum likelihood estimator is proposed, the properties of the log-likelihood function are examined and the recovery of true parameters is tested. This research contributes to the literature on transportation demand models in the following three aspects. First, it is arguably better grounded in psychological theory than traditional models and allows the generation of activity patterns to be policy-sensitive (while avoiding the need for ad hoc utility definitions). Second, it contributes to the development of needs-based models with a non-myopic approach to model multi-day activity patterns. Third, it proposes a tractable model formulation via problem reformulation and computational enhancements, which allows for maximum likelihood parameter estimation.

Submitted 23 December, 2023; originally announced December 2023.

Comments: 38 pages, 11 figures

arXiv:2308.15483 [pdf, other]

Generative AI for Semantic Communication: Architecture, Challenges, and Outlook

Authors: Le Xia, Yao Sun, Chengsi Liang, Lei Zhang, Muhammad Ali Imran, Dusit Niyato

Abstract: Semantic communication (SemCom) is expected to be a core paradigm in future communication networks, yielding significant benefits in terms of spectrum resource saving and information interaction efficiency. However, the existing SemCom structure is limited by the lack of context-reasoning ability and background knowledge provisioning, which, therefore, motivates us to seek the potential of incorporating generative artificial intelligence (GAI) technologies with SemCom. Recognizing GAI's powerful capability in automating and creating valuable, diverse, and personalized multimodal content, this article first highlights the principal characteristics of the combination of GAI and SemCom along with their pertinent benefits and challenges. To tackle these challenges, we further propose a novel GAI-assisted SemCom network (GAI-SCN) framework in a cloud-edge-mobile design. Specifically, by employing global and local GAI models, our GAI-SCN enables multimodal semantic content provisioning, semantic-level joint-source-channel coding, and AIGC acquisition to maximize the efficiency and reliability of semantic reasoning and resource utilization. Afterward, we present a detailed implementation workflow of GAI-SCN, followed by corresponding initial simulations for performance evaluation in comparison with two benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of GAI-SCN.

Submitted 18 January, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: This article has been submitted to IEEE Wireless Communications Magazine for the second round of peer review after completing the major revision

arXiv:2307.01386 [pdf, other]

Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays

Authors: Yijiang Chen, Chengdong Liang, Xiao-Lei Zhang

Abstract: The performance of speaker verification degrades significantly in adverse acoustic environments with strong reverberation and noise. To address this issue, this paper proposes a spatial-temporal graph convolutional network (GCN) method for multi-channel speaker verification with ad-hoc microphone arrays. It includes a feature aggregation block and a channel selection block, both of which are built on graphs. The feature aggregation block fuses speaker features across time and channels by a spatial-temporal GCN. The graph-based channel selection block discards the noisy channels that may contribute negatively to the system. The proposed method is flexible in incorporating various kinds of graphs and prior knowledge. We compared the proposed method with six representative methods in both real-world and simulated environments. Experimental results show that the proposed method achieves a relative equal error rate (EER) reduction of $\mathbf{15.39\%}$ over the strongest referenced method on the simulated datasets, and of $\mathbf{17.70\%}$ on the real datasets. Moreover, its performance is robust across different signal-to-noise ratios and reverberation times.

Submitted 3 July, 2023; originally announced July 2023.
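
The relative EER reductions quoted in the abstract above are ratios, not absolute percentage-point differences. A minimal sketch of the usual definition follows; the function name and the input values are hypothetical and are not results from the paper:

```python
def relative_reduction(baseline_eer, proposed_eer):
    """Relative reduction of an error rate with respect to a baseline.

    Both arguments use the same unit (for example, percent EER).
    """
    if baseline_eer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_eer - proposed_eer) / baseline_eer


# Hypothetical EER values, chosen only to illustrate the arithmetic.
print(f"{relative_reduction(10.0, 8.0):.2%}")  # 20.00%
```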

arXiv:2306.15161 [pdf, other]

Wespeaker baselines for VoxSRC2023

Authors: Shuai Wang, Chengdong Liang, Xu Xiang, Bing Han, Zhengyang Chen, Hongji Wang, Wen Ding

Abstract: This report showcases the results achieved using the wespeaker toolkit for the VoxSRC2023 Challenge. Our aim is to provide participants, especially those with limited experience, with clear and straightforward guidelines to develop their initial systems. Via well-structured recipes and strong results, we hope to offer an accessible and good enough starting point for all interested individuals. In this report, we describe the results achieved on the VoxSRC2023 dev set using the pretrained models; results on the evaluation set can be checked on the CodaLab evaluation server.

Submitted 28 June, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2305.18298 [pdf]

Optimization design of a micro-perforated panel absorber with 8.6 octave bands

Authors: Xiaoming Wang, Chen Liang, Yulin Mei

Abstract: In order to improve low-frequency characteristics of micro-perforated panel absorbers, sound absorption structures composed of micro-perforated panels and expansion chambers are designed, and an optimization design method is constructed based on the transfer function model and the simulated annealing algorithm. First, a single-chamber structure composed of a micro-perforated panel and an expansion chamber is built, and the sound absorption curve is simulated by the finite element method. Second, for the sake of enlarging the continuous absorption bandwidth with absorption coefficients not less than 0.8, a three-chamber structure is designed, which has a sound absorption bandwidth of 1277Hz (27-1304Hz) covering 5.6 octave bands. Then, the transfer function model of the structure is established, and a series of theoretical formulae are derived to calculate the absorption coefficients. Subsequently, the sound absorption bandwidths calculated by the theoretical formulae and the finite element method are compared, and the relative error is 3.68%. Finally, an optimization design method is constructed by combining the transfer function model and the simulated annealing algorithm, where the optimization objective is to maximize the absorption bandwidth and the optimization variables are structural parameters of the three-chamber structure. The results show that, after optimization, the three-chamber structure exhibits an excellent sound absorption performance, with a continuous bandwidth of 1591Hz (4-1595Hz), realizing 8.6 octave bands.

Submitted 23 April, 2023; originally announced May 2023.
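
The octave counts in the abstract above follow from the band edges: the number of octaves between two frequencies is the base-2 logarithm of their ratio. A short check, using only the band edges quoted in the abstract (the function name `octave_span` is illustrative):

```python
import math


def octave_span(f_low_hz, f_high_hz):
    """Number of octaves between two frequencies (log2 of their ratio)."""
    if f_low_hz <= 0 or f_high_hz <= f_low_hz:
        raise ValueError("expected 0 < f_low_hz < f_high_hz")
    return math.log2(f_high_hz / f_low_hz)


# Band edges as quoted in the abstract: before and after optimization.
for f_low, f_high in [(27, 1304), (4, 1595)]:
    bandwidth = f_high - f_low
    print(f"{f_low}-{f_high} Hz: {bandwidth} Hz, "
          f"{octave_span(f_low, f_high):.1f} octaves")
# 27-1304 Hz: 1277 Hz, 5.6 octaves
# 4-1595 Hz: 1591 Hz, 8.6 octaves
```

The bandwidth in hertz grows by only about 25% after optimization, while the octave count grows by three, because lowering the bottom edge from 27 Hz to 4 Hz adds almost three octaves on its own.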

arXiv:2305.15890 [pdf, ps, other]

doi 10.1109/MCOMSTD.0004.2200068

Flexible Spectrum Orchestration of Carrier Aggregation for 5G-Advanced

Authors: Xianghui Han, Chunli Liang, Ruiqi Liu, Xingguang Wei, Mengzhu Chen, Yu-Ngok Ruyue Li, Shi **

Abstract: With increasing availability of spectrum in the market due to new spectrum allocation and re-farming bands from previous cellular generation networks, a more flexible, efficient and green usage of the spectrum becomes an important topic in 5G-Advanced. In this article, we provide an overview of the 3rd Generation Partnership Project (3GPP) work on flexible spectrum orchestration for carrier aggregation (CA). The configuration settings, requirements and potential specification impacts are analyzed. Some related Release 18 techniques, such as multi-cell scheduling, transmitter switching and network energy saving, are also presented. Evaluation results show that a clear performance gain can be achieved by these techniques.

Submitted 25 May, 2023; originally announced May 2023.

Comments: Accepted by the IEEE Communications Standards Magazine. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material, creating new works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: IEEE Communications Standards Magazine (Volume: 7, Issue: 4, December 2023)

arXiv:2303.12123 [pdf, other]

Oral-3Dv2: 3D Oral Reconstruction from Panoramic X-Ray Imaging with Implicit Neural Representation

Authors: Weinan Song, Haoxin Zheng, Dezhan Tu, Chengwen Liang, Lei He

Abstract: 3D reconstruction of medical imaging from 2D images has become an increasingly interesting topic with the development of deep learning models in recent years. Previous studies in 3D reconstruction from limited X-ray images mainly rely on learning from paired 2D and 3D images, where the reconstruction quality relies on the scale and variation of collected data. This has brought significant challenges in the collection of training data, as only a tiny fraction of patients take two types of radiation examinations in the same period. Although simulation from higher-dimension images could solve this problem, the variance between real and simulated data could bring great uncertainty at the same time. In oral reconstruction, the situation becomes more challenging as only a single panoramic X-ray image is available, where models need to infer the curved shape by prior individual knowledge. To overcome these limitations, we propose Oral-3Dv2 to solve this cross-dimension translation problem in dental healthcare by learning solely on projection information, i.e., the projection image and trajectory of the X-ray tube. Our model learns to represent the 3D oral structure in an implicit way by mapping 2D coordinates into density values of voxels in the 3D space. To improve efficiency and effectiveness, we utilize a multi-head model that predicts a set of voxel values in 3D space simultaneously from a 2D coordinate in the axial plane, and a dynamic sampling strategy to refine details of the density distribution in the reconstruction result. Extensive experiments on simulated and real data show that our model significantly outperforms existing state-of-the-art models without learning from paired images or prior individual knowledge. To the best of our knowledge, this is the first work of a non-adversarial-learning-based model in 3D radiology reconstruction from a single panoramic X-ray image.

Submitted 3 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.02948 [pdf, other]

A VHetNet-Enabled Asynchronous Federated Learning-Based Anomaly Detection Framework for Ubiquitous IoT

Authors: Weili Wang, Omid Abbasi, Halim Yanikomeroglu, Chengchao Liang, Lun Tang, Qianbin Chen

Abstract: Anomaly detection for the Internet of Things (IoT) is a major intelligent service required by many fields, including intrusion detection, device-activity analysis, and security supervision. However, the heterogeneous distribution of data and resource-constrained end nodes present challenges for existing anomaly detection models. Due to the advantages of flexible deployment and multi-dimensional resources, high altitude platform stations (HAPSs) and unmanned aerial vehicles (UAVs), which are important components of vertical heterogeneous networks (VHetNets), have significant potential for sensing, computing, storage, and communication applications in ubiquitous IoT systems. In this paper, we propose a novel VHetNet-enabled asynchronous federated learning (AFL) framework to enable decentralized UAVs to collaboratively train a global anomaly detection model. In the VHetNet-enabled AFL framework, a HAPS operates as a central aerial server, and the local models trained in UAVs are uploaded to the HAPS for global aggregation due to its wide coverage and strong storage and computation capabilities. We introduce a UAV selection strategy into the AFL framework to prevent UAVs with low local model quality and large energy consumption from affecting the learning efficiency and detection accuracy of the global model. To ensure the security of transmissions between UAVs and the HAPS, we add designed noise to local model parameters in UAVs to achieve differential privacy. Moreover, we propose a compound-action actor-critic (CA2C)-based joint device association, UAV selection, and UAV trajectory planning algorithm to further enhance the overall federated execution efficiency and detection model accuracy. Extensive experimental evaluation on a real-world dataset demonstrates that the proposed algorithm can achieve high detection accuracy with short federated execution time and low energy consumption.

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2211.01241 [pdf, other]

doi 10.1109/MWC.004.2200393

WiserVR: Semantic Communication Enabled Wireless Virtual Reality Delivery

Authors: Le Xia, Yao Sun, Chengsi Liang, Daquan Feng, Runze Cheng, Yang Yang, Muhammad Ali Imran

Abstract: Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume along with stringent requirements on latency and reliability under limited bandwidth resources makes untethered wireless VR delivery increasingly challenging. Such bottlenecks, therefore, motivate this work to seek the potential of using semantic communication, a new paradigm that promises to significantly ease the resource pressure, for efficient VR delivery. To this end, we propose a novel framework, namely WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360° video frames to VR users. Specifically, multiple deep learning-based modules are devised for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery. Among them, we develop a concept of semantic location graph and leverage the joint-semantic-channel-coding method with knowledge sharing not only to substantially reduce communication latency, but also to guarantee adequate transmission reliability and resilience under various channel states. Moreover, an implementation of WiserVR is presented, followed by corresponding initial simulations for performance evaluation compared with benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of WiserVR.

Submitted 13 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: This magazine article has been accepted for publication by IEEE Wireless Communications. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2211.00941 [pdf, other]

Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames

Authors: Chengdong Liang, Xiao-Lei Zhang, BinBin Zhang, Di Wu, Shengqiang Li, Xingchen Song, Zhendong Peng, Fu** Pan

Abstract: Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present fast-U2++, an enhanced version of U2++ to further reduce partial latency. The core idea of fast-U2++ is to output partial results of the bottom layers in its encoder with a small chunk, while using a large chunk in the top layers of its encoder to compensate for the performance degradation caused by the small chunk. Moreover, we use a knowledge distillation method to reduce the token emission latency. We present extensive experiments on the Aishell-1 dataset. Experiments and ablation studies show that, compared to U2++, fast-U2++ reduces model latency from 320ms to 80ms, and achieves a character error rate (CER) of 5.06% with a streaming setup.

Submitted 2 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures
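
The character error rate (CER) quoted in the abstract above is conventionally the character-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch of that standard definition; it is not taken from the paper's code, and the example strings are made up:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance over reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of four reference characters.
print(cer("abcd", "abxd"))  # 0.25
```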

arXiv:2211.00323 [pdf, other]

Reconfigurable Intelligent Surface: Power Consumption Modeling and Practical Measurement Validation

Authors: **ghe Wang, Wankai Tang, **g Cheng Liang, Lei Zhang, Jun Yan Dai, Xiao Li, Shi **, Qiang Cheng, Tie Jun Cui

Abstract: The reconfigurable intelligent surface (RIS) has received a lot of interest because of its capacity to reconfigure the wireless communication environment in a cost- and energy-efficient way. However, realistic power consumption modeling and measurement validation of RIS have received far too little attention. Therefore, in this work, we model the power consumption of RIS and conduct measurement validations using various RISs to fill this gap. Firstly, we propose a practical power consumption model of RIS. The RIS hardware is divided into three basic parts: the FPGA control board, the drive circuits, and the RIS unit cells. The power consumption of the first two parts is modeled as $P_{\text {static}}$ and that of the last part is modeled as $P_{\text {units}}$. Expressions of $P_{\text {static}}$ and $P_{\text {units}}$ vary amongst different types of RISs. Secondly, we conduct measurements on various RISs to validate the proposed model. Five different RISs including the PIN diode, varactor diode, and RF switch types are measured, and measurement results validate the generality and applicability of the proposed power consumption model of RIS. Finally, we summarize the measurement results and discuss the approaches to achieve the low-power-consumption design of RIS-assisted wireless communication systems.

Submitted 6 February, 2024; v1 submitted 1 November, 2022; originally announced November 2022.
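
The abstract above splits the RIS hardware into a static part and a unit-cell part but does not write out the total. Assuming the two contributions simply add, which the abstract implies but does not state, the model would read as follows; the symbol $P_{\text{RIS}}$ is introduced here for illustration only:

```latex
P_{\text{RIS}} = P_{\text{static}} + P_{\text{units}}
```

Here $P_{\text{static}}$ covers the FPGA control board and the drive circuits, and $P_{\text{units}}$ covers the RIS unit cells; the abstract notes that the expressions for both terms depend on the RIS type.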

arXiv:2210.17016 [pdf, other]

Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

Authors: Hongji Wang, Chengdong Liang, Shuai Wang, Zhengyang Chen, Binbin Zhang, Xu Xiang, Yanlei Deng, Yanmin Qian

Abstract: Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is fixed-dimensional vector representation, i.e., speaker embedding. This paper introduces a research and production oriented speaker embedding learning toolkit, Wespeaker. Wespeaker contains the implementation of scalable data management, state-of-the-art speaker embedding models, loss functions, and scoring back-ends, with highly competitive results achieved by structured recipes which were adopted in the winning systems in several speaker verification challenges. The application to other downstream tasks such as speaker diarization is also exhibited in the related recipe. Moreover, CPU- and GPU-compatible deployment codes are integrated for production-oriented development. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.

Submitted 1 November, 2022; v1 submitted 30 October, 2022; originally announced October 2022.

arXiv:2210.10265 [pdf, other]

Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays

Authors: Shupei Liu, Linfeng Feng, Yijun Gong, Chengdong Liang, Chen Zhang, Xiao-Lei Zhang, Xuelong Li

Abstract: While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose a novel deep-learning-based 2D speaker localization method leveraging ad-hoc microphone arrays, where an ad-hoc microphone array is composed of randomly distributed microphone nodes, each of which is equipped with a traditional array. Specifically, we first employ convolutional neural networks at each node to estimate speaker directions. Then, we integrate these DOA estimates using triangulation and clustering techniques to get 2D speaker locations. To further boost the estimation accuracy, we introduce a node selection algorithm that strategically filters the most reliable nodes. Extensive experiments on both simulated and real-world data demonstrate that our approach significantly outperforms conventional methods. The proposed node selection further refines performance. The real-world dataset used in the experiments, named Libri-adhoc-node10, is newly recorded data described for the first time in this paper and is available online at https://github.com/Liu-sp/Libri-adhoc-nodes10.

Submitted 1 April, 2024; v1 submitted 18 October, 2022; originally announced October 2022.
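
The triangulation step mentioned in the abstract above can be illustrated for a single pair of nodes: each node contributes a ray from its known position along its estimated DOA, and the speaker estimate is where the two rays cross. This is a geometric sketch only, assuming angles measured in radians from the x-axis in a shared coordinate frame; the function name, coordinates, and angles are hypothetical, and the paper's clustering and node selection stages are not reproduced here:

```python
import math


def triangulate(p1, theta1, p2, theta2, eps=1e-9):
    """Intersect two DOA rays; return (x, y), or None if no usable crossing.

    p1, p2: (x, y) node positions. theta1, theta2: bearings in radians.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # 2D cross product of the two directions; zero means parallel rays.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < eps:
        return None
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    # Distances along each ray, from solving p1 + t1*d1 = p2 + t2*d2.
    t1 = (rx * d2[1] - ry * d2[0]) / denom
    t2 = (rx * d1[1] - ry * d1[0]) / denom
    if t1 < 0 or t2 < 0:
        return None  # the lines cross behind one of the nodes
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Hypothetical layout: nodes at (0, 0) and (2, 0) looking at 45 and 135 degrees.
x, y = triangulate((0.0, 0.0), math.pi / 4, (2.0, 0.0), 3 * math.pi / 4)
print(round(x, 6), round(y, 6))  # 1.0 1.0
```

With more than two nodes, every pair yields one candidate point, and noisy DOA estimates scatter those candidates; that scatter is presumably what the clustering step in the abstract is meant to resolve.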

arXiv:2207.07370 [pdf, other]

CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation

Authors: Jianwei Lin, Jiatai Lin, Cheng Lu, Hao Chen, Huan Lin, Bingchao Zhao, Zhenwei Shi, Bingjiang Qiu, Xipeng Pan, Zeyan Xu, Biao Huang, Changhong Liang, Guoqiang Han, Zaiyi Liu, Chu Han

Abstract: Brain tumor segmentation (BTS) in magnetic resonance image (MRI) is crucial for brain tumor diagnosis, cancer management and research purposes. With the great success of the ten-year BraTS challenges as well as the advances of CNN and Transformer algorithms, a lot of outstanding BTS models have been proposed to tackle the difficulties of BTS in different technical aspects. However, existing studies hardly consider how to fuse the multi-modality images in a reasonable manner. In this paper, we leverage the clinical knowledge of how radiologists diagnose brain tumors from multiple MRI modalities and propose a clinical knowledge-driven brain tumor segmentation model, called CKD-TransBTS. Instead of directly concatenating all the modalities, we re-organize the input modalities by separating them into two groups according to the imaging principle of MRI. A dual-branch hybrid encoder with the proposed modality-correlated cross-attention block (MCCA) is designed to extract the multi-modality image features. The proposed model inherits the strengths from both Transformer and CNN with the local feature representation ability for precise lesion boundaries and long-range feature extraction for 3D volumetric images. To bridge the gap between Transformer and CNN features, we propose a Trans&CNN Feature Calibration block (TCFC) in the decoder. We compare the proposed model with five CNN-based models and six transformer-based models on the BraTS 2021 challenge dataset. Extensive experiments demonstrate that the proposed model achieves state-of-the-art brain tumor segmentation performance compared with all the competitors.

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2206.09157 [pdf, ps, other]

doi 10.1109/MCOM.001.2101081

Off-Network Communications For Future Railway Mobile Communication Systems: Challenges and Opportunities

Authors: Jiewen Hu, Gang Liu, Yongbo Li, Zheng Ma, Wei Wang, Chengchao Liang, F. Richard Yu, Pingzhi Fan

Abstract: GSM-R is predicted to be obsolete by 2030, and a suitable successor is needed. Defined by the International Union of Railways (UIC), the Future Railway Mobile Communication System (FRMCS) contains many future use cases with strict requirements. These use cases should ensure regular communication not only within network coverage but also in uncovered scenarios. There is still a lack of standards on off-network communication in FRMCS, so this article focuses on off-network communication and intends to provide reference and direction for standardization. We first provide a comprehensive summary and analysis of off-network use cases in FRMCS. Then we give an overview of existing technologies (GSM-R, TETRA, DMR, LTE-V2X, and NR-V2X) that may support off-network communication. In addition, we simulate and evaluate the performance of existing technologies. Simulation results show that it is possible to satisfy the off-network communication requirements in FRMCS with enhancements based on LTE-V2X or NR-V2X. Finally, we give some future research directions to provide insights for industry and academia.

Submitted 10 August, 2022; v1 submitted 18 June, 2022; originally announced June 2022.

Journal ref: IEEE Communications Magazine, vol. 60, no. 10, pp. 64-70, October 2022

arXiv:2205.08390 [pdf, other]

HoVer-Trans: Anatomy-aware HoVer-Transformer for ROI-free Breast Cancer Diagnosis in Ultrasound Images

Authors: Yuhao Mo, Chu Han, Yu Liu, Min Liu, Zhenwei Shi, Jiatai Lin, Bingchao Zhao, Chunwang Huang, Bingjiang Qiu, Yanfen Cui, Lei Wu, Xipeng Pan, Zeyan Xu, Xiaomei Huang, Zaiyi Liu, Ying Wang, Changhong Liang

Abstract: Ultrasonography is an important routine examination for breast cancer diagnosis, due to its non-invasive, radiation-free and low-cost properties. However, the diagnostic accuracy of breast cancer is still limited due to its inherent limitations. It would be a tremendous success if we can precisely diagnose breast cancer by breast ultrasound images (BUS). Many learning-based computer-aided diagnostic methods have been proposed to achieve breast cancer diagnosis/lesion classification. However, most of them require a pre-defined ROI and then classify the lesion inside the ROI. Conventional classification backbones, such as VGG16 and ResNet50, can achieve promising classification results with no ROI requirement. But these models lack interpretability, thus restricting their use in clinical practice. In this study, we propose a novel ROI-free model for breast cancer diagnosis in ultrasound images with interpretable feature representations. We leverage the anatomical prior knowledge that malignant and benign tumors have different spatial relationships between different tissue layers, and propose a HoVer-Transformer to formulate this prior knowledge. The proposed HoVer-Trans block extracts the inter- and intra-layer spatial information horizontally and vertically. We construct and release an open dataset GDPH&SYSUCC for breast cancer diagnosis in BUS. The proposed model is evaluated on three datasets by comparing with four CNN-based models and two vision transformer models via five-fold cross validation. It achieves state-of-the-art classification performance with the best model interpretability. Meanwhile, our proposed model outperforms two senior sonographers on breast cancer diagnosis when only one BUS image is given.

Submitted 15 July, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

arXiv:2205.06315 [pdf, other]

Interface Networks for Failure Localization in Power Systems

Authors: Chen Liang, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Transmission power systems usually consist of interconnected sub-grids that are operated relatively independently. When a failure happens, it is desirable to localize its impact within the sub-grid where the failure occurs. This paper introduces three interface networks to connect sub-grids, achieving better failure localization while maintaining robust network connectivity. The proposed interface networks are validated with numerical experiments on the IEEE 118-bus test network under both DC and AC power flow models.

Submitted 12 May, 2022; originally announced May 2022.

Comments: Accepted to the 2022 American Control Conference (ACC 2022)

arXiv:2204.06455 [pdf, other]

WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma

Authors: Chu Han, Xipeng Pan, Lixu Yan, Huan Lin, Bingbing Li, Su Yao, Shanshan Lv, Zhenwei Shi, Jinhai Mai, Jiatai Lin, Bingchao Zhao, Zeyan Xu, Zhizhen Wang, Yumeng Wang, Yuan Zhang, Huihui Wang, Chao Zhu, Chunhui Lin, Lijian Mao, Min Wu, Luwen Duan, Jingsong Zhu, Dong Hu, Zijie Fang, Yang Chen, et al. (18 additional authors not shown)

Abstract: Lung cancer is the leading cause of cancer death worldwide, and adenocarcinoma (LUAD) is the most common subtype. Exploiting the potential value of the histopathology images can promote precision medicine in oncology. Tissue segmentation is the basic upstream task of histopathology image analysis. Existing deep learning models have achieved superior segmentation performance but require sufficient pixel-level annotations, which are time-consuming and expensive to produce. To enrich the label resources of LUAD and to alleviate the annotation efforts, we organize this challenge WSSS4LUAD to call for the outstanding weakly-supervised semantic segmentation (WSSS) techniques for histopathology images of LUAD. Participants have to design the algorithm to segment tumor epithelial, tumor-associated stroma and normal tissue with only patch-level labels. This challenge includes 10,091 patch-level annotations (the training set) and over 130 million labeled pixels (the validation and test sets), from 87 WSIs (67 from GDPH, 20 from TCGA). All the labels were generated by a pathologist-in-the-loop pipeline with the help of AI models and checked by the label review board. Among 532 registrations, 28 teams submitted the results in the test phase with over 1,000 submissions. Finally, the first place team achieved mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919). According to the technical reports of the top-tier teams, CAM is still the most popular approach in WSSS. Cutmix data augmentation has been widely adopted to generate more reliable samples. With the success of this challenge, we believe that WSSS approaches with patch-level annotations can be a complement to the traditional pixel annotations while reducing the annotation efforts. The entire dataset has been released to encourage more research on computational pathology in LUAD and more novel WSSS techniques.
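As a quick consistency check on the figures quoted in the abstract above, the reported mIoU of 0.8413 agrees with the plain unweighted mean of the three per-class IoU values. The sketch below assumes that unweighted averaging is what was used; the abstract does not state the exact formula, and the helper name `mean_iou` is only illustrative.

```python
# Per-class IoU values quoted in the abstract for the first-place team.
per_class_iou = {"tumor": 0.8389, "stroma": 0.7931, "normal": 0.8919}


def mean_iou(scores):
    """Unweighted mean of per-class IoU values (assumed definition of mIoU)."""
    return sum(scores.values()) / len(scores)


miou = mean_iou(per_class_iou)
print(round(miou, 4))  # prints 0.8413
```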

Submitted 13 April, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

arXiv:2202.13804 [pdf, other]

RestainNet: a self-supervised digital re-stainer for stain normalization

Authors: Bingchao Zhao, Jiatai Lin, Changhong Liang, Zongjian Yi, Xin Chen, Bingbing Li, Weihao Qiu, Danyi Li, Li Liang, Chu Han, Zaiyi Liu

Abstract: Color inconsistency is an inevitable challenge in computational pathology, which generally happens because of stain intensity variations or sections scanned by different scanners. It harms the pathological image analysis methods, especially the learning-based models. A series of approaches have been proposed for stain normalization. However, most of them lack flexibility in practice. In this paper, we formulated stain normalization as a digital re-staining process and proposed a self-supervised learning model, which is called RestainNet. Our network is regarded as a digital restainer which learns how to re-stain an unstained (grayscale) image. Two digital stains, Hematoxylin (H) and Eosin (E), were extracted from the original image by Beer-Lambert's Law. We proposed a staining loss to maintain the correctness of stain intensity during the restaining process. Thanks to the self-supervised nature, paired training samples are no longer necessary, which demonstrates great flexibility in practical usage. Our RestainNet outperforms existing approaches and achieves state-of-the-art performance with regard to color correctness and structure preservation. We further conducted experiments on the segmentation and classification tasks and the proposed RestainNet achieved outstanding performance compared with SOTA methods. The self-supervised design allows the network to learn any staining style with no extra effort.

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2112.12389 [pdf, other]

S+PAGE: A Speaker and Position-Aware Graph Neural Network Model for Emotion Recognition in Conversation

Authors: Chen Liang, Chong Yang, Jing Xu, Juyang Huang, Yongliang Wang, Yang Dong

Abstract: Emotion recognition in conversation (ERC) has attracted much attention in recent years for its necessity in widespread applications. Existing ERC methods mostly model the self and inter-speaker context separately, posing a major issue for lacking enough interaction between them. In this paper, we propose a novel Speaker and Position-Aware Graph neural network model for ERC (S+PAGE), which contains three stages to combine the benefits of both Transformer and relational graph convolution network (R-GCN) for better contextual modeling. Firstly, a two-stream conversational Transformer is presented to extract the coarse self and inter-speaker contextual features for each utterance. Then, a speaker and position-aware conversation graph is constructed, and we propose an enhanced R-GCN model, called PAG, to refine the coarse features guided by a relative positional encoding. Finally, both of the features from the former two stages are input into a conditional random field layer to model the emotion transfer.

Submitted 23 December, 2021; originally announced December 2021.

arXiv:2111.03063 [pdf, other]

PDBL: Improving Histopathological Tissue Classification with Plug-and-Play Pyramidal Deep-Broad Learning

Authors: Jiatai Lin, Guoqiang Han, Xipeng Pan, Hao Chen, Danyi Li, Xiping Jia, Zhenwei Shi, Zhizhen Wang, Yanfen Cui, Haiming Li, Changhong Liang, Li Liang, Zaiyi Liu, Chu Han

Abstract: Histopathological tissue classification is a fundamental task in pathomics cancer research. Precisely differentiating different tissue types benefits downstream research, such as cancer diagnosis and prognosis. Existing works mostly leverage the popular classification backbones in computer vision to achieve histopathological tissue classification. In this paper, we proposed a super lightweight plug-and-play module, named Pyramidal Deep-Broad Learning (PDBL), for any well-trained classification backbone to further improve the classification performance without a re-training burden. We mimic how pathologists observe pathology slides in different magnifications and construct an image pyramid for the input image in order to obtain the pyramidal contextual information. For each level in the pyramid, we extract the multi-scale deep-broad features by our proposed Deep-Broad block (DB-block). We equipped PDBL in three popular classification backbones, ShuffleNetV2, EfficientNetb0, and ResNet50, to evaluate the effectiveness and efficiency of our proposed module on two datasets (Kather Multiclass Dataset and the LC25000 Dataset). Experimental results demonstrate the proposed PDBL can steadily improve the tissue-level classification performance for any CNN backbones, especially for the lightweight models when given a small amount of training samples (less than 10%), which greatly saves the computational time and annotation efforts.

Submitted 4 November, 2021; originally announced November 2021.

Comments: 10 pages, 5 figures

arXiv:2110.08048 [pdf, other]

Multi-Layer Pseudo-Supervision for Histopathology Tissue Semantic Segmentation using Patch-level Classification Labels

Authors: Chu Han, Jiatai Lin, Jinhai Mai, Yi Wang, Qingling Zhang, Bingchao Zhao, Xin Chen, Xipeng Pan, Zhenwei Shi, Xiaowei Xu, Su Yao, Lixu Yan, Huan Lin, Zeyan Xu, Xiaomei Huang, Guoqiang Han, Changhong Liang, Zaiyi Liu

Abstract: Tissue-level semantic segmentation is a vital step in computational pathology. Fully-supervised models have already achieved outstanding performance with dense pixel-level annotations. However, drawing such labels on the giga-pixel whole slide images is extremely expensive and time-consuming. In this paper, we use only patch-level classification labels to achieve tissue semantic segmentation on histopathology images, thereby reducing the annotation efforts. We proposed a two-step model including a classification phase and a segmentation phase. In the classification phase, we proposed a CAM-based model to generate pseudo masks by patch-level labels. In the segmentation phase, we achieved tissue semantic segmentation by our proposed Multi-Layer Pseudo-Supervision. Several technical novelties have been proposed to reduce the information gap between pixel-level and patch-level annotations. As a part of this paper, we introduced a new weakly-supervised semantic segmentation (WSSS) dataset for lung adenocarcinoma (LUAD-HistoSeg). We conducted several experiments to evaluate our proposed model on two datasets. Our proposed model outperforms two state-of-the-art WSSS approaches. Note that we can achieve comparable quantitative and qualitative results with the fully-supervised model, with only around a 2% gap for MIoU and FwIoU. Compared with manual labeling, our model can greatly reduce the annotation time from hours to minutes. The source code is available at: https://github.com/ChuHan89/WSSS-Tissue.

Submitted 14 October, 2021; originally announced October 2021.

Comments: 15 pages, 10 figures, journal

MSC Class: 68U10 ACM Class: I.4.6

arXiv:2110.05975 [pdf, other]

Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays

Authors: Chengdong Liang, Yijiang Chen, Jiadi Yao, Xiao-Lei Zhang

Abstract: Speaker verification based on ad-hoc microphone arrays has the potential of reducing the error significantly in adverse acoustic environments. However, existing approaches extract utterance-level speaker embeddings from each channel of an ad-hoc microphone array, which does not consider fully the spatial-temporal information across the devices. In this paper, we propose to aggregate the multichannel signals of the ad-hoc microphone array at the frame-level by exploring the cross-channel information deeply with two attention mechanisms. The first one is a self-attention method. It consists of a cross-frame self-attention layer and a cross-channel self-attention layer successively, both working at the frame level. The second one learns the cross-frame and cross-channel information via two graph attention layers. Experimental results demonstrate that the proposed methods reach the state-of-the-art performance. Moreover, the graph-attention method is better than the self-attention method in most cases.

Submitted 28 March, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

Comments: 5 pages, 3 figures

arXiv:2107.05859 [pdf, other]

AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data

Authors: Menglong Xu, Shengqiang Li, Chengdong Liang, Xiao-Lei Zhang

Abstract: Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS). However, if training data is limited, it remains challenging to achieve robust and highly accurate KWS in real-world scenarios where unseen sounds that are out of the training data are frequently encountered. Most conventional methods aim to maximize the classification accuracy on the training set, without taking the unseen sounds into account. To enhance the robustness of the deep neural networks based KWS, in this paper, we introduce a new loss function, named the maximization of the area under the receiver-operating-characteristic curve (AUC). The proposed method not only maximizes the classification accuracy of keywords on the closed training set, but also maximizes the AUC score for optimizing the performance of non-keyword segments detection. Experimental results on the Google Speech Commands dataset v1 and v2 show that our method achieves new state-of-the-art performance in terms of most evaluation metrics.

Submitted 13 July, 2021; originally announced July 2021.

Comments: submitted to ASRU2021

arXiv:2107.00178 [pdf, other]

Attention-based multi-channel speaker verification with ad-hoc microphone arrays

Authors: Chengdong Liang, Junqi Chen, Shanzheng Guan, Xiao-Lei Zhang

Abstract: Recently, ad-hoc microphone arrays have been widely studied. Unlike traditional microphone array settings, the spatial arrangement and number of microphones of ad-hoc microphone arrays are not known in advance, which hinders the adaptation of traditional speaker verification technologies to ad-hoc microphone arrays. To overcome this weakness, in this paper, we propose attention-based multi-channel speaker verification with ad-hoc microphone arrays. Specifically, we add an inter-channel processing layer and a global fusion layer after the pooling layer of a single-channel speaker verification system. The inter-channel processing layer applies a so-called residual self-attention along the channel dimension for allocating weights to different microphones. The global fusion layer integrates all channels in a way that is independent of the number of the input channels. We further replace the softmax operator in the residual self-attention with sparsemax, which forces the channel weights of very noisy channels to zero. Experimental results with ad-hoc microphone arrays of over 30 channels demonstrate the effectiveness of the proposed methods. For example, the multi-channel speaker verification with sparsemax achieves an equal error rate (EER) of over 20% lower than oracle one-best system on semi-real data sets, and over 30% lower on simulation data sets, in test scenarios with both matched and mismatched channel numbers.
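The abstract above notes that sparsemax, unlike softmax, can assign exactly zero weight to some channels. As a minimal sketch of why, here is the standard sparsemax projection onto the probability simplex in plain Python. This is not the authors' implementation; the function name `sparsemax` and the example scores are illustrative assumptions.

```python
def sparsemax(scores):
    """Standard sparsemax: Euclidean projection of scores onto the simplex.

    Illustrative sketch only, not code from the listed paper.
    """
    ordered = sorted(scores, reverse=True)
    running_sum = 0.0
    threshold = 0.0
    for j, value in enumerate(ordered, start=1):
        running_sum += value
        # Entry j stays in the support while 1 + j * z_(j) > sum of the top j.
        if 1 + j * value > running_sum:
            threshold = (running_sum - 1) / j
    # Scores at or below the threshold receive exactly zero weight.
    return [max(s - threshold, 0.0) for s in scores]


# Hypothetical attention scores for three channels; the two weaker ones drop out.
print(sparsemax([2.0, 1.0, 0.1]))  # prints [1.0, 0.0, 0.0]
```

With softmax the same scores would give every channel a strictly positive weight, which is the contrast the abstract draws for very noisy channels.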

Submitted 30 June, 2021; originally announced July 2021.

Comments: Submitted to APSIPA ASC 2021

arXiv:2105.05234 [pdf, other]

A Spectral Representation of Power Systems with Applications to Adaptive Grid Partitioning and Cascading Failure Localization

Authors: Alessandro Zocca, Chen Liang, Linqi Guo, Steven H. Low, Adam Wierman

Abstract: Transmission line failures in power systems propagate and cascade non-locally. This well-known yet counter-intuitive feature makes it even more challenging to optimally and reliably operate these complex networks. In this work we present a comprehensive framework based on spectral graph theory that fully and rigorously captures how multiple simultaneous line failures propagate, distinguishing between non-cut and cut set outages. Using this spectral representation of power systems, we identify the crucial graph sub-structure that ensures line failure localization -- the network bridge-block decomposition. Leveraging this theory, we propose an adaptive network topology reconfiguration paradigm that uses a two-stage algorithm where the first stage aims to identify optimal clusters using the notion of network modularity and the second stage refines the clusters by means of optimal line switching actions. Our proposed methodology is illustrated using extensive numerical examples on standard IEEE networks and we discuss several extensions and variants of the proposed algorithm.

Submitted 11 May, 2021; originally announced May 2021.

Comments: 45 pages, 7 figures

arXiv:2103.15722

Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention

Authors: Chengdong Liang, Menglong Xu, Xiao-Lei Zhang

Abstract: Self-attention (SA), which encodes vector sequences according to their pairwise similarity, is widely used in speech recognition due to its strong context modeling ability. However, when applied to long sequence data, its accuracy is reduced. This is caused by the fact that its weighted average operator may lead to the dispersion of the attention distribution, which results in the relationship between adjacent signals being ignored. To address this issue, in this paper, we introduce relative-position-awareness self-attention (RPSA). It not only maintains the global-range dependency modeling ability of self-attention, but also improves the localness modeling ability. Because the local window length of the original RPSA is fixed and sensitive to different test data, here we propose Gaussian-based self-attention (GSA) whose window length is learnable and adaptive to the test data automatically. We further generalize GSA to a new residual Gaussian self-attention (resGSA) for the performance improvement. We apply RPSA, GSA, and resGSA to Transformer-based speech recognition respectively. Experimental results on the AISHELL-1 Mandarin speech recognition corpus demonstrate the effectiveness of the proposed methods. For example, the resGSA-Transformer achieves a character error rate (CER) of 5.86% on the test set, which is a relative 7.8% lower than that of the SA-Transformer. Although the performance of the proposed resGSA-Transformer is only slightly better than that of the RPSA-Transformer, it does not have to tune the window length manually.

Submitted 8 October, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: There is an error in the description of section 3.2.1

arXiv:2102.10260 [pdf, other]

Wireless sensor network for in situ soil moisture monitoring

Authors: Jianing Fang, Chuheng Hu, Nour Smaoui, Doug Carlson, Jayant Gupchup, Razvan Musaloiu-E., Chieh-Jan Mike Liang, Marcus Chang, Omprakash Gnawali, Tamas Budavari, Andreas Terzis, Katalin Szlavecz, Alexander S. Szalay

Abstract: We discuss the history and lessons learned from a series of deployments of environmental sensors measuring soil parameters and CO2 fluxes over the last fifteen years, in an outdoor environment. We present the hardware and software architecture of our current Gen-3 system, and then discuss how we are simplifying the user facing part of the software, to make it easier and friendlier for the environmental scientist to be in full control of the system. Finally, we describe the current effort to build a large-scale Gen-4 sensing platform consisting of hundreds of nodes to track the environmental parameters for urban green spaces in Baltimore, Maryland.

Submitted 20 February, 2021; originally announced February 2021.

Comments: 12 pages, 16 figures, Sensornets 2021 Conference

arXiv:2012.04264 [pdf, other]

Raw Image Deblurring

Authors: Chih-Hung Liang, Yu-An Chen, Yueh-Cheng Liu, Winston H. Hsu

Abstract: Deep learning-based blind image deblurring plays an essential role in solving image blur since all existing kernels are limited in modeling the real world blur. Thus far, researchers focus on powerful models to handle the deblurring problem and achieve decent results. For this work, in a new aspect, we discover the great opportunity for image enhancement (e.g., deblurring) directly from RAW images and investigate novel neural network structures benefiting RAW-based learning. However, to the best of our knowledge, there is no available RAW image deblurring dataset. Therefore, we built a new dataset containing both RAW images and processed sRGB images and designed a new model to utilize the unique characteristics of RAW images. The proposed deblurring model, trained solely from RAW images, achieves state-of-the-art performance and outperforms those trained on processed sRGB images. Furthermore, with fine-tuning, the proposed model, trained on our new dataset, can generalize to other sensors. Additionally, by a series of experiments, we demonstrate that existing deblurring models can also be improved by training on the RAW images in our new dataset. Ultimately, we show a new avenue for further opportunities based on the devised novel raw-based deblurring method and the brand-new Deblur-RAW dataset.

Submitted 8 December, 2020; originally announced December 2020.

Comments: IEEE Transactions on Multimedia

arXiv:2005.11320 [pdf, ps, other]

doi 10.1109/TPWRS.2021.3068048

Line Failure Localization of Power Networks Part II: Cut Set Outages

Authors: Linqi Guo, Chen Liang, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Transmission line failures in power systems propagate non-locally, making the control of the resulting outages extremely difficult. In Part II of this paper, we continue the study of line failure localizability in transmission networks and characterize the impact of cut set outages. We establish a Simple Path Criterion, showing that the propagation pattern due to bridge outages, a special case of cut set failures, is fully determined by the positions in the network of the buses that participate in load balancing. We then extend our results to general cut set outages. In contrast to non-cut outages discussed in Part I whose subsequent line failures are contained within the original blocks, cut set outages typically impact the whole network, affecting the power flows on all remaining lines. We corroborate our analytical results in both parts using the IEEE 118-bus test system, in which the failure propagation patterns exhibit a clear block-diagonal structure predicted by our theory, even when using full AC power flow equations.

Submitted 23 April, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: arXiv admin note: text overlap with arXiv:1803.08551

arXiv:2005.11319 [pdf, other]

Adaptive Network Response to Line Failures in Power Systems

Authors: Chen Liang, Linqi Guo, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Transmission line failures in power systems propagate and cascade non-locally. In this work, we propose an adaptive control strategy that offers strong guarantees in both the mitigation and localization of line failures. Specifically, we leverage the properties of network bridge-block decomposition and a frequency regulation method called the unified control. If the balancing areas over which the unified control operates coincide with the bridge-blocks of the network, the proposed strategy drives the post-contingency system to a steady state where the impact of initial line outages is localized within the areas where they occurred whenever possible, stopping the cascading process. When the initial line outages cannot be localized, the proposed control strategy provides a configurable design that progressively involves and coordinates more balancing areas. We compare the proposed control strategy with the classical Automatic Generation Control (AGC) on the IEEE 118-bus and 2736-bus test networks. Simulation results show that our strategy greatly improves overall reliability in terms of the N-k security standard, and localizes the impact of initial failures in the majority of the simulated contingencies. Moreover, the proposed framework incurs significantly less load loss, if any, compared to AGC, in all our case studies.

Submitted 12 May, 2022; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: Accepted to IEEE Transactions on Control of Network Systems. arXiv admin note: text overlap with arXiv:1904.05461

arXiv:2005.10199 [pdf, ps, other]

doi 10.1109/TPWRS.2021.3066336

Line Failure Localization of Power Networks Part I: Non-cut Outages

Authors: Linqi Guo, Chen Liang, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Transmission line failures in power systems propagate non-locally, making the control of the resulting outages extremely difficult. In this work, we establish a mathematical theory that characterizes the patterns of line failure propagation and localization in terms of network graph structure. It provides a novel perspective on distribution factors that precisely captures Kirchhoff's Law in terms of topological structures. Our results show that the distribution of specific collections of subtrees of the transmission network plays a critical role in the patterns of power redistribution, and they motivate the block decomposition of the transmission network as a structure for understanding long-distance propagation of disturbances. In Part I of this paper, we present the case when the post-contingency network remains connected after an initial set of lines is disconnected simultaneously. In Part II, we present the case when an outage separates the network into multiple islands.

Submitted 23 April, 2021; v1 submitted 20 May, 2020; originally announced May 2020.

arXiv:2004.10401 [pdf, other]

An Integrated Approach for Failure Mitigation & Localization in Power Systems

Authors: Chen Liang, Linqi Guo, Alessandro Zocca, Shuyue Yu, Steven H. Low, Adam Wierman

Abstract: The transmission grid is often comprised of several control areas that are connected by multiple tie lines in a mesh structure for reliability. It is also well-known that line failures can propagate non-locally and redundancy can exacerbate cascading. In this paper, we propose an integrated approach to grid reliability that (i) judiciously switches off a small number of tie lines so that the control areas are connected in a tree structure; and (ii) leverages a unified frequency control paradigm to provide congestion management in real time. Even though the proposed topology reduces redundancy, the integration of tree structure at regional level and real-time congestion management can provide stronger guarantees on failure localization and mitigation. We illustrate our approach on the IEEE 39-bus network and evaluate its performance on the IEEE 118-bus, 179-bus, 200-bus and 240-bus networks with various network congestion conditions. Simulations show that, compared with the traditional approach, our approach not only prevents load shedding in more failure scenarios, but also incurs smaller amounts of load loss in scenarios where load shedding is inevitable. Moreover, generators under our approach adjust their operations more actively and efficiently in a local manner.

Submitted 22 April, 2020; originally announced April 2020.

Comments: Accepted to the 21st Power Systems Computation Conference (PSCC 2020)

arXiv:2001.04198 [pdf, ps, other]

Predefined-time Terminal Sliding Mode Control of Robot Manipulators

Authors: Chang-Duo Liang, Ming-Feng Ge, Zhi-Wei Liu, Yan-Wu Wang, Hamid Reza Karimi

Abstract: In this paper, we present a new terminal sliding mode control to achieve predefined-time stability of robot manipulators. The proposed control is developed based on a novel predefined-time terminal sliding mode (PTSM) surface, on which the states are forced to reach the origin in a predefined time, i.e., the settling time is independent of the initial condition and can be explicitly set by the user by adjusting specific parameters called the predefined-time parameters. It is also demonstrated that the proposed control can provide satisfactory steady-state performance in the presence of both external disturbances and parametric uncertainties. In addition, we present a formal systematic analysis method to derive sufficient conditions for guaranteeing the predefined-time convergence of the closed-loop system. Finally, the effectiveness and performance of the presented control scheme are illustrated through both theoretical comparisons and numerical simulations.

Submitted 25 April, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

Comments: 10 pages, 9 figures, This draft is not intended for publication

arXiv:1912.12825 [pdf, other]

Neural Architecture Search on Acoustic Scene Classification

Authors: Jixiang Li, Chuming Liang, Bo Zhang, Zhao Wang, Fei Xiang, Xiangxiang Chu

Abstract: Convolutional neural networks are widely adopted in Acoustic Scene Classification (ASC) tasks, but they generally carry a heavy computational burden. In this work, we propose a lightweight yet high-performing baseline network inspired by MobileNetV2, which replaces square convolutional kernels with unidirectional ones to extract features alternately in temporal and frequency dimensions. Furthermore, we explore a dynamic architecture space built on the basis of the proposed baseline with the recent Neural Architecture Search (NAS) paradigm, which first trains a supernet that incorporates all candidate networks and then applies a well-known evolutionary algorithm NSGA-II to discover more efficient networks with higher accuracy and lower computational cost. Experimental results demonstrate that our searched network is competent in ASC tasks, which achieves 90.3% F1-score on the DCASE2018 task 5 evaluation set, marking a new state-of-the-art performance while saving 25% of FLOPs compared to our baseline network.

Submitted 5 August, 2020; v1 submitted 30 December, 2019; originally announced December 2019.

Comments: Accepted to Interspeech 2020

arXiv:1905.03277 [pdf, other]

doi 10.1145/3306346.3323024

Handheld Multi-Frame Super-Resolution

Authors: Bartlomiej Wronski, Ignacio Garcia-Dorado, Manfred Ernst, Damien Kelly, Michael Krainin, Chia-Kai Liang, Marc Levoy, Peyman Milanfar

Abstract: Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost signal-to-noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google's flagship phone.

Submitted 16 February, 2021; v1 submitted 8 May, 2019; originally announced May 2019.

Comments: 24 pages, accepted to Siggraph 2019 Technical Papers program

arXiv:1904.06892 [pdf]

doi 10.1109/ACCESS.2019.2909579

Learning to Guide: Guidance Law Based on Deep Meta-learning and Model Predictive Path Integral Control

Authors: Chen Liang, Weihong Wang, Zhenghua Liu, Chao Lai, Benchun Zhou

Abstract: In this paper, we present a novel guidance scheme based on a model-based deep reinforcement learning (RL) technique. With the model-based deep RL method, a deep neural network is trained as a predictive model of the guidance dynamics and is incorporated into a model predictive path integral (MPPI) control framework. However, the traditional MPPI framework assumes that the actual environment is similar to the training dataset for the deep neural network, which is unrealistic in practice given different target maneuvers, other perturbations, and actuator failures. To address this problem, our method utilizes a meta-learning technique to make the deep neural dynamics model adapt to such changes online. With this approach we can alleviate the performance deterioration of standard MPPI control caused by the difference between the actual environment and the training data. Then, a novel guidance law for a varying-velocity interceptor intercepting a maneuvering target with a desired terminal impact angle under actuator failure is constructed based on the aforementioned techniques. Simulation and experiment results under different cases show the effectiveness and robustness of the proposed guidance law in achieving successful interceptions of maneuvering targets.

Submitted 15 April, 2019; originally announced April 2019.

Comments: Published in IEEE Access, 2019. Code available at https://github.com/tccliangchen/deep_meta-learning_guidance_law

arXiv:1904.05461 [pdf, ps, other]

Less is More: Real-time Failure Localization in Power Systems

Authors: Linqi Guo, Chen Liang, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Cascading failures in power systems exhibit non-local propagation patterns which make the analysis and mitigation of failures difficult. In this work, we propose a distributed control framework inspired by the recently proposed concepts of unified controller and network tree-partition that offers strong guarantees in both the mitigation and localization of cascading failures in power systems. In this framework, the transmission network is partitioned into several control areas which are connected in a tree structure, and the unified controller is adopted by generators or controllable loads for fast timescale disturbance response. After an initial failure, the proposed strategy always prevents successive failures from happening, and regulates the system to the desired steady state where the impact of initial failures is localized as much as possible. For extreme failures that cannot be localized, the proposed framework has a configurable design that progressively involves and coordinates more control areas for failure mitigation and, as a last resort, imposes minimal load shedding. We compare the proposed control framework with Automatic Generation Control (AGC) on the IEEE 118-bus test system. Simulation results show that our novel framework greatly improves the system robustness in terms of the N-1 security standard, and localizes the impact of initial failures in the majority of the load profiles that are examined. Moreover, the proposed framework incurs significantly less load loss, if any, compared to AGC, in all of our case studies.

Submitted 18 April, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

arXiv:1901.02069 [pdf]

Microwave Integrated Circuits Design with Relational Induction Neural Network

Authors: Jie Liu, Zhi-Xi Chen, Wen-Hui Dong, Xiao Wang, Jia Shi, Hong-Liang Teng, Xi-Wang Dai, Stephen S. -T. Yau, Chang-Hong Liang, **-Fa Feng

Abstract: The automated design of microwave integrated circuits (MWIC) has long been viewed as a fundamental challenge for artificial intelligence, owing to a solution space and structural complexity larger than those of Go. Here, we developed a novel artificial agent, termed Relational Induction Neural Network, that can carry out automated design of MWIC and avoid brute-force computation over every possible solution, which is a significant breakthrough in the field of electronics. In experiments on microwave transmission line circuit, filter circuit and antenna circuit design tasks, strongly competitive results are obtained in each case. Compared with the traditional reinforcement learning method, the learning curve shows that the proposed architecture is able to quickly converge to the pre-designed MWIC model, and the convergence rate is up to four orders of magnitude. This is the first study to show that an agent can, through training or learning, automatically induce the relationships between MWIC structures without incorporating any additional prior knowledge. Notably, the relationships can be explained in terms of MWIC theory and electromagnetic field distribution. Our work bridges the divide between artificial intelligence and MWIC and can extend to mechanical waves, mechanics and other related fields.

Submitted 3 January, 2019; originally announced January 2019.

arXiv:1811.08111 [pdf, other]

doi 10.1109/ICASSP.2019.8682380

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

Authors: Jing-Xuan Zhang, Zhen-Hua Ling, Yuan Jiang, Li-Juan Liu, Chen Liang, Li-Rong Dai

Abstract: This paper presents methods of making use of text supervision to improve the performance of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame-to-frame voice conversion approaches, the seq2seq acoustic modeling method proposed in our previous work achieved higher naturalness and similarity. In this paper, we further improve its performance by utilizing the text transcriptions of parallel training data. First, a multi-task learning structure is designed which adds auxiliary classifiers to the middle layers of the seq2seq model and predicts linguistic labels as a secondary task. Second, a data-augmentation method is proposed which utilizes text alignment to produce extra parallel sequences for model training. Experiments are conducted to evaluate our proposed method with training sets of different sizes. Experimental results show that multi-task learning with linguistic labels is effective at reducing the errors of seq2seq voice conversion. The data-augmentation method can further improve the performance of seq2seq voice conversion when only 50 or 100 training utterances are available.

Submitted 20 November, 2018; originally announced November 2018.

Comments: 5 pages, 4 figures, 2 tables. Submitted to IEEE ICASSP 2019

Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (2019) 6785-6789

arXiv:1803.08551 [pdf, other]

Failure Localization in Power Systems via Tree Partitions

Authors: Linqi Guo, Chen Liang, Alessandro Zocca, Steven H. Low, Adam Wierman

Abstract: Cascading failures in power systems propagate non-locally, making the control and mitigation of outages extremely hard. In this work, we use the emerging concept of the tree partition of transmission networks to provide an analytical characterization of line failure localizability in transmission systems. Our results rigorously establish the well-perceived intuition in the power community that failures cannot cross bridges, and reveal a finer-grained concept that encodes more precise information on failure propagation within tree-partition regions. Specifically, when a non-bridge line is tripped, the impact of this failure only propagates within well-defined components, which we refer to as cells, of the tree partition defined by the bridges. In contrast, when a bridge line is tripped, the impact of this failure propagates globally across the network, affecting the power flow on all remaining transmission lines. This characterization suggests that it is possible to improve the system robustness by temporarily switching off certain transmission lines, so as to create more, smaller components in the tree partition, thus spatially localizing line failures and making the grid less vulnerable to large-scale outages. We illustrate this approach using the IEEE 118-bus test system and demonstrate that switching off a negligible portion of transmission lines allows the impact of line failures to be significantly more localized without substantial changes in line congestion.

Submitted 16 August, 2018; v1 submitted 22 March, 2018; originally announced March 2018.

Showing 1–43 of 43 results for author: Liang, C