-
Speech-based Clinical Depression Screening: An Empirical Study
Authors:
Yangbin Chen,
Chenyang Xu,
Chunfeng Liang,
Yanbao Tao,
Chuan Shi
Abstract:
This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios, including psychiatric interviews, chatbot conversations, and text readings. Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital and control group members from the community, all diagnosed by psychiatrists following standardized diagnostic protocols. We extracted acoustic and deep speech features from each participant's segmented recordings. Classifications were made using neural networks or SVMs, with aggregated clip outcomes determining final assessments. Our analysis across interaction scenarios, speech processing techniques, and feature types confirms speech as a crucial marker for depression screening. Specifically, human-computer interaction matches clinical interview efficacy, surpassing reading tasks. Segment duration and quantity significantly affect model performance, with deep speech features substantially outperforming traditional acoustic features.
Submitted 12 June, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
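The abstract above says that aggregated clip outcomes determine the final assessment but does not name the aggregation rule. As a minimal sketch, assuming a simple majority vote over binary clip-level predictions (the function name `aggregate_clips` and the tie-breaking rule are hypothetical, not taken from the paper), the step could look like this:

```python
from typing import Sequence


def aggregate_clips(clip_labels: Sequence[int]) -> int:
    """Combine per-clip binary predictions (1 = depressed, 0 = control)
    into one participant-level label by majority vote.

    Assumption: ties are resolved toward 1, so a borderline participant
    is flagged for follow-up rather than missed.
    """
    if not clip_labels:
        raise ValueError("at least one clip prediction is required")
    positives = sum(1 for label in clip_labels if label == 1)
    # Majority vote: positive if at least half of the clips are positive.
    return 1 if 2 * positives >= len(clip_labels) else 0


if __name__ == "__main__":
    print(aggregate_clips([1, 0, 1, 1, 0]))
    print(aggregate_clips([0, 0, 1]))
```

Other rules, such as averaging clip-level probabilities before thresholding, would fit the same description equally well.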
-
Generative AI-driven Semantic Communication Networks: Architecture, Technologies and Applications
Authors:
Chengsi Liang,
Hongyang Du,
Yao Sun,
Dusit Niyato,
Jiawen Kang,
Dezong Zhao,
Muhammad Ali Imran
Abstract:
Generative artificial intelligence (GAI) has emerged as a rapidly burgeoning field demonstrating significant potential in creating diverse content intelligently and automatically. To support such artificial intelligence-generated content (AIGC) services, future communication systems should fulfill much more stringent requirements (including data rate, throughput, latency, etc.) with limited yet precious spectrum resources. To tackle this challenge, semantic communication (SemCom), which dramatically reduces resource consumption by extracting and transmitting semantics, has been deemed a revolutionary communication scheme. Advanced GAI algorithms provide SemCom with sophisticated intelligence for model training, knowledge base construction and channel adaptation. Furthermore, GAI algorithms also play an important role in the management of SemCom networks. In this survey, we first overview the basics of GAI and SemCom as well as the synergies of the two technologies. In particular, the GAI-driven SemCom framework is presented, where many GAI models for information creation, SemCom-enabled information transmission and information effectiveness for AIGC are discussed separately. We then delve into GAI-driven SemCom network management, involving novel management layers, knowledge management, and resource allocation. Finally, we envision several promising use cases, i.e., autonomous driving, smart city, and the Metaverse, for a more comprehensive exploration.
Submitted 7 January, 2024; v1 submitted 29 December, 2023;
originally announced January 2024.
-
A Multi-day Needs-based Modeling Approach for Activity and Travel Demand Analysis
Authors:
Kexin Chen,
Jinping Guan,
Ravi Seshadri,
Varun Pattabhiraman,
Youssef Medhat Aboutaleb,
Ali Shamshiripour,
Chen Liang,
Xiaochun Zhang,
Moshe Ben-Akiva
Abstract:
This paper proposes a multi-day needs-based model for activity and travel demand analysis. The model captures the multi-day dynamics in activity generation, which enables the modeling of activities with increased flexibility in time and space (e.g., e-commerce and remote working). As an enhancement to activity-based models, the proposed model captures the underlying decision-making process of activity generation by accounting for psychological needs as the drivers of activities. The level of need satisfaction is modeled as a psychological inventory, whose utility is optimized via decisions on activity participation, location, and duration. The utility includes both the benefit in the inventory gained and the cost in time, monetary expense as well as maintenance of safety stock. The model includes two sub-models, a Deterministic Model that optimizes the utility of the inventory, and an Empirical Model that accounts for heterogeneity and stochasticity. Numerical experiments are conducted to demonstrate model scalability. A maximum likelihood estimator is proposed, the properties of the log-likelihood function are examined and the recovery of true parameters is tested. This research contributes to the literature on transportation demand models in the following three aspects. First, it is arguably better grounded in psychological theory than traditional models and allows the generation of activity patterns to be policy-sensitive (while avoiding the need for ad hoc utility definitions). Second, it contributes to the development of needs-based models with a non-myopic approach to model multi-day activity patterns. Third, it proposes a tractable model formulation via problem reformulation and computational enhancements, which allows for maximum likelihood parameter estimation.
Submitted 23 December, 2023;
originally announced December 2023.
-
Generative AI for Semantic Communication: Architecture, Challenges, and Outlook
Authors:
Le Xia,
Yao Sun,
Chengsi Liang,
Lei Zhang,
Muhammad Ali Imran,
Dusit Niyato
Abstract:
Semantic communication (SemCom) is expected to be a core paradigm in future communication networks, yielding significant benefits in terms of spectrum resource saving and information interaction efficiency. However, the existing SemCom structure is limited by the lack of context-reasoning ability and background knowledge provisioning, which, therefore, motivates us to seek the potential of incorporating generative artificial intelligence (GAI) technologies with SemCom. Recognizing GAI's powerful capability in automating and creating valuable, diverse, and personalized multimodal content, this article first highlights the principal characteristics of the combination of GAI and SemCom along with their pertinent benefits and challenges. To tackle these challenges, we further propose a novel GAI-assisted SemCom network (GAI-SCN) framework in a cloud-edge-mobile design. Specifically, by employing global and local GAI models, our GAI-SCN enables multimodal semantic content provisioning, semantic-level joint-source-channel coding, and AIGC acquisition to maximize the efficiency and reliability of semantic reasoning and resource utilization. Afterward, we present a detailed implementation workflow of GAI-SCN, followed by corresponding initial simulations for performance evaluation in comparison with two benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of GAI-SCN.
Submitted 18 January, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Spatial-temporal Graph Based Multi-channel Speaker Verification With Ad-hoc Microphone Arrays
Authors:
Yijiang Chen,
Chengdong Liang,
Xiao-Lei Zhang
Abstract:
The performance of speaker verification degrades significantly in adverse acoustic environments with strong reverberation and noise. To address this issue, this paper proposes a spatial-temporal graph convolutional network (GCN) method for multi-channel speaker verification with ad-hoc microphone arrays. It includes a feature aggregation block and a channel selection block, both of which are built on graphs. The feature aggregation block fuses speaker features across different times and channels with a spatial-temporal GCN. The graph-based channel selection block discards the noisy channels that may contribute negatively to the system. The proposed method is flexible in incorporating various kinds of graphs and prior knowledge. We compared the proposed method with six representative methods in both real-world and simulated environments. Experimental results show that the proposed method achieves a relative equal error rate (EER) reduction of $\mathbf{15.39\%}$ over the strongest referenced method on the simulated datasets, and of $\mathbf{17.70\%}$ on the real datasets. Moreover, its performance is robust across different signal-to-noise ratios and reverberation times.
Submitted 3 July, 2023;
originally announced July 2023.
-
Wespeaker baselines for VoxSRC2023
Authors:
Shuai Wang,
Chengdong Liang,
Xu Xiang,
Bing Han,
Zhengyang Chen,
Hongji Wang,
Wen Ding
Abstract:
This report showcases the results achieved using the wespeaker toolkit for the VoxSRC2023 Challenge. Our aim is to provide participants, especially those with limited experience, with clear and straightforward guidelines to develop their initial systems. Via well-structured recipes and strong results, we hope to offer an accessible and good-enough starting point for all interested individuals. In this report, we describe the results achieved on the VoxSRC2023 dev set using the pretrained models; results on the evaluation set can be checked on the CodaLab evaluation server.
Submitted 28 June, 2023; v1 submitted 26 June, 2023;
originally announced June 2023.
-
Optimization design of a micro-perforated panel absorber with 8.6 octave bands
Authors:
Xiaoming Wang,
Chen Liang,
Yulin Mei
Abstract:
In order to improve the low-frequency characteristics of micro-perforated panel absorbers, sound absorption structures composed of micro-perforated panels and expansion chambers are designed, and an optimization design method is constructed based on the transfer function model and the simulated annealing algorithm. First, a single-chamber structure composed of a micro-perforated panel and an expansion chamber is built, and the sound absorption curve is simulated by the finite element method. Second, to enlarge the continuous absorption bandwidth with absorption coefficients not less than 0.8, a three-chamber structure is designed, which has a sound absorption bandwidth of 1277 Hz (27-1304 Hz) covering 5.6 octave bands. Then, the transfer function model of the structure is established, and a series of theoretical formulae are derived to calculate the absorption coefficients. Subsequently, the sound absorption bandwidths calculated by the theoretical formulae and the finite element method are compared, and the relative error is 3.68%. Finally, an optimization design method is constructed by combining the transfer function model and the simulated annealing algorithm, where the optimization objective is to maximize the absorption bandwidth and the optimization variables are the structural parameters of the three-chamber structure. The results show that, after optimization, the three-chamber structure exhibits excellent sound absorption performance, with a continuous bandwidth of 1591 Hz (4-1595 Hz), realizing 8.6 octave bands.
Submitted 23 April, 2023;
originally announced May 2023.
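The octave figures quoted in the abstract above can be checked with the standard definition of an octave span, log2(f_high / f_low). The short sketch below applies it to the two frequency ranges stated in the abstract; the function name `octave_span` is illustrative only, and the assumption is that the authors count octaves this way.

```python
import math


def octave_span(f_low_hz: float, f_high_hz: float) -> float:
    """Number of octaves between two frequencies: log2(f_high / f_low)."""
    if f_low_hz <= 0 or f_high_hz <= f_low_hz:
        raise ValueError("require 0 < f_low_hz < f_high_hz")
    return math.log2(f_high_hz / f_low_hz)


if __name__ == "__main__":
    # Three-chamber structure before optimization: 27-1304 Hz.
    print(round(octave_span(27, 1304), 1))  # 5.6
    # Three-chamber structure after optimization: 4-1595 Hz.
    print(round(octave_span(4, 1595), 1))   # 8.6
```

Both values agree with the 5.6 and 8.6 octave bands reported in the abstract.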
-
Flexible Spectrum Orchestration of Carrier Aggregation for 5G-Advanced
Authors:
Xianghui Han,
Chunli Liang,
Ruiqi Liu,
Xingguang Wei,
Mengzhu Chen,
Yu-Ngok Ruyue Li,
Shi Jin
Abstract:
With increasing availability of spectrum in the market due to new spectrum allocation and re-farming bands from previous cellular generation networks, a more flexible, efficient and green usage of the spectrum becomes an important topic in 5G-Advanced. In this article, we provide an overview on the 3rd Generation Partnership Project (3GPP) work on flexible spectrum orchestration for carrier aggregation (CA). The configuration settings, requirements and potential specification impacts are analyzed. Some involved Release 18 techniques, such as multi-cell scheduling, transmitter switching and network energy saving, are also presented. Evaluation results show that clear performance gain can be achieved by these techniques.
Submitted 25 May, 2023;
originally announced May 2023.
-
Oral-3Dv2: 3D Oral Reconstruction from Panoramic X-Ray Imaging with Implicit Neural Representation
Authors:
Weinan Song,
Haoxin Zheng,
Dezhan Tu,
Chengwen Liang,
Lei He
Abstract:
3D reconstruction of medical imaging from 2D images has become an increasingly interesting topic with the development of deep learning models in recent years. Previous studies in 3D reconstruction from limited X-ray images mainly rely on learning from paired 2D and 3D images, where the reconstruction quality relies on the scale and variation of collected data. This has brought significant challenges in the collection of training data, as only a tiny fraction of patients take two types of radiation examinations in the same period. Although simulation from higher-dimension images could solve this problem, the variance between real and simulated data could bring great uncertainty at the same time. In oral reconstruction, the situation becomes more challenging as only a single panoramic X-ray image is available, where models need to infer the curved shape by prior individual knowledge. To overcome these limitations, we propose Oral-3Dv2 to solve this cross-dimension translation problem in dental healthcare by learning solely on projection information, i.e., the projection image and trajectory of the X-ray tube. Our model learns to represent the 3D oral structure in an implicit way by mapping 2D coordinates into density values of voxels in the 3D space. To improve efficiency and effectiveness, we utilize a multi-head model that predicts a bunch of voxel values in 3D space simultaneously from a 2D coordinate in the axial plane and the dynamic sampling strategy to refine details of the density distribution in the reconstruction result. Extensive experiments in simulated and real data show that our model significantly outperforms existing state-of-the-art models without learning from paired images or prior individual knowledge. To the best of our knowledge, this is the first work of a non-adversarial-learning-based model in 3D radiology reconstruction from a single panoramic X-ray image.
Submitted 3 September, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
A VHetNet-Enabled Asynchronous Federated Learning-Based Anomaly Detection Framework for Ubiquitous IoT
Authors:
Weili Wang,
Omid Abbasi,
Halim Yanikomeroglu,
Chengchao Liang,
Lun Tang,
Qianbin Chen
Abstract:
Anomaly detection for the Internet of Things (IoT) is a major intelligent service required by many fields, including intrusion detection, device-activity analysis, and security supervision. However, the heterogeneous distribution of data and resource-constrained end nodes present challenges for existing anomaly detection models. Due to the advantages of flexible deployment and multi-dimensional resources, high altitude platform stations (HAPSs) and unmanned aerial vehicles (UAVs), which are important components of vertical heterogeneous networks (VHetNets), have significant potential for sensing, computing, storage, and communication applications in ubiquitous IoT systems. In this paper, we propose a novel VHetNet-enabled asynchronous federated learning (AFL) framework to enable decentralized UAVs to collaboratively train a global anomaly detection model. In the VHetNet-enabled AFL framework, a HAPS operates as a central aerial server, and the local models trained in UAVs are uploaded to the HAPS for global aggregation due to its wide coverage and strong storage and computation capabilities. We introduce a UAV selection strategy into the AFL framework to prevent UAVs with low local model quality and large energy consumption from affecting the learning efficiency and detection accuracy of the global model. To ensure the security of transmissions between UAVs and the HAPS, we add designed noise to local model parameters in UAVs to achieve differential privacy. Moreover, we propose a compound-action actor-critic (CA2C)-based joint device association, UAV selection, and UAV trajectory planning algorithm to further enhance the overall federated execution efficiency and detection model accuracy. Extensive experimental evaluation on a real-world dataset demonstrates that the proposed algorithm can achieve high detection accuracy with short federated execution time and low energy consumption.
Submitted 6 March, 2023;
originally announced March 2023.
-
WiserVR: Semantic Communication Enabled Wireless Virtual Reality Delivery
Authors:
Le Xia,
Yao Sun,
Chengsi Liang,
Daquan Feng,
Runze Cheng,
Yang Yang,
Muhammad Ali Imran
Abstract:
Virtual reality (VR) over wireless is expected to be one of the killer applications in next-generation communication networks. Nevertheless, the huge data volume along with stringent requirements on latency and reliability under limited bandwidth resources makes untethered wireless VR delivery increasingly challenging. Such bottlenecks, therefore, motivate this work to seek the potential of using semantic communication, a new paradigm that promises to significantly ease the resource pressure, for efficient VR delivery. To this end, we propose a novel framework, namely WIreless SEmantic deliveRy for VR (WiserVR), for delivering consecutive 360° video frames to VR users. Specifically, deep learning-based multiple modules are well-devised for the transceiver in WiserVR to realize high-performance feature extraction and semantic recovery. Among them, we dedicatedly develop a concept of semantic location graph and leverage the joint-semantic-channel-coding method with knowledge sharing to not only substantially reduce communication latency, but also to guarantee adequate transmission reliability and resilience under various channel states. Moreover, implementation of WiserVR is presented, followed by corresponding initial simulations for performance evaluation compared with benchmarks. Finally, we discuss several open issues and offer feasible solutions to unlock the full potential of WiserVR.
Submitted 13 March, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Fast-U2++: Fast and Accurate End-to-End Speech Recognition in Joint CTC/Attention Frames
Authors:
Chengdong Liang,
Xiao-Lei Zhang,
BinBin Zhang,
Di Wu,
Shengqiang Li,
Xingchen Song,
Zhendong Peng,
Fuping Pan
Abstract:
Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present fast-U2++, an enhanced version of U2++ that further reduces partial latency. The core idea of fast-U2++ is to output partial results from the bottom layers of its encoder with a small chunk, while using a large chunk in the top layers of its encoder to compensate for the performance degradation caused by the small chunk. Moreover, we use a knowledge distillation method to reduce the token emission latency. We present extensive experiments on the Aishell-1 dataset. Experiments and ablation studies show that, compared to U2++, fast-U2++ reduces model latency from 320ms to 80ms, and achieves a character error rate (CER) of 5.06% with a streaming setup.
Submitted 2 November, 2022;
originally announced November 2022.
-
Reconfigurable Intelligent Surface: Power Consumption Modeling and Practical Measurement Validation
Authors:
Jinghe Wang,
Wankai Tang,
Jing Cheng Liang,
Lei Zhang,
Jun Yan Dai,
Xiao Li,
Shi Jin,
Qiang Cheng,
Tie Jun Cui
Abstract:
The reconfigurable intelligent surface (RIS) has received a lot of interest because of its capacity to reconfigure the wireless communication environment in a cost- and energy-efficient way. However, the realistic power consumption modeling and measurement validation of RIS has received far too little attention. Therefore, in this work, we model the power consumption of RIS and conduct measurement validations using various RISs to fill this vacancy. Firstly, we propose a practical power consumption model of RIS. The RIS hardware is divided into three basic parts: the FPGA control board, the drive circuits, and the RIS unit cells. The power consumption of the first two parts is modeled as $P_{\text {static}}$ and that of the last part is modeled as $P_{\text {units}}$. Expressions of $P_{\text {static}}$ and $P_{\text {units}}$ vary amongst different types of RISs. Secondly, we conduct measurements on various RISs to validate the proposed model. Five different RISs including the PIN diode, varactor diode, and RF switch types are measured, and measurement results validate the generality and applicability of the proposed power consumption model of RIS. Finally, we summarize the measurement results and discuss the approaches to achieve the low-power-consumption design of RIS-assisted wireless communication systems.
Submitted 6 February, 2024; v1 submitted 1 November, 2022;
originally announced November 2022.
-
Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit
Authors:
Hongji Wang,
Chengdong Liang,
Shuai Wang,
Zhengyang Chen,
Binbin Zhang,
Xu Xiang,
Yanlei Deng,
Yanmin Qian
Abstract:
Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is fixed-dimensional vector representation, i.e., speaker embedding. This paper introduces a research and production oriented speaker embedding learning toolkit, Wespeaker. Wespeaker contains the implementation of scalable data management, state-of-the-art speaker embedding models, loss functions, and scoring back-ends, with highly competitive results achieved by structured recipes which were adopted in the winning systems in several speaker verification challenges. The application to other downstream tasks such as speaker diarization is also exhibited in the related recipe. Moreover, CPU- and GPU-compatible deployment codes are integrated for production-oriented development. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.
Submitted 1 November, 2022; v1 submitted 30 October, 2022;
originally announced October 2022.
-
Deep Learning Based Stage-wise Two-dimensional Speaker Localization with Large Ad-hoc Microphone Arrays
Authors:
Shupei Liu,
Linfeng Feng,
Yijun Gong,
Chengdong Liang,
Chen Zhang,
Xiao-Lei Zhang,
Xuelong Li
Abstract:
While deep-learning-based speaker localization has shown advantages in challenging acoustic environments, it often yields only direction-of-arrival (DOA) cues rather than precise two-dimensional (2D) coordinates. To address this, we propose a novel deep-learning-based 2D speaker localization method leveraging ad-hoc microphone arrays, where an ad-hoc microphone array is composed of randomly distributed microphone nodes, each of which is equipped with a traditional array. Specifically, we first employ convolutional neural networks at each node to estimate speaker directions. Then, we integrate these DOA estimates using triangulation and clustering techniques to get 2D speaker locations. To further boost the estimation accuracy, we introduce a node selection algorithm that strategically filters the most reliable nodes. Extensive experiments on both simulated and real-world data demonstrate that our approach significantly outperforms conventional methods. The proposed node selection further refines performance. The real-world dataset used in the experiments, named Libri-adhoc-node10, is newly recorded and described for the first time in this paper; it is available online at https://github.com/Liu-sp/Libri-adhoc-nodes10.
Submitted 1 April, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
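The abstract above combines per-node DOA estimates by triangulation. As a rough illustration of that single step, and not of the authors' full pipeline (which also involves clustering and node selection), the sketch below intersects two bearing lines from nodes at known 2D positions. The function name `triangulate`, the angle convention (radians, measured counter-clockwise from the x-axis), and the two-node setup are assumptions made for this example.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def triangulate(p1: Point, theta1: float,
                p2: Point, theta2: float) -> Optional[Point]:
    """Intersect two bearing lines p_i + t_i * (cos theta_i, sin theta_i).

    Returns the intersection point, or None if the bearings are
    (nearly) parallel and no unique intersection exists.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # 2D cross product of the two direction vectors.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 by crossing both sides with d2.
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


if __name__ == "__main__":
    # Node A at the origin hears the speaker at 45 degrees,
    # node B at (2, 0) hears the speaker at 135 degrees.
    print(triangulate((0.0, 0.0), math.radians(45),
                      (2.0, 0.0), math.radians(135)))
```

With more than two nodes, every pair yields one candidate point, which is where a clustering step such as the one the abstract mentions would come in.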
-
CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation
Authors:
Jianwei Lin,
Jiatai Lin,
Cheng Lu,
Hao Chen,
Huan Lin,
Bingchao Zhao,
Zhenwei Shi,
Bingjiang Qiu,
Xipeng Pan,
Zeyan Xu,
Biao Huang,
Changhong Liang,
Guoqiang Han,
Zaiyi Liu,
Chu Han
Abstract:
Brain tumor segmentation (BTS) in magnetic resonance image (MRI) is crucial for brain tumor diagnosis, cancer management and research purposes. With the great success of the ten-year BraTS challenges as well as the advances of CNN and Transformer algorithms, a lot of outstanding BTS models have been proposed to tackle the difficulties of BTS in different technical aspects. However, existing studies hardly consider how to fuse the multi-modality images in a reasonable manner. In this paper, we leverage the clinical knowledge of how radiologists diagnose brain tumors from multiple MRI modalities and propose a clinical knowledge-driven brain tumor segmentation model, called CKD-TransBTS. Instead of directly concatenating all the modalities, we re-organize the input modalities by separating them into two groups according to the imaging principle of MRI. A dual-branch hybrid encoder with the proposed modality-correlated cross-attention block (MCCA) is designed to extract the multi-modality image features. The proposed model inherits the strengths from both Transformer and CNN with the local feature representation ability for precise lesion boundaries and long-range feature extraction for 3D volumetric images. To bridge the gap between Transformer and CNN features, we propose a Trans&CNN Feature Calibration block (TCFC) in the decoder. We compare the proposed model with five CNN-based models and six transformer-based models on the BraTS 2021 challenge dataset. Extensive experiments demonstrate that the proposed model achieves state-of-the-art brain tumor segmentation performance compared with all the competitors.
Submitted 15 July, 2022;
originally announced July 2022.
-
Off-Network Communications For Future Railway Mobile Communication Systems: Challenges and Opportunities
Authors:
Jiewen Hu,
Gang Liu,
Yongbo Li,
Zheng Ma,
Wei Wang,
Chengchao Liang,
F. Richard Yu,
Pingzhi Fan
Abstract:
GSM-R is predicted to be obsolete by 2030, and a suitable successor is needed. Defined by the International Union of Railways (UIC), the Future Railway Mobile Communication System (FRMCS) contains many future use cases with strict requirements. These use cases should ensure regular communication not only within network coverage but also in uncovered scenarios. There is still a lack of standards on off-network communication in FRMCS, so this article focuses on off-network communication and intends to provide reference and direction for standardization. We first provide a comprehensive summary and analysis of off-network use cases in FRMCS. Then we give an overview of existing technologies (GSM-R, TETRA, DMR, LTE-V2X, and NR-V2X) that may support off-network communication. In addition, we simulate and evaluate the performance of existing technologies. Simulation results show that it is possible to satisfy the off-network communication requirements in FRMCS with enhancements based on LTE-V2X or NR-V2X. Finally, we give some future research directions to provide insights for industry and academia.
Submitted 10 August, 2022; v1 submitted 18 June, 2022;
originally announced June 2022.
-
HoVer-Trans: Anatomy-aware HoVer-Transformer for ROI-free Breast Cancer Diagnosis in Ultrasound Images
Authors:
Yuhao Mo,
Chu Han,
Yu Liu,
Min Liu,
Zhenwei Shi,
Jiatai Lin,
Bingchao Zhao,
Chunwang Huang,
Bingjiang Qiu,
Yanfen Cui,
Lei Wu,
Xipeng Pan,
Zeyan Xu,
Xiaomei Huang,
Zaiyi Liu,
Ying Wang,
Changhong Liang
Abstract:
Ultrasonography is an important routine examination for breast cancer diagnosis, due to its non-invasive, radiation-free and low-cost properties. However, the diagnostic accuracy of breast cancer is still limited due to its inherent limitations. It would be a tremendous success if we could precisely diagnose breast cancer from breast ultrasound images (BUS). Many learning-based computer-aided diagnostic methods have been proposed to achieve breast cancer diagnosis/lesion classification. However, most of them require a pre-defined ROI and then classify the lesion inside the ROI. Conventional classification backbones, such as VGG16 and ResNet50, can achieve promising classification results with no ROI requirement. But these models lack interpretability, thus restricting their use in clinical practice. In this study, we propose a novel ROI-free model for breast cancer diagnosis in ultrasound images with interpretable feature representations. We leverage the anatomical prior knowledge that malignant and benign tumors have different spatial relationships between different tissue layers, and propose a HoVer-Transformer to formulate this prior knowledge. The proposed HoVer-Trans block extracts the inter- and intra-layer spatial information horizontally and vertically. We construct and release an open dataset GDPH&SYSUCC for breast cancer diagnosis in BUS. The proposed model is evaluated on three datasets by comparing with four CNN-based models and two vision transformer models via five-fold cross validation. It achieves state-of-the-art classification performance with the best model interpretability. Meanwhile, our proposed model outperforms two senior sonographers on breast cancer diagnosis when only one BUS image is given.
Submitted 15 July, 2022; v1 submitted 17 May, 2022;
originally announced May 2022.
-
Interface Networks for Failure Localization in Power Systems
Authors:
Chen Liang,
Alessandro Zocca,
Steven H. Low,
Adam Wierman
Abstract:
Transmission power systems usually consist of interconnected sub-grids that are operated relatively independently. When a failure happens, it is desirable to localize its impact within the sub-grid where the failure occurs. This paper introduces three interface networks to connect sub-grids, achieving better failure localization while maintaining robust network connectivity. The proposed interface networks are validated with numerical experiments on the IEEE 118-bus test network under both DC and AC power flow models.
Submitted 12 May, 2022;
originally announced May 2022.
-
WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma
Authors:
Chu Han,
Xipeng Pan,
Lixu Yan,
Huan Lin,
Bingbing Li,
Su Yao,
Shanshan Lv,
Zhenwei Shi,
Jinhai Mai,
Jiatai Lin,
Bingchao Zhao,
Zeyan Xu,
Zhizhen Wang,
Yumeng Wang,
Yuan Zhang,
Huihui Wang,
Chao Zhu,
Chunhui Lin,
Lijian Mao,
Min Wu,
Luwen Duan,
Jingsong Zhu,
Dong Hu,
Zijie Fang,
Yang Chen
, et al. (18 additional authors not shown)
Abstract:
Lung cancer is the leading cause of cancer death worldwide, and adenocarcinoma (LUAD) is the most common subtype. Exploiting the potential value of histopathology images can promote precision medicine in oncology. Tissue segmentation is the basic upstream task of histopathology image analysis. Existing deep learning models have achieved superior segmentation performance but require sufficient pixel-level annotations, which are time-consuming and expensive to produce. To enrich the label resources of LUAD and to alleviate the annotation efforts, we organize this challenge, WSSS4LUAD, to call for outstanding weakly-supervised semantic segmentation (WSSS) techniques for histopathology images of LUAD. Participants have to design an algorithm to segment tumor epithelial, tumor-associated stroma and normal tissue with only patch-level labels. This challenge includes 10,091 patch-level annotations (the training set) and over 130 million labeled pixels (the validation and test sets), from 87 WSIs (67 from GDPH, 20 from TCGA). All the labels were generated by a pathologist-in-the-loop pipeline with the help of AI models and checked by the label review board. Among 532 registrations, 28 teams submitted results in the test phase with over 1,000 submissions. Finally, the first place team achieved mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919). According to the technical reports of the top-tier teams, CAM is still the most popular approach in WSSS. Cutmix data augmentation has been widely adopted to generate more reliable samples. With the success of this challenge, we believe that WSSS approaches with patch-level annotations can be a complement to the traditional pixel annotations while reducing the annotation efforts. The entire dataset has been released to encourage more research on computational pathology in LUAD and more novel WSSS techniques.
Submitted 13 April, 2022; v1 submitted 13 April, 2022;
originally announced April 2022.
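The mIoU figure quoted in the WSSS4LUAD abstract above can be illustrated with a short sketch. It assumes the usual definition (per-class intersection over union, averaged without weighting); the abstract does not spell out the challenge's exact evaluation protocol, and the function names and the toy confusion matrix are hypothetical. Under that assumption, the unweighted mean of the three reported per-class values is consistent with the reported 0.8413.

```python
def iou_per_class(confusion):
    """Per-class IoU from a square confusion matrix.

    confusion[t][p] is the number of pixels whose true class is t and whose
    predicted class is p (an assumed layout, chosen for this sketch).
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true otherwise
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(ious):
    """Unweighted mean of per-class IoU values."""
    return sum(ious) / len(ious)


# Per-class IoUs quoted in the abstract (tumor, stroma, normal).
reported = [0.8389, 0.7931, 0.8919]
print(round(mean_iou(reported), 4))  # 0.8413
```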
-
RestainNet: a self-supervised digital re-stainer for stain normalization
Authors:
Bingchao Zhao,
Jiatai Lin,
Changhong Liang,
Zongjian Yi,
Xin Chen,
Bingbing Li,
Weihao Qiu,
Danyi Li,
Li Liang,
Chu Han,
Zaiyi Liu
Abstract:
Color inconsistency is an inevitable challenge in computational pathology, which generally happens because of stain intensity variations or sections scanned by different scanners. It harms pathological image analysis methods, especially learning-based models. A series of approaches have been proposed for stain normalization. However, most of them lack flexibility in practice. In this paper, we formulated stain normalization as a digital re-staining process and proposed a self-supervised learning model, which is called RestainNet. Our network is regarded as a digital restainer which learns how to re-stain an unstained (grayscale) image. Two digital stains, Hematoxylin (H) and Eosin (E), were extracted from the original image by the Beer-Lambert law. We proposed a staining loss to maintain the correctness of stain intensity during the restaining process. Thanks to the self-supervised nature, paired training samples are no longer necessary, which demonstrates great flexibility in practical usage. Our RestainNet outperforms existing approaches and achieves state-of-the-art performance with regard to color correctness and structure preservation. We further conducted experiments on the segmentation and classification tasks and the proposed RestainNet achieved outstanding performance compared with SOTA methods. The self-supervised design allows the network to learn any staining style with no extra effort.
Submitted 28 February, 2022;
originally announced February 2022.
-
S+PAGE: A Speaker and Position-Aware Graph Neural Network Model for Emotion Recognition in Conversation
Authors:
Chen Liang,
Chong Yang,
Jing Xu,
Juyang Huang,
Yongliang Wang,
Yang Dong
Abstract:
Emotion recognition in conversation (ERC) has attracted much attention in recent years for its necessity in widespread applications. Existing ERC methods mostly model the self and inter-speaker context separately, which leaves too little interaction between them. In this paper, we propose a novel Speaker and Position-Aware Graph neural network model for ERC (S+PAGE), which contains three stages to combine the benefits of both Transformer and relational graph convolution network (R-GCN) for better contextual modeling. Firstly, a two-stream conversational Transformer is presented to extract the coarse self and inter-speaker contextual features for each utterance. Then, a speaker and position-aware conversation graph is constructed, and we propose an enhanced R-GCN model, called PAG, to refine the coarse features guided by a relative positional encoding. Finally, the features from both preceding stages are input into a conditional random field layer to model the emotion transfer.
Submitted 23 December, 2021;
originally announced December 2021.
-
PDBL: Improving Histopathological Tissue Classification with Plug-and-Play Pyramidal Deep-Broad Learning
Authors:
Jiatai Lin,
Guoqiang Han,
Xipeng Pan,
Hao Chen,
Danyi Li,
Xiping Jia,
Zhenwei Shi,
Zhizhen Wang,
Yanfen Cui,
Haiming Li,
Changhong Liang,
Li Liang,
Zaiyi Liu,
Chu Han
Abstract:
Histopathological tissue classification is a fundamental task in pathomics cancer research. Precisely differentiating tissue types benefits downstream research such as cancer diagnosis and prognosis. Existing works mostly leverage the popular classification backbones in computer vision to achieve histopathological tissue classification. In this paper, we proposed a super lightweight plug-and-play module, named Pyramidal Deep-Broad Learning (PDBL), for any well-trained classification backbone to further improve the classification performance without a re-training burden. We mimic how pathologists observe pathology slides in different magnifications and construct an image pyramid for the input image in order to obtain the pyramidal contextual information. For each level in the pyramid, we extract the multi-scale deep-broad features by our proposed Deep-Broad block (DB-block). We equipped three popular classification backbones, ShuffleNetV2, EfficientNetb0, and ResNet50, with PDBL to evaluate the effectiveness and efficiency of our proposed module on two datasets (Kather Multiclass Dataset and the LC25000 Dataset). Experimental results demonstrate that the proposed PDBL can steadily improve the tissue-level classification performance for any CNN backbone, especially for lightweight models when given a small amount of training samples (less than 10%), which greatly saves computational time and annotation effort.
Submitted 4 November, 2021;
originally announced November 2021.
-
Multi-Layer Pseudo-Supervision for Histopathology Tissue Semantic Segmentation using Patch-level Classification Labels
Authors:
Chu Han,
Jiatai Lin,
Jinhai Mai,
Yi Wang,
Qingling Zhang,
Bingchao Zhao,
Xin Chen,
Xipeng Pan,
Zhenwei Shi,
Xiaowei Xu,
Su Yao,
Lixu Yan,
Huan Lin,
Zeyan Xu,
Xiaomei Huang,
Guoqiang Han,
Changhong Liang,
Zaiyi Liu
Abstract:
Tissue-level semantic segmentation is a vital step in computational pathology. Fully-supervised models have already achieved outstanding performance with dense pixel-level annotations. However, drawing such labels on the giga-pixel whole slide images is extremely expensive and time-consuming. In this paper, we use only patch-level classification labels to achieve tissue semantic segmentation on histopathology images, thereby reducing the annotation efforts. We proposed a two-step model including a classification phase and a segmentation phase. In the classification phase, we proposed a CAM-based model to generate pseudo masks from patch-level labels. In the segmentation phase, we achieved tissue semantic segmentation by our proposed Multi-Layer Pseudo-Supervision. Several technical novelties have been proposed to reduce the information gap between pixel-level and patch-level annotations. As a part of this paper, we introduced a new weakly-supervised semantic segmentation (WSSS) dataset for lung adenocarcinoma (LUAD-HistoSeg). We conducted several experiments to evaluate our proposed model on two datasets. Our proposed model outperforms two state-of-the-art WSSS approaches. Note that we can achieve comparable quantitative and qualitative results with the fully-supervised model, with only around a 2% gap for MIoU and FwIoU. Compared with manual labeling, our model can reduce the annotation time from hours to minutes. The source code is available at: https://github.com/ChuHan89/WSSS-Tissue.
Submitted 14 October, 2021;
originally announced October 2021.
-
Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays
Authors:
Chengdong Liang,
Yijiang Chen,
Jiadi Yao,
Xiao-Lei Zhang
Abstract:
Speaker verification based on ad-hoc microphone arrays has the potential of reducing the error significantly in adverse acoustic environments. However, existing approaches extract utterance-level speaker embeddings from each channel of an ad-hoc microphone array, which does not consider fully the spatial-temporal information across the devices. In this paper, we propose to aggregate the multichannel signals of the ad-hoc microphone array at the frame-level by exploring the cross-channel information deeply with two attention mechanisms. The first one is a self-attention method. It consists of a cross-frame self-attention layer and a cross-channel self-attention layer successively, both working at the frame level. The second one learns the cross-frame and cross-channel information via two graph attention layers. Experimental results demonstrate that the proposed methods reach the state-of-the-art performance. Moreover, the graph-attention method is better than the self-attention method in most cases.
Submitted 28 March, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
AUC Optimization for Robust Small-footprint Keyword Spotting with Limited Training Data
Authors:
Menglong Xu,
Shengqiang Li,
Chengdong Liang,
Xiao-Lei Zhang
Abstract:
Deep neural networks provide effective solutions to small-footprint keyword spotting (KWS). However, if training data is limited, it remains challenging to achieve robust and highly accurate KWS in real-world scenarios where unseen sounds that are out of the training data are frequently encountered. Most conventional methods aim to maximize the classification accuracy on the training set, without taking the unseen sounds into account. To enhance the robustness of the deep neural networks based KWS, in this paper, we introduce a new loss function, named the maximization of the area under the receiver-operating-characteristic curve (AUC). The proposed method not only maximizes the classification accuracy of keywords on the closed training set, but also maximizes the AUC score for optimizing the performance of non-keyword segments detection. Experimental results on the Google Speech Commands dataset v1 and v2 show that our method achieves new state-of-the-art performance in terms of most evaluation metrics.
Submitted 13 July, 2021;
originally announced July 2021.
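For readers unfamiliar with the AUC quantity that the keyword-spotting abstract above proposes to maximize, here is a minimal sketch of its pairwise definition: the fraction of (keyword, non-keyword) score pairs that are ranked correctly. This is not the paper's loss function, which the abstract does not specify; the function name and the scores are hypothetical.

```python
def pairwise_auc(pos_scores, neg_scores):
    """Area under the ROC curve via its pairwise (Mann-Whitney) form.

    Counts the share of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical detector scores: keyword segments vs. non-keyword segments.
keyword_scores = [0.9, 0.8]
non_keyword_scores = [0.1, 0.85]
print(pairwise_auc(keyword_scores, non_keyword_scores))  # 0.75
```

A training loss would normally replace the hard comparison with a differentiable surrogate; that choice is left open here.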
-
Attention-based multi-channel speaker verification with ad-hoc microphone arrays
Authors:
Chengdong Liang,
Junqi Chen,
Shanzheng Guan,
Xiao-Lei Zhang
Abstract:
Recently, ad-hoc microphone arrays have been widely studied. Unlike traditional microphone array settings, the spatial arrangement and number of microphones of ad-hoc microphone arrays are not known in advance, which hinders the adaptation of traditional speaker verification technologies to ad-hoc microphone arrays. To overcome this weakness, in this paper, we propose attention-based multi-channel speaker verification with ad-hoc microphone arrays. Specifically, we add an inter-channel processing layer and a global fusion layer after the pooling layer of a single-channel speaker verification system. The inter-channel processing layer applies a so-called residual self-attention along the channel dimension for allocating weights to different microphones. The global fusion layer integrates all channels in a way that is independent of the number of input channels. We further replace the softmax operator in the residual self-attention with sparsemax, which forces the channel weights of very noisy channels to zero. Experimental results with ad-hoc microphone arrays of over 30 channels demonstrate the effectiveness of the proposed methods. For example, the multi-channel speaker verification with sparsemax achieves an equal error rate (EER) that is over 20% lower than that of the oracle one-best system on semi-real data sets, and over 30% lower on simulation data sets, in test scenarios with both matched and mismatched channel numbers.
Submitted 30 June, 2021;
originally announced July 2021.
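The sparsemax operator mentioned in the abstract above can be sketched in a few lines. This follows the standard sort-and-threshold formulation of sparsemax (a Euclidean projection of the scores onto the probability simplex) and is not taken from the paper's code; the channel scores below are hypothetical.

```python
def sparsemax(z):
    """Sparsemax of a list of scores.

    Unlike softmax, the result can contain exact zeros, which is why it can
    switch off low-scoring (for example, very noisy) channels entirely.
    """
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(zs, start=1):
        cumsum += zk
        # The support condition holds for a prefix of the sorted scores;
        # the last k that satisfies it defines the threshold tau.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(v - tau, 0.0) for v in z]


# Hypothetical per-channel attention scores.
print(sparsemax([1.0, 0.0, -1.0]))  # [1.0, 0.0, 0.0]
```

With the scores `[1.0, 0.0, -1.0]`, softmax would still give every channel a positive weight, while sparsemax assigns the two weaker channels exactly zero.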
-
A Spectral Representation of Power Systems with Applications to Adaptive Grid Partitioning and Cascading Failure Localization
Authors:
Alessandro Zocca,
Chen Liang,
Linqi Guo,
Steven H. Low,
Adam Wierman
Abstract:
Transmission line failures in power systems propagate and cascade non-locally. This well-known yet counter-intuitive feature makes it even more challenging to optimally and reliably operate these complex networks. In this work we present a comprehensive framework based on spectral graph theory that fully and rigorously captures how multiple simultaneous line failures propagate, distinguishing between non-cut and cut set outages. Using this spectral representation of power systems, we identify the crucial graph sub-structure that ensures line failure localization -- the network bridge-block decomposition. Leveraging this theory, we propose an adaptive network topology reconfiguration paradigm that uses a two-stage algorithm where the first stage aims to identify optimal clusters using the notion of network modularity and the second stage refines the clusters by means of optimal line switching actions. Our proposed methodology is illustrated using extensive numerical examples on standard IEEE networks, and we discuss several extensions and variants of the proposed algorithm.
Submitted 11 May, 2021;
originally announced May 2021.
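The bridge-block decomposition named in the abstract above can be illustrated on a toy graph. The sketch assumes the usual graph-theoretic reading: a bridge is a line whose removal disconnects the network, and the bridge-blocks are the connected components that remain once all bridges are removed. The function names and the 6-bus example are hypothetical and are not taken from the paper.

```python
def find_bridges(n, edges):
    """Return the indices of bridge edges in an undirected graph (DFS low-link)."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc = [-1] * n   # discovery time of each node, -1 if unvisited
    low = [0] * n     # earliest discovery time reachable from the subtree
    bridges = set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:   # no back edge bypasses (u, v)
                    bridges.add(idx)
            else:
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if disc[s] == -1:
            dfs(s, -1)
    return bridges


def bridge_blocks(n, edges):
    """Connected components of the graph after deleting every bridge."""
    bridges = find_bridges(n, edges)
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        if idx not in bridges:
            adj[u].append(v)
            adj[v].append(u)
    seen = [False] * n
    blocks = []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack, comp = [s], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        blocks.append(sorted(comp))
    return blocks


# Toy 6-bus network: two triangles joined by the single tie line (2, 3).
lines = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
print(bridge_blocks(6, lines))  # [[0, 1, 2], [3, 4, 5]]
```

The recursive search is fine for small examples; a large network would call for an iterative variant to avoid Python's recursion limit.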
-
Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention
Authors:
Chengdong Liang,
Menglong Xu,
Xiao-Lei Zhang
Abstract:
Self-attention (SA), which encodes vector sequences according to their pairwise similarity, is widely used in speech recognition due to its strong context modeling ability. However, when applied to long sequence data, its accuracy is reduced. This is because its weighted average operator may disperse the attention distribution, so that the relationship between adjacent signals is ignored. To address this issue, in this paper, we introduce relative-position-awareness self-attention (RPSA). It not only maintains the global-range dependency modeling ability of self-attention, but also improves the localness modeling ability. Because the local window length of the original RPSA is fixed and sensitive to different test data, here we propose Gaussian-based self-attention (GSA) whose window length is learnable and adaptive to the test data automatically. We further generalize GSA to a new residual Gaussian self-attention (resGSA) for further performance improvement. We apply RPSA, GSA, and resGSA to Transformer-based speech recognition respectively. Experimental results on the AISHELL-1 Mandarin speech recognition corpus demonstrate the effectiveness of the proposed methods. For example, the resGSA-Transformer achieves a character error rate (CER) of 5.86% on the test set, a relative reduction of 7.8% compared with the SA-Transformer. Although the performance of the proposed resGSA-Transformer is only slightly better than that of the RPSA-Transformer, it does not require manual tuning of the window length.
Submitted 8 October, 2021; v1 submitted 29 March, 2021;
originally announced March 2021.
-
Wireless sensor network for in situ soil moisture monitoring
Authors:
Jianing Fang,
Chuheng Hu,
Nour Smaoui,
Doug Carlson,
Jayant Gupchup,
Razvan Musaloiu-E.,
Chieh-Jan Mike Liang,
Marcus Chang,
Omprakash Gnawali,
Tamas Budavari,
Andreas Terzis,
Katalin Szlavecz,
Alexander S. Szalay
Abstract:
We discuss the history and lessons learned from a series of deployments of environmental sensors measuring soil parameters and CO2 fluxes over the last fifteen years, in an outdoor environment. We present the hardware and software architecture of our current Gen-3 system, and then discuss how we are simplifying the user facing part of the software, to make it easier and friendlier for the environmental scientist to be in full control of the system. Finally, we describe the current effort to build a large-scale Gen-4 sensing platform consisting of hundreds of nodes to track the environmental parameters for urban green spaces in Baltimore, Maryland.
Submitted 20 February, 2021;
originally announced February 2021.
-
Raw Image Deblurring
Authors:
Chih-Hung Liang,
Yu-An Chen,
Yueh-Cheng Liu,
Winston H. Hsu
Abstract:
Deep learning-based blind image deblurring plays an essential role in solving image blur since all existing kernels are limited in modeling real-world blur. Thus far, researchers have focused on powerful models to handle the deblurring problem and have achieved decent results. In this work, from a new perspective, we identify a great opportunity for image enhancement (e.g., deblurring) directly from RAW images and investigate novel neural network structures benefiting RAW-based learning. However, to the best of our knowledge, there is no available RAW image deblurring dataset. Therefore, we built a new dataset containing both RAW images and processed sRGB images and designed a new model to utilize the unique characteristics of RAW images. The proposed deblurring model, trained solely on RAW images, achieves state-of-the-art performance and outperforms models trained on processed sRGB images. Furthermore, with fine-tuning, the proposed model, trained on our new dataset, can generalize to other sensors. Additionally, through a series of experiments, we demonstrate that existing deblurring models can also be improved by training on the RAW images in our new dataset. Ultimately, we show a new avenue for further opportunities based on the devised novel raw-based deblurring method and the brand-new Deblur-RAW dataset.
Submitted 8 December, 2020;
originally announced December 2020.
-
Line Failure Localization of Power Networks Part II: Cut Set Outages
Authors:
Linqi Guo,
Chen Liang,
Alessandro Zocca,
Steven H. Low,
Adam Wierman
Abstract:
Transmission line failures in power systems propagate non-locally, making the control of the resulting outages extremely difficult. In Part II of this paper, we continue the study of line failure localizability in transmission networks and characterize the impact of cut set outages. We establish a Simple Path Criterion, showing that the propagation pattern due to bridge outages, a special case of cut set failures, is fully determined by the positions in the network of the buses that participate in load balancing. We then extend our results to general cut set outages. In contrast to non-cut outages discussed in Part I whose subsequent line failures are contained within the original blocks, cut set outages typically impact the whole network, affecting the power flows on all remaining lines. We corroborate our analytical results in both parts using the IEEE 118-bus test system, in which the failure propagation patterns exhibit a clear block-diagonal structure predicted by our theory, even when using full AC power flow equations.
Submitted 23 April, 2021; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Adaptive Network Response to Line Failures in Power Systems
Authors:
Chen Liang,
Linqi Guo,
Alessandro Zocca,
Steven H. Low,
Adam Wierman
Abstract:
Transmission line failures in power systems propagate and cascade non-locally. In this work, we propose an adaptive control strategy that offers strong guarantees in both the mitigation and localization of line failures. Specifically, we leverage the properties of network bridge-block decomposition and a frequency regulation method called the unified control. If the balancing areas over which the unified control operates coincide with the bridge-blocks of the network, the proposed strategy drives the post-contingency system to a steady state where the impact of initial line outages is localized within the areas where they occurred whenever possible, stopping the cascading process. When the initial line outages cannot be localized, the proposed control strategy provides a configurable design that progressively involves and coordinates more balancing areas. We compare the proposed control strategy with the classical Automatic Generation Control (AGC) on the IEEE 118-bus and 2736-bus test networks. Simulation results show that our strategy greatly improves overall reliability in terms of the N-k security standard, and localizes the impact of initial failures in the majority of the simulated contingencies. Moreover, the proposed framework incurs significantly less load loss, if any, compared to AGC, in all our case studies.
Submitted 12 May, 2022; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Line Failure Localization of Power Networks Part I: Non-cut Outages
Authors:
Linqi Guo,
Chen Liang,
Alessandro Zocca,
Steven H. Low,
Adam Wierman
Abstract:
Transmission line failures in power systems propagate non-locally, making the control of the resulting outages extremely difficult. In this work, we establish a mathematical theory that characterizes the patterns of line failure propagation and localization in terms of network graph structure. It provides a novel perspective on distribution factors that precisely captures Kirchhoff's Law in terms of topological structures. Our results show that the distribution of specific collections of subtrees of the transmission network plays a critical role in the patterns of power redistribution, and motivate the block decomposition of the transmission network as a structure to understand long-distance propagation of disturbances. In Part I of this paper, we present the case when the post-contingency network remains connected after an initial set of lines is disconnected simultaneously. In Part II, we present the case when an outage separates the network into multiple islands.
Submitted 23 April, 2021; v1 submitted 20 May, 2020;
originally announced May 2020.
-
An Integrated Approach for Failure Mitigation & Localization in Power Systems
Authors:
Chen Liang,
Linqi Guo,
Alessandro Zocca,
Shuyue Yu,
Steven H. Low,
Adam Wierman
Abstract:
The transmission grid is often comprised of several control areas that are connected by multiple tie lines in a mesh structure for reliability. It is also well-known that line failures can propagate non-locally and redundancy can exacerbate cascading. In this paper, we propose an integrated approach to grid reliability that (i) judiciously switches off a small number of tie lines so that the control areas are connected in a tree structure; and (ii) leverages a unified frequency control paradigm to provide congestion management in real time. Even though the proposed topology reduces redundancy, the integration of tree structure at regional level and real-time congestion management can provide stronger guarantees on failure localization and mitigation. We illustrate our approach on the IEEE 39-bus network and evaluate its performance on the IEEE 118-bus, 179-bus, 200-bus and 240-bus networks with various network congestion conditions. Simulations show that, compared with the traditional approach, our approach not only prevents load shedding in more failure scenarios, but also incurs smaller amounts of load loss in scenarios where load shedding is inevitable. Moreover, generators under our approach adjust their operations more actively and efficiently in a local manner.
Submitted 22 April, 2020;
originally announced April 2020.
-
Predefined-time Terminal Sliding Mode Control of Robot Manipulators
Authors:
Chang-Duo Liang,
Ming-Feng Ge,
Zhi-Wei Liu,
Yan-Wu Wang,
Hamid Reza Karimi
Abstract:
In this paper, we present a new terminal sliding mode control to achieve predefined-time stability of robot manipulators. The proposed control is developed based on a novel predefined-time terminal sliding mode (PTSM) surface, on which the states are forced to reach the origin in a predefined time, i.e., the settling time is independent of the initial condition and can be explicitly user-defined by adjusting specific parameters called the predefined-time parameters. It is also demonstrated that the proposed control can provide satisfactory steady-state performance in the case of both external disturbances and parametric uncertainties. Besides, we present a formal systemic analysis method to derive the sufficient conditions for guaranteeing the predefined-time convergence of the closed-loop system. Finally, the effectiveness and performance of the presented control scheme are illustrated through both theoretical comparisons and numerical simulations.
Submitted 25 April, 2020; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Neural Architecture Search on Acoustic Scene Classification
Authors:
Jixiang Li,
Chuming Liang,
Bo Zhang,
Zhao Wang,
Fei Xiang,
Xiangxiang Chu
Abstract:
Convolutional neural networks are widely adopted in Acoustic Scene Classification (ASC) tasks, but they generally carry a heavy computational burden. In this work, we propose a lightweight yet high-performing baseline network inspired by MobileNetV2, which replaces square convolutional kernels with unidirectional ones to extract features alternately in temporal and frequency dimensions. Furthermore, we explore a dynamic architecture space built on the basis of the proposed baseline with the recent Neural Architecture Search (NAS) paradigm, which first trains a supernet that incorporates all candidate networks and then applies a well-known evolutionary algorithm NSGA-II to discover more efficient networks with higher accuracy and lower computational cost. Experimental results demonstrate that our searched network is competent in ASC tasks, which achieves 90.3% F1-score on the DCASE2018 task 5 evaluation set, marking a new state-of-the-art performance while saving 25% of FLOPs compared to our baseline network.
Submitted 5 August, 2020; v1 submitted 30 December, 2019;
originally announced December 2019.
-
Handheld Multi-Frame Super-Resolution
Authors:
Bartlomiej Wronski,
Ignacio Garcia-Dorado,
Manfred Ernst,
Damien Kelly,
Michael Krainin,
Chia-Kai Liang,
Marc Levoy,
Peyman Milanfar
Abstract:
Compared to DSLR cameras, smartphone cameras have smaller sensors, which limits their spatial resolution; smaller apertures, which limits their light-gathering ability; and smaller pixels, which reduces their signal-to-noise ratio. The use of color filter arrays (CFAs) requires demosaicing, which further degrades resolution. In this paper, we supplant the use of traditional demosaicing in single-frame and burst photography pipelines with a multiframe super-resolution algorithm that creates a complete RGB image directly from a burst of CFA raw images. We harness natural hand tremor, typical in handheld photography, to acquire a burst of raw frames with small offsets. These frames are then aligned and merged to form a single image with red, green, and blue values at every pixel site. This approach, which includes no explicit demosaicing step, serves to both increase image resolution and boost signal-to-noise ratio. Our algorithm is robust to challenging scene conditions: local motion, occlusion, or scene changes. It runs at 100 milliseconds per 12-megapixel RAW input burst frame on mass-produced mobile phones. Specifically, the algorithm is the basis of the Super-Res Zoom feature, as well as the default merge method in Night Sight mode (whether zooming or not) on Google's flagship phone.
Submitted 16 February, 2021; v1 submitted 8 May, 2019;
originally announced May 2019.
-
Learning to Guide: Guidance Law Based on Deep Meta-learning and Model Predictive Path Integral Control
Authors:
Chen Liang,
Weihong Wang,
Zhenghua Liu,
Chao Lai,
Benchun Zhou
Abstract:
In this paper, we present a novel guidance scheme based on a model-based deep reinforcement learning (RL) technique. With the model-based deep RL method, a deep neural network is trained as a predictive model of guidance dynamics, which is incorporated into a model predictive path integral (MPPI) control framework. However, the traditional MPPI framework assumes that the actual environment is similar to the training dataset for the deep neural network, which is unrealistic in practice given different target maneuvers, other perturbations, and actuator failures. To address this problem, our method utilizes a meta-learning technique to make the deep neural dynamics model adapt to such changes online. With this approach, we can alleviate the performance deterioration of standard MPPI control caused by the difference between the actual environment and the training data. Then, a novel guidance law for a varying-velocity interceptor intercepting a maneuvering target with a desired terminal impact angle under actuator failure is constructed based on the aforementioned techniques. Simulation and experiment results under different cases show the effectiveness and robustness of the proposed guidance law in achieving successful interceptions of maneuvering targets.
Submitted 15 April, 2019;
originally announced April 2019.
-
Less is More: Real-time Failure Localization in Power Systems
Authors:
Linqi Guo,
Chen Liang,
Alessandro Zocca,
Steven H. Low,
Adam Wierman
Abstract:
Cascading failures in power systems exhibit non-local propagation patterns, which make the analysis and mitigation of failures difficult. In this work, we propose a distributed control framework inspired by the recently proposed concepts of unified controller and network tree-partition that offers strong guarantees in both the mitigation and localization of cascading failures in power systems. In this framework, the transmission network is partitioned into several control areas which are connected in a tree structure, and the unified controller is adopted by generators or controllable loads for fast timescale disturbance response. After an initial failure, the proposed strategy always prevents successive failures from happening, and regulates the system to the desired steady state where the impact of initial failures is localized as much as possible. For extreme failures that cannot be localized, the proposed framework has a configurable design that progressively involves and coordinates more control areas for failure mitigation and, as a last resort, imposes minimal load shedding. We compare the proposed control framework with Automatic Generation Control (AGC) on the IEEE 118-bus test system. Simulation results show that our novel framework greatly improves the system robustness in terms of the N-1 security standard, and localizes the impact of initial failures in the majority of the load profiles that are examined. Moreover, the proposed framework incurs significantly less load loss, if any, compared to AGC, in all of our case studies.
Submitted 18 April, 2019; v1 submitted 10 April, 2019;
originally announced April 2019.
-
Microwave Integrated Circuits Design with Relational Induction Neural Network
Authors:
Jie Liu,
Zhi-Xi Chen,
Wen-Hui Dong,
Xiao Wang,
Jia Shi,
Hong-Liang Teng,
Xi-Wang Dai,
Stephen S. -T. Yau,
Chang-Hong Liang,
**-Fa Feng
Abstract:
The automated design of microwave integrated circuits (MWIC) has long been viewed as a fundamental challenge for artificial intelligence owing to its larger solution space and structural complexity than Go. Here, we developed a novel artificial agent, termed Relational Induction Neural Network, that can lead to an automated design of MWIC and avoid brute-force computing to examine every possible solution, which is a significant breakthrough in the field of electronics. Through experiments on microwave transmission line circuit, filter circuit and antenna circuit design tasks, strongly competitive results are obtained for each. Compared with the traditional reinforcement learning method, the learning curve shows that the proposed architecture is able to quickly converge to the pre-designed MWIC model and the convergence rate is up to four orders of magnitude higher. This is the first study to show that an agent can, through training or learning, automatically induce the relationships between MWIC structures without incorporating any additional prior knowledge. Notably, the relationships can be explained in terms of MWIC theory and electromagnetic field distribution. Our work bridges the divide between artificial intelligence and MWIC and can extend to mechanical waves, mechanics and other related fields.
Submitted 3 January, 2019;
originally announced January 2019.
-
Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision
Authors:
Jing-Xuan Zhang,
Zhen-Hua Ling,
Yuan Jiang,
Li-Juan Liu,
Chen Liang,
Li-Rong Dai
Abstract:
This paper presents methods of making use of text supervision to improve the performance of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame-to-frame voice conversion approaches, the seq2seq acoustic modeling method proposed in our previous work achieved higher naturalness and similarity. In this paper, we further improve its performance by utilizing the text transcriptions of parallel training data. First, a multi-task learning structure is designed which adds auxiliary classifiers to the middle layers of the seq2seq model and predicts linguistic labels as a secondary task. Second, a data-augmentation method is proposed which utilizes text alignment to produce extra parallel sequences for model training. Experiments are conducted to evaluate our proposed method with training sets of different sizes. Experimental results show that multi-task learning with linguistic labels is effective at reducing the errors of seq2seq voice conversion. The data-augmentation method can further improve the performance of seq2seq voice conversion when only 50 or 100 training utterances are available.
Submitted 20 November, 2018;
originally announced November 2018.
-
Failure Localization in Power Systems via Tree Partitions
Authors:
Linqi Guo,
Chen Liang,
Alessandro Zocca,
Steven H. Low,
Adam Wierman
Abstract:
Cascading failures in power systems propagate non-locally, making the control and mitigation of outages extremely hard. In this work, we use the emerging concept of the tree partition of transmission networks to provide an analytical characterization of line failure localizability in transmission systems. Our results rigorously establish the widely held intuition in the power community that failures cannot cross bridges, and reveal a finer-grained concept that encodes more precise information on failure propagation within tree-partition regions. Specifically, when a non-bridge line is tripped, the impact of this failure only propagates within well-defined components, which we refer to as cells, of the tree partition defined by the bridges. In contrast, when a bridge line is tripped, the impact of this failure propagates globally across the network, affecting the power flow on all remaining transmission lines. This characterization suggests that it is possible to improve the system robustness by temporarily switching off certain transmission lines, so as to create more, smaller components in the tree partition, thus spatially localizing line failures and making the grid less vulnerable to large-scale outages. We illustrate this approach using the IEEE 118-bus test system and demonstrate that switching off a negligible portion of transmission lines allows the impact of line failures to be significantly more localized without substantial changes in line congestion.
Submitted 16 August, 2018; v1 submitted 22 March, 2018;
originally announced March 2018.
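Illustrative note on the entry above: the abstract describes partitioning a transmission network along its bridges. The following is a minimal Python sketch of that graph-theoretic step only, not the authors' code. The function names `find_bridges` and `bridge_regions` and the toy 6-bus network are hypothetical, and it is an assumption here that the regions obtained by deleting all bridges correspond to the tree-partition regions the abstract refers to; the paper's finer-grained "cells" are not computed.

```python
from collections import defaultdict


def find_bridges(num_buses, lines):
    """Return the indices of lines whose removal disconnects the graph."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(lines):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    disc, low, bridges = {}, {}, set()
    counter = 0

    def dfs(u, parent_line):
        nonlocal counter
        disc[u] = low[u] = counter
        counter += 1
        for v, idx in adj[u]:
            if idx == parent_line:
                continue  # do not walk back along the line we arrived on
            if v in disc:
                low[u] = min(low[u], disc[v])  # back edge closes a cycle
            else:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(idx)  # no cycle passes through this line

    for bus in range(num_buses):
        if bus not in disc:
            dfs(bus, None)
    return bridges


def bridge_regions(num_buses, lines):
    """Group buses into the connected regions left after deleting all bridges."""
    bridges = find_bridges(num_buses, lines)
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(lines):
        if idx not in bridges:
            adj[u].append(v)
            adj[v].append(u)

    seen, regions = set(), []
    for start in range(num_buses):
        if start in seen:
            continue
        seen.add(start)
        stack, region = [start], []
        while stack:
            u = stack.pop()
            region.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        regions.append(sorted(region))
    return regions


# Hypothetical toy network: two triangles joined by a single tie line (2, 3).
toy_lines = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]
toy_bridges = find_bridges(6, toy_lines)    # {3}, i.e. the tie line (2, 3)
toy_regions = bridge_regions(6, toy_lines)  # [[0, 1, 2], [3, 4, 5]]
```

In this toy case the only bridge is the tie line, so the two triangles form separate regions. The recursive depth-first search is adequate for small examples; a network with thousands of buses would need an iterative variant or a raised recursion limit.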