
Enhancing Diagnostic Reliability of Foundation Model with Uncertainty Estimation in OCT Images

Authors: Yuanyuan Peng, Aidi Lin, Meng Wang, Tian Lin, Ke Zou, Yinglin Cheng, Tingkun Shi, Xulong Liao, Lixia Feng, Zhen Liang, Xinjian Chen, Huazhu Fu, Haoyu Chen

Abstract: The inability to express confidence levels and to detect unseen classes has limited the clinical implementation of artificial intelligence in the real world. We developed a foundation model with uncertainty estimation (FMUE) to detect 11 retinal conditions on optical coherence tomography (OCT). On the internal test set, FMUE achieved a higher F1 score (96.76%) than two state-of-the-art algorithms, RETFound and UIOS, and improved further to 98.44% with a thresholding strategy. On the external test sets obtained from other OCT devices, FMUE achieved an accuracy of 88.75% before thresholding and 92.73% after. Our model is superior to two ophthalmologists, with a higher F1 score (95.17% vs. 61.93% and 71.72%). In addition, our model correctly assigns high uncertainty scores to samples with ambiguous features, samples of non-target-category diseases, and low-quality samples, prompting manual checks and preventing misdiagnosis. FMUE provides a trustworthy method for automatic detection of retinal anomalies in real-world, open-set clinical environments.

Submitted 17 June, 2024; originally announced June 2024.

Comments: All codes are available at https://github.com/yuanyuanpeng0129/FMUE
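The FMUE abstract describes flagging high-uncertainty predictions for manual review via a thresholding strategy. The abstract does not state how FMUE computes its uncertainty score, so this minimal sketch assumes an evidential (Dirichlet) formulation from subjective logic: K classes with non-negative evidence e_k give alpha_k = e_k + 1 and uncertainty u = K / sum(alpha). The function names and the 0.5 threshold are hypothetical.

```python
from typing import List, Tuple

def dirichlet_uncertainty(evidence: List[float]) -> Tuple[List[float], float]:
    """Return (expected class probabilities, uncertainty) from non-negative evidence.

    Assumed subjective-logic convention: alpha_k = e_k + 1, u = K / sum(alpha).
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    return probs, len(alpha) / strength

def flag_for_review(evidence: List[float], threshold: float = 0.5) -> Tuple[int, float, bool]:
    """Hypothetical rule: predict the argmax class, flag if uncertainty exceeds threshold."""
    probs, u = dirichlet_uncertainty(evidence)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, u, u > threshold

# 11 classes, as in the abstract: one confident sample, one ambiguous sample.
confident = [0.0] * 11
confident[3] = 40.0
ambiguous = [0.5] * 11
print(flag_for_review(confident))
print(flag_for_review(ambiguous))
```

Evidence spread evenly across classes keeps the Dirichlet strength low, so the ambiguous sample receives a high uncertainty and is routed to a human reader.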

arXiv:2406.11636

Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI Modalities

Authors: Felix Wagner, Wentian Xu, Pramit Saha, Ziyun Liang, Daniel Whitehouse, David Menon, Natalie Voets, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Segmentation models for brain lesions in MRI are commonly developed for a specific disease and trained on data with a predefined set of MRI modalities. Such a model cannot segment the disease using data with a different set of MRI modalities, nor can it segment any other type of disease. Moreover, this training paradigm does not allow a model to benefit from learning from heterogeneous databases that may contain scans and segmentation labels for different types of brain pathologies and diverse sets of MRI modalities. Is it feasible to use Federated Learning (FL) to train a single model on client databases that contain scans and labels of different brain pathologies and diverse sets of MRI modalities? We demonstrate promising results by combining appropriate, simple, and practical modifications to the model and training strategy: designing a model with input channels that cover the whole set of modalities available across clients, training with random modality drop, and exploring the effects of feature normalization methods. Evaluation on 7 brain MRI databases with 5 different diseases shows that such an FL framework can train a single model that is very promising at segmenting all disease types seen during training. Importantly, it is able to segment these diseases in new databases that contain sets of modalities different from those in the training clients.
These results demonstrate, for the first time, the feasibility and effectiveness of using FL to train a single segmentation model on decentralised data with diverse brain diseases and MRI modalities, a necessary step towards leveraging heterogeneous real-world databases. Code will be made available at: https://github.com/FelixWag/FL-MultiDisease-MRI

Submitted 17 June, 2024; originally announced June 2024.

ACM Class: I.4.9; I.4.6; I.2.11; I.4.0
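Two of the modifications named in the federated-learning abstract, input channels covering the union of modalities across clients and training with random modality drop, can be sketched at the data level. This is an illustration under assumptions: missing modalities are zero-filled, at least one available modality is always kept, and the modality list and drop probability are hypothetical. The size-weighted averaging shown is standard FedAvg, which the abstract does not explicitly name.

```python
import random
from typing import Dict, List, Optional

# Hypothetical union of modalities available across all clients.
ALL_MODALITIES = ["T1", "T1c", "T2", "FLAIR", "DWI"]

def build_input(scan: Dict[str, List[float]], n_voxels: int,
                drop_prob: float = 0.0, rng: Optional[random.Random] = None) -> List[List[float]]:
    """Stack one channel per modality in ALL_MODALITIES.

    Modalities the client lacks are zero-filled. With drop_prob > 0, available
    modalities are randomly zeroed too (random modality drop), but at least one
    available modality is always kept.
    """
    rng = rng or random.Random()
    available = [m for m in ALL_MODALITIES if m in scan]
    kept = [m for m in available if rng.random() >= drop_prob]
    if available and not kept:
        kept = [rng.choice(available)]
    return [list(scan[m]) if m in kept else [0.0] * n_voxels for m in ALL_MODALITIES]

def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """FedAvg: average parameter vectors weighted by client dataset size."""
    total = sum(client_sizes)
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(client_weights[0]))]

rng = random.Random(0)
x = build_input({"T1": [1.0, 2.0], "FLAIR": [3.0, 4.0]}, n_voxels=2, drop_prob=0.5, rng=rng)
print(len(x))  # one channel per modality in the union
print(fedavg([[1.0, 0.0], [3.0, 2.0]], [1, 3]))  # → [2.5, 1.5]
```

Fixing the channel layout to the union means every client shares one architecture, and modality drop exposes the model during training to the missing-modality patterns it will meet at new sites.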

arXiv:2406.09989

Suppressing seizure via optimal electrical stimulation to the hub of epileptic brain network

Authors: Zhichao Liang, Guanyi Zhao, Yinuo Zhang, Weiting Sun, **gzhe Lin, Jialin Wang, Quanying Liu

Abstract: Electrical stimulation of the seizure onset zone (SOZ) is an efficient approach to seizure suppression. Recently, the network propagation mechanisms of seizure dynamics have gained widespread attention. Compared with direct stimulation of the SOZ, other brain network-level approaches that can effectively suppress epileptic seizures remain under-explored. In this study, we introduce a platform equipped with a system identification module and a control strategy module to validate the effectiveness of stimulating the hub of the epileptic brain network for seizure suppression. The identified surrogate dynamics show high predictive performance in reconstructing neural dynamics, which enables the model predictive framework to achieve accurate neural stimulation. Electrical stimulation of the hub of the epileptic brain network performs comparably to direct stimulation of the SOZ in suppressing seizure dynamics. Underpinned by network control theory, our platform offers a general tool for the validation of neural stimulation.

Submitted 14 June, 2024; originally announced June 2024.
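The seizure-suppression abstract targets "the hub of the epileptic brain network" but does not say how the hub is identified. A simple sketch, assuming the hub is the node with the highest node strength (sum of absolute connection weights) in a functional connectivity matrix; the matrix below is made up for illustration.

```python
from typing import List

def node_strength(conn: List[List[float]]) -> List[float]:
    """Sum of absolute off-diagonal weights for each node (weighted degree)."""
    n = len(conn)
    return [sum(abs(conn[i][j]) for j in range(n) if j != i) for i in range(n)]

def find_hub(conn: List[List[float]]) -> int:
    """Index of the node with the largest strength (assumed hub definition)."""
    s = node_strength(conn)
    return max(range(len(s)), key=s.__getitem__)

# Hypothetical 4-channel connectivity matrix; node 2 is most strongly connected.
conn = [
    [0.0, 0.1, 0.6, 0.0],
    [0.1, 0.0, 0.7, 0.2],
    [0.6, 0.7, 0.0, 0.5],
    [0.0, 0.2, 0.5, 0.0],
]
print(find_hub(conn))  # → 2
```

Other centrality measures (eigenvector or betweenness centrality, or controllability metrics) could select a different node; strength is used here only because it is the simplest.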

arXiv:2406.08052

FakeSound: Deepfake General Audio Detection

Authors: Zeyu Xie, Baihan Li, Xuenan Xu, Zheng Liang, Kai Yu, Mengyue Wu

Abstract: With the advancement of audio generation, generative models can produce highly realistic audio. However, the proliferation of deepfake general audio can have negative consequences. Therefore, we propose a new task, deepfake general audio detection, which aims to identify whether audio content is manipulated and to locate deepfake regions. Using an automated manipulation pipeline, we construct a dataset named FakeSound for deepfake general audio detection; samples can be viewed on the website https://FakeSoundData.github.io. The average binary accuracy of humans on all test sets is consistently below 0.6, which indicates the difficulty humans face in discerning deepfake audio and affirms the efficacy of the FakeSound dataset. A deepfake detection model built on a general audio pre-trained model is proposed as a benchmark system. Experimental results demonstrate that the proposed model outperforms both the state of the art in deepfake speech detection and human testers.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

MSC Class: 68Txx ACM Class: I.2
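The FakeSound task pairs a clip-level decision (manipulated or not) with localization of deepfake regions. Assuming a detector that emits per-frame fake probabilities (the abstract does not specify the output format), a sketch of the post-processing step: threshold the frames, merge consecutive positives into time segments, and call the clip fake if any frame passes. The threshold and frame hop are hypothetical.

```python
from typing import List, Tuple

def locate_fake_segments(frame_probs: List[float], threshold: float = 0.5,
                         hop_s: float = 0.02) -> List[Tuple[float, float]]:
    """Merge consecutive above-threshold frames into (start_s, end_s) segments."""
    segments, start = [], None
    for i, p in enumerate(frame_probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            segments.append((start * hop_s, i * hop_s))
            start = None
    if start is not None:  # close a segment that runs to the end of the clip
        segments.append((start * hop_s, len(frame_probs) * hop_s))
    return segments

def is_fake(frame_probs: List[float], threshold: float = 0.5) -> bool:
    """Clip-level decision: fake if any frame is above threshold."""
    return any(p >= threshold for p in frame_probs)

probs = [0.1, 0.2, 0.9, 0.8, 0.3, 0.7, 0.1]
print(is_fake(probs), locate_fake_segments(probs, hop_s=1.0))
# → True [(2.0, 4.0), (5.0, 6.0)]
```

A practical system would usually smooth the frame scores or drop very short segments before reporting regions.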

arXiv:2406.02422

IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRI

Authors: Ziyun Liang, Xiaoqing Guo, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, which they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', since these exhibit patterns that deviate from those in the 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned 'normal' distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy, IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas and reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further lowers the reconstruction error of normal areas relative to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate the effectiveness of our method. Code is available at: https://github.com/ZiyunLiang/IterMask2

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.
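The iterative mask-refinement loop described in the IterMask2 abstract can be illustrated on a 1-D toy signal. This is a minimal sketch, not the authors' model: `reconstruct` is a hypothetical stand-in for the network trained on healthy data, the error threshold `tau` is made up, and the high-frequency auxiliary input is omitted.

```python
from typing import List

def reconstruct(x: List[float], mask: List[bool], prior: float) -> List[float]:
    """Toy stand-in for a model trained on healthy data: masked positions are
    predicted as the mean of unmasked ('confidently normal') values, or as the
    prior normal intensity when nothing is unmasked yet."""
    visible = [v for v, m in zip(x, mask) if not m]
    fill = sum(visible) / len(visible) if visible else prior
    return [fill if m else v for v, m in zip(x, mask)]

def iterative_mask_refine(x: List[float], prior: float, tau: float,
                          max_iters: int = 10) -> List[bool]:
    """Start fully masked; repeatedly unmask positions whose reconstruction
    error is below tau. Positions still masked at convergence are anomalies."""
    mask = [True] * len(x)
    for _ in range(max_iters):
        recon = reconstruct(x, mask, prior)
        new_mask = [m and abs(v - r) >= tau for v, r, m in zip(x, recon, mask)]
        if new_mask == mask:  # mask stopped shrinking
            break
        mask = new_mask
    return mask

signal = [1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 0.95, 1.05]
print(iterative_mask_refine(signal, prior=1.0, tau=0.5))
# → [False, False, False, False, True, False, False, False]
```

Each pass only ever unmasks positions, so the mask shrinks monotonically and the loop terminates once the remaining masked region is consistently poorly reconstructed.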

arXiv:2405.11163

Domain Generalization for Zero-calibration BCIs with Knowledge Distillation-based Phase Invariant Feature Extraction

Authors: Zilin Liang, Zheng Zheng, Weihai Chen, Xinzhi Ma, Zhongcai Pei, Xiantao Sun

Abstract: The distribution shift of electroencephalography (EEG) data causes poor generalization of brain-computer interfaces (BCIs) in unseen domains. Some methods try to tackle this challenge by collecting a portion of user data for calibration. However, calibration is time-consuming, mentally fatiguing, and user-unfriendly. To achieve zero-calibration BCIs, most studies employ domain generalization (DG) techniques to learn invariant features across different domains in the training set. However, they fail to fully explore invariant features within the same domain, leading to limited performance. In this paper, we present a novel method to learn domain-invariant features from both inter-domain and intra-domain perspectives. For intra-domain invariant features, we propose a knowledge distillation framework to extract EEG phase-invariant features within one domain. For inter-domain invariant features, correlation alignment is used to bridge distribution gaps across multiple domains. Experimental results on three public datasets validate the effectiveness of our method, showcasing state-of-the-art performance. To the best of our knowledge, this is the first domain generalization study that exploits Fourier phase information as an intra-domain invariant feature to facilitate EEG generalization. More importantly, a zero-calibration BCI based on inter- and intra-domain invariant features has significant potential to advance the practical application of BCIs in the real world.

Submitted 17 May, 2024; originally announced May 2024.
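The zero-calibration BCI abstract uses correlation alignment to bridge inter-domain gaps. A sketch of the standard Deep CORAL loss, ||C_s - C_t||_F^2 / (4 d^2) with C the feature covariance, is shown below; how the paper weights this term or pairs its training domains is not stated in the abstract, and the random features are placeholders.

```python
import numpy as np

def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """CORAL loss between two feature batches of shape (n_samples, d).

    Deep CORAL form: squared Frobenius distance between the (unbiased)
    feature covariances, divided by 4 d^2.
    """
    d = source.shape[1]
    cs = np.cov(source, rowvar=False)
    ct = np.cov(target, rowvar=False)
    return float(np.sum((cs - ct) ** 2) / (4 * d * d))

rng = np.random.default_rng(0)
a = rng.normal(size=(200, 8))
print(coral_loss(a, a))            # identical domains → 0.0
print(coral_loss(a, 3.0 * a) > 0)  # rescaled domain gives a positive loss
```

Because only second-order statistics are compared, a pure mean shift between domains leaves the loss at (numerically) zero; CORAL aligns correlations, not means.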

arXiv:2405.11155

Inner-approximate Reachability Computation via Zonotopic Boundary Analysis

Authors: De** Ren, Zhen Liang, Chenyu Wu, Jianqiang Ding, Taoran Wu, Bai Xue

Abstract: Inner-approximate reachability analysis involves calculating subsets of reachable sets, known as inner-approximations. This analysis is crucial in the fields of dynamic systems analysis and control theory, as it provides a reliable estimation of the set of states that a system can reach from given initial states at a specific time instant. In this paper, we study the inner-approximate reachability analysis problem based on the set-boundary reachability method for systems modelled by ordinary differential equations, in which the computed inner-approximations are represented with zonotopes. The set-boundary reachability method computes an inner-approximation by excluding states reached from the initial set's boundary. The effectiveness of this method is highly dependent on the efficient extraction of the exact boundary of the initial set. To address this, we propose methods leveraging boundary and tiling matrices that can efficiently extract and refine the exact boundary of the initial set represented by zonotopes. Additionally, we enhance the exclusion strategy by contracting the outer-approximations in a flexible way, which allows for the computation of less conservative inner-approximations. To evaluate the proposed method, we compare it with state-of-the-art methods on a series of benchmarks. The numerical results demonstrate that our method is not only efficient but also accurate in computing inner-approximations.

Submitted 21 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: The extended version of the paper accepted at CAV 2024
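The set-boundary method in the reachability abstract starts by extracting the boundary of the initial zonotope Z = {c + sum_i a_i g_i : a_i in [-1, 1]}. For the special case of a full-dimensional parallelotope (n linearly independent generators in R^n), each of the 2n facets is itself a zonotope obtained by fixing one coefficient a_i to -1 or +1. This sketch covers only that special case; it is not the paper's boundary- and tiling-matrix method, which handles general zonotopes.

```python
from typing import List, Tuple

Vector = List[float]
Zonotope = Tuple[Vector, List[Vector]]  # (center c, generators g_1..g_m)

def parallelotope_facets(z: Zonotope) -> List[Zonotope]:
    """Facets of Z = {c + sum_i a_i g_i : a_i in [-1, 1]} when the n generators
    are linearly independent in R^n. Facet (i, s) fixes a_i = s in {-1, +1},
    giving center c + s * g_i and the remaining generators."""
    c, gens = z
    facets = []
    for i, g in enumerate(gens):
        rest = gens[:i] + gens[i + 1:]
        for s in (-1.0, 1.0):
            center = [ck + s * gk for ck, gk in zip(c, g)]
            facets.append((center, rest))
    return facets

# Unit box in R^2: center (0, 0), generators e1 and e2, giving four edges.
box = ([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
for center, gens in parallelotope_facets(box):
    print(center, gens)
```

With more generators than dimensions the facet structure is combinatorially richer (facets come from (n-1)-subsets of generators), which is why a general method needs more machinery than this.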

arXiv:2405.06971

Controlling network-coupled neural dynamics with nonlinear network control theory

Authors: Zhongye Xia, Weibin Li, Zhichao Liang, Kexin Lou, Quanying Liu

Abstract: This paper addresses the problem of controlling the temporal dynamics of complex nonlinear network-coupled dynamical systems, with a focus on neurodynamics. Based on the Lyapunov direct method, we derive a control strategy with theoretical guarantees of controllability. To verify the performance of the derived control strategy, we perform numerical experiments on two nonlinear network-coupled dynamical systems that emulate phase synchronization and neural population dynamics. The results demonstrate the feasibility and effectiveness of our control strategy.

Submitted 11 May, 2024; originally announced May 2024.
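To illustrate the Lyapunov direct method on a phase-synchronization system like the one mentioned in the abstract, here is a generic textbook sketch, not the paper's control law. For network-coupled Kuramoto oscillators dθ_i/dt = f_i(θ) + u_i with f_i(θ) = ω_i + K Σ_j A_ij sin(θ_j − θ_i), take V = ½ Σ_i (θ_i − θ*)². The feedback u_i = −f_i(θ) − k(θ_i − θ*) gives dV/dt = −2kV, so V decays to zero. This cancels the drift exactly and therefore assumes full knowledge of ω, A, and K; the network, gains, and target phase are hypothetical.

```python
import math
from typing import List

def simulate_controlled_kuramoto(theta0: List[float], omega: List[float],
                                 adj: List[List[float]], coupling: float,
                                 target: float, gain: float,
                                 dt: float = 0.01, steps: int = 1000) -> List[float]:
    """Forward-Euler simulation under the Lyapunov-based feedback
    u_i = -f_i(theta) - gain * (theta_i - target), where f_i is the
    uncontrolled drift (natural frequency plus network coupling)."""
    theta = list(theta0)
    n = len(theta)
    for _ in range(steps):
        drift = [omega[i] + coupling * sum(adj[i][j] * math.sin(theta[j] - theta[i])
                                           for j in range(n))
                 for i in range(n)]
        u = [-drift[i] - gain * (theta[i] - target) for i in range(n)]
        theta = [theta[i] + dt * (drift[i] + u[i]) for i in range(n)]
    return theta

def lyapunov(theta: List[float], target: float) -> float:
    """V = 0.5 * sum (theta_i - target)^2."""
    return 0.5 * sum((t - target) ** 2 for t in theta)

adj = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # hypothetical 3-node network
theta0, omega = [0.0, 2.0, -1.5], [1.0, 1.3, 0.7]
final = simulate_controlled_kuramoto(theta0, omega, adj, coupling=0.5, target=0.3, gain=2.0)
print(lyapunov(theta0, 0.3), lyapunov(final, 0.3))
```

In discrete time each phase error shrinks by a factor (1 − gain·dt) per step, matching the continuous-time guarantee as long as gain·dt < 1.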

arXiv:2405.03123

Revealing Decision Conservativeness Through Inverse Distributionally Robust Optimization

Authors: Qi Li, Zhirui Liang, Andrey Bernstein, Yury Dvorkin

Abstract: This paper introduces Inverse Distributionally Robust Optimization (I-DRO) as a method to infer the conservativeness level of a decision-maker, represented by the size of a Wasserstein metric-based ambiguity set, from the optimal decisions made using Forward Distributionally Robust Optimization (F-DRO). By leveraging the Karush-Kuhn-Tucker (KKT) conditions of the convex F-DRO model, we formulate I-DRO as a bilinear program, which can be solved using off-the-shelf optimization solvers. Additionally, this formulation exhibits several advantageous properties. We demonstrate that I-DRO not only guarantees the existence and uniqueness of an optimal solution but also establishes the necessary and sufficient conditions for this optimal solution to accurately match the actual conservativeness level in F-DRO. Furthermore, we identify three extreme scenarios that may impact the effectiveness of I-DRO. Our case study applies F-DRO to power system scheduling under uncertainty and employs I-DRO to recover the conservativeness level of system operators. Numerical experiments based on an IEEE 5-bus system and a realistic NYISO 11-zone system demonstrate the performance of I-DRO in both normal and extreme scenarios.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2405.00734

EEG-MACS: Manifold Attention and Confidence Stratification for EEG-based Cross-Center Brain Disease Diagnosis under Unreliable Annotations

Authors: Zhenxi Song, Ruihan Qin, Huixia Ren, Zhen Liang, Yi Guo, Min Zhang, Zhiguo Zhang

Abstract: Cross-center data heterogeneity and annotation unreliability significantly challenge the intelligent diagnosis of diseases using brain signals. A notable example is the EEG-based diagnosis of neurodegenerative diseases, which features subtler abnormal neural dynamics typically observed in small-group settings. To advance this area, in this work, we introduce a transferable framework employing Manifold Attention and Confidence Stratification (MACS) to diagnose neurodegenerative disorders based on EEG signals sourced from four centers with unreliable annotations. The MACS framework's effectiveness stems from these features: 1) The Augmentor generates various EEG-represented brain variants to enrich the data space; 2) The Switcher enhances the feature space for trusted samples and reduces overfitting on incorrectly labeled samples; 3) The Encoder uses the Riemannian manifold and Euclidean metrics to capture spatiotemporal variations and dynamic synchronization in EEG; 4) The Projector, equipped with dual heads, monitors consistency across multiple brain variants and ensures diagnostic accuracy; 5) The Stratifier adaptively stratifies learned samples by confidence levels throughout the training process; 6) Forward and backpropagation in MACS are constrained by confidence stratification to stabilize the learning system amid unreliable annotations. Our subject-independent experiments, conducted on both neurocognitive and movement disorders using cross-center corpora, have demonstrated superior performance compared to existing related algorithms.
This work not only improves EEG-based diagnostics for cross-center and small-setting brain diseases but also offers insights into extending MACS techniques to other data analyses, tackling data heterogeneity and annotation unreliability in multimedia and multimodal content understanding.

Submitted 29 April, 2024; originally announced May 2024.
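The MACS abstract describes a Stratifier that sorts samples by confidence and constrains training accordingly, without giving the mechanism. A minimal sketch under stated assumptions: confidence is taken as the model's predicted probability for the given (possibly wrong) label, two hypothetical thresholds split samples into trusted, uncertain, and suspected-noisy strata, and hypothetical per-stratum weights scale each sample's loss.

```python
from typing import Dict, List

def stratify(label_probs: List[float], hi: float = 0.7, lo: float = 0.3) -> List[str]:
    """Assign each sample to a stratum from p(given label | x)."""
    strata = []
    for p in label_probs:
        if p >= hi:
            strata.append("trusted")
        elif p >= lo:
            strata.append("uncertain")
        else:
            strata.append("noisy")
    return strata

# Hypothetical per-stratum loss weights: suspected mislabels contribute nothing.
WEIGHTS: Dict[str, float] = {"trusted": 1.0, "uncertain": 0.5, "noisy": 0.0}

def weighted_loss(losses: List[float], strata: List[str]) -> float:
    """Mean of per-sample losses scaled by stratum weight."""
    return sum(WEIGHTS[s] * loss for s, loss in zip(strata, losses)) / len(losses)

probs = [0.9, 0.5, 0.1, 0.8]
s = stratify(probs)
print(s)  # → ['trusted', 'uncertain', 'noisy', 'trusted']
print(weighted_loss([0.2, 1.0, 3.0, 0.4], s))
```

Down-weighting low-confidence samples keeps likely mislabeled recordings from dominating the gradient; recomputing the strata each epoch would make the split adaptive over training, as the abstract describes.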

arXiv:2404.19214

EfficientASR: Speech Recognition Network Compression via Attention Redundancy and Chunk-Level FFN Optimization

Authors: Jianzong Wang, Ziqi Liang, Xulong Zhang, Ning Cheng, **g Xiao

Abstract: In recent years, Transformer networks have shown remarkable performance in speech recognition tasks. However, their deployment poses challenges due to high computational and storage resource requirements. To address this issue, a lightweight model called EfficientASR is proposed in this paper, aiming to enhance the versatility of Transformer models. EfficientASR employs two primary modules: Shared Residual Multi-Head Attention (SRMHA) and Chunk-Level Feedforward Networks (CFFN). The SRMHA module effectively reduces redundant computations in the network, while the CFFN module captures spatial knowledge and reduces the number of parameters. The effectiveness of the EfficientASR model is validated on two public datasets, namely Aishell-1 and HKUST. Experimental results demonstrate a 36% reduction in parameters compared to the baseline Transformer network, along with improvements of 0.3% and 0.2% in Character Error Rate (CER) on the Aishell-1 and HKUST datasets, respectively.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)
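To show why a chunk-level feedforward network can cut parameters, here is a weight-count sketch. It assumes, hypothetically, that CFFN splits the d_model features into equal chunks, each processed by its own small FFN with the same expansion ratio; the abstract does not give CFFN's actual design. Biases are ignored. A standard FFN d → r·d → d has 2·r·d² weights, while k chunks of width d/k have k · 2·r·(d/k)² = 2·r·d²/k.

```python
def ffn_params(d_model: int, expansion: int = 4) -> int:
    """Weight count of a standard position-wise FFN d -> expansion*d -> d (no biases)."""
    return 2 * expansion * d_model * d_model

def chunked_ffn_params(d_model: int, n_chunks: int, expansion: int = 4) -> int:
    """Weights if the d_model features are split into n_chunks equal chunks, each
    with its own small FFN (a hypothetical reading of chunk-level FFN)."""
    if d_model % n_chunks:
        raise ValueError("d_model must be divisible by n_chunks")
    return n_chunks * ffn_params(d_model // n_chunks, expansion)

print(ffn_params(256), chunked_ffn_params(256, 4))  # → 524288 131072
```

Under this reading the FFN weights shrink by a factor of n_chunks; the 36% figure in the abstract refers to the whole network and is not reproduced by this arithmetic.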

arXiv:2404.19212

EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning

Authors: Ziqi Liang, Jianzong Wang, Xulong Zhang, Yong Zhang, Ning Cheng, **g Xiao

Abstract: Using unsupervised learning to disentangle speech into content, rhythm, pitch, and timbre for voice conversion has become a hot research topic. Existing works generally disentangle speech components through human-crafted bottleneck features, which cannot achieve sufficient information disentanglement, so pitch and rhythm may still be mixed together. There is a risk of information overlap in the disentangling process, which reduces speech naturalness. To overcome these limitations, we propose a two-stage model that disentangles speech representations in a self-supervised manner without a human-crafted bottleneck design, using Mutual Information (MI) with the designed upper bound estimator (IFUB) to separate overlapping information between speech components. Moreover, we design a Joint Text-Guided Consistent (TGC) module to guide the extraction of speech content and eliminate timbre leakage issues. Experiments show that our model outperforms the baseline in terms of disentanglement effectiveness, speech naturalness, and similarity. Audio samples can be found at https://largeaudiomodel.com/eadvc.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

arXiv:2404.16357

Reverse engineering the brain input: Network control theory to identify cognitive task-related control nodes

Authors: Zhichao Liang, Yinuo Zhang, Jushen Wu, Quanying Liu

Abstract: The human brain receives complex inputs when performing cognitive tasks, ranging from external inputs via the senses to internal inputs from other brain regions. However, the explicit inputs to the brain during a cognitive task remain unclear. Here, we present an input identification framework for reverse engineering the control nodes and the corresponding inputs to the brain. The framework is verified with synthetic data generated by a predefined linear system, indicating that it can robustly reconstruct data and recover the inputs. We then apply the framework to real motor-task fMRI data from 200 human subjects. Our results show that the model with sparse inputs can reconstruct neural dynamics in motor tasks ($EV=0.779$) and that the identified 28 control nodes largely overlap with the motor system. Underpinned by network control theory, our framework offers a general tool for understanding brain inputs.

Submitted 25 April, 2024; originally announced April 2024.
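The input-identification abstract reports reconstruction quality as explained variance (EV = 0.779). A sketch assuming the common definition EV = 1 − Var(y − ŷ) / Var(y), the same formula used by scikit-learn's `explained_variance_score`; how the paper aggregates EV across regions, subjects, or time is not stated in the abstract.

```python
from statistics import pvariance
from typing import List

def explained_variance(y_true: List[float], y_pred: List[float]) -> float:
    """EV = 1 - Var(y_true - y_pred) / Var(y_true) for a single time series."""
    residual = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residual) / pvariance(y_true)

y = [0.0, 1.0, 2.0, 3.0]
print(explained_variance(y, y))                      # perfect reconstruction → 1.0
print(explained_variance(y, [0.5, 0.5, 2.5, 2.5]))
```

Unlike R², this measure ignores a constant bias in the residual, so a reconstruction that is offset by a fixed amount can still score highly.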

arXiv:2403.08164

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

Authors: Ziqi Liang, Haoxiang Shi, Jiawei Wang, Keda Lu

Abstract: Recently, deep learning-based Text-to-Speech (TTS) systems have achieved high-quality speech synthesis results. Recurrent neural networks have become a standard modeling technique for sequential data in TTS systems and are widely used. However, training a TTS model that includes RNN components requires powerful GPUs and takes a long time. In contrast, CNN-based sequence synthesis techniques can significantly reduce the parameters and training time of a TTS model while guaranteeing a certain level of performance due to their high parallelism, which reduces the economic cost of training. In this paper, we propose a lightweight TTS system based on deep convolutional neural networks, which is a two-stage-trained end-to-end TTS model and does not employ any recurrent units. Our model consists of two stages: Text2Spectrum and SSRN. The former encodes phonemes into a coarse mel spectrogram, and the latter synthesizes the complete spectrum from the coarse mel spectrogram. We also improve the robustness of our model with a series of data augmentations, such as noise suppression, time warping, frequency masking, and time masking, to address the low-resource Mongolian setting. Experiments show that, compared with mainstream TTS models, our model reduces training time and parameters while preserving the quality and naturalness of the synthesized speech. We validate our method on the NCMMSC2022-MTTSC Challenge dataset, where it significantly reduces training time while maintaining acceptable accuracy.

Submitted 17 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

Comments: Accepted by the 27th IEEE International Conference on Computer Supported Cooperative Work in Design (IEEE CSCWD 2024). arXiv admin note: substantial text overlap with arXiv:2211.01948
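Two of the augmentations listed in the EM-TTS abstract, frequency masking and time masking, are the SpecAugment-style operations sketched below on a mel spectrogram stored as frames × mel bins. The mask widths, zero fill value, and seed are hypothetical choices, not settings taken from the paper.

```python
import random
from typing import List

Spectrogram = List[List[float]]  # [time_frames][mel_bins]

def freq_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero a random band of up to max_width consecutive mel bins in every frame."""
    n_bins = len(spec[0])
    width = rng.randint(0, min(max_width, n_bins))
    start = rng.randint(0, n_bins - width)
    return [[0.0 if start <= b < start + width else v for b, v in enumerate(frame)]
            for frame in spec]

def time_mask(spec: Spectrogram, max_width: int, rng: random.Random) -> Spectrogram:
    """Zero a random run of up to max_width consecutive frames."""
    n_frames = len(spec)
    width = rng.randint(0, min(max_width, n_frames))
    start = rng.randint(0, n_frames - width)
    return [[0.0] * len(frame) if start <= t < start + width else list(frame)
            for t, frame in enumerate(spec)]

rng = random.Random(42)
mel = [[1.0] * 80 for _ in range(100)]  # dummy 100-frame, 80-bin spectrogram
aug = time_mask(freq_mask(mel, max_width=10, rng=rng), max_width=20, rng=rng)
print(len(aug), len(aug[0]))  # shape is preserved → 100 80
```

Both functions return new lists rather than modifying the input, so the clean spectrogram can be reused to draw several augmented copies.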

arXiv:2312.08862

Semantics-Division Duplexing: A Novel Full-Duplex Paradigm

Authors: Kai Niu, Zijian Liang, Chao Dong, **cheng Dai, Zhongwei Si, ** Zhang

Abstract: In-band full-duplex (IBFD) is a theoretically effective solution to increase the overall throughput of future wireless communication systems by enabling transmission and reception over the same time-frequency resources. However, reliable source reconstruction remains a great challenge in practical IBFD systems due to the non-ideal elimination of self-interference and the inherent limitations of separate source and channel coding methods. On the other hand, artificial intelligence-enabled semantic communication can provide a viable direction for the optimization of IBFD systems. This article introduces a novel IBFD paradigm guided by semantic communication, called semantics-division duplexing (SDD). It uses semantic domain processing to further suppress self-interference, distinguish the expected semantic information, and recover the desired sources. Digital and semantic domain processing can be further integrated to achieve intelligent and concise communications. We present the advantages of the SDD paradigm with theoretical explanations and provide some visualized results to verify its effectiveness.

Submitted 14 December, 2023; originally announced December 2023.

Comments: 9 pages, 5 figures, submitted to IEEE Wireless Communications Magazine

arXiv:2312.01727

Deep learning acceleration of iterative model-based light fluence correction for photoacoustic tomography

Authors: Zhaoyong Liang, Shuangyang Zhang, Zhichao Liang, Zhongxin Mo, Xiaoming Zhang, Yutian Zhong, Wufan Chen, Li Qi

Abstract: Photoacoustic tomography (PAT) is a promising imaging technique that can visualize the distribution of chromophores within biological tissue. However, the accuracy of PAT imaging is compromised by light fluence (LF), which hinders the quantification of light absorbers. Currently, model-based iterative methods are used for LF correction, but they require significant computational resources due to repeated LF estimation based on differential light transport models. To improve LF correction efficiency, we propose using a Fourier neural operator (FNO), a neural network specially designed for solving differential equations, to learn the forward projection of light transport in PAT. Trained on paired finite-element-based LF simulation data, our FNO model replaces the traditional computationally heavy LF estimator during iterative correction, significantly accelerating the correction procedure. Simulation and experimental results demonstrate that our method achieves LF correction quality comparable to traditional iterative methods while reducing the correction time by a factor of more than 30.

Submitted 7 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.
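The core operation of a Fourier neural operator, named in the photoacoustic abstract, is a spectral convolution: transform to Fourier space, keep only the lowest `modes` coefficients, multiply them by learned complex weights, and transform back. The sketch below is a single-channel 1-D version with unit weights standing in for trained ones; a full FNO layer uses multiple channels, per-mode weight matrices, a pointwise linear path, and a nonlinearity, and the paper's exact architecture is not given in the abstract.

```python
import numpy as np

def spectral_conv_1d(x: np.ndarray, weights: np.ndarray, modes: int) -> np.ndarray:
    """Single-channel Fourier layer: keep the lowest `modes` frequencies of x,
    scale each by a complex weight, and transform back (higher modes are zeroed)."""
    x_ft = np.fft.rfft(x)
    out_ft = np.zeros_like(x_ft)
    out_ft[:modes] = x_ft[:modes] * weights[:modes]
    return np.fft.irfft(out_ft, n=x.shape[-1])

n = 64
grid = np.linspace(0.0, 1.0, n, endpoint=False)
signal = np.sin(2 * np.pi * grid) + 0.1 * np.sin(2 * np.pi * 20 * grid)

# With unit weights and 5 kept modes the layer acts as a low-pass filter.
smooth = spectral_conv_1d(signal, np.ones(n // 2 + 1, dtype=complex), modes=5)
print(np.allclose(smooth, np.sin(2 * np.pi * grid)))  # → True
```

Because the learned weights live on Fourier modes rather than grid points, the same layer can be applied to inputs sampled at different resolutions, which is one reason FNOs suit surrogate PDE solvers.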

arXiv:2311.09563

DOI: 10.1109/TEMPR.2024.3390760

Multi-Objective Transmission Expansion: An Offshore Wind Power Integration Case Study

Authors: Saroj Khanal, Christoph Graf, Zhirui Liang, Yury Dvorkin, Burçin Ünel

Abstract: Despite ambitious offshore wind targets in the U.S. and globally, offshore grid planning guidance remains notably scarce, contrasting with well-established frameworks for onshore grids. This gap, alongside the increasing penetration of offshore wind and other clean-energy resources in onshore grids, highlights the urgent need for a coordinated planning framework. Our paper describes a multi-objective, multistage generation, storage and transmission expansion planning model to facilitate efficient and resilient large-scale adoption of offshore wind power. Recognizing regulatory emphasis and, in some cases, requirements to consider externalities, this model explicitly accounts for negative externalities: greenhouse gas emissions and local emission-induced air pollution. Utilizing an 8-zone ISO-NE test system and a 9-zone PJM test system, we explore grid expansion sensitivities such as impacts of optimizing Points of Interconnection (POIs) versus fixed POIs, negative externalities, and consideration of extreme operational scenarios resulting from offshore wind integration. Our results indicate that accounting for negative externalities necessitates greater upfront investment in clean generation and storage (balanced by lower expected operational costs). Optimizing POIs could significantly reshape offshore topology or POIs, and lower total cost. Finally, accounting for extreme operational scenarios typically results in greater operational costs and sometimes may alter onshore line investment.

Submitted 21 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.05188

Sound field reconstruction using neural processes with dynamic kernels

Authors: Zining Liang, Wen Zhang, Thushara D. Abhayapala

Abstract: Accurately representing the sound field with high spatial resolution is critical for immersive and interactive sound field reproduction technology. To minimize experimental effort, data-driven methods have been proposed to estimate sound fields from a small number of discrete observations. In particular, kernel-based methods using Gaussian Processes (GPs) with a covariance function to model spatial correlations have been used for sound field reconstruction. However, these methods are limited because their fixed kernels have limited expressiveness, requiring manual identification of optimal kernels for different sound fields. In this work, we propose a new approach that parameterizes GPs using a deep neural network based on Neural Processes (NPs) to reconstruct the magnitude of the sound field. This method has the advantage of dynamically learning kernels from simulated data using an attention mechanism, allowing for greater flexibility and adaptability to the acoustic properties of the sound field. Numerical experiments demonstrate that our proposed approach outperforms current methods in reconstruction accuracy, providing a promising alternative for sound field reconstruction.
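
For reference, here is the fixed-kernel GP baseline that the abstract contrasts with. With a squared-exponential kernel, the posterior mean at an unobserved position is a kernel-weighted combination of the microphone measurements. The following are all assumptions for illustration:

- the kernel family;
- the length scale and noise level;
- the microphone layout;
- `toy_field`.

The proposed NP model instead learns the kernel from data.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential covariance between two sets of 2-D positions."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior_mean(x_obs, y_obs, x_query, noise=1e-4, **kernel_args):
    """Posterior mean of a zero-mean GP: K(x_q, X) (K(X, X) + noise * I)^-1 y."""
    k_oo = rbf_kernel(x_obs, x_obs, **kernel_args) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs, **kernel_args)
    return k_qo @ np.linalg.solve(k_oo, y_obs)

def toy_field(p):
    """Hypothetical smooth magnitude pattern over a 1 m x 1 m area."""
    return np.cos(2 * np.pi * p[:, 0]) * np.sin(np.pi * p[:, 1])

rng = np.random.default_rng(0)
mics = rng.uniform(0.0, 1.0, size=(30, 2))      # hypothetical microphone positions (m)
targets = rng.uniform(0.0, 1.0, size=(5, 2))    # positions to reconstruct
est = gp_posterior_mean(mics, toy_field(mics), targets)
```

The `length_scale` argument is exactly the kind of hand-tuned choice that the abstract identifies as the limitation of fixed kernels.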

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2309.07648

Incorporating Class-based Language Model for Named Entity Recognition in Factorized Neural Transducer

Authors: Peng Wang, Yifan Yang, Zheng Liang, Tian Tan, Shiliang Zhang, Xie Chen

Abstract: Despite advances in end-to-end (E2E) models for speech recognition, named entity recognition (NER) remains challenging but critical for semantic understanding. Previous studies mainly focus on various rule-based or attention-based contextual biasing algorithms. However, their performance might be sensitive to the biasing weight or degraded by excessive attention to the named entity list, along with a risk of false triggering. Inspired by the success of the class-based language model (LM) in NER in conventional hybrid systems and the effective decoupling of acoustic and linguistic information in the factorized neural Transducer (FNT), we propose C-FNT, a novel E2E model that incorporates class-based LMs into FNT. In C-FNT, the LM score of named entities can be associated with the name class instead of its surface form. The experimental results show that our proposed C-FNT significantly reduces error in named entities without hurting performance in general word recognition.
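
The class-based factorization that C-FNT borrows can be sketched independently of the transducer. An entity's score is the probability of its class in context times the probability of the entity within that class, so adding a new name only changes the class's member list. The vocabulary, the `<PERSON>` class token and all probabilities below are hypothetical. The sketch also omits how C-FNT integrates this score into FNT decoding.

```python
import math

# Hypothetical toy LM: P(token | history), where class tokens stand in for entities.
context_lm = {"call": {"<PERSON>": 0.4, "me": 0.3, "now": 0.3}}
# Hypothetical within-class distribution over surface forms (e.g. a contact list).
class_members = {"<PERSON>": {"alice": 0.5, "bob": 0.25, "zhang wei": 0.25}}

def class_lm_logprob(history, word):
    """log P(word | history) under a class-based LM:
    P(word | h) = P(class | h) * P(word | class) for entity words,
    and the ordinary word probability otherwise."""
    probs = context_lm[history]
    for cls, members in class_members.items():
        if word in members:
            return math.log(probs[cls]) + math.log(members[word])
    return math.log(probs[word])

print(round(math.exp(class_lm_logprob("call", "bob")), 3))  # 0.1
```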

Submitted 8 June, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: Accepted in INTERSPEECH 2024

arXiv:2308.16150

Modality Cycles with Masked Conditional Diffusion for Unsupervised Anomaly Segmentation in MRI

Authors: Ziyun Liang, Harry Anthony, Felix Wagner, Konstantinos Kamnitsas

Abstract: Unsupervised anomaly segmentation aims to detect patterns that are distinct from any patterns processed during training, commonly called abnormal or out-of-distribution patterns, without providing any associated manual segmentations. Since anomalies during deployment can lead to model failure, detecting the anomaly can enhance the reliability of models, which is valuable in high-risk domains like medical imaging. This paper introduces Masked Modality Cycles with Conditional Diffusion (MMCCD), a method that enables segmentation of anomalies across diverse patterns in multimodal MRI. The method is based on two fundamental ideas. First, we propose the use of cyclic modality translation as a mechanism for enabling abnormality detection. Image-translation models learn tissue-specific modality mappings, which are characteristic of tissue physiology. Thus, these learned mappings fail to translate tissues or image patterns that have never been encountered during training, and the error enables their segmentation. Furthermore, we combine image translation with a masked conditional diffusion model, which attempts to 'imagine' what tissue exists under a masked area, further exposing unknown patterns as the generative model fails to recreate them. We evaluate our method on a proxy task by training on healthy-looking slices of BraTS2021 multi-modality MRIs and testing on slices with tumors. We show that our method compares favorably to previous unsupervised approaches based on image reconstruction and denoising with autoencoders and diffusion models.
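
The translation-error principle can be illustrated with a deliberately simplified, one-direction sketch. A translator fitted only on healthy data predicts one modality from another. Voxels whose intensity relation was never seen in training then produce large errors. Everything here is an assumption for illustration:

- the linear intensity relation;
- the least-squares translator standing in for a network;
- the threshold;
- the 1-D "image".

The cyclic translation and masked diffusion components of MMCCD are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "healthy" training voxels: modality B is a fixed function of modality A.
a_train = rng.uniform(0.2, 1.0, size=500)
b_train = 2.0 * a_train + 0.1 + rng.normal(0.0, 0.01, size=500)

# Fit an A -> B translator on healthy data only (linear least squares stands in
# for an image-translation network).
design = np.stack([a_train, np.ones_like(a_train)], axis=1)
coef, *_ = np.linalg.lstsq(design, b_train, rcond=None)

def anomaly_map(a_img, b_img, threshold=0.1):
    """Per-voxel translation error |B - f(A)| and a binary anomaly mask."""
    error = np.abs(b_img - (coef[0] * a_img + coef[1]))
    return error, error > threshold

# Test "image": 8 voxels; the last two break the healthy A -> B relation.
a_test = np.linspace(0.3, 0.9, 8)
b_test = 2.0 * a_test + 0.1
b_test[-2:] = 0.2
err, mask = anomaly_map(a_test, b_test)
print(mask.astype(int))   # [0 0 0 0 0 0 1 1]
```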

Submitted 2 November, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted in Multiscale Multimodal Medical Imaging workshop in MICCAI 2023

arXiv:2308.11635

Semi-Supervised Dual-Stream Self-Attentive Adversarial Graph Contrastive Learning for Cross-Subject EEG-based Emotion Recognition

Authors: Weishan Ye, Zhiguo Zhang, Min Zhang, Fei Teng, Li Zhang, Linling Li, Gan Huang, Jianhong Wang, Dong Ni, Zhen Liang

Abstract: Electroencephalography (EEG) is an objective tool for emotion recognition with promising applications. However, the scarcity of labeled data remains a major challenge in this field, limiting the widespread use of EEG-based emotion recognition. In this paper, a semi-supervised Dual-stream Self-Attentive Adversarial Graph Contrastive learning framework (termed DS-AGC) is proposed to tackle the challenge of limited labeled data in cross-subject EEG-based emotion recognition. The DS-AGC framework includes two parallel streams for extracting non-structural and structural EEG features. The non-structural stream incorporates a semi-supervised multi-domain adaptation method to alleviate distribution discrepancy among labeled source domain, unlabeled source domain, and unknown target domain. The structural stream develops a graph contrastive learning method to extract effective graph-based feature representation from multiple EEG channels in a semi-supervised manner. Further, a self-attentive fusion module is developed for feature fusion, sample selection, and emotion recognition, which highlights EEG features more relevant to emotions and data samples in the labeled source domain that are closer to the target domain. Extensive experiments conducted on two benchmark databases (SEED and SEED-IV) using a semi-supervised cross-subject leave-one-subject-out cross-validation evaluation scheme show that the proposed model outperforms existing methods under different incomplete label conditions (with an average improvement of 5.83% on SEED and 6.99% on SEED-IV), demonstrating its effectiveness in addressing the label scarcity problem in cross-subject EEG-based emotion recognition.

Submitted 13 August, 2023; originally announced August 2023.

Comments: arXiv admin note: text overlap with arXiv:2304.06496

arXiv:2308.08968

On the Performance of Multidimensional Constellation Shaping for Linear and Nonlinear Optical Fiber Channel

Authors: Bin Chen, Zhiwei Liang, Shen Li, Yi Lei, Gabriele Liga, Alex Alvarado

Abstract: Multidimensional constellation shaping schemes of up to 32 dimensions with different spectral efficiencies are compared through AWGN and fiber-optic simulations. The results show that no constellation is universal and that the balance of required and effective SNRs should be jointly considered for the specific optical transmission scenario.

Submitted 18 October, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

Comments: The paper has been accepted by ECOC 2023

arXiv:2306.10494

Semi-Supervised Learning for Multi-Label Cardiovascular Diseases Prediction: A Multi-Dataset Study

Authors: Rushuang Zhou, Lei Lu, Zijun Liu, Ting Xiang, Zhen Liang, David A. Clifton, Yining Dong, Yuan-Ting Zhang

Abstract: Electrocardiography (ECG) is a non-invasive tool for predicting cardiovascular diseases (CVDs). Current ECG-based diagnosis systems show promising performance owing to the rapid development of deep learning techniques. However, the label scarcity problem, the co-occurrence of multiple CVDs and the poor performance on unseen datasets greatly hinder the widespread application of deep learning-based models. Addressing them in a unified framework remains a significant challenge. To this end, we propose a multi-label semi-supervised model (ECGMatch) to recognize multiple CVDs simultaneously with limited supervision. In ECGMatch, an ECGAugment module is developed for weak and strong ECG data augmentation, which generates diverse samples for model training. Subsequently, a hyperparameter-efficient framework with neighbor agreement modeling and knowledge distillation is designed for pseudo-label generation and refinement, which mitigates the label scarcity problem. Finally, a label correlation alignment module is proposed to capture the co-occurrence information of different CVDs within labeled samples and propagate this information to unlabeled samples. Extensive experiments on four datasets and three protocols demonstrate the effectiveness and stability of the proposed model, especially on unseen datasets. As such, this model can pave the way for diagnostic systems that achieve robust performance on multi-label CVDs prediction with limited supervision.
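
The abstract does not give the exact form of ECGMatch's neighbor agreement modeling. The following is therefore only a generic sketch of the underlying idea, under stated assumptions. Each unlabeled sample's multi-label prediction is smoothed toward those of its nearest neighbours in an embedding space, and only confident per-class decisions are kept as pseudo-labels. The thresholds, the blending weight `alpha` and the toy embeddings are hypothetical.

```python
import numpy as np

def refine_with_neighbors(probs, feats, k=1, alpha=0.5):
    """Blend each sample's per-class probabilities with the mean prediction of its
    k nearest neighbours in feature space (a simple stand-in for neighbour agreement)."""
    dists = np.linalg.norm(feats[:, None, :] - feats[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a sample is not its own neighbour
    nn = np.argsort(dists, axis=1)[:, :k]      # (n_samples, k) neighbour indices
    return alpha * probs + (1 - alpha) * probs[nn].mean(axis=1)

def pseudo_labels(probs, pos_thr=0.8, neg_thr=0.2):
    """Per-class hard labels plus a mask keeping only confident decisions."""
    labels = (probs >= pos_thr).astype(int)
    mask = (probs >= pos_thr) | (probs <= neg_thr)
    return labels, mask

# Hypothetical unlabeled batch: 4 recordings, 2 CVD labels, 2-D embeddings.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
probs = np.array([[0.95, 0.05], [0.75, 0.15], [0.05, 0.90], [0.15, 0.74]])
labels, mask = pseudo_labels(refine_with_neighbors(probs, feats))
```

On its own, the second recording's first-label score (0.75) would be discarded as unconfident. After agreeing with its neighbour, it becomes a confident positive.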

Submitted 18 June, 2023; originally announced June 2023.

arXiv:2306.08588

Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation

Authors: Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen

Abstract: Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition. However, there remain several challenging scenarios that E2E models do not handle well, such as code-switching and named entity recognition (NER). Data augmentation is a common and effective practice for these two scenarios. However, the current data augmentation methods mainly rely on audio splicing and text-to-speech (TTS) models, which might result in discontinuous, unrealistic, and less diversified speech. To mitigate these potential issues, we propose a novel data augmentation method by applying the text-based speech editing model. The augmented speech from speech editing systems is more coherent and diversified, and also more akin to real speech. The experimental results on code-switching and NER tasks show that our proposed method can significantly outperform the audio splicing and neural TTS based data augmentation systems.

Submitted 14 June, 2023; originally announced June 2023.

Comments: Accepted by Interspeech 2023

arXiv:2306.07547

doi 10.1609/aaai.v38i16.29747

UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding

Authors: Chenpeng Du, Yiwei Guo, Feiyu Shen, Zhijun Liu, Zheng Liang, Xie Chen, Shuai Wang, Hui Zhang, Kai Yu

Abstract: The utilization of discrete speech tokens, divided into semantic tokens and acoustic tokens, has been proven superior to traditional mel-spectrogram acoustic features in terms of naturalness and robustness for text-to-speech (TTS) synthesis. Recent popular models, such as VALL-E and SPEAR-TTS, allow zero-shot speaker adaptation through auto-regressive (AR) continuation of acoustic tokens extracted from a short speech prompt. However, these AR models are restricted to generating speech only in a left-to-right direction, making them unsuitable for speech editing where both preceding and following contexts are provided. Furthermore, these models rely on acoustic tokens, which have audio quality limitations imposed by the performance of audio codec models. In this study, we propose a unified context-aware TTS framework called UniCATS, which is capable of both speech continuation and editing. UniCATS comprises two components, an acoustic model CTX-txt2vec and a vocoder CTX-vec2wav. CTX-txt2vec employs contextual VQ-diffusion to predict semantic tokens from the input text, enabling it to incorporate the semantic context and maintain seamless concatenation with the surrounding context. Following that, CTX-vec2wav utilizes contextual vocoding to convert these semantic tokens into waveforms, taking into consideration the acoustic context. Our experimental results demonstrate that CTX-vec2wav outperforms HifiGAN and AudioLM in terms of speech resynthesis from semantic tokens. Moreover, we show that UniCATS achieves state-of-the-art performance in both speech continuation and editing.

Submitted 28 March, 2024; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: Accepted to AAAI 2024

arXiv:2305.15193

Adaptive Policy Learning to Additional Tasks

Authors: Wenjian Hao, Zehui Lu, Zihao Liang, Tianyu Zhou, Shaoshuai Mou

Abstract: This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. A method named Adaptive Policy Gradient (APG) is proposed in this paper, which combines Bellman's principle of optimality with the policy gradient approach to improve the convergence rate. This paper provides theoretical analysis which guarantees the convergence rate and sample complexity of $\mathcal{O}(1/T)$ and $\mathcal{O}(1/\epsilon)$, respectively, where $T$ denotes the number of iterations and $\epsilon$ denotes the accuracy of the resulting stationary policy. Furthermore, several challenging numerical simulations, including cartpole, lunar lander, and robot arm, are provided to show that APG achieves performance similar to existing deterministic policy gradient methods while utilizing much less data and converging at a faster rate.

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.15188

Policy Learning based on Deep Koopman Representation

Authors: Wenjian Hao, Paulo C. Heredia, Bowen Huang, Zehui Lu, Zihao Liang, Shaoshuai Mou

Abstract: This paper proposes a policy learning algorithm based on the Koopman operator theory and policy gradient approach, which seeks to approximate an unknown dynamical system and search for an optimal policy simultaneously, using the observations gathered through interaction with the environment. The proposed algorithm has two innovations: first, it introduces the so-called deep Koopman representation into the policy gradient to achieve a linear approximation of the unknown dynamical system, all with the purpose of improving data efficiency; second, the accumulated errors for long-term tasks induced by approximating system dynamics are avoided by applying Bellman's principle of optimality. Furthermore, a theoretical analysis is provided to prove the asymptotic convergence of the proposed algorithm and characterize the corresponding sampling complexity. These conclusions are also supported by simulations on several challenging benchmark environments.
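
The core linearization idea can be illustrated without a neural network. Lift the state into observables in which the dynamics become linear, then fit the Koopman matrix by least squares on observed transitions (an EDMD-style fit). In this sketch the dictionary of observables is hand-picked for a hypothetical system that admits an exact finite linear lift. The paper instead learns the observables with a deep network and couples them with a policy gradient, which is not shown here.

```python
import numpy as np

a, b, c = 0.9, 0.5, 0.3   # hypothetical system parameters

def step(x):
    """Nonlinear dynamics: x1' = a*x1, x2' = b*x2 + c*x1**2."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2])

def lift(x):
    """Hand-picked observables; a deep Koopman model would learn these instead."""
    return np.array([x[0], x[1], x[0] ** 2])

rng = np.random.default_rng(1)
states = rng.uniform(-1.0, 1.0, size=(50, 2))
X = np.array([lift(x) for x in states])          # observables at time t
Y = np.array([lift(step(x)) for x in states])    # observables at time t + 1

# Least-squares Koopman matrix K such that lift(x_{t+1}) ≈ K @ lift(x_t)
K = np.linalg.lstsq(X, Y, rcond=None)[0].T
print(np.round(K, 3))
```

For this system the lifted dynamics are exactly linear, with K = [[a, 0, 0], [0, b, c], [0, 0, a²]]. Multi-step predictions therefore reduce to matrix powers of K.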

Submitted 24 May, 2023; originally announced May 2023.

arXiv:2305.07926

doi 10.1038/s41467-024-48785-1

Characteristic time of transient response of solid oxide cells (SOCs) to changes in voltage/current: from theory to applications

Authors: Zhaojian Liang, Jingyi Wang, Liang An, Yang Wang, Meng Ni, Mengying Li

Abstract: The intermittency of solar and wind power can be addressed by integrating them with Solid Oxide Cells (SOCs). This study delves into the transient characteristics of SOCs and their dependence on dynamic heat and mass transfer processes. Non-dimensional analysis was used to identify influential parameters, followed by a 3-D numerical simulation-based parametric analysis to examine the dynamic gaseous and thermal responses of SOCs with varying dimensions, material properties, and operating conditions. For the first time, we proposed characteristic times to describe the relationship between SOC transients and multiple parameters. These characteristic times represent the overall heat and mass transfer rates in SOCs. Their effectiveness was validated against the literature, and they show potential for characterizing the transient behavior of other electrochemical cells. In addition, two examples are provided to illustrate how the characteristic times facilitate SOC design and control at minimal computational cost.
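
As a rough illustration of why characteristic times are useful, the generic diffusion time scale L^2/D already separates fast and slow transients. The following are all assumptions, not the characteristic times derived in the paper:

- the expressions, which are textbook order-of-magnitude estimates;
- the property values;
- the first-order (exponential) step response.

```python
import math

def diffusion_time(length_m, diffusivity_m2_s):
    """Generic characteristic time L**2 / D for a diffusion-dominated process."""
    return length_m ** 2 / diffusivity_m2_s

def first_order_response(t, tau, start, end):
    """Value at time t after a step change, assuming a first-order exponential response."""
    return end + (start - end) * math.exp(-t / tau)

# Hypothetical order-of-magnitude inputs (not values from the paper):
tau_thermal = diffusion_time(0.05, 1e-6)   # 5 cm solid, thermal diffusivity 1e-6 m^2/s -> ~2500 s
tau_gas = diffusion_time(1e-3, 1e-4)       # 1 mm porous electrode, gas diffusivity 1e-4 m^2/s -> ~0.01 s

# Fraction of a unit temperature step reached after 10 minutes:
print(round(first_order_response(600.0, tau_thermal, 0.0, 1.0), 3))
```

With these illustrative numbers, the gas-phase response settles within a fraction of a second. The thermal response takes tens of minutes, so the two transients can be reasoned about separately at negligible computational cost.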

Submitted 30 May, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

Journal ref: Nat Commun 15, 4587 (2024)

arXiv:2304.06496

EEGMatch: Learning with Incomplete Labels for Semi-Supervised EEG-based Cross-Subject Emotion Recognition

Authors: Rushuang Zhou, Weishan Ye, Zhiguo Zhang, Yanyang Luo, Li Zhang, Linling Li, Gan Huang, Yining Dong, Yuan-Ting Zhang, Zhen Liang

Abstract: Electroencephalography (EEG) is an objective tool for emotion recognition and shows promising performance. However, the label scarcity problem is a main challenge in this field, which limits the wide application of EEG-based emotion recognition. In this paper, we propose a novel semi-supervised learning framework (EEGMatch) to leverage both labeled and unlabeled EEG data. First, an EEG-Mixup based data augmentation method is developed to generate more valid samples for model learning. Second, a semi-supervised two-step pairwise learning method is proposed to bridge prototype-wise and instance-wise pairwise learning, where the prototype-wise pairwise learning measures the global relationship between EEG data and the prototypical representation of each emotion class and the instance-wise pairwise learning captures the local intrinsic relationship among EEG data. Third, a semi-supervised multi-domain adaptation is introduced to align the data representation among multiple domains (labeled source domain, unlabeled source domain, and target domain), where the distribution mismatch is alleviated. Extensive experiments are conducted on two benchmark databases (SEED and SEED-IV) under a cross-subject leave-one-subject-out cross-validation evaluation protocol. The results show that the proposed EEGMatch outperforms state-of-the-art methods under different incomplete label conditions (with 6.89% improvement on SEED and 1.44% improvement on SEED-IV), which demonstrates the effectiveness of the proposed EEGMatch in dealing with the label scarcity problem in emotion recognition using EEG signals. The source code is available at https://github.com/KAZABANA/EEGMatch.
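
The abstract does not detail EEG-Mixup. As an assumption about the general idea, the sketch below is only the standard mixup recipe applied to EEG feature vectors. New samples are convex combinations of random pairs, and their labels are mixed with the same weight. The feature dimension (310, e.g. 62 channels by 5 frequency bands), the class count and the Beta parameter `alpha` are illustrative choices.

```python
import numpy as np

def mixup(x, y_onehot, alpha=0.2, rng=None):
    """Mix random pairs of samples and their one-hot labels with a Beta(alpha, alpha) weight."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)               # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))             # mixing partner for each sample
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 310))                  # hypothetical: 8 samples x 310 features
y = np.eye(3)[rng.integers(0, 3, size=8)]      # hypothetical 3-class one-hot labels
x_mix, y_mix, lam = mixup(x, y, rng=rng)
```

Because the labels are mixed with the same weight as the inputs, each mixed label row still sums to 1 and can be used directly with a soft-label cross-entropy loss.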

Submitted 27 March, 2023; originally announced April 2023.

arXiv:2304.00100

A Data-Driven Approach for Inverse Optimal Control

Authors: Zihao Liang, Wenjian Hao, Shaoshuai Mou

Abstract: This paper proposes a data-driven, iterative approach for inverse optimal control (IOC), which aims to learn the objective function of a nonlinear optimal control system given its states and inputs. The approach solves the IOC problem in the challenging situation where the system dynamics are unknown. The key idea of the proposed approach comes from the deep Koopman representation of the unknown system, which employs a deep neural network to represent observables for the Koopman operator. By assuming the objective function to be learned is parameterized as a linear combination of features with unknown weights, the proposed approach for IOC is able to achieve a Koopman representation of the unknown dynamics and the unknown weights in the objective function together. Simulations are provided to verify the proposed approach.
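
Recovering objective weights from optimal behaviour can be shown on a one-step toy problem with known dynamics. At an optimal input, the gradient of the weighted features with respect to the input vanishes, which gives linear equations in the unknown weights. The following are assumptions:

- the hypothetical features [u^2, (x + u)^2];
- the true weights;
- fixing the first weight to 1 (weights are only identifiable up to scale).

The paper's approach additionally learns a deep Koopman model of unknown dynamics, which is omitted here.

```python
import numpy as np

def feature_grad_u(x, u):
    """d/du of hypothetical features phi(x, u) = [u**2, (x + u)**2], one row per sample."""
    return np.stack([2 * u, 2 * (x + u)], axis=1)

def recover_weights(x, u):
    """Estimate theta (with theta[0] fixed to 1) from the stationarity condition
    theta @ dphi/du = 0 at the observed optimal inputs, in the least-squares sense."""
    G = feature_grad_u(x, u)
    theta_rest, *_ = np.linalg.lstsq(G[:, 1:], -G[:, 0], rcond=None)
    return np.concatenate(([1.0], theta_rest))

true_theta = np.array([1.0, 3.0])
x = np.linspace(-2.0, 2.0, 9)
# Optimal inputs: argmin_u of theta @ phi(x, u) = -theta2 * x / (theta1 + theta2)
u_opt = -true_theta[1] * x / (true_theta[0] + true_theta[1])
theta_hat = recover_weights(x, u_opt)
```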

Submitted 31 March, 2023; originally announced April 2023.

arXiv:2304.00062

A Physics-Informed Machine Learning for Electricity Markets: A NYISO Case Study

Authors: Robert Ferrando, Laurent Pagnier, Robert Mieth, Zhirui Liang, Yury Dvorkin, Daniel Bienstock, Michael Chertkov

Abstract: This paper addresses the challenge of efficiently solving the optimal power flow problem in real-time electricity markets. The proposed solution, named Physics-Informed Market-Aware Active Set learning OPF (PIMA-AS-OPF), leverages physical constraints and market properties to ensure physical and economic feasibility of market-clearing outcomes. Specifically, PIMA-AS-OPF employs the active set learning technique and expands its capabilities to account for curtailment in load or renewable power generation, which is a common challenge in real-world power systems. The core of PIMA-AS-OPF is a fully-connected neural network that takes the net load and the system topology as input. The outputs of this neural network include active constraints such as saturated generators and transmission lines, as well as non-zero load shedding and wind curtailments. These outputs allow for reducing the original market-clearing optimization to a system of linear equations, which can be solved efficiently and yield both the dispatch decisions and the locational marginal prices (LMPs). The dispatch decisions and LMPs are then tested for their feasibility with respect to the requirements for efficient market-clearing results. The accuracy and scalability of the proposed method are tested on a realistic 1814-bus NYISO system with current and future renewable energy penetration levels.
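
Why a known active set turns market clearing into a linear system can be seen in a minimal sketch under simplifying assumptions:

- There is a single bus, so there are no line constraints and the LMP equals the system price λ.
- Generator costs are quadratic.
- The active set is supplied by hand rather than predicted by a neural network.
- Load shedding and curtailment are not modelled.

The function name and data are hypothetical.

```python
import numpy as np

def dispatch_given_active_set(b, c, p_min, p_max, demand, at_min, at_max):
    """Single-bus economic dispatch with costs b_i * p + 0.5 * c_i * p**2, once the active
    set is known: generators flagged at_min / at_max sit at their limits, and the free ones
    satisfy the linear conditions b_i + c_i * p_i = lam and sum(p) = demand.
    Returns (dispatch, lam)."""
    free = ~(at_min | at_max)
    n_free = int(free.sum())
    fixed_output = p_min[at_min].sum() + p_max[at_max].sum()
    # Unknowns: [p_free..., lam]
    A = np.zeros((n_free + 1, n_free + 1))
    rhs = np.zeros(n_free + 1)
    A[:n_free, :n_free] = np.diag(c[free])    # c_i * p_i - lam = -b_i
    A[:n_free, -1] = -1.0
    rhs[:n_free] = -b[free]
    A[-1, :n_free] = 1.0                      # power balance for the free units
    rhs[-1] = demand - fixed_output
    sol = np.linalg.solve(A, rhs)
    p = np.where(at_min, p_min, np.where(at_max, p_max, 0.0))
    p[free] = sol[:n_free]
    return p, sol[-1]

b = np.array([10.0, 20.0, 30.0])              # linear cost terms ($/MWh)
c = np.array([0.2, 0.2, 0.2])                 # quadratic cost terms ($/MWh^2)
p_min = np.zeros(3)
p_max = np.array([50.0, 100.0, 100.0])        # MW
at_min = np.array([False, False, False])
at_max = np.array([True, False, False])       # "predicted": generator 0 at its upper limit
p, lam = dispatch_given_active_set(b, c, p_min, p_max, 150.0, at_min, at_max)
```

A complete implementation would then check feasibility, for example that each free generator's output lies within its limits, and fall back to the full optimization if the predicted active set turns out to be wrong.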

Submitted 31 March, 2023; originally announced April 2023.

arXiv:2303.12360

Automatically Predict Material Properties with Microscopic Image Example Polymer Compatibility

Authors: Zhilong Liang, Zhenzhi Tan, Ruixin Hong, Wanli Ouyang, Jinying Yuan, Changshui Zhang

Abstract: Many material properties are manifested in the morphological appearance and characterized with microscopic images, such as scanning electron microscopy (SEM) images. Polymer miscibility is a key physical quantity of polymer materials and is commonly and intuitively judged from SEM images. However, human observation and judgement of the images is time-consuming, labor-intensive and hard to quantify. Computer image recognition with machine learning methods can overcome the shortcomings of manual judgement, giving accurate and quantitative results. We achieve automatic miscibility recognition using a convolutional neural network and transfer learning, and the model obtains up to 94% accuracy. We also put forward a quantitative criterion for polymer miscibility with this model. The proposed method can be widely applied to the quantitative characterization of the microstructure and properties of various materials.

Submitted 3 August, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2302.06831

Analytical Model of Nonlinear Fiber Propagation for General Dual-Polarization Four-Dimensional Modulation Format

Authors: Zhiwei Liang, Bin Chen, Yi Lei, Gabriele Liga, Alex Alvarado

Abstract: Coherent dual-polarization (DP) optical transmission systems encode information on the four available degrees of freedom of an optical field: the two polarization states, each with two quadrature components. Such systems naturally operate based on a four-dimensional (4D) signal space. Having a general analytical model to accurately estimate nonlinear interference (NLI) is key to analyzing such transmission systems as well as to studying how different DP-4D formats are affected by NLI. However, the available models in the literature are not completely general. They either do not apply to all DP-4D formats or do not consider all the NLI contributions. In this paper, we develop a model that applies to all DP-4D modulation formats with independent symbols. Our model takes self-channel interference, cross-channel interference and multiple-channel interference effects into account. As an application of our model, we further study the effects of signal-noise interactions in long-haul transmission. When compared to previous results in the literature, our model is more accurate at predicting the contribution of NLI for both low and high dispersion fibers in single- and multi-channel transmission systems. For the NLI, we report an average gap of less than 0.15 dB from split-step Fourier simulation results. The simulation results further show that by considering signal-noise interactions, the proposed model in long-haul transmission can reduce the transmission reach prediction error by 4%.
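
An accurate NLI estimate matters for reach prediction. To see why, consider the widely used Gaussian-noise-style relation in which NLI power grows with the cube of launch power. This sketch uses that scalar relation with a hypothetical NLI coefficient `eta` and ASE power. It is not the paper's DP-4D model, whose contribution is precisely a format-dependent estimate of the NLI term.

```python
import math

def effective_snr(power_w, p_ase_w, eta):
    """SNR with ASE noise plus NLI power eta * P**3 (all quantities in linear units)."""
    return power_w / (p_ase_w + eta * power_w ** 3)

def optimal_launch_power(p_ase_w, eta):
    """dSNR/dP = 0  =>  p_ase = 2 * eta * P**3  =>  P_opt = (p_ase / (2 * eta)) ** (1/3)."""
    return (p_ase_w / (2 * eta)) ** (1 / 3)

p_ase, eta = 1e-5, 1e3                 # hypothetical: ASE power (W), NLI coefficient (1/W^2)
p_opt = optimal_launch_power(p_ase, eta)
snr_db = 10 * math.log10(effective_snr(p_opt, p_ase, eta))
```

At the optimum the NLI power equals half the ASE power, and both the optimal launch power and the peak SNR scale as eta^(-1/3). Any error in the NLI estimate therefore propagates directly into the predicted SNR and reach.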

Submitted 9 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 12 pages, 8 figures

arXiv:2302.05498 [pdf, other]

Data-Driven Inverse Optimization for Marginal Offer Price Recovery in Electricity Markets

Authors: Zhirui Liang, Yury Dvorkin

Abstract: This paper presents a data-driven inverse optimization (IO) approach to recover the marginal offer prices of generators in a wholesale energy market. By leveraging the underlying market-clearing processes, we establish a closed-form relationship between the unknown parameters and the publicly available market-clearing results. Based on this relationship, we formulate the data-driven IO problem as a computationally feasible single-level optimization problem. The data-driven model is solved with the gradient descent method, which provides an error bound on the optimal solution and a sub-linear convergence rate. We also rigorously prove the existence and uniqueness of the global optimum of the proposed data-driven IO problem and analyze its robustness in two possible noisy settings. The effectiveness of the proposed method is demonstrated through simulations on both an illustrative IEEE 14-bus system and a realistic NYISO 1814-bus system.

Submitted 16 May, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

arXiv:2212.00661 [pdf, other]

Hybrid Gate-Pulse Model for Variational Quantum Algorithms

Authors: Zhiding Liang, Zhixin Song, **glei Cheng, Zichang He, Ji Liu, Hanrui Wang, Ruiyang Qin, Yiru Wang, Song Han, Xuehai Qian, Yiyu Shi

Abstract: Current quantum programs are mostly synthesized and compiled at the gate level, where quantum circuits are composed of quantum gates. The gate-level workflow, however, introduces significant redundancy when quantum gates are eventually transformed into control signals and applied on quantum devices. For superconducting quantum computers, the control signals are microwave pulses. Therefore, pulse-level optimization has gained more attention from researchers due to its advantages in circuit duration. Recent works, however, are limited by poor scalability caused by the large parameter space of control signals. In addition, the lack of gate-level "knowledge" also affects the performance of pure pulse-level frameworks. We present a hybrid gate-pulse model that can mitigate these problems. We propose to use gate-level compilation and optimization for the "fixed" part of the quantum circuits and pulse-level methods for the problem-agnostic parts. Experimental results demonstrate the efficiency of the proposed framework in discrete optimization tasks. We achieve a performance boost of up to 8% with 60% shorter pulse duration in the problem-agnostic layer.

Submitted 1 December, 2022; originally announced December 2022.

Comments: 8 pages, 6 figures

arXiv:2211.09854 [pdf, other]

An Iterative Method to Learn a Linear Control Barrier Function

Authors: Zihao Liang, Jason King Ching Lo

Abstract: Control barrier functions (CBFs) have recently started to serve as a basis for approaches that enforce safety requirements in control systems. However, constructing such a function for a general system is a non-trivial task. This paper proposes an iterative, optimization-based framework to obtain a CBF from a given user-specified set for a general control-affine system. Without loss of generality, we parameterize the CBF as a set of linear functions of the states. By taking samples from the user-specified set, we reformulate the problem of learning a CBF as an optimization problem that solves for the linear function coefficients. The resulting linear functions construct the CBF and yield a safe set with the forward invariance property. In addition, the proposed framework explicitly addresses control input constraints during the construction of CBFs. The effectiveness of the proposed method is demonstrated by learning a CBF for a nonlinear Moore-Greitzer jet engine, where the system trajectory is prevented from entering the unsafe set.
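
To make the condition being learned concrete, here is a minimal sketch, not the paper's iterative optimization, that checks whether one candidate linear function h(x) = a·x + b satisfies the standard CBF condition at sampled states of a control-affine system ẋ = f(x) + g(x)u with a scalar input bounded by |u| ≤ u_max. The helper names, the use of a single linear piece (the paper uses a set of linear functions), and the linear class-K term α·h(x) are illustrative assumptions.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def satisfies_linear_cbf(
    a: Vector,
    b: float,
    f: Callable[[Vector], Vector],
    g: Callable[[Vector], Vector],
    samples: Sequence[Vector],
    u_max: float,
    alpha: float,
) -> bool:
    """Check the CBF condition for h(x) = a.x + b at sampled states.

    System: x_dot = f(x) + g(x) * u with a scalar input |u| <= u_max.
    Condition checked wherever h(x) >= 0:
        max_{|u| <= u_max} a.(f(x) + g(x) u) + alpha * h(x) >= 0
    For scalar u the maximum is a.f(x) + |a.g(x)| * u_max.
    """
    for x in samples:
        h = dot(a, x) + b
        if h < 0:
            continue  # outside the candidate safe set, so not constrained here
        best_h_dot = dot(a, f(x)) + abs(dot(a, g(x))) * u_max
        if best_h_dot + alpha * h < 0:
            return False
    return True
```

For example, with h(x) = 1 - x₁ and a single integrator acting on x₁, the check passes because the bounded input can always push x₁ back; removing control authority while adding a constant drift toward x₁ > 1 makes it fail at the boundary of the safe set.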

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2211.09381 [pdf, other]

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

Authors: Zhiyun Fan, Zhenlin Liang, Linhao Dong, Yi Liu, Shiyu Zhou, Meng Cai, Jun Zhang, Zejun Ma, Bo Xu

Abstract: In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to segment the audio and then transcribe each segment. These two stages are addressed separately by speaker change detection (SCD) and automatic speech recognition (ASR). Most previous SCD systems rely solely on speaker information and ignore the importance of speech content. In this paper, we propose a novel SCD system that considers both speaker-difference and speech-content cues. These two cues are converted into token-level representations by the continuous integrate-and-fire (CIF) mechanism and then combined to detect speaker changes on the token acoustic boundaries. We evaluate our approach on a public real-recorded meeting dataset, AISHELL-4. The experimental results show that our method outperforms a competitive frame-level baseline system by 2.45% equal coverage-purity (ECP). In addition, we demonstrate the importance of speech content and speaker difference to the SCD task, and the advantages of conducting SCD on token acoustic boundaries rather than frame by frame.
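
The CIF step mentioned in this abstract converts frame-level features into token-level ones. Below is a minimal sketch of the generic integrate-and-fire rule (accumulate per-frame weights and emit a token whenever the running sum reaches a threshold, splitting the boundary frame's weight between adjacent tokens), not the authors' SCD system. It assumes per-frame weights in [0, 1] (e.g. sigmoid outputs), a threshold of 1.0, and that a trailing partial token is simply discarded; the function name `cif` is hypothetical.

```python
from typing import List, Sequence, Tuple


def cif(
    weights: Sequence[float],
    frames: Sequence[Sequence[float]],
    threshold: float = 1.0,
) -> Tuple[List[List[float]], List[int]]:
    """Continuous integrate-and-fire over frame features.

    Returns the integrated token vectors and the frame index at which each
    token fired (its acoustic boundary). Assumes every weight lies in [0, 1],
    so a single frame can trigger at most one firing when threshold == 1.
    """
    dim = len(frames[0])
    acc_w = 0.0
    acc_v = [0.0] * dim
    tokens: List[List[float]] = []
    boundaries: List[int] = []
    for t, (w, h) in enumerate(zip(weights, frames)):
        if acc_w + w < threshold:
            # Keep integrating: weighted sum of frame features.
            acc_w += w
            acc_v = [v + w * x for v, x in zip(acc_v, h)]
            continue
        # Fire: the part of w that completes the threshold closes this token...
        w_cur = threshold - acc_w
        tokens.append([v + w_cur * x for v, x in zip(acc_v, h)])
        boundaries.append(t)
        # ...and the remainder starts accumulating the next token.
        w_rest = w - w_cur
        acc_w = w_rest
        acc_v = [w_rest * x for x in h]
    return tokens, boundaries  # a trailing partial token is dropped
```

The returned `boundaries` are the frame positions at which token-level decisions, such as whether the speaker changed, could be made instead of deciding at every frame.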

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2210.05713 [pdf, other]

Explainable fMRI-based Brain Decoding via Spatial Temporal-pyramid Graph Convolutional Network

Authors: Ziyuan Ye, Youzhi Qu, Zhichao Liang, Mo Wang, Quanying Liu

Abstract: Brain decoding, which aims to identify brain states from neural activity, is important for cognitive neuroscience and neural engineering. However, existing machine learning methods for fMRI-based brain decoding suffer from either low classification performance or poor explainability. Here, we address this issue by proposing a biologically inspired architecture, the Spatial Temporal-pyramid Graph Convolutional Network (STpGCN), to capture the spatial-temporal graph representation of functional brain activities. By designing multi-scale spatial-temporal pathways and bottom-up pathways that mimic information processing and temporal integration in the brain, STpGCN explicitly exploits the multi-scale temporal dependency of brain activities via graphs, thereby achieving high brain decoding performance. Additionally, we propose a sensitivity analysis method called BrainNetX that explains the decoding results by automatically annotating task-related brain regions from a brain-network standpoint. We conduct extensive experiments on fMRI data under 23 cognitive tasks from the Human Connectome Project (HCP) S1200. The results show that STpGCN significantly improves brain decoding performance compared with competing baseline models, and BrainNetX successfully annotates task-relevant brain regions. Post hoc analysis based on these regions further validates that the hierarchical structure in STpGCN contributes significantly to the explainability, robustness, and generalization of the model. Our methods not only provide insights into information representation in the brain under multiple cognitive tasks but also indicate a promising future for fMRI-based brain decoding.

Submitted 8 October, 2022; originally announced October 2022.

arXiv:2209.02604 [pdf, other]

Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

Authors: Yihe Liu, Ziqi Yuan, Huisheng Mao, Zhiyun Liang, Wanqiuyue Yang, Yuanzhe Qiu, Tie Cheng, Xiaoteng Li, Hua Xu, Kai Gao

Abstract: Multimodal sentiment analysis (MSA), which aims to improve text-based sentiment analysis with associated acoustic and visual modalities, is an emerging research area due to its potential applications in Human-Computer Interaction (HCI). However, existing research observes that the acoustic and visual modalities contribute much less than the textual modality, a phenomenon termed text-predominance. Under such circumstances, in this work, we emphasize making non-verbal cues matter for the MSA task. First, from the resource perspective, we present the CH-SIMS v2.0 dataset, an extension and enhancement of CH-SIMS. Compared with the original dataset, CH-SIMS v2.0 doubles its size with another 2121 refined video segments with both unimodal and multimodal annotations, and collects 10161 unlabelled raw video segments with rich acoustic and visual emotion-bearing context to highlight non-verbal cues for sentiment prediction. Second, from the model perspective, the Acoustic Visual Mixup Consistent (AV-MC) framework is proposed, benefiting from the unimodal annotations and the unsupervised data in CH-SIMS v2.0. Its modality mixup module can be regarded as an augmentation that mixes the acoustic and visual modalities from different videos. By drawing unobserved multimodal context along with the text, the model can learn to be aware of different non-verbal contexts for sentiment prediction. Our evaluations demonstrate that both CH-SIMS v2.0 and the AV-MC framework enable further research on discovering emotion-bearing acoustic and visual cues and pave the path to interpretable end-to-end HCI applications for real-world scenarios.
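
The modality mixup described above can be pictured with a minimal sketch, assuming the standard mixup recipe: a coefficient λ drawn from Beta(α, α) forms a convex combination of the acoustic and visual feature vectors of two different videos while the anchor's text features are kept unchanged. This is not the AV-MC implementation; the `Sample` container, the default `alpha`, and the decision to mix both non-verbal modalities with the same λ are hypothetical choices, and how labels or consistency targets are handled is not covered here.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """Pre-extracted per-modality feature vectors for one video segment."""
    text: List[float]
    audio: List[float]
    vision: List[float]


def _mix(x: List[float], y: List[float], lam: float) -> List[float]:
    # Convex combination of two equally sized feature vectors.
    return [lam * a + (1.0 - lam) * b for a, b in zip(x, y)]


def av_mixup(
    anchor: Sample,
    other: Sample,
    alpha: float = 0.2,
    lam: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Sample:
    """Mix acoustic and visual features of two videos; keep the anchor's text."""
    if lam is None:
        rng = rng if rng is not None else random.Random()
        lam = rng.betavariate(alpha, alpha)
    return Sample(
        text=list(anchor.text),
        audio=_mix(anchor.audio, other.audio, lam),
        vision=_mix(anchor.vision, other.vision, lam),
    )
```

Keeping the text fixed while perturbing only the non-verbal streams is one way to pair a sentence with acoustic and visual context it was never observed with, which is the effect the abstract attributes to the mixup module.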

Submitted 21 August, 2022; originally announced September 2022.

Comments: 16 pages, 7 figures, accepted by ICMI 2022

arXiv:2209.00707 [pdf, other]

Weather-Driven Flexibility Reserve Procurement: A NYISO Offshore Wind Power Case Study

Authors: Zhirui Liang, Robert Mieth, Yury Dvorkin, Miguel A. Ortega-Vazquez

Abstract: The growing penetration of variable renewable energy sources (VRES) requires additional flexibility reserve to ensure reliable power system operations. Current industry practice typically takes a certain fraction of the VRES power production forecast as flexibility reserve, thus ignoring other relevant information, such as weather conditions. To address this, probability- and risk-based reserve sizing models have been proposed, which use probabilistic VRES power forecasts that mostly rely on historical forecast and actual VRES power data for model training. Hence, these approaches are not suitable for planned or newly installed wind farms, for which no or insufficient historical data is available. This paper addresses this caveat. First, we propose a weather-driven probabilistic forecasting method for wind power installations using publicly available weather data. Second, we apply the resulting probabilistic forecasts to a novel risk-based flexibility reserve sizing model that is compatible with the current reserve procurement pipeline used by US ISOs. Finally, we compare the risk-based reserve requirements to industry practice, state-of-the-art reserve procurement methods, and a weather-ignorant benchmark with respect to system cost and security. Our results, obtained from real-world data on a 1819-bus NYISO system model with both onshore and projected offshore wind power installations, highlight the usefulness of weather information for wind power forecasting and demonstrate efficiency gains from risk-aware reserve procurement.
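
The contrast drawn above between a fixed-fraction reserve and a probabilistic one can be illustrated with a minimal sketch. It assumes shortfall scenarios (forecast minus realized wind output) sampled from some probabilistic forecast, weather-driven or otherwise, and sizes the reserve as an empirical quantile at a hypothetical tolerance ε, in the spirit of a chance constraint. This is not the paper's risk-based sizing model, which also weighs system cost and is embedded in the ISO reserve procurement pipeline; the function names and parameters are hypothetical.

```python
import math
from typing import Sequence


def quantile_reserve_mw(shortfall_samples_mw: Sequence[float], epsilon: float) -> float:
    """Empirical (1 - epsilon)-quantile of wind shortfall samples, floored at 0.

    At most an epsilon fraction of the samples exceed the returned reserve.
    Shortfall = forecast - realized output (positive when wind under-delivers).
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    s = sorted(shortfall_samples_mw)
    k = max(math.ceil((1.0 - epsilon) * len(s)) - 1, 0)
    return max(0.0, s[k])


def fixed_fraction_reserve_mw(forecast_mw: float, fraction: float) -> float:
    """Rule-of-thumb reserve: a fixed share of the point forecast."""
    return fraction * forecast_mw
```

The fixed-fraction rule depends only on the point forecast, whereas the quantile rule grows or shrinks with the spread of the forecast distribution, which is where weather information can enter.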

Submitted 10 December, 2022; v1 submitted 1 September, 2022; originally announced September 2022.

arXiv:2207.10282 [pdf]

doi 10.1109/JSEN.2021.3070689

An Evolutionary Game based Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Wireless Sensor Networks

Authors: Liu Yang, Yinzhi Lu, Simon X. Yang, Yuanchang Zhong, Tan Guo, Zhifang Liang

Abstract: Trustworthy and reliable data delivery is a challenging task in Wireless Sensor Networks (WSNs) due to their unique characteristics and constraints. To achieve secure data delivery and address the conflict between security and energy, in this paper we present an evolutionary game based secure clustering protocol with fuzzy trust evaluation and outlier detection for WSNs. First, a fuzzy trust evaluation method is presented to transform the transmission evidence into trust values while effectively alleviating trust uncertainty. Then, a K-Means-based outlier detection scheme is proposed to further analyze the many trust values obtained via fuzzy trust evaluation or trust recommendation. It can discover the commonalities and differences among sensor nodes while improving the accuracy of outlier detection. Finally, we present an evolutionary game based secure clustering protocol to achieve a trade-off between security assurance and energy saving for sensor nodes when electing cluster heads. A sensor node that fails to become a cluster head can securely choose its own head by isolating the suspicious nodes. Simulation results verify that our secure clustering protocol can effectively defend the network against attacks from internal selfish or compromised nodes. Correspondingly, the timely data transfer rate can be improved significantly.
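
The K-Means step described above can be pictured with a minimal sketch, which is not the paper's scheme: it runs one-dimensional 2-means (Lloyd's algorithm) over scalar trust values in [0, 1] and flags the low-trust cluster only when the two cluster centers are separated by at least a hypothetical `min_gap`. The node IDs, the gap rule, and the seeding at the minimum and maximum values are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def _assign(values: List[float], centers: Tuple[float, float]) -> List[int]:
    # Nearest-center label for each value; ties go to cluster 0.
    return [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in values]


def two_means_1d(values: List[float], max_iter: int = 100) -> Tuple[Tuple[float, float], List[int]]:
    """Lloyd's algorithm with k = 2 on scalars, seeded at the min and max values.

    In 1-D with this seeding, cluster 0 always holds the lower values.
    """
    centers = (min(values), max(values))
    labels = _assign(values, centers)
    for _ in range(max_iter):
        means = []
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            means.append(sum(members) / len(members) if members else centers[k])
        centers = (means[0], means[1])
        new_labels = _assign(values, centers)
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
    return centers, labels


def flag_suspicious(trust: Dict[str, float], min_gap: float = 0.3) -> Set[str]:
    """Return node IDs in the low-trust cluster, if it is clearly separated."""
    nodes = list(trust)
    (low, high), labels = two_means_1d([trust[n] for n in nodes])
    if high - low < min_gap:
        return set()  # no clear separation: treat every node as normal
    return {n for n, lab in zip(nodes, labels) if lab == 0}
```

The separation check matters because plain 2-means always splits the data in two, even when every node is behaving normally.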

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.09936 [pdf]

doi 10.1109/TII.2020.3019286

A Secure Clustering Protocol with Fuzzy Trust Evaluation and Outlier Detection for Industrial Wireless Sensor Networks

Authors: Liu Yang, Yinzhi Lu, Simon X. Yang, Tan Guo, Zhifang Liang

Abstract: Security is one of the major concerns in Industrial Wireless Sensor Networks (IWSNs). To ensure security in clustered IWSNs, this paper presents a secure clustering protocol with fuzzy trust evaluation and outlier detection (SCFTO). First, to deal with the transmission uncertainty in an open wireless medium, an interval type-2 fuzzy logic controller is adopted to estimate trust values. Then, a density-based outlier detection mechanism is introduced to acquire an adaptive trust threshold, which is used to prevent malicious nodes from becoming cluster heads. Finally, a fuzzy-based cluster head election method is proposed to achieve a balance between energy saving and security assurance, so that a normal sensor node with more residual energy or less confidence in other nodes has a higher probability of becoming the cluster head. Extensive experiments verify that our secure clustering protocol can effectively defend the network against attacks from internal malicious or compromised nodes.

Submitted 20 July, 2022; originally announced July 2022.

arXiv:2207.01152 [pdf, other]

doi 10.1109/JLT.2022.3204101

Geometrically-Shaped Multi-Dimensional Modulation Formats in Coherent Optical Transmission Systems

Authors: Bin Chen, Yi Lei, Gabriele Liga, Zhiwei Liang, Wei Ling, Xuwei Xue, Alex Alvarado

Abstract: Shaping modulation formats in multi-dimensional (MD) space is an effective approach to harvest spectral efficiency gains in both the additive white Gaussian noise (AWGN) channel and the optical fiber channel. In the first part of this paper, existing MD geometrically-shaped modulations for fiber optical communications are reviewed. It is shown that large gains can be obtained by exploiting correlation across dimensions and/or by increasing the cardinality of the modulation format. Practical limitations and challenges are also discussed together with efficient solutions. In the second part, we extend the recently proposed four-dimensional (4D) modulation format family based on the orthant-symmetry constraint to high spectral efficiencies of up to 10 bit/4D-sym by maximizing generalized mutual information for the AWGN channel. Reach increases of up to 25% for a multi-span optical fiber transmission system are reported. Lastly, with the help of a recently introduced nonlinear interference (NLI) model, an optimization for designing nonlinearity-tolerant 4D modulation formats is introduced for a single-span optical fiber system. Simulation results show that the proposed NLI-model-based 4D modulation format could increase the effective SNR by 0.25 dB with respect to the AWGN-channel-optimal 4D modulation format.

Submitted 31 August, 2022; v1 submitted 3 July, 2022; originally announced July 2022.

Comments: 14 pages, 10 figures, accepted by JLT

arXiv:2206.06214 [pdf, other]

Real-World Light Field Image Super-Resolution via Degradation Modulation

Authors: Yingqian Wang, Zhengyu Liang, Longguang Wang, Jungang Yang, Wei An, Yulan Guo

Abstract: Recent years have witnessed great advances of deep neural networks (DNNs) in light field (LF) image super-resolution (SR). However, existing DNN-based LF image SR methods are developed for a single fixed degradation (e.g., bicubic downsampling), and thus cannot be applied to super-resolve real LF images with diverse degradations. In this paper, we propose a simple yet effective method for real-world LF image SR. In our method, a practical LF degradation model is developed to formulate the degradation process of real LF images. Then, a convolutional neural network is designed to incorporate the degradation prior into the SR process. By training on LF images using our formulated degradation, our network can learn to modulate different degradations while incorporating both spatial and angular information in LF images. Extensive experiments on both synthetically degraded and real-world LF images demonstrate the effectiveness of our method. Compared with existing state-of-the-art single and LF image SR methods, our method achieves superior SR performance under a wide range of degradations and generalizes better to real LF images. Code and models are available at https://yingqianwang.github.io/LF-DMnet/.

Submitted 30 November, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

Comments: 15 pages, 10 figures

arXiv:2206.00866 [pdf, other]

Analytical SNR Prediction in Long-Haul Optical Transmission using General Dual-Polarization 4D Formats

Authors: Zhiwei Liang, Bin Chen, Yi Lei, Gabriele Liga, Alex Alvarado

Abstract: Nonlinear interference models for dual-polarization 4D (DP-4D) modulation have so far only been used to predict signal-signal nonlinear interference. We show that including the signal-noise term in the prediction of the effective signal-to-noise ratio in long-distance DP-4D transmission improves the accuracy by up to 0.2 dB.

Submitted 15 July, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

Comments: 4 pages

arXiv:2205.14029 [pdf]

Lesion classification by model-based feature extraction: A differential affine invariant model of soft tissue elasticity

Authors: Weiguo Cao, Marc J. Pomeroy, Zhengrong Liang, Yongfeng Gao, Yongyi Shi, Jiaxing Tan, Fangfang Han, **g Wang, Jianhua Ma, Hongbin Lu, Almas F. Abbasi, Perry J. Pickhardt

Abstract: The elasticity of soft tissues has been widely considered a characteristic property for differentiating between healthy and diseased tissues and has therefore motivated several elasticity imaging modalities, such as Ultrasound Elastography, Magnetic Resonance Elastography, and Optical Coherence Elastography. This paper proposes an alternative approach of modeling the elasticity using the Computed Tomography (CT) imaging modality for model-based feature extraction and machine learning (ML) differentiation of lesions. The model describes a dynamic non-rigid (or elastic) deformation in a differential manifold to mimic soft-tissue elasticity under wave fluctuation in vivo. Based on the model, three local deformation invariants are constructed from two tensors defined by the first- and second-order derivatives of the CT images and are used to generate elastic feature maps after normalization via a novel signal suppression method. The model-based elastic image features are extracted from the feature maps and fed to machine learning to perform lesion classification. Two pathologically proven image datasets of colon polyps (44 malignant and 43 benign) and lung nodules (46 malignant and 20 benign) were used to evaluate the proposed model-based lesion classification. This modeling approach reached an area under the receiver operating characteristic curve of 94.2% for the polyps and 87.4% for the nodules, an average gain of 5% to 30% over ten existing state-of-the-art lesion classification methods. The gains from modeling tissue elasticity for ML differentiation of lesions are striking, indicating the great potential of extending the modeling strategy to other tissue properties for ML differentiation of lesions.
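
The figures of merit reported in this abstract are areas under the ROC curve. As a reminder of what that number measures, here is a minimal sketch, not the paper's evaluation code, that computes AUC directly as the fraction of malignant-benign pairs ranked correctly by a classifier score, counting ties as one half (the normalized Mann-Whitney U statistic). The function name and any scores passed to it are hypothetical.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (equivalent to the normalized Mann-Whitney U statistic).
    O(n_pos * n_neg), which is fine for datasets of a few hundred lesions.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

With 44 malignant and 43 benign polyps, for instance, the AUC is computed over 44 × 43 = 1892 such pairs.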

Submitted 27 May, 2022; originally announced May 2022.

Comments: 12 pages, 4 figures, 3 tables

arXiv:2201.12806 [pdf, other]

Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection

Authors: Minglun Han, Linhao Dong, Zhenlin Liang, Meng Cai, Shiyu Zhou, Zejun Ma, Bo Xu

Abstract: Nowadays, most methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge. Since all-neural contextual biasing methods rely on phrase-level contextual modeling and attention-based relevance modeling, they may encounter confusion between similar context-specific phrases, which hurts predictions at the token level. In this work, we focus on mitigating confusion problems with fine-grained contextual knowledge selection (FineCoS). In FineCoS, we introduce fine-grained knowledge to reduce the uncertainty of token predictions. Specifically, we first apply phrase selection to narrow the range of phrase candidates, and then conduct token attention on the tokens in the selected phrase candidates. Moreover, we re-normalize the attention weights of the most relevant phrases during inference to obtain more focused phrase-level contextual representations, and inject position information to better discriminate phrases or tokens. On LibriSpeech and an in-house 160,000-hour dataset, we explore the proposed methods based on a controllable all-neural biasing method, collaborative decoding (ColDec). The proposed methods provide up to 6.1% relative word error rate reduction on LibriSpeech and 16.4% relative character error rate reduction on the in-house dataset over ColDec.
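
One plausible reading of the inference-time re-normalization mentioned above is a top-k rescaling of phrase-level attention weights: keep the k largest weights, zero the rest, and rescale so they sum to one. The sketch below assumes that reading, with a hypothetical cutoff `k` and hypothetical function names; it is not the FineCoS implementation, whose exact selection rule is not given in this abstract.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def renormalize_top_k(weights: Sequence[float], k: int) -> List[float]:
    """Keep the k largest attention weights, zero the rest, and rescale to sum to 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    keep = set(sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:k])
    total = sum(weights[i] for i in keep)
    return [weights[i] / total if i in keep else 0.0 for i in range(len(weights))]
```

Concentrating the probability mass on a few candidate phrases makes the resulting contextual representation less of a blend of similar-sounding phrases, which is the confusion the abstract sets out to reduce.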

Submitted 2 March, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: Accepted by ICASSP 2022

arXiv:2112.14948 [pdf, other]

Data-Driven State Estimation for Light-Emitting Diode Underwater Optical Communication

Authors: Yingquan Li, Zhenwen Liang, Ibrahima N'Doye, Xiangliang Zhang, Mohamed-Slim Alouini, Taous-Meriem Laleg-Kirati

Abstract: Light-Emitting Diode (LED) based underwater optical wireless communications (UOWCs), a technology with low latency and high data rates, have attracted significant attention for underwater robots. However, maintaining a controlled line-of-sight link between transmitter and receiver is challenging due to the constant movement of the underlying optical platform caused by the dynamic uncertainties of the LED model and vibration effects. Additionally, the alignment angle required for tracking is not directly measured and has to be estimated. Moreover, light scattering temporally spreads the beam pulses propagating in water, resulting in time-varying underwater optical links with interference. We address the state estimation problem by designing an LED communication system that provides angular position and velocity information to overcome these challenges. To this end, we leverage a deep learning-based observer design to properly explore the state space of the LED communication system. Simulation results are presented to illustrate the performance of the data-driven LED state estimation.

Submitted 30 December, 2021; originally announced December 2021.

Comments: 12 pages, 11 figures

arXiv:2111.11112 [pdf, other]

Data Sensing and Offloading in Edge Computing Networks: TDMA or NOMA?

Authors: Zezu Liang, Hanbiao Chen, Yuan Liu, Fangjiong Chen

Abstract: With the development of the Internet-of-Things (IoT), we are witnessing explosive growth in the number of devices with sensing, computing, and communication capabilities, along with a large amount of raw data generated at the network edge. Mobile (multi-access) edge computing (MEC), which acquires and processes data at the network edge (such as a base station (BS)) via wireless links, has emerged as a promising technique for real-time applications. In this paper, we consider a scenario in which multiple devices sense and then offload data to an edge server/BS, and study offloading throughput maximization problems via joint radio-and-computation resource allocation, based on time-division multiple access (TDMA) and non-orthogonal multiple access (NOMA) multiuser computation offloading. In particular, we take the sequence of TDMA-based multiuser transmission/offloading into account. The studied problems are NP-hard and non-convex. A set of low-complexity algorithms is designed based on a decomposition approach and insights into the problem structure. As simulations show, they are either optimal or achieve close-to-optimal performance. The comprehensive simulation results show that the sequence-optimized TDMA scheme achieves better throughput than the NOMA scheme, while the NOMA scheme is better under the assumptions of a time-sharing strategy and identical sensing capabilities of the devices.

Submitted 22 November, 2021; originally announced November 2021.

Comments: To appear in IEEE Transactions on Wireless Communications

arXiv:2110.02152 [pdf, other]

Operation-Adversarial Scenario Generation

Authors: Zhirui Liang, Robert Mieth, Yury Dvorkin

Abstract: This paper proposes a modified conditional generative adversarial network (cGAN) model to generate net load scenarios for power systems that are statistically credible, conditioned on given labels (e.g., seasons), and, at the same time, "stressful" to system operations and dispatch decisions. The measure of stress used in this paper is based on the operating cost increases due to net load changes. The proposed operation-adversarial cGAN (OA-cGAN) internalizes a DC optimal power flow model and seeks to maximize the operating cost to achieve worst-case data generation. The training and testing stages of the proposed OA-cGAN use historical day-ahead net load forecast errors and have been implemented for the realistic NYISO 11-zone system. Our numerical experiments demonstrate that the generated operation-adversarial forecast errors lead to more cost-effective and reliable dispatch decisions.

Submitted 11 April, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Showing 1–50 of 70 results for author: Liang, Z