Search | arXiv e-print repository

arXiv:2406.09317 [pdf, other]

Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits superior performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top5 accuracy scores of 0.8430 for 15 fundus diseases and 0.7561 for 52 fundus diseases. For image retrieval, it achieves Top5 scores of 0.9500 and 0.8860 for the same disease sets, respectively. Clinical evaluations show that RetiZero's Top3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China and the United States. Furthermore, RetiZero significantly enhances clinicians' accuracy in diagnosing fundus disease. These findings underscore the value of integrating the RetiZero foundation model into clinical settings, where a variety of fundus diseases are encountered.

Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.
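
The Top5 and Top3 figures quoted in the RetiZero abstract are top-k accuracies. As a minimal sketch of how such a score is usually computed (this is the generic definition, not the authors' evaluation code; the function name `top_k_accuracy` and the toy scores are assumptions made for illustration):

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of per-sample lists, one score per candidate class.
    labels: list of true class indices, one per sample.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with three samples and three classes (values are made up).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

In a zero-shot setting the per-class scores would typically be image-text similarities against one text prompt per disease, but that part is outside this sketch.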

arXiv:2406.08782 [pdf, other]

Hybrid Spatial-spectral Neural Network for Hyperspectral Image Denoising

Authors: Hao Liang, Chengjie, Kun Li, Xin Tian

Abstract: Hyperspectral image (HSI) denoising is an essential procedure for HSI applications. Unfortunately, the existing Transformer-based methods mainly focus on non-local modeling, neglecting the importance of locality in image denoising. Moreover, deep learning methods employ complex spectral learning mechanisms, thus introducing large computation costs. To address these problems, we propose a hybrid spatial-spectral denoising network (HSSD), in which we design a novel hybrid dual-path network inspired by CNN and Transformer characteristics, leading to capturing both local and non-local spatial details while suppressing noise efficiently. Furthermore, to reduce computational complexity, we adopt a simple but effective decoupling strategy that disentangles the learning of space and spectral channels, where a multilayer perceptron with few parameters is utilized to learn the global correlations among spectra. The synthetic and real experiments demonstrate that our proposed method outperforms state-of-the-art methods on spatial and spectral reconstruction. The code and details are available on https://github.com/HLImg/HSSD.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.04685 [pdf, other]

Statistical QoS Provisioning Architecture for 6G Satellite-Terrestrial Integrated Networks

Authors: **gqing Wang, Wenchi Cheng, Wei Zhang, Hui Liang

Abstract: The emergence of massive ultra-reliable and low latency communications (mURLLC) as a category of time/reliability-sensitive service over 6G networks has received considerable research attention, which has presented unprecedented challenges. As one of the key enablers for 6G, satellite-terrestrial integrated networks (STIN) have been developed to offer more expansive connectivity and comprehensive 3D coverage in space-aerial-terrestrial domains for supporting 6G mission-critical mURLLC applications while fulfilling diverse and rigorous quality of service (QoS) requirements. In the context of these mURLLC-driven satellite services, data freshness assumes paramount importance, as outdated data may engender unpredictable or catastrophic outcomes. To effectively measure data freshness in satellite-terrestrial integrated communications, age of information (AoI) has recently surfaced as a new dimension of QoS metric to support time-sensitive applications. It is crucial to design new analytical models that ensure stringent and diverse QoS metrics bounded by different key parameters, including AoI, delay, and reliability, over 6G satellite-terrestrial integrated networks. However, due to the complicated and dynamic nature of satellite-terrestrial integrated network environments, the research on efficiently defining new statistical QoS schemes while taking into account varying degrees of freedom is still in its infancy.
To remedy these deficiencies, in this paper we develop statistical QoS provisioning schemes over 6G satellite-terrestrial integrated networks in the finite blocklength regime. Particularly, we firstly introduce and review key technologies for supporting mURLLC. Secondly, we formulate a number of novel fundamental statistical-QoS metrics in the finite blocklength regime. Finally, we conduct a set of simulations to evaluate our developed statistical QoS schemes.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2404.11171 [pdf, other]

Personalized Heart Disease Detection via ECG Digital Twin Generation

Authors: Yaojun Hu, **tai Chen, Lianting Hu, Dantong Li, Jiahuan Yan, Haochao Ying, Huiying Liang, Jian Wu

Abstract: Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development.

Submitted 11 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2402.01380 [pdf, other]

Efficient Dynamic-NeRF Based Volumetric Video Coding with Rate Distortion Optimization

Authors: Zhiyu Zhang, Guo Lu, Huanxiong Liang, Anni Tang, Qiang Hu, Li Song

Abstract: Volumetric videos, benefiting from immersive 3D realism and interactivity, hold vast potential for various applications, while the tremendous data volume poses significant challenges for compression. Recently, NeRF has demonstrated remarkable potential in volumetric video compression thanks to its simple representation and powerful 3D modeling capabilities, where a notable work is ReRF. However, ReRF separates the modeling from compression process, resulting in suboptimal compression efficiency. In contrast, in this paper, we propose a volumetric video compression method based on dynamic NeRF in a more compact manner. Specifically, we decompose the NeRF representation into the coefficient fields and the basis fields, incrementally updating the basis fields in the temporal domain to achieve dynamic modeling. Additionally, we perform end-to-end joint optimization on the modeling and compression process to further improve the compression efficiency. Extensive experiments demonstrate that our method achieves higher compression efficiency compared to ReRF on various datasets.

Submitted 2 February, 2024; originally announced February 2024.

arXiv:2402.00080 [pdf, ps, other]

Arithmetic Average Density Fusion -- Part IV: Distributed Heterogeneous Fusion of RFS and LRFS Filters via Variational Approximation

Authors: Tiancheng Li, Haozhe Liang, Guchong Li, Jesús García Herrero, Quan Pan

Abstract: This paper, the fourth part of a series of papers on the arithmetic average (AA) density fusion approach and its application for target tracking, addresses the intricate challenge of distributed heterogeneous multisensor multitarget tracking, where each inter-connected sensor operates a probability hypothesis density (PHD) filter, a multiple Bernoulli (MB) filter or a labeled MB (LMB) filter and they cooperate with each other via information fusion. Earlier papers in this series have proven that the proper AA fusion of these filters is all exactly built on averaging their respective unlabeled/labeled PHDs. Based on this finding, two PHD-AA fusion approaches are proposed via variational minimization of the upper bound of the Kullback-Leibler divergence between the local and multi-filter averaged PHDs subject to cardinality consensus based on the Gaussian mixture implementation, enabling heterogeneous filter cooperation. One focuses solely on fitting the weights of the local Gaussian components (L-GCs), while the other simultaneously fits all the parameters of the L-GCs at each sensor, both seeking average consensus on the unlabeled PHD, irrespective of the specific posterior form of the local filters. For the distributed peer-to-peer communication, both the classic consensus and flooding paradigms have been investigated. Simulations have demonstrated the effectiveness and flexibility of the proposed approaches in both homogeneous and heterogeneous scenarios.

Submitted 30 January, 2024; originally announced February 2024.

Comments: 13 pages, 14 figures
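
The abstract above builds on averaging PHDs under a Gaussian mixture implementation. A minimal sketch of the plain arithmetic average of Gaussian-mixture PHDs follows: the average of several mixtures is the union of their components, with each weight scaled by its sensor's fusion weight. This is not the variational fitting the paper proposes; the one-dimensional `(weight, mean, variance)` representation and the names `aa_fuse` and `cardinality` are assumptions made for illustration.

```python
def aa_fuse(local_mixtures, fusion_weights):
    """Arithmetic-average fusion of Gaussian-mixture PHDs.

    local_mixtures: one list per sensor of (weight, mean, variance) tuples.
    fusion_weights: non-negative per-sensor weights that sum to one.
    Returns the fused mixture as a single list of components.
    """
    if len(local_mixtures) != len(fusion_weights):
        raise ValueError("need exactly one fusion weight per sensor")
    if abs(sum(fusion_weights) - 1.0) > 1e-9:
        raise ValueError("fusion weights must sum to one")
    fused = []
    for mixture, omega in zip(local_mixtures, fusion_weights):
        for weight, mean, var in mixture:
            # Averaging densities only rescales each component's weight.
            fused.append((omega * weight, mean, var))
    return fused


def cardinality(mixture):
    """Expected number of targets: the integral of the PHD, i.e. the weight sum."""
    return sum(weight for weight, _, _ in mixture)


# Two hypothetical sensors with equal fusion weights (values are made up).
sensor_a = [(0.9, 0.0, 1.0), (0.8, 5.0, 1.0)]
sensor_b = [(1.0, 0.1, 2.0)]
fused = aa_fuse([sensor_a, sensor_b], [0.5, 0.5])
```

The fused cardinality is the weighted average of the local cardinalities. The component count grows with every fusion step, so practical filters usually follow this with merging and pruning.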

arXiv:2401.00225 [pdf]

Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transform

Authors: Ting Zhu, Shufei Duan, Camille Dingam, Huizhi Liang, Wei Zhang

Abstract: Dysarthria speech contains the pathological characteristics of vocal tract and vocal fold, but so far, they have not yet been included in traditional acoustic feature sets. Moreover, the nonlinearity and non-stationarity of speech have been ignored. In this paper, we propose a feature enhancement algorithm for dysarthria speech called WHFEMD. It combines empirical mode decomposition (EMD) and fast Walsh-Hadamard transform (FWHT) to enhance features. With the proposed algorithm, the fast Fourier transform of the dysarthria speech is first performed and then followed by EMD to get intrinsic mode functions (IMFs). After that, FWHT is used to output new coefficients and to extract statistical features based on IMFs, power spectral density, and enhanced gammatone frequency cepstral coefficients. To evaluate the proposed approach, we conducted experiments on two public pathological speech databases including UA Speech and TORGO. The results show that our algorithm performed better than traditional features in classification. We achieved improvements of 13.8% (UA Speech) and 3.84% (TORGO), respectively. Furthermore, the incorporation of an imbalanced classification algorithm to address data imbalance has resulted in a 12.18% increase in recognition accuracy. This algorithm effectively addresses the challenges of the imbalanced dataset and non-linearity in dysarthric speech and simultaneously provides a robust representation of the local pathological features of the vocal folds and tracts.

Submitted 30 December, 2023; originally announced January 2024.
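
The fast Walsh-Hadamard transform named in the WHFEMD abstract can be sketched with the standard in-place butterfly. This is the textbook unnormalized transform in natural (Hadamard) ordering, not the authors' feature pipeline; the function name `fwht` and the sample input are assumptions made for illustration.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural/Hadamard ordering).

    The input length must be a power of two. Returns a new list.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    h = 1
    while h < n:
        # Combine blocks of size 2*h with sum/difference butterflies.
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a


signal = [1, 0, 1, 0, 0, 1, 1, 0]
coeffs = fwht(signal)  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying the transform twice returns the input scaled by its length, which gives a quick self-check and shows that the inverse is the same routine divided by `n`.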

arXiv:2312.16057 [pdf, other]

Semantic Importance-Aware Based for Multi-User Communication Over MIMO Fading Channels

Authors: Haotai Liang, Zhicheng Bao, Wannian An, Chen Dong, Xiaodong Xu

Abstract: Semantic communication, as a novel communication paradigm, has attracted the interest of many scholars, with multi-user, multi-input multi-output (MIMO) scenarios being one of the critical contexts. This paper presents a semantic importance-aware based communication system (SIA-SC) over MIMO Rayleigh fading channels. Combining the semantic symbols' inequality and the equivalent subchannels of MIMO channels based on Singular Value Decomposition (SVD) maximizes the end-to-end semantic performance through the new layer mapping method. For multi-user scenarios, a method of semantic interference cancellation is proposed. Furthermore, a new metric, namely semantic information distortion (SID), is established to unify the expressions of semantic performance, which is affected by channel bandwidth ratio (CBR) and signal-to-noise ratio (SNR). With the help of the proposed metric, we derived performance expressions and Semantic Outage Probability (SOP) of SIA-SC for Single-User Single-Input Single-Output (SU-SISO), Single-User MIMO (SU-MIMO), Multi-Users SISO (MU-SISO) and Multi-Users MIMO (MU-MIMO) scenarios. Numerical experiments show that SIA-SC can significantly improve semantic performance across various scenarios.

Submitted 26 December, 2023; originally announced December 2023.
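
The "equivalent subchannels of MIMO channels based on SVD" mentioned in the SIA-SC abstract refer to a standard decomposition, sketched below. The notation is assumed here rather than taken from the paper: $\mathbf{H}$ is the channel matrix, $\mathbf{s}$ the symbol vector, $\mathbf{n}$ white Gaussian noise, and $\sigma_i$ the singular values.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{H}\mathbf{x} + \mathbf{n},
  \qquad \mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\mathsf{H}}, \\
\mathbf{x} &= \mathbf{V}\mathbf{s},
  \qquad \tilde{\mathbf{y}} = \mathbf{U}^{\mathsf{H}}\mathbf{y}
  = \boldsymbol{\Sigma}\mathbf{s} + \mathbf{U}^{\mathsf{H}}\mathbf{n}, \\
\tilde{y}_i &= \sigma_i s_i + \tilde{n}_i,
  \qquad i = 1, \dots, \operatorname{rank}(\mathbf{H}).
\end{aligned}
```

Because $\mathbf{U}$ is unitary, $\tilde{\mathbf{n}} = \mathbf{U}^{\mathsf{H}}\mathbf{n}$ has the same statistics as $\mathbf{n}$, so the channel splits into parallel subchannels whose gains are the singular values. A layer mapping of the kind the abstract describes would then plausibly assign more important semantic symbols to subchannels with larger $\sigma_i$; the abstract does not spell out the exact rule.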

arXiv:2312.10051 [pdf, other]

Semantic Synchronization for Enhanced Reliability in Communication Systems

Authors: Xiaoyi Liu, Haotai Liang, Chen Dong, Xiaodong Xu

Abstract: As a new communication paradigm, semantic communication has received widespread attention in communication fields. However, since the decoding of semantic signals relies on contextual knowledge, misalignment between the starting position of the semantic signal and the AI-based semantic decoder would prevent source signal recovery and reconstruction. To achieve more precise semantic communication, this study proposes an image-based semantic synchronization method leveraging intrinsic semantic features of image content. Specifically, a shared synchronized image (SyncImg) is encoded into a synchronization vector header at the transmitter and sent to the receiver. The receiver adopts a sliding window semantic decoder combined with classification and template matching methods to locate the synchronization point. Experimental results demonstrate that compared with traditional methods, the proposed method achieves a lower miss detected ratio (MDR) and root-mean-square error (RMSE) under low signal-to-noise ratios, realizing accurate synchronization of semantic signals across different devices.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2312.08998 [pdf]

Design, construction and evaluation of emotional multimodal pathological speech database

Authors: Ting Zhu, Shufei Duan, Huizhi Liang, Wei Zhang

Abstract: The lack of an available emotion pathology database is one of the key obstacles in studying the emotion expression status of patients with dysarthria. The first Chinese multimodal emotional pathological speech database containing multi-perspective information is constructed in this paper. It includes 29 controls and 39 patients with different degrees of motor dysarthria, expressing happy, sad, angry and neutral emotions. All emotional speech was labeled for intelligibility, types and discrete dimensional emotions by developed WeChat mini-program. The subjective analysis justifies from emotion discrimination accuracy, speech intelligibility, valence-arousal spatial distribution, and correlation between SCL-90 and disease severity. The automatic recognition tested on speech and glottal data, with average accuracy of 78% for controls and 60% for patients in audio, while 51% for controls and 38% for patients in glottal data, indicating an influence of the disease on emotional expression.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2309.12849 [pdf, other]

DeepOPF-U: A Unified Deep Neural Network to Solve AC Optimal Power Flow in Multiple Networks

Authors: Heng Liang, Changhong Zhao

Abstract: The traditional machine learning models to solve optimal power flow (OPF) are mostly trained for a given power network and lack generalizability to today's power networks with varying topologies and growing plug-and-play distributed energy resources (DERs). In this paper, we propose DeepOPF-U, which uses one unified deep neural network (DNN) to solve alternating-current (AC) OPF problems in different power networks, including a set of power networks that is successively expanding. Specifically, we design elastic input and output layers for the vectors of given loads and OPF solutions with varying lengths in different networks. The proposed method, using a single unified DNN, can deal with different and growing numbers of buses, lines, loads, and DERs. Simulations of IEEE 57/118/300-bus test systems and a network growing from 73 to 118 buses verify the improved performance of DeepOPF-U compared to existing DNN-based solution methods.

Submitted 22 September, 2023; originally announced September 2023.

Comments: 3 pages, 2 figures

arXiv:2308.16738 [pdf, other]

SFUSNet: A Spatial-Frequency domain-based Multi-branch Network for diagnosis of Cervical Lymph Node Lesions in Ultrasound Images

Authors: Yubiao Yue, Jun Xue, Haihua Liang, Bingchun Luo, Zhenzhang Li

Abstract: Booming deep learning has substantially improved the diagnosis for diverse lesions in ultrasound images, but a conspicuous research gap concerning cervical lymph node lesions still remains. The objective of this work is to diagnose cervical lymph node lesions in ultrasound images by leveraging a deep learning model. To this end, we first collected 3392 cervical ultrasound images containing normal lymph nodes, benign lymph node lesions, malignant primary lymph node lesions, and malignant metastatic lymph node lesions. Given that ultrasound images are generated by the reflection and scattering of sound waves across varied bodily tissues, we proposed the Conv-FFT Block. It integrates convolutional operations with the fast Fourier transform to more astutely model the images. Building upon this foundation, we designed a novel architecture, named SFUSNet. SFUSNet not only discerns variances in ultrasound images from the spatial domain but also adeptly captures micro-structural alterations across various lesions in the frequency domain. To ascertain the potential of SFUSNet, we benchmarked it against 12 popular architectures through five-fold cross-validation. The results show that SFUSNet is the state-of-the-art model and can achieve 92.89% accuracy. Moreover, its average precision, average sensitivity and average specificity for four types of lesions achieve 90.46%, 89.95% and 97.49%, respectively.

Submitted 4 October, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.14081

U-SEANNet: A Simple, Efficient and Applied U-Shaped Network for Diagnosis of Nasal Diseases on Nasal Endoscopic Images

Authors: Yubiao Yue, Jun Xue, Chao Wang, Haihua Liang, Zhenzhang Li

Abstract: Numerous studies have affirmed that deep learning models can facilitate early diagnosis of lesions in endoscopic images. However, the lack of available datasets stymies advancements in research on nasal endoscopy, and existing models fail to strike a good trade-off between model diagnosis performance, model complexity and parameters size, rendering them unsuitable for real-world application. To bridge these gaps, we created the first large-scale nasal endoscopy dataset, named 7-NasalEID, comprising 11,352 images that contain six common nasal diseases and normal samples. Subsequently, we proposed U-SEANNet, an innovative U-shaped architecture, underpinned by depth-wise separable convolution. Moreover, to enhance its capacity for detecting nuanced discrepancies in input images, U-SEANNet employs the Global-Local Channel Feature Fusion module, enabling it to utilize salient channel features from both global and local contexts. To demonstrate U-SEANNet's potential, we benchmarked U-SEANNet against seventeen modern architectures through five-fold cross-validation. The experimental results show that U-SEANNet achieves a commendable accuracy of 93.58%. Notably, U-SEANNet's parameters size and GFLOPs are only 0.78M and 0.21, respectively. Our findings suggest U-SEANNet is the state-of-the-art model for nasal diseases diagnosis in endoscopic images.

Submitted 11 February, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

Comments: There are some descriptive errors in the manuscript
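
The small parameter count reported for U-SEANNet is attributed in the abstract to depth-wise separable convolution. A rough sketch of the arithmetic behind that saving, using the standard parameter-count formulas with bias terms ignored, is given below. The layer sizes are arbitrary examples, not taken from the paper, and both function names are assumptions made for illustration.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Example layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = conv_params(64, 128, 3)                  # 73728
separable = depthwise_separable_params(64, 128, 3)  # 8768
reduction = standard / separable
```

For this example layer the separable form needs roughly 8.4 times fewer weights; the ratio approaches `k * k` as the number of output channels grows.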

arXiv:2308.04805 [pdf, other]

doi 10.1145/3581783.3613750

DiVa: An Iterative Framework to Harvest More Diverse and Valid Labels from User Comments for Music

Authors: Hongru Liang, **gyao Liu, Yuanxin Xiang, Jiachen Du, Lanjun Zhou, Shushen Pan, Wenqiang Lei

Abstract: Towards sufficient music searching, it is vital to form a complete set of labels for each song. However, current solutions fail to resolve it as they cannot produce diverse enough mappings to make up for the information missed by the gold labels. Based on the observation that such missing information may already be presented in user comments, we propose to study the automated music labeling in an essential but under-explored setting, where the model is required to harvest more diverse and valid labels from the users' comments given limited gold labels. To this end, we design an iterative framework (DiVa) to harvest more $\underline{\text{Di}}$verse and $\underline{\text{Va}}$lid labels from user comments for music. The framework makes a classifier able to form complete sets of labels for songs via pseudo-labels inferred from pre-trained classifiers and a novel joint score function. The experiment on a densely annotated testing set reveals the superiority of DiVa over state-of-the-art solutions in producing more diverse labels missed by the gold labels. We hope our work can inspire future research on automated music labeling.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 11 pages, 5 figures, published to ACM MM 2023

arXiv:2306.10772 [pdf, other]

Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming

Authors: Hao Liang, Guanxing Zhou, Xiaotong Tu, Andreas Jakobsson, Xinghao Ding, Yue Huang

Abstract: Recently, many forms of audio industrial applications, such as sound monitoring and source localization, have begun exploiting smart multi-modal devices equipped with a microphone array. Regrettably, model-based methods are often difficult to employ for such devices due to their high computational complexity, as well as the difficulty of appropriately selecting the user-determined parameters. As an alternative, one may use deep network-based methods, but these are often difficult to generalize, nor can they generate the desired beamforming map directly. In this paper, a computationally efficient acoustic beamforming algorithm is proposed, which may be unrolled to form a model-based deep learning network for real-time imaging, here termed the DAMAS-FISTA-Net. By exploiting the natural structure of an acoustic beamformer, the proposed network inherits the physical knowledge of the acoustic system, and thus learns the underlying physical properties of the propagation. As a result, all the network parameters may be learned end-to-end, guided by a model-based prior using back-propagation. Notably, the proposed network enables an excellent interpretability and the ability of being able to process the raw data directly. Extensive numerical experiments using both simulated and real-world data illustrate the preferable performance of the DAMAS-FISTA-Net as compared to alternative approaches.

Submitted 19 June, 2023; originally announced June 2023.

Comments: 12 pages, 9 figures

arXiv:2305.00149 [pdf, other]

X-ray Recognition: Patient identification from X-rays using a contrastive objective

Authors: Hao Liang, Kevin Ni, Guha Balakrishnan

Abstract: Recent research demonstrates that deep learning models are capable of precisely extracting bio-information (e.g. race, gender and age) from patients' Chest X-Rays (CXRs). In this paper, we further show that deep learning models are also surprisingly accurate at recognition, i.e., distinguishing CXRs belonging to the same patient from those belonging to different patients. These findings suggest potential privacy considerations that the medical imaging community should consider with the proliferation of large public CXR databases.

Submitted 28 April, 2023; originally announced May 2023.

arXiv:2305.00147 [pdf, other]

Visualizing chest X-ray dataset biases using GANs

Authors: Hao Liang, Kevin Ni, Guha Balakrishnan

Abstract: Recent work demonstrates that images from various chest X-ray datasets contain visual features that are strongly correlated with protected demographic attributes like race and gender. This finding raises issues of fairness, since some of these factors may be used by downstream algorithms for clinical predictions. In this work, we propose a framework, using generative adversarial networks (GANs), to visualize what features are most different between X-rays belonging to two demographic subgroups.

Submitted 5 September, 2023; v1 submitted 28 April, 2023; originally announced May 2023.

Comments: Medical Imaging with Deep Learning (MIDL) 2023

arXiv:2303.15206 [pdf, other]

Perceptual Quality Assessment of NeRF and Neural View Synthesis Methods for Front-Facing Views

Authors: Hanxue Liang, Tianhao Wu, Param Hanji, Francesco Banterle, Hongyun Gao, Rafal Mantiuk, Cengiz Oztireli

Abstract: Neural view synthesis (NVS) is one of the most successful techniques for synthesizing free viewpoint videos, capable of achieving high fidelity from only a sparse set of captured images. This success has led to many variants of the techniques, each evaluated on a set of test views typically using image quality metrics such as PSNR, SSIM, or LPIPS. There has been a lack of research on how NVS methods perform with respect to perceived video quality. We present the first study on perceptual evaluation of NVS and NeRF variants. For this study, we collected two datasets of scenes captured in a controlled lab environment as well as in-the-wild. In contrast to existing datasets, these scenes come with reference video sequences, allowing us to test for temporal artifacts and subtle distortions that are easily overlooked when viewing only static images. We measured the quality of videos synthesized by several NVS methods in a well-controlled perceptual quality assessment experiment as well as with many existing state-of-the-art image/video quality metrics. We present a detailed analysis of the results and recommendations for dataset and metric selection for NVS evaluation.

Submitted 24 October, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

arXiv:2303.11692 [pdf, other]

ByteCover3: Accurate Cover Song Identification on Short Queries

Authors: Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma

Abstract: Deep learning based methods have become a paradigm for cover song identification (CSI) in recent years, where the ByteCover systems have achieved state-of-the-art results on all the mainstream datasets of CSI. However, with the rise of short videos, many real-world applications require matching short music excerpts to full-length music tracks in the database, which is still under-explored and waiting for an industrial-level solution. In this paper, we upgrade the previous ByteCover systems to ByteCover3, which utilizes local features to further improve the identification performance of short music queries. ByteCover3 is designed with a local alignment loss (LAL) module and a two-stage feature retrieval pipeline, allowing the system to perform CSI in a more precise and efficient way. We evaluated ByteCover3 on multiple datasets with different benchmark settings, where ByteCover3 beat all the compared methods including its previous versions.

Submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted by ICASSP 2023

arXiv:2303.06597 [pdf, ps, other]

doi 10.1109/TCCN.2023.3306852

Non-Orthogonal Multiple Access Enhanced Multi-User Semantic Communication

Authors: Weizhi Li, Haotai Liang, Chen Dong, Xiaodong Xu, ** Zhang, Kaijun Liu

Abstract: Semantic communication serves as a novel paradigm and attracts the broad interest of researchers. One critical aspect of it is the multi-user semantic communication theory, which can further promote its application to the practical network environment. While most existing works focused on the design of end-to-end single-user semantic transmission, a novel non-orthogonal multiple access (NOMA)-based multi-user semantic communication system named NOMASC is proposed in this paper. The proposed system can support semantic transmission of multiple users with diverse modalities of source information. To avoid high demand for hardware, an asymmetric quantizer is employed at the end of the semantic encoder for discretizing the continuous full-resolution semantic feature. In addition, a neural network model is proposed for mapping the discrete feature into self-learned symbols and accomplishing intelligent multi-user detection (MUD) at the receiver. Simulation results demonstrate that the proposed system holds good performance in non-orthogonal transmission of multiple user signals and outperforms the other methods, especially at low-to-medium SNRs. Moreover, it has high robustness under various simulation settings and mismatched test scenarios.

Submitted 20 November, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: accepted by IEEE Transactions on Cognitive Communications and Networking

arXiv:2303.01175 [pdf, ps, other]

A Field-Theoretic Approach to Unlabeled Sensing

Authors: Hao Liang, **gyu Lu, Manolis C. Tsakiris, Lihong Zhi

Abstract: We study the recent problem of unlabeled sensing from the information sciences in a field-theoretic framework. Our main result asserts that, for sufficiently generic data, the unique solution can be obtained by solving n + 1 polynomial equations in n unknowns.

Submitted 2 March, 2023; originally announced March 2023.

Comments: 17 pages, 2 tables

arXiv:2301.03331 [pdf, other]

doi 10.1109/TPWRD.2023.3337274

A Specific Task-oriented Semantic Image Communication System for substation patrol inspection

Authors: Senran Fan, Haotai Liang, Chen Dong, Xiaodong Xu, Geng Liu

Abstract: Intelligent inspection robots are widely used in substation patrol inspection, which can help check potential safety hazards by patrolling the substation and sending back scene images. However, when patrolling some marginal areas with weak signal, the scene images cannot be successfully transmitted to be used for hidden danger elimination, which greatly reduces the quality of the robots' daily work. To solve this problem, a Specific Task-oriented Semantic Communication System for Images (STSCI) is designed, which involves semantic feature extraction, transmission, restoration and enhancement to get clearer images sent by intelligent robots under weak signals. Motivated by the fact that only some specific details of the image are needed in such a substation patrol inspection task, we propose a new paradigm of semantic enhancement for this specific task to ensure the clarity of key semantic information when facing a lower bit rate or a low signal-to-noise ratio. In reality-based simulations, experiments show that our STSCI can generally surpass traditional image-compression-based and channel-coding-based systems, as well as other semantic communication systems, in the substation patrol inspection task with a lower bit rate, even under a low signal-to-noise ratio.

Submitted 13 April, 2024; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: 9 pages, 8 figures

Journal ref: IEEE Transactions on Power Delivery; vol. 39; no. 2; pp. 835-844; April 2024

arXiv:2212.03093 [pdf]

Cooperative Guidance Strategy for Active Defense Spacecraft with Imperfect Information via Deep Reinforcement Learning

Authors: Li Zhi, Haizhao Liang, **ze Wu, Jianying Wang, Yu Zheng

Abstract: In this paper, an adaptive cooperative guidance strategy for the active protection of a target spacecraft trying to evade an interceptor was developed. The target spacecraft performs evasive maneuvers, launching an active defense vehicle to divert the interceptor. Instead of classical strategies, which are based on optimal control or differential game theory, the problem was solved by using the deep reinforcement learning method, and imperfect information was assumed for the interceptor maneuverability. To address the sparse reward problem, a universal reward design method and an increasingly difficult training approach were presented utilizing the shaping technique. Guidance law, reward function, and training approach were demonstrated through the learning process and Monte Carlo simulations. The application of the non-sparse reward function and increasingly difficult training approach accelerated the model convergence, alleviating the overfitting problem. With a standard optimal guidance law as a benchmark, the simulation results validated the effectiveness and advantages of the proposed guidance strategy, which guarantee the target spacecraft's escape and win rates in a multi-agent game. The trained agent's adaptiveness to the interceptor maneuverability was superior to the optimal guidance law. Moreover, compared to the standard optimal guidance law, the proposed guidance strategy performed better with less prior knowledge.

Submitted 6 December, 2022; originally announced December 2022.

arXiv:2211.02320 [pdf, other]

Aircraft Ground Taxiing Deduction and Conflict Early Warning Method Based on Control Command Information

Authors: **gchang Zhuge, Huiyuan Liang, Yiming Zhang, Shichao Li, Xinyu Yang, Jun Wu

Abstract: Aircraft taxiing conflict is a threat to the safety of airport operations, mainly due to human error in control command information. To solve this problem, an aircraft taxiing deduction and conflict early warning method based on control order information is proposed. This method does not need additional equipment and operating costs, and is completely based on historical data and control command information. When the aircraft taxiing command is given, the future route information will be deduced, and the probability of conflict with other taxiing aircraft will be calculated to achieve conflict detection and early warning of different levels. The method is validated by the aircraft taxi data from real airports. The results show that the method can effectively predict the aircraft taxiing process, and can provide early warning of possible conflicts. Due to the advantages of low cost and high accuracy, this method has the potential to be applied to airport operation decision support systems.

Submitted 4 November, 2022; originally announced November 2022.

arXiv:2210.00621 [pdf, other]

Optimization for Robustness Evaluation beyond $\ell_p$ Metrics

Authors: Hengyue Liang, Buyun Liang, Ying Cui, Tim Mitchell, Ju Sun

Abstract: Empirical evaluation of deep learning models against adversarial attacks entails solving nontrivial constrained optimization problems. Popular algorithms for solving these constrained problems rely on projected gradient descent (PGD) and require careful tuning of multiple hyperparameters. Moreover, PGD can only handle $\ell_1$, $\ell_2$, and $\ell_\infty$ attack models due to the use of analytical projectors. In this paper, we introduce a novel algorithmic framework that blends a general-purpose constrained-optimization solver PyGRANSO, With Constraint-Folding (PWCF), to add reliability and generality to robustness evaluation. PWCF 1) finds good-quality solutions without the need of delicate hyperparameter tuning, and 2) can handle general attack models, e.g., general $\ell_p$ ($p \geq 0$) and perceptual attacks, which are inaccessible to PGD-based algorithms.

Submitted 13 November, 2022; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: 5 pages, 1 figure, 3 tables, accepted by the 14th International OPT Workshop on Optimization for Machine Learning, and submitted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
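
Note on the abstract above: the "analytical projectors" it mentions are the closed-form projections that PGD applies after each gradient step. The following is a minimal sketch of the $\ell_\infty$ and $\ell_2$ cases in plain Python. The function names (`project_linf`, `project_l2`, `pgd_step`) and the list-of-floats representation are illustrative assumptions and are not taken from the paper or from PyGRANSO.

```python
import math

def project_linf(delta, eps):
    """Project a perturbation onto the l_inf ball of radius eps (elementwise clipping)."""
    return [max(-eps, min(eps, d)) for d in delta]

def project_l2(delta, eps):
    """Project a perturbation onto the l_2 ball of radius eps (rescale only if outside)."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    scale = eps / norm
    return [d * scale for d in delta]

def pgd_step(delta, grad, step_size, eps, project):
    """One projected gradient ascent step on the perturbation (hypothetical helper)."""
    moved = [d + step_size * g for d, g in zip(delta, grad)]
    return project(moved, eps)
```

Both projections have a simple closed form. For a general $\ell_p$ ball or a perceptual distance no such formula is readily available, which is the limitation the abstract attributes to PGD-based evaluation.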

arXiv:2203.16988 [pdf]

Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification

Authors: Guanxing Zhou, Hao Liang, Xinghao Ding, Yue Huang, Xiaotong Tu, Saqlain Abbas

Abstract: Acoustic source localization has been applied in different fields, such as aeronautics and ocean science, generally using multiple microphones array data to reconstruct the source location. However, the model-based beamforming methods fail to achieve the high-resolution of conventional beamforming maps. Deep neural networks are also appropriate to locate the sound source, but in general, these methods with complex network structures are hard to be recognized by hardware. In this paper, a novel neural network, termed the Acoustic-Net, is proposed to locate and quantify the sound source simply using the original signals. The experiments demonstrate that the proposed method significantly improves the accuracy of sound source prediction and the computing speed, which may generalize well to real data. The code and trained models are available at https://github.com/JoaquinChou/Acoustic-Net.

Submitted 31 March, 2022; originally announced March 2022.

arXiv:2203.10674 [pdf, other]

RareGAN: Generating Samples for Rare Classes

Authors: Zinan Lin, Hao Liang, Giulia Fanti, Vyas Sekar

Abstract: We study the problem of learning generative adversarial networks (GANs) for a rare class of an unlabeled dataset subject to a labeling budget. This problem is motivated from practical applications in domains including security (e.g., synthesizing packets for DNS amplification attacks), systems and networking (e.g., synthesizing workloads that trigger high resource usage), and machine learning (e.g., generating images from a rare class). Existing approaches are unsuitable, either requiring fully-labeled datasets or sacrificing the fidelity of the rare class for that of the common classes. We propose RareGAN, a novel synthesis of three key ideas: (1) extending conditional GANs to use labelled and unlabelled data for better generalization; (2) an active learning approach that requests the most useful labels; and (3) a weighted loss function to favor learning the rare class. We show that RareGAN achieves a better fidelity-diversity tradeoff on the rare class than prior work across different applications, budgets, rare class fractions, GAN losses, and architectures.

Submitted 20 March, 2022; originally announced March 2022.

Comments: Published in AAAI 2022

arXiv:2203.05087 [pdf, other]

False Data Injection Attack on Electric Vehicle-Assisted Voltage Regulation

Authors: Yuan Liu, Omid Ardakanian, Ioanis Nikolaidis, Hao Liang

Abstract: With the large scale penetration of electric vehicles (EVs) and the advent of bidirectional chargers, EV aggregators will become a major player in the voltage regulation market. This paper proposes a novel false data injection attack (FDIA) against the voltage regulation capacity estimation of EV charging stations, the process that underpins voltage regulation in distribution system. The proposed FDIA takes into account the uncertainty in EV mobility and network conditions. The attack vector with the largest expected adverse impact is the solution of a stochastic optimization problem subject to a constraint that ensures it can bypass bad data detection. We show that this attack vector can be determined by solving a sequence of convex quadratically constrained linear programs. The case studies examined in a co-simulation platform, based on two standard test feeders, reveal the vulnerability of the voltage regulation capacity estimation.

Submitted 9 March, 2022; originally announced March 2022.

Comments: 10 pages

arXiv:2202.09595 [pdf, other]

Innovative semantic communication system

Authors: Chen Dong, Haotai Liang, Xiaodong Xu, Shujun Han, Bizhu Wang, ** Zhang

Abstract: Traditional communication systems focus on the transmission process, and the context-dependent meaning has been ignored. The fact that 5G systems have approached the Shannon limit, together with the increasing amount of data, will cause communication bottlenecks such as increased delay. Inspired by the ability of artificial intelligence to understand semantics, we propose a new communication paradigm, which integrates artificial intelligence and communication: the semantic communication system. Semantic communication is at the second level of communication based on Shannon and Weaver, which retains the semantic features of the transmitted information and recovers the signal at the receiver, thus compressing the communication traffic without losing important information. Different from other semantic communication systems, the proposed system not only transmits semantic information but also transmits the semantic decoder. In addition, a general semantic metric is proposed to measure the quality of a semantic communication system. In particular, the semantic communication system for image, namely AESC-I, is designed to verify the feasibility of the new paradigm. Simulations are conducted on our system with the additive white Gaussian noise (AWGN) and the multipath fading channel using MNIST and Cifar10 datasets. The experimental results show that DeepSC-I can effectively extract semantic information and reconstruct images at a relatively low SNR.

Submitted 19 February, 2022; originally announced February 2022.

arXiv:2112.06074 [pdf, other]

Early Stopping for Deep Image Prior

Authors: Hengkang Wang, Taihui Li, Zhong Zhuang, Tiancong Chen, Hengyue Liang, Ju Sun

Abstract: Deep image prior (DIP) and its variants have shown remarkable potential for solving inverse problems in computer vision, without any extra training data. Practical DIP models are often substantially overparameterized. During the fitting process, these models learn mostly the desired visual content first, and then pick up the potential modeling and observational noise, i.e., overfitting. Thus, the practicality of DIP often depends critically on good early stopping (ES) that captures the transition period. In this regard, the majority of DIP works for vision tasks only demonstrate the potential of the models -- reporting the peak performance against the ground truth, but provide no clue about how to operationally obtain near-peak performance without access to the ground truth. In this paper, we set out to break this practicality barrier of DIP, and propose an efficient ES strategy, which consistently detects near-peak performance across several vision tasks and DIP variants. Based on a simple measure of dispersion of consecutive DIP reconstructions, our ES method not only outpaces the existing ones -- which only work in very narrow domains, but also remains effective when combined with a number of methods that try to mitigate the overfitting. The code is available at https://github.com/sun-umn/Early_Stopping_for_DIP.

Submitted 11 December, 2023; v1 submitted 11 December, 2021; originally announced December 2021.

Comments: Published in TMLR (https://openreview.net/forum?id=231ZzrLC8X)

Journal ref: Transactions on Machine Learning Research (TMLR), 2835-8856 (12/2023)
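
Note on the abstract above: it describes early stopping driven by "a simple measure of dispersion of consecutive DIP reconstructions" but does not spell out the criterion. The sketch below shows one plausible form under stated assumptions: dispersion is taken to be the variance over a sliding window of recent reconstructions, and stopping is triggered when that variance has not improved for `patience` windows. The names `window_variance` and `detect_stop`, the window size, and the patience rule are all hypothetical and should not be read as the authors' implementation.

```python
from collections import deque

def window_variance(window):
    """Mean squared distance of each reconstruction in the window to the window mean."""
    n = len(window)
    dim = len(window[0])
    mean = [sum(x[i] for x in window) / n for i in range(dim)]
    return sum(sum((x[i] - mean[i]) ** 2 for i in range(dim)) for x in window) / n

def detect_stop(reconstructions, window_size=3, patience=2):
    """Return the iteration with the lowest windowed variance once the variance
    has failed to improve for `patience` consecutive windows; None if that never happens."""
    window = deque(maxlen=window_size)  # keeps only the most recent reconstructions
    best_var, best_iter, stale = float("inf"), None, 0
    for t, x in enumerate(reconstructions):
        window.append(x)
        if len(window) < window_size:
            continue  # not enough history yet
        var = window_variance(window)
        if var < best_var:
            best_var, best_iter, stale = var, t, 0
        else:
            stale += 1
            if stale >= patience:
                return best_iter
    return None
```

Each reconstruction is represented here as a flat list of floats; a real DIP run would pass flattened images. The criterion needs no ground truth, which is the practical point the abstract emphasizes.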

arXiv:2112.05844 [pdf, other]

Economic MPC-based planning for marine vehicles: Tuning safety and energy efficiency

Authors: Haojiao Liang, Hui** Li, Jian Gao, Rongxin Cui, Demin Xu

Abstract: Energy efficiency and safety are two critical objectives for marine vehicles operating in environments with obstacles, and they generally conflict with each other. In this paper, we propose a novel online motion planning method of marine vehicles which can make trade-offs between the two design objectives based on the framework of economic model predictive control (EMPC). Firstly, the feasible trajectory with the most safety margin is designed and utilized as tracking reference. Secondly, the EMPC-based receding horizon motion planning algorithm is designed, in which the practical consumed energy and safety measure (i.e., the distance between the planning trajectory and the reference) are considered. Experimental results verify the effectiveness and feasibility of the proposed method.

Submitted 10 December, 2021; originally announced December 2021.

arXiv:2110.12271 [pdf, other]

Self-Validation: Early Stopping for Single-Instance Deep Generative Priors

Authors: Taihui Li, Zhong Zhuang, Hengyue Liang, Le Peng, Hengkang Wang, Ju Sun

Abstract: Recent works have shown the surprising effectiveness of deep generative models in solving numerous image reconstruction (IR) tasks, even without training data. We call these models, such as deep image prior and deep decoder, collectively single-instance deep generative priors (SIDGPs). The successes, however, often hinge on appropriate early stopping (ES), which so far has largely been handled in an ad-hoc manner. In this paper, we propose the first principled method for ES when applying SIDGPs to IR, taking advantage of the typical bell trend of the reconstruction quality. In particular, our method is based on collaborative training and self-validation: the primal reconstruction process is monitored by a deep autoencoder, which is trained online with the historic reconstructed images and used to validate the reconstruction quality constantly. Experimentally, on several IR problems and different SIDGPs, our self-validation method is able to reliably detect near-peak performance and signal good ES points. Our code is available at https://sun-umn.github.io/Self-Validation/.

Submitted 23 October, 2021; originally announced October 2021.

Comments: To appear in British Machine Vision Conference (BMVC) 2021

arXiv:2107.07988 [pdf, other]

Controlled AutoEncoders to Generate Faces from Voices

Authors: Hao Liang, Lulan Yu, Guikang Xu, Bhiksha Raj, Rita Singh

Abstract: Multiple studies in the past have shown that there is a strong correlation between human vocal characteristics and facial features. However, existing approaches generate faces simply from voice, without exploring the set of features that contribute to these observed correlations. A computational methodology to explore this can be devised by rephrasing the question to: "how much would a target face have to change in order to be perceived as the originator of a source voice?" With this in perspective, we propose in this paper a framework to morph a target face in response to a given voice in a way that facial features are implicitly guided by learned voice-face correlation. Our framework includes a guided autoencoder that converts one face to another, controlled by a unique model-conditioning component called a gating controller which modifies the reconstructed face based on input voice recordings. We evaluate the framework on VoxCeleb and VGGFace datasets through human subjects and face retrieval. Various experiments demonstrate the effectiveness of our proposed model.

Submitted 16 July, 2021; originally announced July 2021.

arXiv:2106.12511 [pdf]

doi 10.1001/jamacardio.2021.6059

High-Throughput Precision Phenotyping of Left Ventricular Hypertrophy with Cardiovascular Deep Learning

Authors: Grant Duffy, Paul P Cheng, Neal Yuan, Bryan He, Alan C. Kwan, Matthew J. Shun-Shin, Kevin M. Alexander, Joseph Ebinger, Matthew P. Lungren, Florian Rader, David H. Liang, Ingela Schnittger, Euan A. Ashley, James Y. Zou, Jignesh Patel, Ronald Witteles, Susan Cheng, David Ouyang

Abstract: Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular disease including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and difficulty differentiating etiologies of LVH. To overcome this challenge, we present EchoNet-LVH - a deep learning workflow that automatically quantifies ventricular hypertrophy with precision equal to human experts and predicts etiology of LVH. Trained on 28,201 echocardiogram videos, our model accurately measures intraventricular wall thickness (mean absolute error [MAE] 1.4mm, 95% CI 1.2-1.5mm), left ventricular diameter (MAE 2.4mm, 95% CI 2.2-2.6mm), and posterior wall thickness (MAE 1.2mm, 95% CI 1.1-1.3mm) and classifies cardiac amyloidosis (area under the curve of 0.83) and hypertrophic cardiomyopathy (AUC 0.98) from other etiologies of LVH. In external datasets from independent domestic and international healthcare systems, EchoNet-LVH accurately quantified ventricular parameters (R2 of 0.96 and 0.90 respectively) and detected cardiac amyloidosis (AUC 0.79) and hypertrophic cardiomyopathy (AUC 0.89) on the domestic external validation site. Leveraging measurements across multiple heart beats, our model can more accurately identify subtle changes in LV geometry and its causal etiologies.
Compared to human experts, EchoNet-LVH is fully automated, allowing for reproducible, precise measurements, and lays the foundation for precision diagnosis of cardiac hypertrophy. As a resource to promote further innovation, we also make publicly available a large dataset of 23,212 annotated echocardiogram videos.

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2106.05152 [pdf, other]

Rethinking Transfer Learning for Medical Image Classification

Authors: Le Peng, Hengyue Liang, Gaoxiang Luo, Taihui Li, Ju Sun

Abstract: Transfer learning (TL) from pretrained deep models is a standard practice in modern medical image classification (MIC). However, which levels of features should be reused is problem-dependent, and uniformly finetuning all layers of pretrained models may be suboptimal. This insight has partly motivated the recent differential TL strategies, such as TransFusion (TF) and layer-wise finetuning (LWFT), which treat the layers in the pretrained models differentially. In this paper, we add one more strategy into this family, called TruncatedTL, which reuses and finetunes appropriate bottom layers and directly discards the remaining layers. This yields not only superior MIC performance but also compact models for efficient inference, compared to other differential TL methods. Our code is available at: https://github.com/sun-umn/TTL

Submitted 26 May, 2024; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: Accepted by BMVC2023 (oral)

arXiv:2103.00345 [pdf, other]

End-to-end Uncertainty-based Mitigation of Adversarial Attacks to Automated Lane Centering

Authors: Ruochen Jiao, Hengyi Liang, Takami Sato, Junjie Shen, Qi Alfred Chen, Qi Zhu

Abstract: In the development of advanced driver-assistance systems (ADAS) and autonomous vehicles, machine learning techniques that are based on deep neural networks (DNNs) have been widely used for vehicle perception. These techniques offer significant improvement on average perception accuracy over traditional methods, however, have been shown to be susceptible to adversarial attacks, where small perturbations in the input may cause significant errors in the perception results and lead to system failure. Most prior works addressing such adversarial attacks focus only on the sensing and perception modules. In this work, we propose an end-to-end approach that addresses the impact of adversarial attacks throughout perception, planning, and control modules. In particular, we choose a target ADAS application, the automated lane centering system in OpenPilot, quantify the perception uncertainty under adversarial attacks, and design a robust planning and control module accordingly based on the uncertainty analysis. We evaluate our proposed approach using both the public dataset and production-grade autonomous driving simulator. The experiment results demonstrate that our approach can effectively mitigate the impact of adversarial attacks and can achieve 55% to 90% improvement over the original OpenPilot.

Submitted 27 February, 2021; originally announced March 2021.

Comments: 8 pages for conference

arXiv:2012.09154 [pdf]

Exploration of Whether Skylight Polarization Patterns Contain Three-dimensional Attitude Information

Authors: Huaju Liang, Hongyang Bai, Tong Zhou

Abstract: Our previous work has demonstrated that Rayleigh model, which is widely used in polarized skylight navigation to describe skylight polarization patterns, does not contain three-dimensional (3D) attitude information [1]. However, it is still necessary to further explore whether the skylight polarization patterns contain 3D attitude information. So, in this paper, a social spider optimization (SSO) method is proposed to estimate three Euler angles, which considers the difference of each pixel among polarization images based on template matching (TM) to make full use of the captured polarization information. In addition, to explore this problem, we not only use angle of polarization (AOP) and degree of polarization (DOP) information, but also the light intensity (LI) information. So, a sky model is established, which combines Berry model and Hosek model to fully describe AOP, DOP, and LI information in the sky, and considers the influence of four neutral points, ground albedo, atmospheric turbidity, and wavelength. The results of simulation show that the SSO algorithm can estimate 3D attitude and the established sky model contains 3D attitude information. However, when there are measurement noise or model error, the accuracy of 3D attitude estimation drops significantly. Especially in field experiment, it is very difficult to estimate 3D attitude. Finally, the results are discussed in detail.

Submitted 30 November, 2020; originally announced December 2020.

arXiv:2010.08091 [pdf, other]

doi 10.1145/3394171.3414032

PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music

Authors: Hongru Liang, Wenqiang Lei, Paul Yaozhu Chan, Zhenglu Yang, Maosong Sun, Tat-Seng Chua

Abstract: Definitive embeddings remain a fundamental challenge of computational musicology for symbolic music in deep learning today. Analogous to natural language, music can be modeled as a sequence of tokens. This motivates the majority of existing solutions to explore the utilization of word embedding models to build music embeddings. However, music differs from natural languages in two key aspects: (1) musical token is multi-faceted -- it comprises of pitch, rhythm and dynamics information; and (2) musical context is two-dimensional -- each musical token is dependent on both melodic and harmonic contexts. In this work, we provide a comprehensive solution by proposing a novel framework named PiRhDy that integrates pitch, rhythm, and dynamics information seamlessly. PiRhDy adopts a hierarchical strategy which can be decomposed into two steps: (1) token (i.e., note event) modeling, which separately represents pitch, rhythm, and dynamics and integrates them into a single token embedding; and (2) context modeling, which utilizes melodic and harmonic knowledge to train the token embedding. A thorough study was made on each component and sub-strategy of PiRhDy. We further validate our embeddings in three downstream tasks -- melody completion, accompaniment suggestion, and genre classification. Results indicate a significant advancement of the neural approach towards symbolic music as well as PiRhDy's potential as a pretrained tool for a broad range of symbolic music applications.

Submitted 15 October, 2020; originally announced October 2020.

Comments: ACM Multimedia 2020 -- best paper

arXiv:2008.06192 [pdf, other]

doi 10.1145/3400302.3415717

Leveraging Weakly-hard Constraints for Improving System Fault Tolerance with Functional and Timing Guarantees

Authors: Hengyi Liang, Zhilu Wang, Ruochen Jiao, Qi Zhu

Abstract: Many safety-critical real-time systems operate under harsh environment and are subject to soft errors caused by transient or intermittent faults. It is critical and yet often very challenging to apply fault tolerance techniques in these systems, due to their resource limitations and stringent constraints on timing and functionality. In this work, we leverage the concept of weakly-hard constraints, which allows task deadline misses in a bounded manner, to improve system's capability to accommodate fault tolerance techniques while ensuring timing and functional correctness. In particular, we 1) quantitatively measure control cost under different deadline hit/miss scenarios and identify weak-hard constraints that guarantee control stability, 2) employ typical worst-case analysis (TWCA) to bound the number of deadline misses and approximate system control cost, 3) develop an event-based simulation method to check the task execution pattern and evaluate system control cost for any given solution and 4) develop a meta-heuristic algorithm that consists of heuristic methods and a simulated annealing procedure to explore the design space. Our experiments on an industrial case study and a set of synthetic examples demonstrate the effectiveness of our approach.

Submitted 14 August, 2020; originally announced August 2020.

Comments: ICCAD 2020
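
Illustrative note: the abstract above describes weakly-hard constraints as allowing deadline misses "in a bounded manner". One common way to state such a bound is an (m, K) constraint: at most m misses in any K consecutive task instances. The sketch below checks a hit/miss pattern against that form. It is a minimal illustration only; the (m, K) formulation and the function name `satisfies_weakly_hard` are assumptions made here, not details taken from the paper.

```python
def satisfies_weakly_hard(pattern, m, k):
    """Return True if `pattern` has at most `m` misses in every window of `k`
    consecutive instances. `pattern` is a sequence of truthy (deadline hit)
    and falsy (deadline miss) values. Hypothetical helper for illustration."""
    misses = [0 if hit else 1 for hit in pattern]
    if len(misses) <= k:
        # Pattern shorter than one full window: count all misses once.
        return sum(misses) <= m
    window = sum(misses[:k])  # misses in the first window
    if window > m:
        return False
    for i in range(k, len(misses)):
        # Slide the window by one instance: add the new one, drop the oldest.
        window += misses[i] - misses[i - k]
        if window > m:
            return False
    return True


# Example pattern: 1 = hit, 0 = miss. The window (0, 1, 0) contains two misses.
example = [1, 1, 0, 1, 0, 1, 1]
print(satisfies_weakly_hard(example, m=1, k=3))
print(satisfies_weakly_hard(example, m=2, k=3))
```

A sliding-window count keeps the check linear in the pattern length, which suits the kind of simulation-based pattern checking the abstract mentions.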

arXiv:2007.12578 [pdf, other]

Stain Style Transfer of Histopathology Images Via Structure-Preserved Generative Learning

Authors: Hanwen Liang, Konstantinos N. Plataniotis, Xingyu Li

Abstract: Computational histopathology image diagnosis becomes increasingly popular and important, where images are segmented or classified for disease diagnosis by computers. While pathologists do not struggle with color variations in slides, computational solutions usually suffer from this critical issue. To address the issue of color variations in histopathology images, this study proposes two stain style transfer models, SSIM-GAN and DSCSI-GAN, based on the generative adversarial networks. By cooperating structural preservation metrics and feedback of an auxiliary diagnosis net in learning, medical-relevant information presented by image texture, structure, and chroma-contrast features is preserved in color-normalized images. Particularly, the smart treat of chromatic image content in our DSCSI-GAN model helps to achieve noticeable normalization improvement in image regions where stains mix due to histological substances co-localization. Extensive experimentation on public histopathology image sets indicates that our methods outperform prior arts in terms of generating more stain-consistent images, better preserving histological information in images, and obtaining significantly higher learning efficiency. Our python implementation is published on https://github.com/hanwen0529/DSCSI-GAN.

Submitted 24 July, 2020; originally announced July 2020.

arXiv:2005.11842 [pdf, other]

Cross-Layer Design of Automotive Systems

Authors: Zhilu Wang, Hengyi Liang, Chao Huang, Qi Zhu

Abstract: With growing system complexity and closer cyber-physical interaction, there are increasingly stronger dependencies between different function and architecture layers in automotive systems. This paper first introduces several cross-layer approaches we developed in the past for holistically addressing multiple system layers in the design of individual vehicles and of connected vehicle applications; and then presents a new methodology based on the weakly-hard paradigm for leveraging the scheduling flexibility in architecture layer to improve the system performance at function layer. The results of these works demonstrate the importance and effectiveness of cross-layer design for automotive systems.

Submitted 31 May, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

arXiv:2003.10689 [pdf]

Learning regularization and intensity-gradient-based fidelity for single image super resolution

Authors: Hu Liang, Shengrong Zhao

Abstract: How to extract more and useful information for single image super resolution is an imperative and difficult problem. Learning-based method is a representative method for such task. However, the results are not so stable as there may exist big difference between the training data and the test data. The regularization-based method can effectively utilize the self-information of observation. However, the degradation model used in regularization-based method just considers the degradation in intensity space. It may not reconstruct images well as the degradation reflections in various feature space are not considered. In this paper, we first study the image degradation progress, and establish degradation model both in intensity and gradient space. Thus, a comprehensive data consistency constraint is established for the reconstruction. Consequently, more useful information can be extracted from the observed data. Second, the regularization term is learned by a designed symmetric residual deep neural-network. It can search similar external information from a predefined dataset avoiding the artificial tendency. Finally, the proposed fidelity term and designed regularization term are embedded into the regularization framework. Further, an optimization method is developed based on the half-quadratic splitting method and the pseudo conjugate method. Experimental results indicated that the subjective and the objective metric corresponding to the proposed method were better than those obtained by the comparison methods.

Submitted 24 March, 2020; originally announced March 2020.

arXiv:2003.00342 [pdf, other]

doi 10.1109/IROS45743.2020.9340859

Robust Robotic Pouring using Audition and Haptics

Authors: Hongzhuo Liang, Chuangchuang Zhou, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Fuchun Sun, Marcus Stoffel, Jianwei Zhang

Abstract: Robust and accurate estimation of liquid height lies as an essential part of pouring tasks for service robots. However, vision-based methods often fail in occluded conditions while audio-based methods cannot work well in a noisy environment. We instead propose a multimodal pouring network (MP-Net) that is able to robustly predict liquid height by conditioning on both audition and haptics input. MP-Net is trained on a self-collected multimodal pouring dataset. This dataset contains 300 robot pouring recordings with audio and force/torque measurements for three types of target containers. We also augment the audio data by inserting robot noise. We evaluated MP-Net on our collected dataset and a wide variety of robot experiments. Both network training results and robot experiments demonstrate that MP-Net is robust against noise and changes to the task and environment. Moreover, we further combine the predicted height and force data to estimate the shape of the target container.

Submitted 14 October, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

Comments: accepted by IROS2020

Journal ref: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1903.00650 [pdf, other]

doi 10.1109/IROS40897.2019.8968303

Making Sense of Audio Vibration for Liquid Height Estimation in Robotic Pouring

Authors: Hongzhuo Liang, Shuang Li, Xiaojian Ma, Norman Hendrich, Timo Gerkmann, Fuchun Sun, Jianwei Zhang

Abstract: In this paper, we focus on the challenging perception problem in robotic pouring. Most of the existing approaches either leverage visual or haptic information. However, these techniques may suffer from poor generalization performances on opaque containers or concerning measuring precision. To tackle these drawbacks, we propose to make use of audio vibration sensing and design a deep neural network PouringNet to predict the liquid height from the audio fragment during the robotic pouring task. PouringNet is trained on our collected real-world pouring dataset with multimodal sensing data, which contains more than 3000 recordings of audio, force feedback, video and trajectory data of the human hand that performs the pouring task. Each record represents a complete pouring procedure. We conduct several evaluations on PouringNet with our dataset and robotic hardware. The results demonstrate that our PouringNet generalizes well across different liquid containers, positions of the audio receiver, initial liquid heights and types of liquid, and facilitates a more robust and accurate audio-based perception for robotic pouring.

Submitted 21 July, 2019; v1 submitted 2 March, 2019; originally announced March 2019.

Comments: Accepted to IROS 2019

Journal ref: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1806.01989 [pdf]

doi 10.1109/TNS.2019.2916973

Design of Voltage Pulse Control Module for Free Space Measurement-Device-Independent Quantum Key Distribution

Authors: Sijie Zhang, Nan Zhou, Fanshui Deng, Hao Liang

Abstract: Measurement-Device-Independent Quantum Key Distribution (MDIQKD) protocol has been proved that it is unaffected by all hacking attacks, and ensures the security of information theory even when the performance of single-photon detectors is not ideal. Fiber channel has been used by the previous MDIQKD experimental device. However, the signal attenuation increases exponentially as the transmission distance increases. In order to overcome this, we regard free space as the channel of signal transmission, and the signal attenuation increases square as the transmission distance increases (regardless of the atmospheric scattering), which can effectively reduce the signal attenuation trend. In order to implement the free space MDIQKD experiments, a modulation module is needed to modulate the wide pulse chopping, decoy-state, normalization, phase encoding and time encoding. In this paper, we present the design of the Voltage Pulse Control Module for the free space MDIQKD.

Submitted 20 June, 2018; v1 submitted 5 June, 2018; originally announced June 2018.

arXiv:1806.01490 [pdf]

Design of 32-channel TDC Based on Single FPGA for μSR Spectrometer at CSNS

Authors: Fanshui Deng, Hao Liang, Bangjiao Ye, **gyu Tang

Abstract: Muon Spin Rotation, Relaxation and Resonance (μSR) technology has an irreplaceable role in studying the microstructure and properties of materials, especially micro-magnetic properties. An experimental muon source is being built in China Spallation Neutron Source (CSNS) now. At the same time, a 128-channel μSR spectrometer as China's first μSR spectrometer is being developed. The time spectrum of μSR can be obtained by fitting the curve of positron count rate with time. This paper presents a 32-channel Time-to-Digital Converter (TDC) implemented in a Xilinx Virtex-6 Field Programmable Gate Array (FPGA) for measuring the positron's flight time of μSR Spectrometer. Signal of each channel is sampled by 16 equidistant shifted-phase 200 MHz sampling clocks, so the TDC bin size is 312.5ps. The measuring range is up to 327us. This TDC has the ability to store multiple hit signals in a short time with a deep hit-buffer up to 512. Time tag is added to each data to record the moment when the data was detected. Programmable time window and channel shielding give the flexibility to choose the time range and channels of interest. The delay of each channel can be calibrated. The data is transmitted to data acquisition system (DAQ) through Gigabit Ethernet. TDC and control logic are configured in real time by DAQ. The results of test show that the Full Width at Half Maximum (FWHM) precision of single channel is better than 273 ps with a low sensitivity to temperature and the linearity is pretty well.

Submitted 5 June, 2018; originally announced June 2018.
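
Illustrative note: the bin size quoted in the abstract above follows from dividing one 200 MHz clock period (5 ns) evenly among the 16 shifted sampling phases. The short sketch below reproduces that arithmetic. The function names are hypothetical, and the 20-bit time-stamp width used in the second calculation is an assumption chosen only because it yields a range close to the quoted 327 us; the abstract does not state the counter width.

```python
from fractions import Fraction


def tdc_bin_size_ps(clock_hz, phases):
    """Bin size in picoseconds when one clock period is split evenly
    across `phases` equidistant sampling phases."""
    period_ps = Fraction(10**12, clock_hz)  # one clock period in ps
    return period_ps / phases


def tdc_range_us(bin_ps, bits):
    """Measuring range in microseconds for a time stamp of `bits` bits.
    The bit width is an assumption, not a figure from the abstract."""
    return bin_ps * 2**bits / 10**6


bin_ps = tdc_bin_size_ps(200_000_000, 16)
print(float(bin_ps))                    # 312.5
print(float(tdc_range_us(bin_ps, 20)))  # 327.68
```

Exact rational arithmetic via `Fraction` avoids floating-point rounding in the intermediate steps.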

Showing 1–46 of 46 results for author: Liang, H