Search | arXiv e-print repository

arXiv:2406.15160 [pdf, other]

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich c… ▽ More This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich collection of audio data with multiple data augmentation techniques, to an audio-visual student model trained with only a limited set of multi-modal data. Next, we propose a two-stage audio-visual fusion strategy, consisting of an early feature fusion and a late video-guided decision fusion to exploit synergies between audio and video modalities. Finally, we introduce an innovative video pixel swap** (VPS) technique to extend an audio channel swap** (ACS) method to an audio-visual joint augmentation. Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge data set demonstrate significant improvements in SELD performances. Furthermore, our submission to the SELD task of the DCASE 2023 Challenge ranks first place by effectively integrating the proposed techniques into a model ensemble. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: accepted by icme2024

arXiv:2406.02653 [pdf, other]

Pancreatic Tumor Segmentation as Anomaly Detection in CT Images Using Denoising Diffusion Models

Authors: Reza Babaei, Samuel Cheng, Theresa Thai, Shangqing Zhao

Abstract: Despite the advances in medicine, cancer has remained a formidable challenge. Particularly in the case of pancreatic tumors, characterized by their diversity and late diagnosis, early detection poses a significant challenge crucial for effective treatment. The advancement of deep learning techniques, particularly supervised algorithms, has significantly propelled pancreatic tumor detection in the… ▽ More Despite the advances in medicine, cancer has remained a formidable challenge. Particularly in the case of pancreatic tumors, characterized by their diversity and late diagnosis, early detection poses a significant challenge crucial for effective treatment. The advancement of deep learning techniques, particularly supervised algorithms, has significantly propelled pancreatic tumor detection in the medical field. However, supervised deep learning approaches necessitate extensive labeled medical images for training, yet acquiring such annotations is both limited and costly. Conversely, weakly supervised anomaly detection methods, requiring only image-level annotations, have garnered interest. Existing methodologies predominantly hinge on generative adversarial networks (GANs) or autoencoder models, which can pose complexity in training and, these models may face difficulties in accurately preserving fine image details. This research presents a novel approach to pancreatic tumor detection, employing weak supervision anomaly detection through denoising diffusion algorithms. By incorporating a deterministic iterative process of adding and removing noise along with classifier guidance, the method enables seamless translation of images between diseased and healthy subjects, resulting in detailed anomaly maps without requiring complex training protocols and segmentation masks. This study explores denoising diffusion models as a recent advancement over traditional generative models like GANs, contributing to the field of pancreatic tumor detection. Recognizing the low survival rates of pancreatic cancer, this study emphasizes the need for continued research to leverage diffusion models' efficiency in medical segmentation tasks. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2405.10145 [pdf]

Deep Koopman Operator-Informed Safety Command Governor for Autonomous Vehicles

Authors: Hao Chen, Xiangkun He, Shuo Cheng, Chen Lv

Abstract: Modeling of nonlinear behaviors with physical-based models poses challenges. However, Koopman operator maps the original nonlinear system into an infinite-dimensional linear space to achieve global linearization of the nonlinear system through input and output data, which derives an absolute equivalent linear representation of the original state space. Due to the impossibility of implementing the… ▽ More Modeling of nonlinear behaviors with physical-based models poses challenges. However, Koopman operator maps the original nonlinear system into an infinite-dimensional linear space to achieve global linearization of the nonlinear system through input and output data, which derives an absolute equivalent linear representation of the original state space. Due to the impossibility of implementing the infinite-dimensional Koopman operator, finite-dimensional kernel functions are selected as an approximation. Given its flexible structure and high accuracy, deep learning is initially employed to extract kernel functions from data and acquire a linear evolution dynamic of the autonomous vehicle in the lifted space. Additionally, the control barrier function (CBF) converts the state constraints to the constraints on the input to render safety property. Then, in terms of the lateral stability of the in-wheel motor driven vehicle, the CBF conditions are incorporated with the learned deep Koopman model. Because of the linear fashion of the deep Koopman model, the quadratic programming problem is formulated to generate the applied driving torque with minimal perturbation to the original driving torque as a safety command governor. In the end, to validate the fidelity of the deep Koopman model compared to other mainstream approaches and demonstrate the lateral improvement achieved by the proposed safety command governor, data collection and safety testing scenarios are conducted on a hardware-in-the-loop platform. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (ie runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/. △ Less

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2404.08326 [pdf, other]

Quaternion-Based Attitude Stabilization Using Synergistic Hybrid Feedback With Minimal Potential Functions

Authors: Xin Tong, Qingpeng Ding, Haiyang Fang, Shing Shin Cheng

Abstract: This paper investigates the robust global attitude stabilization problem for a rigid-body system using quaternion-based feedback. We propose a novel synergistic hybrid feedback with the following notable features: (1) It demonstrates central synergism by utilizing a minimal number of potential functions; (2) It ensures consistency with respect to the unit quaternion representation of rigid-body at… ▽ More This paper investigates the robust global attitude stabilization problem for a rigid-body system using quaternion-based feedback. We propose a novel synergistic hybrid feedback with the following notable features: (1) It demonstrates central synergism by utilizing a minimal number of potential functions; (2) It ensures consistency with respect to the unit quaternion representation of rigid-body attitude; (3) Its state-feedback laws incorporate a shared action term that steers the system toward the desired attitude. We demonstrate that the proposed hybrid feedback method effectively solves the problem at hand and guarantees robust uniform global asymptotic stability. △ Less

Submitted 12 April, 2024; originally announced April 2024.

Comments: 14 pages, 6 figures, extended version of a paper accepted for publication in Automatica

arXiv:2401.15508 [pdf, other]

Proto-MPC: An Encoder-Prototype-Decoder Approach for Quadrotor Control in Challenging Winds

Authors: Yuliang Gu, Sheng Cheng, Naira Hovakimyan

Abstract: Quadrotors are increasingly used in the evolving field of aerial robotics for their agility and mechanical simplicity. However, inherent uncertainties, such as aerodynamic effects coupled with quadrotors' operation in dynamically changing environments, pose significant challenges for traditional, nominal model-based control designs. We propose a multi-task meta-learning method called Encoder-Proto… ▽ More Quadrotors are increasingly used in the evolving field of aerial robotics for their agility and mechanical simplicity. However, inherent uncertainties, such as aerodynamic effects coupled with quadrotors' operation in dynamically changing environments, pose significant challenges for traditional, nominal model-based control designs. We propose a multi-task meta-learning method called Encoder-Prototype-Decoder (EPD), which has the advantage of effectively balancing shared and distinctive representations across diverse training tasks. Subsequently, we integrate the EPD model into a model predictive control problem (Proto-MPC) to enhance the quadrotor's ability to adapt and operate across a spectrum of dynamically changing tasks with an efficient online implementation. We validate the proposed method in simulations, which demonstrates Proto-MPC's robust performance in trajectory tracking of a quadrotor being subject to static and spatially varying side winds. △ Less

Submitted 21 May, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

arXiv:2312.15190 [pdf, other]

SAIC: Integration of Speech Anonymization and Identity Classification

Authors: Ming Cheng, Xingjian Diao, Shitong Cheng, Wenjun Liu

Abstract: Speech anonymization and de-identification have garnered significant attention recently, especially in the healthcare area including telehealth consultations, patient voiceprint matching, and patient real-time monitoring. Speaker identity classification tasks, which involve recognizing specific speakers from audio to learn identity features, are crucial for de-identification. Since rare studies ha… ▽ More Speech anonymization and de-identification have garnered significant attention recently, especially in the healthcare area including telehealth consultations, patient voiceprint matching, and patient real-time monitoring. Speaker identity classification tasks, which involve recognizing specific speakers from audio to learn identity features, are crucial for de-identification. Since rare studies have effectively combined speech anonymization with identity classification, we propose SAIC - an innovative pipeline for integrating Speech Anonymization and Identity Classification. SAIC demonstrates remarkable performance and reaches state-of-the-art in the speaker identity classification task on the Voxceleb1 dataset, with a top-1 accuracy of 96.1%. Although SAIC is not trained or evaluated specifically on clinical data, the result strongly proves the model's effectiveness and the possibility to generalize into the healthcare area, providing insightful guidance for future work. △ Less

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2312.13585 [pdf, other]

Speech Translation with Large Language Models: An Industrial Practice

Authors: Zhichao Huang, Rong Ye, Tom Ko, Qianqian Dong, Shanbo Cheng, Mingxuan Wang, Hang Li

Abstract: Given the great success of large language models (LLMs) across various tasks, in this paper, we introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained LLM. By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long au… ▽ More Given the great success of large language models (LLMs) across various tasks, in this paper, we introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained LLM. By integrating the large language model (LLM) with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Furthermore, our findings indicate that the implementation of Chain-of-Thought (CoT) prompting can yield advantages in the context of LLM-ST. Through rigorous experimentation on English and Chinese datasets, we showcase the exceptional performance of LLM-ST, establishing a new benchmark in the field of speech translation. Demo: https://speechtranslation.github.io/llm-st/. △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: Technical report. 13 pages. Demo: https://speechtranslation.github.io/llm-st/

arXiv:2312.12744 [pdf]

3D-CLMI: A Motor Imagery EEG Classification Model via Fusion of 3D-CNN and LSTM with Attention

Authors: Shiwei Cheng, Yuejiang Hao

Abstract: Due to the limitations in the accuracy and robustness of current electroencephalogram (EEG) classification algorithms, applying motor imagery (MI) for practical Brain-Computer Interface (BCI) applications remains challenging. This paper proposed a model that combined a three-dimensional convolutional neural network (CNN) with a long short-term memory (LSTM) network with attention to classify MI-EE… ▽ More Due to the limitations in the accuracy and robustness of current electroencephalogram (EEG) classification algorithms, applying motor imagery (MI) for practical Brain-Computer Interface (BCI) applications remains challenging. This paper proposed a model that combined a three-dimensional convolutional neural network (CNN) with a long short-term memory (LSTM) network with attention to classify MI-EEG signals. This model combined MI-EEG signals from different channels into three-dimensional features and extracted spatial features through convolution operations with multiple three-dimensional convolutional kernels of different scales. At the same time, to ensure the integrity of the extracted MI-EEG signal temporal features, the LSTM network was directly trained on the preprocessed raw signal. Finally, the features obtained from these two networks were combined and used for classification. Experimental results showed that this model achieved a classification accuracy of 92.7% and an F1-score of 0.91 on the public dataset BCI Competition IV dataset 2a, which were both higher than the state-of-the-art models in the field of MI tasks. Additionally, 12 participants were invited to complete a four-class MI task in our lab, and experiments on the collected dataset showed that the 3D-CLMI model also maintained the highest classification accuracy and F1-score. The model greatly improved the classification accuracy of users' motor imagery intentions, giving brain-computer interfaces better application prospects in emerging fields such as autonomous vehicles and medical rehabilitation. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11384 [pdf, ps, other]

DiffTune-MPC: Closed-Loop Learning for Model Predictive Control

Authors: Ran Tao, Sheng Cheng, Xiaofeng Wang, Shenlong Wang, Naira Hovakimyan

Abstract: Model predictive control (MPC) has been applied to many platforms in robotics and autonomous systems for its capability to predict a system's future behavior while incorporating constraints that a system may have. To enhance the performance of a system with an MPC controller, one can manually tune the MPC's cost function. However, it can be challenging due to the possibly high dimension of the par… ▽ More Model predictive control (MPC) has been applied to many platforms in robotics and autonomous systems for its capability to predict a system's future behavior while incorporating constraints that a system may have. To enhance the performance of a system with an MPC controller, one can manually tune the MPC's cost function. However, it can be challenging due to the possibly high dimension of the parameter space as well as the potential difference between the open-loop cost function in MPC and the overall closed-loop performance metric function. This paper presents DiffTune-MPC, a novel learning method, to learn the cost function of an MPC in a closed-loop manner. The proposed framework is compatible with the scenario where the time interval for performance evaluation and MPC's planning horizon have different lengths. We show the auxiliary problem whose solution admits the analytical gradients of MPC and discuss its variations in different MPC settings, including nonlinear MPCs that are solved using sequential quadratic programming. Simulation results demonstrate the learning capability of DiffTune-MPC and the generalization capability of the learned MPC parameters. △ Less

Submitted 30 March, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: The first two authors contributed equally to this work

arXiv:2312.02937 [pdf, other]

doi 10.2514/6.2024-1167

Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing

Authors: Ayoosh Bansal, Yang Zhao, James Zhu, Sheng Cheng, Yuliang Gu, Hyung-** Yoon, Hunmin Kim, Naira Hovakimyan, Lui Sha

Abstract: Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomou… ▽ More Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomous air mobility vehicles. We improve upon this system by replacing static assumptions of control capabilities with dynamic confirmation, i.e., real-time confirmation of control limitations of the system, ensuring reliable fulfillment of safety maneuvers and overrides, without dependence on overly pessimistic assumptions. Parameters defining control system capabilities and limitations, e.g., maximum deceleration, are continuously tracked within the system and used to make safety-critical decisions. We apply these techniques to propose a verifiable collision avoidance solution for autonomous aerial mobility vehicles operating in cluttered and potentially unsafe environments. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: To appear in AIAA SciTech 2024

ACM Class: C.3; C.4; J.7

Journal ref: AIAA SCITECH 2024 Forum, p. 1167

arXiv:2309.07925 [pdf, other]

doi 10.1145/3581783.3612859

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

Authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

Abstract: In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for e… ▽ More In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for emotion classification and valence regression in the decoding stage. A multi-task loss based on uncertainty is also designed to optimize the whole process. Finally, by combining three different structures on the posterior probability level, we obtain the final predictions of discrete and dimensional emotions. When tested on the dataset of multimodal emotion recognition challenge (MER 2023), the proposed framework yields consistent improvements in both emotion classification and valence regression. Our final system achieves state-of-the-art performance and ranks third on the leaderboard on MER-MULTI sub-challenge. △ Less

Submitted 10 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures

Journal ref: The 31st ACM International Conference on Multimedia (MM'23), 2023

arXiv:2309.07145 [pdf, other]

ETP: Learning Transferable ECG Representations via ECG-Text Pre-training

Authors: Che Liu, Zhongwei Wan, Sibo Cheng, Mi Zhang, Rossella Arcucci

Abstract: In the domain of cardiovascular healthcare, the Electrocardiogram (ECG) serves as a critical, non-invasive diagnostic tool. Although recent strides in self-supervised learning (SSL) have been promising for ECG representation learning, these techniques often require annotated samples and struggle with classes not present in the fine-tuning stages. To address these limitations, we introduce ECG-Text… ▽ More In the domain of cardiovascular healthcare, the Electrocardiogram (ECG) serves as a critical, non-invasive diagnostic tool. Although recent strides in self-supervised learning (SSL) have been promising for ECG representation learning, these techniques often require annotated samples and struggle with classes not present in the fine-tuning stages. To address these limitations, we introduce ECG-Text Pre-training (ETP), an innovative framework designed to learn cross-modal representations that link ECG signals with textual reports. For the first time, this framework leverages the zero-shot classification task in the ECG domain. ETP employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports. The proposed framework excels in both linear evaluation and zero-shot classification tasks, as demonstrated on the PTB-XL and CPSC2018 datasets, showcasing its ability for robust and generalizable cross-modal ECG feature learning. △ Less

Submitted 6 September, 2023; originally announced September 2023.

Comments: under review

arXiv:2309.04960 [pdf, other]

SdCT-GAN: Reconstructing CT from Biplanar X-Rays with Self-driven Generative Adversarial Networks

Authors: Shuangqin Cheng, Qingliang Chen, Qiyi Zhang, Ming Li, Yamuhanmode Alike, Kaile Su, Pengcheng Wen

Abstract: Computed Tomography (CT) is a medical imaging modality that can generate more informative 3D images than 2D X-rays. However, this advantage comes at the expense of more radiation exposure, higher costs, and longer acquisition time. Hence, the reconstruction of 3D CT images using a limited number of 2D X-rays has gained significant importance as an economical alternative. Nevertheless, existing met… ▽ More Computed Tomography (CT) is a medical imaging modality that can generate more informative 3D images than 2D X-rays. However, this advantage comes at the expense of more radiation exposure, higher costs, and longer acquisition time. Hence, the reconstruction of 3D CT images using a limited number of 2D X-rays has gained significant importance as an economical alternative. Nevertheless, existing methods primarily prioritize minimizing pixel/voxel-level intensity discrepancies, often neglecting the preservation of textural details in the synthesized images. This oversight directly impacts the quality of the reconstructed images and thus affects the clinical diagnosis. To address the deficits, this paper presents a new self-driven generative adversarial network model (SdCT-GAN), which is motivated to pay more attention to image details by introducing a novel auto-encoder structure in the discriminator. In addition, a Sobel Gradient Guider (SGG) idea is applied throughout the model, where the edge information from the 2D X-ray image at the input can be integrated. Moreover, LPIPS (Learned Perceptual Image Patch Similarity) evaluation metric is adopted that can quantitatively evaluate the fine contours and textures of reconstructed images better than the existing ones. Finally, the qualitative and quantitative results of the empirical studies justify the power of the proposed model compared to mainstream state-of-the-art baselines. △ Less

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2308.02429 [pdf, other]

Nonconvex optimization for optimum retrieval of the transmission matrix of a multimode fiber

Authors: Shengfu Cheng, Xuyu Zhang, Tianting Zhong, Huanhao Li, Haoran Li, Lei Gong, Honglin Liu, Puxiang Lai

Abstract: Transmission matrix (TM) allows light control through complex media such as multimode fibers (MMFs), gaining great attention in areas like biophotonics over the past decade. The measurement of a complex-valued TM is highly desired as it supports full modulation of the light field, yet demanding as the holographic setup is usually entailed. Efforts have been taken to retrieve a TM directly from int… ▽ More Transmission matrix (TM) allows light control through complex media such as multimode fibers (MMFs), gaining great attention in areas like biophotonics over the past decade. The measurement of a complex-valued TM is highly desired as it supports full modulation of the light field, yet demanding as the holographic setup is usually entailed. Efforts have been taken to retrieve a TM directly from intensity measurements with several representative phase retrieval algorithms, which still see limitations like slow or suboptimum recovery, especially under noisy environment. Here, a modified non-convex optimization approach is proposed. Through numerical evaluations, it shows that the nonconvex method offers an optimum efficiency of focusing with less running time or sampling rate. The comparative test under different signal-to-noise levels further indicates its improved robustness for TM retrieval. Experimentally, the optimum retrieval of the TM of a MMF is collectively validated by multiple groups of single-spot and multi-spot focusing demonstrations. Focus scanning on the working plane of the MMF is also conducted where our method achieves 93.6% efficiency of the gold standard holography method when the sampling rate is 8. Based on the recovered TM, image transmission through the MMF with high fidelity can be realized via another phase retrieval. Thanks to parallel operation and GPU acceleration, the nonconvex approach can retrieve an 8685$\times$1024 TM (sampling rate=8) with 42.3 s on a regular computer. In brief, the proposed method provides optimum efficiency and fast implementation for TM retrieval, which will facilitate wide applications in deep-tissue optical imaging, manipulation and treatment. △ Less

Submitted 2 August, 2023; originally announced August 2023.

arXiv:2306.05234 [pdf, ps, other]

doi 10.1109/TAC.2023.3281341

Global Stabilization of Antipodal Points on n-Sphere with Application to Attitude Tracking

Authors: Xin Tong, Shing Shin Cheng

Abstract: Existing approaches to robust global asymptotic stabilization of a pair of antipodal points on unit $n$-sphere $\mathbb{S}^n$ typically involve the non-centrally synergistic hybrid controllers for attitude tracking on unit quaternion space. However, when switching faults occur due to parameter errors, the non-centrally synergistic property can lead to the unwinding problem or in some cases, destab… ▽ More Existing approaches to robust global asymptotic stabilization of a pair of antipodal points on unit $n$-sphere $\mathbb{S}^n$ typically involve the non-centrally synergistic hybrid controllers for attitude tracking on unit quaternion space. However, when switching faults occur due to parameter errors, the non-centrally synergistic property can lead to the unwinding problem or in some cases, destabilize the desired set. In this work, a hybrid controller is first proposed based on a novel centrally synergistic family of potential functions on $\mathbb{S}^n$, which is generated from a basic potential function through angular war**. The synergistic parameter can be explicitly expressed if the war** angle has a positive lower bound at the undesired critical points of the family. Next, the proposed approach induces a new quaternion-based controller for global attitude tracking. It has three advantageous features over existing synergistic designs: 1) it is consistent, i.e., free from the ambiguity of unit quaternion representation; 2) it is switching-fault-tolerant, i.e., the desired closed-loop equilibria remain asymptotically stable even when the switching mechanism does not work; 3) it relaxes the assumption on the parameter of the basic potential function in literature. Comprehensive simulation confirms the high robustness of the proposed centrally synergistic approach compared with existing non-centrally synergistic approaches. △ Less

Submitted 8 June, 2023; originally announced June 2023.

Comments: 8 pages

arXiv:2305.06511 [pdf, other]

ParamNet: A Parameter-variable Network for Fast Stain Normalization

Authors: Hongtao Kang, Die Luo, Li Chen, Junbo Hu, Shenghua Cheng, Tingwei Quan, Shaoqun Zeng, Xiuli Liu

Abstract: In practice, digital pathology images are often affected by various factors, resulting in very large differences in color and brightness. Stain normalization can effectively reduce the differences in color and brightness of digital pathology images, thus improving the performance of computer-aided diagnostic systems. Conventional stain normalization methods rely on one or several reference images,… ▽ More In practice, digital pathology images are often affected by various factors, resulting in very large differences in color and brightness. Stain normalization can effectively reduce the differences in color and brightness of digital pathology images, thus improving the performance of computer-aided diagnostic systems. Conventional stain normalization methods rely on one or several reference images, but one or several images are difficult to represent the entire dataset. Although learning-based stain normalization methods are a general approach, they use complex deep networks, which not only greatly reduce computational efficiency, but also risk introducing artifacts. StainNet is a fast and robust stain normalization network, but it has not a sufficient capability for complex stain normalization due to its too simple network structure. In this study, we proposed a parameter-variable stain normalization network, ParamNet. ParamNet contains a parameter prediction sub-network and a color map** sub-network, where the parameter prediction sub-network can automatically determine the appropriate parameters for the color map** sub-network according to each input image. The feature of parameter variable ensures that our network has a sufficient capability for various stain normalization tasks. The color map** sub-network is a fully 1x1 convolutional network with a total of 59 variable parameters, which allows our network to be extremely computationally efficient and does not introduce artifacts. The results on cytopathology and histopathology datasets show that our ParamNet outperforms state-of-the-art methods and can effectively improve the generalization of classifiers on pathology diagnosis tasks. The code has been available at https://github.com/khtao/ParamNet. △ Less

Submitted 10 May, 2023; originally announced May 2023.

arXiv:2305.02596 [pdf]

A Soft Coordination Method of Heterogeneous Devices in Distribution System Voltage Control

Authors: Licheng Wang, Tao Wang, Gang Huang, Ruifeng Yan, Kai Wang, Youbing Zhang, Shijie Cheng

Abstract: With the continuous increase of photovoltaic (PV) penetration, the voltage control interactions between newly installed PV inverters and previously deployed on-load tap-changer (OLTC) transformers become ever more significant. To achieve coordinated voltage regulation, current methods often rely on a decision-making algorithm to fully take over the control of all devices, requiring OLTC to give up… ▽ More With the continuous increase of photovoltaic (PV) penetration, the voltage control interactions between newly installed PV inverters and previously deployed on-load tap-changer (OLTC) transformers become ever more significant. To achieve coordinated voltage regulation, current methods often rely on a decision-making algorithm to fully take over the control of all devices, requiring OLTC to give up its existing tap switching logic and execute corresponding upgrades. Aiming at bridging this gap, a soft coordination framework is proposed in this paper. Specifically, the decision-making commands are only applied on inverters, and OLTC that retains its own operation rule will be indirectly controlled by the changed system voltage, which is a result of appropriately adjusting inverters' Var output. The proposed method achieves the soft coordination by establishing a modified actor-critic algorithm to train a proxy model of inverters. The well-trained proxy model can properly adjust inverters' Var output to "softly" coordinate OLTC's tap operations, which finally attains coordinated voltage regulation and line loss minimization. Simulation results verify the superiority of our proposed method over traditional ones in coordinating heterogeneous devices for voltage control. △ Less

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2303.15733 [pdf, ps, other]

doi 10.1016/j.automatica.2023.111070

Synergistic Potential Functions from Single Modified Trace Function on SO(3)

Authors: Xin Tong, Shing Shin Cheng

Abstract: This paper is about the construction of a family of centrally synergistic potential functions from a single modified trace function on SO(3). First, we demonstrate that it is possible to complete the construction through angular war** with multiple directions, particularly effective in the unresolved cases in the literature. Second, it can be shown that for each potential function in the family,… ▽ More This paper is about the construction of a family of centrally synergistic potential functions from a single modified trace function on SO(3). First, we demonstrate that it is possible to complete the construction through angular war** with multiple directions, particularly effective in the unresolved cases in the literature. Second, it can be shown that for each potential function in the family, there exists a subset of the family such that the synergistic gap is positive at the unwanted critical points. This allows the switching condition to be checked within the selected subsets while implementing synergistic hybrid control. Furthermore, the positive lower bound of synergistic gap is explicitly expressed by selecting a traditional war** angle function. Finally, we apply the proposed synergistic potential functions to obtain robust global attitude tracking. △ Less

Submitted 28 March, 2023; originally announced March 2023.

Comments: Extended version of the paper accepted for publication in Automatica

arXiv:2303.13819 [pdf, other]

Verification of $L_1$ Adaptive Control using Verse Library: A Case Study of Quadrotors

Authors: Lin Song, Yangge Li, Sheng Cheng, Pan Zhao, Sayan Mitra, Naira Hovakimyan

Abstract: $L_1$ adaptive control ($L_1$AC) is a control design technique that can handle a broad class of system uncertainties and provide transient performance guarantees. In this work-in-progress abstract, we discuss how existing formal verification tools can be applied to check the performance of $L_1$AC systems. We show that the theoretical transient performance and robustness guarantees of an $L_1… ▽ More $L_1$ adaptive control ($L_1$AC) is a control design technique that can handle a broad class of system uncertainties and provide transient performance guarantees. In this work-in-progress abstract, we discuss how existing formal verification tools can be applied to check the performance of $L_1$AC systems. We show that the theoretical transient performance and robustness guarantees of an $L_1$ adaptive controller for an 18-dimensional quadrotor system can be verified using the recently developed Verse reachability analysis tool. We will further consider the performance verification of $L_1$AC on systems with learning-enabled components. △ Less

Submitted 24 March, 2023; originally announced March 2023.

Comments: accepted to ICCPS-wip 2023

arXiv:2303.10160 [pdf, other]

Visual Information Matters for ASR Error Correction

Authors: Vanya Bannihatti Kumar, Shanbo Cheng, Ningxin Peng, Yuchen Zhang

Abstract: Aiming to improve the Automatic Speech Recognition (ASR) outputs with a post-processing step, ASR error correction (EC) techniques have been widely developed due to their efficiency in using parallel text data. Previous works mainly focus on using text or/ and speech data, which hinders the performance gain when not only text and speech information, but other modalities, such as visual information… ▽ More Aiming to improve the Automatic Speech Recognition (ASR) outputs with a post-processing step, ASR error correction (EC) techniques have been widely developed due to their efficiency in using parallel text data. Previous works mainly focus on using text or/ and speech data, which hinders the performance gain when not only text and speech information, but other modalities, such as visual information are critical for EC. The challenges are mainly two folds: one is that previous work fails to emphasize visual information, thus rare exploration has been studied. The other is that the community lacks a high-quality benchmark where visual information matters for the EC models. Therefore, this paper provides 1) simple yet effective methods, namely gated fusion and image captions as prompts to incorporate visual information to help EC; 2) large-scale benchmark datasets, namely Visual-ASR-EC, where each item in the training data consists of visual, speech, and text information, and the test data are carefully selected by human annotators to ensure that even humans could make mistakes when visual information is missing. Experimental results show that using captions as prompts could effectively use the visual information and surpass state-of-the-art methods by upto 1.2% in Word Error Rate(WER), which also indicates that visual information is critical in our proposed Visual-ASR-EC dataset △ Less

Submitted 26 May, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted at ICASSP 2023

arXiv:2303.06207 [pdf, other]

A New Super-Resolution Measurement of Perceptual Quality and Fidelity

Authors: Sheng Cheng

Abstract: Super-resolution results are usually measured by full-reference image quality metrics or human rating scores. However, these evaluation methods are general image quality measurement, and do not account for the nature of the super-resolution problem. In this work, we analyze the evaluation problem based on the one-to-many map** nature of super-resolution, and propose a novel distribution-based me… ▽ More Super-resolution results are usually measured by full-reference image quality metrics or human rating scores. However, these evaluation methods are general image quality measurement, and do not account for the nature of the super-resolution problem. In this work, we analyze the evaluation problem based on the one-to-many map** nature of super-resolution, and propose a novel distribution-based metric for super-resolution. Starting from the distribution distance, we derive the proposed metric to make it accessible and easy to compute. Through a human subject study on super-resolution, we show that the proposed metric is highly correlated with the human perceptual quality, and better than most existing metrics. Moreover, the proposed metric has a higher correlation with the fidelity measure compared to the perception-based metrics. To understand the properties of the proposed metric, we conduct extensive evaluation in terms of its design choices, and show that the metric is robust to its design choices. Finally, we show that the metric can be used to train super-resolution networks for better perceptual quality. △ Less

Submitted 10 March, 2023; originally announced March 2023.

arXiv:2302.07341 [pdf, other]

Cooperative Perception for Safe Control of Autonomous Vehicles under LiDAR Spoofing Attacks

Authors: Hongchao Zhang, Zhouchi Li, Shiyu Cheng, Andrew Clark

Abstract: Autonomous vehicles rely on LiDAR sensors to detect obstacles such as pedestrians, other vehicles, and fixed infrastructures. LiDAR spoofing attacks have been demonstrated that either create erroneous obstacles or prevent detection of real obstacles, resulting in unsafe driving behaviors. In this paper, we propose an approach to detect and mitigate LiDAR spoofing attacks by leveraging LiDAR scan d… ▽ More Autonomous vehicles rely on LiDAR sensors to detect obstacles such as pedestrians, other vehicles, and fixed infrastructures. LiDAR spoofing attacks have been demonstrated that either create erroneous obstacles or prevent detection of real obstacles, resulting in unsafe driving behaviors. In this paper, we propose an approach to detect and mitigate LiDAR spoofing attacks by leveraging LiDAR scan data from other neighboring vehicles. This approach exploits the fact that spoofing attacks can typically only be mounted on one vehicle at a time, and introduce additional points into the victim's scan that can be readily detected by comparison from other, non-modified scans. We develop a Fault Detection, Identification, and Isolation procedure that identifies non-existing obstacle, physical removal, and adversarial object attacks, while also estimating the actual locations of obstacles. We propose a control algorithm that guarantees that these estimated object locations are avoided. We validate our framework using a CARLA simulation study, in which we verify that our FDII algorithm correctly detects each attack pattern. △ Less

Submitted 14 February, 2023; originally announced February 2023.

arXiv:2302.07208 [pdf, other]

$\mathcal{L}_1$Quad: $\mathcal{L}_1$ Adaptive Augmentation of Geometric Control for Agile Quadrotors with Performance Guarantees

Authors: Zhuohuan Wu, Sheng Cheng, Pan Zhao, Aditya Gahlawat, Kasey A. Ackerman, Arun Lakshmanan, Chengyu Yang, Jiahao Yu, Naira Hovakimyan

Abstract: Quadrotors that can operate safely in the presence of imperfect model knowledge and external disturbances are crucial in safety-critical applications. We present L1Quad, a control architecture for quadrotors based on the L1 adaptive control. L1Quad enables safe tubes centered around a desired trajectory that the quadrotor is always guaranteed to remain inside. Our design applies to both the rotati… ▽ More Quadrotors that can operate safely in the presence of imperfect model knowledge and external disturbances are crucial in safety-critical applications. We present L1Quad, a control architecture for quadrotors based on the L1 adaptive control. L1Quad enables safe tubes centered around a desired trajectory that the quadrotor is always guaranteed to remain inside. Our design applies to both the rotational and the translational dynamics of the quadrotor. We lump various types of uncertainties and disturbances as unknown nonlinear (time- and state-dependent) forces and moments. Without assuming or enforcing parametric structures, L1Quad can accurately estimate and compensate for these unknown forces and moments. Extensive experimental results demonstrate that L1Quad is able to significantly outperform baseline controllers under a variety of uncertainties with consistently small tracking errors. △ Less

Submitted 14 February, 2023; originally announced February 2023.

Comments: The first two authors contributed equally to this work

arXiv:2301.10171 [pdf, other]

Spectral Cross-Domain Neural Network with Soft-adaptive Threshold Spectral Enhancement

Authors: Che Liu, Sibo Cheng, Wei** Ding, Rossella Arcucci

Abstract: Electrocardiography (ECG) signals can be considered as multi-variable time-series. The state-of-the-art ECG data classification approaches, based on either feature engineering or deep learning techniques, treat separately spectral and time domains in machine learning systems. No spectral-time domain communication mechanism inside the classifier model can be found in current approaches, leading to… ▽ More Electrocardiography (ECG) signals can be considered as multi-variable time-series. The state-of-the-art ECG data classification approaches, based on either feature engineering or deep learning techniques, treat separately spectral and time domains in machine learning systems. No spectral-time domain communication mechanism inside the classifier model can be found in current approaches, leading to difficulties in identifying complex ECG forms. In this paper, we proposed a novel deep learning model named Spectral Cross-domain neural network (SCDNN) with a new block called Soft-adaptive threshold spectral enhancement (SATSE), to simultaneously reveal the key information embedded in spectral and time domains inside the neural network. More precisely, the domain-cross information is captured by a general Convolutional neural network (CNN) backbone, and different information sources are merged by a self-adaptive mechanism to mine the connection between time and spectral domains. In SATSE, the knowledge from time and spectral domains is extracted via the Fast Fourier Transformation (FFT) with soft trainable thresholds in modified Sigmoid functions. The proposed SCDNN is tested with several classification tasks implemented on the public ECG databases \textit{PTB-XL} and \textit{MIT-BIH}. SCDNN outperforms the state-of-the-art approaches with a low computational cost regarding a variety of metrics in all classification tasks on both databases, by finding appropriate domains from the infinite spectral map**. The convergence of the trainable thresholds in the spectral domain is also numerically investigated in this paper. The robust performance of SCDNN provides a new perspective to exploit knowledge across deep learning models from time and spectral domains. The repository can be found: https://github.com/DL-WG/SCDNN-TS △ Less

Submitted 9 November, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

arXiv:2211.15902 [pdf, ps, other]

Simultaneous Spatial and Temporal Assignment for Fast UAV Trajectory Optimization using Bilevel Optimization

Authors: Qianzhong Chen, Sheng Cheng, Naira Hovakimyan

Abstract: In this paper, we propose a framework for fast trajectory planning for unmanned aerial vehicles (UAVs). Our framework is reformulated from an existing bilevel optimization, in which the lower-level problem solves for the optimal trajectory with a fixed time allocation, whereas the upper-level problem updates the time allocation using analytical gradients. The lower-level problem incorporates the s… ▽ More In this paper, we propose a framework for fast trajectory planning for unmanned aerial vehicles (UAVs). Our framework is reformulated from an existing bilevel optimization, in which the lower-level problem solves for the optimal trajectory with a fixed time allocation, whereas the upper-level problem updates the time allocation using analytical gradients. The lower-level problem incorporates the safety-set constraints (in the form of inequality constraints) and is cast as a convex quadratic program (QP). Our formulation modifies the lower-level QP by excluding the inequality constraints for the safety sets, which significantly reduces the computation time. The safety-set constraints are moved to the upper-level problem, where the feasible waypoints are updated together with the time allocation using analytical gradients enabled by the OptNet. We validate our approach in simulations, where our method's computation time scales linearly with respect to the number of safety sets, in contrast to the state-of-the-art that scales exponentially. △ Less

Submitted 13 April, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: accepted by IEEE RA-L

arXiv:2210.10879 [pdf, other]

G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

Authors: Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park

Abstract: Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as… ▽ More Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as directed acyclic graphs (DAGs) and search over this space to optimize the augmentation policy itself. We show that given the same computational budget, policies produced by G-Augment are able to perform better than SpecAugment policies obtained by random search on fine-tuning tasks on CHiME-6 and AMI. G-Augment is also able to establish a new state-of-the-art ASR performance on the CHiME-6 evaluation set (30.7% WER). We further demonstrate that G-Augment policies show better transfer properties across warm-start to cold-start training and model size compared to random-searched SpecAugment policies. △ Less

Submitted 24 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 6 pages, accepted at SLT 2022. Updated with copyright

arXiv:2209.10024 [pdf, other]

Geometric Tracking Control of Omnidirectional Multirotors in the Presence of Rotor Dynamics

Authors: Hyungyu Lee, Sheng Cheng, Zhuohuan Wu, Naira Hovakimyan

Abstract: An omnidirectional multirotor has the advantageous maneuverability of decoupled translational and rotational motions, drastically superseding the traditional multirotors' motion capability. Such maneuverability requires an omnidirectional multirotor to frequently alter the thrust amplitude and even direction, which is prone to the rotors' settling time induced from the rotors' own dynamics. Furthe… ▽ More An omnidirectional multirotor has the advantageous maneuverability of decoupled translational and rotational motions, drastically superseding the traditional multirotors' motion capability. Such maneuverability requires an omnidirectional multirotor to frequently alter the thrust amplitude and even direction, which is prone to the rotors' settling time induced from the rotors' own dynamics. Furthermore, the omnidirectional multirotor's stability for tracking control in the presence of rotor dynamics has not yet been addressed. To resolve this issue, we propose a geometric tracking controller that takes the rotor dynamics into account. We show that the proposed controller yields the zero equilibrium of the error dynamics almost globally exponentially stable. The controller's tracking performance and stability are verified in simulations. Furthermore, the single-axis force experiment with the omnidirectional multirotor has been performed to confirm the proposed controller's performance in mitigating the rotors' settling time in the real world. △ Less

Submitted 20 September, 2022; originally announced September 2022.

arXiv:2208.05944 [pdf, ps, other]

Barrier Certificate based Safe Control for LiDAR-based Systems under Sensor Faults and Attacks

Authors: Hongchao Zhang, Shiyu Cheng, Luyao Niu, Andrew Clark

Abstract: Autonomous Cyber-Physical Systems (CPS) fuse proprioceptive sensors such as GPS and exteroceptive sensors including Light Detection and Ranging (LiDAR) and cameras for state estimation and environmental observation. It has been shown that both types of sensors can be compromised by malicious attacks, leading to unacceptable safety violations. We study the problem of safety-critical control of a Li… ▽ More Autonomous Cyber-Physical Systems (CPS) fuse proprioceptive sensors such as GPS and exteroceptive sensors including Light Detection and Ranging (LiDAR) and cameras for state estimation and environmental observation. It has been shown that both types of sensors can be compromised by malicious attacks, leading to unacceptable safety violations. We study the problem of safety-critical control of a LiDAR-based system under sensor faults and attacks. We propose a framework consisting of fault tolerant estimation and fault tolerant control. The former reconstructs a LiDAR scan with state estimations, and excludes the possible faulty estimations that are not aligned with LiDAR measurements. We also verify the correctness of LiDAR scans by comparing them with the reconstructed ones and removing the possibly compromised sector in the scan. Fault tolerant control computes a control signal with the remaining estimations at each time step. We prove that the synthesized control input guarantees system safety using control barrier certificates. We validate our proposed framework using a UAV delivery system in an urban environment. We show that our proposed approach guarantees safety for the UAV whereas a baseline fails. △ Less

Submitted 11 August, 2022; originally announced August 2022.

arXiv:2205.05675 [pdf, other]

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, **gyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, **shan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang , et al. (86 additional authors not shown)

Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of e… ▽ More This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00dB on DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The rank of the teams were determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered. And the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. They gauge the state-of-the-art in efficient single image super-resolution. △ Less

Submitted 11 May, 2022; originally announced May 2022.

Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

arXiv:2205.03242 [pdf]

Electrocardiographic Deep Learning for Predicting Post-Procedural Mortality

Authors: David Ouyang, John Theurer, Nathan R. Stein, J. Weston Hughes, Pierre Elias, Bryan He, Neal Yuan, Grant Duffy, Roopinder K. Sandhu, Joseph Ebinger, Patrick Botting, Melvin Jujjavarapu, Brian Claggett, James E. Tooley, Tim Poterucha, Jonathan H. Chen, Michael Nurok, Marco Perez, Adler Perotte, James Y. Zou, Nancy R. Cook, Sumeet S. Chugh, Susan Cheng, Christine M. Albert

Abstract: Background. Pre-operative risk assessments used in clinical practice are limited in their ability to identify risk for post-operative mortality. We hypothesize that electrocardiograms contain hidden risk markers that can help prognosticate post-operative mortality. Methods. In a derivation cohort of 45,969 pre-operative patients (age 59+- 19 years, 55 percent women), a deep learning algorithm was… ▽ More Background. Pre-operative risk assessments used in clinical practice are limited in their ability to identify risk for post-operative mortality. We hypothesize that electrocardiograms contain hidden risk markers that can help prognosticate post-operative mortality. Methods. In a derivation cohort of 45,969 pre-operative patients (age 59+- 19 years, 55 percent women), a deep learning algorithm was developed to leverage waveform signals from pre-operative ECGs to discriminate post-operative mortality. Model performance was assessed in a holdout internal test dataset and in two external hospital cohorts and compared with the Revised Cardiac Risk Index (RCRI) score. Results. In the derivation cohort, there were 1,452 deaths. The algorithm discriminates mortality with an AUC of 0.83 (95% CI 0.79-0.87) surpassing the discrimination of the RCRI score with an AUC of 0.67 (CI 0.61-0.72) in the held out test cohort. Patients determined to be high risk by the deep learning model's risk prediction had an unadjusted odds ratio (OR) of 8.83 (5.57-13.20) for post-operative mortality as compared to an unadjusted OR of 2.08 (CI 0.77-3.50) for post-operative mortality for RCRI greater than 2. The deep learning algorithm performed similarly for patients undergoing cardiac surgery with an AUC of 0.85 (CI 0.77-0.92), non-cardiac surgery with an AUC of 0.83 (0.79-0.88), and catherization or endoscopy suite procedures with an AUC of 0.76 (0.72-0.81). The algorithm similarly discriminated risk for mortality in two separate external validation cohorts from independent healthcare systems with AUCs of 0.79 (0.75-0.83) and 0.75 (0.74-0.76) respectively. Conclusion. The findings demonstrate how a novel deep learning algorithm, applied to pre-operative ECGs, can improve discrimination of post-operative mortality. △ Less

Submitted 30 April, 2022; originally announced May 2022.

arXiv:2202.11889 [pdf, other]

A spectral-spatial fusion anomaly detection method for hyperspectral imagery

Authors: Zengfu Hou, Siyuan Cheng, Ting Hu

Abstract: In hyperspectral, high-quality spectral signals convey subtle spectral differences to distinguish similar materials, thereby providing unique advantage for anomaly detection. Hence fine spectra of anomalous pixels can be effectively screened out from heterogeneous background pixels. Since the same materials have similar characteristics in spatial and spectral dimension, detection performance can b… ▽ More In hyperspectral, high-quality spectral signals convey subtle spectral differences to distinguish similar materials, thereby providing unique advantage for anomaly detection. Hence fine spectra of anomalous pixels can be effectively screened out from heterogeneous background pixels. Since the same materials have similar characteristics in spatial and spectral dimension, detection performance can be significantly enhanced by jointing spatial and spectral information. In this paper, a spectralspatial fusion anomaly detection (SSFAD) method is proposed for hyperspectral imagery. First, original spectral signals are mapped to a local linear background space composed of median and mean with high confidence, where saliency weight and feature enhancement strategies are implemented to obtain an initial detection map in spectral domain. Futhermore, to make full use of similarity information of local background around testing pixel, a new detector is designed to extract the local similarity spatial features of patch images in spatial domain. Finally, anomalies are detected by adaptively combining the spectral and spatial detection maps. The experimental results demonstrate that our proposed method has superior detection performance than traditional methods. △ Less

Submitted 23 February, 2022; originally announced February 2022.

arXiv:2112.01288 [pdf, other]

How to quantify fields or textures? A guide to the scattering transform

Authors: Sihao Cheng, Brice Ménard

Abstract: Extracting information from stochastic fields or textures is a ubiquitous task in science, from exploratory data analysis to classification and parameter estimation. From physics to biology, it tends to be done either through a power spectrum analysis, which is often too limited, or the use of convolutional neural networks (CNNs), which require large training sets and lack interpretability. In thi… ▽ More Extracting information from stochastic fields or textures is a ubiquitous task in science, from exploratory data analysis to classification and parameter estimation. From physics to biology, it tends to be done either through a power spectrum analysis, which is often too limited, or the use of convolutional neural networks (CNNs), which require large training sets and lack interpretability. In this paper, we advocate for the use of the scattering transform (Mallat 2012), a powerful statistic which borrows mathematical ideas from CNNs but does not require any training, and is interpretable. We show that it provides a relatively compact set of summary statistics with visual interpretation and which carries most of the relevant information in a wide range of scientific applications. We present a non-technical introduction to this estimator and we argue that it can benefit data analysis, comparison to models and parameter inference in many fields of science. Interestingly, understanding the core operations of the scattering transform allows one to decipher many key aspects of the inner workings of CNNs. △ Less

Submitted 30 November, 2021; originally announced December 2021.

Comments: 18 pages, 16 figures

arXiv:2109.06998 [pdf, other]

doi 10.1109/ICRA46639.2022.9811946

$\mathcal{L}_1$ Adaptive Augmentation for Geometric Tracking Control of Quadrotors

Authors: Zhuohuan Wu, Sheng Cheng, Kasey A. Ackerman, Aditya Gahlawat, Arun Lakshmanan, Pan Zhao, Naira Hovakimyan

Abstract: This paper introduces an $\mathcal{L}_1$ adaptive control augmentation for geometric tracking control of quadrotors. In the proposed design, the $\mathcal{L}_1$ augmentation handles nonlinear (time- and state-dependent) uncertainties in the quadrotor dynamics without assuming or enforcing parametric structures, while the baseline geometric controller achieves stabilization of the known nonlinear m… ▽ More This paper introduces an $\mathcal{L}_1$ adaptive control augmentation for geometric tracking control of quadrotors. In the proposed design, the $\mathcal{L}_1$ augmentation handles nonlinear (time- and state-dependent) uncertainties in the quadrotor dynamics without assuming or enforcing parametric structures, while the baseline geometric controller achieves stabilization of the known nonlinear model of the system dynamics. The $\mathcal{L}_1$ augmentation applies to both the rotational and the translational dynamics. Experimental results demonstrate that the augmented geometric controller shows consistent and (on average five times) smaller trajectory tracking errors compared with the geometric controller alone when tested for different trajectories and under various types of uncertainties/disturbances. △ Less

Submitted 2 March, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: accepted by ICRA 2022

arXiv:2109.00339 [pdf, other]

How likely is a random graph shift-enabled?

Authors: Liyan Chen, Samuel Cheng, Vladimir Stankovic, Lina Stankovic

Abstract: The shift-enabled property of an underlying graph is essential in designing distributed filters. This article discusses when a random graph is shift-enabled. In particular, popular graph models ER, WS, BA random graph are used, weighted and unweighted, as well as signed graphs. Our results show that the considered unweighted connected random graphs are shift-enabled with high probability when the… ▽ More The shift-enabled property of an underlying graph is essential in designing distributed filters. This article discusses when a random graph is shift-enabled. In particular, popular graph models ER, WS, BA random graph are used, weighted and unweighted, as well as signed graphs. Our results show that the considered unweighted connected random graphs are shift-enabled with high probability when the number of edges is moderately high. However, very dense graphs, as well as fully connected graphs, are not shift-enabled. Interestingly, this behaviour is not observed for weighted connected graphs, which are always shift-enabled unless the number of edges in the graph is very low. △ Less

Submitted 28 August, 2021; originally announced September 2021.

Comments: 9 pages

arXiv:2106.12511 [pdf]

doi 10.1001/jamacardio.2021.6059

High-Throughput Precision Phenoty** of Left Ventricular Hypertrophy with Cardiovascular Deep Learning

Authors: Grant Duffy, Paul P Cheng, Neal Yuan, Bryan He, Alan C. Kwan, Matthew J. Shun-Shin, Kevin M. Alexander, Joseph Ebinger, Matthew P. Lungren, Florian Rader, David H. Liang, Ingela Schnittger, Euan A. Ashley, James Y. Zou, Jignesh Patel, Ronald Witteles, Susan Cheng, David Ouyang

Abstract: Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular disease including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and di… ▽ More Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular disease including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and difficulty differentiating etiologies of LVH. To overcome this challenge, we present EchoNet-LVH - a deep learning workflow that automatically quantifies ventricular hypertrophy with precision equal to human experts and predicts etiology of LVH. Trained on 28,201 echocardiogram videos, our model accurately measures intraventricular wall thickness (mean absolute error [MAE] 1.4mm, 95% CI 1.2-1.5mm), left ventricular diameter (MAE 2.4mm, 95% CI 2.2-2.6mm), and posterior wall thickness (MAE 1.2mm, 95% CI 1.1-1.3mm) and classifies cardiac amyloidosis (area under the curve of 0.83) and hypertrophic cardiomyopathy (AUC 0.98) from other etiologies of LVH. In external datasets from independent domestic and international healthcare systems, EchoNet-LVH accurately quantified ventricular parameters (R2 of 0.96 and 0.90 respectively) and detected cardiac amyloidosis (AUC 0.79) and hypertrophic cardiomyopathy (AUC 0.89) on the domestic external validation site. Leveraging measurements across multiple heart beats, our model can more accurately identify subtle changes in LV geometry and its causal etiologies. Compared to human experts, EchoNet-LVH is fully automated, allowing for reproducible, precise measurements, and lays the foundation for precision diagnosis of cardiac hypertrophy. As a resource to promote further innovation, we also make publicly available a large dataset of 23,212 annotated echocardiogram videos. △ Less

Submitted 23 June, 2021; originally announced June 2021.

arXiv:2106.08429 [pdf, other]

doi 10.1016/j.automatica.2021.109866

Optimal control of a 2D diffusion-advection process with a team of mobile actuators under jointly optimal guidance

Authors: Sheng Cheng, Derek A. Paley

Abstract: This paper describes an optimization framework to control a distributed parameter system (DPS) using a team of mobile actuators. The framework simultaneously seeks optimal control of the DPS and optimal guidance of the mobile actuators such that a cost function associated with both the DPS and the mobile actuators is minimized subject to the dynamics of each. The cost incurred from controlling the… ▽ More This paper describes an optimization framework to control a distributed parameter system (DPS) using a team of mobile actuators. The framework simultaneously seeks optimal control of the DPS and optimal guidance of the mobile actuators such that a cost function associated with both the DPS and the mobile actuators is minimized subject to the dynamics of each. The cost incurred from controlling the DPS is linear-quadratic, which is transformed into an equivalent form as a quadratic term associated with an operator-valued Riccati equation. This equivalent form reduces the problem to seeking for guidance only because the optimal control can be recovered once the optimal guidance is obtained. We establish conditions for the existence of a solution to the proposed problem. Since computing an optimal solution requires approximation, we also establish the conditions for convergence to the exact optimal solution of the approximate optimal solution. That is, when evaluating these two solutions by the original cost function, the difference becomes arbitrarily small as the approximation gets finer. Two numerical examples demonstrate the performance of the optimal control and guidance obtained from the proposed approach. △ Less

Submitted 18 June, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

Comments: Proofs for Lemmas~2.3, 2.5, and D.1 are attached in the supplement at the end

arXiv:2105.08629 [pdf, other]

Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report

Authors: Andrey Ignatov, Kim Byeoung-su, Radu Timofte, Angeline Pouget, Fenglong Song, Cheng Li, Shuai Xiao, Zhongqian Fu, Matteo Maggioni, Yibin Huang, Shen Cheng, Xin Lu, Yifeng Zhou, Liangyu Chen, Donghao Liu, Xiangyu Zhang, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Minsu Kwon, Myungje Lee, Jaeyoon Yoo, Changbeom Kang, Shinjo Wang, Bin Huang , et al. (7 additional authors not shown)

Abstract: Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solut… ▽ More Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they are usually working with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and are capable of processing 480p resolution images under 40-80 ms while achieving high fidelity results. A detailed description of all models developed in the challenge is provided in this paper. △ Less

Submitted 17 May, 2021; originally announced May 2021.

Comments: Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: substantial text overlap with arXiv:2105.07809, arXiv:2105.07825

arXiv:2103.03011 [pdf, other]

Reinforcement Learning Trajectory Generation and Control for Aggressive Perching on Vertical Walls with Quadrotors

Authors: Chen-Huan Pi, Kai-Chun Hu, Yu-Ting Huang, Stone Cheng

Abstract: Micro aerial vehicles are widely being researched and employed due to their relative low operation costs and high flexibility in various applications. We study the under-actuated quadrotor perching problem, designing a trajectory planner and controller which generates feasible trajectories and drives quadrotors to desired state in state space. This paper proposes a trajectory generating and tracki… ▽ More Micro aerial vehicles are widely being researched and employed due to their relative low operation costs and high flexibility in various applications. We study the under-actuated quadrotor perching problem, designing a trajectory planner and controller which generates feasible trajectories and drives quadrotors to desired state in state space. This paper proposes a trajectory generating and tracking method for quadrotor perching that takes the advantages of reinforcement learning controller and traditional controller. The trained low-level reinforcement learning controller would manipulate quadrotor toward the perching point in simulation environment. Once the simulated quadrotor has successfully perched, the relative trajectory information in simulation will be sent to tracking controller on real quadrotor and start the actual perching task. Generating feasible trajectories via the trained reinforcement learning controller requires less time, and the traditional trajectory tracking controller could easily be modified to control the quadrotor and mathematically analysis its stability and robustness. We show that this approach permits the control structure of trajectories and controllers enabling such aggressive maneuvers perching on vertical surfaces with high precision. △ Less

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2009.09574 [pdf, other]

Reconstruct high-resolution multi-focal plane images from a single 2D wide field image

Authors: Jiabo Ma, Sibo Liu, Shenghua Cheng, Xiuli Liu, Li Cheng, Shaoqun Zeng

Abstract: High-resolution 3D medical images are important for analysis and diagnosis, but axial scanning to acquire them is very time-consuming. In this paper, we propose a fast end-to-end multi-focal plane imaging network (MFPINet) to reconstruct high-resolution multi-focal plane images from a single 2D low-resolution wild filed image without relying on scanning. To acquire realistic MFP images fast, the p… ▽ More High-resolution 3D medical images are important for analysis and diagnosis, but axial scanning to acquire them is very time-consuming. In this paper, we propose a fast end-to-end multi-focal plane imaging network (MFPINet) to reconstruct high-resolution multi-focal plane images from a single 2D low-resolution wild filed image without relying on scanning. To acquire realistic MFP images fast, the proposed MFPINet adopts generative adversarial network framework and the strategies of post-sampling and refocusing all focal planes at one time. We conduct a series experiments on cytology microscopy images and demonstrate that MFPINet performs well on both axial refocusing and horizontal super resolution. Furthermore, MFPINet is approximately 24 times faster than current refocusing methods for reconstructing the same volume images. The proposed method has the potential to greatly increase the speed of high-resolution 3D imaging and expand the application of low-resolution wide-field images. △ Less

Submitted 20 September, 2020; originally announced September 2020.

Comments: 9 pages, 4 figures,3 Tables

arXiv:2007.12951 [pdf, ps, other]

Comparison of Machine Learning Methods for Predicting Karst Spring Discharge in North China

Authors: Shu Cheng, Xiaojuan Qiao, Yaolin Shi, Dawei Wang

Abstract: The quantitative analyses of karst spring discharge typically rely on physical-based models, which are inherently uncertain. To improve the understanding of the mechanism of spring discharge fluctuation and the relationship between precipitation and spring discharge, three machine learning methods were developed to reduce the predictive errors of physical-based groundwater models, simulate the dis… ▽ More The quantitative analyses of karst spring discharge typically rely on physical-based models, which are inherently uncertain. To improve the understanding of the mechanism of spring discharge fluctuation and the relationship between precipitation and spring discharge, three machine learning methods were developed to reduce the predictive errors of physical-based groundwater models, simulate the discharge of Longzici Spring's karst area, and predict changes in the spring on the basis of long time series precipitation monitoring and spring water flow data from 1987 to 2018. The three machine learning methods included two artificial neural networks (ANNs), namely, multilayer perceptron (MLP) and long short-term memory-recurrent neural network (LSTM-RNN), and support vector regression (SVR). A normalization method was introduced for data preprocessing to make the three methods robust and computationally efficient. To compare and evaluate the capability of the three machine learning methods, the mean squared error (MSE), mean absolute error (MAE), and root-mean-square error (RMSE) were selected as the performance metrics for these methods. Simulations showed that MLP reduced MSE, MAE, and RMSE to 0.0010, 0.0254, and 0.0318, respectively. Meanwhile, LSTM-RNN reduced MSE to 0.0010, MAE to 0.0272, and RMSE to 0.0329. Moreover, the decrease in MSE, MAE, and RMSE were 0.0910, 0.1852, and 0.3017, respectively, for SVR. Results indicated that MLP performed slightly better than LSTM-RNN, and MLP and LSTM-RNN performed considerably better than SVR. Furthermore, ANNs were demonstrated to be prior machine learning methods for simulating and predicting karst spring discharge. △ Less

Submitted 25 July, 2020; originally announced July 2020.

arXiv:2005.11902 [pdf, other]

ASR-Free Pronunciation Assessment

Authors: Sitong Cheng, Zhixin Liu, Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng

Abstract: Most of the pronunciation assessment methods are based on local features derived from automatic speech recognition (ASR), e.g., the Goodness of Pronunciation (GOP) score. In this paper, we investigate an ASR-free scoring approach that is derived from the marginal distribution of raw speech signals. The hypothesis is that even if we have no knowledge of the language (so cannot recognize the phones/… ▽ More Most of the pronunciation assessment methods are based on local features derived from automatic speech recognition (ASR), e.g., the Goodness of Pronunciation (GOP) score. In this paper, we investigate an ASR-free scoring approach that is derived from the marginal distribution of raw speech signals. The hypothesis is that even if we have no knowledge of the language (so cannot recognize the phones/words), we can still tell how good a pronunciation is, by comparatively listening to some speech data from the target language. Our analysis shows that this new scoring approach provides an interesting correction for the phone-competition problem of GOP. Experimental results on the ERJ dataset demonstrated that combining the ASR-free score and GOP can achieve better performance than the GOP baseline. △ Less

Submitted 24 May, 2020; originally announced May 2020.

Comments: submitted to INTRESPEECH 2020

arXiv:2005.10455 [pdf, other]

Single Image Super-Resolution via Residual Neuron Attention Networks

Authors: Wenjie Ai, Xiaoguang Tu, Shilei Cheng, Mei Xie

Abstract: Deep Convolutional Neural Networks (DCNNs) have achieved impressive performance in Single Image Super-Resolution (SISR). To further improve the performance, existing CNN-based methods generally focus on designing deeper architecture of the network. However, we argue blindly increasing network's depth is not the most sensible way. In this paper, we propose a novel end-to-end Residual Neuron Attenti… ▽ More Deep Convolutional Neural Networks (DCNNs) have achieved impressive performance in Single Image Super-Resolution (SISR). To further improve the performance, existing CNN-based methods generally focus on designing deeper architecture of the network. However, we argue blindly increasing network's depth is not the most sensible way. In this paper, we propose a novel end-to-end Residual Neuron Attention Networks (RNAN) for more efficient and effective SISR. Structurally, our RNAN is a sequential integration of the well-designed Global Context-enhanced Residual Groups (GCRGs), which extracts super-resolved features from coarse to fine. Our GCRG is designed with two novelties. Firstly, the Residual Neuron Attention (RNA) mechanism is proposed in each block of GCRG to reveal the relevance of neurons for better feature representation. Furthermore, the Global Context (GC) block is embedded into RNAN at the end of each GCRG for effectively modeling the global contextual information. Experiments results demonstrate that our RNAN achieves the comparable results with state-of-the-art methods in terms of both quantitative metrics and visual quality, however, with simplified network architecture. △ Less

Submitted 21 May, 2020; originally announced May 2020.

Comments: 6 pages, 4 figures, Accepted by IEEE ICIP 2020

arXiv:2005.07318 [pdf]

doi 10.1364/OE.411291

Displacement-agnostic coherent imaging through scatter with an interpretable deep neural network

Authors: Yuzhe Li, Shiyi Cheng, Yujia Xue, Lei Tian

Abstract: Coherent imaging through scatter is a challenging task in computational imaging. Both model-based and data-driven approaches have been explored to solve the inverse scattering problem. In our previous work, we have shown that a deep learning approach can make high-quality and highly generalizable predictions through unseen diffusers. Here, we propose a new deep neural network (DNN) model that is a… ▽ More Coherent imaging through scatter is a challenging task in computational imaging. Both model-based and data-driven approaches have been explored to solve the inverse scattering problem. In our previous work, we have shown that a deep learning approach can make high-quality and highly generalizable predictions through unseen diffusers. Here, we propose a new deep neural network (DNN) model that is agnostic to a broader class of perturbations including scatterer change, displacements, and system defocus up to 10X depth of field. In addition, we develop a new analysis framework for interpreting the mechanism of our DNN model and visualizing its generalizability based on an unsupervised dimension reduction technique. We show that our DNN can unmix the scattering-specific information and extract the object-specific information so as to achieve generalization under different scattering conditions. Our work paves the way to a highly robust and interpretable deep learning approach to imaging through scattering media. △ Less

Submitted 1 September, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

arXiv:2005.00834 [pdf]

Learning-based super interpolation and extrapolation for speckled image reconstruction

Authors: Huanhao Li, Zhipeng Yu, Yunqi Luo, Shengfu Cheng, Lihong V. Wang, Yuan** Zheng, Puxiang Lai

Abstract: Speckles arise when coherent light interacts with biological tissues. Information retrieval from speckles is desired yet challenging, requiring understanding or map** of the multiple scattering process, or reliable capability to reverse or compensate for the scattering-induced phase distortions. In whatever situation, insufficient sampling of speckles undermines the encoded information, impeding… ▽ More Speckles arise when coherent light interacts with biological tissues. Information retrieval from speckles is desired yet challenging, requiring understanding or map** of the multiple scattering process, or reliable capability to reverse or compensate for the scattering-induced phase distortions. In whatever situation, insufficient sampling of speckles undermines the encoded information, impeding successful object reconstruction from speckle patterns. In this work, we propose a deep learning method to combat the physical limit: the sub-Nyquist sampled speckles (~14 below the Nyquist criterion) are interpolated up to a well-resolved level (1024 times more pixels to resolve the same FOV) with smoothed morphology fine-textured. More importantly, the lost information can be retraced, which is impossible with classic interpolation or any existing methods. The learning network inspires a new perspective on the nature of speckles and a promising platform for efficient processing or deciphering of massive scattered optical signals, enabling widefield high-resolution imaging in complex scenarios. △ Less

Submitted 2 May, 2020; originally announced May 2020.

arXiv:2004.12385 [pdf, other]

Towards Feature Space Adversarial Attack

Authors: Qiuling Xu, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang

Abstract: We propose a new adversarial attack to Deep Neural Networks for image classification. Different from most existing attacks that directly perturb input pixels, our attack focuses on perturbing abstract features, more specifically, features that denote styles, including interpretable styles such as vivid colors and sharp outlines, and uninterpretable ones. It induces model misclassfication by inject… ▽ More We propose a new adversarial attack to Deep Neural Networks for image classification. Different from most existing attacks that directly perturb input pixels, our attack focuses on perturbing abstract features, more specifically, features that denote styles, including interpretable styles such as vivid colors and sharp outlines, and uninterpretable ones. It induces model misclassfication by injecting imperceptible style changes through an optimization procedure. We show that our attack can generate adversarial samples that are more natural-looking than the state-of-the-art unbounded attacks. The experiment also supports that existing pixel-space adversarial attack detection and defense techniques can hardly ensure robustness in the style related feature space. △ Less

Submitted 15 December, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

Comments: AAAI 2021

arXiv:2001.00692 [pdf]

FFusionCGAN: An end-to-end fusion method for few-focus images using conditional GAN in cytopathological digital slides

Authors: Xiebo Geng, Sibo Liua, Wei Han, Xu Li, Jiabo Ma, **gya Yu, Xiuli Liu, Sahoqun Zeng, Li Chen, Shenghua Cheng

Abstract: Multi-focus image fusion technologies compress different focus depth images into an image in which most objects are in focus. However, although existing image fusion techniques, including traditional algorithms and deep learning-based algorithms, can generate high-quality fused images, they need multiple images with different focus depths in the same field of view. This criterion may not be met in… ▽ More Multi-focus image fusion technologies compress different focus depth images into an image in which most objects are in focus. However, although existing image fusion techniques, including traditional algorithms and deep learning-based algorithms, can generate high-quality fused images, they need multiple images with different focus depths in the same field of view. This criterion may not be met in some cases where time efficiency is required or the hardware is insufficient. The problem is especially prominent in large-size whole slide images. This paper focused on the multi-focus image fusion of cytopathological digital slide images, and proposed a novel method for generating fused images from single-focus or few-focus images based on conditional generative adversarial network (GAN). Through the adversarial learning of the generator and discriminator, the method is capable of generating fused images with clear textures and large depth of field. Combined with the characteristics of cytopathological images, this paper designs a new generator architecture combining U-Net and DenseBlock, which can effectively improve the network's receptive field and comprehensively encode image features. Meanwhile, this paper develops a semantic segmentation network that identifies the blurred regions in cytopathological images. By integrating the network into the generative model, the quality of the generated fused images is effectively improved. Our method can generate fused images from only single-focus or few-focus images, thereby avoiding the problem of collecting multiple images of different focus depths with increased time and hardware costs. Furthermore, our model is designed to learn the direct map** of input source images to fused images without the need to manually design complex activity level measurements and fusion rules as in traditional methods. △ Less

Submitted 2 January, 2020; originally announced January 2020.

arXiv:1911.01799 [pdf, ps, other]

CN-CELEB: a challenging Chinese speaker recognition dataset

Authors: Yue Fan, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang, Ziya Zhou, Yunqi Cai, Dong Wang

Abstract: Recently, researchers set an ambitious goal of conducting speaker recognition in unconstrained conditions where the variations on ambient, channel and emotion could be arbitrary. However, most publicly available datasets are collected under constrained environments, i.e., with little noise and limited channel variation. These datasets tend to deliver over optimistic performance and do not meet the… ▽ More Recently, researchers set an ambitious goal of conducting speaker recognition in unconstrained conditions where the variations on ambient, channel and emotion could be arbitrary. However, most publicly available datasets are collected under constrained environments, i.e., with little noise and limited channel variation. These datasets tend to deliver over optimistic performance and do not meet the request of research on speaker recognition in unconstrained conditions. In this paper, we present CN-Celeb, a large-scale speaker recognition dataset collected `in the wild'. This dataset contains more than 130,000 utterances from 1,000 Chinese celebrities, and covers 11 different genres in real world. Experiments conducted with two state-of-the-art speaker recognition approaches (i-vector and x-vector) show that the performance on CN-Celeb is far inferior to the one obtained on VoxCeleb, a widely used speaker recognition dataset. This result demonstrates that in real-life conditions, the performance of existing techniques might be much worse than it was thought. Our database is free for researchers and can be downloaded from http://project.cslt.org. △ Less

Submitted 31 October, 2019; originally announced November 2019.

arXiv:1910.13046 [pdf, other]

Converged Deep Framework Assembling Principled Modules for CS-MRI

Authors: Risheng Liu, Yuxi Zhang, Shichao Cheng, Zhongxuan Luo, Xin Fan

Abstract: Compressed Sensing Magnetic Resonance Imaging (CS-MRI) significantly accelerates MR data acquisition at a sampling rate much lower than the Nyquist criterion. A major challenge for CS-MRI lies in solving the severely ill-posed inverse problem to reconstruct aliasing-free MR images from the sparse k-space data. Conventional methods typically optimize an energy function, producing reconstruction of… ▽ More Compressed Sensing Magnetic Resonance Imaging (CS-MRI) significantly accelerates MR data acquisition at a sampling rate much lower than the Nyquist criterion. A major challenge for CS-MRI lies in solving the severely ill-posed inverse problem to reconstruct aliasing-free MR images from the sparse k-space data. Conventional methods typically optimize an energy function, producing reconstruction of high quality, but their iterative numerical solvers unavoidably bring extremely slow processing. Recent data-driven techniques are able to provide fast restoration by either learning direct prediction to final reconstruction or plugging learned modules into the energy optimizer. Nevertheless, these data-driven predictors cannot guarantee the reconstruction following constraints underlying the regularizers of conventional methods so that the reliability of their reconstruction results are questionable. In this paper, we propose a converged deep framework assembling principled modules for CS-MRI that fuses learning strategy with the iterative solver of a conventional reconstruction energy. This framework embeds an optimal condition checking mechanism, fostering \emph{efficient} and \emph{reliable} reconstruction. We also apply the framework to two practical tasks, \emph{i.e.}, parallel imaging and reconstruction with Rician noise. Extensive experiments on both benchmark and manufacturer-testing images demonstrate that the proposed method reliably converges to the optimal solution more efficiently and accurately than the state-of-the-art in various scenarios. △ Less

Submitted 28 October, 2019; originally announced October 2019.

arXiv:1909.05184 [pdf]

Multi-stage domain adversarial style reconstruction for cytopathological image stain normalization

Authors: Xihao Chen, **gya Yu, Li Chen, Shaoqun Zeng, Xiuli Liu, Shenghua Cheng

Abstract: The different stain styles of cytopathological images have a negative effect on the generalization ability of automated image analysis algorithms. This article proposes a new framework that normalizes the stain style for cytopathological images through a stain removal module and a multi-stage domain adversarial style reconstruction module. We convert colorful images into grayscale images with a co… ▽ More The different stain styles of cytopathological images have a negative effect on the generalization ability of automated image analysis algorithms. This article proposes a new framework that normalizes the stain style for cytopathological images through a stain removal module and a multi-stage domain adversarial style reconstruction module. We convert colorful images into grayscale images with a color-encoding mask. Using the mask, reconstructed images retain their basic color without red and blue mixing, which is important for cytopathological image interpretation. The style reconstruction module consists of per-pixel regression with intradomain adversarial learning, inter-domain adversarial learning, and optional task-based refining. Per-pixel regression with intradomain adversarial learning establishes the generative network from the decolorized input to the reconstructed output. The interdomain adversarial learning further reduces the difference in stain style. The generation network can be optimized by combining it with the task network. Experimental results show that the proposed techniques help to optimize the generation network. The average accuracy increases from 75.41% to 84.79% after the intra-domain adversarial learning, and to 87.00% after interdomain adversarial learning. Under the guidance of the task network, the average accuracy rate reaches 89.58%. The proposed method achieves unsupervised stain normalization of cytopathological images, while preserving the cell structure, texture structure, and cell color properties of the image. This method overcomes the problem of generalizing the task models between different stain styles of cytopathological images. △ Less

Submitted 11 September, 2019; originally announced September 2019.

Showing 1–50 of 57 results for author: Cheng, S