Search | arXiv e-print repository

CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras

Authors: Sachin Shah, Matthew Albert Chan, Haoming Cai, Jingxi Chen, Sakshum Kulshrestha, Chahat Deep Singh, Yiannis Aloimonos, Christopher Metzler

Abstract: Point-spread-function (PSF) engineering is a well-established computational imaging technique that uses phase masks and other optical elements to embed extra information (e.g., depth) into the images captured by conventional CMOS image sensors. To date, however, PSF engineering has not been applied to neuromorphic event cameras, a powerful new image sensing technology that responds to changes in the log-intensity of light. This paper establishes theoretical limits (Cramér-Rao bounds) on 3D point localization and tracking with PSF-engineered event cameras. Using these bounds, we first demonstrate that existing Fisher phase masks are already near-optimal for localizing static flashing point sources (e.g., blinking fluorescent molecules). We then demonstrate that existing designs are sub-optimal for tracking moving point sources and proceed to use our theory to design optimal phase masks and binary amplitude masks for this task. To overcome the non-convexity of the design problem, we leverage novel implicit neural representation based parameterizations of the phase and amplitude masks. We demonstrate the efficacy of our designs through extensive simulations. We also validate our method with a simple prototype.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07409 [pdf, other]

Accelerating Ill-conditioned Hankel Matrix Recovery via Structured Newton-like Descent

Authors: HanQin Cai, Longxiu Huang, Xiliang Lu, Juntao You

Abstract: This paper studies the robust Hankel recovery problem, which simultaneously removes sparse outliers and fills in missing entries from partial observations. We propose a novel non-convex algorithm, coined Hankel Structured Newton-Like Descent (HSNLD), to tackle the robust Hankel recovery problem. HSNLD is highly efficient with linear convergence, and its convergence rate is independent of the condition number of the underlying Hankel matrix. The recovery guarantee has been established under some mild conditions. Numerical experiments on both synthetic and real datasets show the superior performance of HSNLD against state-of-the-art algorithms.

Submitted 11 June, 2024; originally announced June 2024.

MSC Class: 15A29; 15A83; 47B35; 90C17; 90C26; 90C53

arXiv:2401.10269 [pdf, ps, other]

Robust Multi-Sensor Multi-Target Tracking Using Possibility Labeled Multi-Bernoulli Filter

Authors: Han Cai, Chenbao Xue, Jeremie Houssineau, Zhirun Xue

Abstract: With the increasing complexity of multiple target tracking scenes, a single sensor may not be able to effectively monitor a large number of targets. Therefore, it is imperative to extend the single-sensor technique to Multi-Sensor Multi-Target Tracking (MSMTT) for enhanced functionality. Typical MSMTT methods presume complete randomness of all uncertain components, and therefore effective solutions such as the random finite set filter and covariance intersection method have been derived to conduct the MSMTT task. However, the presence of epistemic uncertainty, arising from incomplete information, is often disregarded within the context of MSMTT. This paper develops an innovative possibility Labeled Multi-Bernoulli (LMB) Filter based on the labeled Uncertain Finite Set (UFS) theory. The LMB filter inherits the high robustness of the possibility generalized labeled multi-Bernoulli filter with simplified computational complexity. The fusion of LMB UFSs is derived and adapted to develop a robust MSMTT scheme. Simulation results corroborate the superior performance exhibited by the proposed approach in comparison to typical probabilistic methods.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.04679 [pdf, other]

ConVRT: Consistent Video Restoration Through Turbulence with Test-time Optimization of Neural Video Representations

Authors: Haoming Cai, Jingxi Chen, Brandon Y. Feng, Weiyun Jiang, Mingyang Xie, Kevin Zhang, Ashok Veeraraghavan, Christopher Metzler

Abstract: Atmospheric turbulence presents a significant challenge in long-range imaging. Current restoration algorithms often struggle with temporal inconsistency, as well as limited generalization ability across varying turbulence levels and scene content different from the training data. To tackle these issues, we introduce a self-supervised method, Consistent Video Restoration through Turbulence (ConVRT), a test-time optimization method featuring a neural video representation designed to enhance temporal consistency in restoration. A key innovation of ConVRT is the integration of a pretrained vision-language model (CLIP) for semantic-oriented supervision, which steers the restoration towards sharp, photorealistic images in the CLIP latent space. We further develop a principled selection strategy of text prompts, based on their statistical correlation with a perceptual metric. ConVRT's test-time optimization allows it to adapt to a wide range of real-world turbulence conditions, effectively leveraging the insights gained from pre-trained models on simulated data. ConVRT offers a comprehensive and effective solution for mitigating real-world turbulence in dynamic videos.

Submitted 7 December, 2023; originally announced December 2023.

Comments: https://convrt-2024.github.io/

arXiv:2311.06705 [pdf, other]

Equal Incremental Cost-Based Optimization Method to Enhance Efficiency for IPOP-Type Converters

Authors: Hanfeng Cai, Haiyang Liu, Heyang Sun, Qiao Wang

Abstract: Systematic optimization over a wide power range is often achieved through the combination of modules of different power levels. This paper addresses the issue of enhancing the efficiency of a multiple-module system connected in parallel during operation and proposes an algorithm based on equal incremental cost for dynamic load allocation. Initially, a polynomial fitting technique is employed to fit efficiency test points for individual modules. Subsequently, the equal incremental cost-based optimization is utilized to formulate an efficiency optimization and allocation scheme for the multi-module system. A simulated annealing algorithm is applied to determine the optimal power output strategy for each module at a given total power flow requirement. Finally, a dual active bridge (DAB) experimental prototype with two input-parallel-output-parallel (IPOP) configurations is constructed to validate the effectiveness of the proposed strategy. Experimental results demonstrate that under the 800 W operating condition, the approach in this paper achieves an efficiency improvement of over 0.74% compared with equal power sharing between both modules.

Submitted 11 November, 2023; originally announced November 2023.

arXiv:2309.04568 [pdf, other]

doi 10.1088/1742-6596/2600/19/192011

Circular economy meets building automation

Authors: Hanmin Cai

Abstract: This paper demonstrates the concept of reusing discarded smartphones to connect the end-of-life of e-waste with the start-of-life of smart buildings. Two control-related and one communication-related case studies have been conducted experimentally to evaluate applicability. Diverse controlled systems, control tasks, and algorithms have been considered. In addition, the sufficiency of communication with external agents has been quantified. The proof-of-concept experiments indicate technical feasibility and applicability to typical tasks with satisfactory performance. As smartphones improve over time, higher computing performance and lower communication latency can be expected, enhancing the prospect of the proposed reuse concept.

Submitted 8 September, 2023; originally announced September 2023.

Journal ref: J. Phys.: Conf. Ser. 2600 192011 (2023)

arXiv:2307.08208 [pdf, other]

Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound

Authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Stefanos Koffas, Yiming Li

Abstract: Deep neural networks (DNNs) have been widely and successfully adopted and deployed in various applications of speech recognition. Recently, a few works revealed that these models are vulnerable to backdoor attacks, where the adversaries can implant malicious prediction behaviors into victim models by poisoning their training process. In this paper, we revisit poison-only backdoor attacks against speech recognition. We reveal that existing methods are not stealthy since their trigger patterns are perceptible to humans or machine detection. This limitation is mostly because their trigger patterns are simple noises or separable and distinctive clips. Motivated by these findings, we propose to exploit elements of sound (e.g., pitch and timbre) to design more stealthy yet effective poison-only backdoor attacks. Specifically, we insert a short-duration high-pitched signal as the trigger and increase the pitch of remaining audio clips to 'mask' it for designing stealthy pitch-based triggers. We manipulate timbre features of victim audio to design the stealthy timbre-based attack and design a voiceprint selection module to facilitate the multi-backdoor attack. Our attacks can generate more 'natural' poisoned samples and therefore are more stealthy. Extensive experiments are conducted on benchmark datasets, which verify the effectiveness of our attacks under different settings (e.g., all-to-one, all-to-all, clean-label, physical, and multi-backdoor settings) and their stealthiness. The code for reproducing the main experiments is available at https://github.com/HanboCai/BadSpeech_SoE.

Submitted 16 July, 2023; originally announced July 2023.

Comments: 13 pages

arXiv:2307.02514 [pdf, other]

Exploring Multimodal Approaches for Alzheimer's Disease Detection Using Patient Speech Transcript and Audio Data

Authors: Hongmin Cai, Xiaoke Huang, Zhengliang Liu, Wenxiong Liao, Haixing Dai, Zihao Wu, Dajiang Zhu, Hui Ren, Quanzheng Li, Tianming Liu, Xiang Li

Abstract: Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcript data from the DementiaBank Pitt database. The proposed approach involves pre-trained language models and a Graph Neural Network (GNN) that constructs a graph from the speech transcript and extracts features for AD detection. Data augmentation techniques, including synonym replacement, a GPT-based augmenter, and so on, were used to address the small dataset size. Audio data was also introduced, and the WavLM model was used to extract audio features. These features were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and using it for contrastive learning with the original audio. We conducted intensive experiments and analysis on the above methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2305.10983 [pdf, other]

Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment

Authors: Tianhe Wu, Shuwei Shi, Haoming Cai, Mingdeng Cao, Jing Xiao, Yinqiang Zheng, Yujiu Yang

Abstract: Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs) without relying on pristine-quality image information. It is becoming more significant with the increasing advancement of virtual reality (VR) technology. However, the quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process. To tackle this issue, we propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure. Specifically, we propose a generalized Recursive Probability Sampling (RPS) method for the BOIQA task, combining content and detail information to generate multiple pseudo-viewport sequences from a given starting point. Additionally, we design a Multi-scale Feature Aggregation (MFA) module with a Distortion-aware Block (DAB) to fuse distorted and semantic features of each viewport. We also devise a Temporal Modeling Module (TMM) to learn the viewport transition in the temporal domain. Extensive experimental results demonstrate that Assessor360 outperforms state-of-the-art methods on multiple OIQA datasets. The code and models are available at https://github.com/TianheWu/Assessor360.

Submitted 10 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2304.14089 [pdf, other]

Distributed Multi-Horizon Model Predictive Control for Network of Energy Hubs

Authors: Varsha Behrunani, Hanmin Cai, Philipp Heer, Roy S. Smith, John Lygeros

Abstract: The increasing penetration of renewable energy resources has transformed the energy system from a traditional hierarchical energy delivery paradigm to a distributed structure. Such development is accompanied by continuous liberalization in the energy sector, giving rise to possible energy trading among networked local energy hubs. Joint operation of such hubs can improve energy efficiency and support the integration of renewable energy resources. Acknowledging peer-to-peer trading between hubs, their optimal operation within the network can maximize consumption of locally produced energy. However, for such complex systems involving multiple stakeholders, both computational tractability and privacy concerns need to be accounted for. We investigate both decentralized and centralized model predictive control (MPC) approaches for a network of energy hubs. While the centralized control strategy offers superior performance to the decentralized method, its implementation is computationally prohibitive and raises privacy concerns, as the information of each hub has to be shared extensively. On the other hand, a classical decentralized control approach can ease the implementation at the expense of sub-optimal performance of the overall system. In this work, a distributed scheme based on a consensus alternating direction method of multipliers (ADMM) algorithm is proposed. It combines the performance of the centralized approach with the privacy preservation of the decentralized approach. A novel multi-horizon MPC framework is also introduced to increase the prediction horizon without compromising the time discretization or making the problem computationally intractable. A benchmark three-hub network is used to compare the performance of the mentioned methods. The results show superior performance in terms of total cost, computational time, and robustness to demand and price variations.

Submitted 27 April, 2023; originally announced April 2023.

Comments: 14 pages, 11 figures

arXiv:2304.08708 [pdf]

A Voice Disease Detection Method Based on MFCCs and Shallow CNN

Authors: Hao Cai, Can Li, Fei Ding

Abstract: The incidence rate of voice diseases is increasing year by year. The use of software for remote diagnosis is a technical development trend and has important practical value. Among voice diseases, common diseases that cause hoarseness include spasmodic dysphonia, vocal cord paralysis, vocal nodule, and vocal cord polyp. This paper presents a voice disease detection method that can be applied in a wide range of clinical settings. We cooperated with Xiangya Hospital of Central South University to collect voice samples from sixty-one different patients. The Mel Frequency Cepstrum Coefficient (MFCC) parameters are extracted as input features to describe the voice in the form of data. An innovative model combining MFCC parameters and a single-convolution-layer CNN is proposed for fast calculation and classification. The highest accuracy we achieved was 92%, which is well ahead of the original research results and internationally advanced. We also use the Advanced Voice Function Assessment Databases (AVFAD) to evaluate the generalization ability of the proposed method, which achieved an accuracy rate of 98%. Experiments on clinical and standard datasets show that for the pathological detection of voice diseases, our method greatly improves accuracy and computational efficiency.

Submitted 17 April, 2023; originally announced April 2023.

arXiv:2212.10103 [pdf, ps, other]

VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion

Authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Shunhui Ji

Abstract: Keyword spotting (KWS) based on deep neural networks (DNNs) has achieved massive success in voice control scenarios. However, training of such DNN-based KWS systems often requires significant data and hardware resources. Manufacturers often entrust this process to a third-party platform. This makes the training process uncontrollable, where attackers can implant backdoors in the model by manipulating third-party training data. An effective backdoor attack can force the model to make specified judgments under certain conditions, i.e., triggers. In this paper, we design a backdoor attack scheme based on Voiceprint Selection and Voice Conversion, abbreviated as VSVC. Experimental results demonstrate that VSVC can achieve an average attack success rate close to 97% in four victim models when poisoning less than 1% of the training data.

Submitted 20 December, 2022; originally announced December 2022.

Comments: 7 pages, 5 figures

arXiv:2211.08697 [pdf, ps, other]

PBSM: Backdoor attack against Keyword spotting based on pitch boosting and sound masking

Authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Shunhui Ji

Abstract: Keyword spotting (KWS) has been widely used in various speech control scenarios. The training of KWS is usually based on deep neural networks and requires a large amount of data. Manufacturers often use third-party data to train KWS. However, deep neural networks are not sufficiently interpretable to manufacturers, and attackers can manipulate third-party training data to plant backdoors during the model training. An effective backdoor attack can force the model to make specified judgments under certain conditions, i.e., triggers. In this paper, we design a backdoor attack scheme based on Pitch Boosting and Sound Masking for KWS, called PBSM. Experimental results demonstrate that PBSM can achieve an average attack success rate close to 90% in three victim models when poisoning less than 1% of the training data.

Submitted 16 November, 2022; originally announced November 2022.

Comments: 5 pages, 4 figures

arXiv:2211.05256 [pdf, other]

Power Efficient Video Super-Resolution on Mobile NPUs with Deep Learning, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Cheng-Ming Chiang, Hsien-Kai Kuo, Yu-Syuan Xu, Man-Yu Lee, Allen Lu, Chia-Ming Cheng, Chih-Cheng Chen, Jia-Ying Yong, Hong-Han Shuai, Wen-Huang Cheng, Zhuang Jia, Tianyu Xu, Yijian Zhang, Long Bao, Heng Sun, Diankai Zhang, Si Gao, Shaoli Liu, Biao Wu, Xiaofeng Zhang, Chengjian Zheng, Kaidi Lu, Ning Wang, et al. (29 additional authors not shown)

Abstract: Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and ask the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models were evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 500 FPS and a power consumption of 0.2 [Watt / 30 FPS]. A detailed description of all models developed in the challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2105.08826, arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.03885

arXiv:2210.05960 [pdf, other]

Efficient Image Super-Resolution using Vast-Receptive-Field Attention

Authors: Lin Zhou, Haoming Cai, Jinjin Gu, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Yu Qiao, Chao Dong

Abstract: The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks. In this work, we design an efficient SR network by improving the attention mechanism. We start from a simple pixel attention module and gradually modify it to achieve better super-resolution performance with reduced parameters. The specific approaches include: (1) increasing the receptive field of the attention branch, (2) replacing large dense convolution kernels with depth-wise separable convolutions, and (3) introducing pixel normalization. These approaches paint a clear evolutionary roadmap for the design of attention mechanisms. Based on these observations, we propose VapSR, the VAst-receptive-field Pixel attention network. Experiments demonstrate the superior performance of VapSR. VapSR outperforms the present lightweight networks with even fewer parameters. The light version of VapSR uses only 21.68% and 28.18% of the parameters of IMDB and RFDN, respectively, to achieve performance similar to those networks. The code and models are available at https://github.com/zhoumumu/VapSR.

Submitted 12 October, 2022; originally announced October 2022.

arXiv:2210.04198 [pdf, other]

Super-Resolution by Predicting Offsets: An Ultra-Efficient Super-Resolution Network for Rasterized Images

Authors: Jinjin Gu, Haoming Cai, Chenyu Dong, Ruofan Zhang, Yulun Zhang, Wenming Yang, Chun Yuan

Abstract: Rendering high-resolution (HR) graphics brings substantial computational costs. Efficient graphics super-resolution (SR) methods may achieve HR rendering with small computing resources and have attracted extensive interest in the industry and research communities. We present a new method for real-time SR for computer graphics, namely Super-Resolution by Predicting Offsets (SRPO). Our algorithm divides the image into two parts for processing, i.e., sharp edges and flatter areas. For edges, different from the previous SR methods that take the anti-aliased images as inputs, our proposed SRPO takes advantage of the characteristics of rasterized images to conduct SR on the rasterized images. To complement the residual between HR and low-resolution (LR) rasterized images, we train an ultra-efficient network to predict the offset maps to move the appropriate surrounding pixels to the new positions. For flat areas, we found simple interpolation methods can already generate reasonable output. We finally use a guided fusion operation to integrate the sharp edges generated by the network and flat areas by the interpolation method to get the final SR image. The proposed network only contains 8,434 parameters and can be accelerated by network quantization. Extensive experiments show that the proposed SRPO can achieve superior visual effects at a smaller computational cost than the existing state-of-the-art methods.

Submitted 9 October, 2022; originally announced October 2022.

Comments: This article has been accepted by ECCV2022

arXiv:2209.12245 [pdf, other]

Decentralised possibilistic inference with applications to target tracking

Authors: Jeremie Houssineau, Han Cai, Murat Uney, Emmanuel Delande

Abstract: Fusing and sharing information from multiple sensors over a network is a challenging task. Part of this challenge arises from the absence of a foundational rule for fusing probability distributions, with various approaches stemming from different principles. Yet, when expressing tracking algorithms within the framework of possibility theory, one specific fusion rule can be proved to be exact in the sense that it is equivalent to the non-distributed possibilistic approach. In this article, this fusion rule is applied to decentralised fusion, based on the possibilistic analogue of the Bernoulli filter. We then show that the proposed approach outperforms its probabilistic counterpart on simulated data.

Submitted 11 August, 2023; v1 submitted 25 September, 2022; originally announced September 2022.

arXiv:2206.11695 [pdf, other]

NTIRE 2022 Challenge on Perceptual Image Quality Assessment

Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Radu Timofte

Abstract: This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement workshop (NTIRE) workshop at CVPR 2022. This challenge is held to address the emerging challenge of IQA by perceptual image processing algorithms. The output images of these algorithms have completely different characteristics from traditional distortions and are included in the PIPAL dataset used in this challenge. This challenge is divided into two tracks, a full-reference IQA track similar to the previous NTIRE IQA challenge and a new track that focuses on the no-reference IQA methods. The challenge has 192 and 179 registered participants for two tracks. In the final testing stage, 7 and 8 participating teams submitted their models and fact sheets. Almost all of them have achieved better results than existing IQA methods, and the winning method can demonstrate state-of-the-art performance.

Submitted 23 June, 2022; originally announced June 2022.

Comments: This report has been published in CVPR 2022 NTIRE workshop. arXiv admin note: text overlap with arXiv:2105.03072

arXiv:2205.05996 [pdf, other]

Blueprint Separable Residual Network for Efficient Image Super-Resolution

Authors: Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Jinjin Gu, Yu Qiao, Chao Dong

Abstract: Recent advances in single image super-resolution (SISR) have achieved extraordinary performance, but the computational cost is too heavy to apply in edge devices. To alleviate this problem, many novel and effective solutions have been proposed. Convolutional neural network (CNN) with the attention mechanism has attracted increasing attention due to its efficiency and effectiveness. However, there is still redundancy in the convolution operation. In this paper, we propose Blueprint Separable Residual Network (BSRN) containing two efficient designs. One is the usage of blueprint separable convolution (BSConv), which takes place of the redundant convolution operation. The other is to enhance the model ability by introducing more effective attention modules. The experimental results show that BSRN achieves state-of-the-art performance among existing efficient SR methods. Moreover, a smaller variant of our model BSRN-S won the first place in model complexity track of NTIRE 2022 Efficient SR Challenge. The code is available at https://github.com/xiaom233/BSRN.

Submitted 12 May, 2022; originally announced May 2022.

Comments: Accepted to CVPR Workshops

arXiv:2205.05675 [pdf, other]

NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results

Authors: Yawei Li, Kai Zhang, Radu Timofte, Luc Van Gool, Fangyuan Kong, Mingxi Li, Songwei Liu, Zongcai Du, Ding Liu, Chenhui Zhou, Jingyi Chen, Qingrui Han, Zheyuan Li, Yingqi Liu, Xiangyu Chen, Haoming Cai, Yu Qiao, Chao Dong, Long Sun, Jinshan Pan, Yi Zhu, Zhikai Zong, Xiaoxiao Liu, Zheng Hui, Tao Yang, et al. (86 additional authors not shown)

Abstract: This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00dB on DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The rank of the teams were determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered. And the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. They gauge the state-of-the-art in efficient single image super-resolution.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Validation code of the baseline model is available at https://github.com/ofsoundof/IMDN. Validation of all submitted models is available at https://github.com/ofsoundof/NTIRE2022_ESR

arXiv:2203.07659 [pdf]

Breast Cancer Molecular Subtypes Prediction on Pathological Images with Discriminative Patch Selecting and Multi-Instance Learning

Authors: Hong Liu, Wen-Dong Xu, Zi-Hao Shang, Xiang-Dong Wang, Hai-Yan Zhou, Ke-Wen Ma, Huan Zhou, Jia-Lin Qi, Jia-Rui Jiang, Li-Lan Tan, Hui-Min Zeng, Hui-Juan Cai, Kuan-Song Wang, Yue-Liang Qian

Abstract: Molecular subtypes of breast cancer are important references to personalized clinical treatment. For cost and labor savings, only one of the patient's paraffin blocks is usually selected for subsequent immunohistochemistry (IHC) to obtain molecular subtypes. Inevitable sampling error is risky due to tumor heterogeneity and could result in a delay in treatment. Molecular subtype prediction from conventional H&E pathological whole slide images (WSI) using AI method is useful and critical to assist pathologists pre-screen proper paraffin block for IHC. It's a challenging task since only WSI level labels of molecular subtypes can be obtained from IHC. Gigapixel WSIs are divided into a huge number of patches to be computationally feasible for deep learning. While with coarse slide-level labels, patch-based methods may suffer from abundant noise patches, such as folds, overstained regions, or non-tumor tissues. A weakly supervised learning framework based on discriminative patch selecting and multi-instance learning was proposed for breast cancer molecular subtype prediction from H&E WSIs. Firstly, co-teaching strategy was adopted to learn molecular subtype representations and filter out noise patches. Then, a balanced sampling strategy was used to handle the imbalance in subtypes in the dataset. In addition, a noise patch filtering algorithm that used local outlier factor based on cluster centers was proposed to further select discriminative patches. Finally, a loss function integrating patch with slide constraint information was used to finetune MIL framework on obtained discriminative patches and further improve the performance of molecular subtyping. The experimental results confirmed the effectiveness of the proposed method and our models outperformed even senior pathologists, with potential to assist pathologists to pre-screen paraffin blocks for IHC in clinic.

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2110.12831 [pdf, other]

Experimental implementation of an emission-aware prosumer with online flexibility quantification and provision

Authors: Hanmin Cai, Philipp Heer

Abstract: Active building energy management holds potential to reduce global energy-related emissions and support flexible operations of future low-carbon systems. This requires to integrate diverse objectives and engage multiple stakeholders. However, there remains a gap in comprehensive field insights into emission reduction, flexibility provision, and user impacts. This study examined how a real occupied building, with all its energy assets, could function as an emission-aware flexible prosumer. An existing building energy management system was enhanced by integrating a model predictive control strategy. The enhanced setup minimized the equivalent carbon emission due to electricity imports and provided flexibility to the energy system. The experimental results indicated an emission reduction of 12.5% compared to a rule-based controller that maximized PV self-consumption. In addition, a minimal flexibility provision experiment was demonstrated with a locally emulated distribution system operator. The results suggested that flexibility was provided without the risk of rebound effects. This is due to the flexibility envelope that was self-reported in advance. The study concluded by highlighting technical challenges in realizing emission reduction and flexibility in practice.

Submitted 24 October, 2023; v1 submitted 25 October, 2021; originally announced October 2021.

arXiv:2110.12796 [pdf, other]

doi 10.1109/PowerTech55446.2023.10202703

Data-Driven Demand-Side Flexibility Quantification: Prediction and Approximation of Flexibility Envelopes

Authors: Nami Hekmat, Hanmin Cai, Thierry Zufferey, Gabriela Hug, Philipp Heer

Abstract: Real-time quantification of residential building energy flexibility is needed to enable a cost-efficient operation of active distribution grids. A promising means is to use the so-called flexibility envelope concept to represent the time-dependent and inter-temporally coupled flexibility potential. However, existing optimization-based quantification entails high computational burdens limiting flexibility utilization in real-time applications, and a more computationally efficient quantification approach is desired. Additionally, the communication of a flexibility envelope to system operators in its original form is data-intensive. In order to address the computational burdens, this paper first trains several machine learning models based on historical quantification results for online use. Subsequently, probability distribution functions are proposed to approximate the flexibility envelopes with significantly fewer parameters, which can be communicated to system operators instead of the original flexibility envelope. The results show that the most promising prediction and approximation approaches allow for a minimum reduction of the computational burden by a factor of 9 and of the communication load by a factor of 6.6, respectively.

Submitted 5 May, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

arXiv:2108.10448 [pdf, other]

Fast Robust Tensor Principal Component Analysis via Fiber CUR Decomposition

Authors: HanQin Cai, Zehan Chao, Longxiu Huang, Deanna Needell

Abstract: We study the problem of tensor robust principal component analysis (TRPCA), which aims to separate an underlying low-multilinear-rank tensor and a sparse outlier tensor from their sum. In this work, we propose a fast non-convex algorithm, coined Robust Tensor CUR (RTCUR), for large-scale TRPCA problems. RTCUR considers a framework of alternating projections and utilizes the recently developed tensor Fiber CUR decomposition to dramatically lower the computational complexity. The performance advantage of RTCUR is empirically verified against the state-of-the-arts on the synthetic datasets and is further demonstrated on the real-world application such as color video background subtraction.

Submitted 23 August, 2021; originally announced August 2021.

Comments: Accepted to Workshop on Robust Subspace Learning and Applications in Computer Vision, International Conference on Computer Vision (ICCV) 2021

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pages 189-197, 2021

arXiv:2106.10623 [pdf, other]

A Scalable 256-Elements E-Band Phased-Array Transceiver for Broadband Communication

Authors: Xu Li, Wenyao Zhai, Morris Repeta, Hua Cai, Tyler Ross, Kimia Ansari, Sam Tiller, Hari Krishna Pothula, Dong Liang, Fan Yang, Yibo Lyu, Songlin Shuai, Guangjian Wang, Wen Tong

Abstract: For E-band wireless communications, a high gain steerable antenna with sub-arrays is desired to reduce the implementation complexity. This paper presents an E-band communication link with 256-elements antennas based on 8-elements sub-arrays and four beam-forming chips in silicon germanium (SiGe) bipolar complementary metal-oxide-semiconductor (BiCMOS), which is packaged on a 19-layer low temperature co-fired ceramic (LTCC) substrate. After the design and manufacture of the 256-elements antenna, a fast near-field calibration method is proposed for calibration, where a single near-field measurement is required. Then near-field to far-field (NFFF) transform and far-field to near-field (FFNF) transform are used for the bore-sight calibration. The comparison with high frequency structure simulator (HFSS) is utilized for the non-bore-sight calibration. Verified on the 256-elements antenna, the beam-forming performance measured in the chamber is in good agreement with the simulations. The communication in the office environment is also realized using a fifth generation (5G) new radio (NR) system, whose bandwidth is 400 megahertz (MHz) and waveform format is orthogonal frequency division multiplexing (OFDM) with 120 kilohertz (kHz) sub-carrier spacing.

Submitted 20 June, 2021; originally announced June 2021.

arXiv:2105.03085 [pdf, other]

Toward Interactive Modulation for Photo-Realistic Image Restoration

Authors: Haoming Cai, Jingwen He, Qiao Yu, Chao Dong

Abstract: Modulating image restoration level aims to generate a restored image by altering a factor that represents the restoration strength. Previous works mainly focused on optimizing the mean squared reconstruction error, which brings high reconstruction accuracy but lacks finer texture details. This paper presents a Controllable Unet Generative Adversarial Network (CUGAN) to generate high-frequency textures in the modulation tasks. CUGAN consists of two modules -- base networks and condition networks. The base networks comprise a generator and a discriminator. In the generator, we realize the interactive control of restoration levels by tuning the weights of different features from different scales in the Unet architecture. Moreover, we adaptively modulate the intermediate features in the discriminator according to the severity of degradations. The condition networks accept the condition vector (encoded degradation information) as input, then generate modulation parameters for both the generator and the discriminator. During testing, users can control the output effects by tweaking the condition vector. We also provide a smooth transition between GAN and MSE effects by a simple transition method. Extensive experiments demonstrate that the proposed CUGAN achieves excellent performance on image restoration modulation tasks.

Submitted 7 May, 2021; originally announced May 2021.

arXiv:2105.03072 [pdf, other]

NTIRE 2021 Challenge on Perceptual Image Quality Assessment

Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement workshop (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These output images have completely different characteristics from traditional distortions, thus pose a new challenge for IQA methods to evaluate their visual quality. In comparison with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores. Thus they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge has 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them have achieved much better results than existing IQA methods, while the winning method can demonstrate state-of-the-art performance.

Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

arXiv:2103.11037 [pdf, other]

Mode-wise Tensor Decompositions: Multi-dimensional Generalizations of CUR Decompositions

Authors: HanQin Cai, Keaton Hamm, Longxiu Huang, Deanna Needell

Abstract: Low rank tensor approximation is a fundamental tool in modern machine learning and data science. In this paper, we study the characterization, perturbation analysis, and an efficient sampling strategy for two primary tensor CUR approximations, namely Chidori and Fiber CUR. We characterize exact tensor CUR decompositions for low multilinear rank tensors. We also present theoretical error bounds of the tensor CUR approximations when (adversarial or Gaussian) noise appears. Moreover, we show that low cost uniform sampling is sufficient for tensor CUR approximations if the tensor has an incoherent structure. Empirical performance evaluations, with both synthetic and real-world datasets, establish the speed advantage of the tensor CUR approximations over other state-of-the-art low multilinear rank tensor approximations.

Submitted 25 June, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Journal ref: The Journal of Machine Learning Research 22.185 (2021): 1-36

arXiv:2101.05231 [pdf, other]

doi 10.1137/20M1388322

Robust CUR Decomposition: Theory and Imaging Applications

Authors: HanQin Cai, Keaton Hamm, Longxiu Huang, Deanna Needell

Abstract: This paper considers the use of Robust PCA in a CUR decomposition framework and applications thereof. Our main algorithms produce a robust version of column-row factorizations of matrices $\mathbf{D}=\mathbf{L}+\mathbf{S}$ where $\mathbf{L}$ is low-rank and $\mathbf{S}$ contains sparse outliers. These methods yield interpretable factorizations at low computational cost, and provide new CUR decompositions that are robust to sparse outliers, in contrast to previous methods. We consider two key imaging applications of Robust PCA: video foreground-background separation and face modeling. This paper examines the qualitative behavior of our Robust CUR decompositions on the benchmark videos and face datasets, and find that our method works as well as standard Robust PCA while being significantly faster. Additionally, we consider hybrid randomized and deterministic sampling methods which produce a compact CUR decomposition of a given matrix, and apply this to video sequences to produce canonical frames thereof.

Submitted 5 August, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

MSC Class: 15A23; 65F30; 68P20; 68W20; 68W25; 68Q25

Journal ref: SIAM Journal on Imaging Sciences 14.4 (2021): 1472-1503

arXiv:2012.04950 [pdf, other]

Distributed Dual Objective Control of A Flywheel Energy Storage Matrix System Under Jointly Connected Communication Network

Authors: Haiming Liu, Huanli Gao, Shuping Guo, He Cai

Abstract: This paper studies the distributed dual objective control problem of a heterogenous flywheel energy storage matrix system aiming at simultaneous reference power tracking and state-of-energy balancing. We first prove that the solution to this problem exists by showing the existence of a common state-of-energy trajectory for all the flywheel systems on which the dual control objectives can be achieved simultaneously. Next, based on this common state-of-energy trajectory, the distributed dual objective control problem is converted into a double layer distributed tracking problem, which is then solved by the adaptive distributed observer approach. Simulation results are provided to validate the effectiveness of the proposed control scheme.

Submitted 9 December, 2020; originally announced December 2020.

arXiv:2011.15002 [pdf, other]

Image Quality Assessment for Perceptual Image Restoration: A New Dataset, Benchmark and Metric

Authors: Jinjin Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, Chao Dong

Abstract: Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent perceptual IR algorithms based on generative adversarial networks (GANs) have brought in significant improvement on visual performance, but also pose great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. We present two questions: Can existing IQA methods objectively evaluate recent IR algorithms? With the focus on beating current benchmarks, are we getting better IR algorithms? To answer the questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called Perceptual Image Processing ALgorithms (PIPAL) dataset. Especially, this dataset includes the results of GAN-based IR algorithms, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores for PIPAL images using the more reliable Elo system. Based on PIPAL, we present new benchmarks for both IQA and SR methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. At last, we shed light on how to improve the IQA performance on GAN-based distortion. Inspired by the find that the existing IQA methods have an unsatisfactory performance on the GAN-based distortion partially because of their low tolerance to spatial misalignment, we propose to improve the performance of an IQA network on GAN-based distortion by explicitly considering this misalignment. We propose the Space Warping Difference Network, which includes the novel l_2 pooling layers and Space Warping Difference layers. Experiments demonstrate the effectiveness of the proposed method.

Submitted 30 November, 2020; originally announced November 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2007.12142

arXiv:2009.12798 [pdf, other]

AIM 2020: Scene Relighting and Illumination Estimation Challenge

Authors: Majed El Helou, Ruofan Zhou, Sabine Süsstrunk, Radu Timofte, Mahmoud Afifi, Michael S. Brown, Kele Xu, Hengxing Cai, Yuzhong Liu, Li-Wen Wang, Zhi-Song Liu, Chu-Tak Li, Sourya Dipta Das, Nisarg A. Shah, Akashdeep Jassal, Tongtong Zhao, Shanshan Zhao, Sabari Nathan, M. Parisa Beham, R. Suganya, Qing Wang, Zhongyun Hu, Xin Huang, Yaning Li, Maitreya Suin, et al. (12 additional authors not shown)

Abstract: We review the AIM 2020 challenge on virtual image relighting and illumination estimation. This paper presents the novel VIDIT dataset used in the challenge and the different proposed solutions and final evaluation results over the 3 challenge tracks. The first track considered one-to-one relighting; the objective was to relight an input photo of a scene with a different color temperature and illuminant orientation (i.e., light source position). The goal of the second track was to estimate illumination settings, namely the color temperature and orientation, from a given image. Lastly, the third track dealt with any-to-any relighting, thus a generalization of the first track. The target color temperature and orientation, rather than being pre-determined, are instead given by a guide image. Participants were allowed to make use of their track 1 and 2 solutions for track 3. The tracks had 94, 52, and 56 registered participants, respectively, leading to 20 confirmed submissions in the final competition stage.

Submitted 27 September, 2020; originally announced September 2020.

Comments: ECCVW 2020. Data and more information on https://github.com/majedelhelou/VIDIT

arXiv:2007.13533 [pdf]

Learning Common Harmonic Waves on Stiefel Manifold -- A New Mathematical Approach for Brain Network Analyses

Authors: Jiazhou Chen, Guoqiang Han, Hongmin Cai, Defu Yang, Paul J. Laurienti, Martin Styner, Guorong Wu, Alzheimer's Disease Neuroimaging Initiative ADNI

Abstract: Converging evidence shows that disease-relevant brain alterations do not appear in random brain locations, instead, its spatial pattern follows large scale brain networks. In this context, a powerful network analysis approach with a mathematical foundation is indispensable to understand the mechanism of neuropathological events spreading throughout the brain. Indeed, the topology of each brain network is governed by its native harmonic waves, which are a set of orthogonal bases derived from the Eigen-system of the underlying Laplacian matrix. To that end, we propose a novel connectome harmonic analysis framework to provide enhanced mathematical insights by detecting frequency-based alterations relevant to brain disorders. The backbone of our framework is a novel manifold algebra appropriate for inference across harmonic waves that overcomes the limitations of using classic Euclidean operations on irregular data structures. The individual harmonic difference is measured by a set of common harmonic waves learned from a population of individual Eigen systems, where each native Eigen-system is regarded as a sample drawn from the Stiefel manifold. Specifically, a manifold optimization scheme is tailored to find the common harmonic waves which reside at the center of Stiefel manifold. To that end, the common harmonic waves constitute the new neuro-biological bases to understand disease progression. Each harmonic wave exhibits a unique propagation pattern of neuro-pathological burdens spreading across brain networks. The statistical power of our novel connectome harmonic analysis approach is evaluated by identifying frequency-based alterations relevant to Alzheimer's disease, where our learning-based manifold approach discovers more significant and reproducible network dysfunction patterns compared to Euclidian methods.

Submitted 1 July, 2020; originally announced July 2020.

arXiv:2007.12142 [pdf, other]

PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration

Authors: Jinjin Gu, Haoming Cai, Haoyu Chen, Xiaoxing Ye, Jimmy Ren, Chao Dong

Abstract: Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent IR methods based on Generative Adversarial Networks (GANs) have achieved significant improvement in visual performance, but also presented great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. We then raise two questions: (1) Can existing IQA methods objectively evaluate recent IR algorithms? (2) When focusing on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called the Perceptual Image Processing Algorithms (PIPAL) dataset. In particular, this dataset includes the results of GAN-based methods, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores for PIPAL images using the more reliable "Elo system". Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. Finally, we improve the performance of IQA networks on GAN-based distortions by introducing anti-aliasing pooling. Experiments show the effectiveness of the proposed method.

Submitted 26 September, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: This paper has been accepted for publication at ECCV2020

arXiv:2007.11153 [pdf, ps, other]

Output Based Adaptive Distributed Output Observer for Leader-follower Multiagent Systems

Authors: He Cai, Jie Huang

Abstract: The adaptive distributed observer approach has been an effective tool for synthesizing a distributed control law for solving various control problems of leader-follower multiagent systems. However, the existing adaptive distributed observer needs to make use of the full state of the leader system. This assumption not only precludes many practical applications in which only the output of the leader system is available, but also leads to a high dimension observer. In this communique, we propose an adaptive distributed output observer which only makes use of the output of the leader system, and is thus more practical than the state based adaptive distributed observer. Moreover, the dimension and the information exchange among agents of the proposed adaptive distributed output observer can be significantly smaller than those of the state based adaptive distributed output observer.

Submitted 21 July, 2020; originally announced July 2020.

arXiv:1910.05859 [pdf, other]

doi 10.1109/TSP.2021.3049618

Accelerated Structured Alternating Projections for Robust Spectrally Sparse Signal Recovery

Authors: HanQin Cai, Jian-Feng Cai, Tianming Wang, Guojian Yin

Abstract: Consider a spectrally sparse signal $\boldsymbol{x}$ that consists of $r$ complex sinusoids with or without damping. We study the robust recovery problem for the spectrally sparse signal under the fully observed setting, which is about recovering $\boldsymbol{x}$ and a sparse corruption vector $\boldsymbol{s}$ from their sum $\boldsymbol{z}=\boldsymbol{x}+\boldsymbol{s}$. In this paper, we exploit the low-rank property of the Hankel matrix formed by $\boldsymbol{x}$, and formulate the problem as the robust recovery of a corrupted low-rank Hankel matrix. We develop a highly efficient non-convex algorithm, coined Accelerated Structured Alternating Projections (ASAP). The high computational efficiency and low space complexity of ASAP are achieved by fast computations involving structured matrices, and a subspace projection method for accelerated low-rank approximation. Theoretical recovery guarantee with a linear convergence rate has been established for ASAP, under some mild assumptions on $\boldsymbol{x}$ and $\boldsymbol{s}$. Empirical performance comparisons on both synthetic and real-world data confirm the advantages of ASAP, in terms of computational efficiency and robustness aspects.

Submitted 16 January, 2021; v1 submitted 13 October, 2019; originally announced October 2019.

Journal ref: IEEE Transactions on Signal Processing, 69 (2021): 809-821

arXiv:1909.03197 [pdf]

doi 10.1364/OL.45.000208

Fiber-optic joint time and frequency transfer with the same wavelength

Authors: Jialiang Wang, Chaolei Yue, Yueli Xi, Yanguang Sun, Nan Cheng, Fei Yang, Mingyu Jiang, Jianfeng Sun, Youzhen Gui, Haiwen Cai

Abstract: Optical fiber links have demonstrated their ability to transfer ultra-stable clock signals. In this paper we propose and demonstrate a new scheme to transfer both time and radio frequency with the same wavelength based on a coherent demodulation technique. The time signal is encoded as binary phase-shift keying (BPSK) onto the optical carrier using an electro-optic modulator (EOM) by phase modulation, which keeps the frequency signal free from interference with a single pulse. The phase changes caused by the fluctuations of the transfer links are actively cancelled at the local site by optical delay lines. A 1 GHz radio frequency and a one pulse per second (1PPS) time signal transmitted over 110 km fiber spools are obtained. The experimental results demonstrate frequency instabilities of 1.7E-14 at 1 s and 5.9E-17 at 10^4 s. Moreover, time interval transfer of the 1PPS signal reaches sub-ps stability after 1000 s. This scheme offers the advantage of reducing the channels used in the fiber network, and can keep the time and frequency signals independent of each other.

Submitted 7 September, 2019; originally announced September 2019.

arXiv:1812.11364 [pdf, other]

Adaptive Synchrosqueezing Transform with a Time-Varying Parameter for Non-stationary Signal Separation

Authors: Lin Li, Haiyan Cai, Qingtang Jiang

Abstract: The continuous wavelet transform (CWT) is a linear time-frequency representation and a powerful tool for analyzing non-stationary signals. The synchrosqueezing transform (SST) is a special type of the reassignment method which not only enhances the energy concentration of CWT in the time-frequency plane, but also separates the components of multicomponent signals. The "bump wavelet" and Morlet's wavelet are commonly used continuous wavelets for the wavelet-based SST. There is a parameter in these wavelets which controls the widths of the time-frequency localization window. In most literature on SST, this parameter is a fixed positive constant. In this paper, we consider the CWT with a time-varying parameter (called the adaptive CWT) and the corresponding SST (called the adaptive SST) for instantaneous frequency estimation and multicomponent signal separation. We also introduce the 2nd-order adaptive SST. We analyze the separation conditions for non-stationary multicomponent signals with the local approximation of linear frequency modulation mode. We derive well-separated conditions of a multicomponent signal based on the adaptive CWT. We propose methods to select the time-varying parameter so that the corresponding adaptive SSTs of the components of a multicomponent signal have sharp representations and are well-separated, and hence the components can be recovered more accurately. We provide comparative experimental results to demonstrate the efficiency and robustness of the proposed adaptive CWT and adaptive SST in separating components of multicomponent signals with fast-varying frequencies.

Submitted 26 September, 2019; v1 submitted 29 December, 2018; originally announced December 2018.

Comments: arXiv admin note: text overlap with arXiv:1812.11292

arXiv:1812.11292 [pdf, other]

Adaptive Short-time Fourier Transform and Synchrosqueezing Transform for Non-stationary Signal Separation

Authors: Lin Li, Haiyan Cai, Hongxia Han, Qingtang Jiang, Hongbing Ji

Abstract: The synchrosqueezing transform, a kind of reassignment method, aims to sharpen the time-frequency representation and to separate the components of a multicomponent non-stationary signal. In this paper, we consider the short-time Fourier transform (STFT) with a time-varying parameter, called the adaptive STFT. Based on the local approximation of linear frequency modulation mode, we analyze the well-separated condition of non-stationary multicomponent signals using the adaptive STFT with the Gaussian window function. We propose the STFT-based synchrosqueezing transform (FSST) with a time-varying parameter, named the adaptive FSST, to enhance the time-frequency concentration and resolution of a multicomponent signal, and to separate its components more accurately. In addition, we also propose the 2nd-order adaptive FSST to further improve the adaptive FSST for the non-stationary signals with fast-varying frequencies. Furthermore, we present a localized optimization algorithm based on our well-separated condition to estimate the time-varying parameter adaptively and automatically. Simulation results on synthetic signals and the bat echolocation signal are provided to demonstrate the effectiveness and robustness of the proposed method.

Submitted 26 September, 2019; v1 submitted 29 December, 2018; originally announced December 2018.

arXiv:1807.03216 [pdf, other]

Ballistocardiogram-based Authentication using Convolutional Neural Networks

Authors: Joshua Hebert, Brittany Lewis, Hang Cai, Krishna K. Venkatasubramanian, Matthew Provost, Kelly Charlebois

Abstract: The goal of this work is to demonstrate the use of the ballistocardiogram (BCG) signal, derived using head-mounted wearable devices, as a viable biometric for authentication. The BCG signal is the measure of a person's body acceleration as a result of the heart's ejection of blood. It is a characterization of the cardiac cycle and can be derived non-invasively from the measurement of subtle movements of a person's extremities. In this paper, we use several versions of the BCG signal, derived from accelerometer and gyroscope sensors on a Smart Eyewear (SEW) device, for authentication. The derived BCG signals are used to train a convolutional neural network (CNN) as an authentication model, which is personalized for each subject. We evaluate our authentication models using data from 12 subjects and show that our approach has an equal error rate (EER) of 3.5% immediately after training and 13% after about 2 months, in the worst case. We also explore the use of our authentication approach for people with motor disabilities. Our analysis using a separate dataset of 6 subjects with non-spastic cerebral palsy shows an EER of 11.2% immediately after training and 21.6% after about 2 months, in the worst case.

Submitted 28 June, 2018; originally announced July 2018.

Comments: 8 pages, 6 figures

MSC Class: 68U35

Showing 1–40 of 40 results for author: Cai, H