Search | arXiv e-print repository

doi 10.1364/OL.472155

Cost-effective photonic super-resolution millimeter-wave joint radar-communication system using self-coherent detection

Authors: Wenlin Bai, Peixuan Li, Xihua Zou, Ningyuan Zhong, Wei Pan, Lianshan Yan, Bin Luo

Abstract: A cost-effective millimeter-wave (MMW) joint radar-communication (JRC) system with super resolution is proposed and experimentally demonstrated, using optical heterodyne up-conversion and self-coherent detection down-conversion techniques. The key lies in the designed coherent dual-band constant-envelope linear frequency modulation-orthogonal frequency division multiplexing (LFM-OFDM) signal with opposite phase modulation indexes for the JRC system. Self-coherent detection, as a simple and low-cost means, is accordingly used for both de-chirping of the MMW radar signal and frequency down-conversion reception of the MMW communication signal, which circumvents costly high-speed mixers and MMW local oscillators and, more significantly, achieves real-time decomposition of radar and communication information. Furthermore, a super-resolution radar range profile is realized through coherent fusion processing of the dual-band JRC signal. In experiments, a dual-band LFM-OFDM JRC signal centered at 54 GHz and 61 GHz is generated. The two bands have an identical instantaneous bandwidth of 2 GHz and carry an OFDM signal of 1 GBaud, which helps to achieve a 6-Gbit/s data rate for communication and a 1.76-cm range resolution for radar.

Submitted 9 October, 2022; originally announced October 2022.

arXiv:2210.00713 [pdf, other]

Efficient Meta-Learning for Continual Learning with Taylor Expansion Approximation

Authors: Xiaohan Zou, Tong Lin

Abstract: Continual learning aims to alleviate catastrophic forgetting when handling consecutive tasks under non-stationary distributions. Gradient-based meta-learning algorithms have shown the capability to implicitly solve the transfer-interference trade-off problem between different examples. However, they still suffer from the catastrophic forgetting problem in the setting of continual learning, since the past data of previous tasks are no longer available. In this work, we propose a novel efficient meta-learning algorithm for solving the online continual learning problem, where the regularization terms and learning rates are adapted to the Taylor approximation of the parameter's importance to mitigate forgetting. The proposed method expresses the gradient of the meta-loss in closed form and thus avoids computing second-order derivatives, which is computationally prohibitive. We also use Proximal Gradient Descent to further improve computational efficiency and accuracy. Experiments on diverse benchmarks show that our method achieves better or on-par performance and much higher efficiency compared to the state-of-the-art approaches.

Submitted 3 October, 2022; originally announced October 2022.

Comments: Accepted by the 2022 International Joint Conference on Neural Networks (IJCNN 2022)

arXiv:2209.13822 [pdf, other]

TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval

Authors: Xiaohan Zou, Changqiao Wu, Lele Cheng, Zhongyuan Wang

Abstract: Most existing methods in vision-language retrieval match two modalities by either comparing their global feature vectors which misses sufficient information and lacks interpretability, detecting objects in images or videos and aligning the text with fine-grained features which relies on complicated model designs, or modeling fine-grained interaction via cross-attention upon visual and textual tokens which suffers from inferior efficiency. To address these limitations, some recent works simply aggregate the token-wise similarities to achieve fine-grained alignment, but they lack intuitive explanations as well as neglect the relationships between token-level features and global representations with high-level semantics. In this work, we rethink fine-grained cross-modal alignment and devise a new model-agnostic formulation for it. We additionally demystify the recent popular works and subsume them into our scheme. Furthermore, inspired by optimal transport theory, we introduce TokenFlow, an instantiation of the proposed scheme. By modifying only the similarity function, the performance of our method is comparable to the SoTA algorithms with heavy model designs on major video-text retrieval benchmarks. The visualization further indicates that TokenFlow successfully leverages the fine-grained information and achieves better interpretability.

Submitted 2 October, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

arXiv:2209.12254 [pdf, other]

From One to Many: Dynamic Cross Attention Networks for LiDAR and Camera Fusion

Authors: Rui Wan, Shuangjie Xu, Wei Wu, Xiaoyi Zou, Tongyi Cao

Abstract: LiDAR and cameras are two complementary sensors for 3D perception in autonomous driving. LiDAR point clouds have accurate spatial and geometry information, while RGB images provide textural and color data for context reasoning. To exploit LiDAR and cameras jointly, existing fusion methods tend to align each 3D point to only one projected image pixel based on calibration, namely one-to-one mapping. However, the performance of these approaches relies heavily on the calibration quality, which is sensitive to the temporal and spatial synchronization of sensors. Therefore, we propose a Dynamic Cross Attention (DCA) module with a novel one-to-many cross-modality mapping that learns multiple offsets from the initial projection towards the neighborhood and thus develops tolerance to calibration error. Moreover, a dynamic query enhancement is proposed to perceive the model-independent calibration, which further strengthens DCA's tolerance to the initial misalignment. The whole fusion architecture named Dynamic Cross Attention Network (DCAN) exploits multi-level image features and adapts to multiple representations of point clouds, which allows DCA to serve as a plug-in fusion module. Extensive experiments on nuScenes and KITTI prove DCA's effectiveness. The proposed DCAN outperforms state-of-the-art methods on the nuScenes detection challenge.

Submitted 25 September, 2022; originally announced September 2022.

arXiv:2209.10234 [pdf, other]

Cold-atom sources for the Matter-wave laser Interferometric Gravitation Antenna (MIGA)

Authors: Quentin Beaufils, Leonid A. Sidorenkov, Pierre Lebegue, Bertrand Venon, David Holleville, Laurent Volodimer, Michel Lours, Joseph Junca, Xinhao Zou, Andrea Bertoldi, Marco Prevedelli, Dylan O. Sabulsky, Philippe Bouyer, Arnaud Landragin, Benjamin Canuel, Remi Geiger

Abstract: The Matter-wave laser Interferometric Gravitation Antenna (MIGA) is an underground instrument using cold-atom interferometry to perform precision measurements of gravity gradients and strains. Following its installation at the low noise underground laboratory LSBB in the South-East of France, it will serve as a prototype for gravitational wave detectors with a horizontal baseline of 150 meters. Three spatially separated cold-atom interferometers will be driven by two common counter-propagating lasers to perform a measurement of the gravity gradient along this baseline. This article presents the cold-atom sources of MIGA, focusing on the design choices, the realization of the systems, the performances and the integration within the MIGA instrument.

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2209.07663 [pdf, other]

Monolith: Real Time Recommendation System With Collisionless Embedding Table

Authors: Zhuoran Liu, Leqi Zou, Xuan Zou, Caihua Wang, Biao Zhang, Da Tang, Bolin Zhu, Yijie Zhu, Peng Wu, Ke Wang, Youlong Cheng

Abstract: Building a scalable and real-time recommendation system is vital for many businesses driven by time-sensitive customer feedback, such as short-video ranking or online ads. Despite the ubiquitous adoption of production-scale deep learning frameworks like TensorFlow or PyTorch, these general-purpose frameworks fall short of business demands in recommendation scenarios for various reasons: on one hand, tweaking systems based on static parameters and dense computations for recommendation with dynamic and sparse features is detrimental to model quality; on the other hand, such frameworks are designed with the batch-training stage and serving stage completely separated, preventing the model from interacting with customer feedback in real time. These issues led us to reexamine traditional approaches and explore radically different design choices. In this paper, we present Monolith, a system tailored for online training. Our design has been driven by observations of our application workloads and production environment that reflect a marked departure from other recommendation systems. Our contributions are manifold: first, we crafted a collisionless embedding table with optimizations such as expirable embeddings and frequency filtering to reduce its memory footprint; second, we provide a production-ready online training architecture with high fault tolerance; finally, we proved that system reliability could be traded off for real-time learning. Monolith has successfully landed in the BytePlus Recommend product.

Submitted 27 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: ORSUM@ACM RecSys 2022

arXiv:2209.07300 [pdf]

Multi-Task Mixture Density Graph Neural Networks for Predicting Cu-based Single-Atom Alloy Catalysts for CO2 Reduction Reaction

Authors: Chen Liang, Bowen Wang, Shaogang Hao, Guangyong Chen, Pheng-Ann Heng, Xiaolong Zou

Abstract: Graph neural networks (GNNs) have drawn more and more attention from material scientists and demonstrated a high capacity to establish connections between the structure and properties. However, with only unrelaxed structures provided as input, few GNN models can predict the thermodynamic properties of relaxed configurations with an acceptable level of error. In this work, we develop a multi-task (MT) architecture based on DimeNet++ and mixture density networks to improve the performance of this task. Taking CO adsorption on Cu-based single-atom alloy catalysts as an illustration, we show that our method can reliably estimate CO adsorption energy with a mean absolute error of 0.087 eV from the initial CO adsorption structures without costly first-principles calculations. Further, compared to other state-of-the-art GNN methods, our model exhibits improved generalization ability when predicting catalytic performance of out-of-domain configurations, built with either unseen substrate surfaces or doping species. We show that the proposed MT GNN strategy can facilitate catalyst discovery.

Submitted 15 September, 2022; originally announced September 2022.

Comments: 22 pages, 3 figures, 2 tables

arXiv:2209.05054 [pdf, other]

doi 10.1145/3503161.3547880

High-Fidelity Variable-Rate Image Compression via Invertible Activation Transformation

Authors: Shilv Cai, Zhijun Zhang, Liqun Chen, Luxin Yan, Sheng Zhong, Xu Zou

Abstract: Learning-based methods have effectively promoted the community of image compression. Meanwhile, variational autoencoder (VAE) based variable-rate approaches have recently gained much attention to avoid the usage of a set of different networks for various compression rates. Despite the remarkable performance that has been achieved, these approaches are readily corrupted once multiple compression/decompression operations are executed, so that image quality drops sharply and strong artifacts appear. Thus, we try to tackle the issue of high-fidelity fine variable-rate image compression and propose the Invertible Activation Transformation (IAT) module. We implement the IAT in a mathematically invertible manner on a single-rate Invertible Neural Network (INN) based model, and the quality level (QLevel) is fed into the IAT to generate scaling and bias tensors. IAT and QLevel together give the image compression model the ability of fine variable-rate control while better maintaining the image fidelity. Extensive experiments demonstrate that the single-rate image compression model equipped with our IAT module has the ability to achieve variable-rate control without any compromise. Our IAT-embedded model also obtains rate-distortion performance comparable to recent learning-based image compression methods. Furthermore, our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.

Submitted 12 September, 2022; originally announced September 2022.

Comments: Accepted to ACM MM 2022

MSC Class: 68P30 ACM Class: I.4.2

arXiv:2209.01148 [pdf, ps, other]

ARST: Auto-Regressive Surgical Transformer for Phase Recognition from Laparoscopic Videos

Authors: Xiaoyang Zou, Wenyong Liu, Junchen Wang, Rong Tao, Guoyan Zheng

Abstract: Phase recognition plays an essential role for surgical workflow analysis in computer assisted intervention. Transformer, originally proposed for sequential data modeling in natural language processing, has been successfully applied to surgical phase recognition. Existing works based on transformer mainly focus on modeling attention dependency, without introducing auto-regression. In this work, an Auto-Regressive Surgical Transformer, referred to as ARST, is first proposed for on-line surgical phase recognition from laparoscopic videos, modeling the inter-phase correlation implicitly by conditional probability distribution. To reduce inference bias and to enhance phase consistency, we further develop a consistency constraint inference strategy based on auto-regression. We conduct comprehensive validations on a well-known public dataset Cholec80. Experimental results show that our method outperforms the state-of-the-art methods both quantitatively and qualitatively, and achieves an inference rate of 66 frames per second (fps).

Submitted 2 September, 2022; originally announced September 2022.

Comments: 11 Pages, 3 figures

arXiv:2208.13040 [pdf, other]

YOLOX-PAI: An Improved YOLOX, Stronger and Faster than YOLOv6

Authors: Ziheng Wu, Xinyi Zou, Wenmeng Zhou, Jun Huang

Abstract: We develop an all-in-one computer vision toolbox named EasyCV to facilitate the use of various SOTA computer vision methods. Recently, we added YOLOX-PAI, an improved version of YOLOX, to EasyCV. We conduct ablation studies to investigate the influence of some detection methods on YOLOX. We also provide easy access to PAI-Blade, which is used to accelerate the inference process based on BladeDISC and TensorRT. Finally, we achieve 42.8 mAP on the COCO dataset within 1.0 ms on a single NVIDIA V100 GPU, which is a bit faster than YOLOv6. A simple but efficient predictor API is also designed in EasyCV to conduct end-to-end object detection. Codes and models are now available at: https://github.com/alibaba/EasyCV.

Submitted 26 September, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

Comments: 5 pages, 5 figures

arXiv:2208.11184 [pdf, other]

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

Authors: Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, et al. (28 additional authors not shown)

Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, 12 teams and 2 teams submitted final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.

Submitted 25 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Camera-ready version

arXiv:2208.08074 [pdf]

Nanoscale three-dimensional magnetic sensing with a probabilistic nanomagnet driven by spin-orbit torque

Authors: Shuai Zhang, Shihao Li, Zhe Guo, Yan Xu, Ruofan Li, Zhenjiang Chen, Song Min, Xiaofei Yang, Liang Li, Jeongmin Hong, Xuecheng Zou, Long You

Abstract: Detection of vector magnetic fields at nanoscale dimensions is critical in applications ranging from basic material science to medical diagnostics. Meanwhile, an all-electric operation is of great significance for achieving a simple and compact sensing system. Here, we propose and experimentally demonstrate a simple approach to sensing a vector magnetic field at nanoscale dimensions, by monitoring a probabilistic nanomagnet's transition probability from a metastable state, excited by a driving current due to spin-orbit torque (SOT), to a settled state. We achieve sensitivities for Hx, Hy, and Hz of 1.02%/Oe, 1.09%/Oe and 3.43%/Oe, respectively, with a 200 x 200 nm^2 nanomagnet. The minimum detectable field depends on the number of driving pulse events N, and is expected to be as low as 1 µT if N = 3 x 10^6.

Submitted 17 August, 2022; originally announced August 2022.

Comments: 15 pages, 4 figures

arXiv:2208.05772 [pdf, other]

KiPA22 Report: U-Net with Contour Regularization for Renal Structures Segmentation

Authors: Kangqing Ye, Peng Liu, Xiaoyang Zou, Qin Zhou, Guoyan Zheng

Abstract: Three-dimensional (3D) integrated renal structures (IRS) segmentation is important in clinical practice. With the advancement of deep learning techniques, many powerful frameworks focusing on medical image segmentation have been proposed. In this challenge, we utilized the nnU-Net framework, which is the state-of-the-art method for medical image segmentation. To reduce outlier predictions for the tumor label, we combine a contour regularization (CR) loss on the tumor label with Dice loss and cross-entropy loss.

Submitted 6 September, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2208.04318 [pdf, other]

Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution

Authors: Hongwei Li, Tao Dai, Yiming Li, Xueyi Zou, Shu-Tao Xia

Abstract: Image representation is critical for many visual tasks. Instead of representing images discretely with 2D arrays of pixels, a recent study, namely local implicit image function (LIIF), denotes images as a continuous function where pixel values are expanded by using the corresponding coordinates as inputs. Due to its continuous nature, LIIF can be adopted for arbitrary-scale image super-resolution tasks, resulting in a single effective and efficient model for various up-scaling factors. However, LIIF often suffers from structural distortions and ringing artifacts around edges, mostly because all pixels share the same model, thus ignoring the local properties of the image. In this paper, we propose a novel adaptive local image function (A-LIIF) to alleviate this problem. Specifically, our A-LIIF consists of two main components: an encoder and an expansion network. The former captures cross-scale image features, while the latter models the continuous up-scaling function by a weighted combination of multiple local implicit image functions. Accordingly, our A-LIIF can reconstruct the high-frequency textures and structures more accurately. Experiments on multiple benchmark datasets verify the effectiveness of our method. Our codes are available at https://github.com/LeeHW-THU/A-LIIF.

Submitted 7 August, 2022; originally announced August 2022.

Comments: This paper is accepted by ICIP 2022. 5 pages

arXiv:2208.01435 [pdf]

doi 10.1038/s41467-022-32204-4

Highly Efficient and Selective Extraction of Gold by Reduced Graphene Oxide

Authors: Fei Li, Jiuyi Zhu, Pengzhan Sun, Mingrui Zhang, Zhenqing Li, Dingxin Xu, Xinyu Gong, Xiaolong Zou, A. K. Geim, Yang Su, Hui-Ming Cheng

Abstract: Materials that are capable of extracting gold from complex sources, especially electronic waste (e-waste), with high efficiency are needed for gold resource sustainability and effective e-waste recycling. However, it remains challenging to achieve high extraction capacity for trace amounts of gold, and precise selectivity to gold over a wide range of complex co-existing elements. Here we report a reduced graphene oxide (rGO) material that has an ultrahigh extraction capacity for trace amounts of gold (1,850 mg/g and 1,180 mg/g to 10 ppm and 1 ppm gold). The excellent gold extraction behavior is attributed to the graphene areas and oxidized regions of rGO. The graphene areas spontaneously reduce gold ions to metallic gold, and the oxidized regions provide good dispersibility so that efficient adsorption and reduction of gold ions by the graphene area can be realized. The rGO is also highly selective to gold ions. By controlling the protonation process of the functional groups on the oxidized regions of rGO, it shows an exclusive gold extraction without adsorption of 14 co-existing elements seen in e-waste. These discoveries are further exploited in highly efficient, continuous gold recycling from e-waste with good scalability and economic viability, as exemplified by extracting gold from e-waste using an rGO membrane based flow-through process.

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2208.00639 [pdf, other]

Dress Well via Fashion Cognitive Learning

Authors: Kaicheng Pang, Xingxing Zou, Waikeung Wong

Abstract: Fashion compatibility models enable online retailers to easily obtain a large number of outfit compositions with good quality. However, effective fashion recommendation demands precise service for each customer with a deeper cognition of fashion. In this paper, we conduct the first study on fashion cognitive learning, which is fashion recommendations conditioned on personal physical information. To this end, we propose a Fashion Cognitive Network (FCN) to learn the relationships among visual-semantic embedding of outfit composition and appearance features of individuals. FCN contains two submodules, namely outfit encoder and Multi-label Graph Neural Network (ML-GCN). The outfit encoder uses a convolutional layer to encode an outfit into an outfit embedding. The latter module learns label classifiers via stacked GCN. We conducted extensive experiments on the newly collected O4U dataset, and the results provide strong qualitative and quantitative evidence that our framework outperforms alternative methods.

Submitted 1 August, 2022; originally announced August 2022.

arXiv:2207.07973 [pdf, other]

Learn-to-Decompose: Cascaded Decomposition Network for Cross-Domain Few-Shot Facial Expression Recognition

Authors: Xinyi Zou, Yan Yan, Jing-Hao Xue, Si Chen, Hanzi Wang

Abstract: Most existing compound facial expression recognition (FER) methods rely on large-scale labeled compound expression data for training. However, collecting such data is labor-intensive and time-consuming. In this paper, we address the compound FER task in the cross-domain few-shot learning (FSL) setting, which requires only a few samples of compound expressions in the target domain. Specifically, we propose a novel cascaded decomposition network (CDNet), which cascades several learn-to-decompose modules with shared parameters based on a sequential decomposition mechanism, to obtain a transferable feature space. To alleviate the overfitting problem caused by limited base classes in our task, a partial regularization strategy is designed to effectively exploit the best of both episodic training and batch training. By training across similar tasks on multiple basic expression datasets, CDNet learns the ability of learn-to-decompose that can be easily adapted to identify unseen compound expressions. Extensive experiments on both in-the-lab and in-the-wild compound expression datasets demonstrate the superiority of our proposed CDNet against several state-of-the-art FSL methods. Code is available at: https://github.com/zouxinyi0625/CDNet.

Submitted 16 July, 2022; originally announced July 2022.

Comments: 17 pages, 5 figures

arXiv:2207.00592 [pdf, other]

Dissecting Service Mesh Overheads

Authors: Xiangfeng Zhu, Guozhen She, Bowen Xue, Yu Zhang, Yongsu Zhang, Xuan Kelvin Zou, Xiongchun Duan, Peng He, Arvind Krishnamurthy, Matthew Lentz, Danyang Zhuo, Ratul Mahajan

Abstract: Service meshes play a central role in the modern application ecosystem by providing an easy and flexible way to connect different services that form a distributed application. However, because of the way they interpose on application traffic, they can substantially increase application latency and resource consumption. We develop a decompositional approach and a tool, called MeshInsight, to systematically characterize the overhead of service meshes and to help developers quantify overhead in deployment scenarios of interest. Using MeshInsight, we confirm that service meshes can have high overhead -- up to 185% higher latency and up to 92% more virtual CPU cores for our benchmark applications -- but the severity is intimately tied to how they are configured and the application workload. The primary contributors to overhead vary based on the configuration too. IPC (inter-process communication) and socket writes dominate when the service mesh operates as a TCP proxy, but protocol parsing dominates when it operates as an HTTP proxy. MeshInsight also enables us to study the end-to-end impact of optimizations to service meshes. We show that not all seemingly-promising optimizations lead to a notable overhead reduction in realistic settings.

Submitted 2 July, 2022; originally announced July 2022.

arXiv:2206.04221 [pdf, other]

Analyzing Folktales of Different Regions Using Topic Modeling and Clustering

Authors: Jacob Werzinsky, Zhiyan Zhong, Xuedan Zou

Abstract: This paper employs two major natural language processing techniques, topic modeling and clustering, to find patterns in folktales and reveal cultural relationships between regions. In particular, we used Latent Dirichlet Allocation and BERTopic to extract the recurring elements as well as K-means clustering to group folktales. Our paper tries to answer the question of what the similarities and differences between folktales are, and what they say about culture. Here we show that the common trends between folktales are family, food, traditional gender roles, mythological figures, and animals. Also, folktale topics differ based on geographical location, with folktales found in different regions having different animals and environments. We were not surprised to find that religious figures and animals are some of the common topics in all cultures. However, we were surprised that European and Asian folktales were often paired together. Our results demonstrate the prevalence of certain elements in cultures across the world. We anticipate our work to be a resource for future research on folktales and an example of using natural language processing to analyze documents in specific domains. Furthermore, since we only analyzed the documents based on their topics, more work could be done in analyzing the structure, sentiment, and characters of these folktales.

Submitted 8 June, 2022; originally announced June 2022.

Comments: 5 pages, 2 figures

arXiv:2205.14321 [pdf, other]

doi 10.1145/3477495.3531942

Automatic Expert Selection for Multi-Scenario and Multi-Task Search

Authors: Xinyu Zou, Zhi Hu, Yiming Zhao, Xuchu Ding, Zhongyi Liu, Chenliang Li, Aixin Sun

Abstract: Multi-scenario learning (MSL) enables a service provider to cater for users' fine-grained demands by separating services for different user sectors, e.g., by user's geographical region. Under each scenario there is a need to optimize multiple task-specific targets e.g., click through rate and conversion rate, known as multi-task learning (MTL). Recent solutions for MSL and MTL are mostly based on the multi-gate mixture-of-experts (MMoE) architecture. MMoE structure is typically static and its design requires domain-specific knowledge, making it less effective in handling both MSL and MTL. In this paper, we propose a novel Automatic Expert Selection framework for Multi-scenario and Multi-task search, named AESM^{2}. AESM^{2} integrates both MSL and MTL into a unified framework with an automatic structure learning. Specifically, AESM^{2} stacks multi-task layers over multi-scenario layers. This hierarchical design enables us to flexibly establish intrinsic connections between different scenarios, and at the same time also supports high-level feature extraction for different tasks. At each multi-scenario/multi-task layer, a novel expert selection algorithm is proposed to automatically identify scenario-/task-specific and shared experts for each input. Experiments over two real-world large-scale datasets demonstrate the effectiveness of AESM^{2} over a battery of strong baselines. Online A/B test also shows substantial performance gain on multiple metrics. Currently, AESM^{2} has been deployed online for serving major traffic.

Submitted 6 June, 2022; v1 submitted 27 May, 2022; originally announced May 2022.

Comments: Accepted by SIGIR 2022; 10 pages, 8 figures

arXiv:2205.10195 [pdf, other]

Unsupervised Flow-Aligned Sequence-to-Sequence Learning for Video Restoration

Authors: Jing Lin, Xiaowan Hu, Yuanhao Cai, Haoqian Wang, Youliang Yan, Xueyi Zou, Yulun Zhang, Luc Van Gool

Abstract: How to properly model the inter-frame relation within the video sequence is an important but unsolved challenge for video restoration (VR). In this work, we propose an unsupervised flow-aligned sequence-to-sequence model (S2SVR) to address this problem. On the one hand, the sequence-to-sequence model, which has proven capable of sequence modeling in the field of natural language processing, is explored for the first time in VR. Optimized serialization modeling shows potential in capturing long-range dependencies among frames. On the other hand, we equip the sequence-to-sequence model with an unsupervised optical flow estimator to maximize its potential. The flow estimator is trained with our proposed unsupervised distillation loss, which can alleviate the data discrepancy and inaccurate degraded optical flow issues of previous flow-based methods. With reliable optical flow, we can establish accurate correspondence among multiple frames, narrowing the domain difference between 1D language and 2D misaligned frames and improving the potential of the sequence-to-sequence model. S2SVR shows superior performance in multiple VR tasks, including video deblurring, video super-resolution, and compressed video quality enhancement. Code and models are publicly available at https://github.com/linjing7/VR-Baseline

Submitted 16 June, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: ICML 2022; The first sequence-to-sequence model for video restoration

arXiv:2205.08685 [pdf, other]

Policy Distillation with Selective Input Gradient Regularization for Efficient Interpretability

Authors: Jinwei Xing, Takashi Nagata, Xinyun Zou, Emre Neftci, Jeffrey L. Krichmar

Abstract: Although deep Reinforcement Learning (RL) has proven successful in a wide range of tasks, one challenge it faces is interpretability when applied to real-world problems. Saliency maps are frequently used to provide interpretability for deep neural networks. However, in the RL domain, existing saliency map approaches are either computationally expensive and thus cannot satisfy the real-time requirement of real-world scenarios or cannot produce interpretable saliency maps for RL policies. In this work, we propose an approach of Distillation with selective Input Gradient Regularization (DIGR) which uses policy distillation and input gradient regularization to produce new policies that achieve both high interpretability and computation efficiency in generating saliency maps. Our approach is also found to improve the robustness of RL policies to multiple adversarial attacks. We conduct experiments on three tasks, MiniGrid (Fetch Object), Atari (Breakout) and CARLA Autonomous Driving, to demonstrate the importance and effectiveness of our approach.

Submitted 17 May, 2022; originally announced May 2022.

arXiv:2205.05510 [pdf, ps, other]

Invariance entropy for uncertain control systems

Authors: Xingfu Zhong, Yu Huang, Xingfu Zou

Abstract: We introduce a notion of invariance entropy for uncertain control systems, which is, roughly speaking, the exponential growth rate of "branches" of "trees" that are formed by controls and are necessary to achieve invariance of controlled invariant subsets of the state space. This entropy extends the invariance entropy for deterministic control systems introduced by Colonius and Kawan (2009). We show that invariance feedback entropy, proposed by Tomar, Rungger, and Zamani (2020), is bounded from below by our invariance entropy. We generalize the formula for the calculation of entropy of invariant partitions obtained by Tomar, Kawan, and Zamani (2020) to quasi-invariant-partitions. Moreover, we also derive lower and upper bounds for entropy of a quasi-invariant-partition by spectral radii of its adjacency matrix and weighted adjacency matrix. With some reasonable assumptions, we obtain explicit formulas for computing invariance entropy for uncertain control systems and invariance feedback entropy for finite controlled invariant sets.

Submitted 11 May, 2022; originally announced May 2022.

MSC Class: 37B40; 93C55

arXiv:2204.13612 [pdf, other]

doi 10.1103/PhysRevB.106.064420

Surface critical properties of the three-dimensional clock model

Authors: Xuan Zou, Shuo Liu, Wenan Guo

Abstract: Using Monte Carlo simulations and finite-size scaling analysis, we show that the $q$-state clock model with $q=6$ on the simple cubic lattice with open surfaces has a rich phase diagram; in particular, it has an extraordinary-log phase, besides the ordinary and extraordinary transitions at the bulk critical point. We prove numerically that the presence of the intermediate extraordinary-log phase is due to the emergence of an O(2) symmetry in the surface state before the surface enters the $Z_{q}$ symmetry-breaking region as the surface coupling is increased at the bulk critical point, while O(2) symmetry emerges for the bulk. The critical behaviors of the extraordinary-log transition, as well as the ordinary and the special transition separating the ordinary and the extraordinary-log transition are obtained.

Submitted 28 April, 2022; originally announced April 2022.

Comments: 12 pages, 8 figures

Journal ref: Phys. Rev. B 106, 064420 (2022)

arXiv:2204.12137 [pdf, other]

A gravity antenna based on quantum technologies: MIGA

Authors: B. Canuel, X. Zou, D. O. Sabulsky, J. Junca, A. Bertoldi, Q. Beaufils, R. Geiger, A. Landragin, M. Prevedelli, S. Gaffet, D. Boyer, I. Lázaro Roche, P. Bouyer

Abstract: We report the realization of a large-scale gravity antenna based on matter-wave interferometry, the MIGA project. This experiment consists of an array of cold Rb sources correlated by a 150 m long optical cavity. MIGA is under construction at the LSBB underground laboratory, a site that benefits from low background noise and is an ideal location for carrying out precision gravity measurements. The MIGA facility will be a demonstrator for a new generation of GW detectors based on atom interferometry that could open the infrasound window for the observation of GWs. We describe here the status of the instrument construction, focusing on the infrastructure works at LSBB and the realization of the vacuum vessel of the antenna.

Submitted 26 April, 2022; originally announced April 2022.

Comments: Contribution to the 2022 Gravitation session of the 56th Rencontres de Moriond

arXiv:2204.09314 [pdf, other]

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

Authors: Ren Yang, Radu Timofte, Meisong Zheng, Qunliang Xing, Minglang Qiao, Mai Xu, Lai Jiang, Huaida Liu, Ying Chen, Youcheng Ben, Xiao Zhou, Chen Fu, Pei Cheng, Gang Yu, Junyi Li, Renlong Wu, Zhilu Zhang, Wei Shang, Zhengyao Lv, Yun** Chen, Mingcai Zhou, Dongwei Ren, Kai Zhang, Wangmeng Zuo, Pavel Ostyakov, et al. (54 additional authors not shown)

Abstract: This paper reviews the NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video. In this challenge, we proposed the LDV 2.0 dataset, which includes the LDV dataset (240 videos) and 95 additional videos. This challenge includes three tracks. Track 1 aims at enhancing the videos compressed by HEVC at a fixed QP. Track 2 and Track 3 target both the super-resolution and quality enhancement of HEVC compressed video. They require x2 and x4 super-resolution, respectively. In total, the three tracks attracted more than 600 registrations. In the test phase, 8 teams, 8 teams and 12 teams submitted the final results to Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution and quality enhancement of compressed video. The proposed LDV 2.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge (including open-sourced codes) is at https://github.com/RenYang-home/NTIRE22_VEnh_SR.

Submitted 25 April, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

arXiv:2204.07017 [pdf, other]

OneMax is not the Easiest Function for Fitness Improvements

Authors: Marc Kaufmann, Maxime Larcher, Johannes Lengler, Xun Zou

Abstract: We study the $(1:s+1)$ success rule for controlling the population size of the $(1,λ)$-EA. It was shown by Hevia Fajardo and Sudholt that this parameter control mechanism can run into problems for large $s$ if the fitness landscape is too easy. They conjectured that this problem is worst for the OneMax benchmark, since in some well-established sense OneMax is known to be the easiest fitness landscape. In this paper we disprove this conjecture and show that OneMax is not the easiest fitness landscape with respect to finding improving steps. As a consequence, we show that there exist $s$ and $\varepsilon$ such that the self-adjusting $(1,λ)$-EA with $(1:s+1)$-rule optimizes OneMax efficiently when started with $\varepsilon n$ zero-bits, but does not find the optimum in polynomial time on Dynamic BinVal. Hence, we show that there are landscapes where the problem of the $(1:s+1)$-rule for controlling the population size of the $(1, λ)$-EA is more severe than for OneMax.

Submitted 24 January, 2024; v1 submitted 14 April, 2022; originally announced April 2022.

MSC Class: 68W50

arXiv:2204.06240 [pdf, other]

CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU

Authors: Zangwei Zheng, Pengtai Xu, Xuan Zou, Da Tang, Zhen Li, Chenguang Xi, Peng Wu, Leqi Zou, Yijie Zhu, Ming Chen, Xiangzhuo Ding, Fuzhao Xue, Ziheng Qin, Youlong Cheng, Yang You

Abstract: The click-through rate (CTR) prediction task is to predict whether a user will click on the recommended item. As mind-boggling amounts of data are produced online daily, accelerating CTR prediction model training is critical to ensuring an up-to-date model and reducing the training cost. One approach to increase the training speed is to apply large batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from the loss of accuracy. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first theoretically show that different frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large batch size setting, we develop the adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scale to 128 times the original batch size without accuracy loss. In particular, for CTR prediction model DeepFM training on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code is available at https://github.com/bytedance/LargeBatchCTR.

Submitted 30 November, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

Comments: AAAI 2023

arXiv:2204.04911 [pdf, other]

Category-Aware Transformer Network for Better Human-Object Interaction Detection

Authors: Leizhen Dong, Zhimin Li, Kunlun Xu, Zhijun Zhang, Luxin Yan, Sheng Zhong, Xu Zou

Abstract: Human-Object Interaction (HOI) detection, which aims to localize a human and a relevant object while recognizing their interaction, is crucial for understanding a still image. Recently, transformer-based models have significantly advanced the progress of HOI detection. However, the capability of these models has not been fully explored since the Object Query of the model is always simply initialized as just zeros, which would affect the performance. In this paper, we study the issue of promoting transformer-based HOI detectors by initializing the Object Query with category-aware semantic information. To this end, we propose the Category-Aware Transformer Network (CATN). Specifically, the Object Query would be initialized via category priors represented by an external object detection model to yield better performance. Moreover, such category priors can be further used for enhancing the representation ability of features via the attention mechanism. We first verified our idea via the Oracle experiment by initializing the Object Query with the groundtruth category information. Extensive experiments were then conducted to show that an HOI detection model equipped with our idea outperforms the baseline by a large margin to achieve a new state-of-the-art result.

Submitted 9 May, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

Comments: Accepted by CVPR2022

arXiv:2204.02898 [pdf, other]

End-to-End Instance Edge Detection

Authors: Xueyan Zou, Haotian Liu, Yong Jae Lee

Abstract: Edge detection has long been an important problem in the field of computer vision. Previous works have explored category-agnostic or category-aware edge detection. In this paper, we explore edge detection in the context of object instances. Although object boundaries could be easily derived from segmentation masks, in practice, instance segmentation models are trained to maximize IoU to the ground-truth mask, which means that segmentation boundaries are not enforced to precisely align with ground-truth edge boundaries. Thus, the task of instance edge detection itself is different and critical. Since precise edge detection requires high resolution feature maps, we design a novel transformer architecture that efficiently combines a FPN and a transformer decoder to enable cross attention on multi-scale high resolution feature maps within a reasonable computation budget. Further, we propose a light weight dense prediction head that is applicable to both instance edge and mask detection. Finally, we use a penalty reduced focal loss to effectively train the model with point supervision on instance edges, which can reduce annotation costs. We demonstrate highly competitive instance edge detection performance compared to state-of-the-art baselines, and also show that the proposed task and loss are complementary to instance segmentation and object detection.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2204.00531 [pdf, other]

Self-adjusting Population Sizes for the $(1, λ)$-EA on Monotone Functions

Authors: Marc Kaufmann, Maxime Larcher, Johannes Lengler, Xun Zou

Abstract: We study the $(1,λ)$-EA with mutation rate $c/n$ for $c\le 1$, where the population size is adaptively controlled with the $(1:s+1)$-success rule. Recently, Hevia Fajardo and Sudholt have shown that this setup with $c=1$ is efficient on OneMax for $s<1$, but inefficient if $s \ge 18$. Surprisingly, the hardest part is not close to the optimum, but rather at linear distance. We show that this behavior is not specific to OneMax. If $s$ is small, then the algorithm is efficient on all monotone functions, and if $s$ is large, then it needs superpolynomial time on all monotone functions. In the former case, for $c<1$ we show a $O(n)$ upper bound for the number of generations and $O(n\log n)$ for the number of function evaluations, and for $c=1$ we show $O(n\log n)$ generations and $O(n^2\log\log n)$ evaluations. We also show formally that optimization is always fast, regardless of $s$, if the algorithm starts in proximity of the optimum. All results also hold in a dynamic environment where the fitness function changes in each generation.

Submitted 12 July, 2023; v1 submitted 1 April, 2022; originally announced April 2022.

MSC Class: 68W50

arXiv:2203.16916 [pdf]

Determination of Physical and Mechanical properties of Sugarcane Single-Bud Billet

Authors: Meimei Wang, Qingting Liu, Yinggang Ou, ** Zou

Abstract: Determining the physical and mechanical properties of sugarcane single-bud billets is a critical step in the mechanical structure design of a sugarcane planter. In this study, the TaiTang F66 cultivar sugarcane samples are analyzed. The moisture content of the billets is found to range from 63.78% to 77.72%, and the average density is 244.67 kg/m^3. The coefficient of restitution (CoR) of the samples is determined by conducting a drop test wherein the samples are dropped onto a steel plate from different heights. The static friction coefficient (SFC) of four types of samples is determined by the inclined plate method at two orientations. In addition, the rolling friction coefficient (RFC) is determined at three plate inclination angles and sample displacements. The experimental results show that with increasing drop height and moisture content, the billet-steel CoR decreases from 0.625 to 0.458, while the billet-billet CoR increases from 0.603 to 0.698. With an increase in contact area, the billet-steel SFC decreases from 0.515 to 0.377 and the billet-billet SFC decreases from 0.498 to 0.323. With increasing angle and sample displacement, the billet-steel RFC increases from 0.0315 to 0.2175 and the billet-billet RFC increases from 0.0203 to 0.1007. These parameters are useful in the design and optimization of sugarcane single-bud billet planters using EDEM simulation.

Submitted 31 March, 2022; originally announced March 2022.

Comments: 24 pages, 10 figures, 12 tables

arXiv:2203.14823 [pdf]

Reciprocal phase transition-enabled electro-optic modulation

Authors: Fang Zou, Lei Zou, Ye Tian, Yiming Zhang, Erwin Bente, Weigang Hou, Yu Liu, Siming Chen, Victoria Cao, Lei Guo, Songsui Li, Lianshan Yan, Wei Pan, Dusan Milosevic, Zizheng Cao, A. M. J. Koonen, Huiyun Liu, Xihua Zou

Abstract: Electro-optic (EO) modulation is a well-known and essential topic in the field of communications and sensing. Its ultrahigh efficiency is unprecedentedly desired in the current green and data era. However, dramatically increasing the modulation efficiency is difficult due to the monotonic mapping relationship between the electrical signal and modulated optical signal. Here, a new mechanism termed phase-transition EO modulation is revealed from the reciprocal transition between two distinct phase planes arising from the bifurcation. Remarkably, a monolithically integrated mode-locked laser (MLL) is implemented as a prototype. A 24.8-GHz radio-frequency signal is generated and modulated, achieving a modulation energy efficiency of 3.06 fJ/bit improved by about four orders of magnitude and a contrast ratio exceeding 50 dB. Thus, MLL-based phase-transition EO modulation is characterised by ultrahigh modulation efficiency and ultrahigh contrast ratio, as experimentally proved in radio-over-fibre and underwater acoustic-sensing systems. This phase-transition EO modulation opens a new avenue for green communication and ubiquitous connections.

Submitted 22 November, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

Comments: 27 pages, 14 figures

arXiv:2203.12430 [pdf, other]

Solving the Federated Edge Learning Participation Dilemma: A Truthful and Correlated Perspective

Authors: Qin Hu, Feng Li, Xukai Zou, Yinhao Xiao

Abstract: An emerging computational paradigm, named federated edge learning (FEL), enables intelligent computing at the network edge with the feature of preserving data privacy for edge devices. Given their constrained resources, it becomes a great challenge to achieve high execution performance for FEL. Most of the state-of-the-arts concentrate on enhancing FEL from the perspective of system operation procedures, taking few precautions during the composition step of the FEL system. Though a few recent studies recognize the importance of FEL formation and propose server-centric device selection schemes, the impact of data sizes is largely overlooked. In this paper, we take advantage of game theory to depict the decision dilemma among edge devices regarding whether to participate in FEL or not given their heterogeneous sizes of local datasets. For realizing both the individual and global optimization, the server is employed to solve the participation dilemma, which requires accurate information collection for devices' local datasets. Hence, we utilize mechanism design to enable truthful information solicitation. With the help of correlated equilibrium, we derive a decision making strategy for devices from the global perspective, which can achieve the long-term stability and efficacy of FEL. For scalability consideration, we optimize the computational complexity of the basic solution to the polynomial level. Lastly, extensive experiments based on both real and synthetic data are conducted to evaluate our proposed mechanisms, with experimental results demonstrating the performance advantages.

Submitted 12 February, 2022; originally announced March 2022.

arXiv:2203.12240 [pdf, other]

doi 10.1103/PhysRevLett.129.163201

Fano-like resonance due to interference with distant transitions

Authors: Y. -N. Lv, A. -W. Liu, Y. Tan, C. -L. Hu, T. -P. Hua, X. -B. Zou, Y. R. Sun, C. -L. Zou, G. -C. Guo, S. -M. Hu

Abstract: Narrow optical resonances of atoms or molecules have immense significance in various precision measurements, such as testing fundamental physics and the generation of primary frequency standards. In these studies, accurate transition centers derived from fitting the measured spectra are demanded, which critically rely on the knowledge of spectral line profiles. Here, we propose a new mechanism of Fano-like resonance induced by distant discrete levels and experimentally verify it with Doppler-free spectroscopy of vibration-rotational transitions of CO$_2$. The observed spectrum has an asymmetric profile and its amplitude increases quadratically with the probe laser power. Our results facilitate a broad range of topics based on narrow transitions.

Submitted 12 October, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

Comments: 6 pages, 4 figures

Journal ref: Physical Review Letters, 129: 163201 (2022)

arXiv:2202.11998 [pdf, other]

doi 10.1016/j.imavis.2022.104422

Effective Actor-centric Human-object Interaction Detection

Authors: Kunlun Xu, Zhimin Li, Zhijun Zhang, Leizhen Dong, Wenhui Xu, Luxin Yan, Sheng Zhong, Xu Zou

Abstract: While Human-Object Interaction (HOI) Detection has achieved tremendous advances in recent years, it still remains challenging due to complex interactions with multiple humans and objects occurring in images, which would inevitably lead to ambiguities. Most existing methods either generate all human-object pair candidates and infer their relationships by cropped local features successively in a two-stage manner, or directly predict interaction points in a one-stage procedure. However, the lack of spatial configurations or reasoning steps of two- or one-stage methods respectively limits their performance in such complex scenes. To avoid this ambiguity, we propose a novel actor-centric framework. The main ideas are that when inferring interactions: 1) the non-local features of the entire image guided by actor position are obtained to model the relationship between the actor and context, and then 2) we use an object branch to generate pixel-wise interaction area prediction, where the interaction area denotes the object central area. Moreover, we also use an actor branch to get interaction prediction of the actor and propose a novel composition strategy based on center-point indexing to generate the final HOI prediction. Thanks to the usage of the non-local features and the partly-coupled property of the human-objects composition strategy, our proposed framework can detect HOI more accurately especially for complex images. Extensive experimental results show that our method achieves the state-of-the-art on the challenging V-COCO and HICO-DET benchmarks and is more robust especially in multiple persons and/or objects scenes.

Submitted 24 February, 2022; originally announced February 2022.

Comments: 11 pages

arXiv:2202.11142 [pdf, other]

An LLVM-based C++ Compiler Toolchain for Variational Hybrid Quantum-Classical Algorithms and Quantum Accelerators

Authors: Pradnya Khalate, Xin-Chuan Wu, Shavindra Premaratne, Justin Hogaboam, Adam Holmes, Albert Schmitz, Gian Giacomo Guerreschi, Xiang Zou, A. Y. Matsuura

Abstract: Variational algorithms are a representative class of quantum computing workloads that combine quantum and classical computing. This paper presents an LLVM-based C++ compiler toolchain to efficiently execute variational hybrid quantum-classical algorithms on a computational system in which the quantum device acts as an accelerator. We introduce a set of extensions to the C++ language for programming these algorithms. We define a novel Executable and Linking Format (ELF) for Quantum and create a quantum device compiler component in the LLVM framework to compile the quantum part of the C++ source and reuse the host compiler in the LLVM framework to compile the classical computing part of the C++ source. A variational algorithm runs a quantum circuit repeatedly, each time with different gate parameters. We add to the quantum runtime the capability to execute dynamically a quantum circuit with different parameters. Thus, programmers can call quantum routines the same way as classical routines. With these capabilities, a variational hybrid quantum-classical algorithm can be specified in a single-source code and only needs to be compiled once for all iterations. The single compilation significantly reduces the execution latency of variational algorithms. We evaluate the framework's performance by running quantum circuits that prepare Thermofield Double (TFD) states, a quantum-classical variational algorithm.

Submitted 22 February, 2022; originally announced February 2022.

Comments: 13 pages, 10 figures, 1 appendix

arXiv:2202.09668 [pdf, other]

doi 10.1038/s41467-022-28309-5

Light-induced dimension crossover in 1T-TiSe$_2$ dictated by excitonic correlations

Authors: Yun Cheng, Alfred Zong, Jun Li, Wei Xia, Shaofeng Duan, Wenxuan Zhao, Yidian Li, Fengfeng Qi, Jun Wu, Lingrong Zhao, Pengfei Zhu, Xiao Zou, Tao Jiang, Yanfeng Guo, Lexian Yang, Dong Qian, Wentao Zhang, Anshul Kogar, Michael W. Zuerch, Dao Xiang, Jie Zhang

Abstract: In low-dimensional systems with strong electronic correlations, the application of an ultrashort laser pulse often yields novel phases that are otherwise inaccessible. The central challenge in understanding such phenomena is to determine how dimensionality and many-body correlations together govern the pathway of a non-adiabatic transition. To this end, we examine a layered compound, 1T-TiSe$_2$, whose three-dimensional charge-density-wave (3D CDW) state also features exciton condensation due to strong electron-hole interactions. We find that photoexcitation suppresses the equilibrium 3D CDW while creating a nonequilibrium 2D CDW. Remarkably, the dimension reduction does not occur unless bound electron-hole pairs are broken. This relation suggests that excitonic correlations maintain the out-of-plane CDW coherence, settling a long-standing debate over their role in the CDW transition. Our findings demonstrate how optical manipulation of electronic interaction enables one to control the dimensionality of a broken-symmetry order, paving the way for realizing other emergent states in strongly correlated systems.

Submitted 19 February, 2022; originally announced February 2022.

Journal ref: Nature Communications 13, 963 (2022)

arXiv:2202.07339 [pdf]

A Ta-TaS2 monolithic catalyst with robust and metallic interface for superior hydrogen evolution

Authors: Qiangmin Yu, Zhiyuan Zhang, Siyao Qiu, Yuting Luo, Zhibo Liu, Fengning Yang, Heming Liu, Shiyu Ge, Xiaolong Zou, Baofu Ding, Wencai Ren, Hui-Ming Cheng, Chenghua Sun, Bilu Liu

Abstract: The use of highly active and robust catalysts is crucial for producing green hydrogen by water electrolysis as we strive to achieve global carbon neutrality. Noble metals like platinum are currently used in industry for the hydrogen evolution reaction (HER), but suffer from scarcity, high price and unsatisfactory performance and stability at large current density, restricting their large-scale implementation. Here we report the synthesis of a new type of monolithic catalyst (MC) consisting of a metal disulfide (e.g., TaS2) catalyst vertically bonded to a conductive substrate of the same metal by strong covalent bonds. These features give the MC a mechanically robust and electrically near zero resistance interface, leading to an outstanding HER performance including rapid charge transfer and excellent durability, together with a low overpotential of 398 mV to achieve a current density of 2,000 mA cm$^{-2}$ as required by industry. The Ta-TaS2 MC has a negligible performance decay after 200 h operation at large current densities. In light of its unique interface and the various choice of metal elements giving the same structure, such monolithic materials may have broad uses besides catalysis.

Submitted 15 February, 2022; originally announced February 2022.

Comments: 16 pages, 5 figures

arXiv:2202.07212 [pdf]

Generic bond energy formalism within the modified quasichemical model for ternary solutions

Authors: Kun Wang, Dongyang Li, Xingli Zou, Hongwei Cheng, Chonghe Li, Xionggang Lu, Kuochih Chou

Abstract: The Modified Quasichemical Model in the Pair Approximation (MQMPA) can effectively capture the thermodynamic features of a binary solution with Short-Range Ordering (SRO). If the model is used to treat a ternary solution, a geometric interpolation method must be employed to extend the bond energy expression from binary to ternary formalism. The aim of the present work is to implement such an extension by means of a generic geometric interpolation approach. The generic method is unbiased and can be transformed into the widely used Kohler, Toop and Muggianu approaches with special interpolation parameters. The interpolation parameters can be calculated by the integration method as well as be optimized by ternary experimental data. The generic bond energy formalism (GBEF) has thus been derived to provide the MQMPA great flexibility to describe ternary solutions with complex configurations. Moreover, the GBEF is more concise than the formula derived by a combinatorial Kohler-Toop method. The concise GBEF is in this respect more conveniently programmed. Eventually, the Cu-Li-Sn liquid where both SRO and clustering among atoms occur is employed to validate the effectiveness and reliability of the GBEF within the MQMPA.

Submitted 29 November, 2022; v1 submitted 15 February, 2022; originally announced February 2022.

Comments: 20 pages, 5 figures

arXiv:2201.11693 [pdf, other]

Multi-photon Atom Interferometry via cavity-enhanced Bragg Diffraction

Authors: D. O. Sabulsky, J. Junca, X. Zou, A. Bertoldi, M. Prevedelli, Q. Beaufils, R. Geiger, A. Landragin, P. Bouyer, B. Canuel

Abstract: We present a novel atom interferometer configuration that combines large momentum transfer with the enhancement of an optical resonator for the purpose of measuring gravitational strain in the horizontal directions. Using Bragg diffraction and taking advantage of the optical gain provided by the resonator, we achieve momentum transfer up to $8\hbar k$ with mW level optical power in a cm-sized resonating waist. Importantly, our experiment uses an original resonator design that allows for a large resonating beam waist and eliminates the need to trap atoms in cavity modes. We demonstrate inertial sensitivity in the horizontal direction by measuring the change in tilt of our resonator. This result paves the way for future hybrid atom/optical gravitational wave detectors. Furthermore, the versatility of our method extends to a wide range of measurement geometries and atomic sources, opening up new avenues for the realization of highly sensitive inertial atom sensors.

Submitted 16 April, 2024; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: 8 pages, 4 figures

arXiv:2201.07789 [pdf, other]

doi 10.1140/epjqt/s40507-022-00147-w

Cold Atoms in Space: Community Workshop Summary and Proposed Road-Map

Authors: Ivan Alonso, Cristiano Alpigiani, Brett Altschul, Henrique Araujo, Gianluigi Arduini, Jan Arlt, Leonardo Badurina, Antun Balaz, Satvika Bandarupally, Barry C Barish, Michele Barone, Michele Barsanti, Steven Bass, Angelo Bassi, Baptiste Battelier, Charles F. A. Baynham, Quentin Beaufils, Aleksandar Belic, Joel Berge, Jose Bernabeu, Andrea Bertoldi, Robert Bingham, Sebastien Bize, Diego Blas, Kai Bongs, Philippe Bouyer, et al. (224 additional authors not shown)

Abstract: We summarize the discussions at a virtual Community Workshop on Cold Atoms in Space concerning the status of cold atom technologies, the prospective scientific and societal opportunities offered by their deployment in space, and the developments needed before cold atoms could be operated in space. The cold atom technologies discussed include atomic clocks, quantum gravimeters and accelerometers, and atom interferometers. Prospective applications include metrology, geodesy and measurement of terrestrial mass change due to, e.g., climate change, and fundamental science experiments such as tests of the equivalence principle, searches for dark matter, measurements of gravitational waves and tests of quantum mechanics. We review the current status of cold atom technologies and outline the requirements for their space qualification, including the development paths and the corresponding technical milestones, and identifying possible pathfinder missions to pave the way for missions to exploit the full potential of cold atoms in space. Finally, we present a first draft of a possible road-map for achieving these goals, that we propose for discussion by the interested cold atom, Earth Observation, fundamental physics and other prospective scientific user communities, together with ESA and national space and research funding agencies.

Submitted 19 January, 2022; originally announced January 2022.

Comments: Summary of the Community Workshop on Cold Atoms in Space and corresponding Road-map: https://indico.cern.ch/event/1064855/

Journal ref: EPJ Quantum Technol. 9, 30 (2022)

arXiv:2201.06781 [pdf, other]

When Facial Expression Recognition Meets Few-Shot Learning: A Joint and Alternate Learning Framework

Authors: Xinyi Zou, Yan Yan, Jing-Hao Xue, Si Chen, Hanzi Wang

Abstract: Human emotions involve basic and compound facial expressions. However, current research on facial expression recognition (FER) mainly focuses on basic expressions, and thus fails to address the diversity of human emotions in practical scenarios. Meanwhile, existing work on compound FER relies heavily on abundant labeled compound expression training data, which are often laboriously collected under the professional instruction of psychology. In this paper, we study compound FER in the cross-domain few-shot learning setting, where only a few images of novel classes from the target domain are required as a reference. In particular, we aim to identify unseen compound expressions with the model trained on easily accessible basic expression datasets. To alleviate the problem of limited base classes in our FER task, we propose a novel Emotion Guided Similarity Network (EGS-Net), consisting of an emotion branch and a similarity branch, based on a two-stage learning framework. Specifically, in the first stage, the similarity branch is jointly trained with the emotion branch in a multi-task fashion. With the regularization of the emotion branch, we prevent the similarity branch from overfitting to sampled base classes that are highly overlapped across different episodes. In the second stage, the emotion branch and the similarity branch play a "two-student game" to alternately learn from each other, thereby further improving the inference ability of the similarity branch on unseen compound expressions. Experimental results on both in-the-lab and in-the-wild compound expression datasets demonstrate the superiority of our proposed method against several state-of-the-art methods.

Submitted 18 January, 2022; originally announced January 2022.

Comments: 9 pages, 2 figures

arXiv:2201.05972 [pdf, other]

Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation

Authors: Shuangjie Xu, Rui Wan, Maosheng Ye, Xiaoyi Zou, Tongyi Cao

Abstract: Two major challenges of 3D LiDAR Panoptic Segmentation (PS) are that point clouds of an object are surface-aggregated and thus hard to model the long-range dependency especially for large instances, and that objects are too close to separate each other. Recent literature addresses these problems by time-consuming grouping processes such as dual-clustering, mean-shift offsets, etc., or by bird-eye-view (BEV) dense centroid representation that downplays geometry. However, the long-range geometry relationship has not been sufficiently modeled by local feature learning from the above methods. To this end, we present SCAN, a novel sparse cross-scale attention network to first align multi-scale sparse features with global voxel-encoded attention to capture the long-range relationship of instance context, which can boost the regression accuracy of the over-segmented large objects. For the surface-aggregated points, SCAN adopts a novel sparse class-agnostic representation of instance centroids, which can not only maintain the sparsity of aligned features to solve the under-segmentation on small objects, but also reduce the computation amount of the network through sparse convolution. Our method outperforms previous methods by a large margin in the SemanticKITTI dataset for the challenging 3D PS task, achieving 1st place with a real-time inference speed.

Submitted 16 January, 2022; originally announced January 2022.

Comments: Accepted by the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

arXiv:2201.01893 [pdf, other]

Flow-Guided Sparse Transformer for Video Deblurring

Authors: Jing Lin, Yuanhao Cai, Xiaowan Hu, Haoqian Wang, Youliang Yan, Xueyi Zou, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool

Abstract: Exploiting similar and sharper scene patches in spatio-temporal neighborhoods is critical for video deblurring. However, CNN-based methods show limitations in capturing long-range dependencies and modeling non-local self-similarity. In this paper, we propose a novel framework, Flow-Guided Sparse Transformer (FGST), for video deblurring. In FGST, we customize a self-attention module, Flow-Guided Sparse Window-based Multi-head Self-Attention (FGSW-MSA). For each $query$ element on the blurry reference frame, FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse yet highly related $key$ elements corresponding to the same scene patch in neighboring frames. Besides, we present a Recurrent Embedding (RE) mechanism to transfer information from past frames and strengthen long-range temporal dependencies. Comprehensive experiments demonstrate that our proposed FGST outperforms state-of-the-art (SOTA) methods on both DVD and GOPRO datasets and even yields more visually pleasing results in real video deblurring. Code and pre-trained models are publicly available at https://github.com/linjing7/VR-Baseline

Submitted 29 May, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: ICML 2022; The First Transformer-based method for Video Deblurring

arXiv:2112.00504 [pdf, other]

Learning Oriented Remote Sensing Object Detection via Naive Geometric Computing

Authors: Yanjie Wang, Xu Zou, Zhijun Zhang, Wenhui Xu, Liqun Chen, Sheng Zhong, Luxin Yan, Guodong Wang

Abstract: Detecting oriented objects along with estimating their rotation information is one crucial step for analyzing remote sensing images. Despite that many methods proposed recently have achieved remarkable performance, most of them directly learn to predict object directions under the supervision of only one (e.g. the rotation angle) or a few (e.g. several coordinates) groundtruth values individually. Oriented object detection would be more accurate and robust if extra constraints, with respect to proposal and rotation information regression, are adopted for joint supervision during training. To this end, we innovatively propose a mechanism that simultaneously learns the regression of horizontal proposals, oriented proposals, and rotation angles of objects in a consistent manner, via naive geometric computing, as one additional steady constraint (see Figure 1). An oriented center prior guided label assignment strategy is proposed for further enhancing the quality of proposals, yielding better performance. Extensive experiments demonstrate the model equipped with our idea significantly outperforms the baseline by a large margin to achieve a new state-of-the-art result without any extra computational burden during inference. Our proposed idea is simple and intuitive that can be readily implemented. Source codes and trained models are involved in supplementary files.

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2111.14640 [pdf, other]

doi 10.3390/mi13010046

Timing performance simulation for 3D 4H-SiC detector

Authors: Yuhang Tan, Tao Yang, Kai Liu, Congcong Wang, Xiyuan Zhang, Mei Zhao, Xiaochuan Xia, Hongwei Liang, Ruiliang Xu, Yu Zhao, Xiaoshen Kang, Chenxi Fu, Weimin Song, Zhenzhong Zhang, Ruirui Fan, Xinbo Zou, Xin Shi

Abstract: To meet the high radiation challenge for detectors in future high-energy physics, a novel 3D 4H-SiC detector was investigated. SiC detectors could potentially operate in radiation-harsh and room-temperature environments because of their high thermal conductivity and high atomic displacement threshold energy. The 3D structure, which decouples thickness and distance between electrodes, further improves timing performance and radiation hardness of the detector. We developed a simulation software - RASER (RAdiation SEmiconductoR) to simulate the time resolution of planar and 3D 4H-SiC detectors with different parameters and structures, and the reliability of the software is verified by comparing time resolution results of simulation with data. The rough time resolution of the 3D 4H-SiC detector was estimated, and the simulation parameters could be used as a guideline for 3D 4H-SiC detector design and optimization.

Submitted 27 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: 9 pages, 7 figures

Journal ref: Micromachines 2022, 13, 46

arXiv:2111.14636 [pdf, other]

doi 10.1002/lpor.202200599

Quantum effects beyond mean-field treatment in quantum optics

Authors: Yue-Xun Huang, Ming Li, Zi-Jie Chen, Xu-Bo Zou, Guang-Can Guo, Chang-Ling Zou

Abstract: Mean-field treatment (MFT) is frequently applied to approximately predict the dynamics of quantum optics systems, to simplify the system Hamiltonian through neglecting certain modes that are driven strongly or couple weakly with other modes. While in practical quantum systems, the quantum correlations between different modes might lead to unanticipated quantum effects and lead to significantly distinct system dynamics. Here, we provide a general and systematic theoretical framework based on the perturbation theory in company with the MFT to capture these quantum effects. The form of nonlinear dissipation and parasitic Hamiltonian are predicted, which scales inversely with the nonlinear coupling rate. Furthermore, the indicator is also proposed as a measure of the accuracy of mean-field treatment. Our theory is applied to the example of quantum frequency conversion, in which mean-field treatment is commonly applied, to test its limitation under strong pump and large coupling strength. The analytical results show excellent agreement with the numerical simulations. Our work clearly reveals the attendant quantum effects under mean-field treatment and provides a more precise theoretical framework to describe quantum optics systems.

Submitted 29 November, 2021; originally announced November 2021.

Comments: 6 pages, 3 figures

Journal ref: Laser & Photonics Reviews 17, 2200599 (2023)

arXiv:2111.09461 [pdf]

doi 10.1038/s42256-021-00421-z

Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence

Authors: Xiang Bai, Hanchen Wang, Liya Ma, Yongchao Xu, Jiefeng Gan, Ziwei Fan, Fan Yang, Ke Ma, Jiehua Yang, Song Bai, Chang Shu, Xinyu Zou, Renhao Huang, Changzheng Zhang, Xiaowu Liu, Dandan Tu, Chuou Xu, Wenqing Zhang, Xi Wang, Anguo Chen, Yu Zeng, Dehua Yang, Ming-Wei Wang, Nagaraj Holalkere, Neil J. Halin, et al. (21 additional authors not shown)

Abstract: Artificial intelligence (AI) provides a promising substitution for streamlining COVID-19 diagnoses. However, concerns surrounding security and trustworthiness impede the collection of large-scale representative medical data, posing a considerable challenge for training a well-generalised model in clinical practices. To address this, we launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), where the AI model can be distributedly trained and independently executed at each host institution under a federated learning framework (FL) without data sharing. Here we show that our FL model outperformed all the local models by a large yield (test sensitivity /specificity in China: 0.973/0.951, in the UK: 0.730/0.942), achieving comparable performance with a panel of professional radiologists. We further evaluated the model on the hold-out (collected from another two hospitals leaving out the FL) and heterogeneous (acquired with contrast materials) data, provided visual explanations for decisions made by the model, and analysed the trade-offs between the model performance and the communication costs in the federated training process. Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK. Collectively, our work advanced the prospects of utilising federated learning for privacy-preserving AI in digital health.

Submitted 17 November, 2021; originally announced November 2021.

Comments: Nature Machine Intelligence

arXiv:2111.06213 [pdf, other]

Enhanced Fast Boolean Matching based on Sensitivity Signatures Pruning

Authors: Jiaxi Zhang, Liwei Ni, Shenggen Zheng, Hao Liu, Xiangfu Zou, Feng Wang, Guojie Luo

Abstract: Boolean matching is significant to digital integrated circuits design. An exhaustive method for Boolean matching is computationally expensive even for functions with only a few variables, because the time complexity of such an algorithm for an n-variable Boolean function is $O(2^{n+1}n!)$. Sensitivity is an important characteristic and a measure of the complexity of Boolean functions. It has been used in analysis of the complexity of algorithms in different fields. This measure could be regarded as a signature of Boolean functions and has great potential to help reduce the search space of Boolean matching. In this paper, we introduce Boolean sensitivity into Boolean matching and design several sensitivity-related signatures to enhance fast Boolean matching. First, we propose some new signatures that relate sensitivity to Boolean equivalence. Then, we prove that these signatures are prerequisites for Boolean matching, which we can use to reduce the search space of the matching problem. Besides, we develop a fast sensitivity calculation method to compute and compare these signatures of two Boolean functions. Compared with the traditional cofactor and symmetric detection methods, sensitivity is a series of signatures of another dimension. We also show that sensitivity can be easily integrated into traditional methods and distinguish the mismatched Boolean functions faster. To the best of our knowledge, this is the first work that introduces sensitivity to Boolean matching. The experimental results show that sensitivity-related signatures we proposed in this paper can reduce the search space to a very large extent, and perform up to 3x speedup over the state-of-the-art Boolean matching methods.

Submitted 11 November, 2021; originally announced November 2021.

Comments: To appear in ICCAD'21

Showing 101–150 of 382 results for author: Zou, X