Search | arXiv e-print repository

doi 10.1109/COMST.2023.3308717

Networking Architecture and Key Supporting Technologies for Human Digital Twin in Personalized Healthcare: A Comprehensive Survey

Authors: Jiayuan Chen, Changyan Yi, Samuel D. Okegbile, Jun Cai, Xuemin Shen

Abstract: Digital twin (DT) refers to a promising technique for digitally and accurately representing actual physical entities. One typical advantage of DT is that it can be used not only to virtually replicate a system's detailed operations but also to analyze the current condition, predict future behaviour, and refine the control optimization. Although DT has been widely implemented in various fields, such as smart manufacturing and transportation, its conventional paradigm is limited to embodying non-living entities, e.g., robots and vehicles. When DT is adopted in human-centric systems, a novel concept called human digital twin (HDT) has thus been proposed. In particular, HDT allows in silico representation of an individual human body with the ability to dynamically reflect molecular status, physiological status, emotional and psychological status, as well as lifestyle evolutions. These capabilities motivate the expected application of HDT in personalized healthcare (PH), which can facilitate remote monitoring, diagnosis, prescription, surgery and rehabilitation. However, despite its large potential, HDT faces substantial research challenges in different aspects and has recently become an increasingly popular topic. In this survey, with a specific focus on the networking architecture and key technologies for HDT in PH applications, we first discuss the differences between HDT and conventional DTs, followed by the universal framework and essential functions of HDT. We then analyze its design requirements and challenges in PH applications.
After that, we provide an overview of the networking architecture of HDT, including the data acquisition layer, data communication layer, computation layer, data management layer, and data analysis and decision-making layer. Besides reviewing in detail the key technologies for implementing such a networking architecture, we conclude this survey by presenting future research directions of HDT.

Submitted 23 June, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

arXiv:2301.03759 [pdf]

Strain-programmable van der Waals magnetic tunnel junctions

Authors: John Cenker, Dmitry Ovchinnikov, Harvey Yang, Daniel G. Chica, Catherine Zhu, Jiaqi Cai, Geoffrey Diederich, Zhaoyu Liu, Xiaoyang Zhu, Xavier Roy, Ting Cao, Matthew W. Daniels, Jiun-Haw Chu, Di Xiao, Xiaodong Xu

Abstract: The magnetic tunnel junction (MTJ) is a backbone device for spintronics. Realizing next-generation energy-efficient MTJs will require operating mechanisms beyond the standard means of applying magnetic fields or large electrical currents. Here, we demonstrate a new concept for programmable MTJ operation via strain control of the magnetic states of CrSBr, a layered antiferromagnetic semiconductor used as the tunnel barrier. Switching the CrSBr from antiferromagnetic to ferromagnetic order generates a giant tunneling magnetoresistance ratio without an external magnetic field at temperatures up to ~140 K. When the static strain is set near the phase transition, applying small strain pulses leads to active flipping of layer magnetization with controlled layer number and thus magnetoresistance states. Further, finely adjusting the static strain to a critical value turns on stochastic switching between metastable states, with a strain-tunable sigmoidal response curve akin to the stochastic binary neuron. Our results highlight the potential of strain-programmable van der Waals MTJs for spintronic applications, such as magnetic memory, random number generation, and probabilistic and neuromorphic computing.
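The "stochastic binary neuron" behaviour mentioned above can be made concrete with a toy model. The Python sketch below is illustrative only and is not the authors' device model: it assumes the probability of switching into state 1 is a logistic function of the static strain's offset from a critical value, and `critical_strain` and `width` are hypothetical parameters in arbitrary units.

```python
import math
import random


def switching_probability(strain, critical_strain, width):
    """Hypothetical logistic model: P(state 1) = 1 / (1 + exp(-(strain - critical_strain) / width))."""
    return 1.0 / (1.0 + math.exp(-(strain - critical_strain) / width))


def stochastic_binary_neuron(strain, critical_strain, width, rng):
    """Draw one binary output (0 or 1) with the sigmoidal probability above."""
    return 1 if rng.random() < switching_probability(strain, critical_strain, width) else 0


def estimate_response(strain, critical_strain=0.5, width=0.05, trials=10_000, seed=0):
    """Fraction of trials that end in state 1 at a fixed static strain."""
    rng = random.Random(seed)
    ones = sum(
        stochastic_binary_neuron(strain, critical_strain, width, rng) for _ in range(trials)
    )
    return ones / trials


# Sweep the (arbitrary-unit) strain through the hypothetical critical value.
for strain in (0.40, 0.45, 0.50, 0.55, 0.60):
    print(f"strain={strain:.2f}  P(state 1)~{estimate_response(strain):.3f}")
```

Sweeping `strain` through `critical_strain` traces out the sigmoidal response curve; a smaller `width` makes the transition sharper and the output less random away from the critical point.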

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2212.13400 [pdf, other]

doi 10.1088/1674-1056/ac9362

LAMOST medium-resolution spectroscopic survey of binarity and exotic star (LAMOST-MRS-B): Observation strategy and target selection

Authors: Jiao Li, Jiang-Dan Li, Yan-Jun Guo, Zhan-Wen Han, Xue-Fei Chen, Chao Liu, Hong-Wei Ge, Deng-Kai Jiang, Li-Fang Li, Bo Zhang, Jia-Ming Liu, Hao Tian, Hao-Tong Zhang, Hai-Long Yuan, Wen-Yuan Cui, Juan-Juan Ren, **g-Hao Cai, Jian-Rong Shi

Abstract: LAMOST-MRS-B is one of the sub-surveys of the LAMOST medium-resolution (R~7500) spectroscopic survey. It aims to study the statistical properties (e.g., binary fraction, orbital period distribution, mass ratio distribution) of binary stars and exotic stars. We intend to observe about 30000 stars (10 mag <= G <= 14.5 mag) with at least 10 visits in five years. We first planned to observe 25 plates around the galactic plane in 2018. The number of plates was then reduced to 12 in 2019 because of observational limitations. At the same time, two new plates located at high galactic latitude were added to explore how binary properties are influenced by different environments. In this survey project, we assigned the highest observation priority to identified exotic and low-metallicity stars. For the rest of the selected stars, we gave higher priority to relatively bright stars in order to obtain as many high-quality spectra as possible. Spectra of 49129 stars have been obtained in the LAMOST-MRS-B field and released in DR8, of which 28828 and 3375 stars have been visited more than twice and more than ten times, respectively, with SNR >= 10. Most of the sources are B-, A-, and F-type stars with -0.6 < [Fe/H] < 0.4 dex. We also obtain 347 identified variable and exotic stars and about 250 stars with [Fe/H] < -1 dex. We measure radial velocities (RVs) using 892233 spectra of the stars. The RV uncertainties reach about 1 km/s and 10 km/s for 95% of late- and early-type stars, respectively.
The datasets presented in this paper are available at http://www.doi.org/10.57760/sciencedb.j00113.00035.
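To first order, radial velocities like those above come from the Doppler shift of spectral lines, v ≈ c(λ_obs - λ_rest)/λ_rest. The Python sketch below shows only that non-relativistic conversion, not the LAMOST-MRS-B measurement pipeline (which the abstract does not describe), and the wavelengths are illustrative. It also computes c/R, the velocity width of one resolution element: at R ≈ 7500 this is about 40 km/s, so a 1 km/s uncertainty corresponds to locating line positions to roughly 1/40 of a resolution element.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by SI definition


def radial_velocity_km_s(observed_wavelength, rest_wavelength):
    """Non-relativistic Doppler velocity; positive means receding (redshifted)."""
    return SPEED_OF_LIGHT_KM_S * (observed_wavelength - rest_wavelength) / rest_wavelength


def velocity_per_resolution_element_km_s(resolving_power):
    """Velocity width of one resolution element, c / R."""
    return SPEED_OF_LIGHT_KM_S / resolving_power


# Illustrative numbers only: a line with rest wavelength 6562.8 Angstrom
# (approximately H-alpha in air) observed at 6563.0 Angstrom.
print(f"{radial_velocity_km_s(6563.0, 6562.8):.3f} km/s")
print(f"{velocity_per_resolution_element_km_s(7500):.2f} km/s per resolution element")
```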

Submitted 27 December, 2022; originally announced December 2022.

arXiv:2212.13395 [pdf, other]

doi 10.1088/1538-3873/ac98e0

Variability and Spectral Behavior of Gamma-ray Flares of 3C 279

Authors: Gege Wang, Junhui Fan, Hubing Xiao, **ting Cai

Abstract: 3C 279 showed enhanced flux variations in Fermi-LAT γ-ray observations from January to June 2018. We present a detailed Fermi-LAT analysis to investigate the variability and spectral behavior of 3C 279 during the γ-ray flares in 2018. In this work, we analyzed the γ-ray spectra and found that the spectra in both the flaring and quiescent states do not show any clear breaks (or cutoffs). This indicates that the dissipation region is outside the broad-line region and that the energy dissipation may be due to inverse Compton scattering of infrared photons from the dust torus; this result is also consistent with that of Tolamatti et al. A model of external inverse Compton scattering of dusty torus (DT) photons is employed to calculate the broadband spectral energy distribution (SED). This model is further supported by our finding that the flare decay timescale is consistent with the cooling time of relativistic electrons through DT photons. During the SED modeling, a relatively harder spectrum is found for the electron energy distribution (EED), suggesting that these electrons may not be accelerated by a shock in the dissipation region. In addition, magnetic reconnection is ruled out because of the low magnetization ratio. Thus, we suggest that an injection of higher-energy electrons from outside the blob raises the flare.

Submitted 27 December, 2022; originally announced December 2022.

Comments: 12 pages, 6 figures, published in the Publications of the Astronomical Society of the Pacific

arXiv:2212.09286 [pdf, other]

doi 10.1093/mnras/stac3292

Data mining techniques on astronomical spectra data. II : Classification Analysis

Authors: Haifeng Yang, Lichan Zhou, Jianghui Cai, Chenhui Shi, Yuqing Yang, Xujun Zhao, Juncheng Duan, Xiaona Yin

Abstract: Classification is valuable and necessary in spectral analysis, especially for data-driven mining. Along with the rapid development of spectral surveys, a variety of classification techniques have been successfully applied to astronomical data processing. However, it is difficult to select an appropriate classification method in practical scenarios because of differences in algorithmic ideas and data characteristics. Here, we present the second work in the data mining series: a review of spectral classification techniques. This work also consists of three parts: a systematic overview of the current literature, experimental analyses of commonly used classification algorithms, and the source code used in this paper. First, we carefully investigate the current classification methods in the astronomical literature and organize them into ten types based on their algorithmic ideas. For each type of algorithm, the analysis is organized from three perspectives: (1) its current applications and usage frequency in spectral classification are summarized; (2) its basic ideas are introduced and preliminarily analysed; (3) its advantages and caveats are discussed. Second, the classification performance of different algorithms on unified data sets is analysed. Experimental data are selected from the LAMOST and SDSS surveys. Six groups of spectral data sets are designed based on data characteristics, data quality and data volume to examine the performance of these algorithms.
Then the scores of nine basic algorithms are shown and discussed in the experimental analysis. Finally, Python source code for the nine basic algorithms, along with manuals for usage and improvement, is provided.

Submitted 19 December, 2022; originally announced December 2022.

Comments: 25 pages, 41 figures

Journal ref: Monthly Notices of the Royal Astronomical Society, Volume 518, Issue 4, February 2023, Pages 5904-5928

arXiv:2212.08419 [pdf, other]

doi 10.1093/mnras/stac2975

Data mining techniques on astronomical spectra data. I : Clustering Analysis

Authors: Haifeng Yang, Chenhui Shi, Jianghui Cai, Lichan Zhou, Yuqing Yang, Xujun Zhao, Yanting He, **g Hao

Abstract: Clustering is an effective tool for astronomical spectral analysis, to mine clustering patterns among data. With the implementation of large sky surveys, many clustering methods have been applied to tackle spectroscopic and photometric data effectively and automatically. Meanwhile, the performance of clustering methods under different data characteristics varies greatly. With the aim of summarizing astronomical spectral clustering algorithms and laying the foundation for further research, this work gives a review of clustering methods applied to astronomical spectra data in three parts. First, many clustering methods for astronomical spectra are investigated and analysed theoretically, looking at algorithmic ideas, applications, and features. Secondly, experiments are carried out on unified datasets constructed using three criteria (spectra data type, spectra quality, and data volume) to compare the performance of typical algorithms; spectra data are selected from the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST) survey and Sloan Digital Sky Survey (SDSS). Finally, source codes of the comparison clustering algorithms and manuals for usage and improvement are provided on GitHub.
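For readers unfamiliar with the kind of algorithm this review compares, the Python sketch below implements plain k-means (Lloyd's algorithm). It is a generic textbook version, not the authors' released GitHub code, and the six three-pixel "spectra" are invented: three flat continua and three with an absorption dip in the middle pixel.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length flux vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def kmeans(data, k, max_iterations=100, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(point) for point in rng.sample(data, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach every spectrum to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in data
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(data, labels) if label == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean_vector(members)
    return centroids, labels


toy_spectra = [
    [1.0, 0.98, 1.0], [1.02, 1.0, 0.99], [0.99, 1.01, 1.0],  # flat continua
    [1.0, 0.2, 1.0], [0.98, 0.25, 1.01], [1.01, 0.18, 0.99],  # absorption dip
]
centroids, labels = kmeans(toy_spectra, k=2)
print(labels)
```

Real spectra would first need resampling onto a common wavelength grid and normalization; k-means also depends on the chosen k and initialization, which is one reason the review compares several algorithms.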

Submitted 16 December, 2022; originally announced December 2022.

Comments: 28 pages, 53 figures

Journal ref: Monthly Notices of the Royal Astronomical Society. 517(2022)5496-5523

arXiv:2212.07867 [pdf, other]

Localizing Scan Targets from Human Pose for Autonomous Lung Ultrasound Imaging

Authors: Jianzhi Long, Jicang Cai, Abdullah Al-Battal, Shiwei **, **g Zhang, Dacheng Tao, Truong Nguyen

Abstract: Ultrasound is progressing toward becoming an affordable and versatile solution for medical imaging. With the advent of the COVID-19 global pandemic, there is a need to fully automate ultrasound imaging, since it requires trained operators in close proximity to patients for long periods of time, increasing the risk of infection. In this work, we investigate the important yet seldom-studied problem of scan target localization in the setting of lung ultrasound imaging. We propose a purely vision-based, data-driven method that incorporates learning-based computer vision techniques. We combine a human pose estimation model with a specially designed regression model to predict the lung ultrasound scan targets, and deploy multiview stereo vision to enhance the consistency of 3D target localization. While related works mostly focus on phantom experiments, we collect data from 30 human subjects for testing. Our method attains an accuracy level of 16.00(9.79) mm for probe positioning and 4.44(3.75) degrees for probe orientation, with a success rate above 80% under an error threshold of 25 mm for all scan targets. Moreover, our approach can serve as a general solution for other types of ultrasound modalities. The code for implementation has been released.
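The positioning figures above are reported as a value with a second value in parentheses, which we read as mean error with standard deviation (an interpretation, not stated in the abstract), plus a success rate under a 25 mm threshold. The Python sketch below shows one way to compute such figures from predicted and ground-truth 3D probe positions. The population standard deviation and the strict "<" threshold are assumptions, and the positions are made up.

```python
import math
import statistics


def position_errors_mm(predicted, ground_truth):
    """Euclidean distance (mm) between each predicted and true 3D probe position."""
    return [math.dist(p, g) for p, g in zip(predicted, ground_truth)]


def summarize_errors(errors_mm, threshold_mm=25.0):
    """Return (mean, standard deviation, success rate) for a list of errors.

    Assumptions: population standard deviation, and a trial counts as a
    success when its error is strictly below the threshold.
    """
    mean = statistics.fmean(errors_mm)
    std = statistics.pstdev(errors_mm)
    success_rate = sum(error < threshold_mm for error in errors_mm) / len(errors_mm)
    return mean, std, success_rate


# Made-up example: four predictions compared against the origin.
predicted = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (30.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
truth = [(0.0, 0.0, 0.0)] * 4
mean, std, rate = summarize_errors(position_errors_mm(predicted, truth))
print(f"{mean:.2f}({std:.2f}) mm, success rate {rate:.0%}")
```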

Submitted 25 February, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

Comments: v2 2023/02/25

ACM Class: I.4.9

arXiv:2212.06890 [pdf]

doi 10.1021/acs.nanolett.2c04010

Anomalous Second Harmonic Generation from Atomically Thin MnBi2Te4

Authors: Jordan Fonseca, Geoffrey M. Diederich, Dmitry Ovchinnikov, Jiaqi Cai, Chong Wang, Jiaqiang Yan, Di Xiao, Xiaodong Xu

Abstract: MnBi2Te4 is a van der Waals topological insulator with intrinsic intralayer ferromagnetic exchange and A-type antiferromagnetic interlayer coupling. Theoretically, it belongs to a class of structurally centrosymmetric crystals whose layered antiferromagnetic order breaks inversion symmetry for even layer numbers, making optical second harmonic generation (SHG) an ideal probe of the coupling between the crystal and magnetic structures. Here, we perform magnetic field and temperature-dependent SHG measurements on MnBi2Te4 flakes ranging from bulk to monolayer thickness. We find that the dominant SHG signal from MnBi2Te4 is unexpectedly unrelated to both magnetic state and layer number. We suggest that surface SHG is the likely source of the observed strong SHG, whose symmetry matches that of the MnBi2Te4-vacuum interface. Our results highlight the importance of considering the surface contribution to inversion symmetry-breaking in van der Waals centrosymmetric magnets.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2212.03335 [pdf, ps, other]

Planar #CSP Equality Corresponds to Quantum Isomorphism -- A Holant Viewpoint

Authors: **-Yi Cai, Ben Young

Abstract: Recently, Mančinska and Roberson proved that two graphs $G$ and $G'$ are quantum isomorphic if and only if they admit the same number of homomorphisms from all planar graphs. We extend this result to planar #CSP with any pair of sets $\mathcal{F}$ and $\mathcal{F}'$ of real-valued, arbitrary-arity constraint functions. Graph homomorphism is the special case where each of $\mathcal{F}$ and $\mathcal{F}'$ contains a single symmetric 0-1-valued binary constraint function. Our treatment uses the framework of planar Holant problems. To prove that quantum isomorphic constraint function sets give the same value on any planar #CSP instance, we apply a novel form of holographic transformation of Valiant, using the quantum permutation matrix $\mathcal{U}$ defining the quantum isomorphism. Due to the noncommutativity of $\mathcal{U}$'s entries, it turns out that this form of holographic transformation is only applicable to planar Holant. To prove the converse, we introduce the quantum automorphism group Qut$(\mathcal{F})$ of a set of constraint functions $\mathcal{F}$, and characterize the intertwiners of Qut$(\mathcal{F})$ as the signature matrices of planar Holant$(\mathcal{F}\,|\,\mathcal{EQ})$ quantum gadgets. Then we define a new notion of (projective) connectivity for constraint functions and reduce arity while preserving the quantum automorphism group.
Finally, to address the challenges posed by generalizing from 0-1 valued to real-valued constraint functions, we adapt a technique of Lovász in the classical setting for isomorphisms of real-weighted graphs to the setting of quantum isomorphisms.
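The central quantity in the Mančinska and Roberson result is hom(F, G), the number of homomorphisms from a graph F to a graph G. In #CSP terms, the adjacency matrix of G is the single symmetric 0-1 binary constraint function, and replacing it with a real-valued symmetric matrix gives the weighted sum the abstract generalizes to. The brute-force Python sketch below evaluates that sum over all assignments. It is exponential in the number of vertices of F, covers only binary constraint functions (not the arbitrary-arity case), and says nothing about quantum isomorphism itself.

```python
from itertools import product


def partition_function(num_vertices, edges, weights):
    """Sum over all maps phi: V(F) -> domain of prod over edges (u, v) of F of weights[phi(u)][phi(v)].

    F is given by its vertex count and edge list; `weights` is a square,
    symmetric matrix playing the role of one binary constraint function.
    With the 0-1 adjacency matrix of a graph G, the result is hom(F, G).
    """
    domain = range(len(weights))
    total = 0
    for phi in product(domain, repeat=num_vertices):
        term = 1
        for u, v in edges:
            term *= weights[phi[u]][phi[v]]
            if term == 0:  # this assignment violates a constraint; skip the rest
                break
        total += term
    return total


TRIANGLE = [(0, 1), (1, 2), (2, 0)]
K2 = [[0, 1], [1, 0]]
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

print(partition_function(3, TRIANGLE, K3))  # maps of a triangle onto a triangle
print(partition_function(3, TRIANGLE, K2))  # a triangle cannot map onto a single edge
```

Comparing such counts over many small test graphs F is the classical analogue of the invariant in the theorem; the quantum version restricts F to planar graphs.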

Submitted 5 May, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: 37 pages, 12 figures

arXiv:2212.01612 [pdf, other]

Named Entity and Relation Extraction with Multi-Modal Retrieval

Authors: Xinyu Wang, Jiong Cai, Yong Jiang, Pengjun Xie, Kewei Tu, Wei Lu

Abstract: Multi-modal named entity recognition (NER) and relation extraction (RE) aim to leverage relevant image information to improve the performance of NER and RE. Most existing efforts have focused on directly extracting potentially useful information from images (such as pixel-level features, identified objects, and associated captions). However, such extraction processes may not be knowledge-aware, resulting in information that may not be highly relevant. In this paper, we propose a novel Multi-modal Retrieval based framework (MoRe). MoRe contains a text retrieval module and an image-based retrieval module, which retrieve related knowledge of the input text and image, respectively, from the knowledge corpus. Next, the retrieval results are sent to the textual and visual models, respectively, for predictions. Finally, a Mixture of Experts (MoE) module combines the predictions from the two models to make the final decision. Our experiments show that both our textual model and visual model can achieve state-of-the-art performance on four multi-modal NER datasets and one multi-modal RE dataset. With MoE, the model performance can be further improved, and our analysis demonstrates the benefits of integrating both textual and visual cues for such tasks.
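One common way for a mixture-of-experts module to fuse two models' outputs is a gated weighted average of their label distributions. The Python sketch below assumes exactly that, with the gate logits passed in directly; in MoRe they would come from a learned component whose form the abstract does not specify, and the label set and probabilities here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(expert_distributions, gate_logits):
    """Combine per-expert label distributions using softmax gate weights."""
    weights = softmax(gate_logits)
    num_labels = len(expert_distributions[0])
    return [
        sum(w * dist[j] for w, dist in zip(weights, expert_distributions))
        for j in range(num_labels)
    ]


# Hypothetical distributions over the labels (PER, ORG, O) for one token.
textual = [0.7, 0.2, 0.1]
visual = [0.3, 0.6, 0.1]
print(mixture_of_experts([textual, visual], gate_logits=[0.0, 0.0]))
```

Because the gate weights sum to one, the combined output is still a probability distribution; a gate that strongly favours one expert reduces to that expert's prediction.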

Submitted 3 December, 2022; originally announced December 2022.

Comments: Findings of EMNLP 2022. Code is publicly available at http://github.com/modelscope/adaseq/examples/MoRe

arXiv:2211.15666 [pdf, other]

Learning Visual Planning Models from Partially Observed Images

Authors: Kebing **, Zhanhao Xiao, Hankui Hankz Zhuo, Hai Wan, Jiaran Cai

Abstract: There has been increasing attention on learning planning models in classical planning. Most existing approaches, however, focus on learning planning models from structured data in symbolic representations. It is often difficult to obtain such structured data in real-world scenarios. Although a number of approaches have been developed for learning planning models from fully observed unstructured data (e.g., images), in many scenarios raw observations are incomplete. In this paper, we provide a novel framework, Recplan, for learning a transition model from partially observed raw image traces. More specifically, by considering the preceding and subsequent images in a trace, we learn the latent state representations of raw observations and then build a transition model based on such representations. Additionally, we propose a neural-network-based approach to learn a heuristic model that estimates the distance to a given goal observation. Based on the learned transition model and heuristic model, we implement a classical planner for images. We show empirically that our approach is more effective than a state-of-the-art approach to learning visual planning models in environments with incomplete observations.
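Once a transition model and a heuristic are available, planning reduces to heuristic search. The abstract does not name the search algorithm Recplan's planner uses, so the Python sketch below uses greedy best-first search as a stand-in. `successors` and `heuristic` are hypothetical callables standing in for the learned transition and heuristic models, and states are plain integers instead of latent image representations.

```python
import heapq
from itertools import count


def greedy_best_first(start, is_goal, successors, heuristic, max_expansions=10_000):
    """Always expand the frontier state with the lowest heuristic value.

    successors(state) returns (action, next_state) pairs; the result is the
    list of actions reaching a goal state, or None if none is found.
    """
    tie_breaker = count()  # keeps heap entries comparable when h values tie
    frontier = [(heuristic(start), next(tie_breaker), start, [])]
    visited = {start}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, plan = heapq.heappop(frontier)
        if is_goal(state):
            return plan
        expansions += 1
        for action, next_state in successors(state):
            if next_state not in visited:
                visited.add(next_state)
                heapq.heappush(
                    frontier,
                    (heuristic(next_state), next(tie_breaker), next_state, plan + [action]),
                )
    return None


# Toy domain: reach 10 from 1 using "inc" (+1) and "double" (*2), capped at 20.
def toy_successors(state):
    moves = (("inc", state + 1), ("double", state * 2))
    return [(action, nxt) for action, nxt in moves if nxt <= 20]


plan = greedy_best_first(1, lambda s: s == 10, toy_successors, lambda s: abs(10 - s))
print(plan)
```

Greedy best-first search is fast but not guaranteed to find the shortest plan; A* (ordering by path cost plus heuristic) would be the usual choice when plan length matters.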

Submitted 25 November, 2022; originally announced November 2022.

Comments: 25 pages, 5 figures

arXiv:2211.14843 [pdf, other]

Learning Object-Language Alignments for Open-Vocabulary Object Detection

Authors: Chuang Lin, Peize Sun, Yi Jiang, ** Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai

Abstract: Existing object detection methods are restricted to a fixed vocabulary by costly labeled data. When dealing with novel categories, the model has to be retrained with more bounding box annotations. Natural language supervision is an attractive alternative because it is annotation-free and covers broader object concepts. However, learning open-vocabulary object detection from language is challenging since image-text pairs do not contain fine-grained object-language alignments. Previous solutions rely on either expensive grounding annotations or distilling classification-oriented vision models. In this paper, we propose a novel open-vocabulary object detection framework that learns directly from image-text pair data. We formulate object-language alignment as a set matching problem between a set of image region features and a set of word embeddings. This enables us to train an open-vocabulary object detector on image-text pairs in a much simpler and more effective way. Extensive experiments on two benchmark datasets, COCO and LVIS, demonstrate our superior performance over competing approaches on novel categories, e.g., achieving 32.0% mAP on COCO and 21.7% mask mAP on LVIS. Code is available at: https://github.com/clin1223/VLDet.
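The set-matching formulation can be illustrated with a small one-to-one assignment problem: given cosine similarities between image region features and word embeddings, choose the matching that maximizes total similarity. The Python sketch below brute-forces every assignment with `itertools.permutations`, which is only feasible for tiny sets; a practical detector would use a polynomial-time solver such as the Hungarian algorithm. The two-dimensional vectors are toy values, not VLDet features, and the exact matching cost used in the paper may differ.

```python
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def match_words_to_regions(regions, words):
    """Exhaustive one-to-one matching that maximizes total cosine similarity.

    Returns (assignment, score), where assignment[i] is the index of the
    region matched to word i. Requires len(words) <= len(regions).
    """
    best_assignment, best_score = None, -math.inf
    for perm in permutations(range(len(regions)), len(words)):
        score = sum(cosine(words[i], regions[r]) for i, r in enumerate(perm))
        if score > best_score:
            best_assignment, best_score = list(perm), score
    return best_assignment, best_score


regions = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy region features
words = [[0.0, 1.0], [1.0, 0.1]]  # toy word embeddings
assignment, score = match_words_to_regions(regions, words)
print(assignment, round(score, 3))
```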

Submitted 27 November, 2022; originally announced November 2022.

Comments: Technical Report

arXiv:2211.14742 [pdf, other]

Dynamic Feature Pruning and Consolidation for Occluded Person Re-Identification

Authors: YuTeng Ye, Hang Zhou, Jiale Cai, Chenxing Gao, Youjia Zhang, Junle Wang, Qiang Hu, Junqing Yu, Wei Yang

Abstract: Occluded person re-identification (ReID) is a challenging problem due to contamination from occluders. Existing approaches address the issue with prior knowledge cues, such as human body key points and semantic segmentations, which easily fail in the presence of heavy occlusion and other humans as occluders. In this paper, we propose a feature pruning and consolidation (FPC) framework to circumvent explicit human structure parsing. The framework mainly consists of a sparse encoder, a multi-view feature matching module, and a feature consolidation decoder. Specifically, the sparse encoder drops less important image tokens, mostly related to background noise and occluders, solely based on correlation within the class token attention. Subsequently, the matching stage relies on the preserved tokens produced by the sparse encoder to identify k-nearest neighbors in the gallery by measuring the combined image- and patch-level similarity. Finally, we use the feature consolidation module to compensate for pruned features using identified neighbors, recovering essential information while disregarding disturbance from noise and occlusion. Experimental results demonstrate the effectiveness of our proposed framework on occluded, partial, and holistic Re-ID datasets. In particular, our method outperforms state-of-the-art results by at least 8.6% mAP and 6.0% Rank-1 accuracy on the challenging Occluded-Duke dataset.
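The sparse encoder's pruning step, keeping the image tokens that receive the most class-token attention, amounts to a top-k selection. The Python sketch below assumes the class-token attention scores are already computed and that a hypothetical `keep_ratio` fixes how many tokens survive; it is not the FPC implementation, and the token names and scores are invented.

```python
def prune_tokens(tokens, cls_attention, keep_ratio):
    """Keep the tokens with the highest class-token attention.

    Returns (kept_indices, kept_tokens) with indices in their original order,
    so the spatial layout of the surviving patches is preserved.
    """
    if len(tokens) != len(cls_attention):
        raise ValueError("need one attention score per token")
    k = max(1, round(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: cls_attention[i], reverse=True)
    kept = sorted(ranked[:k])
    return kept, [tokens[i] for i in kept]


# Invented patch labels and attention scores for a partly occluded pedestrian.
patches = ["background", "head", "torso", "occluder", "legs"]
scores = [0.05, 0.30, 0.35, 0.10, 0.20]
kept_indices, kept_patches = prune_tokens(patches, scores, keep_ratio=0.6)
print(kept_patches)
```

In a transformer the scores would be the class token's attention row (often averaged over heads), and the surviving indices would select rows of the patch-embedding tensor.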

Submitted 20 December, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

Comments: Accepted by AAAI-24

arXiv:2211.12845 [pdf, other]

doi 10.1007/s41095-023-0387-8

Super-resolution Reconstruction of Single Image for Latent features

Authors: Xin Wang, **g-Ke Yan, **g-Ye Cai, Jian-Hua Deng, Qin Qin, Yao Cheng

Abstract: Single-image super-resolution (SISR) typically focuses on restoring various degraded low-resolution (LR) images to a single high-resolution (HR) image. However, during SISR tasks, it is often challenging for models to simultaneously maintain high quality and rapid sampling while preserving diversity in details and texture features. This challenge can lead to issues such as model collapse, lack of rich details and texture features in the reconstructed HR images, and excessive time consumption for model sampling. To address these problems, this paper proposes a Latent Feature-oriented Diffusion Probability Model (LDDPM). First, we designed a conditional encoder capable of effectively encoding LR images, reducing the solution space for model image reconstruction and thereby improving the quality of the reconstructed images. We then employed a normalizing flow and multimodal adversarial training, learning from complex multimodal distributions, to model the denoising distribution. Doing so boosts the generative modeling capabilities within a minimal number of sampling steps. Experimental comparisons of our proposed model with existing SISR methods on mainstream datasets demonstrate that our model reconstructs more realistic HR images and achieves better performance on multiple evaluation metrics, providing a fresh perspective for tackling SISR tasks.

Submitted 9 November, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Journal ref: Computational Visual Media, 2023

arXiv:2211.11144 [pdf]

Coarse-Super-Resolution-Fine Network (CoSF-Net): A Unified End-to-End Neural Network for 4D-MRI with Simultaneous Motion Estimation and Super-Resolution

Authors: Shaohua Zhi, Yinghui Wang, Haonan Xiao, Ti Bai, Hong Ge, Bing Li, Chenyang Liu, Wen Li, Tian Li, **g Cai

Abstract: Four-dimensional magnetic resonance imaging (4D-MRI) is an emerging technique for tumor motion management in image-guided radiation therapy (IGRT). However, current 4D-MRI suffers from low spatial resolution and strong motion artifacts owing to the long acquisition time and patients' respiratory variations; these limitations, if not managed properly, can adversely affect treatment planning and delivery in IGRT. Herein, we developed a novel deep learning framework called the coarse-super-resolution-fine network (CoSF-Net) to achieve simultaneous motion estimation and super-resolution in a unified model. We designed CoSF-Net by fully exploiting the inherent properties of 4D-MRI, taking into account limited and imperfectly matched training datasets. We conducted extensive experiments on multiple real patient datasets to verify the feasibility and robustness of the developed network. Compared with existing networks and three state-of-the-art conventional algorithms, CoSF-Net not only accurately estimated the deformable vector fields between the respiratory phases of 4D-MRI but also simultaneously improved the spatial resolution of 4D-MRI with enhanced anatomic features, yielding 4D-MR images with high spatiotemporal resolution.

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2211.08615 [pdf, other]

GLFF: Global and Local Feature Fusion for AI-synthesized Image Detection

Authors: Yan Ju, Shan Jia, Jialing Cai, Haiying Guan, Siwei Lyu

Abstract: With the rapid development of deep generative models (such as Generative Adversarial Networks and Diffusion models), AI-synthesized images are now of such high quality that humans can hardly distinguish them from pristine ones. Although existing detection methods have shown high performance in specific evaluation settings, e.g., on images from seen models or on images without real-world post-processing, they tend to suffer serious performance degradation in real-world scenarios where testing images can be generated by more powerful generation models or combined with various post-processing operations. To address this issue, we propose a Global and Local Feature Fusion (GLFF) framework that learns rich and discriminative representations for AI-synthesized image detection by combining multi-scale global features from the whole image with refined local features from informative patches. GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features and a local branch that selects informative patches for detailed extraction of local artifacts. Due to the lack of a synthesized image dataset that simulates real-world applications for evaluation, we further create a challenging fake image dataset, named DeepFakeFaceForensics (DF³), which contains 6 state-of-the-art generation models and a variety of post-processing techniques to approach real-world scenarios. Experimental results demonstrate the superiority of our method over state-of-the-art methods on the proposed DF³ dataset and three other open-source datasets.

Submitted 4 September, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: 13 pages, 6 figures, 8 tables

arXiv:2211.08144 [pdf, other]

Monocular BEV Perception of Road Scenes via Front-to-Top View Projection

Authors: Wenxi Liu, Qi Li, Weixiang Yang, Jiaxin Cai, Yuanlong Yu, Yuexin Ma, Shengfeng He, Jia Pan

Abstract: HD map reconstruction is crucial for autonomous driving. LiDAR-based methods are limited by expensive sensors and time-consuming computation. Camera-based methods usually need to perform road segmentation and view transformation separately, which often causes distortion and missing content. To push the limits of the technology, we present a novel framework that reconstructs a local map formed by road layout and vehicle occupancy in the bird's-eye view given only a front-view monocular image. We propose a front-to-top view projection (FTVP) module, which takes the constraint of cycle consistency between views into account and makes full use of their correlation to strengthen the view transformation and scene understanding. In addition, we apply multi-scale FTVP modules to propagate the rich spatial information of low-level features and mitigate spatial deviation in the predicted object locations. Experiments on public benchmarks show that our method achieves state-of-the-art performance in road layout estimation, vehicle occupancy estimation, and multi-class semantic estimation. For multi-class semantic estimation in particular, our model outperforms all competitors by a large margin. Furthermore, our model runs at 25 FPS on a single GPU, making it efficient and applicable to real-time panorama HD map reconstruction.

Submitted 15 November, 2022; originally announced November 2022.

Comments: Extension to CVPR'21 paper "Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-View Transformation"

arXiv:2211.06627 [pdf, other]

MARLIN: Masked Autoencoder for facial video Representation LearnINg

Authors: Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

Abstract: This paper proposes a self-supervised approach to learn universal facial representations from videos that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder that learns highly robust and generic facial embeddings from abundantly available, non-annotated, web-crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from densely masked facial regions, which mainly include the eyes, nose, mouth, lips, and skin, to capture local and global aspects that in turn help encode generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate that MARLIN is an excellent facial video encoder and feature extractor that performs consistently well across a variety of downstream tasks, including FAR (1.13% gain over the supervised benchmark), FER (2.64% gain over the unsupervised benchmark), DFD (1.86% gain over the unsupervised benchmark), and LS (29.36% gain in Fréchet Inception Distance), and even in the low-data regime. Our code and models are available at https://github.com/ControlNet/MARLIN.

Submitted 22 March, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: CVPR 2023

arXiv:2211.05783 [pdf, other]

Unifying Flow, Stereo and Depth Estimation

Authors: Haofei Xu, **g Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, Dacheng Tao, Andreas Geiger

Abstract: We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching, and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. Our unified model outperforms RAFT on the challenging Sintel dataset, and our final model, which uses a few additional task-specific refinement steps, outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo, and depth datasets, while being simpler and more efficient in terms of model design and inference speed.
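The abstract frames flow, stereo, and depth as dense correspondence matching solved by comparing feature similarities. The sketch below illustrates that matching idea only. It assumes 1D feature sequences and plain Python, and it is not the authors' Transformer-based implementation.

Each source feature is correlated with every target feature. A softmax turns the correlations into a matching distribution. The expected target position minus the source position gives the displacement. The helper names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def global_matching_flow_1d(src_feats, tgt_feats):
    """Estimate a 1D displacement for each source position (illustrative sketch).

    src_feats, tgt_feats: lists of feature vectors, one vector per position.
    Correlation is a dot product scaled by 1/sqrt(dim); the flow is the
    softmax-weighted mean target position minus the source position.
    """
    scale = 1.0 / math.sqrt(len(src_feats[0]))
    flow = []
    for i, f in enumerate(src_feats):
        scores = [scale * sum(a * b for a, b in zip(f, g)) for g in tgt_feats]
        probs = softmax(scores)
        expected_pos = sum(p * j for j, p in enumerate(probs))
        flow.append(expected_pos - i)
    return flow

def one_hot(k, dim=5, magnitude=10.0):
    """Hypothetical toy feature: a scaled one-hot vector."""
    return [magnitude if d == k else 0.0 for d in range(dim)]

# Toy example: the target sequence is the source shifted right by one position.
src = [one_hot(k) for k in range(4)]
tgt = [one_hot(4)] + [one_hot(k) for k in range(4)]
flow = global_matching_flow_1d(src, tgt)
print([round(v, 3) for v in flow])  # → [1.0, 1.0, 1.0, 1.0]
```

Because the softmax output is a distribution, the same correlation can yield a flow estimate (as here) or a disparity estimate along a scanline. This mirrors, in miniature, why one matching formulation can serve several tasks.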

Submitted 26 July, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

Comments: TPAMI 2023, Project Page: https://haofeixu.github.io/unimatch, Code: https://github.com/autonomousvision/unimatch, Demo: https://huggingface.co/spaces/haofeixu/unimatch

arXiv:2211.02916 [pdf, other]

Electronic properties of monolayer copper selenide with one-dimensional moiré patterns

Authors: Gefei Niu, Jianchen Lu, Jianqun Geng, Shicheng Li, Hui Zhang, Wei Xiong, Zilin Ruan, Yong Zhang, Boyu Fu, Lei Gao, **ming Cai

Abstract: Strain engineering is a vital way to manipulate the electronic properties of two-dimensional (2D) materials. As a typical representative of transition metal mono-chalcogenides (TMMs), a honeycomb CuSe monolayer features one-dimensional (1D) moiré patterns owing to the uniaxial strain along one of the three equivalent orientations of Cu(111) substrates. Here, by combining low-temperature scanning tunneling microscopy/spectroscopy (STM/S) experiments and density functional theory (DFT) calculations, we systematically investigate the electronic properties of the strained CuSe monolayer on the Cu(111) substrate. Our results show that the CuSe monolayer is semiconducting, with a band gap of 1.28 eV, and that the 1D moiré patterns periodically modulate its electronic properties in one dimension. In addition to the uniaxially strained CuSe monolayer, we observed domain boundaries and line defects in the CuSe monolayer, where biaxial-strain and strain-free conditions, respectively, can be investigated. STS measurements for the three different strain regions show that the first peak in the conduction band moves downward with increasing strain. DFT calculations based on three CuSe atomic models with different strains reproduced this peak movement. The present findings not only enrich the fundamental understanding of how strain influences electronic properties at the 2D limit, but also offer a benchmark for the development of 2D semiconductor materials.

Submitted 5 November, 2022; originally announced November 2022.

Comments: 14 pages, 12 figures, 25 references

arXiv:2211.01547 [pdf, other]

A Systematic Paradigm for Detecting, Surfacing, and Characterizing Heterogeneous Treatment Effects (HTE)

Authors: John Cai, Weinan Wang

Abstract: To effectively optimize and personalize treatments, it is necessary to investigate the heterogeneity of treatment effects. With a wide range of users being treated across many online controlled experiments, the typical approach of manually investigating each dimension of heterogeneity becomes overly cumbersome and prone to subjective human biases. We need an efficient way to search through thousands of experiments with hundreds of target covariates and hundreds of breakdown dimensions. In this paper, we propose a systematic paradigm for detecting, surfacing, and characterizing heterogeneous treatment effects. First, we detect whether treatment effect variation is present in an experiment, prior to specifying any breakdowns. Second, we surface the most relevant dimensions for heterogeneity. Finally, we characterize the heterogeneity beyond just the conditional average treatment effects (CATE) by studying the conditional distributions of the estimated individual treatment effects. We show the effectiveness of our methods using simulated data and empirical studies.
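The abstract's third step goes beyond conditional average treatment effects (CATE). As a baseline for what a CATE breakdown is, the sketch below estimates the CATE for each segment as the treated-minus-control difference in mean outcomes. It assumes a randomized experiment with a binary treatment flag and one categorical breakdown dimension. The record layout, field names, and function name are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean

def cate_by_segment(records, dimension):
    """Difference-in-means CATE estimate per value of one breakdown dimension.

    records: iterable of dicts with keys 'treated' (bool), 'outcome' (float),
    and the breakdown dimension (e.g. 'platform'). Segments that lack either
    the treated or the control arm are skipped, since the difference is undefined.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for r in records:
        groups[r[dimension]][r["treated"]].append(r["outcome"])
    estimates = {}
    for segment, arms in groups.items():
        if arms[True] and arms[False]:
            estimates[segment] = mean(arms[True]) - mean(arms[False])
    return estimates

# Hypothetical toy data, not from the paper.
records = [
    {"platform": "ios", "treated": True, "outcome": 5.0},
    {"platform": "ios", "treated": False, "outcome": 3.0},
    {"platform": "android", "treated": True, "outcome": 4.0},
    {"platform": "android", "treated": False, "outcome": 4.0},
    {"platform": "web", "treated": True, "outcome": 2.0},
]
print(cate_by_segment(records, "platform"))  # → {'ios': 2.0, 'android': 0.0}
```

A gap between segment estimates, such as 2.0 versus 0.0 here, is the kind of heterogeneity the paradigm aims to surface automatically. The paper additionally examines the distribution of estimated individual effects rather than stopping at these means.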

Submitted 2 November, 2022; originally announced November 2022.

Comments: 6 pages, 6 figures

Journal ref: 2022 Conference on Digital Experimentation

arXiv:2210.17438 [pdf]

doi 10.1021/acs.nanolett.2c03701

Photo-accelerated water dissociation across one-atom-thick electrodes

Authors: J. Cai, E. Griffin, V. Guarochico-Moreira, D. Barry, B. Xin, S. Huang, A. K. Geim, F. M. Peeters, M. Lozada-Hidalgo

Abstract: Recent experiments demonstrated that interfacial water dissociation (H2O = H+ + OH-) could be accelerated exponentially by an electric field applied to graphene electrodes, a phenomenon related to the Wien effect. Here we report an order-of-magnitude acceleration of the interfacial water dissociation reaction under visible-light illumination. This process is accompanied by spatial separation of protons and hydroxide ions across one-atom-thick graphene and is enhanced by strong interfacial electric fields. The observed photo-effect is attributed to the combination of graphene's perfect selectivity with respect to protons, which prevents proton-hydroxide recombination, and proton transport acceleration by the Wien effect, which occurs in synchrony with the water dissociation reaction. Our findings provide fundamental insights into ion dynamics near atomically thin proton-selective interfaces and suggest that strong interfacial fields can enhance and tune very fast ionic processes, which is relevant for applications in photocatalysis and for designing reconfigurable materials.

Submitted 31 October, 2022; originally announced October 2022.

Journal ref: Nano Letters (2022)

arXiv:2210.16514 [pdf]

doi 10.1097/RTI.0000000000000717

Extracting lung function-correlated information from CT-encoded static textures

Authors: Yu-Hua Huang, Xinzhi Teng, Jiang Zhang, Zhi Chen, Zongrui Ma, Ge Ren, Feng-Ming Kong, **g Cai

Abstract: The inherent characteristics of lung tissues, which are independent of the breathing manoeuvre, may provide fundamental information on lung function. This paper studies function-correlated lung textures and their spatial distribution from CT. Data from 21 lung cancer patients with thoracic 4DCT scans, DTPA-SPECT ventilation images (V), and available pulmonary function test (PFT) measurements were collected. A total of 79 radiomic features were included for analysis. A sparse-to-fine strategy, comprising subregional feature discovery and a voxel-wise feature distribution study, was carried out to identify the function-correlated radiomic features. At the subregion level, lung CT images were partitioned and labeled as defected/non-defected patches according to reference V. At the voxel-wise level, feature maps (FMs) of selected feature candidates were generated for each 4DCT phase. The results were validated with several quantitative measures:
- Spearman correlation coefficient (SCC) and Dice similarity coefficient (DSC) for FM-V spatial agreement assessments;
- intraclass correlation coefficient (ICC) for FM robustness evaluations;
- FM-PFT comparisons.

At the subregion level, eight function-correlated features were identified with medium-to-large statistical strength (effect size > 0.330) for differentiating defected from non-defected lung regions. At the voxel-wise level, FMs of the candidates yielded moderate-to-strong voxel-wise correlations with reference V.
Among them, FMs of GLDM Dependence Non-uniformity showed the most robust (ICC = 0.96) spatial correlation, with median SCCs ranging from 0.54 to 0.59 across the ten phases. Its phase-averaged FM achieved a median SCC of 0.60, median DSCs of 0.60/0.65 for high/low functional lung volumes, respectively, and a correlation of 0.646 between the spatially averaged feature values and PFT measurements.
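The validation above relies on two agreement metrics: the Spearman correlation coefficient (SCC) and the Dice similarity coefficient (DSC). The sketch below implements both in plain Python. It assumes voxels have been flattened into equal-length lists, and that DSC is computed on binary masks (for example, thresholded high- or low-function regions). The Spearman version uses average ranks for ties. The function names and input layout are illustrative assumptions, not the paper's pipeline.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for equal-length binary masks.

    Returns 1.0 when both masks are empty, treating two empty masks as identical.
    """
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 2.0 * inter / size if size else 1.0

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

print(dice([1, 1, 0, 0], [1, 0, 1, 0]))  # → 0.5
print(round(spearman([1, 2, 3, 4], [10, 20, 30, 40]), 6))  # → 1.0
```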

Submitted 29 October, 2022; originally announced October 2022.

Comments: 6 figures, 4 tables

arXiv:2210.15201 [pdf, other]

Multi-view Contrastive Learning with Additive Margin for Adaptive Nasopharyngeal Carcinoma Radiotherapy Prediction

Authors: Jiabao Sheng, Yuanpeng Zhang, **g Cai, Sai-Kit Lam, Zhe Li, Jiang Zhang, Xinzhi Teng

Abstract: The prediction of adaptive radiation therapy (ART) prior to radiation therapy (RT) for nasopharyngeal carcinoma (NPC) patients is important to reduce toxicity and prolong patient survival. Currently, due to the complex tumor micro-environment, a single type of high-resolution image can provide only limited information. Meanwhile, the traditional softmax-based loss is insufficient for quantifying the discriminative power of a model. To overcome these challenges, we propose a supervised multi-view contrastive learning method with an additive margin (MMCon). For each patient, four medical images are considered to form multi-view positive pairs, which can provide additional information and enhance the representation of medical images. In addition, the embedding space is learned by means of contrastive learning: NPC samples from the same patient or with similar labels remain close in the embedding space, while NPC samples with different labels are pushed far apart. To improve the discriminative ability of the loss function, we incorporate a margin into the contrastive learning. Experimental results show that this new learning objective can find an embedding space that exhibits superior discrimination ability for NPC images.
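The abstract adds a margin to a supervised contrastive objective but does not give the exact formula. The sketch below shows one common way to do this, and it is an assumption rather than MMCon's published loss: subtract an additive margin from each positive-pair cosine similarity before the temperature-scaled softmax. The function names and the `margin` and `temperature` defaults are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def margin_contrastive_loss(embeddings, labels, margin=0.2, temperature=0.1):
    """Supervised contrastive loss with an additive margin on positive pairs (sketch).

    For each anchor i and positive p (same label, p != i), the positive logit is
    (cos(i, p) - margin) / temperature; the softmax denominator also includes
    cos(i, a) / temperature for every other sample a (a != i, a != p).
    The loss is the mean of -log softmax(positive logit) over all positive pairs.
    """
    n = len(embeddings)
    total, pairs = 0.0, 0
    for i in range(n):
        for p in range(n):
            if p == i or labels[p] != labels[i]:
                continue
            pos = (cosine(embeddings[i], embeddings[p]) - margin) / temperature
            logits = [pos] + [cosine(embeddings[i], embeddings[a]) / temperature
                              for a in range(n) if a not in (i, p)]
            peak = max(logits)  # log-sum-exp with max subtraction for stability
            log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
            total += log_denom - pos
            pairs += 1
    return total / pairs if pairs else 0.0

# Toy embeddings: two tight clusters.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = margin_contrastive_loss(emb, [0, 0, 1, 1])     # labels match clusters
mismatched = margin_contrastive_loss(emb, [0, 1, 0, 1])  # labels cut across clusters
print(aligned < mismatched)  # → True
```

Raising `margin` lowers every positive logit, so the loss only becomes small when positives beat the other samples by at least that margin in cosine similarity. That is the extra discriminative pressure the abstract attributes to the margin.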

Submitted 27 October, 2022; originally announced October 2022.

Comments: submitted to ICASSP 2023, 5 pages

arXiv:2210.14976 [pdf]

Dynamic Hardness Evolution in Metals from Impact Induced Gradient Dislocation Density

Authors: Jizhe Cai, Claire Griesbach, Savannah G. Ahnen, Ramathasan Thevamaran

Abstract: A clear understanding of the dynamic behavior of metals is critical for developing superior structural materials as well as for improving material processing techniques such as cold spray and shot peening. Using high-velocity micro-projectile impact testing (120 m/s to 700 m/s; strain rates > 10^7 1/s) and quasistatic nanoindentation (strain rate: 10^-2 1/s), we investigate the strain-rate-dependent mechanical behavior of single-crystal aluminum substrates with (001), (011), and (111) crystal orientations. For all three crystal orientations, the dynamic hardness initially increases with increasing impact velocity and then reaches a plateau at a hardness 5 times higher than that measured in quasistatic indentation. Based on coefficient-of-restitution and post-mortem transmission Kikuchi diffraction analyses, we show that distinct plastic deformation mechanisms with a gradient dislocation density evolution govern the dynamic behavior. We also discover a distinct deformation regime, the stable plastic regime, that emerges beyond the deeply plastic regime, with a unique strain-rate-insensitive microstructure evolution and dynamic hardness. Our work additionally demonstrates an effective approach to introducing a strong spatial gradient in dislocation density in metals by high-velocity projectile impacts to enhance surface mechanical properties, which can be employed in material processing techniques such as shot peening and surface mechanical attrition treatment.

Submitted 21 February, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.14628 [pdf, ps, other]

doi 10.1088/1361-6420/acd8b8

Provable Sample-Efficient Sparse Phase Retrieval Initialized by Truncated Power Method

Authors: Jian-Feng Cai, **gyang Li, Juntao You

Abstract: We study the sparse phase retrieval problem: recovering an $s$-sparse length-$n$ signal from $m$ magnitude-only measurements. Two-stage non-convex approaches have drawn much attention in recent studies of this problem. Despite non-convexity, many two-stage algorithms provably converge linearly to the underlying solution when appropriately initialized. However, in terms of sample complexity, the bottleneck of those algorithms often comes from the initialization stage. Although the refinement stage usually needs only $m=\Omega(s\log n)$ measurements, the widely used spectral initialization requires $m=\Omega(s^2\log n)$ measurements to produce a desired initial guess, which makes the total sample complexity larger in order than necessary. To reduce the number of measurements, we propose a truncated power method to replace the spectral initialization for non-convex sparse phase retrieval algorithms. We prove that $m=\Omega(\bar{s} s\log n)$ measurements, where $\bar{s}$ is the stable sparsity of the underlying signal, are sufficient to produce a desired initial guess. When the underlying signal contains only very few significant components, the sample complexity of the proposed algorithm is $m=\Omega(s\log n)$, which is optimal. Numerical experiments illustrate that the proposed method is more sample-efficient than state-of-the-art algorithms.
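The sketch below shows a generic truncated power method, the technique the abstract proposes in place of spectral initialization. Each iteration multiplies by a symmetric matrix $Y$, keeps the $s$ largest-magnitude entries, and renormalizes. It assumes $Y$ is already given. How the paper builds $Y$ from the magnitude measurements, and its exact initial vector, are not reproduced here. The function names and the toy matrix are hypothetical.

```python
import math

def truncate_top_s(v, s):
    """Zero all but the s entries of v with the largest magnitudes."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]

def normalize(v):
    """Scale v to unit Euclidean norm (assumes v is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def truncated_power_method(Y, s, x0, iters=50):
    """Approximate an s-sparse leading eigenvector of the symmetric matrix Y.

    Each iteration applies Y, keeps only the s largest-magnitude coordinates,
    and renormalizes to unit length.
    """
    x = normalize(truncate_top_s(x0, s))
    for _ in range(iters):
        y = [sum(Y[r][c] * x[c] for c in range(len(x))) for r in range(len(Y))]
        x = normalize(truncate_top_s(y, s))
    return x

# Toy matrix Y = v v^T + 0.1 I with sparse leading eigenvector v = (0.6, 0.8, 0, 0).
Y = [[0.46, 0.48, 0.0, 0.0],
     [0.48, 0.74, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.0],
     [0.0, 0.0, 0.0, 0.1]]
x = truncated_power_method(Y, 2, [0.5, 0.2, 0.45, 0.1])
print([round(c, 4) for c in x])  # → [0.6, 0.8, 0.0, 0.0]
```

In this toy run, the first truncation keeps the wrong support (coordinates 0 and 2). A single multiplication by $Y$ already moves the mass onto the true support (coordinates 0 and 1). Keeping each iterate $s$-sparse is what lets this style of initialization work with fewer measurements than a dense spectral estimate.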

Submitted 27 April, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.12697 [pdf, ps, other]

doi 10.1088/1367-2630/acc608

Tetragonal Mexican-Hat Dispersion and Switchable Half-Metal State with Multiple Anisotropic Weyl Fermions in Penta-Graphene

Authors: Ningning Jia, Yongting Shi, Zhiheng Lv, Junting Qin, Jiangtao Cai, Xue Jiang, Jijun Zhao, Zhifeng Liu

Abstract: In past decades, the ever-expanding library of 2D carbon allotropes has yielded a broad range of exotic properties for future carbon-based electronics. However, the known allotropes are all intrinsically nonmagnetic owing to their paired valence-electron configuration. Based on the reported 2D carbon structure database and first-principles calculations, herein we demonstrate that inherent ferromagnetism can be obtained in the prominent allotrope penta-graphene, which has a unique Mexican-hat valence band edge, giving rise to van Hove singularities and electronic instability. Induced by modest hole doping, achievable by electrolyte gating, the semiconducting penta-graphene can transform into different ferromagnetic half-metals with room-temperature stability and switchable spin directions. In particular, multiple anisotropic Weyl states, including type-I and type-II Weyl cones and a hybrid quasi-Weyl nodal loop, can be found in a sizable energy window of the spin-down half-metal under proper strains. These findings not only identify a promising carbon allotrope for obtaining inherent magnetism in carbon-based spintronic devices, but also highlight the possibility of realizing different Weyl states by combining electronic and mechanical means.

Submitted 23 October, 2022; originally announced October 2022.

arXiv:2210.11940 [pdf, other]

JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking

Authors: Edward Vendrow, Duy Tho Le, Jianfei Cai, Hamid Rezatofighi

Abstract: Autonomous robotic systems operating in human environments must understand their surroundings to make accurate and safe decisions. In crowded human scenes with close-up human-robot interaction and robot navigation, a deep understanding requires reasoning about human motion and body dynamics over time with human body pose estimation and tracking. However, existing datasets either do not provide pose annotations or include scene types unrelated to robotic applications. Many datasets also lack the diversity of poses and occlusions found in crowded human scenes. To address this limitation, we introduce JRDB-Pose, a large-scale dataset and benchmark for multi-person pose estimation and tracking using videos captured from a social navigation robot. The dataset contains challenging scenes with crowded indoor and outdoor locations and a diverse range of scales and occlusion types. JRDB-Pose provides human pose annotations with per-keypoint occlusion labels and track IDs consistent across the scene. A public evaluation server is available for fair evaluation on a held-out test set. JRDB-Pose is available at https://jrdb.erc.monash.edu/.

Submitted 11 March, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

Comments: 13 pages, 11 figures

arXiv:2210.07932 [pdf, other]

Neural Routing in Meta Learning

Authors: Jicang Cai, Saeed Vahidian, Weijia Wang, Mohsen Joneidi, Bill Lin

Abstract: Meta-learning, often referred to as learning-to-learn, is a promising notion that mimics human learning by exploiting the knowledge of prior tasks while adapting quickly to novel tasks. A plethora of models has emerged in this context, improving learning efficiency, robustness, and more. The question that arises is whether we can emulate other aspects of human learning and incorporate them into existing meta-learning algorithms. Inspired by the widely recognized finding in neuroscience that distinct parts of the brain are highly specialized for different types of tasks, we aim to improve the performance of current meta-learning algorithms by selectively using only parts of the model, conditioned on the input tasks. In this work, we describe an approach that investigates task-dependent dynamic neuron selection in deep convolutional neural networks (CNNs) by leveraging the scaling factor in the batch normalization (BN) layer associated with each convolutional layer. The problem is intriguing because helping different parts of the model learn from different types of tasks may help us train better filters in CNNs and improve generalization performance. We find that the proposed approach, neural routing in meta learning (NRML), outperforms one of the well-known existing meta-learning baselines on few-shot classification tasks on the most widely used benchmark datasets.
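The abstract says NRML selects neurons by leveraging the batch normalization scaling factor, but it does not spell out the selection rule. The sketch below shows one plausible rule, which is an assumption and not the paper's method: keep the channels whose scaling factor $\gamma$ has the largest magnitude, and zero out the rest. In NRML the selection is task-dependent, so the $\gamma$ values here stand in for whatever task-conditioned factors the method produces. The `keep_ratio` parameter, the function names, and the values are hypothetical.

```python
def channel_mask_from_bn(gammas, keep_ratio=0.5):
    """Return a 0/1 mask keeping the channels with the largest |gamma|."""
    k = max(1, round(keep_ratio * len(gammas)))
    ranked = sorted(range(len(gammas)), key=lambda c: abs(gammas[c]), reverse=True)
    keep = set(ranked[:k])
    return [1 if c in keep else 0 for c in range(len(gammas))]

def apply_mask(activations, mask):
    """Zero out masked channels; activations is a list of per-channel value lists."""
    return [[v * m for v in channel] for channel, m in zip(activations, mask)]

gammas = [0.9, -0.05, 0.4, 1.2]  # made-up per-channel BN scaling factors
mask = channel_mask_from_bn(gammas, keep_ratio=0.5)
print(mask)  # → [1, 0, 0, 1]
```

Ranking by $|\gamma|$ is a common heuristic because a BN channel with near-zero scale contributes little to the layer output. Whether NRML uses a hard top-$k$ cut like this or a softer gating is not stated in the abstract.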

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2210.07862 [pdf, other]

Unsupervised Dense Nuclei Detection and Segmentation with Prior Self-activation Map For Histology Images

Authors: **yi Chen, Chenglu Zhu, Zhongyi Shui, Jiatong Cai, Sunyi Zheng, Shichuan Zhang, Lin Yang

Abstract: The success of supervised deep learning models in medical image segmentation relies on detailed annotations. However, labor-intensive manual labeling is costly and inefficient, especially in dense object segmentation. To this end, we propose a self-supervised learning-based approach with a Prior Self-activation Module (PSM) that generates self-activation maps from the input images to avoid labeling costs and further produces pseudo masks for the downstream task. Specifically, we first train a neural network using self-supervised learning and utilize the gradient information in the shallow layers of the network to generate self-activation maps. Afterwards, a semantic-guided generator is introduced as a pipeline to transform visual representations from the PSM into pixel-level semantic pseudo masks for downstream tasks. Furthermore, a two-stage training module, consisting of a nuclei detection network and a nuclei segmentation network, is adopted to achieve the final segmentation. Experimental results on two public pathological datasets show the effectiveness of the method. Compared with other fully supervised and weakly supervised methods, our method achieves competitive performance without any manual annotations.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2210.04717 [pdf, other]

Quantum state tomography via non-convex Riemannian gradient descent

Authors: Ming-Chien Hsu, En-Jui Kuo, Wei-Hsuan Yu, Jian-Feng Cai, Min-Hsiu Hsieh

Abstract: The recovery of an unknown density matrix of large size requires huge computational resources. The recent Factored Gradient Descent (FGD) algorithm and its variants achieved state-of-the-art performance since they could mitigate the dimensionality barrier by utilizing some of the underlying structures of the density matrix. Despite their theoretical guarantee of a linear convergence rate, convergence in practical scenarios is still slow because the contracting factor of the FGD algorithms depends on the condition number $\kappa$ of the ground truth state. Consequently, the total number of iterations can be as large as $O(\sqrt{\kappa}\ln(\frac{1}{\varepsilon}))$ to achieve the estimation error $\varepsilon$. In this work, we derive a quantum state tomography scheme that improves the dependence on $\kappa$ to the logarithmic scale; namely, our algorithm could achieve the approximation error $\varepsilon$ in $O(\ln(\frac{1}{\kappa\varepsilon}))$ steps. The improvement comes from the application of non-convex Riemannian gradient descent (RGD). The contracting factor in our approach is thus a universal constant that is independent of the given state. Our theoretical results of extremely fast convergence and nearly optimal error bounds are corroborated by numerical results.
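The contrast between a $\kappa$-dependent and a constant contracting factor follows from a standard linear-convergence argument. The derivation below assumes the FGD contraction factor has the form $\rho = 1 - c/\sqrt{\kappa}$ for some constant $c > 0$. That form is consistent with, but not stated in, the abstract's $O(\sqrt{\kappa}\ln(1/\varepsilon))$ bound. The derivation does not reproduce the paper's sharper RGD bound.

```latex
\begin{align*}
\|X_t - X_\star\| \le \rho^{t}\,\|X_0 - X_\star\| \le \varepsilon
  \quad &\Longleftarrow \quad
  t \ge \frac{\ln\!\left(\|X_0 - X_\star\| / \varepsilon\right)}{\ln(1/\rho)}, \\
\rho = 1 - \frac{c}{\sqrt{\kappa}}
  \;\Rightarrow\; \ln(1/\rho) = -\ln\!\left(1 - \frac{c}{\sqrt{\kappa}}\right) \ge \frac{c}{\sqrt{\kappa}}
  \;&\Rightarrow\; t = O\!\left(\sqrt{\kappa}\,\ln(1/\varepsilon)\right), \\
\rho \ \text{a universal constant independent of } \kappa
  \;&\Rightarrow\; t = O\!\left(\ln(1/\varepsilon)\right).
\end{align*}
```

The middle line uses $-\ln(1-x) \ge x$ for $0 < x < 1$. A contraction factor that does not depend on the state therefore removes the $\sqrt{\kappa}$ multiplier from the iteration count.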

Submitted 10 October, 2022; originally announced October 2022.

Comments: Comments are welcome!
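For the tomography entry above, the role of the contraction factor can be made concrete. Under linear convergence with factor $\rho < 1$, the error after $k$ steps is at most $\rho^k$ times the initial error, so reaching relative error $\varepsilon$ takes about $\ln(1/\varepsilon)/\ln(1/\rho)$ iterations. The minimal sketch below assumes a hypothetical FGD-like factor $\rho = 1 - c/\sqrt{\kappa}$ and a hypothetical state-independent constant for the RGD-like case; neither value is taken from the paper, and the sketch does not reproduce its exact $O(\ln(\frac{1}{\kappa\varepsilon}))$ bound.

```python
import math


def iterations_for_linear_rate(rho: float, eps: float) -> int:
    """Smallest k with rho**k <= eps, i.e. error_k <= eps * error_0."""
    if not 0.0 < rho < 1.0:
        raise ValueError("contraction factor must satisfy 0 < rho < 1")
    if not 0.0 < eps < 1.0:
        raise ValueError("target relative error must satisfy 0 < eps < 1")
    return math.ceil(math.log(eps) / math.log(rho))


def fgd_like_rho(kappa: float, c: float = 0.5) -> float:
    """Hypothetical kappa-dependent contraction factor 1 - c/sqrt(kappa) (illustrative only)."""
    return 1.0 - c / math.sqrt(kappa)


# Hypothetical state-independent contraction factor for the RGD-like case.
RGD_LIKE_RHO = 0.5

if __name__ == "__main__":
    for kappa in (10.0, 1e3, 1e5):
        k_fgd = iterations_for_linear_rate(fgd_like_rho(kappa), 1e-6)
        k_rgd = iterations_for_linear_rate(RGD_LIKE_RHO, 1e-6)
        print(f"kappa={kappa:g}: FGD-like {k_fgd} iterations, RGD-like {k_rgd} iterations")
```

Because $\ln(1/\rho) \approx c/\sqrt{\kappa}$ for the FGD-like factor, its iteration count grows roughly like $\sqrt{\kappa}$, while the constant-factor count does not change with $\kappa$ at all.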

arXiv:2210.04136 [pdf]

doi 10.1038/s41467-023-36488-y

Creation of Chiral Interface Channels for Quantized Transport in Magnetic Topological Insulator Multilayer Heterostructures

Authors: Yi-Fan Zhao, Ruoxi Zhang, Jiaqi Cai, Deyi Zhuo, Ling-Jie Zhou, Zi-Jie Yan, Moses H. W. Chan, Xiaodong Xu, Cui-Zu Chang

Abstract: One-dimensional (1D) topologically protected states are usually formed at the interface between two-dimensional (2D) materials with different topological invariants. Therefore, 1D chiral interface channels (CICs) can be created at the boundary of two quantum anomalous Hall (QAH) insulators with different Chern numbers. Such a QAH junction can function as a chiral edge current distributor at zero magnetic field, but its realization remains challenging. Here, by employing an in-situ mechanical mask, we use molecular beam epitaxy (MBE) to synthesize QAH insulator junctions, in which two QAH insulators with different Chern numbers are connected along a 1D junction. For the junction between C = 1 and C = -1 QAH insulators, we observe quantized transport and demonstrate the appearance of two parallel-propagating CICs along the magnetic domain wall at zero magnetic field. Moreover, since the Chern number of the QAH insulators in magnetic topological insulator (TI)/TI multilayers can be tuned by altering the number of magnetic TI/TI bilayer periods, a junction between two QAH insulators with arbitrary Chern numbers can be achieved by growing different numbers of magnetic TI/TI periods on the two sides of the sample. For the junction between C = 1 and C = 2 QAH insulators, our quantized transport measurements show that a single CIC appears at the interface. Our work lays the foundation for the development of QAH insulator-based electronic and spintronic devices, topological chiral networks, and topological quantum computation.

Submitted 8 October, 2022; originally announced October 2022.

Comments: 20 pages, 4 figures, comments are welcome

Journal ref: Nature Commun. 14, 770 (2023)
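The channel counts in the QAH junction entry above follow the bulk-boundary correspondence: at an interface between QAH regions with Chern numbers $C_L$ and $C_R$, the number of chiral interface channels is $|C_L - C_R|$, and in the ideal Landauer picture each dissipationless chiral channel carries one conductance quantum $e^2/h$. A minimal sketch of that bookkeeping, using the exact SI constants; the helper names are hypothetical and contact geometry is ignored:

```python
# Exact SI values (2019 redefinition of the SI base units).
ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb
PLANCK_CONSTANT = 6.62607015e-34     # joule second
CONDUCTANCE_QUANTUM = ELEMENTARY_CHARGE ** 2 / PLANCK_CONSTANT  # e^2/h in siemens


def chiral_interface_channels(c_left: int, c_right: int) -> int:
    """Number of 1D chiral channels at a junction of two QAH insulators."""
    return abs(c_left - c_right)


def ideal_interface_conductance(c_left: int, c_right: int) -> float:
    """Conductance (S) carried by the interface channels, one e^2/h per channel."""
    return chiral_interface_channels(c_left, c_right) * CONDUCTANCE_QUANTUM


if __name__ == "__main__":
    for cl, cr in [(1, -1), (1, 2)]:
        n = chiral_interface_channels(cl, cr)
        print(f"C={cl} | C={cr}: {n} CIC(s), {ideal_interface_conductance(cl, cr):.4e} S")
```

The pair (1, -1) gives the two channels and (1, 2) the single channel described in the abstract; real multi-terminal resistances also depend on how the contacts are arranged, which this sketch does not model.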

arXiv:2210.03222 [pdf, other]

doi 10.1103/PhysRevLett.129.132701

Deep underground laboratory measurement of $^{13}$C($α$,$n$)$^{16}$O in the Gamow windows of the $s$- and $i$-processes

Authors: B. Gao, T. Y. Jiao, Y. T. Li, H. Chen, W. P. Lin, Z. An, L. H. Ru, Z. C. Zhang, X. D. Tang, X. Y. Wang, N. T. Zhang, X. Fang, D. H. Xie, Y. H. Fan, L. Ma, X. Zhang, F. Bai, P. Wang, Y. X. Fan, G. Liu, H. X. Huang, Q. Wu, Y. B. Zhu, J. L. Chai, J. Q. Li , et al. (50 additional authors not shown)

Abstract: The $^{13}$C($α$,$n$)$^{16}$O reaction is the main neutron source for the slow-neutron-capture (s-) process in Asymptotic Giant Branch stars and for the intermediate (i-) process. Direct measurements at astrophysical energies in above-ground laboratories are hindered by the extremely small cross sections and the large cosmic-ray-induced background. We performed the first consistent direct measurement in the range $E_{\rm c.m.} = 0.24$ MeV to $1.9$ MeV using the accelerators at the China Jinping Underground Laboratory (CJPL) and Sichuan University. Our measurement covers almost the entire i-process Gamow window, in which the large uncertainty of previous experiments has been reduced from 60\% to 15\%; it eliminates the large systematic uncertainty in the extrapolation arising from the inconsistency of existing data sets, and provides a more reliable reaction rate for studies of the s- and i-processes, along with the first direct determination of the alpha strength of the near-threshold state.

Submitted 6 October, 2022; originally announced October 2022.

Journal ref: Physical Review Letters 129, 132701 (2022)
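The "Gamow window" in the $^{13}$C($α$,$n$)$^{16}$O entry above is the narrow energy range where a non-resonant charged-particle reaction actually proceeds in a star. A minimal sketch using the standard textbook approximations for the Gamow peak $E_0$ and its $1/e$ width $\Delta$; the reduced mass is approximated from mass numbers, and the two stellar temperatures in the demo are illustrative assumptions, not values from the paper:

```python
from typing import Tuple


def gamow_window(z1: int, z2: int, a1: float, a2: float, t9: float) -> Tuple[float, float]:
    """Gamow peak E0 and 1/e width Delta, both in MeV, for a non-resonant reaction.

    Textbook approximations (T9 = temperature in GK, mu = reduced mass in amu):
        E0    = 0.1220 * (Z1^2 Z2^2 mu T9^2)^(1/3)
        Delta = 0.2368 * (Z1^2 Z2^2 mu T9^5)^(1/6)
    The reduced mass is approximated here from mass numbers, not atomic masses.
    """
    mu = a1 * a2 / (a1 + a2)
    base = (z1 * z2) ** 2 * mu
    e0 = 0.1220 * (base * t9 ** 2) ** (1.0 / 3.0)
    delta = 0.2368 * (base * t9 ** 5) ** (1.0 / 6.0)
    return e0, delta


def window_bounds(e0: float, delta: float) -> Tuple[float, float]:
    """Effective energy range from E0 - Delta/2 to E0 + Delta/2."""
    return e0 - delta / 2.0, e0 + delta / 2.0


if __name__ == "__main__":
    # 13C + alpha: Z1 = 2, Z2 = 6, A1 = 4, A2 = 13.
    # Both temperatures below are assumed for illustration only.
    for label, t9 in [("s-process (assumed T9 = 0.09)", 0.09), ("i-process (assumed T9 = 0.2)", 0.2)]:
        e0, delta = gamow_window(2, 6, 4.0, 13.0, t9)
        lo, hi = window_bounds(e0, delta)
        print(f"{label}: E0 = {e0:.3f} MeV, window {lo:.3f} to {hi:.3f} MeV")
```

With the assumed $T_9 = 0.2$, the sketch places the window at roughly 0.23 to 0.40 MeV, which illustrates why a measurement starting at 0.24 MeV can cover "almost the entire" i-process window.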

arXiv:2210.01338 [pdf, other]

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

Authors: Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

Abstract: Humans tend to decompose a sentence into different parts, like "sth do sth at someplace", and then fill each part with certain content. Inspired by this, we follow the principle of modular design to propose a novel image captioner: learning to Collocate Visual-Linguistic Neural Modules (CVLNM). Unlike the widely used neural module networks in VQA, where the language (i.e., the question) is fully observable, the task of collocating visual-linguistic modules is more challenging. This is because the language is only partially observable, so the modules must be collocated dynamically during the process of image captioning. To sum up, we make the following technical contributions to design and train our CVLNM: 1) distinguishable module design: four modules in the encoder, including one linguistic module for function words and three visual modules for different content words (i.e., noun, adjective, and verb), plus another linguistic module in the decoder for commonsense reasoning; 2) a self-attention based module controller for robustifying the visual reasoning; 3) a part-of-speech based syntax loss imposed on the module controller for further regularizing the training of our CVLNM. Extensive experiments on the MS-COCO dataset show that our CVLNM is more effective, e.g., achieving a new state-of-the-art 129.5 CIDEr-D, and more robust, e.g., being less likely to overfit to dataset bias and suffering less when fewer training samples are available.
Code is available at https://github.com/GCYZSL/CVLMN

Submitted 23 April, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: Accepted to IJCV. Code is available at https://github.com/GCYZSL/CVLMN

arXiv:2209.15632 [pdf, other]

ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing

Authors: Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Junzhe Zhang

Abstract: Sketch-and-extrude is a common and intuitive modeling process in computer-aided design. This paper studies the problem of learning shapes, given as point clouds, through inverse sketch-and-extrude. We present ExtrudeNet, an unsupervised end-to-end network for discovering sketch and extrude operations from point clouds. Behind ExtrudeNet are two new technical components: 1) an effective representation for sketch and extrude, which can model extrusion with freeform sketches as well as conventional cylinder and box primitives; and 2) a numerical method for computing the signed distance field, which is used in network learning. This is the first attempt to use machine learning to reverse engineer the sketch-and-extrude modeling process of a shape in an unsupervised fashion. ExtrudeNet not only outputs a compact, editable and interpretable representation of the shape that can be seamlessly integrated into modern CAD software, but also aligns with the standard CAD modeling process, facilitating various editing applications; this distinguishes our work from existing shape parsing research. Code is released at https://github.com/kimren227/ExtrudeNet.

Submitted 30 September, 2022; originally announced September 2022.

Comments: Accepted to ECCV 2022
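The ExtrudeNet entry above relies on signed distance fields of extruded sketches. As background, the sketch below uses the generic formula for extruding any exact 2D signed distance function along $z$ over $[-h, h]$; it is not ExtrudeNet's own numerical method for freeform sketches. A circle sketch, used here as a hypothetical example, yields a cylinder primitive:

```python
import math
from typing import Callable

Sketch2D = Callable[[float, float], float]  # 2D signed distance: negative inside


def circle_sketch(radius: float) -> Sketch2D:
    """Exact 2D signed distance to a circle of the given radius."""
    return lambda x, y: math.hypot(x, y) - radius


def extrude_sdf(sketch: Sketch2D, half_height: float, x: float, y: float, z: float) -> float:
    """Signed distance to the solid obtained by extruding `sketch` along z over [-h, h]."""
    d_xy = sketch(x, y)            # distance within the sketch plane
    d_z = abs(z) - half_height     # distance along the extrusion axis
    inside = min(max(d_xy, d_z), 0.0)                     # non-positive part, inside the solid
    outside = math.hypot(max(d_xy, 0.0), max(d_z, 0.0))  # Euclidean part, outside the solid
    return inside + outside


if __name__ == "__main__":
    cylinder = circle_sketch(1.0)
    for point in [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, 3.0), (2.0, 0.0, 2.0)]:
        print(point, extrude_sdf(cylinder, 1.0, *point))
```

Splitting the distance into an in-plane term and an axial term is what lets one 2D sketch define a full 3D field; near the rim, the `hypot` term gives the correct corner distance.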

arXiv:2209.13947 [pdf, ps, other]

$^{197}$Au($γ,\,xn;\,x\,=\,1\thicksim9$) Reaction Cross Section Measurements using Laser-Driven Ultra-Intense $γ$-Ray Source

Authors: D. Wu, H. Y. Lan, J. Y. Zhang, J. X. Liu, H. G. Lu, J. F. Lv, X. Z. Wu, H. Zhang, J. Cai, Q. Y. Ma, Y. H. Xia, Z. N. Wang, M. Z. Wang, Z. Y. Yang, X. L. Xu, Y. X. Geng, Y. Y. Zhao, C. Lin, W. J. Ma, J. Q. Yu, H. R. Wang, F. L. Liu, C. Y. He, B. Guo, P. Zhu , et al. (4 additional authors not shown)

Abstract: We present a new method for measuring photonuclear reaction flux-weighted average cross sections and isomeric ratios using a laser-driven bremsstrahlung $γ$-ray source. An ultra-bright, ultra-fast 60$\,\thicksim\,$250 MeV bremsstrahlung $γ$-ray source was established using the 200 TW laser facility in the Compact Laser Plasma Accelerator Laboratory, Peking University, covering the energy range from knocking out neutrons to producing pions. Stable quasi-monoenergetic electron beams with a charge of 300$\,\thicksim\,$600 pC per shot were generated via laser wakefield acceleration. The averaged $γ$-ray intensities ($\geqslant$8 MeV) were higher than 10$^{8}$ per shot, and the instantaneous intensities can exceed 10$^{19}$ s$^{-1}$ with a duration of about 6.7 ps. $^{65}$Cu($γ,\,n$)$^{64}$Cu and $^{27}$Al($γ,\,x$)$^{24}$Na reactions were used as $γ$-ray flux monitors in the experiments. The flux-weighted average cross sections and isomeric ratios of $^{197}$Au($γ,\,xn;\,x\,=\,1\thicksim9$) reactions were analyzed through activation measurements. The results agree well with previous work, confirming the accuracy of the method. The $^{197}$Au($γ,\,xn;\,x\,=\,7\thicksim\,9$) reaction cross sections were obtained for the first time, with the highest threshold energy being 71.410 MeV. Theoretical cross sections calculated with TALYS 1.9 were compared with the experimental results. This method offers a unique way to gain insight into photonuclear reactions, especially for short-lived isomers, for which experimental data are extremely scarce.

Submitted 23 November, 2023; v1 submitted 28 September, 2022; originally announced September 2022.
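The quantity measured in the $^{197}$Au entry above, the flux-weighted average cross section, is conventionally defined as $\langle\sigma\rangle = \int_{E_{th}}^{E_{max}} \sigma(E)\,\phi(E)\,dE \,/\, \int_{E_{th}}^{E_{max}} \phi(E)\,dE$. The minimal sketch below evaluates it with the trapezoidal rule on a sampled grid. In practice, the bremsstrahlung spectrum $\phi(E)$ would come from simulation and $\sigma(E)$ from a model or evaluation; here both are plain input lists, and restricting the integral to grid points inside $[E_{th}, E_{max}]$ is a simplifying assumption.

```python
from typing import Sequence


def trapezoid(y: Sequence[float], x: Sequence[float]) -> float:
    """Composite trapezoidal rule on a (possibly non-uniform) grid."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def flux_weighted_cross_section(energies: Sequence[float], sigma: Sequence[float],
                                flux: Sequence[float], e_threshold: float,
                                e_max: float) -> float:
    """<sigma> = int sigma*phi dE / int phi dE over grid points in [e_threshold, e_max]."""
    pts = [(e, s, f) for e, s, f in zip(energies, sigma, flux) if e_threshold <= e <= e_max]
    if len(pts) < 2:
        raise ValueError("need at least two grid points inside the integration range")
    e_grid = [p[0] for p in pts]
    numerator = trapezoid([p[1] * p[2] for p in pts], e_grid)
    denominator = trapezoid([p[2] for p in pts], e_grid)
    return numerator / denominator
```

Because the flux appears in both the numerator and the denominator, only its spectral shape above threshold matters. This is why the monitor reactions mentioned in the abstract are needed to pin down that shape.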

arXiv:2209.13925 [pdf, other]

doi 10.1145/3503161.3548395

DeViT: Deformed Vision Transformers in Video Inpainting

Authors: Jiayin Cai, Changlin Li, Xin Tao, Chun Yuan, Yu-Wing Tai

Abstract: This paper proposes a novel video inpainting method. We make three main contributions. First, we extend previous Transformers with patch alignment by introducing Deformed Patch-based Homography (DePtH), which improves patch-level feature alignment without additional supervision and benefits challenging scenes with various deformations. Second, we introduce Mask Pruning-based Patch Attention (MPPA) to improve patch-wise feature matching by pruning out less essential features and using a saliency map. MPPA enhances matching accuracy between warped tokens with invalid pixels. Third, we introduce a Spatial-Temporal weighting Adaptor (STA) module to obtain accurate attention to spatial-temporal tokens under the guidance of the Deformation Factor learned from DePtH, especially for videos with agile motions. Experimental results demonstrate that our method outperforms recent methods qualitatively and quantitatively and achieves a new state of the art.

Submitted 28 September, 2022; originally announced September 2022.

Journal ref: ACMMM'22, October 10-14, 2022, Lisboa, Portugal

arXiv:2209.10907 [pdf, other]

DRKF: Distilled Rotated Kernel Fusion for Efficient Rotation Invariant Descriptors in Local Feature Matching

Authors: Ranran Huang, Jiancheng Cai, Chao Li, Zhuoyuan Wu, Xinmin Liu, Zhenhua Chai

Abstract: The performance of local feature descriptors degrades in the presence of large rotation variations. To address this issue, we present an efficient approach to learning rotation-invariant descriptors. Specifically, we propose Rotated Kernel Fusion (RKF), which imposes rotations on the convolution kernels to improve the inherent behavior of CNNs under rotation. Since RKF can be merged by subsequent re-parameterization, no extra computational cost is introduced at inference. Moreover, we present Multi-oriented Feature Aggregation (MOFA), which aggregates features extracted from multiple rotated versions of the input image and provides auxiliary knowledge for training RKF through a distillation strategy. We refer to the distilled RKF model as DRKF. Besides evaluating on a rotation-augmented version of the public dataset HPatches, we also contribute a new dataset named DiverseBEV, which was collected during drone flights and consists of bird's-eye-view images with large viewpoint changes and camera rotations. Extensive experiments show that our method outperforms other state-of-the-art techniques under large rotation variations.

Submitted 5 January, 2024; v1 submitted 22 September, 2022; originally announced September 2022.

Comments: 8 pages, 7 figures
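The DRKF entry above states that RKF adds no inference cost because of re-parameterization. The underlying reason is linearity: summing convolutions with several rotated copies of a kernel equals a single convolution with the sum of those copies. The minimal sketch below assumes rotations by multiples of 90 degrees, which avoid interpolation, and unweighted fusion; the paper's actual angles and fusion scheme are not given in the abstract.

```python
from typing import List

Matrix = List[List[float]]


def rot90(k: Matrix) -> Matrix:
    """Rotate a square kernel by 90 degrees counter-clockwise."""
    n = len(k)
    return [[k[j][n - 1 - i] for j in range(n)] for i in range(n)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv2d_valid(img: Matrix, k: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation, the operation CNN convolution layers compute."""
    n = len(k)
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * k[a][b] for a in range(n) for b in range(n))
             for j in range(w - n + 1)]
            for i in range(h - n + 1)]


def multi_branch(img: Matrix, k: Matrix, rotations: int = 4) -> Matrix:
    """Training-time view: one convolution per rotated kernel, outputs summed."""
    out, r = conv2d_valid(img, k), k
    for _ in range(rotations - 1):
        r = rot90(r)
        out = add(out, conv2d_valid(img, r))
    return out


def fuse_rotated_kernels(k: Matrix, rotations: int = 4) -> Matrix:
    """Inference-time view: a single kernel equal to the sum of the rotated kernels."""
    fused, r = k, k
    for _ in range(rotations - 1):
        r = rot90(r)
        fused = add(fused, r)
    return fused


EXAMPLE_IMAGE = [[1, 2, 3, 0], [4, 5, 6, 1], [7, 8, 9, 2], [1, 0, 1, 3]]
EXAMPLE_KERNEL = [[1, 2, 0], [0, 1, 0], [0, 0, 3]]  # deliberately asymmetric
```

Calling `multi_branch(EXAMPLE_IMAGE, EXAMPLE_KERNEL)` and `conv2d_valid(EXAMPLE_IMAGE, fuse_rotated_kernels(EXAMPLE_KERNEL))` gives identical outputs, but the second runs one convolution instead of four.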

arXiv:2209.09687 [pdf]

Frame Size Optimization Using a Machine Learning Approach in WLAN Downlink MU-MIMO Channel

Authors: Lemlem Kassa, Jianhua Deng, Mark Davis, **gye Cai

Abstract: IEEE 802.11ac/n introduced frame aggregation to accommodate growing traffic demand and to improve transmission efficiency and channel utilization. Aggregating many packets per transmission significantly enhances WLAN throughput. However, it is difficult to fully exploit frame aggregation in downlink MU-MIMO channels because stations have heterogeneous transmission demands and data rates. As a result, space-channel time is wasted, which degrades transmission efficiency. Existing studies have proposed different approaches to these challenges, but most of them do not consider a machine-learning-based optimization solution. The main contribution of this paper is a machine-learning-based frame size optimization solution that maximizes WLAN system throughput in the downlink MU-MIMO channel. In this approach, the Access Point (AP) measures the maximum system throughput and collects (frame size, system throughput) patterns, which capture the effects of traffic patterns, channel conditions, and the number of stations (STAs). Based on these patterns, our approach uses a neural network to model the system throughput as a function of the frame size. After training the neural network, we use its gradient information to adjust the frame size.
The performance of the proposed machine learning (ML) approach is evaluated against the FIFO aggregation algorithm under heterogeneous traffic patterns for VoIP and video applications, varying channel conditions, and different numbers of stations.

Submitted 20 September, 2022; originally announced September 2022.

Comments: The 8th International Conference of Networks, Communications, Wireless and Mobile Computing (NCWC 2022), Sep. 2022, Copenhagen, Denmark
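The frame size entry above adjusts the frame size using the gradient of a learned throughput model. A minimal sketch of that step as projected gradient ascent follows. Several parts are assumptions: the learned network is replaced by any callable, its gradient is estimated by central differences (a trained network would supply it by backpropagation), and `toy_model` is a hypothetical concave surrogate, not data from the paper.

```python
from typing import Callable


def adjust_frame_size(throughput_model: Callable[[float], float], size: float,
                      lr: float, lo: float, hi: float,
                      steps: int = 100, h: float = 1.0) -> float:
    """Projected gradient ascent on a throughput model T(size), with size kept in [lo, hi]."""
    for _ in range(steps):
        # Central-difference estimate of dT/dsize (stand-in for the network's gradient).
        grad = (throughput_model(size + h) - throughput_model(size - h)) / (2.0 * h)
        # Ascent step, then clip to the allowed frame-size range.
        size = min(max(size + lr * grad, lo), hi)
    return size


def toy_model(size: float) -> float:
    """Hypothetical concave throughput surrogate peaking at 12,000 bytes."""
    return 50.0 - (size - 12_000.0) ** 2 / 1e6
```

For example, `adjust_frame_size(toy_model, 4000.0, 2.5e5, 1000.0, 65535.0)` converges to about 12,000. If the upper bound is 8,000, the result stops at that bound. Clipping keeps every intermediate frame size within what the AP can actually transmit.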

arXiv:2209.09004 [pdf, other]

EcoFormer: Energy-Saving Attention with Linear Complexity

Authors: **g Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang

Abstract: Transformer is a transformative framework that models sequential data and has achieved remarkable performance on a wide range of tasks, but with high computational and energy cost. To improve its efficiency, a popular choice is to compress the models via binarization, which constrains floating-point values to binary ones and significantly reduces resource consumption thanks to cheap bitwise operations. However, existing binarization methods only aim at statistically minimizing the information loss for the input distribution, while ignoring the pairwise similarity modeling at the core of attention. To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, which maps the original queries and keys into low-dimensional binary codes in Hamming space. The kernelized hash functions are learned in a self-supervised way to match the ground-truth similarity relations extracted from the attention map. Based on the equivalence between the inner product of binary codes and the Hamming distance, as well as the associative property of matrix multiplication, we can approximate the attention in linear complexity by expressing it as a dot product of binary codes. Moreover, the compact binary representations of queries and keys enable us to replace most of the expensive multiply-accumulate operations in attention with simple accumulations, considerably reducing the on-chip energy footprint on edge devices.
Extensive experiments on both vision and language tasks show that EcoFormer consistently achieves performance comparable to standard attention while consuming far fewer resources. For example, based on PVTv2-B0 and ImageNet-1K, EcoFormer achieves a 73% reduction in on-chip energy footprint with only a 0.33% performance drop compared to standard attention. Code is available at https://github.com/ziplab/EcoFormer.

Submitted 20 March, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: NeurIPS 2022 camera ready; First two authors contributed equally
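Two facts used in the EcoFormer entry above can be checked directly. First, for codes in $\{-1,+1\}^b$, $q \cdot k = b - 2\,\mathrm{Ham}(q, k)$. Second, by associativity, $\sum_j (q_i \cdot k_j)\,v_j = q_i^\top (K^\top V)$, so the score matrix never has to be formed. The minimal sketch below assumes the binary codes are given directly: the learned kernelized hash functions and EcoFormer's normalization are not reproduced, and the example data is hypothetical.

```python
from typing import List

Code = List[int]            # binary code with entries in {-1, +1}
Matrix = List[List[float]]


def hamming(a: Code, b: Code) -> int:
    return sum(x != y for x, y in zip(a, b))


def dot(a: Code, b: Code) -> int:
    return sum(x * y for x, y in zip(a, b))


def quadratic_attention(q: List[Code], k: List[Code], v: Matrix) -> Matrix:
    """O(N^2) form: out_i = sum_j (q_i . k_j) v_j, forming every query-key score."""
    d = len(v[0])
    return [[sum(dot(qi, kj) * vj[c] for kj, vj in zip(k, v)) for c in range(d)] for qi in q]


def linear_attention(q: List[Code], k: List[Code], v: Matrix) -> Matrix:
    """O(N) form: out_i = q_i^T (K^T V). With +/-1 codes, every product reduces to
    adding or subtracting a value, so no multiplications are needed."""
    b, d = len(k[0]), len(v[0])
    kv = [[sum(vj[c] if kj[r] > 0 else -vj[c] for kj, vj in zip(k, v)) for c in range(d)]
          for r in range(b)]
    return [[sum(kv[r][c] if qi[r] > 0 else -kv[r][c] for r in range(b)) for c in range(d)]
            for qi in q]


# Hypothetical example data (unnormalized attention, for illustration only).
EXAMPLE_Q = [[1, -1, 1], [-1, -1, 1]]
EXAMPLE_K = [[1, 1, -1], [-1, 1, 1], [1, -1, 1]]
EXAMPLE_V = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
```

The linear form costs $O(N b d)$ instead of $O(N^2 d)$, and replacing multiply-accumulate with signed accumulation is the energy saving the abstract describes.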

arXiv:2209.09002 [pdf, other]

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

Authors: Chuanxia Zheng, Long Tung Vuong, Jianfei Cai, Dinh Phung

Abstract: Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, which with existing decoder architectures results in repeated artifacts in similar adjacent regions. To address this issue, we propose to incorporate spatially conditional normalization to modulate the quantized vectors, inserting spatially variant information into the embedded index maps and encouraging the decoder to generate more photorealistic images. Moreover, we use multichannel quantization to increase the recombination capability of the discrete codes without increasing the cost of the model and codebook. Additionally, to generate discrete tokens at the second stage, we adopt a Masked Generative Image Transformer (MaskGIT) to learn an underlying prior distribution in the compressed latent space, which is much faster than the conventional autoregressive model. Experiments on two benchmark datasets demonstrate that our proposed modulated VQGAN greatly improves reconstructed image quality and provides high-fidelity image generation.

Submitted 19 September, 2022; originally announced September 2022.
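The multichannel quantization mentioned in the MoVQ entry above can be pictured as follows. Each feature vector is split into $C$ equal chunks, and every chunk is quantized against a shared codebook of $K$ entries, so a single codebook can express $K^C$ combinations without growing. This minimal sketch assumes a shared codebook and nearest-neighbor assignment; MoVQ's exact configuration is not stated in the abstract.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multichannel_quantize(vec: Sequence[float], codebook: List[List[float]],
                          num_channels: int) -> Tuple[List[int], List[float]]:
    """Split `vec` into equal chunks and map each chunk to its nearest shared codeword."""
    if len(vec) % num_channels:
        raise ValueError("vector length must be divisible by num_channels")
    c = len(vec) // num_channels
    if any(len(code) != c for code in codebook):
        raise ValueError("codeword length must equal the chunk length")
    indices = []
    for i in range(num_channels):
        chunk = vec[i * c:(i + 1) * c]
        indices.append(min(range(len(codebook)), key=lambda j: squared_distance(codebook[j], chunk)))
    reconstruction = [x for j in indices for x in codebook[j]]
    return indices, reconstruction
```

For example, with the 3-entry codebook `[[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]` and two channels, the vector `[0.9, 1.1, -1.0, 2.1]` maps to indices `[1, 2]`. That codebook can represent 9 distinct 4-dimensional vectors rather than 3.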

arXiv:2209.06886 [pdf, ps, other]

Vectorized Adjoint Sensitivity Method for Graph Convolutional Neural Ordinary Differential Equations

Authors: Jack Cai

Abstract: As the title states, this document provides a vectorized implementation of the adjoint dynamics calculation for Graph Convolutional Neural Ordinary Differential Equations (GCDE). The adjoint sensitivity method is the method used to compute (approximate) gradients for neural ODEs in place of backpropagation. When implemented in libraries such as PyTorch or TensorFlow, the adjoint can be calculated by autograd functions without the need for a hand-derived formula. In applications such as edge computing and memristor crossbars, however, autograd is not available, so a vectorized derivation of the adjoint dynamics is needed to map the system efficiently onto hardware. This document goes over the basics, then derives the vectorized adjoint dynamics for GCDE.

Submitted 14 September, 2022; originally announced September 2022.
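For the GCDE adjoint entry above, here is what a vectorized adjoint looks like for one assumed vector field. Take $f(H) = \tanh(\hat{A} H W)$, where $\hat{A}$ is a normalized adjacency matrix, and let $Z = \hat{A} H W$. The neural-ODE adjoint $da/dt = -a^\top \partial f/\partial H$ then becomes the matrix expression $-\hat{A}^\top [a \odot (1 - \tanh^2 Z)] W^\top$. The vector-field form is an assumption made for illustration, not necessarily the one derived in the document, and plain-list matrix products stand in for hardware matrix multiplies.

```python
import math
from typing import List

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    """Plain matrix product, written out so the sketch needs neither autograd nor NumPy."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Mat) -> Mat:
    return [list(row) for row in zip(*a)]


def gcde_field(a_hat: Mat, h: Mat, w: Mat) -> Mat:
    """Assumed GCDE vector field dH/dt = f(H) = tanh(A_hat H W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[math.tanh(v) for v in row] for row in z]


def adjoint_rhs(a_hat: Mat, h: Mat, w: Mat, adj: Mat) -> Mat:
    """da/dt = -A_hat^T [a * (1 - tanh(Z)^2)] W^T with Z = A_hat H W.

    This is minus the vector-Jacobian product a^T (df/dH), written with
    matrix products and an elementwise gate only.
    """
    z = matmul(matmul(a_hat, h), w)
    gated = [[adj[i][j] * (1.0 - math.tanh(z[i][j]) ** 2) for j in range(len(z[0]))]
             for i in range(len(z))]
    vjp = matmul(matmul(transpose(a_hat), gated), transpose(w))
    return [[-v for v in row] for row in vjp]


if __name__ == "__main__":
    # Finite-difference check of one entry against the hand-derived formula.
    a_hat = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    h = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]]
    w = [[0.7, -0.3], [0.2, 0.9]]
    adj = [[1.0, -0.5], [0.3, 0.2], [-0.4, 0.8]]
    i, j, eps = 1, 0, 1e-6
    hp = [row[:] for row in h]
    hm = [row[:] for row in h]
    hp[i][j] += eps
    hm[i][j] -= eps
    fp, fm = gcde_field(a_hat, hp, w), gcde_field(a_hat, hm, w)
    fd = sum(adj[r][c] * (fp[r][c] - fm[r][c]) for r in range(3) for c in range(2)) / (2 * eps)
    print(fd, -adjoint_rhs(a_hat, h, w, adj)[i][j])  # the two values should agree closely
```

Every operation here is a matrix product, a transpose, or an elementwise gate, which is what makes this form suitable for crossbar-style matrix-multiply hardware.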

arXiv:2209.06675 [pdf, other]

Volumetric-based Contact Point Detection for 7-DoF Grasping

Authors: Junhao Cai, **gcheng Su, Zida Zhou, Hui Cheng, Qifeng Chen, Michael Y Wang

Abstract: In this paper, we propose a novel grasp pipeline based on contact point detection on the truncated signed distance function (TSDF) volume to achieve closed-loop 7-degree-of-freedom (7-DoF) grasping in cluttered environments. The key aspects of our method are that 1) the proposed pipeline exploits the TSDF volume for multi-view fusion, contact-point sampling and evaluation, and collision checking, which provides reliable and collision-free 7-DoF gripper poses with real-time performance; and 2) the contact-based pose representation effectively eliminates the ambiguity introduced by normal-based methods, providing a more precise and flexible solution. Extensive simulated and real-robot experiments demonstrate that the proposed pipeline selects more antipodal and stable grasp poses and outperforms normal-based baselines in grasp success rate in both simulated and physical scenarios.

Submitted 14 September, 2022; originally announced September 2022.

Comments: Accepted to Conference on Robot Learning (CoRL) 2022. Supplementary materials: https://openreview.net/forum?id=SrSCqW4dq9
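The grasping entry above favors "antipodal" contact pairs. Given two contact points with outward surface normals, a parallel-jaw grasp is fully determined by its center, closing axis, and width; it is antipodal when the closing direction lies inside both friction cones, whose half-angle is $\arctan\mu$. The minimal sketch below covers only this contact geometry and a standard Coulomb-friction test. The paper's TSDF-based contact detection, scoring, and collision checking are not reproduced, and treating the seventh DoF as gripper width is a common convention assumed here.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _angle(a: Vec3, b: Vec3) -> float:
    """Angle between two non-zero vectors, in radians."""
    cos = _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))
    return math.acos(max(-1.0, min(1.0, cos)))


def grasp_from_contacts(p1: Vec3, n1: Vec3, p2: Vec3, n2: Vec3, mu: float):
    """Return (center, closing_axis, width, is_antipodal) for a two-finger grasp.

    n1 and n2 are outward surface normals at contacts p1 and p2; mu is the Coulomb
    friction coefficient, so each friction cone has half-angle atan(mu).
    """
    axis = _sub(p2, p1)
    width = math.sqrt(_dot(axis, axis))
    if width == 0.0:
        raise ValueError("contact points must be distinct")
    u = (axis[0] / width, axis[1] / width, axis[2] / width)
    center = ((p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0, (p1[2] + p2[2]) / 2.0)
    half_angle = math.atan(mu)
    neg_u = (-u[0], -u[1], -u[2])
    # Finger 1 pushes along +u, which must lie in the cone around the inward normal -n1,
    # i.e. n1 must be within half_angle of -u; symmetrically, n2 must be within it of +u.
    antipodal = _angle(n1, neg_u) <= half_angle and _angle(n2, u) <= half_angle
    return center, u, width, antipodal
```

Two opposite faces of a box, with normals pointing straight out, pass the test for any $\mu > 0$. Tilting one normal by 90 degrees fails it.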

arXiv:2209.06397 [pdf, other]

Federated Learning based on Defending Against Data Poisoning Attacks in IoT

Authors: Jiayin Li, Wenzhong Guo, Xingshuo Han, Jian** Cai, Ximeng Liu

Abstract: The rapidly expanding number of Internet of Things (IoT) devices is generating huge quantities of data, but data privacy and security in IoT devices remain exposed, especially in automated driving systems. Federated learning (FL) is a paradigm that addresses data privacy, security, access rights, and access to heterogeneous information by integrating a global model based on distributed nodes. However, data poisoning attacks on FL can undermine these benefits, destroying the global model's availability and disrupting model training. To address these issues, we build a hierarchical defense data poisoning (HDDP) system framework to defend against data poisoning attacks in FL, which monitors each local model of individual nodes via anomaly detection to remove malicious clients. Depending on whether the poisoning defense server has a trusted test dataset, we design the local model test voting (LMTV) and Kullback-Leibler divergence anomaly parameters detection (KLAD) algorithms to defend against label-flipping poisoning attacks. Specifically, LMTV uses the trusted test dataset to obtain per-class evaluation results that identify malicious clients. More importantly, KLAD, which requires no trusted test dataset, adopts the Kullback-Leibler divergence to measure the similarity between local models.
Finally, extensive evaluations against various label-flipping poisoning attacks show that the LMTV and KLAD algorithms achieve successful defense ratios of $100\%$ and $40\%$ to $85\%$, respectively, under different detection situations.

Submitted 13 September, 2022; originally announced September 2022.
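The KLAD idea in the federated-learning entry above compares local models with the Kullback-Leibler divergence instead of a trusted test set. The minimal sketch below makes several assumptions that the abstract does not specify. Each client's flattened parameters are summarized as a smoothed histogram. A client's score is the median KL divergence to the other clients, so a single honest-looking outlier does not inflate honest scores. Clients whose score exceeds a chosen threshold are flagged. The binning, the median rule, and the threshold are all hypothetical choices.

```python
import math
import statistics
from typing import Dict, List, Sequence


def histogram(params: Sequence[float], lo: float, hi: float, bins: int,
              smoothing: float = 1e-6) -> List[float]:
    """Smoothed, normalized histogram of a flattened parameter vector."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for p in params:
        idx = min(max(int((p - lo) / width), 0), bins - 1)  # clamp out-of-range values
        counts[idx] += 1
    total = len(params) + smoothing * bins
    return [(c + smoothing) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def flag_suspicious(client_params: Dict[str, Sequence[float]], lo: float, hi: float,
                    bins: int, threshold: float) -> List[str]:
    """Flag clients whose median KL divergence to the other clients exceeds `threshold`."""
    hists = {cid: histogram(v, lo, hi, bins) for cid, v in client_params.items()}
    flagged = []
    for cid, p in hists.items():
        scores = [kl_divergence(p, q) for other, q in hists.items() if other != cid]
        if statistics.median(scores) > threshold:
            flagged.append(cid)
    return sorted(flagged)
```

Smoothing keeps every bin strictly positive so the divergence stays finite. The median-based score assumes malicious clients are a minority.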

arXiv:2209.04786 [pdf, other]

Tensor Completion via Tensor Train Based Low-Rank Quotient Geometry under a Preconditioned Metric

Authors: Jian-Feng Cai, Wen Huang, Haifeng Wang, Ke Wei

Abstract: This paper investigates the low-rank tensor completion problem, that is, recovering a tensor from partially observed entries. We consider this problem in the tensor train format and extend the preconditioned metric from the matrix case to the tensor case. The first-order and second-order quotient geometry of the manifold of fixed tensor-train-rank tensors under this metric is studied in detail. Algorithms including Riemannian gradient descent, Riemannian conjugate gradient, and Riemannian Gauss-Newton are proposed for the tensor completion problem based on the quotient geometry. We also show that the Riemannian Gauss-Newton method on the quotient geometry is equivalent to the Riemannian Gauss-Newton method on the embedded geometry with a specific retraction. Empirical evaluations on random instances as well as on function-related tensors show that the proposed algorithms are competitive with existing algorithms in terms of recovery ability, convergence performance, and reconstruction quality.

Submitted 18 April, 2023; v1 submitted 11 September, 2022; originally announced September 2022.

Comments: The manuscript has been adjusted in several places
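To make the tensor train (TT) format in the tensor completion entry above concrete: a TT tensor stores one core per mode, and each entry is a product of matrix slices, $X[i_1,\dots,i_d] = G_1[i_1]\,G_2[i_2]\cdots G_d[i_d]$, with boundary ranks $r_0 = r_d = 1$. Completion fits the cores to the observed entries only. The minimal sketch below shows this entry evaluation and a least-squares completion objective. The paper's Riemannian algorithms and preconditioned metric are not reproduced, and the example cores are hypothetical (a rank-2 matrix written as a two-core TT).

```python
from typing import Dict, List, Sequence, Tuple

Core = List[List[List[float]]]  # core[i] is an r_{k-1} x r_k matrix; r_0 = r_d = 1


def tt_entry(cores: Sequence[Core], index: Sequence[int]) -> float:
    """Entry X[i_1, ..., i_d] = G_1[i_1] G_2[i_2] ... G_d[i_d] of a tensor-train tensor."""
    if len(cores) != len(index):
        raise ValueError("need one index per core")
    row = [1.0]  # running 1 x r_k row vector, starting from r_0 = 1
    for core, i in zip(cores, index):
        mat = core[i]
        if len(mat) != len(row):
            raise ValueError("incompatible TT ranks")
        row = [sum(row[a] * mat[a][b] for a in range(len(row))) for b in range(len(mat[0]))]
    if len(row) != 1:
        raise ValueError("last TT rank must be 1")
    return row[0]


def completion_loss(cores: Sequence[Core], observed: Dict[Tuple[int, ...], float]) -> float:
    """Least-squares fit on observed entries only: 0.5 * sum (X[idx] - value)^2."""
    return 0.5 * sum((tt_entry(cores, idx) - val) ** 2 for idx, val in observed.items())


# Hypothetical example: the 2x2 matrix U @ V with U = [[1, 2], [3, 4]], V = [[5, 6], [7, 8]].
EXAMPLE_CORES = [
    [[[1.0, 2.0]], [[3.0, 4.0]]],        # G_1[i] = row i of U   (1 x 2)
    [[[5.0], [7.0]], [[6.0], [8.0]]],    # G_2[j] = column j of V (2 x 1)
]
```

Storage grows linearly in the number of modes (roughly $d\,n\,r^2$ numbers rather than $n^d$), which is what makes the format attractive for completion.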

arXiv:2209.01963 [pdf, other]

Modeling User Repeat Consumption Behavior for Online Novel Recommendation

Authors: Yuncong Li, Cunxiang Yin, Yancheng He, Guoqiang Xu, **g Cai, Leeven Luo, Sheng-hua Zhong

Abstract: Given a user's historical interaction sequence, online novel recommendation suggests the next novel the user may be interested in. Online novel recommendation is important but underexplored. In this paper, we concentrate on recommending online novels to new users of an online novel reading platform, whose first visits to the platform occurred in the last seven days. We have two observations about online novel recommendation for new users. First, repeat novel consumption of new users is a common phenomenon. Second, interactions between users and novels are informative. To accurately predict whether a user will reconsume a novel, it is crucial to characterize each interaction at a fine-grained level. Based on these two observations, we propose a neural network for online novel recommendation, called NovelNet. NovelNet can recommend the next novel from both the user's consumed novels and new novels simultaneously. Specifically, an interaction encoder is used to obtain accurate interaction representation considering fine-grained attributes of interaction, and a pointer network with a pointwise loss is incorporated into NovelNet to recommend previously-consumed novels. Moreover, an online novel recommendation dataset is built from a well-known online novel reading platform and is released for public use as a benchmark. Experimental results on the dataset demonstrate the effectiveness of NovelNet.

Submitted 5 September, 2022; originally announced September 2022.

Comments: RecSys 2022
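The NovelNet entry above recommends from both previously consumed novels (via a pointer network) and new novels. One common way to combine the two is a gated mixture of a "generate" distribution over all novels and a "copy" distribution built from attention over the user's history. The sketch below assumes that generic pointer-generator-style mixture; NovelNet's actual scoring and pointwise loss may differ. The gate and attention weights would come from the network and are plain inputs here.

```python
from typing import Dict, List


def mix_generate_and_copy(p_generate: Dict[str, float], attention: List[float],
                          history: List[str], gate: float) -> Dict[str, float]:
    """p(n) = gate * p_generate(n) + (1 - gate) * sum of attention on past positions holding n."""
    if not 0.0 <= gate <= 1.0:
        raise ValueError("gate must lie in [0, 1]")
    if len(attention) != len(history):
        raise ValueError("need one attention weight per history position")
    probs = {novel: gate * p for novel, p in p_generate.items()}
    for novel, weight in zip(history, attention):
        # Repeated consumption accumulates copy mass on the same novel.
        probs[novel] = probs.get(novel, 0.0) + (1.0 - gate) * weight
    return probs
```

If `p_generate` and `attention` each sum to 1, the mixture also sums to 1. A novel read several times collects attention from every past position where it appears, which is one way to model the repeat consumption the abstract highlights.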

arXiv:2208.14301 [pdf, other]

Reynolds Stress Anisotropy Tensor Predictions for Turbulent Channel Flow using Neural Networks

Authors: Jiayi Cai, Pierre-Emmanuel Angeli, Jean-Marc Martinez, Guillaume Damblin, Didier Lucor

Abstract: The Reynolds-Averaged Navier-Stokes (RANS) approach remains a backbone of turbulence modeling due to its high cost-effectiveness. Its accuracy largely depends on a reliable closure model for the Reynolds stress anisotropy tensor. A considerable amount of work has aimed at improving traditional closure models, but they remain unsatisfactory for some complex flow configurations. In recent years, advances in computing power have opened up a new way to address this problem: machine-learning-assisted turbulence modeling. In this paper, we employ neural networks to fully predict the Reynolds stress anisotropy tensor of turbulent channel flows at different friction Reynolds numbers, for both interpolation and extrapolation scenarios. Several generic neural networks of Multi-Layer Perceptron (MLP) type are trained with different input feature combinations to acquire a complete grasp of the role of each parameter. The best performance is yielded by the model with the dimensionless mean streamwise velocity gradient $α$, the dimensionless wall distance $y^+$ and the friction Reynolds number $\mathrm{Re}_τ$ as inputs. A deeper theoretical insight into the Tensor Basis Neural Network (TBNN) clarifies some remaining ambiguities found in the literature concerning its application of Pope's general eddy viscosity model. We emphasize the sensitivity of the TBNN to the constant tensor $\textbf{T}^{*(0)}$ on the turbulent channel flow data set, and propose a generalized $\textbf{T}^{*(0)}$, which considerably enhances its performance.
Comparing the MLP and the augmented TBNN model, both with $\{α, y^+, \mathrm{Re}_τ\}$ as the input set, we conclude that the former outperforms the latter and provides excellent interpolation and extrapolation predictions of the Reynolds stress anisotropy tensor in the specific case of turbulent channel flow.

Submitted 30 August, 2022; originally announced August 2022.

Comments: 35 pages, 10 figures
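Pope's general eddy viscosity model, which the TBNN builds on, writes the normalized anisotropy tensor as a linear combination $b = \sum_n g^{(n)} T^{(n)}$ of basis tensors built from the normalized mean strain-rate tensor $S$ and rotation-rate tensor $R$; the TBNN predicts the scalar coefficients $g^{(n)}$. The Python sketch below only illustrates that combination step and is not the authors' code. It assumes a channel flow whose only mean gradient is $dU/dy$, assumes the normalization $S_{12} = R_{12} = α/2$, keeps only the first three basis tensors, and uses hypothetical coefficient values. It does not model the constant tensor $\textbf{T}^{*(0)}$ discussed above.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def add(a: Matrix, b: Matrix, scale_b: float = 1.0) -> Matrix:
    """Return a + scale_b * b, element-wise (new matrix, inputs untouched)."""
    return [[a[i][j] + scale_b * b[i][j] for j in range(3)] for i in range(3)]


def trace(a: Matrix) -> float:
    return a[0][0] + a[1][1] + a[2][2]


IDENTITY: Matrix = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
ZERO: Matrix = [[0.0] * 3 for _ in range(3)]


def channel_flow_tensors(alpha: float) -> Tuple[Matrix, Matrix]:
    """Normalized strain S and rotation R when dU/dy is the only mean gradient.

    Assumption: S_12 = R_12 = alpha / 2, i.e. alpha is the velocity gradient
    already scaled by a turbulence time scale."""
    h = 0.5 * alpha
    S = [[0.0, h, 0.0], [h, 0.0, 0.0], [0.0, 0.0, 0.0]]
    R = [[0.0, h, 0.0], [-h, 0.0, 0.0], [0.0, 0.0, 0.0]]
    return S, R


def tensor_basis(S: Matrix, R: Matrix) -> List[Matrix]:
    """First three of Pope's basis tensors: S, SR - RS, S^2 - tr(S^2) I / 3."""
    t1 = S
    t2 = add(matmul(S, R), matmul(R, S), scale_b=-1.0)
    s2 = matmul(S, S)
    t3 = add(s2, IDENTITY, scale_b=-trace(s2) / 3.0)
    return [t1, t2, t3]


def anisotropy(g: Sequence[float], basis: Sequence[Matrix]) -> Matrix:
    """b = sum_n g[n] * T[n]; in a TBNN the g[n] would come from the network."""
    b = ZERO
    for g_n, t_n in zip(g, basis):
        b = add(b, t_n, scale_b=g_n)
    return b


# Usage with made-up coefficients (hypothetical, not fitted values).
S, R = channel_flow_tensors(alpha=4.0)
b_example = anisotropy([-0.09, 0.02, 0.01], tensor_basis(S, R))
```

Because every basis tensor is symmetric and traceless, the resulting $b$ is too, whatever coefficients the network outputs.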

arXiv:2208.13202

Mechanical Anisotropy and Multiple Direction-Dependent Dirac States in the Synthesized Ag3C20 Monolayer

Authors: Zhiheng Ly, Ningning Jia, Jiangtao Cai, Jijun Zhao, Zhifeng Liu

Abstract: Recently, a 2D orthorhombic silver-organic framework, the Ag3C20 monolayer, was synthesized by assembling organic molecules linked by multiple aryl-metal bonds. Herein, via first-principles calculations, we demonstrate that, owing to its unique bonding features, the Ag3C20 monolayer not only exhibits strong mechanical anisotropy but also possesses various tunable direction-dependent Dirac states. Just below the Fermi level, the intrinsic Dirac points form two antiparallel quasi-type-III nodal lines protected by mirror symmetry, which can further evolve into hybrid nodal loops under tiny strains. Intriguingly, just above the Fermi level, a special semi-Dirac state can emerge under a critical strain through the merging of two type-I Dirac cones; it simultaneously harbors direction-dependent strongly localized fermions, normal massive carriers, and ultrafast Dirac fermions. These findings suggest that the mechanically sensitive Ag3C20 monolayer is a promising 2D material for realizing interesting Dirac physics and highly anisotropic multiple-carrier transport.

Submitted 28 August, 2022; originally announced August 2022.
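A standard way to picture two Dirac cones merging into a semi-Dirac point is the generic two-band model $H = (Δ + k_x^2/2m)σ_x + v k_y σ_y$ (with ħ = 1), whose bands are $E = ±\sqrt{(Δ + k_x^2/2m)^2 + (v k_y)^2}$. For $Δ < 0$ there are two Dirac points at $k_x = ±\sqrt{-2mΔ}$; at $Δ = 0$ they merge into a single point with quadratic dispersion along $k_x$ and linear dispersion along $k_y$. The Python sketch below evaluates this textbook model only. It is not the first-principles band structure of Ag3C20, and $Δ$, $m$ and $v$ are illustrative assumptions, with $Δ$ loosely standing in for the strain-tuned control parameter.

```python
import math
from typing import List


def semi_dirac_energy(kx: float, ky: float, delta: float, m: float = 1.0, v: float = 1.0) -> float:
    """Upper-band energy of H = (delta + kx^2/2m) sigma_x + v*ky sigma_y (hbar = 1)."""
    d_x = delta + kx * kx / (2.0 * m)
    d_y = v * ky
    return math.hypot(d_x, d_y)


def dirac_points(delta: float, m: float = 1.0) -> List[float]:
    """kx positions where the two bands touch on the ky = 0 line."""
    if delta < 0:
        k0 = math.sqrt(-2.0 * m * delta)
        return [-k0, k0]  # two separate Dirac cones
    if delta == 0:
        return [0.0]  # merged semi-Dirac point
    return []  # gapped spectrum


# Usage: track the touching points as delta is tuned through zero.
for delta in (-0.5, 0.0, 0.5):
    print(delta, dirac_points(delta))

# At delta = 0, doubling ky doubles E (linear), doubling kx quadruples E (quadratic).
print(semi_dirac_energy(0.0, 0.2, 0.0) / semi_dirac_energy(0.0, 0.1, 0.0))
print(semi_dirac_energy(0.2, 0.0, 0.0) / semi_dirac_energy(0.1, 0.0, 0.0))
```

The direction dependence of the effective mass at $Δ = 0$ (infinite-velocity-like linear branch along one axis, massive branch along the other) is what "direction-dependent" carriers refers to in this simplified picture.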

arXiv:2208.10861

FocusFormer: Focusing on What We Need via Architecture Sampler

Authors: **g Liu, Jianfei Cai, Bohan Zhuang

Abstract: Vision Transformers (ViTs) have underpinned recent breakthroughs in computer vision. However, designing ViT architectures is laborious and relies heavily on expert knowledge. To automate the design process and incorporate deployment flexibility, one-shot neural architecture search decouples supernet training from architecture specialization for diverse deployment scenarios. To cope with the enormous number of sub-networks in the supernet, existing methods treat all architectures as equally important and randomly sample some of them at each update step during training. During architecture search, however, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which creates a gap between training and deployment. In this paper, we devise a simple yet effective method, called FocusFormer, to bridge this gap. To this end, we propose to learn an architecture sampler that assigns higher sampling probabilities to architectures on the Pareto frontier under different resource constraints during supernet training, so that they are sufficiently optimized and their performance improves. During specialization, we can directly use the well-trained architecture sampler to obtain accurate architectures satisfying a given resource constraint, which significantly improves search efficiency. Extensive experiments on CIFAR-100 and ImageNet show that FocusFormer improves the performance of the searched architectures while significantly reducing the search cost.
For example, on ImageNet, our FocusFormer-Ti with 1.4G FLOPs outperforms AutoFormer-Ti by 0.5% in Top-1 accuracy.

Submitted 23 August, 2022; originally announced August 2022.

Comments: Tech report
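FocusFormer learns its sampler during supernet training, and the abstract does not describe how that learning works. As a much simpler, non-learned stand-in, the Python sketch below shows only the underlying idea of biasing sampling toward the Pareto frontier: among candidates that fit a FLOPs budget, it keeps the non-dominated ones (no other candidate is at least as cheap and at least as accurate while strictly better in one of the two) and samples them with a fixed higher weight. The `Arch` record, its accuracy proxy `score`, the `frontier_weight` parameter and the candidate numbers are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Arch:
    name: str
    gflops: float  # resource cost (lower is better)
    score: float   # accuracy proxy (higher is better)


def dominates(b: Arch, a: Arch) -> bool:
    """True if b is no worse than a on both axes and strictly better on one."""
    no_worse = b.gflops <= a.gflops and b.score >= a.score
    better = b.gflops < a.gflops or b.score > a.score
    return no_worse and better


def pareto_frontier(archs: Sequence[Arch]) -> List[Arch]:
    """Candidates not dominated by any other candidate (O(n^2), fine for a sketch)."""
    return [a for a in archs if not any(dominates(b, a) for b in archs)]


def sample_arch(archs: Sequence[Arch], budget: float, rng: random.Random,
                frontier_weight: float = 4.0) -> Arch:
    """Sample one architecture within the budget, favoring the feasible frontier."""
    feasible = [a for a in archs if a.gflops <= budget]
    if not feasible:
        raise ValueError("no architecture satisfies the budget")
    front = set(pareto_frontier(feasible))
    weights = [frontier_weight if a in front else 1.0 for a in feasible]
    return rng.choices(feasible, weights=weights, k=1)[0]


# Usage with made-up candidates.
candidates = [
    Arch("A", 1.0, 70.0), Arch("B", 1.4, 74.0), Arch("C", 1.4, 72.0),
    Arch("D", 2.0, 73.0), Arch("E", 2.5, 78.0),
]
rng = random.Random(0)
picked = [sample_arch(candidates, budget=1.5, rng=rng).name for _ in range(5)]
print(picked)
```

Raising `frontier_weight` concentrates training updates on the frontier for a given budget; FocusFormer instead learns these probabilities for different resource constraints, and the same sampler is then reused at specialization time.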

arXiv:2208.10828

doi 10.1038/s41467-022-33451-1

Wien effect in interfacial water dissociation through proton-permeable graphene electrodes

Authors: J. Cai, E. Griffin, V. Guarochico-Moreira, D. Barry, B. Xin, M. Yagmurcukardes, S. Zhang, A. K. Geim, F. M. Peeters, M. Lozada-Hidalgo

Abstract: Strong electric fields can accelerate molecular dissociation reactions. This phenomenon, known as the Wien effect, was previously observed using high-voltage electrolysis cells that produced fields of about 10^7 V m^-1, sufficient to accelerate the dissociation of weakly bound molecules (e.g., organics and weak electrolytes). The observation of the Wien effect for the common case of water dissociation (H2O ⇌ H+ + OH-) has remained elusive. Here we study the dissociation of interfacial water adjacent to proton-permeable graphene electrodes and observe strong acceleration of the reaction in fields exceeding 10^8 V m^-1. The use of graphene electrodes allows measuring the proton currents arising exclusively from the dissociation of interfacial water, while the electric field driving the reaction is monitored through the carrier density induced in graphene by the same field. The observed exponential increase in proton currents is in quantitative agreement with Onsager's theory. Our results also demonstrate that graphene electrodes can be valuable for investigating various interfacial phenomena involving proton transport.

Submitted 23 August, 2022; originally announced August 2022.

Report number: 13, 5776

Journal ref: Nature Communications (2022)
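Onsager's theory of the Wien effect predicts that a field $E$ multiplies the dissociation rate constant by $F(b) = I_1(2\sqrt{2b})/\sqrt{2b} = 1 + b + b^2/3 + b^3/18 + \dots$, where $I_1$ is the modified Bessel function of the first kind and, for a monovalent ion pair, $b = e^3 E / (8π ε_0 ε_r k_B^2 T^2)$; at large $b$, $F$ grows roughly as $\exp(\sqrt{8b})$, which is the exponential field dependence referred to above. The Python sketch below evaluates this textbook expression from its power series. The relative permittivity $ε_r$ is left as a free parameter, and the temperature and $ε_r$ in the usage lines are assumptions, not the values used in the paper, whose treatment of the interfacial dielectric environment the abstract does not specify.

```python
import math

# SI constants (e and k_B are exact by definition; eps0 is the CODATA 2018 value).
E_CHARGE = 1.602176634e-19   # C
K_B = 1.380649e-23           # J/K
EPS0 = 8.8541878128e-12      # F/m


def onsager_b(field: float, temperature: float, eps_r: float) -> float:
    """Dimensionless Onsager parameter b for a monovalent ion pair (field in V/m)."""
    return (E_CHARGE ** 3 * field) / (8.0 * math.pi * EPS0 * eps_r * (K_B * temperature) ** 2)


def wien_factor(b: float, rel_tol: float = 1e-15) -> float:
    """F(b) = sum_m (2b)^m / (m! (m+1)!), which equals I1(2*sqrt(2b)) / sqrt(2b)."""
    if b < 0:
        raise ValueError("b must be non-negative")
    term, total, m = 1.0, 1.0, 0
    while term > rel_tol * total:
        # Ratio of consecutive series terms: 2b / ((m + 1)(m + 2)).
        term *= 2.0 * b / ((m + 1) * (m + 2))
        total += term
        m += 1
    return total


# Usage: illustrative only, room temperature and an assumed eps_r of 80.
for field in (1e7, 1e8, 2e8):  # V/m
    b = onsager_b(field, temperature=298.0, eps_r=80.0)
    print(f"E = {field:.0e} V/m  b = {b:.3f}  F(b) = {wien_factor(b):.3g}")
```

Summing the series avoids a dependency on a special-function library; with SciPy available, `scipy.special.iv(1, x)` would give $I_1$ directly.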

Showing 201–250 of 797 results for author: Cai, J