Search | arXiv e-print repository

doi 10.3847/2041-8213/ac96e7

Detecting the oscillation and propagation of the nascent dynamic solar wind structure at 2.6 solar radii using VLBI radio telescopes

Authors: Maoli Ma, Guifre Molera Calves, Giuseppe Cimo, Ming Xiong, Peijia Li, Jing Kong, Peijin Zhang, Jiansen He, Lijia Liu, Pradyumna Kummamuru, Chuanpeng Hou, Jasper Edwards, Qinghui Liu, Zhong Chen, Zhanghu Chu, De Wu, Xu Zhao, Zhichao Wang, Songtao Han, Quanquan Zhi, Yingkai Liu, Jonathan Quick, Javier Gonzalez, Cristina Garcia Miro, Mikhail Kharinov, Andrey Mikhailov, et al. (7 additional authors not shown)

Abstract: Probing the solar corona is crucial to study the coronal heating and solar wind acceleration. However, the transient and inhomogeneous solar wind flows carry large-amplitude inherent Alfven waves and turbulence, which make detection more difficult. We report the oscillation and propagation of the solar wind at 2.6 solar radii (Rs) by observation of China Tianwen and ESA Mars Express with radio telescopes. The observations were carried out on Oct. 9, 2021, when one coronal mass ejection (CME) passed across the ray paths of the telescope beams. We obtain the frequency fluctuations (FF) of the spacecraft signals from each individual telescope. Firstly, we visually identify the drift of the frequency spikes at a high spatial resolution of thousands of kilometers along the projected baselines. They are used as traces to estimate the solar wind velocity. Then we perform the cross-correlation analysis on the time series of FF from different telescopes. The velocity variations of solar wind structure along radial and tangential directions during the CME passage are obtained. The oscillation of tangential velocity confirms the detection of streamer wave. Moreover, at the tail of the CME, we detect the propagation of an accelerating fast field-aligned density structure indicating the presence of magnetohydrodynamic waves. This study confirms that the ground station-pairs are able to form particular spatial projection baselines with high resolution and sensitivity to study the detailed propagation of the nascent dynamic solar wind structure.

Submitted 19 October, 2022; originally announced October 2022.

Comments: 13 pages, 9 figures

arXiv:2210.06777 [pdf, ps, other]

Stability of graph pairs involving vertex-transitive graphs

Authors: Yan-Li Qin, Binzhou Xia, Sanming Zhou

Abstract: A pair of graphs $(Γ,Σ)$ is said to be stable if the full automorphism group of $Γ\timesΣ$ is isomorphic to the product of the full automorphism groups of $Γ$ and $Σ$ and unstable otherwise, where $Γ\timesΣ$ is the direct product of $Γ$ and $Σ$. In this paper, we reduce the study of the stability of any pair of regular graphs $(Γ,Σ)$ with coprime valencies and vertex-transitive $Σ$ to that of $(Γ,K_2)$. Since the latter is well studied in the literature, this enables us to determine the stability of any pair of regular graphs $(Γ,Σ)$ with coprime valencies in the case when $Σ$ is vertex-transitive and the stability of $(Γ,K_2)$ is known.

Submitted 13 October, 2022; originally announced October 2022.

Comments: 9 pages

arXiv:2210.06311 [pdf, other]

Semantic Cross Attention for Few-shot Learning

Authors: Bin Xiao, Chien-Liang Liu, Wen-Hoar Hsaio

Abstract: Few-shot learning (FSL) has attracted considerable attention recently. Among existing approaches, the metric-based method aims to train an embedding network that can make similar samples close while dissimilar samples as far as possible and achieves promising results. FSL is characterized by using only a few images to train a model that can generalize to novel classes in image classification problems, but this setting makes it difficult to learn the visual features that can identify the images' appearance variations. The model training is likely to move in the wrong direction, as the images in an identical semantic class may have dissimilar appearances, whereas the images in different semantic classes may share a similar appearance. We argue that FSL can benefit from additional semantic features to learn discriminative feature representations. Thus, this study proposes a multi-task learning approach to view semantic features of label text as an auxiliary task to help boost the performance of the FSL task. Our proposed model uses word-embedding representations as semantic features to help train the embedding network and a semantic cross-attention module to bridge the semantic features into the typical visual modal. The proposed approach is simple, but produces excellent results. We apply our proposed approach to two previous metric-based FSL methods, all of which can substantially improve performance. The source code for our model is accessible from GitHub.

Submitted 12 October, 2022; originally announced October 2022.

Comments: ACML2022

arXiv:2210.02936 [pdf, other]

doi 10.1103/PhysRevLett.131.073401

Functional building blocks for scalable multipartite entanglement in optical lattices

Authors: Wei-Yong Zhang, Ming-Gen He, Hui Sun, Yong-Guang Zheng, Ying Liu, An Luo, Han-Yi Wang, Zi-Hang Zhu, Pei-Yue Qiu, Ying-Chao Shen, Xuan-Kai Wang, Wan Lin, Song-Tao Yu, Bin-Chen Li, Bo Xiao, Meng-Da Li, Yu-Meng Yang, Xiao Jiang, Han-Ning Dai, You Zhou, Xiongfeng Ma, Zhen-Sheng Yuan, Jian-Wei Pan

Abstract: Featuring excellent coherence and operated parallelly, ultracold atoms in optical lattices form a competitive candidate for quantum computation. For this, a massive number of parallel entangled atom pairs have been realized in superlattices. However, the more formidable challenge is to scale-up and detect multipartite entanglement due to the lack of manipulations over local atomic spins in retro-reflected bichromatic superlattices. Here we developed a new architecture based on a cross-angle spin-dependent superlattice for implementing layers of quantum gates over moderately-separated atoms incorporated with a quantum gas microscope for single-atom manipulation. We created and verified functional building blocks for scalable multipartite entanglement by connecting Bell pairs to one-dimensional 10-atom chains and two-dimensional plaquettes of $2\times4$ atoms. This offers a new platform towards scalable quantum computation and simulation.

Submitted 6 October, 2022; originally announced October 2022.

Journal ref: Phys. Rev. Lett. 131, 073401 (2023)

arXiv:2210.00405 [pdf, other]

Basic Binary Convolution Unit for Binarized Image Restoration Network

Authors: Bin Xia, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Radu Timofte, Luc Van Gool

Abstract: Lighter and faster image restoration (IR) models are crucial for the deployment on resource-limited devices. Binary neural network (BNN), one of the most promising model compression methods, can dramatically reduce the computations and parameters of full-precision convolutional neural networks (CNN). However, there are different properties between BNN and full-precision CNN, and we can hardly use the experience of designing CNN to develop BNN. In this study, we reconsider components in binary convolution, such as residual connection, BatchNorm, activation function, and structure, for IR tasks. We conduct systematic analyses to explain each component's role in binary convolution and discuss the pitfalls. Specifically, we find that residual connection can reduce the information loss caused by binarization; BatchNorm can solve the value range gap between residual connection and binary convolution; the position of the activation function dramatically affects the performance of BNN. Based on our findings and analyses, we design a simple yet efficient basic binary convolution unit (BBCU). Furthermore, we divide IR networks into four parts and specially design variants of BBCU for each part to explore the benefit of binarizing these parts. We conduct experiments on different IR tasks, and our BBCU significantly outperforms other BNNs and lightweight models, which shows that BBCU can serve as a basic unit for binarized IR networks. All codes and models will be released.

Submitted 16 February, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

Comments: ICLR2023, code is available at https://github.com/Zj-BinXia/BBCU

arXiv:2209.10565 [pdf, other]

Extracting Off-Diagonal Order from Diagonal Basis Measurements

Authors: Bo Xiao, Javier Robledo Moreno, Matthew Fishman, Dries Sels, Ehsan Khatami, Richard Scalettar

Abstract: Quantum gas microscopy has developed into a powerful tool to explore strongly correlated quantum systems. However, discerning phases with topological or off-diagonal long range order requires the ability to extract these correlations from site-resolved measurements. Here, we show that a multi-scale complexity measure can pinpoint the transition to and from the bond ordered wave phase of the one-dimensional extended Hubbard model with an off-diagonal order parameter, sandwiched between diagonal charge and spin density wave phases, using only diagonal descriptors. We study the model directly in the thermodynamic limit using the recently developed variational uniform matrix product states algorithm, and draw our samples from degenerate ground states related by global spin rotations, emulating the projective measurements that are accessible in experiments. Our results will have important implications for the study of exotic phases using optical lattice experiments.

Submitted 21 September, 2022; originally announced September 2022.

Comments: 11 pages including supplementary materials, 7 figures

arXiv:2209.09078 [pdf, other]

NIERT: Accurate Numerical Interpolation through Unifying Scattered Data Representations using Transformer Encoder

Authors: Shizhe Ding, Boyang Xia, Milong Ren, Dongbo Bu

Abstract: Interpolation for scattered data is a classical problem in numerical analysis, with a long history of theoretical and practical contributions. Recent advances have utilized deep neural networks to construct interpolators, exhibiting excellent and generalizable performance. However, they still fall short in two aspects: \textbf{1) inadequate representation learning}, resulting from separate embeddings of observed and target points in popular encoder-decoder frameworks and \textbf{2) limited generalization power}, caused by overlooking prior interpolation knowledge shared across different domains. To overcome these limitations, we present a \textbf{N}umerical \textbf{I}nterpolation approach using \textbf{E}ncoder \textbf{R}epresentation of \textbf{T}ransformers (called \textbf{NIERT}). On one hand, NIERT utilizes an encoder-only framework rather than the encoder-decoder structure. This way, NIERT can embed observed and target points into a unified encoder representation space, thus effectively exploiting the correlations among them and obtaining more precise representations. On the other hand, we propose to pre-train NIERT on large-scale synthetic mathematical functions to acquire prior interpolation knowledge, and transfer it to multiple interpolation domains with consistent performance gain. On both synthetic and real-world datasets, NIERT outperforms the existing approaches by a large margin, i.e., 4.3$\sim$14.3$\times$ lower MAE on TFRD subsets, and 1.7/1.8/8.7$\times$ lower MSE on Mathit/PhysioNet/PTV datasets. The source code of NIERT is available at https://github.com/DingShizhe/NIERT.

Submitted 14 March, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: 13 pages, 9 figures

MSC Class: 68T07; 65D05 (Primary) 68T05 (Secondary) ACM Class: I.2.6; G.1.1

arXiv:2209.08455 [pdf, other]

TODE-Trans: Transparent Object Depth Estimation with Transformer

Authors: Kang Chen, Shaochen Wang, Beihao Xia, Dongxu Li, Zhen Kan, Bin Li

Abstract: Transparent objects are widely used in industrial automation and daily life. However, robust visual recognition and perception of transparent objects have always been a major challenge. Currently, most commercial-grade depth cameras are still not good at sensing the surfaces of transparent objects due to the refraction and reflection of light. In this work, we present a transformer-based transparent object depth estimation approach from a single RGB-D input. We observe that the global characteristics of the transformer make it easier to extract contextual information to perform depth estimation of transparent areas. In addition, to better enhance the fine-grained features, a feature fusion module (FFM) is designed to assist coherent prediction. Our empirical evidence demonstrates that our model delivers significant improvements in recent popular datasets, e.g., 25% gain on RMSE and 21% gain on REL compared to previous state-of-the-art convolutional-based counterparts in ClearGrasp dataset. Extensive results show that our transformer-based model enables better aggregation of the object's RGB and inaccurate depth information to obtain a better depth representation. Our code and the pre-trained model will be available at https://github.com/yuchendoudou/TODE.

Submitted 17 September, 2022; originally announced September 2022.

Comments: Submitted to ICRA2023

arXiv:2209.04271 [pdf, ps, other]

Factorizations of almost simple orthogonal groups of plus type

Authors: Cai Heng Li, Lei Wang, Binzhou Xia

Abstract: This is the fifth one in a series of papers classifying the factorizations of almost simple groups with nonsolvable factors. In this paper we deal with orthogonal groups of plus type.

Submitted 9 September, 2022; originally announced September 2022.

Comments: arXiv admin note: text overlap with arXiv:2109.05459, arXiv:2107.09630

arXiv:2209.03067 [pdf, other]

doi 10.3847/1538-4365/ac9127

A Q-band line survey towards Orion KL using the Tianma radio telescope

Authors: Xunchuan Liu, Tie Liu, Zhiqiang Shen, Sheng-Li Qin, Qiuyi Luo, Yu Cheng, Qilao Gu, Tianwei Zhang, Fengyao Zhu, Sheng-Yuan Liu, Xing Lu, Rongbing Zhao, Weiye Zhong, Yajun Wu, Juan Li, Zhang Zhao, Jinqing Wang, Qinghui Liu, Bo Xia, Bin Li, Li Fu, Zhen Yan, Chao Zhang, Lingling Wang, Qian Ye, et al. (7 additional authors not shown)

Abstract: We have conducted a line survey towards Orion KL using the Q-band receiver of Tianma 65 m radio telescope (TMRT), covering 34.8--50 GHz with a velocity resolution between 0.79 km s$^{-1}$ and 0.55 km s$^{-1}$ respectively. The observations reach a sensitivity on the level of 1-8 mK, proving that the TMRT is sensitive for conducting deep line surveys. In total, 597 Gaussian features are extracted. Among them, 177 radio recombination lines (RRLs) are identified, including 126, 40 and 11 RRLs of hydrogen, helium and carbon, with a maximum $Δn$ of 16, 7, and 3, respectively. The carbon RRLs are confirmed to originate from photodissociation regions with a $V_{\rm LSR}\sim$9 km s$^{-1}$. In addition, 371 molecular transitions of 53 molecular species are identified. Twenty-one molecular species of this survey were not firmly detected in the Q band by Rizzo et al. (2017), including species such as H$_2$CS, HCOOH, C$_2$H$_5$OH, H$_2^{13}$CO, H$_2$CCO, CH$_3$CHO, CH$_2$OCH$_2$, HCN $v_2=1$, and CH$_3$OCHO $v_t=1$. In particular, the vibrationally excited states of ethyl cyanide (C$_2$H$_5$CN $v$13/$v$21) are for the first time firmly detected in the Q band. NH$_3$ (15,15) and (16,16) are identified, and they are so far the highest transitions of the NH$_3$ inversion lines detected towards Orion KL. All the identified lines can be reproduced by a radiative transfer model.

Submitted 7 September, 2022; originally announced September 2022.

Comments: 51 pages, 18 figures, accepted by ApJS

arXiv:2209.01531 [pdf, ps, other]

doi 10.1038/s41534-022-00609-0

A scheme to create and verify scalable entanglement in optical lattice

Authors: You Zhou, Bo Xiao, Meng-Da Li, Qi Zhao, Zhen-Sheng Yuan, Xiongfeng Ma, Jian-Wei Pan

Abstract: To achieve scalable quantum information processing, great efforts have been devoted to the creation of large-scale entangled states in various physical systems. Ultracold atom in optical lattice is considered as one of the promising platforms due to its feasible initialization and parallel manipulation. In this work, we propose an efficient scheme to generate and characterize global entanglement in the optical lattice. With only two-layer quantum circuits, the generation utilizes two-qubit entangling gates based on the superexchange interaction in double wells. The parallelism of these operations enables the generation to be fast and scalable. To verify the entanglement of this non-stabilizer state, we mainly design three complementary detection protocols which are less resource-consuming compared to the full tomography. In particular, one just needs two homogenous local measurement settings to identify the entanglement property. Our entanglement generation and verification protocols provide the foundation for the further quantum information processing in optical lattice.

Submitted 4 September, 2022; originally announced September 2022.

Comments: 13 pages, 5 figures

Journal ref: npj Quantum Information volume 8, Article number: 99 (2022)

arXiv:2208.13068 [pdf, other]

Apiary: A DBMS-Integrated Transactional Function-as-a-Service Framework

Authors: Peter Kraft, Qian Li, Kostis Kaffes, Athinagoras Skiadopoulos, Deeptaanshu Kumar, Danny Cho, Jason Li, Robert Redmond, Nathan Weckwerth, Brian Xia, Peter Bailis, Michael Cafarella, Goetz Graefe, Jeremy Kepner, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Xiangyao Yu, Matei Zaharia

Abstract: Developers increasingly use function-as-a-service (FaaS) platforms for data-centric applications that perform low-latency and transactional operations on data, such as for microservices or web serving. Unfortunately, existing FaaS platforms support these applications poorly because they physically and logically separate application logic, executed in cloud functions, from data management, done in interactive transactions accessing remote storage. Physical separation harms performance while logical separation complicates efficiently providing transactional guarantees and fault tolerance. This paper introduces Apiary, a novel DBMS-integrated FaaS platform for deploying and composing fault-tolerant transactional functions. Apiary physically co-locates and logically integrates function execution and data management by wrapping a distributed DBMS engine and using it as a unified runtime for function execution, data management, and operational logging, thus providing similar or stronger transactional guarantees as comparable systems while greatly improving performance and observability. To allow developers to write complex stateful programs, we leverage this integration to enable efficient and fault-tolerant function composition, building a frontend for orchestrating workflows of functions with the guarantees that each workflow runs to completion and each function in a workflow executes exactly once. We evaluate Apiary against research and production FaaS platforms and show it outperforms them by 2--68x on microservice workloads by reducing communication overhead.

Submitted 30 June, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

Comments: 14 pages, 13 figures, 3 tables. Preprint

arXiv:2208.11764 [pdf]

Negative Interatomic Spring Constant Manifested by Topological Phonon Flat Band

Authors: Bowen Xia, Hang Liu, Feng Liu

Abstract: Phonons as bosons are different from electrons as fermions. Unlike interatomic electron hopping that can be either positive or negative and further tuned by spin-orbit coupling, interatomic spring constant is positive, or the structure of atomic lattices would be dynamically unstable. Surprisingly, we found that topological phonon flat bands (FBs) can manifest either a positive or negative interatomic spring constant that couples the FB-modes of opposite chirality, as exemplified by first-principles calculations of a 2D material of Kagome-BN. To reveal its physical origin, we first establish a fundamental correspondence between a collective lattice-coupling (CLC) variable of two quasi-particle states (e.g., electronic states or phonon modes) of opposite parity in a periodic lattice with band topology. Topological semimetals arise with zero CLC at special k-points protected by symmetry; while positive and negative CLC at these k-points gives rise to normal and topological insulators, respectively. Then, we show topological FB has a special form of CLC that vanishes at all k-points as characterized by its real-space wave function, and multi-atom FB phonon mode can manifest effectively a negative interatomic spring constant. Our findings shed new light on our fundamental understanding of topology and provide a practical design principle for creating artificial bosonic topological states.

Submitted 2 September, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

arXiv:2208.09843 [pdf, other]

CODER: Coupled Diversity-Sensitive Momentum Contrastive Learning for Image-Text Retrieval

Authors: Haoran Wang, Dongliang He, Wenhao Wu, Boyang Xia, Min Yang, Fu Li, Yunlong Yu, Zhong Ji, Errui Ding, Jingdong Wang

Abstract: Image-Text Retrieval (ITR) is challenging in bridging visual and lingual modalities. Contrastive learning has been adopted by most prior arts. Except for limited amount of negative image-text pairs, the capability of contrastive learning is restricted by manually weighting negative pairs as well as unawareness of external knowledge. In this paper, we propose our novel Coupled Diversity-Sensitive Momentum Contrastive Learning (CODER) for improving cross-modal representation. Firstly, a novel diversity-sensitive contrastive learning (DCL) architecture is invented. We introduce dynamic dictionaries for both modalities to enlarge the scale of image-text pairs, and diversity-sensitiveness is achieved by adaptive negative pair weighting. Furthermore, two branches are designed in CODER. One learns instance-level embeddings from image/text, and it also generates pseudo online clustering labels for its input image/text based on their embeddings. Meanwhile, the other branch learns to query from commonsense knowledge graph to form concept-level descriptors for both modalities. Afterwards, both branches leverage DCL to align the cross-modal embedding spaces while an extra pseudo clustering label prediction loss is utilized to promote concept-level representation learning for the second branch. Extensive experiments conducted on two popular benchmarks, i.e. MSCOCO and Flickr30K, validate CODER remarkably outperforms the state-of-the-art approaches.

Submitted 21 August, 2022; originally announced August 2022.

Comments: Accepted by ECCV 2022

arXiv:2208.09594 [pdf]

Transferable Cross-Tokamak Disruption Prediction with Deep Hybrid Neural Network Feature Extractor

Authors: Wei Zheng, Fengming Xue, Ming Zhang, Zhongyong Chen, Chengshuo Shen, Xinkun Ai, Nengchao Wang, Dalong Chen, Bihao Guo, Yonghua Ding, Zhipeng Chen, Zhoujun Yang, Biao Shen, Bingjia Xiao, Yuan Pan

Abstract: Predicting disruptions across different tokamaks is a great obstacle to overcome. Future tokamaks can hardly tolerate disruptions at high performance discharge. Few disruption discharges at high performance can hardly compose an abundant training set, which makes it difficult for current data-driven methods to obtain an acceptable result. A machine learning method capable of transferring a disruption prediction model trained on one tokamak to another is required to solve the problem. The key is a disruption prediction model containing a feature extractor that is able to extract common disruption precursor traces in tokamak diagnostic data, and a transferable disruption classifier. Based on the concerns above, the paper first presents a deep fusion feature extractor designed specifically for extracting disruption precursor features from common diagnostics on tokamaks according to currently known precursors of disruption, providing a promising foundation for transferable models. The fusion feature extractor is proved by comparing with manual feature extraction on J-TEXT. Based on the feature extractor trained on J-TEXT, the disruption prediction model was transferred to EAST data with mere 20 discharges from EAST experiment. The performance is comparable with a model trained with 1896 discharges from EAST. From the comparison among other model training scenarios, transfer learning showed its potential in predicting disruptions across different tokamaks.

Submitted 19 August, 2022; originally announced August 2022.

arXiv:2208.06031 [pdf, other]

doi 10.1109/GLOBECOM48099.2022.10001253

Handling big tabular data of ICT supply chains: a multi-task, machine-interpretable approach

Authors: Bin Xiao, Murat Simsek, Burak Kantarci, Ala Abu Alkheir

Abstract: Due to the characteristics of Information and Communications Technology (ICT) products, the critical information of ICT devices is often summarized in big tabular data shared across supply chains. Therefore, it is critical to automatically interpret tabular structures with the surging amount of electronic assets. To transform the tabular data in electronic documents into a machine-interpretable format and provide layout and semantic information for information extraction and interpretation, we define a Table Structure Recognition (TSR) task and a Table Cell Type Classification (CTC) task. We use a graph to represent complex table structures for the TSR task. Meanwhile, table cells are categorized into three groups based on their functional roles for the CTC task, namely Header, Attribute, and Data. Subsequently, we propose a multi-task model to solve the defined two tasks simultaneously by using the text modal and image modal features. Our experimental results show that our proposed method can outperform state-of-the-art methods on ICDAR2013 and UNLV datasets.

Submitted 11 August, 2022; originally announced August 2022.

Comments: 6 pages, 7 tables, 4 figures, IEEE Global Communications Conference (Globecom), 2022

arXiv:2208.05101 [pdf]

Machine Learning with DBOS

Authors: Robert Redmond, Nathan W. Weckwerth, Brian S. Xia, Qian Li, Peter Kraft, Deeptaanshu Kumar, Çağatay Demiralp, Michael Stonebraker

Abstract: We recently proposed a new cluster operating system stack, DBOS, centered on a DBMS. DBOS enables unique support for ML applications by encapsulating ML code within stored procedures, centralizing ancillary ML data, providing security built into the underlying DBMS, co-locating ML code and data, and tracking data and workflow provenance. Here we demonstrate a subset of these benefits around two ML applications. We first show that image classification and object detection models using GPUs can be served as DBOS stored procedures with performance competitive to existing systems. We then present a 1D CNN trained to detect anomalies in HTTP requests on DBOS-backed web services, achieving SOTA results. We use this model to develop an interactive anomaly detection system and evaluate it through qualitative user feedback, demonstrating its usefulness as a proof of concept for future work to develop learned real-time security services on top of DBOS.

Submitted 9 August, 2022; originally announced August 2022.

Comments: AIDB@VLDB 2022

arXiv:2208.02728 [pdf, other]

doi 10.1103/PhysRevD.107.034508

Neural-network preconditioners for solving the Dirac equation in lattice gauge theory

Authors: Salvatore Calì, Daniel C. Hackett, Yin Lin, Phiala E. Shanahan, Brian Xiao

Abstract: This work develops neural-network-based preconditioners to accelerate solution of the Wilson-Dirac normal equation in lattice quantum field theories. The approach is implemented for the two-flavor lattice Schwinger model near the critical point. In this system, neural-network preconditioners are found to accelerate the convergence of the conjugate gradient solver compared with the solution of unpreconditioned systems or those preconditioned with conventional approaches based on even-odd or incomplete Cholesky decompositions, as measured by reductions in the number of iterations and/or complex operations required for convergence. It is also shown that a preconditioner trained on ensembles with small lattice volumes can be used to construct preconditioners for ensembles with many times larger lattice volumes, with minimal degradation of performance. This volume-transferring technique amortizes the training cost and presents a pathway towards scaling such preconditioners to lattice field theory calculations with larger lattice volumes and in four dimensions.

Submitted 4 August, 2022; originally announced August 2022.

Comments: 11 pages, 6 figures, and 2 tables

arXiv:2208.00887 [pdf, ps, other]

A solution to Babai's problem on digraphs with non-diagonalizable adjacency matrix

Authors: Yuxuan Li, Binzhou Xia, Sanming Zhou, Wenying Zhu

Abstract: The fact that the adjacency matrix of every finite graph is diagonalizable plays a fundamental role in spectral graph theory. Since this fact does not hold in general for digraphs, it is natural to ask whether it holds for digraphs with certain level of symmetry. Interest on this question dates back to early 1980s, when P. J. Cameron asked for the existence of arc-transitive digraphs with non-diagonalizable adjacency matrix. This was answered in the affirmative by L. Babai in 1985. Then Babai posed the open problem of constructing a 2-arc-transitive digraph and a vertex-primitive digraph whose adjacency matrices are not diagonalizable. In this paper, we solve Babai's problem by constructing an infinite family of $s$-arc-transitive digraphs for each integer $s\geq2$, and an infinite family of vertex-primitive digraphs, respectively, both of whose adjacency matrices are non-diagonalizable.
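
As a small illustration of why diagonalizability can fail for digraphs, consider the digraph with a single arc from vertex 0 to vertex 1. This example is not taken from the paper and has none of the symmetry the paper requires; it only shows the basic phenomenon. Its adjacency matrix is nonzero but squares to zero, and a nonzero nilpotent matrix cannot be diagonalizable. A minimal sketch in plain Python, where the helper name `matmul` is an assumption made for illustration:

```python
# Adjacency matrix of the digraph with one arc 0 -> 1 (illustrative only,
# not one of the digraphs constructed in the paper).
A = [[0, 1],
     [0, 0]]

def matmul(X, Y):
    """Multiply two square integer matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A_squared = matmul(A, A)
is_nonzero = any(any(row) for row in A)
is_nilpotent = all(v == 0 for row in A_squared for v in row)

# A diagonalizable matrix whose eigenvalues are all 0 must be the zero matrix,
# so a nonzero nilpotent matrix is not diagonalizable.
not_diagonalizable = is_nonzero and is_nilpotent
print(not_diagonalizable)
```

By contrast, the undirected edge on the same two vertices has the symmetric matrix `[[0, 1], [1, 0]]`, which is diagonalizable like every real symmetric matrix.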

Submitted 12 September, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

arXiv:2207.13963 [pdf, other]

Meta-Learning based Degradation Representation for Blind Super-Resolution

Authors: Bin Xia, Yapeng Tian, Yulun Zhang, Yucheng Hang, Wenming Yang, Qingmin Liao

Abstract: Most CNN-based super-resolution (SR) methods assume that the degradation is known (e.g., bicubic). These methods will suffer a severe performance drop when the degradation is different from their assumption. Therefore, some approaches attempt to train SR networks with the complex combination of multiple degradations to cover the real degradation space. To adapt to multiple unknown degradations, introducing an explicit degradation estimator can actually facilitate SR performance. However, previous explicit degradation estimation methods usually predict Gaussian blur with the supervision of groundtruth blur kernels, and estimation errors may lead to SR failure. Thus, it is necessary to design a method that can extract implicit discriminative degradation representation. To this end, we propose a Meta-Learning based Region Degradation Aware SR Network (MRDA), including Meta-Learning Network (MLN), Degradation Extraction Network (DEN), and Region Degradation Aware SR Network (RDAN). To handle the lack of groundtruth degradation, we use the MLN to rapidly adapt to the specific complex degradation after several iterations and extract implicit degradation information. Subsequently, a teacher network MRDA$_{T}$ is designed to further utilize the degradation information extracted by MLN for SR. However, MLN requires iterating on paired low-resolution (LR) and corresponding high-resolution (HR) images, which is unavailable in the inference phase. Therefore, we adopt knowledge distillation (KD) to make the student network learn to directly extract the same implicit degradation representation (IDR) as the teacher from LR images.

Submitted 3 June, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: This paper is accepted by TIP 2023, and code will be released at https://github.com/Zj-BinXia/MRDA

arXiv:2207.13478 [pdf, other]

Partial Selfish Mining for More Profits

Authors: Jia** Yu, Shang Gao, Rui Song, Zhi** Cai, Bin Xiao

Abstract: Mining attacks aim to gain an unfair share of extra rewards in the blockchain mining. Selfish mining can preserve discovered blocks and strategically release them, wasting honest miners' computing resources and getting higher profits. Previous mining attacks either conceal the mined whole blocks (hiding or discarding), or release them completely in a particular time slot (e.g., causing a fork). In this paper, we extend the mining attack's strategy space to partial block sharing, and propose a new and feasible Partial Selfish Mining (PSM) attack. We show that by releasing partial block data publicly and attracting rational miners to work on attacker's private branch, attackers and these attracted miners can gain an unfair share of mining rewards. We then propose Advanced PSM (A-PSM) attack that can further improve attackers' profits to be no less than the selfish mining. Both theoretical and experimental results show that PSM attackers can be more profitable than selfish miners under a certain range of mining power and network conditions. A-PSM attackers can gain even higher profits than both selfish mining and honest mining with attracted rational miners.

Submitted 6 April, 2024; v1 submitted 27 July, 2022; originally announced July 2022.

arXiv:2207.12661 [pdf, other]

Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training

Authors: Haoxuan You, Luowei Zhou, Bin Xiao, Noel Codella, Yu Cheng, Ruochen Xu, Shih-Fu Chang, Lu Yuan

Abstract: Large-scale multi-modal contrastive pre-training has demonstrated great utility to learn transferable features for a range of downstream tasks by mapping multiple modalities into a shared embedding space. Typically, this has employed separate encoders for each modality. However, recent work suggests that transformers can support learning across multiple modalities and allow knowledge sharing. Inspired by this, we investigate a variety of Modality-Shared Contrastive Language-Image Pre-training (MS-CLIP) frameworks. More specifically, we question how many parameters of a transformer model can be shared across modalities during contrastive pre-training, and rigorously examine architectural design choices that position the proportion of parameters shared along a spectrum. In studied conditions, we observe that a mostly unified encoder for vision and language signals outperforms all other variations that separate more parameters. Additionally, we find that light-weight modality-specific parallel modules further improve performance. Experimental results show that the proposed MS-CLIP approach outperforms vanilla CLIP by up to 13% relative in zero-shot ImageNet classification (pre-trained on YFCC-100M), while simultaneously supporting a reduction of parameters. In addition, our approach outperforms vanilla CLIP by 1.6 points in linear probing on a collection of 24 downstream vision tasks. Furthermore, we discover that sharing parameters leads to semantic concepts from different modalities being encoded more closely in the embedding space, facilitating the transferring of common semantic structure (e.g., attention patterns) from language to vision. Code is available at https://github.com/Hxyou/MSCLIP.

Submitted 26 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022, 22 pages, 4 figures

arXiv:2207.10745 [pdf, other]

doi 10.1103/PhysRevC.107.014907

Measurement of $φ$-meson production in Cu$+$Au at $\sqrt{s_{_{NN}}}=200$ GeV and U$+$U at $\sqrt{s_{_{NN}}}=193$ GeV

Authors: N. J. Abdulameer, U. Acharya, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, J. Alexander, M. Alfred, M. Alibordi, K. Aoki, N. Apadula, H. Asano, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, X. Bai, B. Bannier, K. N. Barish, S. Bathe, V. Baublis, C. Baumann, S. Baumgart, A. Bazilevsky , et al. (387 additional authors not shown)

Abstract: The PHENIX experiment reports systematic measurements at the Relativistic Heavy Ion Collider of $φ$-meson production in asymmetric Cu$+$Au collisions at $\sqrt{s_{_{NN}}}$=200 GeV and in U$+$U collisions at $\sqrt{s_{_{NN}}}$=193 GeV. Measurements were performed via the $φ\rightarrow K^{+}K^{-}$ decay channel at midrapidity $|η|<0.35$. Features of $φ$-meson production measured in Cu$+$Cu, Cu$+$Au, Au$+$Au, and U$+$U collisions were found to not depend on the collision geometry, which was expected because the yields are averaged over the azimuthal angle and follow the expected scaling with nuclear-overlap size. The elliptic flow of the $φ$ meson in Cu$+$Au, Au$+$Au, and U$+$U collisions scales with second-order-participant eccentricity and the length scale of the nuclear-overlap region (estimated with the number of participating nucleons). At moderate $p_T$, $φ$-meson production measured in Cu$+$Au and U$+$U collisions is consistent with coalescence-model predictions, whereas at high $p_T$ the production is in agreement with expectations for in-medium energy loss of parent partons prior to their fragmentation. The elliptic flow for $φ$ mesons measured in Cu$+$Au and U$+$U collisions is well described by a (2+1)D viscous-hydrodynamic model with specific-shear viscosity $η/s=1/4π$.

Submitted 13 January, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 412 authors from 76 institutions, 16 pages, 12 figures, 9 tables, 2012 data. v2 is version accepted for publication by Physical Review C. HEPdata for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

Journal ref: Phys. Rev. C 107, 014907 (2023)

arXiv:2207.10666 [pdf, other]

TinyViT: Fast Pretraining Distillation for Small Vision Transformers

Authors: Kan Wu, **nian Zhang, Houwen Peng, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

Abstract: Vision transformer (ViT) recently has drawn great attention in computer vision due to its remarkable model capability. However, most prevailing ViT models suffer from huge number of parameters, restricting their applicability on devices with limited resources. To alleviate this issue, we propose TinyViT, a new family of tiny and efficient small vision transformers pretrained on large-scale datasets with our proposed fast distillation framework. The central idea is to transfer knowledge from large pretrained models to small ones, while enabling small models to get the dividends of massive pretraining data. More specifically, we apply distillation during pretraining for knowledge transfer. The logits of large teacher models are sparsified and stored in disk in advance to save the memory cost and computation overheads. The tiny student transformers are automatically scaled down from a large pretrained model with computation and parameter constraints. Comprehensive experiments demonstrate the efficacy of TinyViT. It achieves a top-1 accuracy of 84.8% on ImageNet-1k with only 21M parameters, being comparable to Swin-B pretrained on ImageNet-21k while using 4.2 times fewer parameters. Moreover, increasing image resolutions, TinyViT can reach 86.5% accuracy, being slightly better than Swin-L while using only 11% parameters. Last but not the least, we demonstrate a good transfer ability of TinyViT on various downstream tasks. Code and models are available at https://github.com/microsoft/Cream/tree/main/TinyViT.

Submitted 21 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022

arXiv:2207.10388 [pdf, other]

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

Authors: Boyang Xia, Wenhao Wu, Haoran Wang, Rui Su, Dongliang He, Haosen Yang, Xiaoran Fan, Wanli Ouyang

Abstract: It is challenging for artificial intelligence systems to achieve accurate video recognition under the scenario of low computation costs. Adaptive inference based efficient video recognition methods typically preview videos and focus on salient parts to reduce computation costs. Most existing works focus on complex networks learning with video classification based objectives. Taking all frames as positive samples, few of them pay attention to the discrimination between positive samples (salient frames) and negative samples (non-salient frames) in supervisions. To fill this gap, in this paper, we propose a novel Non-saliency Suppression Network (NSNet), which effectively suppresses the responses of non-salient frames. Specifically, on the frame level, effective pseudo labels that can distinguish between salient and non-salient frames are generated to guide the frame saliency learning. On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations. Saliency measurements from both two levels are combined for exploitation of multi-granularity complementary information. Extensive experiments conducted on four well-known benchmarks verify our NSNet not only achieves the state-of-the-art accuracy-efficiency trade-off but also present a significantly faster (2.4~4.3x) practical inference speed than state-of-the-art methods. Our project page is at https://lawrencexia2008.github.io/projects/nsnet .

Submitted 21 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022

arXiv:2207.10379 [pdf, other]

Temporal Saliency Query Network for Efficient Video Recognition

Authors: Boyang Xia, Zhihao Wang, Wenhao Wu, Haoran Wang, Jungong Han

Abstract: Efficient video recognition is a hot-spot research topic with the explosive growth of multimedia data on the Internet and mobile devices. Most existing methods select the salient frames without awareness of the class-specific saliency scores, which neglect the implicit association between the saliency of frames and its belonging category. To alleviate this issue, we devise a novel Temporal Saliency Query (TSQ) mechanism, which introduces class-specific information to provide fine-grained cues for saliency measurement. Specifically, we model the class-specific saliency measuring process as a query-response task. For each category, the common pattern of it is employed as a query and the most salient frames are responded to it. Then, the calculated similarities are adopted as the frame saliency scores. To achieve it, we propose a Temporal Saliency Query Network (TSQNet) that includes two instantiations of the TSQ mechanism based on visual appearance similarities and textual event-object relations. Afterward, cross-modality interactions are imposed to promote the information exchange between them. Finally, we use the class-specific saliencies of the most confident categories generated by two modalities to perform the selection of salient frames. Extensive experiments demonstrate the effectiveness of our method by achieving state-of-the-art results on ActivityNet, FCVID and Mini-Kinetics datasets. Our project page is at https://lawrencexia2008.github.io/projects/tsqnet .

Submitted 21 July, 2022; originally announced July 2022.

Comments: Accepted by ECCV 2022

arXiv:2207.09389 [pdf, other]

Image Synthesis with Disentangled Attributes for Chest X-Ray Nodule Augmentation and Detection

Authors: Zhenrong Shen, Xi Ouyang, Bin Xiao, Jie-Zhi Cheng, Qian Wang, Dinggang Shen

Abstract: Lung nodule detection in chest X-ray (CXR) images is common to early screening of lung cancers. Deep-learning-based Computer-Assisted Diagnosis (CAD) systems can support radiologists for nodule screening in CXR. However, it requires large-scale and diverse medical data with high-quality annotations to train such robust and accurate CADs. To alleviate the limited availability of such datasets, lung nodule synthesis methods are proposed for the sake of data augmentation. Nevertheless, previous methods lack the ability to generate nodules that are realistic with the size attribute desired by the detector. To address this issue, we introduce a novel lung nodule synthesis framework in this paper, which decomposes nodule attributes into three main aspects including shape, size, and texture, respectively. A GAN-based Shape Generator firstly models nodule shapes by generating diverse shape masks. The following Size Modulation then enables quantitative control on the diameters of the generated nodule shapes in pixel-level granularity. A coarse-to-fine gated convolutional Texture Generator finally synthesizes visually plausible nodule textures conditioned on the modulated shape masks. Moreover, we propose to synthesize nodule CXR images by controlling the disentangled nodule attributes for data augmentation, in order to better compensate for the nodules that are easily missed in the detection task. Our experiments demonstrate the enhanced image quality, diversity, and controllability of the proposed lung nodule synthesis framework. We also validate the effectiveness of our data augmentation on greatly improving nodule detection performance.

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2207.09334 [pdf, other]

A Massively-Parallel 3D Simulator for Soft and Hybrid Robots

Authors: Joel Clay, Sofia Wyetzner, Alex Gaudio, Boxi Xia, Andrew Moshova, Jacob Austin, Max Segan, Hod Lipson

Abstract: Simulation is an important step in robotics for creating control policies and testing various physical parameters. Soft robotics is a field that presents unique physical challenges for simulating its subjects due to the nonlinearity of deformable material components along with other innovative, and often complex, physical properties. Because of the computational cost of simulating soft and heterogeneous objects with traditional techniques, rigid robotics simulators are not well suited to simulating soft robots. Thus, many engineers must build their own one-off simulators tailored to their system, or use existing simulators with reduced performance. In order to facilitate the development of this exciting technology, this work presents an interactive-speed, accurate, and versatile simulator for a variety of types of soft robots. Cronos, our open-source 3D simulation engine, parallelizes a mass-spring model for ultra-fast performance on both deformable and rigid objects. Our approach is applicable to a wide array of nonlinear material configurations, including high deformability, volumetric actuation, or heterogenous stiffness. This versatility provides the ability to mix materials and geometric components freely within a single robot simulation. By exploiting the flexibility and scalability of nonlinear Hookean mass-spring systems, this framework simulates soft and rigid objects via a highly parallel model for near real-time speed. We describe an efficient GPU CUDA implementation, which we demonstrate to achieve computation of over 1 billion elements per second on consumer-grade GPU cards. Dynamic physical accuracy of the system is validated by comparing results to Euler-Bernoulli beam theory, natural frequency predictions, and empirical data of a soft structure under large deformation.

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2207.09249 [pdf, other]

Fabric-GC: A Blockchain-based Gantt Chart System for Cross-organizational Project Management

Authors: Dun Li, Dezhi Han, Benhui Xia, Tien-Hsiung Weng, Arcangelo Castiglione, Kuan-Ching Li

Abstract: Large-scale production is always associated with more and more development and interaction among peers, and many fields achieve higher economic benefits through project cooperation. However, project managers in the traditional centralized approach cannot rearrange their activities to cross-organizational project management. Thanks to its characteristics, the Blockchain can represent a valid solution to the problems mentioned above. In this article, we propose Fabric-GC, a Blockchain-based Gantt chart system. Fabric-GC enables to realize secure and effective cross-organizational cooperation for project management, providing access control to multiple parties for project visualization. Compared with other solutions, the proposed system is versatile, as it can be applied to project management in different fields and achieve effective and agile scheduling. Experimental results show that Fabric-GC achieves stable performance in large-scale request and processing distributed environments, where the data synchronization speed of the consortium chain reached four times faster than a public chain, achieving faster data consistency.

Submitted 19 July, 2022; originally announced July 2022.

arXiv:2206.07687 [pdf, other]

Structured Sparsity Learning for Efficient Video Super-Resolution

Authors: Bin Xia, **gwen He, Yulun Zhang, Yitong Wang, Yapeng Tian, Wenming Yang, Luc Van Gool

Abstract: The high computational costs of video super-resolution (VSR) models hinder their deployment on resource-limited devices (e.g., smartphones and drones). Existing VSR models contain considerable redundant filters, which drag down the inference efficiency. To prune these unimportant filters, we develop a structured pruning scheme called Structured Sparsity Learning (SSL) according to the properties of VSR. In SSL, we design pruning schemes for several key components in VSR models, including residual blocks, recurrent networks, and upsampling networks. Specifically, we develop a Residual Sparsity Connection (RSC) scheme for residual blocks of recurrent networks to liberate pruning restrictions and preserve the restoration information. For upsampling networks, we design a pixel-shuffle pruning scheme to guarantee the accuracy of feature channel-space conversion. In addition, we observe that pruning error would be amplified as the hidden states propagate along with recurrent networks. To alleviate the issue, we design Temporal Finetuning (TF). Extensive experiments show that SSL can significantly outperform recent methods quantitatively and qualitatively.

Submitted 25 March, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted by CVPR2023, code is available at https://github.com/Zj-BinXia/SSL

arXiv:2206.06952 [pdf, other]

FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents

Authors: Bolun "Namir" Xia, Vipula D. Rawte, Mohammed J. Zaki, Aparna Gupta

Abstract: Unstructured data, especially text, continues to grow rapidly in various domains. In particular, in the financial sphere, there is a wealth of accumulated unstructured financial data, such as the textual disclosure documents that companies submit on a regular basis to regulatory agencies, such as the Securities and Exchange Commission (SEC). These documents are typically very long and tend to contain valuable soft information about a company's performance. It is therefore of great interest to learn predictive models from these long textual documents, especially for forecasting numerical key performance indicators (KPIs). Whereas there has been great progress in pre-trained language models (LMs) that learn from tremendously large corpora of textual data, they still struggle in terms of effective representations for long documents. Our work fills this critical need, namely how to develop better models to extract useful information from long textual documents and learn effective features that can leverage the soft financial and risk information for text regression (prediction) tasks. In this paper, we propose and implement a deep learning framework that splits long documents into chunks and utilizes pre-trained LMs to process and aggregate the chunks into vector representations, followed by self-attention to extract valuable document-level features. We evaluate our model on a collection of 10-K public disclosure reports from US banks, and another dataset of reports submitted by US companies. Overall, our framework outperforms strong baseline methods for textual modeling as well as a baseline regression model using only numerical data. Our work provides better insights into how utilizing pre-trained domain-specific and fine-tuned long-input LMs in representing long documents can improve the quality of representation of textual data, and therefore, help in improving predictive analyses.
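The chunk-then-aggregate idea described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the window size, the stride, the `toy_encode` placeholder (which stands in for a pre-trained LM encoder), and the single-query attention pooling (a simplification of the self-attention step the abstract mentions). It is not the authors' implementation.

```python
import math


def split_into_chunks(tokens, chunk_size, stride):
    """Split a token list into overlapping windows of at most chunk_size tokens."""
    if chunk_size <= 0 or stride <= 0:
        raise ValueError("chunk_size and stride must be positive")
    if not tokens:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
        start += stride
    return chunks


def toy_encode(chunk):
    """Hypothetical placeholder for an LM encoder: maps a chunk to a 2-d vector."""
    return [float(len(chunk)), float(len(set(chunk)))]


def attention_pool(chunk_vectors, query):
    """Softmax-weighted average of chunk vectors, scored against a query vector."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in chunk_vectors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(chunk_vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, chunk_vectors))
            for i in range(dim)]


document = "the bank reported higher credit risk in the fourth quarter".split()
chunks = split_into_chunks(document, chunk_size=4, stride=3)
doc_vector = attention_pool([toy_encode(c) for c in chunks], query=[0.1, 0.1])
```

In a real pipeline the document-level vector would feed a regression head that predicts the KPI; the overlap between windows (stride smaller than chunk size) is one common way to avoid cutting context at chunk boundaries.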

Submitted 14 June, 2022; originally announced June 2022.

Comments: 10 pages, 9 figures, 7 tables

ACM Class: I.2.7

arXiv:2206.05908 [pdf, ps, other]

The smallest vertex-primitive $2$-arc-transitive digraph

Authors: Fu-Gang Yin, Yan-quan Feng, Binzhou Xia

Abstract: In 2017, Giudici, Li and the third author constructed the first known family of vertex-primitive $2$-arc-transitive digraphs of valency at least $2$. The smallest digraph in this family admits $\mathrm{PSL}_3(49)$ acting $2$-arc-transitively with vertex-stabilizer $\mathrm{A}_6$ and hence has $30758154560$ vertices. In this paper, we prove that this digraph is the vertex-primitive $2$-arc-transitive digraph of valency at least $2$ with fewest vertices.
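The vertex count quoted in this abstract can be checked with a short calculation. Because the group acts transitively on vertices, the orbit-stabilizer theorem gives the number of vertices as $|\mathrm{PSL}_3(49)| / |\mathrm{A}_6|$. The sketch below uses the standard order formula $|\mathrm{PSL}_3(q)| = q^3 (q^2-1)(q^3-1)/\gcd(3, q-1)$; the function and constant names are illustrative only.

```python
from math import gcd


def psl3_order(q: int) -> int:
    """Order of PSL_3(q): q^3 (q^2 - 1)(q^3 - 1) / gcd(3, q - 1)."""
    return q**3 * (q**2 - 1) * (q**3 - 1) // gcd(3, q - 1)


A6_ORDER = 360  # |A_6| = 6! / 2

# Orbit-stabilizer: number of vertices = |group| / |vertex stabilizer|.
vertices = psl3_order(49) // A6_ORDER
print(vertices)  # prints 30758154560
```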

Submitted 21 March, 2023; v1 submitted 13 June, 2022; originally announced June 2022.

arXiv:2206.04963 [pdf]

Elemental (im-)miscibility determines phase formation of multinary nanoparticles co-sputtered in ionic liquids

Authors: Michael Meischein, Alba Garzón-Manjón, Thomas Hammerschmidt, Bin Xiao, Siyuan Zhang, Lamya Abdellaoui, Christina Scheu, Alfred Ludwig

Abstract: Non-equilibrium synthesis methods make it possible to alloy bulk-immiscible elements into multinary nanoparticles, which broadens the design space for new materials. Whereas sputtering onto solid substrates can combine immiscible elements into thin film solid solutions, this is not clear for sputtering of nanoparticles in ionic liquids. Thus, the suitability of sputtering in ionic liquids for producing nanoparticles of immiscible elements is investigated by co-sputtering the systems Au-Cu (miscible), Au-Ru and Cu-Ru (both immiscible), and Au-Cu-Ru on the surface of the ionic liquid 1-butyl-3-methylimidazolium bis(trifluoromethylsulfonyl)imide [Bmim][(Tf)2N]. The sputtered nanoparticles were analyzed to obtain (i) knowledge concerning the general formation process of nanoparticles when sputtering onto ionic liquid surfaces, (ii) information on whether alloy nanoparticles of immiscible elements can be synthesized, and (iii) evidence on whether the Hume-Rothery rules for solid solubility are valid for sputtered nanoparticles. Accompanying atomistic simulations using density-functional theory for clusters of different size and ordering confirm that the miscibility of Au-Cu and the immiscibility of Au-Ru and Cu-Ru govern the thermodynamic stability of the nanoparticles. Based on the matching experimental and theoretical results for the NP/IL-systems concerning NP stability, a formation model of multinary NPs in ILs was developed.

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2206.03387 [pdf, other]

Spatial Analysis of the Association between School Proximity and Crime in Philadelphia

Authors: Leonardo de Castro Harth, Bangxi Xiao, Shane T. Jensen

Abstract: We use high resolution data to investigate the association between crime incidence and proximity to different types of public schools over the past fifteen years in the city of Philadelphia. We employ two statistical methods, regression modeling and propensity score matching, in order to better isolate the association between crime and school proximity while controlling for the demographic, economic, land use and disorder characteristics of the surrounding neighborhood. With both of these approaches, we find significantly increased crime incidence near to public schools regardless of crime outcome, educational level and time period. The effect of school proximity on crime varies substantially depending on whether or not school is in session, as well as between different types of crime and educational levels of the school. We see the largest effects of school proximity on crime for violent crimes near to high schools during their in-session time periods. Our results support several theories which suggest that crime should be elevated near to schools, as well as finding significant associations between crime and other aspects of the built environment.

Submitted 25 December, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

arXiv:2206.00184 [pdf, other]

How Much Demand Flexibility Could Have Spared Texas from the 2021 Outage?

Authors: Dongqi Wu, Xiangtian Zheng, Ali Menati, Lane Smith, Bainan Xia, Yixing Xu, Chanan Singh, Le Xie

Abstract: The February 2021 Texas winter power outage led to hundreds of deaths and billions of dollars in economic losses, largely due to generation failure and record-breaking electric demand. In this paper, we study the scaling-up of demand flexibility as a means to avoid load shedding during such an extreme weather event. The three mechanisms considered are interruptible load, residential load rationing, and incentive-based demand response. By simulating on a synthetic but realistic large-scale Texas grid model along with demand flexibility modeling and electricity outage data, we identify portfolios of mixed mechanisms that exactly avoid outages, which a single mechanism may fail to do because of decaying marginal effects. We also reveal a complementary relationship between interruptible load and residential load rationing and find nonlinear impacts of incentive-based demand response on the efficacy of other mechanisms.

Submitted 31 May, 2022; originally announced June 2022.

Comments: This paper has been submitted to a journal for review

arXiv:2205.13412 [pdf, other]

Physical-World Optical Adversarial Attacks on 3D Face Recognition

Authors: Yanjie Li, Yiquan Li, Xuelong Dai, Songtao Guo, Bin Xiao

Abstract: 2D face recognition has been proven insecure against physical adversarial attacks. However, few studies have investigated the possibility of attacking real-world 3D face recognition systems. Recently proposed 3D-printed attacks cannot generate adversarial points in the air. In this paper, we attack 3D face recognition systems through elaborate optical noises. We took structured light 3D scanners as our attack target. End-to-end attack algorithms are designed to generate adversarial illumination for 3D faces through the inherent or an additional projector to produce adversarial points at arbitrary positions. Nevertheless, face reflectance is a complex procedure because the skin is translucent. To involve this projection-and-capture procedure in optimization loops, we model it with the Lambertian rendering model and use SfSNet to estimate the albedo. Moreover, to improve the resistance to distance and angle changes while keeping the perturbation unnoticeable, a 3D transform invariant loss and two kinds of sensitivity maps are introduced. Experiments are conducted in both simulated and physical worlds. We successfully attacked point-cloud-based and depth-image-based 3D face recognition algorithms while needing fewer perturbations than previous state-of-the-art physical-world 3D adversarial attacks.

Submitted 13 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

Comments: Submitted to CVPR 2023

arXiv:2205.08060 [pdf, ps, other]

doi 10.1103/PhysRevD.106.094015

Semi-inclusive Diffractive Deep Inelastic Scattering at Small-$x$

Authors: Yoshitaka Hatta, Bo-Wen Xiao, Feng Yuan

Abstract: Inspired by a recent study of Iancu, Mueller and Triantafyllopoulos [1] and earlier papers by Golec-Biernat and Wusthoff [2,3], we propose semi-inclusive diffractive deep inelastic scattering (SIDDIS) to investigate the gluon tomography in the nucleon and nuclei at small-$x$. The relevant diffractive quark and gluon parton distribution functions (DPDF) can be computed in terms of the color dipole S-matrices in the fundamental and adjoint representations, respectively. Novel correlations from the gluon tomography in the dipole S-matrix can be experimentally studied through the DPDFs in these processes at the future electron-ion collider (EIC).

Submitted 7 November, 2022; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: Added appendices, 10 pages, 5 figures

arXiv:2205.04113 [pdf, other]

doi 10.1109/ACCESS.2022.3232175

Damage Maximization for Combat Network with Limited Costs

Authors: **tao Yu, Bing Xiao, Yuzhu Cui

Abstract: Maximizing the damage by attacking specific nodes of the combat network can efficiently disrupt enemies' defense capability, protect our critical units, and enhance the resistance to the destruction of system-of-system (SOS). However, the modeling of the combat network damage is not practical enough. In this paper, we report a more realistic model to study the combat network damage maximization problems. By analyzing realistic situations, the cost of damage is redefined based on the network topology and the functional characteristics of nodes. The damage effect is also updated according to the combat network topology and operational capability. Hence, a cost-limited damage maximization model for the combat network is constructed. In addition, to obtain optimal solutions, an improved genetic algorithm (IPGA) based on prior information is proposed. As a result, our method has a significant advantage in the feasibility and effectiveness compared with other algorithms in experiments. The attack pattern of the combat network and the convergence and complexity of the proposed algorithm are further explored. The improved model and algorithm, as well as the mined attack patterns, can provide support for military decisions.

Submitted 23 December, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

arXiv:2205.04099 [pdf, other]

Robustness of double-layer group-dependent combat network with cascading failure

Authors: **tao Yu, Bing Xiao, Yuzhu Cui

Abstract: The networked combat system-of-system (CSOS) is the trend of combat development with the innovation of technology. The achievement of combat effectiveness requires CSOS to have a good ability to deal with external interference. Here we report a modeling method of CSOS from the perspective of complex networks and explore the robustness of the combat network based on this. Firstly, a more realistic double-layer heterogeneous dependent combat network model is established. Then, the conditional group dependency situation is considered to design failure rules for dependent failure, and the coupling relation between the double-layer subnets is analyzed for cascading failure. Based on this, the initial load and capacity of the node are defined, respectively, as well as the load redistribution strategy and the status judgment rules for the cascading failure model. Simulation experiments are carried out by changing the attack modes and different parameters, and the results show that the robustness of the combat network can be effectively improved by improving the tolerance limit of one-way dependency of the functional net, the node capacity of the functional subnet and the tolerance of the overload state. The conclusions of this paper can provide a useful reference for network structure optimization and network security protection in the military field.

Submitted 9 December, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

arXiv:2205.04089 [pdf]

Observation of fractal topological states in acoustic metamaterials

Authors: Shengjie Zheng, Xianfeng Man, Ze-Lin Kong, Zhi-Kang Lin, Guiju Duan, Ning Chen, Dejie Yu, Jian-Hua Jiang, Baizhan Xia

Abstract: Topological phases of matter have been extensively investigated in solid state materials and classical wave systems with integer dimensions. However, topological states in non-integer dimensions remain largely unexplored. Fractals, being nearly the same at different scales, are one of the intriguing complex geometries with non-integer dimensions. Here, we demonstrate acoustic Sierpiński fractal topological insulators with unconventional higher-order topological phenomena via consistent theory and experiments. We discover abundant topological edge and corner states emerging in our acoustic systems due to the rich edge and corner boundaries inside the fractals. Interestingly, the numbers of the edge and corner states scale the same as the bulk states with the system size and the exponents coincide with the Hausdorff fractal dimension of the Sierpiński carpet. Furthermore, the emergent corner states exhibit unconventional spectrum and wave patterns. Our study opens a pathway toward topological states in fractal geometries.

Submitted 9 May, 2022; originally announced May 2022.

Comments: 14 Pages, 4 Figures

arXiv:2205.01818 [pdf, other]

i-Code: An Integrative and Composable Multimodal Learning Framework

Authors: Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang

Abstract: Human intelligence is multimodal; we integrate visual, linguistic, and acoustic signals to maintain a holistic worldview. Most current pretraining methods, however, are limited to one or two modalities. We present i-Code, a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations. In this framework, data from each modality are first given to pretrained single-modality encoders. The encoder outputs are then integrated with a multimodal fusion network, which uses novel attention mechanisms and other architectural innovations to effectively combine information from the different modalities. The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning. Unlike previous research using only video for pretraining, the i-Code framework can dynamically process single, dual, and triple-modality data during training and inference, flexibly projecting different combinations of modalities into a single representation space. Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11% and demonstrating the power of integrative multimodal pretraining.

Submitted 5 May, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

arXiv:2204.13962 [pdf, other]

SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization

Authors: Yucheng Hang, Bin Xia, Wenming Yang, Qingmin Liao

Abstract: Image harmonization aims to achieve visual consistency in composite images by adapting a foreground to make it compatible with a background. However, existing methods only use the real image as the positive sample to guide the training, and at most introduce the corresponding composite image as a single negative sample for an auxiliary constraint, which leads to limited distortion knowledge and an overly large solution space, making the generated harmonized image distorted. Besides, none of them jointly constrain the foreground self-style and foreground-background style consistency, which exacerbates this problem. Moreover, recent region-aware adaptive instance normalization achieves great success but only considers the global background feature distribution, making the aligned foreground feature distribution biased. To address these issues, we propose a self-consistent style contrastive learning scheme (SCS-Co). By dynamically generating multiple negative samples, our SCS-Co can learn more distortion knowledge and regularize the generated harmonized image well in the style representation space from two aspects, the foreground self-style and foreground-background style consistency, leading to a more photorealistic visual result. In addition, we propose a background-attentional adaptive instance normalization (BAIN) to achieve an attention-weighted background feature distribution according to the foreground-background feature similarity. Experiments demonstrate the superiority of our method over other state-of-the-art methods in both quantitative comparison and visual analysis.

Submitted 29 April, 2022; originally announced April 2022.

Comments: Accepted by CVPR 2022

arXiv:2204.11116 [pdf, other]

Human-Robot Shared Control for Surgical Robot Based on Context-Aware Sim-to-Real Adaptation

Authors: Dandan Zhang, Zicong Wu, Junhong Chen, Ruiqi Zhu, Adnan Munawar, Bo Xiao, Yuan Guan, Hang Su, Wuzhou Hong, Yao Guo, Gregory S. Fischer, Benny Lo, Guang-Zhong Yang

Abstract: Human-robot shared control, which integrates the advantages of both humans and robots, is an effective approach to facilitate efficient surgical operation. Learning from demonstration (LfD) techniques can be used to automate some of the surgical subtasks for the construction of the shared control framework. However, a sufficient amount of data is required for the robot to learn the manoeuvres. Using a surgical simulator to collect data is a less resource-demanding approach. With sim-to-real adaptation, the manoeuvres learned from a simulator can be transferred to a physical robot. To this end, we propose a sim-to-real adaptation method to construct a human-robot shared control framework for robotic surgery. In this paper, a desired trajectory is generated from a simulator using an LfD method, while a dynamic motion primitives (DMPs) based method is used to transfer the desired trajectory from the simulator to the physical robotic platform. Moreover, a role adaptation mechanism is developed such that the robot can adjust its role according to the surgical operation contexts predicted by a neural network model. The effectiveness of the proposed framework is validated on the da Vinci Research Kit (dVRK). Results of the user studies indicated that with the adaptive human-robot shared control framework, the path length of the remote controller, the total clutching number and the task completion time can be reduced significantly. The proposed method outperformed the traditional manual control via teleoperation.

Submitted 4 June, 2022; v1 submitted 23 April, 2022; originally announced April 2022.

Comments: Accepted by ICRA 2022

arXiv:2204.10496 [pdf, other]

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks

Authors: Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, Bin Xiao, Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-fu Chang, Lu Yuan

Abstract: Cross-modal encoders for vision-language (VL) tasks are often pretrained with carefully curated vision-language datasets. While these datasets reach an order of 10 million samples, the labor cost is prohibitive to scale further. Conversely, unimodal encoders are pretrained with simpler annotations that are less cost-prohibitive, achieving scales of hundreds of millions to billions. As a result, unimodal encoders have achieved state-of-the-art (SOTA) results on many downstream tasks. However, challenges remain when applying them to VL tasks. The pretraining data is not optimal for cross-modal architectures and requires heavy computational resources. In addition, unimodal architectures lack cross-modal interactions that have demonstrated significant benefits for VL tasks. Therefore, how to best leverage pretrained unimodal encoders for VL tasks is still an area of active research. In this work, we propose a method to leverage unimodal vision and text encoders for VL tasks that augments existing VL approaches while conserving computational complexity. Specifically, we propose Multimodal Adaptive Distillation (MAD), which adaptively distills useful knowledge from pretrained encoders to cross-modal VL encoders. Second, to better capture nuanced impacts on VL task performance, we introduce an evaluation protocol that includes Visual Commonsense Reasoning (VCR), Visual Entailment (SNLI-VE), and Visual Question Answering (VQA), across a variety of data constraints and conditions of domain shift. Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data. Finally, MAD outperforms concurrent works utilizing a pretrained vision encoder from CLIP. Code will be made available.

Submitted 28 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2201.05729

arXiv:2204.08329 [pdf, other]

A Comprehensive Survey on Data-Efficient GANs in Image Generation

Authors: Ziqiang Li, Beihao Xia, **g Zhang, Chaoyue Wang, Bin Li

Abstract: Generative Adversarial Networks (GANs) have achieved remarkable results in image synthesis. These successes of GANs rely on large-scale datasets, which are costly to obtain. With limited training data, how to stabilize the training process of GANs and generate realistic images has attracted more attention. The challenges of Data-Efficient GANs (DE-GANs) mainly arise from three aspects: (i) Mismatch Between Training and Target Distributions, (ii) Overfitting of the Discriminator, and (iii) Imbalance Between Latent and Data Spaces. Although many augmentation and pre-training strategies have been proposed to alleviate these issues, a systematic survey summarizing the properties, challenges, and solutions of DE-GANs is lacking. In this paper, we revisit and define DE-GANs from the perspective of distribution optimization. We summarize and analyze the challenges of DE-GANs. Meanwhile, we propose a taxonomy, which classifies the existing methods into three categories: Data Selection, GANs Optimization, and Knowledge Sharing. Last but not least, we attempt to highlight the current problems and the future directions.

Submitted 8 October, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

Comments: Under review

arXiv:2204.07154 [pdf, other]

MiniViT: Compressing Vision Transformers with Weight Multiplexing

Authors: **nian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan

Abstract: Vision Transformer (ViT) models have recently drawn much attention in computer vision due to their high model capability. However, ViT models suffer from a huge number of parameters, restricting their applicability on devices with limited memory. To alleviate this problem, we propose MiniViT, a new compression framework, which achieves parameter reduction in vision transformers while retaining the same performance. The central idea of MiniViT is to multiplex the weights of consecutive transformer blocks. More specifically, we make the weights shared across layers, while imposing a transformation on the weights to increase diversity. Weight distillation over self-attention is also applied to transfer knowledge from large-scale ViT models to weight-multiplexed compact models. Comprehensive experiments demonstrate the efficacy of MiniViT, showing that it can reduce the size of the pre-trained Swin-B transformer by 48%, while achieving an increase of 1.0% in Top-1 accuracy on ImageNet. Moreover, using a single layer of parameters, MiniViT is able to compress DeiT-B by 9.7 times from 86M to 9M parameters, without seriously compromising the performance. Finally, we verify the transferability of MiniViT by reporting its performance on downstream benchmarks. Code and models are available here.

Submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted by CVPR 2022

arXiv:2204.03645 [pdf, other]

DaViT: Dual Attention Vision Transformers

Authors: Mingyu Ding, Bin Xiao, Noel Codella, ** Luo, **gdong Wang, Lu Yuan

Abstract: In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective vision transformer architecture that is able to capture global context while maintaining computational efficiency. We propose approaching the problem from an orthogonal angle: exploiting self-attention mechanisms with both "spatial tokens" and "channel tokens". With spatial tokens, the spatial dimension defines the token scope, and the channel dimension defines the token feature dimension. With channel tokens, we have the inverse: the channel dimension defines the token scope, and the spatial dimension defines the token feature dimension. We further group tokens along the sequence direction for both spatial and channel tokens to maintain the linear complexity of the entire model. We show that these two self-attentions complement each other: (i) since each channel token contains an abstract representation of the entire image, the channel attention naturally captures global interactions and representations by taking all spatial positions into account when computing attention scores between channels; (ii) the spatial attention refines the local representations by performing fine-grained interactions across spatial locations, which in turn helps the global information modeling in channel attention. Extensive experiments show our DaViT achieves state-of-the-art performance on four different tasks with efficient computations. Without extra data, DaViT-Tiny, DaViT-Small, and DaViT-Base achieve 82.8%, 84.2%, and 84.6% top-1 accuracy on ImageNet-1K with 28.3M, 49.7M, and 87.9M parameters, respectively. When we further scale up DaViT with 1.5B weakly supervised image and text pairs, DaViT-Giant reaches 90.4% top-1 accuracy on ImageNet-1K. Code is available at https://github.com/dingmyu/davit.

Submitted 7 April, 2022; originally announced April 2022.
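
The spatial-token and channel-token views described in the DaViT abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single head, uses the input itself as query, key, and value (no learned projections), and omits the token grouping that the abstract says keeps the model linear in complexity. The function names and the scaling factors are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    """x has shape (P, C): P spatial positions, C channels.

    Tokens are the P positions; each token's feature vector has length C.
    (Hypothetical simplification: queries, keys, and values are all x.)
    """
    _, c = x.shape
    scores = softmax(x @ x.T / np.sqrt(c), axis=-1)  # (P, P) position-to-position
    return scores @ x                                # (P, C)

def channel_attention(x):
    """Same input, transposed view: tokens are the C channels.

    Each channel token's feature vector spans all P positions, so the
    (C, C) score matrix is computed from the whole image.
    """
    p, _ = x.shape
    xt = x.T                                          # (C, P)
    scores = softmax(xt @ xt.T / np.sqrt(p), axis=-1) # (C, C) channel-to-channel
    return (scores @ xt).T                            # back to (P, C)
```

In this sketch the spatial score matrix grows with the number of positions (P by P), while the channel score matrix stays C by C regardless of image size, which is one way to read the abstract's claim that channel attention captures global context cheaply.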

arXiv:2204.03610 [pdf, other]

Unified Contrastive Learning in Image-Text-Label Space

Authors: Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Bin Xiao, Ce Liu, Lu Yuan, Jianfeng Gao

Abstract: Visual recognition is recently learned via either supervised learning on human-annotated image-label data or language-image contrastive learning with webly-crawled image-text pairs. While supervised learning may result in a more discriminative representation, language-image pretraining shows unprecedented zero-shot recognition capability, largely due to the different properties of data sources and learning objectives. In this work, we introduce a new formulation by combining the two data sources into a common image-text-label space. In this space, we propose a new learning paradigm, called Unified Contrastive Learning (UniCL), with a single learning objective to seamlessly prompt the synergy of the two data types. Extensive experiments show that our UniCL is an effective way of learning semantically rich yet discriminative representations, universally for image recognition in zero-shot, linear-probe, full fine-tuning, and transfer learning scenarios. In particular, it attains gains of up to 9.2% and 14.5% on average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively. In the linear-probe setting, it also boosts performance over the two methods by 7.3% and 3.4%, respectively. Our study also indicates that UniCL stand-alone is a good learner on pure image-label data, rivaling the supervised learning methods across three image classification datasets and two types of vision backbones, ResNet and Swin Transformer. Code is available at https://github.com/microsoft/UniCL.

Submitted 7 April, 2022; originally announced April 2022.

Comments: CVPR 2022
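
The UniCL abstract above does not spell out its objective, so the following is only one plausible reading of a "single learning objective" over image-text-label triplets: a bidirectional contrastive loss in which every image-text pair that shares a label counts as a positive. The function name, the temperature value, and the convention that each web-crawled pair receives its own unique label are assumptions for illustration, not details taken from the listing.

```python
import numpy as np

def log_softmax(z, axis):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def label_aware_contrastive_loss(img, txt, labels, temperature=0.07):
    """img, txt: (N, D) embeddings; labels: length-N integer labels.

    Pairs (i, j) with labels[i] == labels[j] are treated as positives.
    With all-unique labels this reduces to a CLIP-style pairwise loss;
    with repeated labels it behaves like a supervised contrastive loss.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature                 # (N, N) similarities
    labels = np.asarray(labels)
    pos = (labels[:, None] == labels[None, :]).astype(float)
    # Image-to-text: normalize over texts (rows), average over positives.
    i2t = -(pos * log_softmax(logits, axis=1)).sum(axis=1) / pos.sum(axis=1)
    # Text-to-image: normalize over images (columns), average over positives.
    t2i = -(pos * log_softmax(logits, axis=0)).sum(axis=0) / pos.sum(axis=0)
    return float(i2t.mean() + t2i.mean())
```

Under this reading, image-label data contributes rows with repeated labels and image-text data contributes rows with unique labels, so both sources pass through the same loss.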

arXiv:2203.17187 [pdf, other]

doi 10.1103/PhysRevC.109.044912

Nonprompt direct-photon production in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: U. A. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, M. Alfred, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, M. Bai, N. S. Bandara, B. Bannier, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, S. Beckman, R. Belmont, A. Berdnikov, Y. Berdnikov, L. Bichon, B. Blankenship, D. S. Blau, J. S. Bok , et al. (311 additional authors not shown)

Abstract: The measurement of the direct-photon spectrum from Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV is presented by the PHENIX collaboration using the external-photon-conversion technique for 0\%--93\% central collisions in a transverse-momentum ($p_T$) range of 0.8--10 GeV/$c$. An excess of direct photons, above prompt-photon production from hard-scattering processes, is observed for $p_T<6$ GeV/$c$. Nonprompt direct photons are measured by subtracting the prompt component, which is estimated as $N_{\rm coll}$-scaled direct photons from $p$$+$$p$ collisions at 200 GeV, from the direct-photon spectrum. Results are obtained for $0.8<p_T<6.0$ GeV/$c$ and suggest that the spectrum has an increasing inverse slope from ${\approx}0.2$ to 0.4 GeV/$c$ with increasing $p_T$, which indicates a possible sensitivity of the measurement to photons from earlier stages of the evolution of the collision. In addition, like the direct-photon production, the $p_T$-integrated nonprompt direct-photon yields also follow a power-law scaling behavior as a function of collision-system size. The exponent, $\alpha$, for the nonprompt component is found to be consistent with 1.1 with no apparent $p_T$ dependence.

Submitted 19 April, 2024; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: 336 authors from 71 institutions, 26 pages, 30 figures, 4 tables, 2014 data. v2 is version accepted for publication in Physical Review C. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

Journal ref: Phys. Rev. C 109, 044912 (2024)

arXiv:2203.17058 [pdf, other]

doi 10.1103/PhysRevC.109.044907

Charm- and Bottom-Quark Production in Au$+$Au Collisions at $\sqrt{s_{_{NN}}}$ = 200 GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, M. Alfred, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, M. Bai, N. S. Bandara, B. Bannier, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, S. Beckman, R. Belmont, A. Berdnikov, Y. Berdnikov, L. Bichon, B. Blankenship , et al. (321 additional authors not shown)

Abstract: The invariant yield of electrons from open-heavy-flavor decays for $1<p_T<8$ GeV/$c$ at midrapidity $|y|<0.35$ in Au$+$Au collisions at $\sqrt{s_{_{NN}}}$ = 200 GeV has been measured by the PHENIX experiment at the Relativistic Heavy Ion Collider. A displaced-vertex analysis with the PHENIX silicon-vertex detector enables extraction of the fraction of charm and bottom hadron decays and unfolding of the invariant yield of parent charm and bottom hadrons. The nuclear-modification factors $R_{AA}$ for electrons from charm and bottom hadron decays and heavy-flavor hadrons show both a centrality and a quark-mass dependence, indicating suppression in the quark-gluon plasma produced in these collisions that is medium sized and quark-mass dependent.

Submitted 11 April, 2024; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: 345 authors from 72 institutions, 16 pages, 18 figures, 2014 data. v2 is version accepted for publication in Physical Review C. Plain text data tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

Journal ref: Phys. Rev. C 109, 044907 (2024)

Showing 151–200 of 696 results for author: Xia, B