Search | arXiv e-print repository

doi 10.1103/PhysRevD.107.074018

Mass spectra of neutral mesons $K_0,\ π_0,\ η,\ η'$ at finite magnetic field, temperature and baryon chemical potential

Abstract: The mass spectra of neutral mesons $K_0, π_0, η, η'$ on the temperature-quark chemical potential $(T-μ)$ plane in the presence of a constant magnetic field are investigated in the $SU(3)$ NJL model. As a Goldstone boson of chiral symmetry breaking, the mass of the $K_0$ meson increases with temperature and/or quark chemical potential, and we observe two kinds of mass jumps of the $K_0$ meson in media, which are induced by the mass jump of constituent quarks and the magnetic field, respectively. Due to the breaking of isospin symmetry between $u$ and $d$ quarks in magnetic fields, the mixing of $π_0-η-η'$ mesons occurs, and this leads to rich structures in their mass spectra. For instance, the $π_0$ mass is influenced by the strange quark: the rate of increase of the $π_0$ mass changes at high $μ$ and vanishing $T$, and the $π_0$ mass jumps across the threshold of twice the strange quark mass at finite $T$ and $μ$. The mass ordering of the $π_0, \ η,\ η'$ mesons varies in media due to their mass jumps, which are induced by the mass jump of constituent quarks or the magnetic field.

Submitted 9 December, 2022; originally announced December 2022.

Comments: 8 pages, 4 figures

arXiv:2212.01778 [pdf, ps, other]

Improving End-to-end Speech Translation by Leveraging Auxiliary Speech and Text Data

Authors: Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Abstract: We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability to adapt one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the speech translation model can learn from both unlabeled and labeled data, especially when the source-language text data is abundant. Beyond this, we present a denoising method to build a robust text encoder that can deal with both normal and noisy text data. Our system sets a new state of the art on the MuST-C En-De, En-Fr, and LibriSpeech En-Fr tasks.

Submitted 4 December, 2022; originally announced December 2022.

Comments: Accepted to AAAI 2023

arXiv:2212.00328 [pdf, other]

Differentially Private Learning with Per-Sample Adaptive Clipping

Authors: Tianyu Xia, Shuheng Shen, Su Yao, Xinyi Fu, Ke Xu, Xiaolong Xu, Xing Fu

Abstract: Privacy in AI has remained a topic that draws attention from researchers and the general public in recent years. As one way to implement privacy-preserving AI, differentially private learning is a framework that enables AI models to use differential privacy (DP). To achieve DP in the learning process, existing algorithms typically limit the magnitude of gradients with a constant clipping, which requires careful tuning due to its significant impact on model performance. As a solution to this issue, the recent works NSGD and Auto-S innovatively propose to use normalization instead of clipping to avoid hyperparameter tuning. However, normalization-based approaches like NSGD and Auto-S rely on a monotonic weight function, which imposes excessive weight on small gradient samples and introduces extra deviation to the update. In this paper, we propose a Differentially Private Per-Sample Adaptive Clipping (DP-PSAC) algorithm based on a non-monotonic adaptive weight function, which guarantees privacy without the typical hyperparameter tuning process of using a constant clipping while significantly reducing the deviation between the update and true batch-averaged gradient. We provide a rigorous theoretical convergence analysis and show that with convergence rate at the same order, the proposed algorithm achieves a lower non-vanishing bound, which is maintained over training iterations, compared with NSGD/Auto-S. In addition, through extensive experimental evaluation, we show that DP-PSAC outperforms or matches the state-of-the-art methods on multiple main-stream vision and language tasks.

Submitted 2 May, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

Comments: To appear in AAAI 2023, Revised acknowledgments and citations
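
The contrast this abstract draws between constant clipping and normalization can be illustrated with a minimal sketch. It is not the authors' code: the function names, the normalization form `c / (norm + r)`, the regularizer `r`, and the noise scale `sigma * c` are assumptions chosen for illustration, and the non-monotonic DP-PSAC weight itself is not reproduced here because the abstract does not state it.

```python
import math
import random


def l2_norm(grad):
    """Euclidean norm of one per-sample gradient (a list of floats)."""
    return math.sqrt(sum(x * x for x in grad))


def clip_weight(norm, c=1.0):
    """Constant clipping: leave small gradients alone, scale large ones to norm c."""
    return 1.0 if norm <= c else c / norm


def normalize_weight(norm, c=1.0, r=0.01):
    """Assumed normalization-style weight: monotonically decreasing in the norm,
    so samples with very small gradients receive very large weights."""
    return c / (norm + r)


def private_mean(grads, weight_fn, c=1.0, sigma=0.0, seed=0):
    """Weight each per-sample gradient, sum, add Gaussian noise, then average."""
    rng = random.Random(seed)
    total = [0.0] * len(grads[0])
    for g in grads:
        w = weight_fn(l2_norm(g), c)
        for i, x in enumerate(g):
            total[i] += w * x
    # Hypothetical noise scale: standard deviation sigma * c per coordinate.
    noisy = [t + rng.gauss(0.0, sigma * c) for t in total]
    return [v / len(grads) for v in noisy]


if __name__ == "__main__":
    batch = [[3.0, 4.0], [0.3, 0.4]]
    print(private_mean(batch, clip_weight))       # noise-free clipped mean
    print(private_mean(batch, normalize_weight))  # noise-free normalized mean
    print(normalize_weight(0.01))                 # large weight on a tiny gradient
```

In both schemes the weighted per-sample gradient has norm at most `c`, which is what bounds the sensitivity; the difference is that the normalization weight keeps growing as the gradient norm shrinks, which is the behavior the abstract describes as excessive weight on small gradient samples.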

arXiv:2211.16594 [pdf, other]

Exploiting Category Names for Few-Shot Classification with Vision-Language Models

Authors: Taihong Xiao, Zirui Wang, Liangliang Cao, Jiahui Yu, Shengyang Dai, Ming-Hsuan Yang

Abstract: Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks. Notably, many vision-language models build two encoders (visual and textual) that can map two modalities into the same embedding space. As a result, the learned representations achieve good zero-shot performance on tasks like image classification. However, when there are only a few examples per category, the potential of large vision-language models is often not fully realized, mainly due to the gap between a large number of parameters and a relatively small amount of training data. This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head. With the proposed category name initialization method, our model obtains the state-of-the-art performance on a number of few-shot image classification benchmarks (e.g., 87.37% on ImageNet and 96.08% on Stanford Cars, both using five-shot learning).

Submitted 18 April, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.12290 [pdf, other]

Parametric study on the water impacting of a free-falling symmetric wedge based on the extended von Karman's momentum theory

Authors: Yu** Lu, Alessandro Del Buono, Tianhang Xiao, Alessandro Iafrati, **fa Xu, Shuanghou Deng, Jichang Chen

Abstract: The present study is concerned with the peak acceleration $a_{z\max}$ occurring during the water impact of a symmetric wedge. This aspect can be important for design considerations of safe marine vehicles. The water-entry problem is first studied numerically using the finite-volume discretization of the incompressible Navier-Stokes equations and the volume-of-fluid method to capture the air-water interface. The choice of the mesh size and time-step is validated by comparison with experimental data of a free-fall water entry of a wedge. The key original contribution of the article concerns the derivation of a relationship between $a_{z\max}$ (as well as the correlated parameters when $a_{z\max}$ occurs), the initial velocity, the deadrise angle and the mass of the wedge, based on the transformation of von Karman momentum theory, which is extended with the inclusion of the pile-up effect. The pile-up coefficient, which has been proven dependent on the deadrise angle in the case of water entry with a constant velocity, is then investigated for the free-fall motion, and the dependence law derived from Dobrovol'skaya is still valid for varying deadrise angle. Reasonably good theoretical estimates of the kinematic parameters are provided for a relatively wide range of initial velocity, deadrise angle and mass using the extended von Karman momentum theory, which is the combination of the original von Karman method and Dobrovol'skaya's solution, and this theoretical approach can be extended to predict the kinematic parameters during the whole impacting phase.

Submitted 30 January, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: 20 pages, 22 figures, 3 tables

Journal ref: Ocean Engineering, Volume 271, March 2023

arXiv:2211.11736 [pdf, other]

Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models

Authors: Ted Xiao, Harris Chan, Pierre Sermanet, Ayzaan Wahid, Anthony Brohan, Karol Hausman, Sergey Levine, Jonathan Tompson

Abstract: In recent years, much progress has been made in learning robotic manipulation policies that follow natural language instructions. Such methods typically learn from corpora of robot-language data that was either collected with specific tasks in mind or expensively re-labelled by humans with rich language descriptions in hindsight. Recently, large-scale pretrained vision-language models (VLMs) like CLIP or ViLD have been applied to robotics for learning representations and scene descriptors. Can these pretrained models serve as automatic labelers for robot data, effectively importing Internet-scale knowledge into existing datasets to make them useful even for tasks that are not reflected in their ground truth annotations? To accomplish this, we introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we utilize semi-supervised language labels leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data and then train language-conditioned policies on the augmented datasets. This method enables cheaper acquisition of useful language descriptions compared to expensive human labels, allowing for more efficient label coverage of large-scale datasets. We apply DIAL to a challenging real-world robotic manipulation domain where 96.5% of the 80,000 demonstrations do not contain crowd-sourced language annotations. DIAL enables imitation learning policies to acquire new capabilities and generalize to 60 novel instructions unseen in the original dataset.

Submitted 1 July, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

Comments: Published as a conference paper at RSS 2023

arXiv:2211.09119 [pdf, other]

Token Turing Machines

Authors: Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab

Abstract: We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step. We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks: online temporal activity detection from videos and vision-based robot action policy learning. Code is publicly available at: https://github.com/google-research/scenic/tree/main/scenic/projects/token_turing

Submitted 13 April, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: CVPR 2023 camera-ready copy

Journal ref: CVPR 2023

arXiv:2211.08149 [pdf, other]

doi 10.1016/j.jcp.2023.112317

RelaxNet: A structure-preserving neural network to approximate the Boltzmann collision operator

Authors: Tianbai Xiao, Martin Frank

Abstract: This paper addresses a neural network-based surrogate model that provides a structure-preserving approximation for the fivefold collision integral. The notion originates from the similarity in structure between the BGK-type relaxation model and residual neural network (ResNet) when a particle distribution function is treated as the input to the neural network function. We extend the ResNet architecture and construct what we call the relaxation neural network (RelaxNet). Specifically, two feed-forward neural networks with physics-informed connections and activations are introduced as building blocks in RelaxNet, which provide bounded and physically realizable approximations of the equilibrium distribution and velocity-dependent relaxation time respectively. The evaluation of the collision term is significantly accelerated since the convolution in the fivefold integral is replaced by tensor multiplication in the neural network. We fuse the mechanical advection operator and the RelaxNet-based collision operator into a unified model named the universal Boltzmann equation (UBE). We prove that UBE preserves the key structural properties in a many-particle system, i.e., positivity, conservation, invariance, and H-theorem. These properties promise that RelaxNet is superior to strategies that naively approximate the right-hand side of the Boltzmann equation using a machine learning model. The construction of the RelaxNet-based UBE and its solution algorithm are presented in detail. Several numerical experiments are investigated. The capability of the current approach for simulating non-equilibrium flow physics is validated through excellent in- and out-of-distribution performance.

Submitted 15 November, 2022; originally announced November 2022.

Comments: 44 pages, 27 figures, 11 tables

arXiv:2211.06689 [pdf, other]

TINC: Tree-structured Implicit Neural Compression

Authors: Runzhao Yang, Tingxiong Xiao, Yuxiao Cheng, Jinli Suo, Qionghai Dai

Abstract: Implicit neural representation (INR) can describe the target scenes with high fidelity using a small number of parameters, and is emerging as a promising data compression technique. However, limited spectrum coverage is intrinsic to INR, and it is non-trivial to remove redundancy in diverse complex data effectively. Preliminary studies can only exploit either global or local correlation in the target data and thus have limited performance. In this paper, we propose a Tree-structured Implicit Neural Compression (TINC) to conduct compact representation for local regions and extract the shared features of these local representations in a hierarchical manner. Specifically, we use Multi-Layer Perceptrons (MLPs) to fit the partitioned local regions, and these MLPs are organized in a tree structure to share parameters according to the spatial distance. The parameter sharing scheme not only ensures the continuity between adjacent regions, but also jointly removes the local and non-local redundancy. Extensive experiments show that TINC improves the compression fidelity of INR, and has shown impressive compression capabilities over commercial tools and other deep learning based methods. Besides, the approach is of high flexibility and can be tailored for different data and parameter settings. The source code can be found at https://github.com/RichealYoung/TINC .

Submitted 21 March, 2023; v1 submitted 12 November, 2022; originally announced November 2022.

Comments: Accepted to CVPR 2023

ACM Class: I.4.2; E.4

arXiv:2211.06197 [pdf, ps, other]

A convergence study of SGD-type methods for stochastic optimization

Authors: Tiannan Xiao, Guoguo Yang

Abstract: In this paper, we first reinvestigate the convergence of the vanilla SGD method in the sense of $L^2$ under more general learning rate conditions and a more general convexity assumption, which relaxes the conditions on learning rates and does not require the problem to be strongly convex. Then, by taking advantage of the Lyapunov function technique, we present the convergence of the momentum SGD and Nesterov accelerated SGD methods for the convex and non-convex problem under an $L$-smooth assumption that extends the bounded gradient limitation to a certain extent. The convergence of time-averaged SGD is also analyzed.

Submitted 9 June, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 14 pages

MSC Class: 60F05; 60J22; 37N40

arXiv:2210.17467 [pdf, other]

Iterative Teaching by Data Hallucination

Authors: Zeju Qiu, Weiyang Liu, Tim Z. Xiao, Zhen Liu, Umang Bhatt, Yucen Luo, Adrian Weller, Bernhard Schölkopf

Abstract: We consider the problem of iterative machine teaching, where a teacher sequentially provides examples based on the status of a learner under a discrete input space (i.e., a pool of finite samples), which greatly limits the teacher's capability. To address this issue, we study iterative teaching under a continuous input space where the input example (i.e., image) can be either generated by solving an optimization problem or drawn directly from a continuous distribution. Specifically, we propose data hallucination teaching (DHT) where the teacher can generate input data intelligently based on labels, the learner's status and the target concept. We study a number of challenging teaching setups (e.g., linear/neural learners in omniscient and black-box settings). Extensive empirical results verify the effectiveness of DHT.

Submitted 12 April, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: AISTATS 2023 (v2: 22 pages, 24 figures)

arXiv:2210.13368 [pdf, other]

System Configuration and Navigation of a Guide Dog Robot: Toward Animal Guide Dog-Level Guiding Work

Authors: Hochul Hwang, Tim Xia, Ibrahima Keita, Ken Suzuki, Joydeep Biswas, Sunghoon I. Lee, Donghyun Kim

Abstract: A robot guide dog has compelling advantages over animal guide dogs for its cost-effectiveness, potential for mass production, and low maintenance burden. However, despite the long history of guide dog robot research, previous studies were conducted with little or no consideration of how the guide dog handler and the guide dog work as a team for navigation. To develop a robotic guiding system that is genuinely beneficial to blind or visually impaired individuals, we performed qualitative research, including interviews with guide dog handlers and trainers and first-hand blindfold walking experiences with various guide dogs. Grounded on the facts learned from vivid experience and interviews, we build a collaborative indoor navigation scheme for a guide dog robot that includes preferred features such as speed and directional control. For collaborative navigation, we propose a semantic-aware local path planner that enables safe and efficient guiding work by utilizing semantic information about the environment and considering the handler's position and directional cues to determine the collision-free path. We evaluate our integrated robotic system by testing guide blindfold walking in indoor settings and demonstrate guide dog-like navigation behavior by avoiding obstacles at typical gait speed ($0.7 \mathrm{m/s}$).

Submitted 24 October, 2022; originally announced October 2022.

Comments: First two authors contributed equally

arXiv:2210.13002 [pdf, other]

An Empirical Revisiting of Linguistic Knowledge Fusion in Language Understanding Tasks

Authors: Changlong Yu, Tianyi Xiao, Lingpeng Kong, Yangqiu Song, Wilfred Ng

Abstract: Though linguistic knowledge emerges during large-scale language model pretraining, recent work attempts to explicitly incorporate human-defined linguistic priors into task-specific fine-tuning. Infusing language models with syntactic or semantic knowledge from parsers has shown improvements on many language understanding tasks. To further investigate the effectiveness of structural linguistic priors, we conduct an empirical study of replacing parsed graphs or trees with trivial ones (rarely carrying linguistic knowledge, e.g., a balanced tree) for tasks in the GLUE benchmark. Encoding with trivial graphs achieves competitive or even better performance in fully-supervised and few-shot settings. It reveals that the gains might not be significantly attributed to explicit linguistic priors but rather to more feature interactions brought by fusion layers. Hence we call for attention to using trivial graphs as necessary baselines to design advanced knowledge fusion methods in the future.

Submitted 24 October, 2022; originally announced October 2022.

Comments: EMNLP 2022 Main Conference

arXiv:2210.12733 [pdf, other]

Self-supervised Amodal Video Object Segmentation

Authors: Jian Yao, Yuxin Hong, Chiyu Wang, Tianjun Xiao, Tong He, Francesco Locatello, David Wipf, Yanwei Fu, Zheng Zhang

Abstract: Amodal perception requires inferring the full shape of an object that is partially occluded. This task is particularly challenging on two levels: (1) it requires more information than what is contained in the instant retina or imaging sensor, (2) it is difficult to obtain enough well-annotated amodal labels for supervision. To this end, this paper develops a new framework of Self-supervised amodal Video object segmentation (SaVos). Our method efficiently leverages the visual information of video temporal sequences to infer the amodal mask of objects. The key intuition is that the occluded part of an object can be explained away if that part is visible in other frames, possibly deformed as long as the deformation can be reasonably learned. Accordingly, we derive a novel self-supervised learning paradigm that efficiently utilizes the visible object parts as the supervision to guide the training on videos. In addition to learning type prior to complete masks for known types, SaVos also learns the spatiotemporal prior, which is also useful for the amodal task and could generalize to unseen types. The proposed framework achieves the state-of-the-art performance on the synthetic amodal segmentation benchmark FISHBOWL and the real world benchmark KINS-Video-Car. Further, it lends itself well to being transferred to novel distributions using test-time adaptation, outperforming existing models even after the transfer to a new distribution.

Submitted 23 October, 2022; originally announced October 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2210.10780 [pdf, other]

An out-of-distribution discriminator based on Bayesian neural network epistemic uncertainty

Authors: Ethan Ancell, Christopher Bennett, Bert Debusschere, Sapan Agarwal, Park Hays, T. Patrick Xiao

Abstract: Neural networks have revolutionized the field of machine learning with increased predictive capability. In addition to improving the predictions of neural networks, there is a simultaneous demand for reliable uncertainty quantification on estimates made by machine learning methods such as neural networks. Bayesian neural networks (BNNs) are an important type of neural network with built-in capability for quantifying uncertainty. This paper discusses aleatoric and epistemic uncertainty in BNNs and how they can be calculated. With an example dataset of images where the goal is to identify the amplitude of an event in the image, it is shown that epistemic uncertainty tends to be lower in images which are well-represented in the training dataset and tends to be high in images which are not well-represented. An algorithm for out-of-distribution (OoD) detection with BNN epistemic uncertainty is introduced along with various experiments demonstrating factors influencing the OoD detection capability in a BNN. The OoD detection capability with epistemic uncertainty is shown to be comparable to the OoD detection in the discriminator network of a generative adversarial network (GAN) with comparable network architecture.

Submitted 9 August, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

Comments: 26 pages, 25 figures

arXiv:2210.08217 [pdf, other]

PI-QT-Opt: Predictive Information Improves Multi-Task Robotic Reinforcement Learning at Scale

Authors: Kuang-Huei Lee, Ted Xiao, Adrian Li, Paul Wohlhart, Ian Fischer, Yao Lu

Abstract: The predictive information, the mutual information between the past and future, has been shown to be a useful representation learning auxiliary loss for training reinforcement learning agents, as the ability to model what will happen next is critical to success on many control tasks. While existing studies are largely restricted to training specialist agents on single-task settings in simulation, in this work, we study modeling the predictive information for robotic agents and its importance for general-purpose agents that are trained to master a large repertoire of diverse skills from large amounts of data. Specifically, we introduce Predictive Information QT-Opt (PI-QT-Opt), a QT-Opt agent augmented with an auxiliary loss that learns representations of the predictive information to solve up to 297 vision-based robot manipulation tasks in simulation and the real world with a single set of parameters. We demonstrate that modeling the predictive information significantly improves success rates on the training tasks and leads to better zero-shot transfer to unseen novel tasks. Finally, we evaluate PI-QT-Opt on real robots, achieving substantial and consistent improvement over QT-Opt in multiple experimental settings of varying environments, skills, and multi-task configurations.

Submitted 24 November, 2022; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: CoRL 2022. 21 pages, 9 figures. The supplementary video is available at https://kuanghuei.github.io/piqtopt

arXiv:2210.03420 [pdf, ps, other]

doi 10.1103/PhysRevMaterials.6.094013

Anisotropic magnetic properties and tunable conductivity in two-dimensional layered NaCrX2 (X=Te,Se,S) single crystals

Authors: Jiale Huang, Bingxian Shi, Feihao Pan, **chen Wang, Juanjuan Liu, Daye Xu, Hongxia Zhang, Tianlong Xia, Peng Cheng

Abstract: Monolayer NaCrX2 (X=Te,Se,S) were theoretically proposed to be two-dimensional intrinsic ferromagnetic semiconductors while their physical properties have not been thoroughly investigated in bulk single crystals. We report the single-crystal growth, structural, magnetic and electronic transport properties of NaCr(Te1-xSex)2 (0 ≤ x ≤ 1) and NaCrS2. For NaCr(Te1-xSex)2, the strong perpendicular magnetic anisotropy of NaCrTe2 can be gradually tuned to be a nearly isotropic one by Se-doping. Meanwhile, a systematic change in the conductivity with increasing x is observed, displaying a doping-induced metal-insulator-like transition. Under magnetic field larger than 30 kOe, both NaCrTe2 and NaCrSe2 can be polarized to a ferromagnetic state. While for NaCrS2, robust antiferromagnetism is observed up to 70 kOe and two field-induced metamagnetic transitions are identified along H||ab. These intriguing properties together with the potential to be exfoliated down to few-layer thickness make NaCrX2 (X=Te,Se,S) promising for exploring spintronic applications.

Submitted 7 October, 2022; originally announced October 2022.

Journal ref: Phys. Rev. Materials 6, 094013(2022)

arXiv:2210.03109 [pdf, other]

Real-World Robot Learning with Masked Visual Pre-training

Authors: Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell

Abstract: In this work, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre-trained via a masked autoencoder (MAE), frozen, and then passed into a learnable control module. Unlike prior work, we show that the pre-trained representations are effective across a range of real-world robotic tasks and embodiments. We find that our encoder consistently outperforms CLIP (up to 75%), supervised ImageNet pre-training (up to 81%), and training from scratch (up to 81%). Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.

Submitted 6 October, 2022; originally announced October 2022.

Comments: CoRL 2022; Project page: https://tetexiao.com/projects/real-mvp

arXiv:2209.15180 [pdf, other]

SCI: A Spectrum Concentrated Implicit Neural Compression for Biomedical Data

Authors: Runzhao Yang, Tingxiong Xiao, Yuxiao Cheng, Qianni Cao, **yuan Qu, **li Suo, Qionghai Dai

Abstract: Massive collection and explosive growth of biomedical data, demands effective compression for efficient storage, transmission and sharing. Readily available visual data compression techniques have been studied extensively but tailored for natural images/videos, and thus show limited performance on biomedical data which are of different features and larger diversity. Emerging implicit neural representation (INR) is gaining momentum and demonstrates high promise for fitting diverse visual data in target-data-specific manner, but a general compression scheme covering diverse biomedical data is so far absent. To address this issue, we firstly derive a mathematical explanation for INR's spectrum concentration property and an analytical insight on the design of INR based compressor. Further, we propose a Spectrum Concentrated Implicit neural compression (SCI) which adaptively partitions the complex biomedical data into blocks matching INR's concentrated spectrum envelop, and design a funnel shaped neural network capable of representing each block with a small number of parameters. Based on this design, we conduct compression via optimization under given budget and allocate the available parameters with high representation accuracy. The experiments show SCI's superior performance to state-of-the-art methods including commercial compressors, data-driven ones, and INR based counterparts on diverse biomedical data. The source code can be found at https://github.com/RichealYoung/ImplicitNeuralCompression.git.

Submitted 23 November, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: accepted to AAAI2023

ACM Class: I.4.2; I.2.10

arXiv:2209.14860 [pdf, other]

Bridging the Gap to Real-World Object-Centric Learning

Authors: Maximilian Seitzer, Max Horn, Andrii Zadaianchuk, Dominik Zietlow, Tianjun Xiao, Carl-Johann Simon-Gabriel, Tong He, Zheng Zhang, Bernhard Schölkopf, Thomas Brox, Francesco Locatello

Abstract: Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly out-performs existing image-based object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows competitive performance compared to more involved pipelines from the computer vision literature.

Submitted 6 March, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: ICLR 2023 camera-ready version

arXiv:2209.09513 [pdf, other]

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering

Authors: Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan

Abstract: When answering a question, humans utilize the information available across different modalities to synthesize a consistent and complete chain of thought (CoT). This process is normally a black box in the case of deep learning models like large-scale language models. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of an AI system. However, existing datasets fail to provide annotations for the answers, or are restricted to the textual-only modality, small scales, and limited domain diversity. To this end, we present Science Question Answering (ScienceQA), a new benchmark that consists of ~21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions. ScienceQA demonstrates the utility of CoT in language models, as CoT improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA. We also explore the upper bound for models to leverage explanations by feeding those in the input; we observe that it improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data. The data and code are available at https://scienceqa.github.io.

Submitted 17 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: Accepted to NeurIPS 2022. 22 pages, 17 figures, 9 tables. Project: https://scienceqa.github.io

arXiv:2209.08218 [pdf, ps, other]

Quantum Non-Demolition Measurement on the Spin Precession of Laser-Trapped $^{171}$Yb Atoms

Authors: Y. A. Yang, T. A. Zheng, S. -Z. Wang, W. -K. Hu, Chang-Ling Zou, T. Xia, Z. -T. Lu

Abstract: Quantum non-demolition (QND) measurement enhances the detection efficiency and measurement fidelity, and is highly desired for its applications in precision measurements and quantum information processing. We propose and demonstrate a QND measurement scheme for the spin states of laser-trapped atoms. On $^{171}$Yb atoms held in an optical dipole trap, a transition that is simultaneously cycling, spin-state selective, and spin-state preserving is created by introducing a circularly polarized beam of control laser to optically dress the spin states in the excited level, while leaving the spin states in the ground level unperturbed. We measure the phase of spin precession of $5\times10^{4}$ atoms in a bias magnetic field of 20 mG. This QND approach reduces the optical absorption detection noise by $\sim$19 dB, to a level of 2.3 dB below the atomic quantum projection noise. In addition to providing a general approach for efficient spin-state readout, this all-optical technique allows quick switching and real-time programming for quantum sensing and quantum information processing.

Submitted 16 September, 2022; originally announced September 2022.

arXiv:2209.07691 [pdf, other]

doi 10.3847/1538-4357/ac9ccb

Conditional HI mass functions and the HI-to-halo mass relation in the local Universe

Authors: Xiao Li, Cheng Li, H. J. Mo, Ting Xiao, **g Wang

Abstract: We present a new HI mass estimator which relates the HI-to-stellar mass ratio to four galaxy properties: stellar surface mass density, color index $u-r$, stellar mass and concentration index, with the scatter of individual galaxies around the mean HI mass modeled with a Gaussian distribution. We calibrate the estimator using the xGASS sample, including both HI detection and non-detection, and constrain the model parameters through Bayesian inferences. Tests with mock catalogs demonstrate that our estimator provides unbiased HI masses for optical samples like the SDSS, thus suitable for statistical studies of HI gas contents in galaxies and dark matter halos. We apply our estimator to the SDSS spectroscopic sample to estimate the local HI mass function (HIMF), the conditional HI mass function (CHIMF) in galaxy groups and the HI-halo mass (HIHM) relation. Our HIMF agrees with the ALFALFA measurements at $M_{HI}\gtrsim 5\times 10^9M_{\odot}$, but with higher amplitude and a steeper slope at lower masses. We show that this discrepancy is caused primarily by the cosmic variance which is corrected for the SDSS sample but not for the ALFALFA. The CHIMFs for all halo masses can be described by a single Schechter function, and this is true for red, blue and satellite galaxies. For central galaxies the CHIMFs show a double-Gaussian profile, with the two components contributed by the red and blue galaxies, respectively. The total HI mass in a group increases monotonically with halo mass. The HI mass of central galaxies in galaxy groups increases rapidly with halo mass only at $M_h\lesssim10^{12}M_{\odot}$, while the mass dependence becomes much weaker at higher halo masses. The observed HI-halo mass relation is not reproduced by current hydrodynamic simulations and semi-analytic models of galaxy formation.

Submitted 13 December, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: 10 figures, 2 tables, published in ApJ. $\mathbf{Note}$ : The version published in ApJ has a typo. In the last paragraph of section 3.2, the maximum posterior value of c_a should be c_a = 0.10 \pm 0.08, not c_a = 0.16 \pm 0.10

arXiv:2209.02713 [pdf, other]

doi 10.1103/PhysRevLett.130.091402

Blazar constraints on neutrino-dark matter scattering

Authors: James M. Cline, Shan Gao, Fangyi Guo, Zhongan Lin, Shiyan Liu, Matteo Puel, Phillip Todd, Tianzhuo Xiao

Abstract: Neutrino emission in coincidence with gamma rays has been observed from the blazar TXS 0506+056 by the IceCube telescope. Neutrinos from the blazar had to pass through a dense spike of dark matter (DM) surrounding the central black hole. The observation of such a neutrino implies new upper bounds on the neutrino-DM scattering cross section as a function of DM mass. The constraint is stronger than existing ones for a range of DM masses, if the cross section rises linearly with energy. For constant cross sections, competitive bounds are also possible, depending on details of the DM spike.

Submitted 19 January, 2023; v1 submitted 6 September, 2022; originally announced September 2022.

Comments: 8 pages, 4 figures; v2: more detailed analysis accounting for neutrino oscillations, neutrino emission region, different choices of initial spectrum, additional constraints on Z' model. Modified figs. 1, 2 and 4 accordingly, and improved version with clarifications

arXiv:2208.09997 [pdf, other]

doi 10.1121/10.0019336

Spatially Selective Active Noise Control Systems

Authors: Tong Xiao, Buye Xu, Chuming Zhao

Abstract: Active noise control (ANC) systems are commonly designed to achieve maximal sound reduction regardless of the incident direction of the sound. When desired sound is present, the state-of-the-art methods add a separate system to reconstruct it. This can result in distortion and latency. In this work, we propose a multi-channel ANC system that only reduces sound from undesired directions, and the system truly preserves the desired sound instead of reproducing it. The proposed algorithm imposes a spatial constraint on the hybrid ANC cost function to achieve spatial selectivity. Based on a six-channel microphone array on a pair of augmented eyeglasses, results show that the system minimized only noise coming from undesired directions. The control performance could be maintained even when the array was heavily perturbed. The proposed algorithm was also compared with the existing methods in the literature. Not only did the proposed system provide better noise reduction, but it also required much less effort. The binaural localization cues did not need to be reconstructed since the system preserved the physical sound wave from the desired source.

Submitted 12 May, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

Comments: The following article has been submitted to the Journal of the Acoustical Society of America (JASA). It has been accepted and published in https://doi.org/10.1121/10.0019336

Journal ref: J. Acoust. Soc. Am., Vol. 153, No. 5, pp. 2733-2744, 2023

arXiv:2208.08004 [pdf, other]

Field-wise Embedding Size Search via Structural Hard Auxiliary Mask Pruning for Click-Through Rate Prediction

Authors: Tesi Xiao, Xia Xiao, Ming Chen, Youlong Chen

Abstract: Feature embeddings are one of the most essential steps when training deep learning based Click-Through Rate prediction models, which map high-dimensional sparse features to dense embedding vectors. Classic human-crafted embedding size selection methods are shown to be "sub-optimal" in terms of the trade-off between memory usage and model capacity. The trending methods in Neural Architecture Search (NAS) have demonstrated their efficiency to search for embedding sizes. However, most existing NAS-based works suffer from expensive computational costs, the curse of dimensionality of the search space, and the discrepancy between continuous search space and discrete candidate space. Other works that prune embeddings in an unstructured manner fail to reduce the computational costs explicitly. In this paper, to address those limitations, we propose a novel strategy that searches for the optimal mixed-dimension embedding scheme by structurally pruning a super-net via Hard Auxiliary Mask. Our method aims to directly search candidate models in the discrete space using a simple and efficient gradient-based method. Furthermore, we introduce orthogonal regularity on embedding tables to reduce correlations within embedding columns and enhance representation capacity. Extensive experiments demonstrate it can effectively remove redundant embedding dimensions without great performance loss.

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2208.03842 [pdf, other]

doi 10.1093/mnras/stac2193

The cold gas and dust properties of red star-forming galaxies

Authors: Ryan Chown, Laura C. Parker, Christine D. Wilson, Toby Brown, Fraser A. Evans, Yang Gao, Ho Seong Hwang, Lihwai Lin, Amelie Saintonge, Mark Sargent, Matthew W. L. Smith, Ting Xiao

Abstract: We study the cold gas and dust properties for a sample of red star forming galaxies called "red misfits." We collect single-dish CO observations and HI observations from representative samples of low-redshift galaxies, as well as our own JCMT CO observations of red misfits. We also obtain SCUBA-2 850 um observations for a subset of these galaxies. With these data we compare the molecular gas, total cold gas, and dust properties of red misfits against those of their blue counterparts ("blue actives") taking non-detections into account using a survival analysis technique. We compare these properties at fixed position in the log SFR-log M* plane, as well as versus offset from the star-forming main sequence. Compared to blue actives, red misfits have slightly longer molecular gas depletion times, similar total gas depletion times, significantly lower molecular- and total-gas mass fractions, lower dust-to-stellar mass ratios, similar dust-to-gas ratios, and a significantly flatter slope in the $\log M_\mathrm{mol}$-$\log M_\star$ plane. Our results suggest that red misfits as a population are likely quenching due to a shortage in gas supply.

Submitted 24 August, 2022; v1 submitted 7 August, 2022; originally announced August 2022.

Comments: 16 pages, 7 Figures, accepted to MNRAS

arXiv:2207.13553 [pdf, other]

doi 10.1364/OE.441981

Type of Non-reciprocity in Fiber Sagnac Interferometer Induced by Geometric Phases

Authors: Dongzi Zhao, **g-Zheng Huang, Tailong Xiao, Hong**g Li, Xiaoyan Wu, Guihua Zeng

Abstract: The non-reciprocity of Sagnac interferometer provides ultra-high sensitivity for parameter estimation and offers a wide range of applications, especially for optical fiber sensing. In this work, we study a new type of non-reciprocity existed in optical fiber Sagnac interferometer where the polarization dependent loss is taken into consideration. In particular, this non-reciprocity is irrelevant to the physical effects that being considered in previous studies, which originates from the geometric phases induced by continuous-weak-measurement. In consequence, it has a unique phenomenon of sudden phase transition, which may open a new way for the future design of high precision optical fiber sensors.

Submitted 28 July, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: 10 pages, 11 figures. The wrong author list in v1 has been corrected

arXiv:2207.13196 [pdf, other]

doi 10.1093/mnras/stac2108

Dust grain size evolution in local galaxies: a comparison between observations and simulations

Authors: M. Relano, I. De Looze, A. Saintonge, K. -C. Hou, L. Romano, K. Nagamine, H. Hirashita, S. Aoyama, I. Lamperti, U. Lisenfeld, M. Smith, J. Chastenet, T. Xiao, Y. Gao, M. Sargent, S. A. van der Giessen

Abstract: The evolution of the dust grain size distribution has been studied in recent years with great detail in cosmological hydrodynamical simulations taking into account all the channels under which dust evolves in the interstellar medium. We present a systematic analysis of the observed spectral energy distribution of a large sample of galaxies in the local universe in order to derive not only the total dust masses but also the relative mass fraction between small and large dust grains (DS/DL). Simulations reproduce fairly well the observations except for the high stellar mass regime where dust masses tend to be overestimated. We find that ~45% of galaxies exhibit DS/DL consistent with the expectations of simulations, while there is a sub-sample of massive galaxies presenting high DS/DL (log(DS/DL)~-0.5), and deviating from the prediction in simulations. For these galaxies, which also have high molecular gas mass fractions and metallicities, coagulation is not an important mechanism affecting the dust evolution. Including diffusion, transporting large grains from dense regions to a more diffuse medium where they can be easily shattered, would explain the observed high DS/DL values in these galaxies. With this study we reinforce the use of the small-to-large grain mass ratio to study the relative importance of the different mechanisms in the dust life cycle. Multi-phase hydrodynamical simulations with detailed feedback prescriptions and more realistic subgrid models for the dense phase could help to reproduce the evolution of the dust grain size distribution traced by observations.

Submitted 26 July, 2022; originally announced July 2022.

Comments: 32 pages, 22 figures, 4 tables. Accepted in MNRAS

arXiv:2207.11755 [pdf, other]

Revisiting the central limit theorems for the SGD-type methods

Authors: Tiejun Li, Tiannan Xiao, Guoguo Yang

Abstract: We revisited the central limit theorem (CLT) for stochastic gradient descent (SGD) type methods, including the vanilla SGD, momentum SGD and Nesterov accelerated SGD methods with constant or vanishing damping parameters. By taking advantage of Lyapunov function technique and $L^p$ bound estimates, we established the CLT under more general conditions on learning rates for broader classes of SGD methods compared with previous results. The CLT for the time average was also investigated, and we found that it held in the linear case, while it was not generally true in nonlinear situation. Numerical tests were also carried out to verify our theoretical analysis.

Submitted 9 June, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

Comments: 23 pages, 2 figures

MSC Class: 60F05; 60J22; 37N40

arXiv:2207.10413 [pdf, other]

On applicability of von Karman's momentum theory in predicting the water entry load of V-shaped structures with varying initial velocity

Authors: Yu** Lu, Alessandro Del Buono, Tianhang Xiao, Alessandro Iafrati, Shuanghou Deng, **fa Xu

Abstract: The water landing of an amphibious aircraft is a complicated problem that can lead to uncomfortable riding situation and structural damage due to large vertical accelerations and the consequent dynamic responses. The problem herein is investigated by solving unsteady incompressible Reynolds-averaged Navier-Stokes equations with a standard k-omega turbulence closure model. The theoretical solutions established by the von Karman's momentum theory are also employed. In order to validate the relationships between the initial vertical velocity and the peak value of vertical acceleration, free fall test cases of 2D symmetric wedge oblique entry and 3D cabin section vertical entry are presented first. The other parameters at which the maximum acceleration occurs, such as time, penetration depth, velocity, are also evaluated. Hence, the quantitative relations are investigated to water landing event for amphibious aircraft. Detailed results in terms of free surface shape and pressure distribution are provided to show the slamming effects. The results show that a linear dependence of the maximal acceleration from the square of initial vertical velocity can be derived for two-dimensional wedge, three-dimensional cabin section and seaplane with V-shaped hull. Moreover, the ratio between the corresponding velocity and the initial vertical velocity tends to a constant threshold value, 5/6, derived from the theoretical solution, when increasing the initial vertical velocity in all three cases.

Submitted 25 August, 2022; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 25 pages, 28 figures, 4 tables

Journal ref: Ocean Engineering, Volume 262, October 2022

arXiv:2207.08140 [pdf, ps, other]

doi 10.1103/PhysRevLett.129.083001

Measurement of the Electric Dipole Moment of $^{171}$Yb Atoms in an Optical Dipole Trap

Authors: T. A. Zheng, Y. A. Yang, S. -Z. Wang, J. T. Singh, Z. -X. Xiong, T. Xia, Z. -T. Lu

Abstract: The permanent electric dipole moment (EDM) of the $^{171}$Yb $(I=1/2)$ atom is measured with atoms held in an optical dipole trap (ODT). By enabling a cycling transition that is simultaneously spin-selective and spin-preserving, a quantum non-demolition measurement with a spin-detection efficiency of 50$\%$ is realized. A systematic effect due to parity mixing induced by a static E field is observed, and is suppressed by averaging between measurements with ODTs in opposite directions. The coherent spin precession time is found to be much longer than 300 s. The EDM is determined to be $d({\rm^{171}Yb})=(-6.8\pm5.1_{\rm stat}\pm1.2_{\rm syst})\times10^{-27}\ e\ {\rm cm}$, leading to an upper limit of $|d({\rm^{171}Yb})|<1.5\times10^{-26}\ e\ {\rm cm}$ ($95\%$ C.L.). These measurement techniques can be adapted to search for the EDM of $^{225}$Ra.

Submitted 17 July, 2022; originally announced July 2022.

arXiv:2207.05608 [pdf, other]

Inner Monologue: Embodied Reasoning through Planning with Language Models

Authors: Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

Abstract: Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real table top rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world.

Submitted 12 July, 2022; originally announced July 2022.

Comments: Project website: https://innermonologue.github.io

arXiv:2207.01271 [pdf, other]

FlowNAS: Neural Architecture Search for Optical Flow Estimation

Authors: Zhiwei Lin, Tingting Liang, Taihong Xiao, Yongtao Wang, Zhi Tang, Ming-Hsuan Yang

Abstract: Existing optical flow estimators usually employ the network architectures typically designed for image classification as the encoder to extract per-pixel features. However, due to the natural difference between the tasks, the architectures designed for image classification may be sub-optimal for flow estimation. To address this issue, we propose a neural architecture search method named FlowNAS to automatically find the better encoder architecture for flow estimation task. We first design a suitable search space including various convolutional operators and construct a weight-sharing super-network for efficiently evaluating the candidate architectures. Then, for better training the super-network, we propose Feature Alignment Distillation, which utilizes a well-trained flow estimator to guide the training of super-network. Finally, a resource-constrained evolutionary algorithm is exploited to find an optimal architecture (i.e., sub-network). Experimental results show that the discovered architecture with the weights inherited from the super-network achieves 4.67% F1-all error on KITTI, an 8.4% reduction of RAFT baseline, surpassing state-of-the-art handcrafted models GMA and AGFlow, while reducing the model complexity and latency. The source code and trained models will be released in https://github.com/VDIGPKU/FlowNAS.

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2206.15474 [pdf, other]

Forecasting Future World Events with Neural Networks

Authors: Andy Zou, Tristan Xiao, Ryan Jia, Joe Kwon, Mantas Mazeika, Richard Li, Dawn Song, Jacob Steinhardt, Owain Evans, Dan Hendrycks

Abstract: Forecasting future world events is a challenging but valuable task. Forecasts of climate, geopolitical conflict, pandemics and economic indicators help shape policy and decision making. In these domains, the judgment of expert humans contributes to the best forecasts. Given advances in language modeling, can these forecasts be automated? To this end, we introduce Autocast, a dataset containing thousands of forecasting questions and an accompanying news corpus. Questions are taken from forecasting tournaments, ensuring high quality, real-world importance, and diversity. The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts (avoiding leakage from the future). Motivated by the difficulty of forecasting numbers across orders of magnitude (e.g. global cases of COVID-19 in 2022), we also curate IntervalQA, a dataset of numerical questions and metrics for calibration. We test language models on our forecasting task and find that performance is far below a human expert baseline. However, performance improves with increased model size and incorporation of relevant information from the news corpus. In sum, Autocast poses a novel challenge for large language models and improved performance could bring large practical benefits.

Submitted 9 October, 2022; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022; our dataset is available at https://github.com/andyzoujm/autocast

arXiv:2206.09337 [pdf, other]

Learning Multiscale Transformer Models for Sequence Generation

Authors: Bei Li, Tong Zheng, Yi Jing, Chengbo Jiao, Tong Xiao, Jingbo Zhu

Abstract: Multiscale feature hierarchies have been witnessed the success in the computer vision area. This further motivates researchers to design multiscale Transformer for natural language processing, mostly based on the self-attention mechanism. For example, restricting the receptive field across heads or extracting local fine-grained features via convolutions. However, most of existing works directly modeled local features but ignored the word-boundary information. This results in redundant and ambiguous attention distributions, which lacks of interpretability. In this work, we define those scales in different linguistic units, including sub-words, words and phrases. We built a multiscale Transformer model by establishing relationships among scales based on word-boundary information and phrase-level prior knowledge. The proposed Universal MultiScale Transformer, namely UMST, was evaluated on two sequence generation tasks. Notably, it yielded consistent performance gains over the strong baseline on several test sets without sacrificing the efficiency.

Submitted 19 June, 2022; originally announced June 2022.

Comments: accepted by ICML2022

arXiv:2206.03955 [pdf, other]

Out-of-Distribution Detection with Class Ratio Estimation

Authors: Mingtian Zhang, Andi Zhang, Tim Z. Xiao, Yitong Sun, Steven McDonagh

Abstract: Density-based Out-of-distribution (OOD) detection has recently been shown unreliable for the task of detecting OOD images. Various density ratio based approaches achieve good empirical performance, however methods typically lack a principled probabilistic modelling explanation. In this work, we propose to unify density ratio based methods under a novel framework that builds energy-based models and employs differing base distributions. Under our framework, the density ratio can be viewed as the unnormalized density of an implicit semantic distribution. Further, we propose to directly estimate the density ratio of a data sample through class ratio estimation. We report competitive results on OOD image problems in comparison with recent work that alternatively requires training of deep generative models for the task. Our approach enables a simple and yet effective path towards solving the OOD detection problem.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2206.03851 [pdf, other]

Reconsidering Learning Objectives in Unbiased Recommendation with Unobserved Confounders

Authors: Teng Xiao, Zhengyu Chen, Suhang Wang

Abstract: This work studies the problem of learning unbiased algorithms from biased feedback for recommendation. We address this problem from a novel distribution shift perspective. Recent works in unbiased recommendation have advanced the state-of-the-art with various techniques such as re-weighting, multi-task learning, and meta-learning. Despite their empirical successes, most of them lack theoretical guarantees, forming non-negligible gaps between theories and recent algorithms. In this paper, we propose a theoretical understanding of why existing unbiased learning objectives work for unbiased recommendation. We establish a close connection between unbiased recommendation and distribution shift, which shows that existing unbiased learning objectives implicitly align biased training and unbiased test distributions. Built upon this connection, we develop two generalization bounds for existing unbiased learning methods and analyze their learning behavior. Besides, as a result of the distribution shift, we further propose a principled framework, Adversarial Self-Training (AST), for unbiased recommendation. Extensive experiments on real-world and semi-synthetic datasets demonstrate the effectiveness of AST.

Submitted 3 October, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: KDD2023

arXiv:2206.03601 [pdf, other]

Decoupled Self-supervised Learning for Non-Homophilous Graphs

Authors: Teng Xiao, Zhengyu Chen, Zhimeng Guo, Zeyang Zhuang, Suhang Wang

Abstract: This paper studies the problem of conducting self-supervised learning for node representation learning on graphs. Most existing self-supervised learning methods assume the graph is homophilous, where linked nodes often belong to the same class or have similar features. However, such assumptions of homophily do not always hold in real-world graphs. We address this problem by developing a decoupled self-supervised learning (DSSL) framework for graph neural networks. DSSL imitates a generative process of nodes and links from latent variable modeling of the semantic structure, which decouples different underlying semantics between different neighborhoods into the self-supervised learning process. Our DSSL framework is agnostic to the encoders and does not need prefabricated augmentations, thus is flexible to different graphs. To effectively optimize the framework, we derive the evidence lower bound of the self-supervised objective and develop a scalable training algorithm with variational inference. We provide a theoretical analysis to justify that DSSL enjoys the better downstream performance. Extensive experiments on various types of graph benchmarks demonstrate that our proposed framework can achieve better performance compared with competitive baselines.

Submitted 1 October, 2023; v1 submitted 7 June, 2022; originally announced June 2022.

arXiv:2206.03405 [pdf, ps, other]

doi 10.1002/adfm.202208616

Observation of surface superconductivity in a three-dimensional Dirac material

Authors: Qi Liu, Peng-Jie Guo, Xiao-Yu Yue, Zhe-Kai Yi, Qing-Xin Dong, Hui Liang, Dan-Dan Wu, Yan Sun, Qiu-Ju Li, Wen-Liang Zhu, Tian-Long Xia, Xue-Feng Sun, Yi-Yan Wang

Abstract: Superconductivity becomes more interesting when it encounters dimensional constraint or topology, because it is of importance for exploring exotic quantum phenomena or developing superconducting electronics. Here we report the coexistence of naturally formed surface superconducting state and three-dimensional topological Dirac state in single crystals of BaMg$_2$Bi$_2$. The electronic structure obtained from the first-principles calculations demonstrates that BaMg$_2$Bi$_2$ is an ideal Dirac material, in which the Dirac point is very close to the Fermi level and no other energy band crosses the Fermi level. Superconductivity up to 4.77 K can be observed under ambient pressure in the measurements of resistivity. The angle dependent magnetoresistance reveals the two-dimensional characteristic of superconductivity, indicating that superconductivity occurs on the surface of the sample and is absent in the bulk state. Our study not only provides BaMg$_2$Bi$_2$ as a suitable platform to study the interplay between superconductivity and topological Dirac state, but also indicates that MgBi-based materials may be a promising system for exploring new superconductors.

Submitted 7 June, 2022; originally announced June 2022.

Journal ref: Adv. Funct. Mater. 32, 2208616, 2022

arXiv:2205.14539 [pdf, other]

Improving VAE-based Representation Learning

Authors: Mingtian Zhang, Tim Z. Xiao, Brooks Paige, David Barber

Abstract: Latent variable models like the Variational Auto-Encoder (VAE) are commonly used to learn representations of images. However, for downstream tasks like semantic classification, the representations learned by VAE are less competitive than other non-latent variable models. This has led to some speculations that latent variable models may be fundamentally unsuitable for representation learning. In this work, we study what properties are required for good representations and how different VAE structure choices could affect the learned properties. We show that by using a decoder that prefers to learn local features, the remaining global features can be well captured by the latent, which significantly improves performance of a downstream classification task. We further apply the proposed model to semi-supervised learning tasks and demonstrate improvements in data efficiency.

Submitted 28 May, 2022; originally announced May 2022.

arXiv:2205.11402 [pdf, other]

Causal Machine Learning for Healthcare and Precision Medicine

Authors: Pedro Sanchez, Jeremy P. Voisey, Tian Xia, Hannah I. Watson, Alison Q. ONeil, Sotirios A. Tsaftaris

Abstract: Causal machine learning (CML) has experienced increasing popularity in healthcare. Beyond the inherent capabilities of adding domain knowledge into learning systems, CML provides a complete toolset for investigating how a system would react to an intervention (e.g., outcome given a treatment). Quantifying effects of interventions allows actionable decisions to be made whilst maintaining robustness in the presence of confounders. Here, we explore how causal inference can be incorporated into different aspects of clinical decision support (CDS) systems by using recent advances in machine learning. Throughout this paper, we use Alzheimer's disease (AD) to create examples for illustrating how CML can be advantageous in clinical scenarios. Furthermore, we discuss important challenges present in healthcare applications such as processing high-dimensional and unstructured data, generalisation to out-of-distribution samples, and temporal relationships, that despite the great effort from the research community remain to be solved. Finally, we review lines of research within causal representation learning, causal discovery and causal reasoning which offer the potential towards addressing the aforementioned challenges.

Submitted 31 May, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: 19 pages, 4 figures, 1 table

arXiv:2205.08417 [pdf, other]

KiT-RT: An extendable framework for radiative transfer and therapy

Authors: Jonas Kusch, Steffen Schotthöfer, Pia Stammer, Jannick Wolters, Tianbai Xiao

Abstract: In this paper we present KiT-RT (Kinetic Transport Solver for Radiation Therapy), an open-source C++ based framework for solving kinetic equations in radiation therapy applications. The aim of this code framework is to provide a collection of classical deterministic solvers for unstructured meshes that allow for easy extendability. Therefore, KiT-RT is a convenient base to test new numerical methods in various applications and compare them against conventional solvers. The implementation includes spherical-harmonics, minimal entropy, neural minimal entropy and discrete ordinates methods. Solution characteristics and efficiency are presented through several test cases ranging from radiation transport to electron radiation therapy. Due to the variety of included numerical methods and easy extendability, the presented open source code is attractive for both developers, who want a basis to build their own numerical solvers and users or application engineers, who want to gain experimental insights without directly interfering with the codebase.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 28 pages, 15 figures, journal submission

MSC Class: 65M08 ACM Class: G.4; J.2

arXiv:2205.03857 [pdf]

doi 10.1103/PhysRevLett.128.216403

Tracking valley topology with synthetic Weyl paths

Authors: Xiying Fan, Tianzhi Xia, Huahui Qiu, Qicheng Zhang, Chunyin Qiu

Abstract: Inspired by the newly emergent valleytronics, great interest has been attracted to the topological valley transport in classical metacrystals. The presence of nontrivial domain-wall states is interpreted with a concept of valley Chern number, which is well defined only in the limit of small bandgap. Here, we propose a new visual angle to track the intricate valley topology in classical systems. Benefiting from the controllability of our acoustic metacrystals, we construct Weyl points in synthetic three-dimensional momentum space through introducing an extra structural parameter (rotation angle here). As such, the two-dimensional valley-projected band topology can be tracked with the strictly quantized topological charge in three-dimensional Weyl crystal, which features open surface arcs connecting the synthetic Weyl points and gapless chiral surface states along specific Weyl paths. All theoretical predictions are conclusively identified by our acoustic experiments. Our findings may promote the development of topological valley physics, which is less well-defined yet under hot debate in multiple physical disciplines.

Submitted 8 May, 2022; originally announced May 2022.

Comments: Phys.Rev.Lett. Accepted

arXiv:2205.03528 [pdf, other]

Titanium Nitride Film on Sapphire Substrate with Low Dielectric Loss for Superconducting Qubits

Authors: Hao Deng, Zhijun Song, Ran Gao, Tian Xia, Feng Bao, Xun Jiang, Hsiang-Sheng Ku, Zhisheng Li, Xizheng Ma, Jin Qin, Hantao Sun, Chengchun Tang, Tenghui Wang, Feng Wu, Wenlong Yu, Gengyan Zhang, Xiaohang Zhang, Jingwei Zhou, Xing Zhu, Yaoyun Shi, Hui-Hai Zhao, Chunqing Deng

Abstract: Dielectric loss is one of the major decoherence sources of superconducting qubits. Contemporary high-coherence superconducting qubits are formed by material systems mostly consisting of superconducting films on substrate with low dielectric loss, where the loss mainly originates from the surfaces and interfaces. Among the multiple candidates for material systems, a combination of titanium nitride (TiN) film and sapphire substrate has good potential because of its chemical stability against oxidization, and high quality at interfaces. In this work, we report a TiN film deposited onto sapphire substrate achieving low dielectric loss at the material interface. Through the systematic characterizations of a series of transmon qubits fabricated with identical batches of TiN base layers, but different geometries of qubit shunting capacitors with various participation ratios of the material interface, we quantitatively extract the loss tangent value at the substrate-metal interface smaller than $8.9 \times 10^{-4}$ in 1-nm disordered layer. By optimizing the interface participation ratio of the transmon qubit, we reproducibly achieve qubit lifetimes of up to 300 $μ$s and quality factors approaching 8 million. We demonstrate that TiN film on sapphire substrate is an ideal material system for high-coherence superconducting qubits. Our analyses further suggest that the interface dielectric loss around the Josephson junction part of the circuit could be the dominant limitation of lifetimes for state-of-the-art transmon qubits.

Submitted 6 May, 2022; originally announced May 2022.

arXiv:2205.00665 [pdf, other]

Reducing the Cost of Training Security Classifier (via Optimized Semi-Supervised Learning)

Authors: Rui Shu, Tianpei Xia, Huy Tu, Laurie Williams, Tim Menzies

Abstract: Background: Most of the existing machine learning models for security tasks, such as spam detection, malware detection, or network intrusion detection, are built on supervised machine learning algorithms. In such a paradigm, models need a large amount of labeled data to learn the useful relationships between selected features and the target class. However, such labeled data can be scarce and expensive to acquire. Goal: To help security practitioners train useful security classification models when few labeled training data and many unlabeled training data are available. Method: We propose an adaptive framework called Dapper, which optimizes 1) semi-supervised learning algorithms to assign pseudo-labels to unlabeled data in a propagation paradigm and 2) the machine learning classifier (i.e., random forest). When the dataset class is highly imbalanced, Dapper then adaptively integrates and optimizes a data oversampling method called SMOTE. We use the novel Bayesian Optimization to search a large hyperparameter space of these tuning targets. Result: We evaluate Dapper with three security datasets, i.e., the Twitter spam dataset, the malware URLs dataset, and the CIC-IDS-2017 dataset. Experimental results indicate that we can use as low as 10% of original labeled data but achieve close or even better classification performance than using 100% labeled data in a supervised way. Conclusion: Based on those results, we would recommend using hyperparameter optimization with semi-supervised learning when dealing with shortages of labeled security data.

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.09420 [pdf, other]

doi 10.1038/s41467-022-35184-7

Ferromagnetic-antiferromagnetic coexisting ground states and exchange bias effects in $\bf{MnBi_4Te_7}$ and $\bf{MnBi_6Te_{10}}$

Authors: Xiaolong Xu, Shiqi Yang, Huan Wang, Roger Guzman, Yaozheng Zhu, Yuxuan Peng, Zhihao Zang, Ming Xi, Shangjie Tian, Yanping Li, Hechang Lei, Zhaochu Luo, Jinbo Yang, Tianlong Xia, Wu Zhou, Yuan Huang, Yu Ye

Abstract: Natural superlattice structures $\rm{(MnBi_2Te_4)(Bi_2Te_3)}_n$ ($n$ = 1, 2,...), in which magnetic $\rm{MnBi_2Te_4}$ layers are separated by nonmagnetic $\rm{Bi_2Te_3}$ layers, hold band topology, magnetism and reduced interlayer coupling, providing a promising platform for the realization of exotic topological quantum states. However, their magnetism in the two-dimensional limit, which is crucial for further exploration of quantum phenomena, remains elusive. Here, complex ferromagnetic (FM)-antiferromagnetic (AFM) coexisting ground states that persist up to the 2-septuple layers (SLs) limit are observed and comprehensively investigated in $\rm{MnBi_4Te_7}$ ($n$ = 1) and $\rm{MnBi_6Te_{10}}$ ($n$ = 2). The ubiquitous Mn-Bi site mixing modifies or even changes the sign of the subtle inter-SL magnetic interactions, yielding a spatially inhomogeneous interlayer coupling. Further, a tunable exchange bias effect is observed in $\rm{(MnBi_2Te_4)(Bi_2Te_3)}_n$ ($n$ = 1, 2), arising from the coupling between the FM and AFM components in the ground state. Our work highlights a new approach toward the fine-tuning of magnetism and paves the way for further study of quantum phenomena in $\rm{(MnBi_2Te_4)(Bi_2Te_3)}_n$ ($n$ = 1, 2,...) as well as their magnetic applications.

Submitted 20 April, 2022; originally announced April 2022.

Comments: 9 pages, 4 figures

arXiv:2204.07912 [pdf]

Nonlinear study of local ballooning mode near the separatrix

Authors: T. F. Tang, X. Q. Xu, K. Li, M. Q. Wu, X. X. Zhang, X. Gao, G. Q. Li, T. Y. Xia, D. Z. Wang

Abstract: Small edge-localized-mode (ELM), similar to the quasi-continuous exhaust (QCE), has been achieved by increasing the density at the separatrix. Starting from the Type-I ELM experimental data in EAST, we have performed a numerical separatrix density scan to study the formation of the small ELM using BOUT++ 6-field 2-fluid module. In the high separatrix density case, localized collapse near the separatrix has been found. The corresponding ELM size, dominant mode number, and filament transport match the experimental observations of the QCE. Local ballooning mode near separatrix has been identified in the nonlinear simulation. The mode is driven by the local pressure gradient and the mode structure is constrained by the ExB shear.

Submitted 16 April, 2022; originally announced April 2022.

arXiv:2204.04746 [pdf, other]

doi 10.1016/j.media.2023.102803

CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

Authors: Chinedu Innocent Nwoye, Deepak Alapatt, Tong Yu, Armine Vardazaryan, Fangfang Xia, Zixuan Zhao, Tong Xia, Fucang Jia, Yuxuan Yang, Hao Wang, Derong Yu, Guoyan Zheng, Xiaotian Duan, Neil Getty, Ricardo Sanchez-Matilla, Maria Robu, Li Zhang, Huabin Chen, Jiacheng Wang, Liansheng Wang, Bokai Zhang, Beerend Gerats, Sista Raviteja, Rachana Sathish, Rong Tao, et al. (37 additional authors not shown)

Abstract: Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combination delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.

Submitted 29 December, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

Comments: CholecTriplet2021 challenge report. Paper accepted at Elsevier journal of Medical Image Analysis. 22 pages, 8 figures, 11 tables. Challenge website: https://cholectriplet2021.grand-challenge.org

Journal ref: Medical Image Analysis 86 (2023) 102803

arXiv:2204.02472 [pdf, other]

Equivalence of coupled parametric oscillator dynamics to Lagrange multiplier primal-dual optimization

Authors: Sri Krishna Vadlamani, Tianyao Patrick Xiao, Eli Yablonovitch

Abstract: There has been a recent surge of interest in physics-based solvers for combinatorial optimization problems. We present a dynamical solver for the Ising problem that is comprised of a network of coupled parametric oscillators and show that it implements Lagrange multiplier constrained optimization. We show that the pump depletion effect, which is intrinsic to parametric oscillators, enforces binary constraints and enables the system's continuous analog variables to converge to the optimal binary solutions to the optimization problem. Moreover, there is an exact correspondence between the equations of motion for the coupled oscillators and the update rules in the primal-dual method of Lagrange multipliers. Though our analysis is performed using electrical LC oscillators, it can be generalized to any system of coupled parametric oscillators. We simulate the dynamics of the coupled oscillator system and demonstrate that the performance of the solver on a set of benchmark problems is comparable to the best-known results obtained by digital algorithms in the literature.

Submitted 5 April, 2022; originally announced April 2022.

Showing 201–250 of 648 results for author: Xiao, T