Search | arXiv e-print repository

arXiv:2210.15149 [pdf]

Fully Automated Deep Learning-enabled Detection for Hepatic Steatosis on Computed Tomography: A Multicenter International Validation Study

Authors: Zhongyi Zhang, Guixia Li, Ziqiang Wang, Feng Xia, Ning Zhao, Huibin Nie, Zezhong Ye, Joshua Lin, Yiyi Hui, Xiangchun Liu

Abstract: Despite high global prevalence of hepatic steatosis, no automated diagnostics have demonstrated generalizability in detecting steatosis on multiple international datasets. Traditionally, hepatic steatosis detection relies on clinicians selecting the region of interest (ROI) on computed tomography (CT) to measure liver attenuation. ROI selection demands time and expertise, and therefore is not routinely performed in populations. To automate the process, we validated an existing artificial intelligence (AI) system for 3D liver segmentation and used it to propose a novel method: AI-ROI, which could automatically select the ROI for attenuation measurements. AI segmentation and AI-ROI method were evaluated on 1,014 non-contrast enhanced chest CT images from eight international datasets: LIDC-IDRI, NSCLC-Lung1, RIDER, VESSEL12, RICORD-1A, RICORD-1B, COVID-19-Italy, and COVID-19-China. AI segmentation achieved a mean dice coefficient of 0.957. Attenuations measured by AI-ROI showed no significant differences (p = 0.545) and a 71% reduction in time compared to expert measurements. The area under the curve (AUC) of the steatosis classification of AI-ROI is 0.921 (95% CI: 0.883 - 0.959). If performed as a routine screening method, our AI protocol could potentially allow early non-invasive, non-pharmacological preventative interventions for hepatic steatosis. 1,014 expert-annotated liver segmentations of patients with hepatic steatosis annotations can be downloaded here: https://drive.google.com/drive/folders/1-g_zJeAaZXYXGqL1OeF6pUjr6KB0igJX.

Submitted 6 November, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.10865 [pdf, other]

Robotic Table Wiping via Reinforcement Learning and Whole-body Trajectory Optimization

Authors: Thomas Lew, Sumeet Singh, Mario Prats, Jeffrey Bingham, Jonathan Weisz, Benjie Holson, Xiaohan Zhang, Vikas Sindhwani, Yao Lu, Fei Xia, Peng Xu, Tingnan Zhang, Jie Tan, Montserrat Gonzalez

Abstract: We propose a framework to enable multipurpose assistive mobile robots to autonomously wipe tables to clean spills and crumbs. This problem is challenging, as it requires planning wiping actions while reasoning over uncertain latent dynamics of crumbs and spills captured via high-dimensional visual observations. Simultaneously, we must guarantee constraints satisfaction to enable safe deployment in unstructured cluttered environments. To tackle this problem, we first propose a stochastic differential equation to model crumbs and spill dynamics and absorption with a robot wiper. Using this model, we train a vision-based policy for planning wiping actions in simulation using reinforcement learning (RL). To enable zero-shot sim-to-real deployment, we dovetail the RL policy with a whole-body trajectory optimization framework to compute base and arm joint trajectories that execute the desired wiping motions while guaranteeing constraints satisfaction. We extensively validate our approach in simulation and on hardware. Video: https://youtu.be/inORKP4F3EI

Submitted 19 October, 2022; originally announced October 2022.

arXiv:2210.07372 [pdf, other]

SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds

Authors: Pei Sun, Mingxing Tan, Weiyue Wang, Chenxi Liu, Fei Xia, Zhaoqi Leng, Dragomir Anguelov

Abstract: 3D object detection in point clouds is a core component for modern robotics and autonomous driving systems. A key challenge in 3D object detection comes from the inherent sparse nature of point occupancy within the 3D scene. In this paper, we propose Sparse Window Transformer (SWFormer), a scalable and accurate model for 3D object detection, which can take full advantage of the sparsity of point clouds. Built upon the idea of window-based Transformers, SWFormer converts 3D points into sparse voxels and windows, and then processes these variable-length sparse windows efficiently using a bucketing scheme. In addition to self-attention within each spatial window, our SWFormer also captures cross-window correlation with multi-scale feature fusion and window shifting operations. To further address the unique challenge of detecting 3D objects accurately from sparse features, we propose a new voxel diffusion technique. Experimental results on the Waymo Open Dataset show our SWFormer achieves state-of-the-art 73.36 L2 mAPH on vehicle and pedestrian for 3D object detection on the official test set, outperforming all previous single-stage and two-stage models, while being much more efficient.

Submitted 13 October, 2022; originally announced October 2022.

Journal ref: ECCV 2022

arXiv:2210.06210 [pdf, other]

Pruning Pre-trained Language Models Without Fine-Tuning

Authors: Ting Jiang, Deqing Wang, Fuzhen Zhuang, Ruobing Xie, Feng Xia

Abstract: To overcome the overparameterization problem in Pre-trained Language Models (PLMs), pruning is widely used as a simple and straightforward compression method that directly removes unimportant weights. Previous first-order methods successfully compress PLMs to extremely high sparsity with little performance drop. These methods, such as movement pruning, use first-order information to prune PLMs while fine-tuning the remaining weights. In this work, we argue fine-tuning is redundant for first-order pruning, since first-order pruning is sufficient to converge PLMs to downstream tasks without fine-tuning. Under this motivation, we propose Static Model Pruning (SMP), which only uses first-order pruning to adapt PLMs to downstream tasks while achieving the target sparsity level. In addition, we also design a new masking function and training objective to further improve SMP. Extensive experiments at various sparsity levels show SMP has significant improvements over first-order and zero-order methods. Unlike previous first-order methods, SMP is also applicable to low sparsity and outperforms zero-order methods. Meanwhile, SMP is more parameter efficient than other methods because it does not require fine-tuning.

Submitted 16 May, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

Comments: Accepted to ACL 2023; Code and models are available at https://github.com/kongds/SMP

arXiv:2210.04616 [pdf, other]

Hilbert expansion of the Boltzmann equation in the incompressible Euler level in a channel

Authors: Feimin Huang, Weiqiang Wang, Yong Wang, Feng Xiao

Abstract: The study of hydrodynamic limit of the Boltzmann equation with physical boundary is a challenging problem due to appearance of the viscous and Knudsen boundary layers. In this paper, the hydrodynamic limit from the Boltzmann equation with specular reflection boundary condition to the incompressible Euler in a channel is investigated. Based on the multiscaled Hilbert expansion, the equations with boundary conditions and compatibility conditions for interior solutions, viscous and Knudsen boundary layers are derived under different scaling, respectively. Then some uniform estimates for the interior solutions, viscous and Knudsen boundary layers are established. With the help of $L^2-L^\infty$ framework and the uniform estimates obtained above, the solutions to the Boltzmann equation are constructed by the truncated Hilbert expansion with multiscales, and hence the hydrodynamic limit in the incompressible Euler level is justified.

Submitted 8 September, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

Comments: 53 pages. Accepted for publication in SCIENCE CHINA Mathematics

arXiv:2210.00515 [pdf, other]

Deep-OCTA: Ensemble Deep Learning Approaches for Diabetic Retinopathy Analysis on OCTA Images

Authors: Junlin Hou, Fan Xiao, Jilan Xu, Yuejie Zhang, Haidong Zou, Rui Feng

Abstract: Ultra-wide optical coherence tomography angiography (OCTA) has become an important imaging modality in diabetic retinopathy (DR) diagnosis. However, there is little research focusing on automatic DR analysis using ultra-wide OCTA. In this paper, we present novel and practical deep-learning solutions based on ultra-wide OCTA for the Diabetic Retinopathy Analysis Challenge (DRAC). In the segmentation of DR lesions task, we utilize UNet and UNet++ to segment three lesions with strong data augmentation and model ensemble. In the image quality assessment task, we create an ensemble of InceptionV3, SE-ResNeXt, and Vision Transformer models. Pre-training on the large dataset as well as the hybrid MixUp and CutMix strategy are both adopted to boost the generalization ability of our model. In the DR grading task, we build a Vision Transformer (ViT) and find that the ViT model pre-trained on color fundus images serves as a useful substrate for OCTA images. Our proposed methods ranked 4th, 3rd, and 5th on the three leaderboards of DRAC, respectively. The source code will be made available at https://github.com/FDU-VTS/DRAC.

Submitted 2 October, 2022; originally announced October 2022.

arXiv:2209.13880 [pdf, ps, other]

A machine learning based column-and-row generation approach for integrated air cargo recovery problem

Authors: Lei Huang, Fan Xiao, Zhe Liang

Abstract: Freighter airlines need to recover both aircraft and cargo schedules when disruptions happen. This process is usually divided into three sequential decisions to recover flights, aircraft, and cargoes. This study focuses on the integrated recovery problem that makes aircraft and cargo recovery decisions simultaneously. We formulate two integrated models based on the flight connection network: one is the arc-based model, and the other is the string-based model. The arc-based model makes the flight delay decisions by duplicating flight copies, and is solved directly by commercial solvers such as Cplex. The string-based model makes the flight delay decisions in the variable generation process. The main difficulty of the string-based model is that the number of constraints grows with the newly generated flight delay decisions. Therefore, the traditional column generation method cannot be applied directly. To tackle this challenge, we propose a machine learning based column-and-row generation approach. The machine learning method is used to uncover the critical delay decisions of short through connections in each column-and-row generation iteration by eliminating the poor flight delay decisions. We also propose a set of valid inequality constraints which can greatly improve the objective of LP relaxation solution and reduce the integral gap. The effectiveness and efficiency of our model are tested by simulated scenarios based on real operational data from the largest Chinese freighter airlines. The computational results show that a significant cost reduction can be achieved with the proposed string-based model in reasonable time.

Submitted 28 September, 2022; originally announced September 2022.

arXiv:2209.10780 [pdf, other]

Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation

Authors: Xuesu Xiao, Tingnan Zhang, Krzysztof Choromanski, Edward Lee, Anthony Francis, Jake Varley, Stephen Tu, Sumeet Singh, Peng Xu, Fei Xia, Sven Mikael Persson, Dmitry Kalashnikov, Leila Takayama, Roy Frostig, Jie Tan, Carolina Parada, Vikas Sindhwani

Abstract: Despite decades of research, existing navigation systems still face real-world challenges when deployed in the wild, e.g., in cluttered home environments or in human-occupied public spaces. To address this, we present a new class of implicit control policies combining the benefits of imitation learning with the robust handling of system constraints from Model Predictive Control (MPC). Our approach, called Performer-MPC, uses a learned cost function parameterized by vision context embeddings provided by Performers -- a low-rank implicit-attention Transformer. We jointly train the cost function and construct the controller relying on it, effectively solving end-to-end the corresponding bi-level optimization problem. We show that the resulting policy improves standard MPC performance by leveraging a few expert demonstrations of the desired navigation behavior in different challenging real-world scenarios. Compared with a standard MPC policy, Performer-MPC achieves >40% better goal reached in cluttered environments and >65% better on social metrics when navigating around humans.

Submitted 23 September, 2022; v1 submitted 22 September, 2022; originally announced September 2022.

arXiv:2209.09874 [pdf, other]

Open-vocabulary Queryable Scene Representations for Real World Planning

Authors: Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S. Ryoo, Austin Stone, Daniel Kappler

Abstract: Large language models (LLMs) have unlocked new capabilities of task planning from human instructions. However, prior attempts to apply LLMs to real-world robotic tasks are limited by the lack of grounding in the surrounding scene. In this paper, we develop NLMap, an open-vocabulary and queryable scene representation to address this problem. NLMap serves as a framework to gather and integrate contextual information into LLM planners, allowing them to see and query available objects in the scene before generating a context-conditioned plan. NLMap first establishes a natural language queryable scene representation with Visual Language models (VLMs). An LLM based object proposal module parses instructions and proposes involved objects to query the scene representation for object availability and location. An LLM planner then plans with such information about the scene. NLMap allows robots to operate without a fixed list of objects or executable options, enabling real robot operation unachievable by previous methods. Project website: https://nlmap-saycan.github.io

Submitted 15 October, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

Comments: v2, added references to concurrent work and acknowledgments

arXiv:2209.09000 [pdf, other]

Reweighting Clicks with Dwell Time in Recommendation

Authors: Ruobing Xie, Lin Ma, Shaoliang Zhang, Feng Xia, Leyu Lin

Abstract: The click behavior is the most widely-used user positive feedback in recommendation. However, simply considering each click equally in training may suffer from clickbaits and title-content mismatching, and thus fail to precisely capture users' real satisfaction on items. Dwell time could be viewed as a high-quality quantitative indicator of user preferences on each click, while existing recommendation models do not fully explore the modeling of dwell time. In this work, we focus on reweighting clicks with dwell time in recommendation. Precisely, we first define a new behavior named valid read, which helps to select high-quality click instances for different users and items via dwell time. Next, we propose a normalized dwell time function to reweight click signals in training for recommendation. The Click reweighting model achieves significant improvements on both offline and online evaluations in real-world systems.

Submitted 27 February, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

Comments: 5 pages, accepted by WWW-2023 Companion

Journal ref: WWW-2023 Companion

arXiv:2209.08774 [pdf, other]

Playing Technique Detection by Fusing Note Onset Information in Guzheng Performance

Authors: Dichucheng Li, Yulun Wu, Qinyu Li, Jiahao Zhao, Yi Yu, Fan Xia, Wei Li

Abstract: The Guzheng is a traditional Chinese instrument with diverse playing techniques. Instrument playing techniques (IPT) play an important role in musical performance. However, most of the existing works for IPT detection show low efficiency for variable-length audio and provide no assurance in the generalization as they rely on a single sound bank for training and testing. In this study, we propose an end-to-end Guzheng playing technique detection system using Fully Convolutional Networks that can be applied to variable-length audio. Because each Guzheng playing technique is applied to a note, a dedicated onset detector is trained to divide an audio recording into several notes and its predictions are fused with frame-wise IPT predictions. During fusion, we add the IPT predictions frame by frame inside each note and get the IPT with the highest probability within each note as the final output of that note. We create a new dataset named GZ_IsoTech from multiple sound banks and real-world recordings for Guzheng performance analysis. Our approach achieves 87.97% in frame-level accuracy and 80.76% in note-level F1-score, outperforming existing works by a large margin, which indicates the effectiveness of our proposed method in IPT detection.

Submitted 19 September, 2022; originally announced September 2022.

Comments: Accepted to ISMIR 2022
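
The note-level fusion step described in the abstract above (adding the frame-wise IPT predictions inside each note and taking the technique with the highest total) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `fuse_ipt_with_onsets`, the array layout, and the representation of onsets as frame indices are all assumptions made for the example.

```python
import numpy as np

def fuse_ipt_with_onsets(frame_probs, onset_frames):
    """Assign one technique label per note (hypothetical helper).

    frame_probs:  array of shape (n_frames, n_techniques) with frame-wise
                  IPT probabilities (assumed layout).
    onset_frames: sorted frame indices at which notes start (assumed to come
                  from a separate onset detector).
    Returns a list of (start_frame, end_frame, technique_index) tuples.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    n_frames = frame_probs.shape[0]
    # Each note spans from its onset to the next onset (or the end of the audio).
    bounds = list(onset_frames) + [n_frames]
    notes = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:
            continue  # skip empty segments
        # Add the predictions frame by frame inside the note ...
        summed = frame_probs[start:end].sum(axis=0)
        # ... and keep the technique with the highest accumulated score.
        notes.append((start, end, int(np.argmax(summed))))
    return notes

# Example usage with two techniques and two notes starting at frames 0 and 2.
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9], [0.4, 0.6]]
result = fuse_ipt_with_onsets(probs, [0, 2])
```

Summing within a note lets a few confident frames outweigh isolated frame-level errors, which is one plausible reason for fusing at the note level.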

arXiv:2209.07753 [pdf, other]

Code as Policies: Language Model Programs for Embodied Control

Authors: Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng

Abstract: Large language models (LLMs) trained on code completion have been shown to be capable of synthesizing simple Python programs from docstrings [1]. We find that these code-writing LLMs can be re-purposed to write robot policy code, given natural language commands. Specifically, policy code can express functions or feedback loops that process perception outputs (e.g., from object detectors [2], [3]) and parameterize control primitive APIs. When provided as input several example language commands (formatted as comments) followed by corresponding policy code (via few-shot prompting), LLMs can take in new commands and autonomously re-compose API calls to generate new policy code respectively. By chaining classic logic structures and referencing third-party libraries (e.g., NumPy, Shapely) to perform arithmetic, LLMs used in this way can write robot policies that (i) exhibit spatial-geometric reasoning, (ii) generalize to new instructions, and (iii) prescribe precise values (e.g., velocities) to ambiguous descriptions ("faster") depending on context (i.e., behavioral commonsense). This paper presents code as policies: a robot-centric formulation of language model generated programs (LMPs) that can represent reactive policies (e.g., impedance controllers), as well as waypoint-based policies (vision-based pick and place, trajectory-based control), demonstrated across multiple real robot platforms. Central to our approach is prompting hierarchical code-gen (recursively defining undefined functions), which can write more complex code and also improves state-of-the-art to solve 39.8% of problems on the HumanEval [1] benchmark. Code and videos are available at https://code-as-policies.github.io

Submitted 24 May, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

arXiv:2208.12613 [pdf, other]

Image augmentation improves few-shot classification performance in plant disease recognition

Authors: Frank Xiao

Abstract: With the world population projected to near 10 billion by 2050, minimizing crop damage and guaranteeing food security has never been more important. Machine learning has been proposed as a solution to quickly and efficiently identify diseases in crops. Convolutional Neural Networks typically require large datasets of annotated data which are not available on demand. Collecting this data is a long and arduous process which involves manually picking, imaging, and annotating each individual leaf. I tackle the problem of plant image data scarcity by exploring the efficacy of various data augmentation techniques when used in conjunction with transfer learning. I evaluate the impact of various data augmentation techniques both individually and combined on the performance of a ResNet. I propose an augmentation scheme utilizing a sequence of different augmentations which consistently improves accuracy through many trials. Using only 10 total seed images, I demonstrate that my augmentation framework can increase model accuracy by upwards of 25%.

Submitted 24 August, 2022; originally announced August 2022.

Comments: 11 pages, 3 figures, 3 tables

arXiv:2208.10029 [pdf, other]

doi 10.3847/2041-8213/ac8afe

Plasma heating and nanoflare caused by slow-mode wave in a coronal loop

Authors: Fanxiaoyu Xia, Tongjiang Wang, Yang Su, Jie Zhao, Qingmin Zhang, Astrid M. Veronig, Weiqun Gan

Abstract: We present a detailed analysis of a reflecting intensity perturbation in a large coronal loop that appeared as sloshing oscillation and lasted for at least one and a half periods. The perturbation is initiated by a microflare at one footpoint of the loop, propagates along the loop and is eventually reflected at the remote footpoint where significant brightenings are observed in all the AIA extreme-ultraviolet (EUV) channels. This unique observation provides us with the opportunity to better understand not only the thermal properties and damping mechanisms of the sloshing oscillation, but also the energy transfer at the remote footpoint. Based on differential emission measures (DEM) analysis and the technique of coronal seismology, we find that 1) the calculated local sound speed is consistent with the observed propagation speed of the perturbation during the oscillation, which is suggestive of a slow magnetoacoustic wave; 2) thermal conduction is the major damping mechanism of the wave but an additional damping mechanism such as anomalous enhancement of compressive viscosity or wave leakage is also required to account for the rapid decay of the observed waves; 3) the wave produced a nanoflare at the remote footpoint, with a peak thermal energy of $\thicksim10^{24}-10^{25}$ erg. This work provides a consistent picture of the magnetoacoustic wave propagation and reflection in a coronal loop, and reports the first solid evidence of a wave-induced nanoflare. The results reveal new clues for further simulation studies and may help solve the coronal heating problem.

Submitted 21 August, 2022; originally announced August 2022.

Comments: 13 pages, 5 figures; Accepted by ApJL

arXiv:2208.06958 [pdf, other]

doi 10.1021/acs.nanolett.2c00401

Tunable Strong Magnetic Anisotropy in Two-Dimensional van der Waals Antiferromagnets

Authors: Qingjun Tong

Abstract: We show that anisotropic energy of a 2D antiferromagnet is greatly enhanced via stacking on a magnetic substrate layer, arising from the sublattice-dependent interlayer magnetic interaction that defines an effective anisotropic energy. Interestingly, this effective energy couples strongly with the interlayer stacking order and the magnetic order of the substrate layer, providing unique mechanical and magnetic means to control the antiferromagnetic order. These two types of control methods affect distinctly the sublattice magnetization dynamics, with a change of the ratio of sublattice precession amplitudes in the former and its chirality in the latter. In moiré superlattices formed by a relative twist or strain between the layers, the coupling with stacking order introduces a landscape of effective anisotropic energy across the moiré, which can be utilized to create nonuniform antiferromagnetic textures featuring periodically localized low-energy magnons.

Submitted 14 August, 2022; originally announced August 2022.

Comments: 8 pages, 4 figures

Journal ref: Nano Lett. 22, 3946 (2022)

arXiv:2207.14521 [pdf, other]

Self-organized Polygon Formation Control based on Distributed Estimation

Authors: Qingkai Yang, Fan Xiao, Jingshuo Lyu, Bo Zhou, Hao Fang

Abstract: This paper studies the problem of controlling a multi-robot system to achieve a polygon formation in a self-organized manner. Different from the typical formation control strategies where robots are steered to satisfy the predefined control variables, such as pairwise distances, relative positions and bearings, the foremost idea of this paper is to achieve polygon formations by injecting control inputs randomly to a few robots (say, vertex robots) of the group, and the rest follow the simple principles of moving towards the midpoint of their two nearest neighbors in the ring graph without any external inputs. In our problem, a fleet of robots is initially distributed in the plane. The so-called vertex robots take the responsibility of determining the geometric shape of the entire formation and its overall size, while the others move so as to minimize the differences with two direct neighbors. In the first step, each vertex robot estimates the number of robots in its associated chain. Two types of control inputs that serve for the estimation are designed using the measurements from the latest and the last two time instants respectively. In the second step, the self-organized formation control law is proposed where only vertex robots receive external information. Comparisons between the two estimation strategies are carried out in terms of the convergence speed and robustness. The effectiveness of the whole control framework is further validated in both simulation and physical experiments.

Submitted 2 April, 2023; v1 submitted 29 July, 2022; originally announced July 2022.
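
The "move towards the midpoint of the two ring neighbors" principle mentioned in the abstract above can be illustrated with a small discrete-time sketch. It is not the control law from the paper: the function name `midpoint_step`, the `gain` parameter, and the simplification that vertex robots are simply held in place (rather than driven by external inputs) are assumptions made for this example.

```python
import numpy as np

def midpoint_step(positions, is_vertex, gain=0.5):
    """One update of a hypothetical midpoint-following rule on a ring graph.

    positions: array of shape (n_robots, 2), ordered along the ring.
    is_vertex: boolean array of shape (n_robots,); vertex robots are held
               fixed here (a simplifying assumption for illustration).
    gain:      step size in (0, 1] (assumed parameter).
    """
    positions = np.asarray(positions, dtype=float)
    is_vertex = np.asarray(is_vertex, dtype=bool)
    prev_nb = np.roll(positions, 1, axis=0)    # neighbor i-1 on the ring
    next_nb = np.roll(positions, -1, axis=0)   # neighbor i+1 on the ring
    midpoint = 0.5 * (prev_nb + next_nb)
    # Non-vertex robots move a fraction of the way towards the midpoint.
    updated = positions + gain * (midpoint - positions)
    # Vertex robots keep their positions in this simplified sketch.
    updated[is_vertex] = positions[is_vertex]
    return updated

# Example usage: two fixed vertex robots and two followers between them.
pos = np.array([[0.0, 0.0], [1.0, 5.0], [2.0, 0.0], [1.0, -3.0]])
vertex = np.array([True, False, True, False])
for _ in range(60):
    pos = midpoint_step(pos, vertex)
```

With the vertex robots fixed, repeated updates pull each follower onto the segment between its neighbors, which is the intuition behind the vertex robots determining the shape of the formation.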

arXiv:2207.13073 [pdf]

Three-dimensional operando optical imaging of single particle and electrolyte heterogeneities inside Li-ion batteries

Authors: Raj Pandya, Lorenzo Valzania, Florian Dorchies, Fei Xia, Jeffrey Mc Hugh, Angus Mathieson, Jien Hwee Tan, Thomas G. Parton, Michael De Volder, Jean-Marie Tarascon, Sylvain Gigan, Hilton B. de Aguiar, Alexis Grimaud

Abstract: Understanding (de)lithiation heterogeneities in battery materials is key to ensuring optimal electrochemical performance and developing better energy storage devices. However, this remains challenging due to the complex three dimensional morphology of microscopic electrode particles, the involvement of both solid and liquid phase reactants, and range of relevant timescales (seconds to hours). Here, we overcome this problem and demonstrate the use of bench-top laser scanning confocal microscopy for simultaneous three-dimensional operando measurement of lithium ion dynamics in single particles, and the electrolyte, in batteries. We examine two technologically important cathode materials that are known to suffer from intercalation heterogeneities: LixCoO2 and LixNi0.8Mn0.1Co0.1O2. The single-particle surface-to-core transport velocity of Li-phase fronts, and volume changes - as well as their inter-particle heterogeneity - are captured as a function of C-rate, and benchmarked to previous ensemble measurements. Additionally, we visualise heterogeneities in the bulk and at the surface of particles during cycling, and image the formation of spatially non-uniform concentration gradients within the liquid electrolyte. Importantly, the conditions under which optical imaging can be performed inside absorbing and multiply scattering materials such as battery intercalation compounds are outlined.

Submitted 27 June, 2022; originally announced July 2022.

Comments: 29 pages, 6 figures

arXiv:2207.09920 [pdf, ps, other]

doi 10.1145/3534678.3539198

DESCN: Deep Entire Space Cross Networks for Individual Treatment Effect Estimation

Authors: Kailiang Zhong, Fengtong Xiao, Yan Ren, Yaorong Liang, Wenqing Yao, Xiaofeng Yang, Ling Cen

Abstract: Causal Inference has wide applications in various areas such as E-commerce and precision medicine, and its performance heavily relies on the accurate estimation of the Individual Treatment Effect (ITE). Conventionally, ITE is predicted by modeling the treated and control response functions separately in their individual sample spaces. However, such an approach usually encounters two issues in practice, i.e. divergent distribution between treated and control groups due to treatment bias, and significant sample imbalance of their population sizes. This paper proposes Deep Entire Space Cross Networks (DESCN) to model treatment effects from an end-to-end perspective. DESCN captures the integrated information of the treatment propensity, the response, and the hidden treatment effect through a cross network in a multi-task learning manner. Our method jointly learns the treatment and response functions in the entire sample space to avoid treatment bias and employs an intermediate pseudo treatment effect prediction network to relieve sample imbalance. Extensive experiments are conducted on a synthetic dataset and a large-scaled production dataset from the E-commerce voucher distribution business. The results indicate that DESCN can successfully enhance the accuracy of ITE estimation and improve the uplift ranking performance. A sample of the production dataset and the source code are released to facilitate future research in the community, which is, to the best of our knowledge, the first large-scale public biased treatment dataset for causal inference.

Submitted 19 October, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: Accepted by SIGKDD 2022 Applied Data Science Track

ACM Class: I.2.m

Journal ref: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '22), August 14-18, 2022, Washington, DC, USA

arXiv:2207.06333 [pdf, other]

6D Camera Relocalization in Visually Ambiguous Extreme Environments

Authors: Yang Zheng, Tolga Birdal, Fei Xia, Yanchao Yang, Yueqi Duan, Leonidas J. Guibas

Abstract: We propose a novel method to reliably estimate the pose of a camera given a sequence of images acquired in extreme environments such as deep seas or extraterrestrial terrains. Data acquired under these challenging conditions are corrupted by textureless surfaces, image degradation, and presence of repetitive and highly ambiguous structures. When naively deployed, the state-of-the-art methods can fail in those scenarios as confirmed by our empirical analysis. In this paper, we attempt to make camera relocalization work in these extreme situations. To this end, we propose: (i) a hierarchical localization system, where we leverage temporal information and (ii) a novel environment-aware image enhancement method to boost the robustness and accuracy. Our extensive experimental results demonstrate superior performance in favor of our method under two extreme settings: localizing an autonomous underwater vehicle and localizing a planetary rover in a Mars-like desert. In addition, our method achieves comparable performance with state-of-the-art methods on the indoor benchmark (7-Scenes dataset) using only 20% training data.

Submitted 13 July, 2022; originally announced July 2022.

arXiv:2207.06030 [pdf, other]

Cost-Effective Online Contextual Model Selection

Authors: Xuefeng Liu, Fangfang Xia, Rick L. Stevens, Yuxin Chen

Abstract: How can we collect the most useful labels to learn a model selection policy, when presented with arbitrary heterogeneous data streams? In this paper, we formulate this task as an online contextual active model selection problem, where at each round the learner receives an unlabeled data point along with a context. The goal is to output the best model for any given context without obtaining an excessive amount of labels. In particular, we focus on the task of selecting pre-trained classifiers, and propose a contextual active model selection algorithm (CAMS), which relies on a novel uncertainty sampling query criterion defined on a given policy class for adaptive model selection. In comparison to prior art, our algorithm does not assume a globally optimal model. We provide rigorous theoretical analysis for the regret and query complexity under both adversarial and stochastic settings. Our experiments on several benchmark classification datasets demonstrate the algorithm's effectiveness in terms of both regret and query complexity. Notably, to achieve the same accuracy, CAMS incurs less than 10% of the label cost when compared to the best online model selection baselines on CIFAR10.

Submitted 17 February, 2023; v1 submitted 13 July, 2022; originally announced July 2022.

arXiv:2207.05608 [pdf, other]

Inner Monologue: Embodied Reasoning through Planning with Language Models

Authors: Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

Abstract: Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real table top rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world.

Submitted 12 July, 2022; originally announced July 2022.

Comments: Project website: https://innermonologue.github.io

arXiv:2207.02409 [pdf]

Sub-monolayer Biolasers: Lower Gain, Higher Sensitivity

Authors: C. Gong, X. Yang, S. J. Tang, Q. Q. Zhang, Y. Wang, Y. L. Liu, Y. C. Chen, G. D. Peng, X. Fan, Y. F. Xiao, Y. J. Rao, Y. Gong

Abstract: Biomarker detection is the key to identifying health risks. However, designing sensitive biosensors in a single-use mode for disease diagnosis remains a major challenge. Here, we report sub-monolayer biolasers with remarkable repeatability for ultrasensitive and disposable biomarker detection. The biolaser sensors are designed by employing the telecom optical fibers as distributed optical microcavities and pushing the gain molecules down to the sub-monolayer level. We observe a status transition from the monolayer biolaser to the sub-monolayer biolaser by tuning the specific conjugation. By reducing the fluorophores down to the threshold density (~3.2 x 10^-13 mol/cm^2), we demonstrate an ultimate sensitivity of sub-monolayer biolaser with six orders of magnitude enhancement compared with the monolayer biolasers. We further achieved ultrasensitive immunoassay for Parkinson's disease biomarker, alpha-synuclein, with a lower limit of detection of 0.32 pM in serum. This biosensor with massive fabrication capability at ultralow cost provides a general method for the ultrasensitive disposable biodetection of disease biomarkers.

Submitted 5 July, 2022; originally announced July 2022.

Comments: 27 pages, 15 figures

MSC Class: 78A70

arXiv:2206.13688 [pdf]

doi 10.1088/1361-6560/ac9950

A clinically relevant online patient QA solution with daily CT scans and EPID-based in vivo dosimetry: A feasible study on rectal cancer

Authors: Liyuan Chen, Zhiyuan Zhang, Lei Yu, Jiyou Peng, Bin Feng, Jun Zhao, Yanfang Liu, Fan Xia, Zhen Zhang, Weigang Hu, Jiazhou Wang

Abstract: Adaptive radiation therapy (ART) could protect organs at risk (OARs) while maintaining high dose coverage to targets. However, efficient online patient QA methods are still lacking. We aim to develop a clinically relevant online patient quality assurance (QA) solution for ART using daily CT scans and electronic portal imaging device (EPID)-based in vivo dosimetry. Ten patients with rectal cancer at our center were included. Patients' daily CT scans and portal images were collected to generate reconstructed 3D dose distributions. Contours of targets and OARs were recontoured on these daily CT scans by a clinician or an auto-segmentation algorithm, then dose-volume indices were calculated, and the percent deviation of these indices from their original plans was determined. This deviation was regarded as the metric for clinically relevant patient QA. The tolerance level was obtained using a 95% interval of the QA metric distribution. These deviations could be further divided into anatomically relevant or delivery relevant indicators for error source analysis. Finally, our QA solution was validated on an additional six clinical patients. In rectal cancer, the lower and upper tolerances of the QA metric for PTV ΔD95 (%) were [-3.11%, 2.35%], and for PTV ΔD2 (%) were [-0.78%, 3.23%]. In validation, 68% of the 28 fractions for PTV ΔD95 (%) and 79% for PTV ΔD2 (%) were within the tolerances of the QA metrics. Using four or more out-of-tolerance QA metrics as an action level, 5 fractions (18%) in the validation patient dataset had four or more out-of-tolerance QA metrics. The online patient QA solution using daily CT scans and EPID-based in vivo dosimetry is clinically feasible. Source of error analysis has the potential for distinguishing sources of error and guiding ART for future treatments.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.13090 [pdf, ps, other]

Variance Reduced Random Relaxed Projection Method for Constrained Finite-sum Minimization Problems

Authors: Zhichun Yang, Fu-quan Xia, Kai Tu, Man-Chung Yue

Abstract: For many applications in signal processing and machine learning, we are tasked with minimizing a large sum of convex functions subject to a large number of convex constraints. In this paper, we devise a new random projection method (RPM) to efficiently solve this problem. Compared with existing RPMs, our proposed algorithm features two useful algorithmic ideas. First, at each iteration, instead of projecting onto the subset defined by one of the constraints, our algorithm only requires projecting onto a half-space approximation of the subset, which significantly reduces the computational cost as it admits a closed-form formula. Second, to exploit the structure that the objective is a sum, variance reduction is incorporated into our algorithm to further improve the performance. As theoretical contributions, under a novel error bound condition and other standard assumptions, we prove that the proposed RPM converges to an optimal solution and that both optimality and feasibility gaps vanish at a sublinear rate. In particular, via a new analysis framework, we show that our RPM attains a faster convergence rate in optimality gap than existing RPMs when the objective function has a Lipschitz continuous gradient, capitalizing the benefit of the variance reduction. We also provide sufficient conditions for the error bound condition to hold. Experiments on a beamforming problem and a robust classification problem are also presented to demonstrate the superiority of our RPM over existing ones.

Submitted 5 April, 2024; v1 submitted 27 June, 2022; originally announced June 2022.
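
Illustrative note: the abstract above says that projecting onto a half-space admits a closed-form formula. The sketch below shows the standard Euclidean projection onto a half-space {y : <a, y> <= b}; it is a textbook formula and not the paper's full algorithm, which also builds the half-space from a constraint and adds variance reduction. The function name and the plain-list vector representation are assumptions made for this example.

```python
def project_onto_halfspace(x, a, b):
    """Euclidean projection of point x onto the half-space {y : <a, y> <= b}.

    `x` and `a` are equal-length sequences of floats and `a` must be nonzero.
    If x already satisfies the constraint it is returned unchanged; otherwise
    it is moved along -a by exactly the amount needed to reach the boundary.
    """
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0.0:
        return list(x)
    scale = violation / sum(ai * ai for ai in a)
    return [xi - scale * ai for ai, xi in zip(a, x)]


# A point outside the half-space x1 + x2 <= 2 is pulled back to its boundary.
projected = project_onto_halfspace([2.0, 2.0], [1.0, 1.0], 2.0)
```

The cost is two inner products per projection, which is the reason a half-space approximation is cheaper than projecting onto a general convex set.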

arXiv:2206.06489 [pdf, other]

BEHAVIOR in Habitat 2.0: Simulator-Independent Logical Task Description for Benchmarking Embodied AI Agents

Authors: Ziang Liu, Roberto Martín-Martín, Fei Xia, Jiajun Wu, Li Fei-Fei

Abstract: Robots excel in performing repetitive and precision-sensitive tasks in controlled environments such as warehouses and factories, but have not yet been extended to embodied AI agents providing assistance in household tasks. Inspired by the catalyzing effect that benchmarks have played in the AI fields such as computer vision and natural language processing, the community is looking for new benchmarks for embodied AI. Prior work in embodied AI benchmark defines tasks using a different formalism, often specific to one environment, simulator or domain, making it hard to develop general and comparable solutions. In this work, we bring a subset of BEHAVIOR activities into Habitat 2.0 to benefit from its fast simulation speed, as a first step towards demonstrating the ease of adapting activities defined in the logic space into different simulators.

Submitted 13 June, 2022; originally announced June 2022.

arXiv:2206.04615 [pdf, other]

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza , et al. (426 additional authors not shown)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

arXiv:2206.00487 [pdf, other]

Physics-based neural network for non-invasive control of coherent light in scattering media

Authors: Alexandra d'Arco, Fei Xia, Antoine Boniface, Jonathan Dong, Sylvain Gigan

Abstract: Optical imaging through complex media, such as biological tissues or fog, is challenging due to light scattering. In the multiple scattering regime, wavefront shaping provides an effective method to retrieve information; it relies on measuring how the propagation of different optical wavefronts is impacted by scattering. Based on this principle, several wavefront shaping techniques were successfully developed, but most of them are highly invasive and limited to proof-of-principle experiments. Here, we propose to use a neural network approach to non-invasively characterize and control light scattering inside the medium and also to retrieve information of hidden objects buried within it. Unlike most of the recently-proposed approaches, the architecture of our neural network with its layers, connected nodes and activation functions has a true physical meaning as it mimics the propagation of light in our optical system. It is trained with an experimentally-measured input/output dataset built from a series of incident light patterns and corresponding camera snapshots. We apply our physics-based neural network to a fluorescence microscope in epi-configuration and demonstrate its performance through numerical simulations and experiments. This flexible method can include physical priors and we show that it can be applied to other systems as, for example, non-linear or coherent contrast mechanisms.

Submitted 1 June, 2022; originally announced June 2022.

Comments: 15 pages, 11 figures

arXiv:2205.11710 [pdf, other]

SCVRL: Shuffled Contrastive Video Representation Learning

Authors: Michael Dorkenwald, Fanyi Xiao, Biagio Brattoli, Joseph Tighe, Davide Modolo

Abstract: We propose SCVRL, a novel contrastive-based framework for self-supervised learning for videos. Differently from previous contrast learning based methods that mostly focus on learning visual semantics (e.g., CVRL), SCVRL is capable of learning both semantic and motion patterns. For that, we reformulate the popular shuffling pretext task within a modern contrastive learning paradigm. We show that our transformer-based network has a natural capacity to learn motion in self-supervised settings and achieves strong performance, outperforming CVRL on four benchmarks.

Submitted 23 May, 2022; originally announced May 2022.

Comments: CVPR 2022 - L3DIVU workshop

arXiv:2205.11274 [pdf, other]

Single-cell gene regulatory network analysis for mixed cell populations with applications to COVID-19 single cell data

Authors: Junjie Tang, Changhu Wang, Feiyi Xiao, Ruibin Xi

Abstract: Gene regulatory network (GRN) refers to the complex network formed by regulatory interactions between genes in living cells. In this paper, we consider inferring GRNs in single cells based on single cell RNA sequencing (scRNA-seq) data. In scRNA-seq, single cells are often profiled from mixed populations and their cell identities are unknown. A common practice for single cell GRN analysis is to first cluster the cells and infer GRNs for every cluster separately. However, this two-step procedure ignores uncertainty in the clustering step and thus could lead to inaccurate estimation of the networks. To address this problem, we propose to model scRNA-seq by the mixture multivariate Poisson log-normal (MPLN) distribution. The precision matrices of the MPLN are the GRNs of different cell types and can be jointly estimated by maximizing MPLN's lasso-penalized log-likelihood. We show that the MPLN model is identifiable and the resulting penalized log-likelihood estimator is consistent. To avoid the intractable optimization of the MPLN's log-likelihood, we develop an algorithm called VMPLN based on the variational inference method. Comprehensive simulation and real scRNA-seq data analyses reveal that VMPLN performs better than the state-of-the-art single cell GRN methods.

Submitted 23 May, 2022; originally announced May 2022.

Comments: 95 pages, 28 figures

arXiv:2205.04232 [pdf, other]

doi 10.1093/mnras/stac1305

Arecibo and FAST Timing Follow-up of twelve Millisecond Pulsars Discovered in Commensal Radio Astronomy FAST Survey

Authors: C. C. Miao, W. W. Zhu, D. Li, P. C. C. Freire, J. R. Niu, P. Wang, J. P. Yuan, M. Y. Xue, A. D. Cameron, D. J. Champion, M. Cruces, Y. T. Chen, M. M. Chi, X. F. Cheng, S. J. Dang, M. F. Ding, Y. Feng, Z. Y. Gan, G. Hobbs, M. Kramer, Z. J. Liu, Y. X. Li, Z. K. Luo, X. L. Miao, L. Q. Meng , et al. (24 additional authors not shown)

Abstract: We report the phase-connected timing ephemeris, polarization pulse profiles, Faraday rotation measurements, and Rotating-Vector-Model (RVM) fitting results of twelve millisecond pulsars (MSPs) discovered with the Five-hundred-meter Aperture Spherical radio Telescope (FAST) in the Commensal radio Astronomy FAST survey (CRAFTS). The timing campaigns were carried out with FAST and Arecibo over three years. Eleven of the twelve pulsars are in neutron star - white dwarf binary systems, with orbital periods between 2.4 and 100 d. Ten of them have spin periods, companion masses, and orbital eccentricities that are consistent with the theoretical expectations for MSP - Helium white dwarf (He WD) systems. The last binary pulsar (PSR J1912$-$0952) has a significantly smaller spin frequency and a smaller companion mass, the latter could be caused by a low orbital inclination for the system. Its orbital period of 29 days is well within the range of orbital periods where some MSP - He WD systems have shown anomalous eccentricities, however, the eccentricity of PSR J1912$-$0952 is typical of what one finds for the remaining MSP - He WD systems.

Submitted 9 May, 2022; originally announced May 2022.

Comments: 11 pages, 5 figures, MNRAS accepted

arXiv:2205.00226 [pdf, other]

doi 10.1140/epjc/s10052-022-10338-5

Chaotic dynamics of string around the conformal black hole

Authors: Da-Zhu Ma, Fang Xia, Dan Zhang, Guoyang Fu, Jian-Pin Wu

Abstract: In this paper, we make a systematic and in-depth study of the chaotic dynamics of the string around the conformal black hole. Depending on the characteristic parameter of the conformal black hole and the initial position of the string, there are three kinds of dynamical behaviors: ordered, chaotic and being captured, chaotic but not being captured. A particularly interesting observation is that there is a sharp transition in chaotic dynamics when the black hole horizon disappears, which is independent of the initial position of the string. It provides a possible way to probe the horizon structure of the massive body. We also examine the generalized MSS (Maldacena, Shenker and Stanford) inequality, which is proposed in holographic dual field theory, and find that the generalized MSS inequality holds even in the asymptotically flat black hole background. Especially, as the initial position of the string approaches the black hole horizon, the Lyapunov exponent also approaches the upper bound of the generalized MSS inequality.

Submitted 30 April, 2022; originally announced May 2022.

Comments: 20 pages, 7 figures

Journal ref: Eur. Phys. J. C (2022) 82:372

arXiv:2204.10773 [pdf]

doi 10.1016/j.compbiomed.2022.106295

Denoising of Three-Dimensional Fast Spin Echo Magnetic Resonance Images of Knee Joints using Spatial-Variant Noise-Relevant Residual Learning of Convolution Neural Network

Authors: Shutian Zhao, Donal G. Cahill, Siyue Li, Fan Xiao, Thierry Blu, James F Griffith, Weitian Chen

Abstract: Two-dimensional (2D) fast spin echo (FSE) techniques play a central role in the clinical magnetic resonance imaging (MRI) of knee joints. Moreover, three-dimensional (3D) FSE provides high-isotropic-resolution magnetic resonance (MR) images of knee joints, but it has a reduced signal-to-noise ratio compared to 2D FSE. Deep-learning denoising methods are a promising approach for denoising MR images, but they are often trained using synthetic noise due to challenges in obtaining true noise distributions for MR images. In this study, inherent true noise information from 2-NEX acquisition was used to develop a deep-learning model based on residual learning of convolutional neural network (CNN), and this model was used to suppress the noise in 3D FSE MR images of knee joints. The proposed CNN used two-step residual learning over parallel transporting and residual blocks and was designed to comprehensively learn real noise features from 2-NEX training data. The results of an ablation study validated the network design. The new method achieved improved denoising performance of 3D FSE knee MR images compared with current state-of-the-art methods, based on the peak signal-to-noise ratio and structural similarity index measure. The improved image quality after denoising using the new method was verified by radiological evaluation. A deep CNN using the inherent spatial-varying noise information in 2-NEX acquisitions was developed. This method showed promise for clinical MRI assessments of the knee, and has potential applications for the assessment of other anatomical structures.

Submitted 20 April, 2022; originally announced April 2022.

Comments: 6 figures, abstract accepted by Joint Annual Meeting ISMRM-ESMRMB & ISMRT 31st Annual Meeting

Journal ref: Computers in Biology and Medicine, Volume 151, Part A, 2022, 106295, ISSN 0010-4825
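
The denoising entry above reports its results in terms of the peak signal-to-noise ratio (PSNR). As a minimal sketch of how that metric is computed, assuming images are flattened to lists of intensities scaled to [0, 1]: the function name `psnr` and the pixel values below are hypothetical and are not taken from the paper.

```python
import math


def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are given as flat sequences of intensities; max_value is the
    largest intensity the image format can hold (1.0 for normalized data).
    """
    if len(reference) != len(test):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error between the reference and the test image.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical four-pixel "images", for illustration only.
reference = [0.0, 0.5, 1.0, 0.5]
noisy = [0.1, 0.4, 0.9, 0.6]
denoised = [0.0, 0.5, 0.95, 0.5]

print(psnr(reference, noisy))
print(psnr(reference, denoised))
```

A higher PSNR means the test image is closer to the reference, so a successful denoiser should raise the value relative to the noisy input.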

arXiv:2204.10762 [pdf, other]

doi 10.24963/ijcai.2022/153

Dite-HRNet: Dynamic Lightweight High-Resolution Network for Human Pose Estimation

Authors: Qun Li, Ziyi Zhang, Fu Xiao, Feng Zhang, Bir Bhanu

Abstract: A high-resolution network exhibits remarkable capability in extracting multi-scale features for human pose estimation, but fails to capture long-range interactions between joints and has high computational complexity. To address these problems, we present a Dynamic lightweight High-Resolution Network (Dite-HRNet), which can efficiently extract multi-scale contextual information and model long-range spatial dependency for human pose estimation. Specifically, we propose two methods, dynamic split convolution and adaptive context modeling, and embed them into two novel lightweight blocks, which are named dynamic multi-scale context block and dynamic global context block. These two blocks, as the basic component units of our Dite-HRNet, are specially designed for the high-resolution networks to make full use of the parallel multi-resolution architecture. Experimental results show that the proposed network achieves superior performance on both COCO and MPII human pose estimation datasets, surpassing the state-of-the-art lightweight networks. Code is available at: https://github.com/ZiyiZhang27/Dite-HRNet.

Submitted 24 May, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: Accepted by IJCAI-ECAI 2022

arXiv:2204.09220 [pdf, other]

LingYi: Medical Conversational Question Answering System based on Multi-modal Knowledge Graphs

Authors: Fei Xia, Bin Li, Yixuan Weng, Shizhu He, Kang Liu, Bin Sun, Shutao Li, Jun Zhao

Abstract: The medical conversational system can relieve the burden of doctors and improve the efficiency of healthcare, especially during the pandemic. This paper presents a medical conversational question answering (CQA) system based on the multi-modal knowledge graph, namely "LingYi", which is designed as a pipeline framework to maintain high flexibility. Our system utilizes automated medical procedures including medical triage, consultation, image-text drug recommendation and record. To conduct knowledge-grounded dialogues with patients, we first construct a Chinese Medical Multi-Modal Knowledge Graph (CM3KG) and collect a large-scale Chinese Medical CQA (CMCQA) dataset. Compared with the other existing medical question-answering systems, our system adopts several state-of-the-art technologies including medical entity disambiguation and medical dialogue generation, which is more friendly to provide medical services to patients. In addition, we have open-sourced our codes which contain back-end models and front-end web pages at https://github.com/WENGSYX/LingYi. The datasets including CM3KG at https://github.com/WENGSYX/CM3KG and CMCQA at https://github.com/WENGSYX/CMCQA are also released to further promote future research.

Submitted 20 April, 2022; originally announced April 2022.

Comments: 9 pages, 4 figures, 5 tables

arXiv:2204.04746 [pdf, other]

doi 10.1016/j.media.2023.102803

CholecTriplet2021: A benchmark challenge for surgical action triplet recognition

Authors: Chinedu Innocent Nwoye, Deepak Alapatt, Tong Yu, Armine Vardazaryan, Fangfang Xia, Zixuan Zhao, Tong Xia, Fucang Jia, Yuxuan Yang, Hao Wang, Derong Yu, Guoyan Zheng, Xiaotian Duan, Neil Getty, Ricardo Sanchez-Matilla, Maria Robu, Li Zhang, Huabin Chen, Jiacheng Wang, Liansheng Wang, Bokai Zhang, Beerend Gerats, Sista Raviteja, Rachana Sathish, Rong Tao , et al. (37 additional authors not shown)

Abstract: Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps or events, leaving out fine-grained interaction details about the surgical activity; yet those are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combination delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms by competing teams are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the results obtained by the presented approaches, performs a thorough methodological comparison between them, in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved, and also highlights interesting directions for future research on fine-grained surgical activity recognition which is of utmost importance for the development of AI in surgery.

Submitted 29 December, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

Comments: CholecTriplet2021 challenge report. Paper accepted at Elsevier journal of Medical Image Analysis. 22 pages, 8 figures, 11 tables. Challenge website: https://cholectriplet2021.grand-challenge.org

Journal ref: Medical Image Analysis 86 (2023) 102803

arXiv:2204.04344 [pdf, other]

Towards Better Chinese-centric Neural Machine Translation for Low-resource Languages

Authors: Bin Li, Yixuan Weng, Fei Xia, Hanjun Deng

Abstract: The last decade has witnessed enormous improvements in science and technology, stimulating the growing demand for economic and cultural exchanges in various countries. Building a neural machine translation (NMT) system has become an urgent trend, especially in the low-resource setting. However, recent work tends to study NMT systems for low-resource languages centered on English, while few works focus on low-resource NMT systems centered on other languages such as Chinese. To achieve this, the low-resource multilingual translation challenge of the 2021 iFLYTEK AI Developer Competition provides the Chinese-centric multilingual low-resource NMT tasks, where participants are required to build NMT systems based on the provided low-resource samples. In this paper, we present the winning competition system, which leverages monolingual word embeddings data enhancement, bilingual curriculum learning, and contrastive re-ranking. In addition, a new Incomplete-Trust (In-trust) loss function is proposed to replace the traditional cross-entropy loss when training. The experimental results demonstrate that the implementation of these ideas leads to better performance than other state-of-the-art methods. All the experimental codes are released at: https://github.com/WENGSYX/Low-resource-text-translation.

Submitted 8 April, 2022; originally announced April 2022.

Comments: 7 pages, 4 figures, 4 tables

arXiv:2204.03101 [pdf, other]

Hierarchical Self-supervised Representation Learning for Movie Understanding

Authors: Fanyi Xiao, Kaustav Kundu, Joseph Tighe, Davide Modolo

Abstract: Most self-supervised video representation learning approaches focus on action recognition. In contrast, in this paper we focus on self-supervised video learning for movie understanding and propose a novel hierarchical self-supervised pretraining strategy that separately pretrains each level of our hierarchical movie understanding model (based on [37]). Specifically, we propose to pretrain the low-level video backbone using a contrastive learning objective, while pretraining the higher-level video contextualizer using an event mask prediction task, which enables the usage of different data sources for pretraining different levels of the hierarchy. We first show that our self-supervised pretraining strategies are effective and lead to improved performance on all tasks and metrics on the VidSitu benchmark [37] (e.g., improving on semantic role prediction from 47% to 61% CIDEr scores). We further demonstrate the effectiveness of our contextualized event features on LVU tasks [54], both when used alone and when combined with instance features, showing their complementarity.

Submitted 6 April, 2022; originally announced April 2022.

Comments: CVPR 2022

arXiv:2204.02676 [pdf, other]

Detecting Outlier Patterns with Query-based Artificially Generated Searching Conditions

Authors: Shuo Yu, Feng Xia, Yuchen Sun, Tao Tang, Xiaoran Yan, Ivan Lee

Abstract: In the age of social computing, finding interesting network patterns or motifs is significant and critical for various areas such as decision intelligence, intrusion detection, medical diagnosis, social network analysis, fake news identification, national security, etc. However, sub-graph matching remains a computationally challenging problem, let alone identifying special motifs among them. This is especially the case in large heterogeneous real-world networks. In this work, we propose an efficient solution for discovering and ranking human behavior patterns based on network motifs by exploring a user's query in an intelligent way. Our method takes advantage of the semantics provided by a user's query, which in turn provides the mathematical constraint that is crucial for faster detection. We propose an approach to generate query conditions based on the user's query. In particular, we use meta paths between nodes to define target patterns as well as their similarities, leading to efficient motif discovery and ranking at the same time. The proposed method is examined on a real-world academic network, using different similarity measures between the nodes. The experiment result demonstrates that our method can identify interesting motifs, and is robust to the choice of similarity measures.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2204.02667 [pdf, other]

Familiarity-based Collaborative Team Recognition in Academic Social Networks

Authors: Shuo Yu, Feng Xia, Chen Zhang, Kathleen Keogh, Honglong Chen

Abstract: Collaborative teamwork is key to major scientific discoveries. However, the prevalence of collaboration among researchers makes team recognition increasingly challenging. Previous studies have demonstrated that people are more likely to collaborate with individuals they are familiar with. In this work, we employ the definition of familiarity and then propose MOTO (faMiliarity-based cOllaborative Team recOgnition algorithm) to recognize collaborative teams. MOTO calculates the shortest distance matrix within the global collaboration network and the local density of each node. Central team members are initially recognized based on local density. Then MOTO recognizes the remaining team members by using the familiarity metric and shortest distance matrix. Extensive experiments have been conducted upon a large-scale data set. The experimental results show that compared with baseline methods, MOTO can recognize the largest number of teams. The teams recognized by MOTO possess more cohesive team structures and lower team communication costs compared with other methods. MOTO utilizes familiarity in team recognition to identify cohesive academic teams. The recognized teams are in line with real-world collaborative teamwork patterns. Based on team recognition using MOTO, the research team structure and performance are further analyzed for given time periods. The number of teams that consist of members from different institutions increases gradually. Such teams are found to perform better in comparison with those whose members are from the same institution.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2204.02656 [pdf, other]

CHIEF: Clustering with Higher-order Motifs in Big Networks

Authors: Feng Xia, Shuo Yu, Chengfei Liu, Ivan Lee

Abstract: Clustering a group of vertices in networks facilitates applications across different domains, such as social computing and Internet of Things. However, challenges arise for clustering networks with increased scale. This paper proposes a solution which consists of two motif clustering techniques: standard acceleration CHIEF-ST and approximate acceleration CHIEF-AP. Both algorithms first find the maximal k-edge-connected subgraphs within the target networks to lower the network scale, then employ higher-order motifs in clustering. In the first procedure, we propose to lower the network scale by optimizing the network structure with maximal k-edge-connected subgraphs. For CHIEF-ST, we illustrate that all target motifs will be kept after this procedure when the minimum node degree of the target motif is equal to or greater than k. For CHIEF-AP, we prove that the eigenvalues of the adjacency matrix and the Laplacian matrix are relatively stable after this step. That is, CHIEF-ST has no influence on motif clustering, whereas CHIEF-AP introduces limited yet acceptable impact. In the second procedure, we employ higher-order motifs, i.e., heterogeneous four-node motifs clustering in higher-order dense networks. The contributions of CHIEF are two-fold: (1) improved efficiency of motif clustering for big networks; (2) verification of higher-order motif significance. The proposed solutions are found to outperform baseline approaches according to experiments on real and synthetic networks, which demonstrates CHIEF's strength in large network analysis. Meanwhile, higher-order motifs are shown to perform better than traditional triangle motifs in clustering.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2204.01691 [pdf, other]

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances

Authors: Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Chuyuan Fu, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee , et al. (20 additional authors not shown)

Abstract: Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/.

Submitted 16 August, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: See website at https://say-can.github.io/ V1. Initial Upload. V2. Added PaLM results. Added study about new capabilities (drawer manipulation, chain of thought prompting, multilingual instructions). Added an ablation study of language model size. Added an open-source version of SayCan on a simulated tabletop environment. Improved readability
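
The grounding idea described in the abstract above, where a language model proposes skills and value functions indicate which of them are feasible in the current environment, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the skill names, the language-model scores, the affordance values, and the choice to combine the two by a simple product are all hypothetical.

```python
def select_skill(llm_scores, affordances):
    """Pick the skill with the highest combined score.

    llm_scores:  dict mapping skill name -> how relevant the language model
                 rates the skill for the instruction (hypothetical, in [0, 1]).
    affordances: dict mapping skill name -> value-function estimate that the
                 skill can succeed from the current state (hypothetical, in [0, 1]).
    Skills without an affordance estimate are treated as infeasible (0.0).
    """
    combined = {
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    best = max(combined, key=combined.get)
    return best, combined


# Made-up numbers for an instruction such as "clean up the spill".
llm_scores = {
    "find a sponge": 0.50,
    "pick up the sponge": 0.40,
    "go to the table": 0.10,
}
affordances = {
    "find a sponge": 0.90,       # feasible right now
    "pick up the sponge": 0.10,  # no sponge in view yet
    "go to the table": 0.80,
}

best_skill, scores = select_skill(llm_scores, affordances)
print(best_skill)
```

In this toy setup the language model alone would rate "pick up the sponge" almost as highly as "find a sponge", but the low affordance value removes it from contention until a sponge is actually reachable.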

arXiv:2203.16319 [pdf, other]

Multi-Robot Active Mapping via Neural Bipartite Graph Matching

Authors: Kai Ye, Siyan Dong, Qingnan Fan, He Wang, Li Yi, Fei Xia, Jue Wang, Baoquan Chen

Abstract: We study the problem of multi-robot active mapping, which aims for complete scene map construction in minimum time steps. The key to this problem lies in the goal position estimation to enable more efficient robot movements. Previous approaches either choose the frontier as the goal position via a myopic solution that hinders the time efficiency, or maximize the long-term value via reinforcement learning to directly regress the goal position, but does not guarantee the complete map construction. In this paper, we propose a novel algorithm, namely NeuralCoMapping, which takes advantage of both approaches. We reduce the problem to bipartite graph matching, which establishes the node correspondences between two graphs, denoting robots and frontiers. We introduce a multiplex graph neural network (mGNN) that learns the neural distance to fill the affinity matrix for more effective graph matching. We optimize the mGNN with a differentiable linear assignment layer by maximizing the long-term values that favor time efficiency and map completeness via reinforcement learning. We compare our algorithm with several state-of-the-art multi-robot active mapping approaches and adapted reinforcement-learning baselines. Experimental results demonstrate the superior performance and exceptional generalization ability of our algorithm on various indoor scenes and unseen number of robots, when only trained with 9 indoor scenes.

Submitted 1 April, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: CVPR 2022
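
The reduction to bipartite matching mentioned in the abstract above can be illustrated with a small sketch. It assumes plain Euclidean distance as the entry of the affinity (cost) matrix, where the paper instead learns a neural distance with an mGNN, and it solves the assignment by brute force rather than with a differentiable linear assignment layer. The function name and the coordinates are hypothetical.

```python
from itertools import permutations
from math import dist


def assign_robots_to_frontiers(robots, frontiers):
    """Return (assignment, total_cost) minimizing the summed robot-frontier cost.

    assignment[i] is the index of the frontier given to robot i; each frontier
    is used at most once. Brute force over permutations, so this is only
    practical for a handful of robots and frontiers.
    """
    if len(frontiers) < len(robots):
        raise ValueError("need at least as many frontiers as robots")
    # Cost matrix: cost[i][j] = Euclidean distance from robot i to frontier j.
    cost = [[dist(r, f) for f in frontiers] for r in robots]
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(len(frontiers)), len(robots)):
        total = sum(cost[i][j] for i, j in enumerate(perm))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Hypothetical 2D positions.
robots = [(0.0, 0.0), (5.0, 0.0)]
frontiers = [(4.0, 0.0), (1.0, 0.0), (10.0, 10.0)]

assignment, total = assign_robots_to_frontiers(robots, frontiers)
print(assignment, total)
```

For larger problems a polynomial-time solver such as the Hungarian algorithm would replace the permutation loop; the cost matrix and the one-frontier-per-robot constraint stay the same.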

arXiv:2203.13498 [pdf]

Strain-dependent structural and electronic reconstructions in long-wavelength WS$_{2}$ moiré superlattices

Authors: Kai-Hui Li, Fei-** Xiao, Wen Guan, Yu-Long Xiao, Chang Xu, **-Ding Zhang, Chen-Fang Lin, Dong Li, Qing-Jun Tong, Si-Yu Li, An-Lian Pan

Abstract: In long-wavelength moiré superlattices of stacked transition metal dichalcogenides (TMDs), structural reconstruction ubiquitously occurs, which has been reported to significantly impact their electronic properties. However, a complete microscopic understanding of the interplay between the lattice reconstruction and the alteration of electronic properties, and their further response to external perturbations in the reconstructed TMD moiré superlattice, is still lacking. Here, using scanning tunneling microscopy (STM) and scanning tunneling spectroscopy (STS) combined with first-principles calculation, we study the strain-dependent structural reconstruction and its correlated electronic reconstruction in long-wavelength H-type WS$_{2}$ moiré superlattice at nanometer scale. We observe that the long-wavelength WS$_{2}$ moiré superlattices experiencing strong atomic reconstruction transform into a hexagonal array of screw dislocations separating large-sized H-stacked domains. Both the geometry and the moiré wavelength of the moiré superlattice are dramatically tuned by external intralayer heterostrain in our experiment. Remarkably, the STS measurements further demonstrate that the location of the K point in conduction band is modulated sensitively by strain-induced lattice deformation at nanometer scale in this system, with the maximum energy shift reaching up to 300 meV. Our results highlight that intralayer strain plays a vital role in determining structural and electronic properties in TMD moiré superlattice.

Submitted 25 March, 2022; originally announced March 2022.

arXiv:2203.09020 [pdf, other]

Graph Augmentation Learning

Authors: Shuo Yu, Huafei Huang, Minh N. Dao, Feng Xia

Abstract: Graph Augmentation Learning (GAL) provides outstanding solutions for graph learning in handling incomplete data, noisy data, etc. Numerous GAL methods have been proposed for graph-based applications such as social network analysis and traffic flow forecasting. However, the underlying reasons for the effectiveness of these GAL methods are still unclear. As a consequence, how to choose the optimal graph augmentation strategy for a certain application scenario remains a black box. There is a lack of systematic, comprehensive, and experimentally validated guidelines on GAL for scholars. Therefore, in this survey, we review GAL techniques in depth at the macro (graph), meso (subgraph), and micro (node/edge) levels. We further illustrate in detail how GAL enhances the data quality and the model performance. The aggregation mechanism of augmentation strategies and graph learning models is also discussed for different application scenarios, i.e., data-specific, model-specific, and hybrid scenarios. To better show the advantages of GAL, we experimentally validate the effectiveness and adaptability of different GAL strategies in different downstream tasks. Finally, we share our insights on several open issues of GAL, including heterogeneity, spatio-temporal dynamics, scalability, and generalization.

Submitted 16 March, 2022; originally announced March 2022.

Comments: 14 pages, 4 figures, Accepted in The First International Workshop on Graph Learning in IW3C2

arXiv:2203.07999 [pdf, other]

MSCET: A Multi-Scenario Offloading Schedule for Biomedical Data Processing and Analysis in Cloud-Edge-Terminal Collaborative Vehicular Networks

Authors: Zhichen Ni, Honglong Chen, Zhe Li, Xiaomeng Wang, Na Yan, Weifeng Liu, Feng Xia

Abstract: With the rapid development of Artificial Intelligence (AI) and Internet of Things (IoTs), an increasing number of computation intensive or delay sensitive biomedical data processing and analysis tasks are produced in vehicles, bringing more and more challenges to the biometric monitoring of drivers. Edge computing is a new paradigm to solve these challenges by offloading tasks from the resource-limited vehicles to Edge Servers (ESs) in Road Side Units (RSUs). However, most of the traditional offloading schedules for vehicular networks concentrate on the edge, while some tasks may be too complex for ESs to process. To this end, we consider a collaborative vehicular network in which the cloud, edge and terminal can cooperate with each other to accomplish the tasks. The vehicles can offload the computation intensive tasks to the cloud to save the resource of edge. We further construct the virtual resource pool which can integrate the resource of multiple ESs since some regions may be covered by multiple RSUs. In this paper, we propose a Multi-Scenario offloading schedule for biomedical data processing and analysis in Cloud-Edge-Terminal collaborative vehicular networks called MSCET. The parameters of the proposed MSCET are optimized to maximize the system utility. We also conduct extensive simulations to evaluate the proposed MSCET and the results illustrate that MSCET outperforms other existing schedules.

Submitted 16 February, 2022; originally announced March 2022.

arXiv:2202.13368 [pdf, other]

A nonhydrostatic atmospheric dynamical core on cubed sphere using hybrid multi-moment finite-volume/finite difference methods: formulations and preliminary tests

Authors: Chungang Chen, Xingliang Li, Feng Xiao, Xueshun Shen

Abstract: A nonhydrostatic dynamical core has been developed by using the multi-moment finite volume method that ensures the rigorous numerical conservation. To represent the spherical geometry free of polar problems, the cubed-sphere grid is adopted. A fourth-order multi-moment discretization formulation is applied to solve the governing equations cast in the local curvilinear coordinates on each patch of cubed sphere through a gnomonic projection. In vertical direction, the height-based terrain-following grid is used to deal with the topography and a conservative finite difference scheme is adopted for the spatial discretization. The dynamical core adopts the nonhydrostatic governing equations. To get around the CFL stability restriction imposed by sound wave propagation and relatively small grid spacing in the vertical direction, the dimensional-splitting time integration algorithm using the HEVI (horizontally-explicit and vertically-implicit) strategy is implemented by applying the IMEX (implicit-explicit) Runge-Kutta method. The proposed model was checked by the widely-used benchmark tests in this study. The numerical results show that the multi-moment model has superior solution quality and great practical potential as a numerical platform for development of the atmospheric general circulation models.

Submitted 27 February, 2022; originally announced February 2022.

Comments: 37 pages, 5 figures. arXiv admin note: text overlap with arXiv:2004.06290
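
The motivation for the implicit-explicit (IMEX) splitting mentioned in the abstract above can be seen on a toy problem. The sketch below is not the paper's scheme: it uses a first-order IMEX Euler step on a scalar ODE, du/dt = slow_rhs(u) + stiff_rate * u, where the stiff linear term stands in for the fast vertical dynamics (treated implicitly) and the slow term stands in for the horizontal dynamics (treated explicitly). All function names and parameter values are hypothetical.

```python
def imex_euler_step(u, dt, slow_rhs, stiff_rate):
    """One IMEX Euler step for du/dt = slow_rhs(u) + stiff_rate * u.

    The slow term is evaluated explicitly at the old state; the stiff linear
    term is evaluated implicitly at the new state, which gives
    u_new = (u + dt * slow_rhs(u)) / (1 - dt * stiff_rate).
    """
    return (u + dt * slow_rhs(u)) / (1.0 - dt * stiff_rate)


def explicit_euler_step(u, dt, slow_rhs, stiff_rate):
    """Fully explicit Euler step on the same equation, for comparison."""
    return u + dt * (slow_rhs(u) + stiff_rate * u)


def integrate(step, u0, dt, n_steps, slow_rhs, stiff_rate):
    """Apply the given one-step method n_steps times starting from u0."""
    u = u0
    for _ in range(n_steps):
        u = step(u, dt, slow_rhs, stiff_rate)
    return u


def forcing(u):
    """Slow, non-stiff part of the right-hand side (a constant source)."""
    return 1.0


# dt = 0.1 is far above the explicit stability limit 2 / 1000 for this problem.
u_imex = integrate(imex_euler_step, 1.0, 0.1, 50, forcing, -1000.0)
u_explicit = integrate(explicit_euler_step, 1.0, 0.1, 50, forcing, -1000.0)

print(u_imex)           # settles near the steady state 1 / 1000
print(abs(u_explicit))  # grows without bound at this step size
```

With the stiff term handled implicitly, the step size is limited only by the slow dynamics, which is the same reason a HEVI strategy avoids the time-step restriction imposed by vertically propagating sound waves.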

arXiv:2202.12151 [pdf, other]

doi 10.1103/PhysRevB.107.L020404

Magnon corner states in twisted bilayer honeycomb magnets

Authors: Chun-Bo Hua, Zheng-Rong Liu, **-Hua Sun, **-Hua Gao, Chui-Zhen Chen, Qingjun Tong, Bin Zhou, Dong-Hui Xu

Abstract: Search for higher-order topological insulators, characterized by topologically protected gapless boundary states of codimension higher than one, in bosonic systems has attracted growing interest. Here, we establish twisted bilayer honeycomb magnets as a new platform for hosting second-order topological magnon insulators (SOTMIs) without fine-tuning. We employ a simple, minimal Heisenberg spin model to describe misaligned bilayer sheets of honeycomb ferromagnetic magnets with a large commensurate twist angle. We found that the higher-order topology in this bilayer system shows a significant dependence on the interlayer exchange coupling. The SOTMI, featuring topologically protected magnon corner states, appears for ferromagnetic interlayer couplings, while the twisted bilayer exhibits a nodal phase in the case of antiferromagnetic interlayer coupling.

Submitted 3 January, 2023; v1 submitted 24 February, 2022; originally announced February 2022.

Comments: Accepted for publication as a Letter in Phys. Rev. B

Journal ref: Phys. Rev. B 107, L020404 (2023)

arXiv:2202.12009 [pdf]

First implementation of full-workflow automation in radiotherapy: the All-in-One solution on rectal cancer

Authors: Lei Yu, Jun Zhao, Fan Xia, Zhiyuan Zhang, Yanfang Liu, Wei Zhang, **gjie Zhou, Jiazhou Wang, Weigang Hu, Zhen Zhang

Abstract: The aim of this work is to describe the technical characteristics of an AI-powered radiotherapy workflow that enables full-process automation (All-in-One), evaluate its performance implemented for on-couch initial treatment of rectal cancer, and provide insight into the behavior of full-workflow automation in the specialty of radiotherapy. The All-in-One workflow was developed based on a CT-integrated linear accelerator. It incorporates routine radiotherapy procedures from simulation, autosegmentation, autoplanning, image guidance, beam delivery, and in vivo quality assurance (QA) into one scheme, with critical decision points involved, while the patient is on the treatment couch during the whole process. For the enrolled ten patients with rectal cancer, minor modifications of the autosegmented target volumes were required, and the Dice similarity coefficient and 95% Hausdorff distance before and after modifications were 0.892±0.061 and 18.2±13.0 mm, respectively. The autosegmented normal tissues and automatic plans were clinically acceptable without any modifications or reoptimization. The pretreatment IGRT corrections were within 2 mm in all directions, and the EPID-based in vivo QA showed a γ passing rate better than 97% (3%/3 mm/10% threshold). The duration of the whole process was 23.2±3.5 minutes, depending mostly on the time required for manual modification and plan evaluation. The All-in-One workflow enables full automation of the entire radiotherapy process by seamlessly integrating multiple routine procedures. The one-stop solution shortens the time scale it takes to ready the first treatment from days to minutes, significantly improving the patient experience and the efficiency of the workflow, and shows potential to facilitate the clinical application of online adaptive replanning.

Submitted 24 February, 2022; originally announced February 2022.

arXiv:2202.11608 [pdf, other]

How to optimize an academic team when the outlier member is leaving?

Authors: Shuo Yu, Jiaying Liu, Feng Xia, Haoran Wei, Hanghang Tong

Abstract: An academic team is a highly-cohesive collaboration group of scholars, which has been recognized as an effective way to improve scientific output in terms of both quality and quantity. However, the high staff turnover brings about a series of problems that may have negative influence on team performance. To address this challenge, we first detect the tendency of the member who may potentially leave. Here the outlierness is defined with respect to familiarity, which is quantified by using collaboration intensity. It is assumed that if a team member has a higher familiarity with scholars outside the team, then this member might probably leave the team. To minimize the influence caused by the leaving of such an outlier member, we propose an optimization solution to find a proper candidate who can replace the outlier member. Based on random walk with graph kernel, our solution involves familiarity matching, skill matching, as well as structure matching. The proposed approach proves to be effective and outperforms existing methods when applied to computer science academic teams.

Submitted 23 February, 2022; originally announced February 2022.

arXiv:2202.11435 [pdf, other]

Data-Driven Decision Making in COVID-19 Response: A Survey

Authors: Shuo Yu, Qing Qing, Chen Zhang, Ahsan Shehzad, Giles Oatley, Feng Xia

Abstract: COVID-19 has spread all over the world, having an enormous effect on our daily life and work. In response to the epidemic, a lot of important decisions need to be taken to save communities and economies worldwide. Data clearly plays a vital role in effective decision making. Data-driven decision making uses data related evidence and insights to guide the decision making process and to verify the plan of action before it is committed. To better handle the epidemic, governments and policy making institutes have investigated abundant data originating from COVID-19. These data include those related to medicine, knowledge, media, etc. Based on these data, many prevention and control policies are made. In this survey paper, we summarize the progress of data-driven decision making in the response to COVID-19, including COVID-19 prevention and control, psychological counselling, financial aid, work resumption, and school re-opening. We also propose some current challenges and open issues in data-driven decision making, including data collection and quality, complex data analysis, and fairness in decision making. This survey paper sheds light on current policy making driven by data, which also provides a feasible direction for further scientific research.

Submitted 23 February, 2022; originally announced February 2022.

Showing 151–200 of 579 results for author: Xia, F