
Greedy randomized Bregman-Kaczmarz method for constrained nonlinear systems of equations

Abstract: A greedy randomized nonlinear Bregman-Kaczmarz method that samples the working index using residual information is developed for solving constrained nonlinear systems of equations. Theoretical analyses prove the convergence of the greedy randomized nonlinear Bregman-Kaczmarz method and its relaxed version. Numerical experiments verify the effectiveness of the proposed method, which converges faster than the existing nonlinear Bregman-Kaczmarz methods.

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.09813 [pdf, other]

Diffuse X-ray Explorer: a high-resolution X-ray spectroscopic sky surveyor on the China Space Station

Authors: Hai Jin, Junjie Mao, Liubiao Chen, Naihui Chen, Wei Cui, Bo Gao, Jinjin Li, Xinfeng Li, Jiejia Liu, Jia Quan, Chunyang Jiang, Guole Wang, Le Wang, Qian Wang, Sifan Wang, Aimin Xiao, Shuo Zhang

Abstract: DIffuse X-ray Explorer (DIXE) is a proposed high-resolution X-ray spectroscopic sky surveyor on the China Space Station (CSS). DIXE will focus on studying hot baryons in the Milky Way. Galactic hot baryons like the X-ray emitting Milky Way halo and eROSITA bubbles are best observed in the sky survey mode with a large field of view. DIXE will take advantage of the orbital motion of the CSS to scan a large fraction of the sky. High-resolution X-ray spectroscopy, enabled by superconducting microcalorimeters based on the transition-edge sensor (TES) technology, will probe the physical properties (e.g., temperature, density, elemental abundances, kinematics) of the Galactic hot baryons. This will complement the high-resolution imaging data obtained with the eROSITA mission. Here we present the preliminary design of DIXE. The payload consists mainly of a detector assembly and a cryogenic cooling system. The key components of the detector assembly are a microcalorimeter array and frequency-domain multiplexing readout electronics. To provide a working temperature for the detector assembly, the cooling system consists of an adiabatic demagnetization refrigerator and a mechanical cryocooler system.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 12 pages, 6 figures, the full version is published by Journal of Low Temperature Physics

arXiv:2405.02794 [pdf, other]

Octopi: Object Property Reasoning with Large Tactile-Language Models

Authors: Samson Yu, Kelvin Lin, Anxing Xiao, Jiafei Duan, Harold Soh

Abstract: Physical reasoning is important for effective robot manipulation. Recent work has investigated both vision and language modalities for physical reasoning; vision can reveal information about objects in the environment and language serves as an abstraction and communication medium for additional context. Although these works have demonstrated success on a variety of physical reasoning tasks, they are limited to physical properties that can be inferred from visual or language inputs. In this work, we investigate combining tactile perception with language, which enables embodied systems to obtain physical properties through interaction and apply commonsense reasoning. We contribute a new dataset PhysiCLeAR, which comprises both physical/property reasoning tasks and annotated tactile videos obtained using a GelSight tactile sensor. We then introduce Octopi, a system that leverages both tactile representation learning and large vision-language models to predict and reason about tactile inputs with minimal language fine-tuning. Our evaluations on PhysiCLeAR show that Octopi is able to effectively use intermediate physical property predictions to improve its performance on various tactile-related tasks. PhysiCLeAR and Octopi are available at https://github.com/clear-nus/octopi.

Submitted 4 June, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: Accepted at Robotics: Science and Systems (R:SS 2024)

arXiv:2404.14953 [pdf, other]

Dynamic pricing with Bayesian updates from online reviews

Authors: José Correa, Mathieu Mari, Andrew Xia

Abstract: When launching new products, firms face uncertainty about market reception. Online reviews provide valuable information not only to consumers but also to firms, allowing firms to adjust the product characteristics, including its selling price. In this paper, we consider a pricing model with online reviews in which the quality of the product is uncertain, and both the seller and the buyers Bayesianly update their beliefs to make purchasing & pricing decisions. We model the seller's pricing problem as a basic bandits' problem and show a close connection with the celebrated Catalan numbers, allowing us to efficiently compute the overall future discounted reward of the seller. With this tool, we analyze and compare the optimal static and dynamic pricing strategies in terms of the probability of effectively learning the quality of the product.

Submitted 23 April, 2024; originally announced April 2024.
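
Note on the Catalan numbers mentioned in the abstract above: the listing does not say how they enter the seller's discounted-reward computation, so the following is only a minimal sketch of the numbers themselves, computed with the standard recurrence C_{k+1} = C_k * 2(2k+1)/(k+2). The function name `catalan` is chosen here for illustration and is not taken from the paper.

```python
def catalan(n):
    """Return the n-th Catalan number using exact integer arithmetic."""
    c = 1  # C_0
    for k in range(n):
        # C_{k+1} = C_k * 2(2k+1) / (k+2); the division is always exact.
        c = c * 2 * (2 * k + 1) // (k + 2)
    return c


first_terms = [catalan(n) for n in range(6)]
print(first_terms)  # [1, 1, 2, 5, 14, 42]
```

Because each term follows from the previous one in constant time, the first n terms cost O(n) integer operations, which is the kind of cheap evaluation that makes a closed-form connection to these numbers attractive.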

arXiv:2402.03631 [pdf, other]

Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model

Authors: Aoran Xiao, Weihao Xuan, Heli Qi, Yun Xing, Ruijie Ren, Xiaoqin Zhang, Ling Shao, Shijian Lu

Abstract: The recent Segment Anything Model (SAM) has demonstrated remarkable zero-shot capability and flexible geometric prompting in general image segmentation. However, SAM often struggles when handling various unconventional images, such as aerial, medical, and non-RGB images. This paper presents CAT-SAM, a ConditionAl Tuning network that adapts SAM toward various unconventional target tasks with just few-shot target samples. CAT-SAM freezes the entire SAM and adapts its mask decoder and image encoder simultaneously with a small number of learnable parameters. The core design is a prompt bridge structure that enables decoder-conditioned joint tuning of the heavyweight image encoder and the lightweight mask decoder. The bridging maps the prompt token of the mask decoder to the image encoder, fostering synergic adaptation of the encoder and the decoder with mutual benefits. We develop two representative tuning strategies for the image encoder which leads to two CAT-SAM variants: one injecting learnable prompt tokens in the input space and the other inserting lightweight adapter networks. Extensive experiments over 11 unconventional tasks show that both CAT-SAM variants achieve superior target segmentation performance consistently even under the very challenging one-shot adaptation setup. Project page: https://xiaoaoran.github.io/projects/CAT-SAM

Submitted 21 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Project page: https://xiaoaoran.github.io/projects/CAT-SAM

arXiv:2401.08407 [pdf, other]

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Authors: Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C. Kot, Shijian Lu

Abstract: Cross-Domain Few-Shot Segmentation (CD-FSS) poses the challenge of segmenting novel categories from a distinct domain using only limited exemplars. In this paper, we undertake a comprehensive study of CD-FSS and uncover two crucial insights: (i) the necessity of a fine-tuning stage to effectively transfer the learned meta-knowledge across domains, and (ii) the overfitting risk during the naïve fine-tuning due to the scarcity of novel category examples. With these insights, we propose a novel cross-domain fine-tuning strategy that addresses the challenging CD-FSS tasks. We first design Bi-directional Few-shot Prediction (BFP), which establishes support-query correspondence in a bi-directional manner, crafting augmented supervision to reduce the overfitting risk. Then we further extend BFP into Iterative Few-shot Adaptor (IFA), which is a recursive framework to capture the support-query correspondence iteratively, targeting maximal exploitation of supervisory signals from the sparse novel category samples. Extensive empirical evaluations show that our method significantly outperforms the state of the art (+7.8%), which verifies that IFA tackles the cross-domain challenges and mitigates the overfitting simultaneously. The code is available at: https://github.com/niejiahao1998/IFA.

Submitted 13 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted by CVPR 2024

arXiv:2401.08344 [pdf, other]

Large-population asymptotics for the maximum of diffusive particles with mean-field interaction in the noises

Authors: Nikolaos Kolliopoulos, David Sanchez, Amy Xiao

Abstract: We study the $N \to \infty$ limit of the normalized largest component in some systems of $N$ diffusive particles with mean-field interaction. By applying a universal time change, the interaction in noises is transferred to the drift terms, and the asymptotic behavior of the maximum becomes well-understood due to existing results in the literature. We expect that the normalized maximum in the original setting has the same limiting distribution as that of i.i.d. copies of a solution to the corresponding McKean-Vlasov SDE, and we present some results and numerical simulations that support this conjecture.

Submitted 16 January, 2024; originally announced January 2024.

Comments: 12 pages

MSC Class: 60K35; 60H10; 60F05; 60G70

arXiv:2311.17406 [pdf, other]

LLM-State: Open World State Representation for Long-horizon Task Planning with Large Language Model

Authors: Siwei Chen, Anxing Xiao, David Hsu

Abstract: This work addresses the problem of long-horizon task planning with the Large Language Model (LLM) in an open-world household environment. Existing works fail to explicitly track key objects and attributes, leading to erroneous decisions in long-horizon tasks, or rely on highly engineered state features and feedback, which is not generalizable. We propose an open state representation that provides continuous expansion and updating of object attributes from the LLM's inherent capabilities for context understanding and historical action reasoning. Our proposed representation maintains a comprehensive record of an object's attributes and changes, enabling robust retrospective summary of the sequence of actions leading to the current state. This allows continuous updating of the world model to enhance context understanding for decision-making in task planning. We validate our model through experiments across simulated and real-world task planning scenarios, demonstrating significant improvements over baseline methods in a variety of tasks requiring long-horizon state tracking and reasoning. Video demonstration: https://youtu.be/QkN-8pxV3Mo

Submitted 22 April, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.06711 [pdf, ps, other]

Optimal $L^\infty(L^2)$ and $L^1(L^2)$ a posteriori error estimates for the fully discrete approximations of time fractional parabolic differential equations

Authors: Jiliang Cao, Wansheng Wang, Aiguo Xiao

Abstract: We derive optimal order a posteriori error estimates in the $L^\infty(L^2)$ and $L^1(L^2)$-norms for the fully discrete approximations of time fractional parabolic differential equations. For the discretization in time, we use the $L1$ methods, while for the spatial discretization, we use standard conforming finite element methods. The linear and quadratic space-time reconstructions are introduced, which are generalizations of the elliptic space reconstruction. Then the related a posteriori error estimates for the linear and quadratic space-time reconstructions play key roles in deriving global and pointwise final error estimates. Numerical experiments verify and complement our theoretical results.

Submitted 11 November, 2023; originally announced November 2023.

Comments: 22 pages

arXiv:2310.12997 [pdf]

doi 10.1117/12.2677526

Parking Spot Classification based on surround view camera system

Authors: Andy Xiao, Deep Doshi, Lihao Wang, Harsha Gorantla, Thomas Heitzmann, Peter Groth

Abstract: Surround-view fisheye cameras are commonly used for near-field sensing in automated driving scenarios, including urban driving and auto valet parking. Four fisheye cameras, one on each side, are sufficient to cover 360° around the vehicle, capturing the entire near-field region. Based on surround view cameras, there has been much research in recent years on parking slot detection with a main focus on the occupancy status, but little work on whether the free slot is compatible with the mission of the ego vehicle. For instance, some spots are accessible only to handicapped drivers or electric vehicles. In this paper, we tackle parking spot classification based on the surround view camera system. We adapt the object detection neural network YOLOv4 with a novel polygon bounding box model that is well-suited for various shaped parking spaces, such as slanted parking slots. To the best of our knowledge, we present the first detailed study on parking spot detection and classification on fisheye cameras for auto valet parking scenarios. The results show that our proposed classification approach is effective at distinguishing between regular, electric vehicle, and handicap parking spots.

Submitted 5 October, 2023; originally announced October 2023.

Comments: SPIE Optical Engineering + Applications, 2023, San Diego, California, United States. Proc. SPIE 12675, Applications of Machine Learning 2023

arXiv:2310.12141 [pdf, other]

A phase transition and critical phenomenon for the two-dimensional random field Ising model

Authors: Jian Ding, Fenglin Huang, Aoteng Xia

Abstract: We study the random field Ising model in a two-dimensional box with side length $N$ where the external field is given by independent normal variables with mean $0$ and variance $ε^2$. Our primary result is the following phase transition at $T = T_c$: for $ε\ll N^{-7/8}$ the boundary influence (i.e., the difference between the spin averages at the center of the box with the plus and the minus boundary conditions) decays as $N^{-1/8}$ and thus the disorder essentially has no effect on the boundary influence; for $ε\gg N^{-7/8}$, the boundary influence decays as $N^{-\frac{1}{8}}e^{-Θ(ε^{8/7}\, N)}$ (i.e., the disorder contributes a factor of $e^{-Θ(ε^{8/7}\, N)}$ to the decay rate). For a natural notion of the correlation length, i.e., the minimal size of the box where the boundary influence shrinks by a factor of $2$ from that with no external field, we also prove the following: as $ε\downarrow 0$ the correlation length transits from $Θ(ε^{-8/7})$ at $T_c$ to $e^{Θ(ε^{-4/3})}$ for $T < T_c$.

Submitted 4 March, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

Comments: 65 pages; minor revision throughout over previous version

MSC Class: 60K35; 82B44

arXiv:2310.09078 [pdf, other]

DNFS-VNE: Deep Neuro Fuzzy System Driven Virtual Network Embedding

Authors: Ailing Xiao, Ning Chen, Sheng Wu, Peiying Zhang, Linling Kuang, Chunxiao Jiang

Abstract: By decoupling substrate resources, network virtualization (NV) is a promising solution for meeting diverse demands and ensuring differentiated quality of service (QoS). In particular, virtual network embedding (VNE) is a critical enabling technology that enhances the flexibility and scalability of network deployment by addressing the coupling of Internet processes and services. However, in the existing works based on deep neural networks (DNNs), the black-box nature of DNNs limits the analysis, development, and improvement of systems. For example, in the industrial Internet of Things (IIoT), there is a conflict between decision interpretability and the opacity of DNN-based methods. Recently, interpretable deep learning (DL) represented by deep neuro fuzzy systems (DNFS) combined with fuzzy inference has shown promising interpretability to further exploit the hidden value in the data. Motivated by this, we propose a DNFS-based VNE algorithm that aims to provide an interpretable NV scheme. Specifically, data-driven convolutional neural networks (CNNs) are used as fuzzy implication operators to compute the embedding probabilities of candidate substrate nodes through entailment operations. The identified fuzzy rule patterns are cached into the weights by forward computation and gradient back-propagation (BP). Moreover, the fuzzy rule base is constructed based on Mamdani-type linguistic rules using linguistic labels. In addition, the DNFS-driven five-block structure-based policy network serves as the agent for deep reinforcement learning (DRL), which optimizes VNE decision-making through interaction with the environment. Finally, the effectiveness of evaluation indicators and fuzzy rules is verified by simulation experiments.

Submitted 3 July, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

arXiv:2309.13505 [pdf, other]

Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

Authors: Yun Xing, Jian Kang, Aoran Xiao, Jiahao Nie, Ling Shao, Shijian Lu

Abstract: Vision-Language Pre-training has demonstrated its remarkable zero-shot recognition ability and potential to learn generalizable visual representations from language supervision. Taking a step ahead, language-supervised semantic segmentation enables spatial localization of textual inputs by learning pixel grouping solely from image-text pairs. Nevertheless, the state of the art suffers from clear semantic gaps between the visual and textual modalities: many visual concepts that appear in images are missing from their paired captions. Such semantic misalignment circulates in pre-training, leading to inferior zero-shot performance in dense predictions due to insufficient visual concepts captured in textual representations. To close this semantic gap, we propose Concept Curation (CoCu), a pipeline that leverages CLIP to compensate for the missing semantics. For each image-text pair, we establish a concept archive that maintains potential visually-matched concepts with our proposed vision-driven expansion and text-to-vision-guided ranking. Relevant concepts can thus be identified via cluster-guided sampling and fed into pre-training, thereby bridging the gap between visual and textual semantics. Extensive experiments over a broad suite of 8 segmentation benchmarks show that CoCu achieves superb zero-shot transfer performance and boosts the language-supervised segmentation baseline by a large margin, suggesting the value of bridging the semantic gap in pre-training data.

Submitted 4 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: NeurIPS 2023. Code is available at https://github.com/xing0047/rewrite

arXiv:2309.06041 [pdf, other]

GVD-Exploration: An Efficient Autonomous Robot Exploration Framework Based on Fast Generalized Voronoi Diagram Extraction

Authors: Dingfeng Chen, Anxing Xiao, Meiyuan Zou, Wenzheng Chi, Jiankun Wang, Lining Sun

Abstract: Rapidly-exploring Random Trees (RRTs) are a popular technique for autonomous exploration of mobile robots. However, the random sampling used by RRTs can result in inefficient and inaccurate frontiers extraction, which affects the exploration performance. To address the issues of slow path planning and high path cost, we propose a framework that uses a generalized Voronoi diagram (GVD) based multi-choice strategy for robot exploration. Our framework consists of three components: a novel mapping model that uses an end-to-end neural network to construct GVDs of the environments in real time; a GVD-based heuristic scheme that accelerates frontiers extraction and reduces frontiers redundancy; and a multi-choice frontiers assignment scheme that considers different types of frontiers and enables the robot to make rational decisions during the exploration process. We evaluate our method on simulation and real-world experiments and show that it outperforms RRT-based exploration methods in terms of efficiency and robustness.

Submitted 12 September, 2023; originally announced September 2023.

Comments: 11 pages, 10 figures

arXiv:2309.03005 [pdf, ps, other]

On multi-step extended maximum residual Kaczmarz method for solving large inconsistent linear systems

Authors: Aqin Xiao, Junfeng Yin, Ning Zheng

Abstract: A multi-step extended maximum residual Kaczmarz method is presented for the solution of the large inconsistent linear system of equations by using the multi-step iterations technique. Theoretical analysis proves the proposed method is convergent and gives an upper bound on its convergence rate. Numerical experiments show that the proposed method is effective and outperforms the existing extended Kaczmarz methods in terms of the number of iteration steps and the computational costs.

Submitted 6 September, 2023; originally announced September 2023.
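
For orientation on the row-selection rule named in the title above, here is a minimal sketch of the classical Kaczmarz iteration with a maximum-residual row choice, applied to a small consistent system. It is not the paper's multi-step extended method for inconsistent systems; the function name `max_residual_kaczmarz`, the stopping tolerance, and the test matrix are all assumptions made for illustration.

```python
def max_residual_kaczmarz(A, b, iters=500, tol=1e-12):
    """Classical Kaczmarz with the maximum-residual row rule.

    Assumes a consistent system A x = b whose rows are all nonzero.
    """
    x = [0.0] * len(A[0])
    row_norm_sq = [sum(a * a for a in row) for row in A]
    for _ in range(iters):
        # Residual of every equation at the current iterate.
        residual = [bi - sum(a * xj for a, xj in zip(row, x))
                    for row, bi in zip(A, b)]
        # Greedy choice: the row with the largest absolute residual.
        i = max(range(len(A)), key=lambda k: abs(residual[k]))
        if abs(residual[i]) < tol:
            break
        # Project x onto the hyperplane of equation i.
        step = residual[i] / row_norm_sq[i]
        x = [xj + step * a for xj, a in zip(x, A[i])]
    return x


A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
b = [4.0, 7.0, -1.0]  # consistent by construction: x = (1, 2)
x = max_residual_kaczmarz(A, b)
print(x)
```

Each step projects the iterate onto the hyperplane of the single worst-satisfied equation. For an inconsistent system, this plain iteration does not in general converge to a least-squares solution, which is the gap that extended Kaczmarz variants are designed to address.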

arXiv:2309.02780 [pdf, other]

GRASS: Unified Generation Model for Speech-to-Semantic Tasks

Authors: Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai

Abstract: This paper explores the instruction fine-tuning technique for speech-to-semantic tasks by introducing a unified end-to-end (E2E) framework that generates target text conditioned on a task-related prompt for audio data. We pre-train the model using large and diverse data, where instruction-speech pairs are constructed via a text-to-speech (TTS) system. Extensive experiments demonstrate that our proposed model achieves state-of-the-art (SOTA) results on many benchmarks covering speech named entity recognition, speech sentiment analysis, speech question answering, and more, after fine-tuning. Furthermore, the proposed model achieves competitive performance in zero-shot and few-shot scenarios. To facilitate future work on instruction fine-tuning for speech-to-semantic tasks, we release our instruction dataset and code.

Submitted 11 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2307.15283 [pdf, ps, other]

On averaging block Kaczmarz methods for solving nonlinear systems of equations

Authors: Aqin Xiao, Junfeng Yin

Abstract: A class of averaging block nonlinear Kaczmarz methods is developed for the solution of the nonlinear system of equations. The convergence theory of the proposed method is established under suitable assumptions and the upper bounds of the convergence rate for the proposed method with both constant stepsize and adaptive stepsize are derived. Numerical experiments are presented to verify the efficiency of the proposed method, which outperforms the existing nonlinear Kaczmarz methods in terms of the number of iteration steps and computational costs.

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2305.19812 [pdf, other]

A Survey of Label-Efficient Deep Learning for 3D Point Clouds

Authors: Aoran Xiao, Xiaoqin Zhang, Ling Shao, Shijian Lu

Abstract: In the past decade, deep neural networks have achieved significant progress in point cloud learning. However, collecting large-scale precisely-annotated training data is extremely laborious and expensive, which hinders the scalability of existing point cloud datasets and poses a bottleneck for efficient exploration of point cloud data in various tasks and applications. Label-efficient learning offers a promising solution by enabling effective deep network training with much-reduced annotation efforts. This paper presents the first comprehensive survey of label-efficient learning of point clouds. We address three critical questions in this emerging research field: i) the importance and urgency of label-efficient learning in point cloud processing, ii) the subfields it encompasses, and iii) the progress achieved in this area. To achieve this, we propose a taxonomy that organizes label-efficient learning methods based on the data prerequisites provided by different types of labels. We categorize four typical label-efficient learning approaches that significantly reduce point cloud annotation efforts: data augmentation, domain transfer learning, weakly-supervised learning, and pretrained foundation models. For each approach, we outline the problem setup and provide an extensive literature review that showcases relevant progress and challenges. Finally, we share insights into current research challenges and potential future directions. A project associated with this survey has been built at https://github.com/xiaoaoran/3D_label_efficient_learning.

Submitted 17 June, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:2304.00690 [pdf, other]

3D Semantic Segmentation in the Wild: Learning Generalized Models for Adverse-Condition Point Clouds

Authors: Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing

Abstract: Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model is largely neglected as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and allows studying 3DSS under various adverse weather conditions. We study all-weather 3DSS modeling under two setups: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalizable 3DSS that learns all-weather 3DSS models from normal-weather data. Our studies reveal the challenges existing 3DSS methods face when they encounter adverse-weather data, showing the great value of SemanticSTF in steering future endeavors along this very meaningful research direction. In addition, we design a domain randomization technique that alternately randomizes the geometry styles of point clouds and aggregates their embeddings, ultimately leading to a generalizable model that can improve 3DSS under various adverse weather effectively. The SemanticSTF and related codes are available at https://github.com/xiaoaoran/SemanticSTF.

Submitted 2 April, 2023; originally announced April 2023.

Comments: CVPR2023

arXiv:2303.10300 [pdf, other]

doi 10.1103/PhysRevE.108.034901

Designing the pressure-dependent shear modulus using tessellated granular metamaterials

Authors: Jerry Zhang, Dong Wang, Weiwei **, Annie Xia, Nidhi Pashine, Rebecca Kramer-Bottiglio, Mark D. Shattuck, Corey S. O'Hern

Abstract: Jammed packings of granular materials display complex mechanical response. For example, the ensemble-averaged shear modulus $\left\langle G \right\rangle$ increases as a power-law in pressure $p$ for static packings of soft spherical particles that can rearrange during compression. We seek to design granular materials with shear moduli that can either increase or decrease with pressure without particle rearrangements even in the large-system limit. To do this, we construct tessellated granular metamaterials by joining multiple particle-filled cells together. We focus on cells that contain a small number of bidisperse disks in two dimensions. We first study the mechanical properties of individual disk-filled cells with three types of boundaries: periodic boundary conditions (PBC), fixed-length walls (FXW), and flexible walls (FLW). Hypostatic jammed packings are found for cells with FLW, but not in cells with PBC and FXW, and they are stabilized by quartic modes of the dynamical matrix. The shear modulus of a single cell depends linearly on $p$. We find that the slope of the shear modulus with pressure, $λ_c < 0$ for all packings in single cells with PBC where the number of particles per cell $N \ge 6$. In contrast, single cells with FXW and FLW can possess $λ_c > 0$, as well as $λ_c < 0$, for $N \le 16$. We show that we can force the mechanical properties of multi-cell granular metamaterials to possess those of single cells by constraining the endpoints of the outer walls and enforcing an affine shear response. These studies demonstrate that tessellated granular metamaterials provide a novel platform for the design of soft materials with specified mechanical properties.

Submitted 10 September, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Journal ref: Phys. Rev. E 108, 034901 (2023)

arXiv:2303.06624 [pdf, other]

Collaborative Trolley Transportation System with Autonomous Nonholonomic Robots

Authors: Bingyi Xia, Hao Luan, Ziqi Zhao, Xuheng Gao, Peijia Xie, Anxing Xiao, Jiankun Wang, Max Q.-H. Meng

Abstract: Cooperative object transportation using multiple robots has been intensively studied in the control and robotics literature, but most approaches are either only applicable to omnidirectional robots or lack a complete navigation and decision-making framework that operates in real time. This paper presents an autonomous nonholonomic multi-robot system and an end-to-end hierarchical autonomy framework for collaborative luggage trolley transportation. This framework finds kinematic-feasible paths, computes online motion plans, and provides feedback that enables the multi-robot system to handle long lines of luggage trolleys and navigate obstacles and pedestrians while dealing with multiple inherently complex and coupled constraints. We demonstrate the designed collaborative trolley transportation system through practical transportation tasks, and the experiment results reveal their effectiveness and reliability in complex and dynamic environments.

Submitted 21 July, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

arXiv:2303.05223 [pdf, other]

LEAP: The latent exchangeability prior for borrowing information from historical data

Authors: Ethan M. Alt, Xiuya Chang, Xun Jiang, Qing Liu, May Mo, H. Amy Xia, Joseph G. Ibrahim

Abstract: It is becoming increasingly popular to elicit informative priors on the basis of historical data. Popular existing priors, including the power prior, commensurate prior, and robust meta-analytic prior provide blanket discounting. Thus, if only a subset of participants in the historical data are exchangeable with the current data, these priors may not be appropriate. In order to combat this issue, propensity score (PS) approaches have been proposed. However, PS approaches are only concerned with the covariate distribution, whereas exchangeability is typically assessed with parameters pertaining to the outcome. In this paper, we introduce the latent exchangeability prior (LEAP), where observations in the historical data are classified into exchangeable and non-exchangeable groups. The LEAP discounts the historical data by identifying the most relevant subjects from the historical data. We compare our proposed approach against alternative approaches in simulations and present a case study using our proposed prior to augment a control arm in a phase 3 clinical trial in plaque psoriasis with an unbalanced randomization scheme.

Submitted 9 March, 2023; originally announced March 2023.

arXiv:2302.10654 [pdf, ps, other]

On the rate of normal approximation for Poisson continuum percolation

Authors: Tiffany Y. Y. Lo, Aihua Xia

Abstract: It is known that the number of points in the largest cluster of a percolating Poisson process restricted to a large finite box is asymptotically normal. In this note, we establish a rate of convergence for the statement. As each point in the largest cluster is determined by points as far as the diameter of the box, known results in the literature of normal approximation for Poisson functionals cannot be directly applied. To disentangle the long-range dependence of the largest cluster, we use the fact that the second largest cluster has comparatively shorter range of dependence to restrict the range of dependence, apply a recently established result in Chen, Röllin and Xia (2021) to obtain a Berry-Esseen type bound for the normal approximation of the number of points belonging to clusters that have a restricted range of dependence, and then estimate the gap between this quantity and the number of points in the largest cluster.

Submitted 7 September, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: 10 pages. This version contains a correction to an error in Lemma 2.2 in the previous versions

MSC Class: primary 60K35; 60F05; secondary 60D05; 60G57; 82B43; 62E20

arXiv:2210.08818 [pdf]

doi 10.4271/2022-01-0107

The Digital Foundation Platform -- A Multi-layered SOA Architecture for Intelligent Connected Vehicle Operating System

Authors: David Yu, Andy Xiao

Abstract: Legacy AD/ADAS development from OEMs centers around developing functions on ECUs using services provided by AUTOSAR Classic Platform (CP) to meet automotive-grade and mass-production requirements. The AUTOSAR CP couples hardware and software components statically and encounters challenges to provide sufficient capacities for the processing of high-level intelligent driving functions, whereas the new platform, AUTOSAR Adaptive Platform (AP), is designed to support dynamic communication and provide richer services and function abstractions for those resource-intensive (memory, CPU) applications. Yet for both platforms, application development and the supporting system software are still closely coupled together, and this makes application development and enhancement less scalable and flexible, resulting in longer development cycles and slower time-to-market. This paper presents a multi-layered, service-oriented intelligent driving operating system foundation (which we name the Digital Foundation Platform) that provides abstractions for easier adoption of heterogeneous computing hardware. It features a multi-layer SOA software architecture with each layer providing adaptive service API at north-bound for application developers. The proposed Digital Foundation Platform (DFP) has significant advantages of decoupling hardware, operating system core, middle-ware, functional software and application software development. It provides SOA at multiple layers and enables application developers from OEMs to customize and develop new applications or enhance existing applications with new features, either in the autonomous domain or the intelligent cockpit domain, with great agility and less code through re-usability, and thus reduces the time-to-market.

Submitted 17 October, 2022; originally announced October 2022.

Comments: WCX SAE World Congress Experience 2022

arXiv:2210.05128 [pdf, ps, other]

On fast greedy block Kaczmarz methods for solving large consistent linear systems

Authors: Aqin Xiao, Junfeng Yin, Ning Zheng

Abstract: A class of fast greedy block Kaczmarz methods combined with general greedy strategy and average technique are proposed for solving large consistent linear systems. Theoretical analysis of the convergence of the proposed method is given in detail. Numerical experiments show that the proposed methods are efficient and faster than the existing methods.

Submitted 16 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: 11 pages, 1 figure

arXiv:2209.13998 [pdf, other]

Long range order for three-dimensional random field Ising model throughout the entire low temperature regime

Authors: Jian Ding, Yu Liu, Aoteng Xia

Abstract: For $d\geq 3$, we study the Ising model on $\mathbb Z^d$ with random field given by $\{εh_v: v\in \mathbb Z^d\}$ where $h_v$'s are independent normal variables with mean 0 and variance 1. We show that for any $T < T_c$ (here $T_c$ is the critical temperature without disorder), long range order exists as long as $ε$ is sufficiently small depending on $T$. Our work extends previous results of Imbrie (1985) and Bricmont--Kupiainen (1988) from the very low temperature regime to the entire low temperature regime.

Submitted 28 September, 2022; originally announced September 2022.

Comments: 36 pages

MSC Class: 60K35; 82B44

arXiv:2208.00223 [pdf, other]

PolarMix: A General Data Augmentation Technique for LiDAR Point Clouds

Authors: Aoran Xiao, Jiaxing Huang, Dayan Guan, Kaiwen Cui, Shijian Lu, Ling Shao

Abstract: LiDAR point clouds, which are usually scanned by rotating LiDAR sensors continuously, capture precise geometry of the surrounding environment and are crucial to many autonomous detection and navigation tasks. Though many 3D deep architectures have been developed, efficient collection and annotation of large amounts of point clouds remain one major challenge in the analysis and understanding of point cloud data. This paper presents PolarMix, a point cloud augmentation technique that is simple and generic but can mitigate the data constraint effectively across different perception tasks and scenarios. PolarMix enriches point cloud distributions and preserves point cloud fidelity via two cross-scan augmentation strategies that cut, edit, and mix point clouds along the scanning direction. The first is scene-level swapping, which exchanges point cloud sectors of two LiDAR scans that are cut along the azimuth axis. The second is instance-level rotation and paste, which crops point instances from one LiDAR scan, rotates them by multiple angles (to create multiple copies), and pastes the rotated point instances into other scans. Extensive experiments show that PolarMix achieves superior performance consistently across different perception tasks and scenarios. In addition, it can work as plug-and-play for various 3D deep architectures and also performs well for unsupervised domain adaptation.

Submitted 30 July, 2022; originally announced August 2022.

arXiv:2205.13211 [pdf, ps, other]

Convergence rate for geometric statistics of point processes with fast decay dependence

Authors: Tianshu Cong, Aihua Xia

Abstract: [Błaszczyszyn, Yogeshwaran and Yukich (2019)] established central limit theorems for geometric statistics of point processes having fast decay dependence. As limit theorems are of limited use unless we understand their errors involved in the approximation, in this paper, we consider the rates of a normal approximation in terms of the Wasserstein distance for statistics of point processes on $\mathbb{R}^d$ satisfying fast decay dependence. We demonstrate the use of the theorems for statistics arising from two families of point processes: the rarified Gibbs point processes and the determinantal point processes with fast decay kernels.

Submitted 26 May, 2022; originally announced May 2022.

Comments: 42 pages

MSC Class: primary 60F05; secondary 60D05; 60G55; 62E20; 05C80

arXiv:2205.03967 [pdf, other]

doi 10.1111/rssc.12596

The saturated pairwise interaction Gibbs point process as a joint species distribution model

Authors: Ian Flint, Nick Golding, Peter Vesk, Yan Wang, Aihua Xia

Abstract: In an effort to effectively model observed patterns in the spatial configuration of individuals of multiple species in nature, we introduce the saturated pairwise interaction Gibbs point process. Its main strength lies in its ability to model both attraction and repulsion within and between species, over different scales. As such, it is particularly well-suited to the study of associations in complex ecosystems. Based on the existing literature, we provide an easy to implement fitting procedure as well as a technique to make inference for the model parameters. We also prove that under certain hypotheses the point process is locally stable, which allows us to use the well-known 'coupling from the past' algorithm to draw samples from the model. Different numerical experiments show the robustness of the model. We study three different ecological datasets, demonstrating in each one that our model helps disentangle competing ecological effects on species' distribution.

Submitted 20 August, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

Comments: 36 pages, 14 figures

Journal ref: Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(5), 2022, pages 1721-1752

arXiv:2204.06456 [pdf, other]

doi 10.1103/PhysRevA.107.L031302

Non-equilibrium dynamics of fluctuations in an ultra-cold atomic mixture

Authors: Apoorva Hegde, Robert Ott, Andy Xia, Valentin Kasper, Jürgen Berges, Fred Jendrzejewski

Abstract: We investigate an ultra-cold mixture of Bose gases interacting via spin-changing collisions by studying the dynamics of spin fluctuations. The experimental implementation employs $^{23}$Na and $^{7}$Li atoms, which are prepared out of equilibrium across a wide range of initial conditions. We identify three regimes in the dynamics of the system for different initial states: a long-lived metastable regime, an instability range with strong growth of fluctuations, and a fast relaxing regime approaching thermal equilibrium. Theoretical modelling of the data allows us to reconstruct effective potentials which characterize the different dynamical regimes of the system.

Submitted 13 April, 2022; originally announced April 2022.

Comments: 9 pages, 5 figures

arXiv:2204.03875 [pdf, other]

Deterministic, Near-Linear $\varepsilon$-Approximation Algorithm for Geometric Bipartite Matching

Authors: Pankaj K. Agarwal, Hsien-Chih Chang, Sharath Raghvendra, Allen Xiao

Abstract: Given point sets $A$ and $B$ in $\mathbb{R}^d$ where $A$ and $B$ have equal size $n$ for some constant dimension $d$ and a parameter $\varepsilon>0$, we present the first deterministic algorithm that computes, in $n\cdot(\varepsilon^{-1} \log n)^{O(d)}$ time, a perfect matching between $A$ and $B$ whose cost is within a $(1+\varepsilon)$ factor of the optimal under any $\smash{\ell_p}$-norm. Although a Monte-Carlo algorithm with a similar running time is proposed by Raghvendra and Agarwal [J. ACM 2020], the best-known deterministic $\varepsilon$-approximation algorithm takes $Ω(n^{3/2})$ time. Our algorithm constructs a (refinement of a) tree cover of $\mathbb{R}^d$, and we develop several new tools to apply a tree-cover based approach to compute an $\varepsilon$-approximate perfect matching.

Submitted 8 April, 2022; originally announced April 2022.

Comments: The conference version of the paper is accepted to STOC 2022

arXiv:2203.10026 [pdf, other]

Unbiased Subclass Regularization for Semi-Supervised Semantic Segmentation

Authors: Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu

Abstract: Semi-supervised semantic segmentation learns from small amounts of labelled images and large amounts of unlabelled images, which has witnessed impressive progress with the recent advance of deep neural networks. However, it often suffers from severe class-bias problem while exploring the unlabelled images, largely due to the clear pixel-wise class imbalance in the labelled images. This paper presents an unbiased subclass regularization network (USRN) that alleviates the class imbalance issue by learning class-unbiased segmentation from balanced subclass distributions. We build the balanced subclass distributions by clustering pixels of each original class into multiple subclasses of similar sizes, which provide class-balanced pseudo supervision to regularize the class-biased segmentation. In addition, we design an entropy-based gate mechanism to coordinate learning between the original classes and the clustered subclasses which facilitates subclass regularization effectively by suppressing unconfident subclass predictions. Extensive experiments over multiple public benchmarks show that USRN achieves superior performance as compared with the state-of-the-art.

Submitted 26 March, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR 2022. Code is available at https://github.com/Dayan-Guan/USRN

arXiv:2203.04541 [pdf, other]

PUTN: A Plane-fitting based Uneven Terrain Navigation Framework

Authors: Zhuozhu Jian, Zihong Lu, Xiao Zhou, Bin Lan, Anxing Xiao, Xueqian Wang, Bin Liang

Abstract: Autonomous navigation of ground robots has been widely used in indoor structured 2D environments, but there are still many challenges in outdoor 3D unstructured environments, especially in rough, uneven terrains. This paper proposed a plane-fitting based uneven terrain navigation framework (PUTN) to solve this problem. The implementation of PUTN is divided into three steps. First, based on Rapidly-exploring Random Trees (RRT), an improved sample-based algorithm called Plane Fitting RRT* (PF-RRT*) is proposed to obtain a sparse trajectory. Each sampling point corresponds to a custom traversability index and a fitted plane on the point cloud. These planes are connected in series to form a traversable strip. Second, Gaussian Process Regression is used to generate traversability of the dense trajectory interpolated from the sparse trajectory, and the sampling tree is used as the training set. Finally, local planning is performed using nonlinear model predictive control (NMPC). By adding the traversability index and uncertainty to the cost function, and adding obstacles generated by the real-time point cloud to the constraint function, a safe motion planning algorithm with smooth speed and strong robustness is available. Experiments in real scenarios are conducted to verify the effectiveness of the method. The source code is released for the reference of the community.

Submitted 27 September, 2022; v1 submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2022

arXiv:2203.03927 [pdf, other]

Quadruped Guidance Robot for the Visually Impaired: A Comfort-Based Approach

Authors: Yanbo Chen, Zhengzhe Xu, Zhuozhu Jian, Gengpan Tang, Yunong Yangli, Anxing Xiao, Xueqian Wang, Bin Liang

Abstract: Guidance robots that can guide people and avoid various obstacles, could potentially be owned by more visually impaired people at a fairly low cost. Most of the previous guidance robots for the visually impaired ignored the human response behavior and comfort, treating the human as an appendage dragged by the robot, which can lead to imprecise guidance of the human and sudden changes in the traction force experienced by the human. In this paper, we propose a novel quadruped guidance robot system with a comfort-based concept. We design a controllable traction device that can adjust the length and force between human and robot to ensure comfort. To allow the human to be guided safely and comfortably to the target position in complex environments, our proposed human motion planner can plan the traction force with the force-based human motion model. To track the planned force, we also propose a robot motion planner that can generate the specific robot motion command and design the force control device. Our system has been deployed on Unitree Laikago quadrupedal platform and validated in real-world scenarios.

Submitted 23 June, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2202.13589 [pdf, other]

doi 10.1109/TPAMI.2023.3262786

Unsupervised Point Cloud Representation Learning with Deep Neural Networks: A Survey

Authors: Aoran Xiao, Jiaxing Huang, Dayan Guan, Xiaoqin Zhang, Shijian Lu, Ling Shao

Abstract: Point cloud data have been widely explored due to its superior accuracy and robustness under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved very impressive success in various applications such as surveillance and autonomous driving. The convergence of point cloud and DNNs has led to many deep point cloud models, largely trained under the supervision of large-scale and densely-labelled point cloud data. Unsupervised point cloud representation learning, which aims to learn general and useful point cloud representations from unlabelled point cloud data, has recently attracted increasing attention due to the constraint in large-scale point cloud labelling. This paper provides a comprehensive review of unsupervised point cloud representation learning using DNNs. It first describes the motivation, general pipelines as well as terminologies of the recent studies. Relevant background including widely adopted point cloud datasets and DNN architectures is then briefly presented. This is followed by an extensive discussion of existing unsupervised point cloud representation learning methods according to their technical approaches. We also quantitatively benchmark and discuss the reviewed methods over multiple widely adopted point cloud datasets. Finally, we share our humble opinion about several challenges and problems that could be pursued in future research in unsupervised point cloud representation learning. A project associated with this survey has been built at https://github.com/xiaoaoran/3d_url_survey.

Submitted 26 March, 2023; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2111.09983 [pdf, other]

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

Authors: Chunxi Liu, Michael Picheny, Leda Sarı, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

Abstract: It is well known that many machine learning systems demonstrate bias towards specific groups of individuals. This problem has been studied extensively in the Facial Recognition area, but much less so in Automatic Speech Recognition (ASR). This paper presents initial Speech Recognition results on "Casual Conversations" -- a publicly released 846 hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone. The entire corpus has been manually transcribed, allowing for detailed ASR evaluations across these metadata. Multiple ASR models are evaluated, including models trained on LibriSpeech, 14,000 hour transcribed, and over 2 million hour untranscribed social media videos. Significant differences in word error rate across gender and skin tone are observed at times for all models. We are releasing human transcripts from the Casual Conversations dataset to encourage the community to develop a variety of techniques to reduce these statistical biases.

Submitted 18 November, 2021; originally announced November 2021.

Comments: Submitted to ICASSP 2022. Our dataset will be publicly available at (https://ai.facebook.com/datasets/casual-conversations-downloads) for general use. We also would like to note that considering the limitations of our dataset, we limit the use of it for only evaluation purposes (see license agreement)

arXiv:2111.05948 [pdf, other]

Scaling ASR Improves Zero and Few Shot Learning

Authors: Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

Abstract: With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition. We propose data selection techniques to efficiently scale training data to find the most valuable samples in massive datasets. To efficiently scale model sizes, we leverage various optimizations such as sparse transducer loss and model sharding. By training 1-10B parameter universal English ASR models, we push the limits of speech recognition performance across many domains. Furthermore, our models learn powerful speech representations with zero and few-shot capabilities on novel domains and styles of speech, exceeding previous results across multiple in-house and public benchmarks. For speakers with disorders due to brain damage, our best zero-shot and few-shot models achieve 22% and 60% relative improvement on the AphasiaBank test set, respectively, while realizing the best performance on public social media videos. Furthermore, the same universal model reaches equivalent performance with 500x less in-domain data on the SPGISpeech financial-domain dataset.

Submitted 29 November, 2021; v1 submitted 10 November, 2021; originally announced November 2021.

arXiv:2110.06648 [pdf, other]

Robotic Autonomous Trolley Collection with Progressive Perception and Nonlinear Model Predictive Control

Authors: Anxing Xiao, Hao Luan, Ziqi Zhao, Yue Hong, Jieting Zhao, Weinan Chen, Jiankun Wang, Max Q.-H. Meng

Abstract: Autonomous mobile manipulation robots that can collect trolleys are widely used to liberate human resources and fight epidemics. Most prior robotic trolley collection solutions only detect trolleys with 2D poses or are merely based on specific marks and lack the formal design of planning algorithms. In this paper, we present a novel mobile manipulation system with applications in luggage trolley collection. The proposed system integrates a compact hardware design and a progressive perception and planning framework, enabling the system to efficiently and robustly collect trolleys in dynamic and complex environments. For the perception, we first develop a 3D trolley detection method that combines object detection and keypoint estimation. Then, a docking process in a short distance is achieved with an accurate point cloud plane detection method and a novel manipulator design. On the planning side, we formulate the robot's motion planning under a nonlinear model predictive control framework with control barrier functions to improve obstacle avoidance capabilities while maintaining the target in the sensors' field of view at close distances. We demonstrate our design and framework by deploying the system on actual trolley collection tasks, and their effectiveness and robustness are experimentally validated.

Submitted 1 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: Accepted to the 2022 International Conference on Robotics and Automation (ICRA 2022)

arXiv:2110.05241 [pdf, other]

Streaming Transformer Transducer Based Speech Recognition Using Non-Causal Convolution

Authors: Yangyang Shi, Chunyang Wu, Dilin Wang, Alex Xiao, Jay Mahadeokar, Xiaohui Zhang, Chunxi Liu, Ke Li, Yuan Shangguan, Varun Nagaraja, Ozlem Kalinli, Mike Seltzer

Abstract: This paper improves the streaming transformer transducer for speech recognition by using non-causal convolution. Many works apply the causal convolution to improve streaming transformer ignoring the lookahead context. We propose to use non-causal convolution to process the center block and lookahead context separately. This method leverages the lookahead context in convolution and maintains similar training and decoding efficiency. Given the similar latency, using the non-causal convolution with lookahead context gives better accuracy than causal convolution, especially for open-domain dictation scenarios. Besides, this paper applies talking-head attention and a novel history context compression scheme to further improve the performance. The talking-head attention improves the multi-head self-attention by transferring information among different heads. The history context compression method introduces more extended history context compactly. On our in-house data, the proposed methods improve a small Emformer baseline with lookahead context by relative WERR 5.1%, 14.5%, 8.4% on open-domain dictation, assistant general scenarios, and assistant calling scenarios, respectively.

Submitted 7 October, 2021; originally announced October 2021.

Comments: 5 pages, 3 figures, submit to ICASSP 2022

arXiv:2110.03374 [pdf, other]

Model Adaptation: Historical Contrastive Learning for Unsupervised Domain Adaptation without Source Data

Authors: Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu

Abstract: Unsupervised domain adaptation aims to align a labeled source domain and an unlabeled target domain, but it requires access to the source data, which often raises concerns in data privacy, data portability and data transmission efficiency. We study unsupervised model adaptation (UMA), also called Unsupervised Domain Adaptation without Source Data, an alternative setting that aims to adapt source-trained models towards target distributions without accessing source data. To this end, we design an innovative historical contrastive learning (HCL) technique that exploits historical source hypothesis to make up for the absence of source data in UMA. HCL addresses the UMA challenge from two perspectives. First, it introduces historical contrastive instance discrimination (HCID) that learns from target samples by contrasting their embeddings which are generated by the currently adapted model and the historical models. With the historical models, HCID encourages UMA to learn instance-discriminative target representations while preserving the source hypothesis. Second, it introduces historical contrastive category discrimination (HCCD) that pseudo-labels target samples to learn category-discriminative target representations. Specifically, HCCD re-weights pseudo labels according to their prediction consistency across the current and historical models. Extensive experiments show that HCL outperforms state-of-the-art methods consistently across a variety of visual tasks and setups.

Submitted 4 June, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

Comments: Accepted to Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

arXiv:2110.03174 [pdf, other]

Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

Authors: Dawei Liang, Yangyang Shi, Yun Wang, Nayan Singhal, Alex Xiao, Jonathan Shaw, Edison Thomaz, Ozlem Kalinli, Mike Seltzer

Abstract: Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life. Prior studies have shown that leveraging knowledge from a relevant domain is beneficial for a target acoustic event detection (AED) process. Inspired by the observation that many human-centered acoustic events in daily life involve voice elements, this paper investigates the potential of transferring high-level voice representations extracted from a public speaker dataset to enrich an AED pipeline. Towards this end, we develop a dual-branch neural network architecture for the joint learning of voice and acoustic features during an AED process and conduct thorough empirical studies to examine the performance on the public AudioSet [1] with different types of inputs. Our main observations are that: 1) Joint learning of audio and voice inputs improves the AED performance (mean average precision) for both a CNN baseline (0.292 vs 0.134 mAP) and a TALNet [2] baseline (0.361 vs 0.351 mAP); 2) Augmenting the extra voice features is critical to maximize the model performance with dual inputs.

Submitted 7 October, 2021; originally announced October 2021.

Comments: Submitted to ICASSP 2022

arXiv:2108.00177 [pdf, other]

Greedy Network Enlarging

Authors: Chuanjian Liu, Kai Han, An Xiao, Yiping Deng, Wei Zhang, Chunjing Xu, Yunhe Wang

Abstract: Recent studies on deep convolutional neural networks present a simple paradigm of architecture design, i.e., models with more MACs typically achieve better accuracy, such as EfficientNet and RegNet. These works try to enlarge all the stages in the model with one unified rule by sampling and statistical methods. However, we observe that some network architectures have similar MACs and accuracies, but their allocations on computations for different stages are quite different. In this paper, we propose to enlarge the capacity of CNN models by improving their width, depth and resolution on stage level. Under the assumption that the top-performing smaller CNNs are a proper subcomponent of the top-performing larger CNNs, we propose a greedy network enlarging method based on the reallocation of computations. By modifying the computations on different stages step by step, the enlarged network will be equipped with optimal allocation and utilization of MACs. On EfficientNet, our method consistently outperforms the original scaling method. In particular, with application of our method on GhostNet, we achieve state-of-the-art 80.9% and 84.3% ImageNet top-1 accuracies under the setting of 600M and 4.4B MACs, respectively.

Submitted 25 November, 2021; v1 submitted 31 July, 2021; originally announced August 2021.

arXiv:2107.11004 [pdf, other]

Domain Adaptive Video Segmentation via Temporal Consistency Regularization

Authors: Dayan Guan, Jiaxing Huang, Aoran Xiao, Shijian Lu

Abstract: Video semantic segmentation is an essential task for the analysis and understanding of videos. Recent efforts largely focus on supervised video segmentation by learning from fully annotated data, but the learnt models often experience a clear performance drop when applied to videos of a different domain. This paper presents DA-VSN, a domain adaptive video segmentation network that addresses domain gaps in videos by temporal consistency regularization (TCR) for consecutive frames of target-domain videos. DA-VSN consists of two novel and complementary designs. The first is cross-domain TCR that guides the prediction of target frames to have similar temporal consistency as that of source frames (learnt from annotated source data) via adversarial learning. The second is intra-domain TCR that guides unconfident predictions of target frames to have similar temporal consistency as confident predictions of target frames. Extensive experiments demonstrate the superiority of our proposed domain adaptive video segmentation network which outperforms multiple baselines consistently by large margins.

Submitted 22 July, 2021; originally announced July 2021.

Comments: Accepted to ICCV 2021. Code is available at https://github.com/Dayan-Guan/DA-VSN

arXiv:2107.05399 [pdf, other]

Transfer Learning from Synthetic to Real LiDAR Point Cloud for Semantic Segmentation

Authors: Aoran Xiao, Jiaxing Huang, Dayan Guan, Fangneng Zhan, Shijian Lu

Abstract: Knowledge transfer from synthetic to real data has been widely studied to mitigate data annotation constraints in various computer vision tasks such as semantic segmentation. However, the study focused on 2D images and its counterpart in 3D point clouds segmentation lags far behind due to the lack of large-scale synthetic datasets and effective transfer methods. We address this issue by collecting SynLiDAR, a large-scale synthetic LiDAR dataset that contains point-wise annotated point clouds with accurate geometric shapes and comprehensive semantic classes. SynLiDAR was collected from multiple virtual environments with rich scenes and layouts which consists of over 19 billion points of 32 semantic classes. In addition, we design PCT, a novel point cloud translator that effectively mitigates the gap between synthetic and real point clouds. Specifically, we decompose the synthetic-to-real gap into an appearance component and a sparsity component and handle them separately which improves the point cloud translation greatly. We conducted extensive experiments over three transfer learning setups including data augmentation, semi-supervised domain adaptation and unsupervised domain adaptation. Extensive experiments show that SynLiDAR provides a high-quality data source for studying 3D transfer and the proposed PCT achieves superior point cloud translation consistently across the three setups. SynLiDAR project page: https://github.com/xiaoaoran/SynLiDAR

Submitted 1 December, 2021; v1 submitted 12 July, 2021; originally announced July 2021.

Comments: Accepted by AAAI 2022

arXiv:2107.04140 [pdf, other]

First-Generation Inference Accelerator Deployment at Facebook

Authors: Michael Anderson, Benny Chen, Stephen Chen, Summer Deng, Jordan Fix, Michael Gschwind, Aravind Kalaiah, Changkyu Kim, Jaewon Lee, Jason Liang, Haixin Liu, Yinghai Lu, Jack Montgomery, Arun Moorthy, Satish Nadathur, Sam Naghshineh, Avinash Nayak, Jongsoo Park, Chris Petersen, Martin Schatz, Narayanan Sundaram, Bangsheng Tang, Peter Tang, Amy Yang, Jiecao Yu, et al. (90 additional authors not shown)

Abstract: In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through Open Compute Platform (OCP), and software framework and tooling, through PyTorch/Caffe2/Glow. A characteristic of this ecosystem from the start is its openness to enable a variety of AI accelerators from different vendors. This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. We describe various performance optimizations, at both platform and accelerator level, which enable this platform to serve production traffic at Facebook. We also share deployment challenges and lessons learned during performance optimization, and provide guidance for future inference hardware co-design.

Submitted 4 August, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

arXiv:2107.03021 [pdf, other]

Bi-level Feature Alignment for Versatile Image Translation and Manipulation

Authors: Fangneng Zhan, Yingchen Yu, Rongliang Wu, Jiahui Zhang, Kaiwen Cui, Aoran Xiao, Shijian Lu, Chunyan Miao

Abstract: Generative adversarial networks (GANs) have achieved great success in image translation and manipulation. However, high-fidelity image generation with faithful style control remains a grand challenge in computer vision. This paper presents a versatile image translation and manipulation framework that achieves accurate semantic and style guidance in image generation by explicitly building a correspondence. To handle the quadratic complexity incurred by building the dense correspondences, we introduce a bi-level feature alignment strategy that adopts a top-$k$ operation to rank block-wise features followed by dense attention between block features which reduces memory cost substantially. As the top-$k$ operation involves index swapping which precludes the gradient propagation, we approximate the non-differentiable top-$k$ operation with a regularized earth mover's problem so that its gradient can be effectively back-propagated. In addition, we design a novel semantic position encoding mechanism that builds up coordinate for each individual semantic region to preserve texture structures while building correspondences. Further, we design a novel confidence feature injection module which mitigates mismatch problem by fusing features adaptively according to the reliability of built correspondences. Extensive experiments show that our method achieves superior performance qualitatively and quantitatively as compared with the state-of-the-art.

Submitted 21 July, 2022; v1 submitted 7 July, 2021; originally announced July 2021.

Comments: Accepted to ECCV 2022

arXiv:2107.00773 [pdf, other]

Autonomous Navigation for Quadrupedal Robots with Optimized Jumping through Constrained Obstacles

Authors: Scott Gilroy, Derek Lau, Lizhi Yang, Ed Izaguirre, Kristen Biermayer, Anxing Xiao, Mengti Sun, Ayush Agrawal, Jun Zeng, Zhongyu Li, Koushil Sreenath

Abstract: Quadrupeds are strong candidates for navigating challenging environments because of their agile and dynamic designs. This paper presents a methodology that extends the range of exploration for quadrupedal robots by creating an end-to-end navigation framework that exploits walking and jumping modes. To obtain a dynamic jumping maneuver while avoiding obstacles, dynamically-feasible trajectories are optimized offline through collocation-based optimization where safety constraints are imposed. Such optimization schematic allows the robot to jump through window-shaped obstacles by considering both obstacles in the air and on the ground. The resulting jumping mode is utilized in an autonomous navigation pipeline that leverages a search-based global planner and a local planner to enable the robot to reach the goal location by walking. A state machine together with a decision making strategy allows the system to switch behaviors between walking around obstacles or jumping through them. The proposed framework is experimentally deployed and validated on a quadrupedal robot, a Mini Cheetah, to enable the robot to autonomously navigate through an environment while avoiding obstacles and jumping over a maximum height of 13 cm to pass through a window-shaped opening in order to reach its goal.

Submitted 1 July, 2021; originally announced July 2021.

Comments: Accepted to 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE 2021)

arXiv:2106.15941 [pdf, other]

Augmented Shortcuts for Vision Transformers

Authors: Yehui Tang, Kai Han, Chang Xu, An Xiao, Yiping Deng, Chao Xu, Yunhe Wang

Abstract: Transformer models have achieved great progress on computer vision tasks recently. The rapid development of vision transformers is mainly contributed by their high representation ability for extracting informative features from input images. However, the mainstream transformer models are designed with deep architectures, and the feature diversity will be continuously reduced as the depth increases, i.e., feature collapse. In this paper, we theoretically analyze the feature collapse phenomenon and study the relationship between shortcuts and feature diversity in these transformer models. Then, we present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel on the original shortcuts. To save the computational costs, we further explore an efficient approach that uses the block-circulant projection to implement augmented shortcuts. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method, which brings about 1% accuracy increase of the state-of-the-art visual transformers without obviously increasing their parameters and FLOPs.

Submitted 30 June, 2021; originally announced June 2021.

arXiv:2106.03014 [pdf, ps, other]

Geometric sums, size biasing and zero biasing

Authors: Qingwei Liu, Aihua Xia

Abstract: The geometric sum plays a significant role in risk theory and reliability theory \cite{Kala97} and a prototypical example of the geometric sum is Rényi's theorem~\cite{Renyi56} saying a sequence of suitably parameterised geometric sums converges to the exponential distribution. There is extensive study of the accuracy of exponential distribution approximation to the geometric sum \cite{Sugakova95,Kala97,PekozRollin11} but there is little study on its natural counterpart of gamma distribution approximation to negative binomial sums. In this note, we show that a nonnegative random variable follows a gamma distribution if and only if its size biasing equals its zero biasing. We combine this characterisation with Stein's method to establish simple bounds for gamma distribution approximation to the sum of nonnegative independent random variables, a class of compound Poisson distributions and the negative binomial sum of random variables.

Submitted 16 October, 2021; v1 submitted 5 June, 2021; originally announced June 2021.

arXiv:2106.02885 [pdf, other]

Category Contrast for Unsupervised Domain Adaptation in Visual Tasks

Authors: Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu, Ling Shao

Abstract: Instance contrast for unsupervised representation learning has achieved great success in recent years. In this work, we explore the idea of instance contrastive learning in unsupervised domain adaptation (UDA) and propose a novel Category Contrast technique (CaCo) that introduces semantic priors on top of instance discrimination for visual UDA tasks. By considering instance contrastive learning as a dictionary look-up operation, we construct a semantics-aware dictionary with samples from both source and target domains where each target sample is assigned a (pseudo) category label based on the category priors of source samples. This allows category contrastive learning (between target queries and the category-level dictionary) for category-discriminative yet domain-invariant feature representations: samples of the same category (from either source or target domain) are pulled closer while those of different categories are pushed apart simultaneously. Extensive UDA experiments in multiple visual tasks (e.g., segmentation, classification and detection) show that CaCo achieves superior performance as compared with state-of-the-art methods. The experiments also demonstrate that CaCo is complementary to existing UDA methods and generalizable to other learning setups such as unsupervised model adaptation, open-/partial-set adaptation etc.

Submitted 17 March, 2022; v1 submitted 5 June, 2021; originally announced June 2021.

Comments: CVPR2022 version

Showing 1–50 of 120 results for author: Xia, A