PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models

Authors: Tao Fan, Yan Kang, Weijing Chen, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

Abstract: In the context of real-world applications, leveraging large language models (LLMs) for domain-specific tasks often faces two major challenges: domain-specific knowledge privacy and constrained resources. To address these issues, we propose PDSS, a privacy-preserving framework for step-by-step distillation of LLMs. PDSS works on a server-client architecture, wherein the client transmits perturbed prompts to the server's LLM for rationale generation. The generated rationales are then decoded by the client and used to enrich the training of a task-specific small language model (SLM) within a multi-task learning paradigm. PDSS introduces two privacy protection strategies: the Exponential Mechanism Strategy and the Encoder-Decoder Strategy, balancing prompt privacy and rationale usability. Experiments demonstrate the effectiveness of PDSS in various text generation tasks, enabling the training of task-specific SLM with enhanced performance while prioritizing data privacy protection.

Submitted 18 June, 2024; originally announced June 2024.
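
The PDSS abstract names an Exponential Mechanism Strategy for perturbing prompts but does not give its details. As a rough illustration only, the generic exponential mechanism from differential privacy samples a candidate c with probability proportional to exp(epsilon * u(c) / (2 * sensitivity)). The sketch below assumes a toy utility table, a four-word vocabulary, and epsilon = 4.0; all three are hypothetical and are not taken from the paper.

```python
import math
import random


def selection_probabilities(candidates, utility, epsilon, sensitivity=1.0):
    """Exact sampling distribution of the exponential mechanism."""
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0, rng=None):
    """Sample one candidate according to selection_probabilities."""
    rng = rng or random.Random()
    probs = selection_probabilities(candidates, utility, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical example: pick a replacement for the token "doctor".
# The utility is a made-up similarity score in [0, 1], so its sensitivity is at most 1.
toy_similarity = {"doctor": 1.0, "physician": 0.9, "nurse": 0.6, "banana": 0.0}
vocab = list(toy_similarity)
probs = selection_probabilities(vocab, toy_similarity.get, epsilon=4.0)
replacement = exponential_mechanism(vocab, toy_similarity.get, epsilon=4.0,
                                    rng=random.Random(0))
```

A larger epsilon concentrates the distribution on high-utility replacements (more usable, less private); a smaller epsilon flattens it toward uniform sampling.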

arXiv:2406.02224 [pdf, other]

FedMKT: Federated Mutual Knowledge Transfer for Large and Small Language Models

Authors: Tao Fan, Guoqiang Ma, Yan Kang, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

Abstract: Recent research in federated large language models (LLMs) has primarily focused on enabling clients to fine-tune their locally deployed homogeneous LLMs collaboratively or on transferring knowledge from server-based LLMs to small language models (SLMs) at downstream clients. However, a significant gap remains in the simultaneous mutual enhancement of both the server's LLM and clients' SLMs. To bridge this gap, we propose FedMKT, a parameter-efficient federated mutual knowledge transfer framework for large and small language models. This framework is designed to adaptively transfer knowledge from the server's LLM to clients' SLMs while concurrently enriching the LLM with clients' unique domain insights. We facilitate token alignment using minimum edit distance (MinED) and then selective mutual knowledge transfer between client-side SLMs and a server-side LLM, aiming to collectively enhance their performance. Through extensive experiments across three distinct scenarios, we evaluate the effectiveness of FedMKT using various public LLMs and SLMs on a range of NLP text generation tasks. Empirical results demonstrate that FedMKT simultaneously boosts the performance of both LLMs and SLMs.

Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.
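
The FedMKT abstract mentions token alignment using minimum edit distance (MinED) without spelling out the procedure. One plausible reading, sketched here under that assumption, is to map each token of one tokenizer's vocabulary to the closest token in another vocabulary by Levenshtein distance. The function names and the tiny vocabularies below are hypothetical and do not come from the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def align_vocab(source_tokens, target_tokens):
    """Map each source token to the target token with minimum edit distance.
    Ties are broken by position in target_tokens."""
    return {s: min(target_tokens, key=lambda t: edit_distance(s, t))
            for s in source_tokens}


# Hypothetical vocabularies whose tokenizers mark word boundaries differently.
mapping = align_vocab(["_hello", "world", "token"],
                      ["hello", "#world", "tokens", "the"])
```

This brute-force search costs one distance computation per token pair, so a real vocabulary would likely need caching or candidate filtering.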

arXiv:2405.09357 [pdf, ps, other]

A universal optimization framework based on cycle ranking for influence maximization in complex networks

Authors: Wenfeng Shi, Tianlong Fan, Shuqi Xu, Rongmei Yang, Linyuan Lü

Abstract: Influence maximization aims to identify a set of influential individuals, referred to as influencers, as information sources to maximize the spread of information within networks, constituting a vital combinatorial optimization problem with extensive practical applications and sustained interdisciplinary interest. Diverse approaches have been devised to efficiently address this issue, one of which involves selecting the influencers from a given centrality ranking. In this paper, we propose a novel optimization framework based on ranking basic cycles in networks, capable of selecting the influencers from diverse centrality measures. The experimental results demonstrate that, compared to directly selecting the top-k nodes from centrality sequences and other state-of-the-art optimization approaches, the new framework can expand the dissemination range by 1.5 to 3 times. Counterintuitively, it exhibits minimal hub property, with the average distance between influencers being only one-third of alternative approaches, regardless of the centrality metrics or network types. Our study not only paves the way for novel strategies in influence maximization but also underscores the unique potential of underappreciated cycle structures.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.11862 [pdf]

A Fast Maximum Clique Algorithm Based on Network Decomposition for Large Sparse Networks

Authors: Tianlong Fan, Wenjun Jiang, Yi-Cheng Zhang, Linyuan Lü

Abstract: Finding maximum cliques in large networks is a challenging combinatorial problem with many real-world applications. We present a fast algorithm to achieve the exact solution for the maximum clique problem in large sparse networks based on efficient graph decomposition. A bunch of effective techniques is being used to greatly prune the graph and a novel concept called Complete-Upper-Bound-Induced Subgraph (CUBIS) is proposed to ensure that the structures with the potential to form the maximum clique are retained in the process of graph decomposition. Our algorithm first pre-prunes peripheral nodes; subsequently, one or two small-scale CUBISs are constructed guided by the core number and current maximum clique size. Bron-Kerbosch search is performed on each CUBIS to find the maximum clique. Experiments on 50 empirical networks with a scale of up to 20 million show the CUBIS scales are largely independent of the original network scale. This enables an approximately linear runtime, making our algorithm amenable for large networks. Our work provides a new framework for effectively solving maximum clique problems on massive sparse graphs, which not only makes the graph scale no longer the bottleneck but also sheds some light on solving other clique-related problems.

Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: 12 pages, 2 figures, 1 table

MSC Class: 05C82 (Primary); 05C80; 91D30; 68P20 (Secondary) ACM Class: H.3.3; F.2.2; J.2
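
The maximum clique abstract above says a Bron-Kerbosch search is run on each CUBIS. The sketch below shows only a textbook Bron-Kerbosch search with pivoting plus a simple size bound; it does not implement the paper's pre-pruning or CUBIS construction, and the example graph is made up for illustration.

```python
def max_clique(adj):
    """Return one maximum clique of an undirected graph.
    adj maps each node to the set of its neighbours."""
    best = []

    def expand(r, p, x):
        # r: current clique, p: candidates that extend r, x: already explored nodes.
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        # Bound: even adding every candidate cannot beat the best clique so far.
        if len(r) + len(p) <= len(best):
            return
        # Pivot on the node with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


# Hypothetical graph: a triangle {1, 2, 3} bridged to a 4-clique {4, 5, 6, 7}.
edges = [(1, 2), (1, 3), (2, 3), (3, 4),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
clique = max_clique(adj)
```

Because the search is exponential in the worst case, running it only on small induced subgraphs, as the abstract describes, is what keeps the overall runtime manageable.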

arXiv:2403.01400 [pdf, other]

Decoupling Weighing and Selecting for Integrating Multiple Graph Pre-training Tasks

Authors: Tianyu Fan, Lirong Wu, Yufei Huang, Haitao Lin, Cheng Tan, Zhangyang Gao, Stan Z. Li

Abstract: Recent years have witnessed the great success of graph pre-training for graph representation learning. With hundreds of graph pre-training tasks proposed, integrating knowledge acquired from multiple pre-training tasks has become a popular research topic. In this paper, we identify two important collaborative processes for this topic: (1) select: how to select an optimal task combination from a given task pool based on their compatibility, and (2) weigh: how to weigh the selected tasks based on their importance. While there currently has been a lot of work focused on weighing, comparatively little effort has been devoted to selecting. This paper proposes a novel instance-level framework for integrating multiple graph pre-training tasks, Weigh And Select (WAS), where the two collaborative processes, weighing and selecting, are combined by decoupled siamese networks. Specifically, it first adaptively learns an optimal combination of tasks for each instance from a given task pool, based on which a customized instance-level task weighing strategy is learned. Extensive experiments on 16 graph datasets across node-level and graph-level downstream tasks have demonstrated that by combining a few simple but classical tasks, WAS can achieve comparable performance to other leading counterparts. The code is available at https://github.com/TianyuFan0504/WAS.

Submitted 3 March, 2024; originally announced March 2024.

Comments: Published as a conference paper at ICLR 2024

arXiv:2403.00027 [pdf, ps, other]

A Quick Framework for Evaluating Worst Robustness of Complex Networks

Authors: Wenjun Jiang, Peiyan Li, Tianlong Fan, Ting Li, Chuan-fu Zhang, Tao Zhang, Zong-fu Luo

Abstract: Robustness is pivotal for comprehending, designing, optimizing, and rehabilitating networks, with simulation attacks being the prevailing evaluation method. Simulation attacks are often time-consuming or even impractical; however, a more crucial yet persistently overlooked drawback is that any attack strategy merely provides a potential paradigm of disintegration. The key concern is: in the worst-case scenario or facing the most severe attacks, what is the limit of robustness, referred to as "Worst Robustness", for a given system? Understanding a system's worst robustness is imperative for grasping its reliability limits, accurately evaluating protective capabilities, and determining associated design and security maintenance costs. To address these challenges, we introduce the concept of Most Destruction Attack (MDA) based on the idea of knowledge stacking. MDA is employed to assess the worst robustness of networks, followed by the application of an adapted CNN algorithm for rapid worst robustness prediction. We establish the logical validity of MDA and highlight the exceptional performance of the adapted CNN algorithm in predicting the worst robustness across diverse network topologies, encompassing both model and empirical networks.

Submitted 28 February, 2024; originally announced March 2024.

Comments: 30 pages, 8 figures, 4 tables, journal

MSC Class: 68T07 (Primary) 90B25; 05C80; 05C82; 90B15; 90B18 (Secondary) ACM Class: I.2.6; G.2.2; J.4; F.2.2

arXiv:2312.13583 [pdf, other]

Fine-tuning Graph Neural Networks by Preserving Graph Generative Patterns

Authors: Yifei Sun, Qi Zhu, Yang Yang, Chunping Wang, Tianyu Fan, Jiajun Zhu, Lei Chen

Abstract: Recently, the paradigm of pre-training and fine-tuning graph neural networks has been intensively studied and applied in a wide range of graph mining tasks. Its success is generally attributed to the structural consistency between pre-training and downstream datasets, which, however, does not hold in many real-world scenarios. Existing works have shown that the structural divergence between pre-training and downstream graphs significantly limits the transferability when using the vanilla fine-tuning strategy. This divergence leads to model overfitting on pre-training graphs and causes difficulties in capturing the structural properties of the downstream graphs. In this paper, we identify the fundamental cause of structural divergence as the discrepancy of generative patterns between the pre-training and downstream graphs. Furthermore, we propose G-Tuning to preserve the generative patterns of downstream graphs. Given a downstream graph G, the core idea is to tune the pre-trained GNN so that it can reconstruct the generative patterns of G, the graphon W. However, the exact reconstruction of a graphon is known to be computationally expensive. To overcome this challenge, we provide a theoretical analysis that establishes the existence of a set of alternative graphons called graphon bases for any given graphon. By utilizing a linear combination of these graphon bases, we can efficiently approximate W. This theoretical finding forms the basis of our proposed model, as it enables effective learning of the graphon bases and their associated coefficients. Compared with existing algorithms, G-Tuning demonstrates an average improvement of 0.5% and 2.6% on in-domain and out-of-domain transfer learning experiments, respectively.

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.13469 [pdf, other]

Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation

Authors: Sudharshan Suresh, Haozhi Qi, Tingfan Wu, Taosha Fan, Luis Pineda, Mike Lambeta, Jitendra Malik, Mrinal Kalakrishnan, Roberto Calandra, Michael Kaess, Joseph Ortiz, Mustafa Mukadam

Abstract: To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object's pose and shape. The status quo for in-hand perception primarily employs vision, and restricts to tracking a priori known objects. Moreover, visual occlusion of objects in-hand is imminent during manipulation, preventing current systems to push beyond tasks without occlusion. We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We study multimodal in-hand perception in simulation and the real-world, interacting with different objects via a proprioception-driven policy. Our experiments show final reconstruction F-scores of 81% and average pose drifts of 4.7 mm, further reduced to 2.3 mm with known CAD models. Additionally, we observe that under heavy visual occlusion we can achieve up to 94% improvements in tracking compared to vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step towards benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone towards advancing robot dexterity. Videos can be found on our project website https://suddhu.github.io/neural-feels/

Submitted 20 December, 2023; originally announced December 2023.

Comments: 43 pages, 20 figures, 1 table; https://suddhu.github.io/neural-feels/

arXiv:2311.17431 [pdf, other]

Grounding Foundation Models through Federated Transfer Learning: A General Framework

Authors: Yan Kang, Tao Fan, Hanlin Gu, Xiaojin Zhang, Lixin Fan, Qiang Yang

Abstract: Foundation Models (FMs) such as GPT-4 encoded with vast knowledge and powerful emergent abilities have achieved remarkable success in various natural language processing and computer vision tasks. Grounding FMs by adapting them to domain-specific tasks or augmenting them with domain-specific knowledge enables us to exploit the full potential of FMs. However, grounding FMs faces several challenges, stemming primarily from constrained computing resources, data privacy, model heterogeneity, and model ownership. Federated Transfer Learning (FTL), the combination of federated learning and transfer learning, provides promising solutions to address these challenges. In recent years, the need for grounding FMs leveraging FTL, coined FTL-FM, has arisen strongly in both academia and industry. Motivated by the strong growth in FTL-FM research and the potential impact of FTL-FM on industrial applications, we propose an FTL-FM framework that formulates problems of grounding FMs in the federated learning setting, construct a detailed taxonomy based on the FTL-FM framework to categorize state-of-the-art FTL-FM works, and comprehensively overview FTL-FM works based on the proposed taxonomy. We also establish correspondences between FTL-FM and conventional phases of adapting FM so that FM practitioners can align their research works with FTL-FM. In addition, we overview advanced efficiency-improving and privacy-preserving techniques because efficiency and privacy are critical concerns in FTL-FM. Last, we discuss opportunities and future research directions of FTL-FM.

Submitted 29 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: In progress

arXiv:2311.00684 [pdf, other]

Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation

Authors: Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky

Abstract: An ideal length-extrapolatable Transformer language model can handle sequences longer than the training length without any fine-tuning. Such long-context utilization capability relies heavily on a flexible positional embedding design. Upon investigating the flexibility of existing large pre-trained Transformer language models, we find that the T5 family deserves a closer look, as its positional embeddings capture rich and flexible attention patterns. However, T5 suffers from the dispersed attention issue: the longer the input sequence, the flatter the attention distribution. To alleviate the issue, we propose two attention alignment strategies via temperature scaling. Our findings show improvement on the long-context utilization capability of T5 on language modeling, retrieval, multi-document question answering, and code completion tasks without any fine-tuning. This suggests that a flexible positional embedding design and attention alignment can go a long way toward Transformer length extrapolation.

Submitted 15 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.
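
The dispersed attention issue described in the abstract above (longer inputs give flatter attention) and the general effect of temperature scaling can be seen in a toy softmax. This is only an illustration of the basic mechanism: the logits and the temperature of 0.5 are arbitrary assumptions, and the paper's two specific alignment strategies are not reproduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; values below 1 sharpen it."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a flatter distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# One relevant key (logit 2.0) among otherwise uninformative keys (logit 0.0).
short = softmax([2.0] + [0.0] * 7)        # 8 keys
long = softmax([2.0] + [0.0] * 63)        # 64 keys: weight on the relevant key drops
long_sharp = softmax([2.0] + [0.0] * 63, temperature=0.5)  # rescaled attention
```

With these toy numbers, lowering the temperature on the 64-key input moves the weight on the relevant key back toward what the 8-key input produced, which is the intuition behind aligning attention across lengths.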

arXiv:2310.10049 [pdf, other]

FATE-LLM: A Industrial Grade Federated Learning Framework for Large Language Models

Authors: Tao Fan, Yan Kang, Guoqiang Ma, Weijing Chen, Wenbin Wei, Lixin Fan, Qiang Yang

Abstract: Large Language Models (LLMs), such as ChatGPT, LLaMA, GLM, and PaLM, have exhibited remarkable performances across various tasks in recent years. However, LLMs face two main challenges in real-world applications. One challenge is that training LLMs consumes vast computing resources, preventing LLMs from being adopted by small and medium-sized enterprises with limited computing resources. Another is that training LLM requires a large amount of high-quality data, which are often scattered among enterprises. To address these challenges, we propose FATE-LLM, an industrial-grade federated learning framework for large language models. FATE-LLM (1) facilitates federated learning for large language models (coined FedLLM); (2) promotes efficient training of FedLLM using parameter-efficient fine-tuning methods; (3) protects the intellectual property of LLMs; (4) preserves data privacy during training and inference through privacy-preserving mechanisms. We release the code of FATE-LLM at https://github.com/FederatedAI/FATE-LLM to facilitate the research of FedLLM and enable a broad range of industrial applications.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2309.07412 [pdf, other]

Advancing Regular Language Reasoning in Linear Recurrent Neural Networks

Authors: Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky

Abstract: In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language and long-range modeling, while offering rapid parallel training and constant inference cost. With the resurgence of interest in LRNNs, we study whether they can learn the hidden rules in training sequences, such as the grammatical structures of regular language. We theoretically analyze some existing LRNNs and discover their limitations in modeling regular language. Motivated by this analysis, we propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix. Experiments suggest that the proposed model is the only LRNN capable of performing length extrapolation on regular language tasks such as Sum, Even Pair, and Modular Arithmetic. The code is released at https://github.com/tinghanf/RegluarLRNN.

Submitted 9 April, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted at the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024). The first two authors contributed equally to this work

arXiv:2308.08012 [pdf, ps, other]

Comprehensive Analysis of Network Robustness Evaluation Based on Convolutional Neural Networks with Spatial Pyramid Pooling

Authors: Wenjun Jiang, Tianlong Fan, Changhao Li, Chuanfu Zhang, Tao Zhang, Zong-fu Luo

Abstract: Connectivity robustness, a crucial aspect for understanding, optimizing, and repairing complex networks, has traditionally been evaluated through time-consuming and often impractical simulations. Fortunately, machine learning provides a new avenue for addressing this challenge. However, several key issues remain unresolved, including the performance in more general edge removal scenarios, capturing robustness through attack curves instead of directly training for robustness, scalability of predictive tasks, and transferability of predictive capabilities. In this paper, we address these challenges by designing a convolutional neural networks (CNN) model with spatial pyramid pooling networks (SPP-net), adapting existing evaluation metrics, redesigning the attack modes, introducing appropriate filtering rules, and incorporating the value of robustness as training data. The results demonstrate the thoroughness of the proposed CNN framework in addressing the challenges of high computational time across various network types, failure component types and failure scenarios. However, the performance of the proposed CNN model varies: for evaluation tasks that are consistent with the trained network type, the proposed CNN model consistently achieves accurate evaluations of both attack curves and robustness values across all removal scenarios. When the predicted network type differs from the trained network, the CNN model still demonstrates favorable performance in the scenario of random node failure, showcasing its scalability and performance transferability. Nevertheless, the performance falls short of expectations in other removal scenarios. This observed scenario-sensitivity in the evaluation of network features has been overlooked in previous studies and necessitates further attention and optimization. Lastly, we discuss important unresolved questions and further investigation.

Submitted 28 May, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 25 pages, 8 figures, 7 tables, journal

MSC Class: 68T07 (Primary) 90B25; 05C80; 05C82; 90B15; 90B18 (Secondary) ACM Class: I.2.6; G.2.2; J.4; F.2.2

arXiv:2305.13571 [pdf, other]

Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings

Authors: Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander I. Rudnicky, Peter J. Ramadge

Abstract: The use of positional embeddings in transformer language models is widely accepted. However, recent research has called into question the necessity of such embeddings. We further extend this inquiry by demonstrating that a randomly initialized and frozen transformer language model, devoid of positional embeddings, inherently encodes strong positional information through the shrinkage of self-attention variance. To quantify this variance, we derive the underlying distribution of each step within a transformer layer. Through empirical validation using a fully pretrained model, we show that the variance shrinkage effect still persists after extensive gradient updates. Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.

Submitted 22 May, 2023; originally announced May 2023.

Comments: Accepted by ACL 2023

arXiv:2305.10758 [pdf, other]

Extracting Low-/High- Frequency Knowledge from Graph Neural Networks and Injecting it into MLPs: An Effective GNN-to-MLP Distillation Framework

Authors: Lirong Wu, Haitao Lin, Yufei Huang, Tianyu Fan, Stan Z. Li

Abstract: Recent years have witnessed the great success of Graph Neural Networks (GNNs) in handling graph-related tasks. However, MLPs remain the primary workhorse for practical industrial applications due to their desirable inference efficiency and scalability. To reduce their gaps, one can directly distill knowledge from a well-designed teacher GNN to a student MLP, which is termed as GNN-to-MLP distillation. However, the process of distillation usually entails a loss of information, and "which knowledge patterns of GNNs are more likely to be left and distilled into MLPs?" becomes an important question. In this paper, we first factorize the knowledge learned by GNNs into low- and high-frequency components in the spectral domain and then derive their correspondence in the spatial domain. Furthermore, we identified a potential information drowning problem for existing GNN-to-MLP distillation, i.e., the high-frequency knowledge of the pre-trained GNNs may be overwhelmed by the low-frequency knowledge during distillation; we have described in detail what it represents, how it arises, what impact it has, and how to deal with it. In this paper, we propose an efficient Full-Frequency GNN-to-MLP (FF-G2M) distillation framework, which extracts both low-frequency and high-frequency knowledge from GNNs and injects it into MLPs. Extensive experiments show that FF-G2M improves over the vanilla MLPs by 12.6% and outperforms its corresponding teacher GNNs by 2.6% averaged over six graph datasets and three common GNN architectures.

Submitted 4 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.07026 [pdf, other]

Decentralization and Acceleration Enables Large-Scale Bundle Adjustment

Authors: Taosha Fan, Joseph Ortiz, Ming Hsiao, Maurizio Monge, Jing Dong, Todd Murphey, Mustafa Mukadam

Abstract: Scaling to arbitrarily large bundle adjustment problems requires data and compute to be distributed across multiple devices. Centralized methods in prior works are only able to solve small or medium size problems due to overhead in computation and communication. In this paper, we present a fully decentralized method that alleviates computation and communication bottlenecks to solve arbitrarily large bundle adjustment problems. We achieve this by reformulating the reprojection error and deriving a novel surrogate function that decouples optimization variables from different devices. This function makes it possible to use majorization minimization techniques and reduces bundle adjustment to independent optimization subproblems that can be solved in parallel. We further apply Nesterov's acceleration and adaptive restart to improve convergence while maintaining its theoretical guarantees. Despite limited peer-to-peer communication, our method has provable convergence to first-order critical points under mild conditions. On extensive benchmarks with public datasets, our method converges much faster than decentralized baselines with similar memory usage and communication load. Compared to centralized baselines using a single device, our method, while being decentralized, yields more accurate solutions with significant speedups of up to 953.7x over Ceres and 174.6x over DeepLM. Code: https://joeaortiz.github.io/daba.

Submitted 8 August, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: Robotics: Science and Systems (RSS), 2023
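
The abstract above mentions Nesterov's acceleration with adaptive restart. The sketch below shows only that generic idea on a toy two-variable quadratic; it is not the paper's decentralized bundle adjustment method. The function names, the toy objective, and the choice of a function-value restart test are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def quadratic(x: Vector) -> float:
    """Toy objective 0.5 * (x0^2 + 100 * x1^2); its gradient is 100-Lipschitz."""
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)


def quadratic_grad(x: Vector) -> Vector:
    """Gradient of the toy objective."""
    return [x[0], 100.0 * x[1]]


def accelerated_descent(
    objective: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    step: float,
    iterations: int,
) -> Tuple[Vector, int]:
    """Nesterov-accelerated gradient descent with a function-value restart.

    Returns the final iterate and the number of restarts that fired.
    """
    x = list(x0)  # current iterate
    y = list(x0)  # extrapolated (look-ahead) point
    t = 1.0       # momentum schedule parameter
    restarts = 0
    for _ in range(iterations):
        # Gradient step taken from the extrapolated point.
        candidate = [yi - step * gi for yi, gi in zip(y, grad(y))]
        if objective(candidate) > objective(x):
            # The objective went up: drop the momentum and take a plain
            # gradient step from the current iterate instead.
            candidate = [xi - step * gi for xi, gi in zip(x, grad(x))]
            t = 1.0
            restarts += 1
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        beta = (t - 1.0) / t_next  # equals 0 right after a restart
        y = [ci + beta * (ci - xi) for ci, xi in zip(candidate, x)]
        x, t = candidate, t_next
    return x, restarts


if __name__ == "__main__":
    final, restarts = accelerated_descent(
        quadratic, quadratic_grad, [1.0, 1.0], step=0.01, iterations=500
    )
    print(quadratic(final), restarts)
```

With a step size of at most one over the gradient's Lipschitz constant, the fallback gradient step cannot increase the objective, so the iterates in this sketch never get worse from one iteration to the next.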

arXiv:2305.05356 [pdf, other]

Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching

Authors: Shuting Xia, Tingyu Fan, Yiling Xu, Jenq-Neng Hwang, Zhu Li

Abstract: 3D dynamic point cloud (DPC) compression relies on mining its temporal context, which faces significant challenges due to DPC's sparsity and non-uniform structure. Existing methods are limited in capturing sufficient temporal dependencies. Therefore, this paper proposes a learning-based DPC compression framework via hierarchical block-matching-based inter-prediction module to compensate and compress the DPC geometry in latent space. Specifically, we propose a hierarchical motion estimation and motion compensation (Hie-ME/MC) framework for flexible inter-prediction, which dynamically selects the granularity of optical flow to encapsulate the motion information accurately. To improve the motion estimation efficiency of the proposed inter-prediction module, we further design a KNN-attention block matching (KABM) network that determines the impact of potential corresponding points based on the geometry and feature correlation. Finally, we compress the residual and the multi-scale optical flow with a fully-factorized deep entropy model. The experiment result on the MPEG-specified Owlii Dynamic Human Dynamic Point Cloud (Owlii) dataset shows that our framework outperforms the previous state-of-the-art methods and the MPEG standard V-PCC v18 in inter-frame low-delay mode.

Submitted 16 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: 9 pages for the main body, 3 pages for the supplemental after References

arXiv:2305.03796 [pdf, other]

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation

Authors: Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge

Abstract: Unlike recurrent models, conventional wisdom has it that Transformers cannot perfectly model regular languages. Inspired by the notion of working memory, we propose a new Transformer variant named RegularGPT. With its novel combination of Weight-Sharing, Adaptive-Depth, and Sliding-Dilated-Attention, RegularGPT constructs working memory along the depth dimension, thereby enabling efficient and successful modeling of regular languages such as PARITY. We further test RegularGPT on the task of natural language length extrapolation and surprisingly find that it rediscovers the local windowed attention effect deemed necessary in prior work for length extrapolation.

Submitted 5 May, 2023; originally announced May 2023.
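
PARITY, named in the abstract above, is the task of deciding whether a bit string contains an odd number of ones. The sketch below computes it two ways: with a two-state automaton that reads left to right, and with a pairwise reduction that needs only logarithmically many rounds. Both functions are illustrative assumptions and do not reproduce the RegularGPT architecture.

```python
from typing import List


def parity_sequential(bits: str) -> int:
    """Two-state automaton: the state flips on every '1'."""
    state = 0
    for ch in bits:
        if ch == "1":
            state ^= 1
        elif ch != "0":
            raise ValueError(f"not a bit: {ch!r}")
    return state


def parity_tree(bits: str) -> int:
    """Combine neighbouring results pairwise until one value remains."""
    level: List[int] = []
    for ch in bits:
        if ch not in "01":
            raise ValueError(f"not a bit: {ch!r}")
        level.append(int(ch))
    if not level:
        return 0
    while len(level) > 1:
        # XOR each adjacent pair; an unpaired last element is carried over.
        merged = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])
        level = merged
    return level[0]


if __name__ == "__main__":
    for sample in ["", "0", "1101", "10101"]:
        print(sample, parity_sequential(sample), parity_tree(sample))
```

The sequential version needs one step per input symbol, while the pairwise version halves the number of partial results in every round, which is why depth rather than sequence length becomes the limiting resource.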

arXiv:2304.14356 [pdf, other]

S$^2$MAT: Simultaneous and Self-Reinforced Mapping and Tracking in Dynamic Urban Scenarios

Authors: Tingxiang Fan, Bowen Shen, Yinqiang Zhang, Chuye Zhang, Lei Yang, Hua Chen, Wei Zhang, Jia Pan

Abstract: Despite the increasing prevalence of robots in daily life, their navigation capabilities are still limited to environments with prior knowledge, such as a global map. To fully unlock the potential of robots, it is crucial to enable them to navigate in large-scale unknown and changing unstructured scenarios. This requires the robot to construct an accurate static map in real-time as it explores, while filtering out moving objects to ensure mapping accuracy and, if possible, achieving high-quality pedestrian tracking and collision avoidance. While existing methods can achieve individual goals of spatial mapping or dynamic object detection and tracking, there has been limited research on effectively integrating these two tasks, which are actually coupled and reciprocal. In this work, we propose a solution called S$^2$MAT (Simultaneous and Self-Reinforced Mapping and Tracking) that integrates a front-end dynamic object detection and tracking module with a back-end static mapping module. S$^2$MAT leverages the close and reciprocal interplay between these two modules to efficiently and effectively solve the open problem of simultaneous tracking and mapping in highly dynamic scenarios. We conducted extensive experiments using widely-used datasets and simulations, providing both qualitative and quantitative results to demonstrate S$^2$MAT's state-of-the-art performance in dynamic object detection, tracking, and high-quality static structure mapping. Additionally, we performed long-range robotic navigation in real-world urban scenarios spanning over 7 km, which included challenging obstacles like pedestrians and other traffic agents. The successful navigation provides a comprehensive test of S$^2$MAT's robustness, scalability, efficiency, quality, and its ability to benefit autonomous robots in wild scenarios without pre-built maps.

Submitted 20 November, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: homepage: https://sites.google.com/view/smat-nav

arXiv:2303.13277 [pdf, other]

SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field

Authors: Chong Bao, Yinda Zhang, Bangbang Yang, Tianxing Fan, Zesong Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui

Abstract: Despite the great success in 2D editing using user-friendly tools, such as Photoshop, semantic strokes, or even text prompts, similar capabilities in 3D areas are still limited, either relying on 3D modeling skills or allowing editing within only a few categories. In this paper, we present a novel semantic-driven NeRF editing approach, which enables users to edit a neural radiance field with a single image, and faithfully delivers edited novel views with high fidelity and multi-view consistency. To achieve this goal, we propose a prior-guided editing field to encode fine-grained geometric and texture editing in 3D space, and develop a series of techniques to aid the editing process, including cyclic constraints with a proxy mesh to facilitate geometric supervision, a color compositing mechanism to stabilize semantic-driven texture editing, and a feature-cluster-based regularization to preserve the irrelevant content unchanged. Extensive experiments and editing examples on both real-world and synthetic data demonstrate that our method achieves photo-realistic 3D editing using only a single edited image, pushing the bound of semantic-driven editing in 3D real-world scenes. Our project webpage: https://zju3dv.github.io/sine/.

Submitted 25 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted to CVPR 2023. Project Page: https://zju3dv.github.io/sine/

arXiv:2212.11589 [pdf, other]

Simulation-based Testing of Simulink Models with Test Sequence and Test Assessment Blocks

Authors: Federico Formica, Tony Fan, Akshay Rajhans, Vera Pantelic, Mark Lawford, Claudio Menghi

Abstract: Simulation-based software testing supports engineers in finding faults in Simulink models. It typically relies on search algorithms that iteratively generate test inputs used to exercise models in simulation to detect design errors. While simulation-based software testing techniques are effective in many practical scenarios, they are typically not fully integrated within the Simulink environment and require additional manual effort. Many techniques require engineers to specify requirements using logical languages that are neither intuitive nor fully supported by Simulink, thereby limiting their adoption in industry. This work presents HECATE, a testing approach for Simulink models using Test Sequence and Test Assessment blocks from Simulink Test. Unlike existing testing techniques, HECATE uses information from Simulink models to guide the search-based exploration. Specifically, HECATE relies on information provided by the Test Sequence and Test Assessment blocks to guide the search procedure. Across a benchmark of 16 Simulink models from different domains and industries, our comparison of HECATE with the state-of-the-art testing tool S-TALIRO indicates that HECATE is both more effective (more failure-revealing test cases) and more efficient (fewer iterations and less computational time) than S-TALIRO for ~94% and ~81% of benchmark models respectively. Furthermore, HECATE successfully generated a failure-revealing test case for a representative case study from the automotive domain, demonstrating its practical usefulness.

Submitted 22 December, 2022; originally announced December 2022.

arXiv:2212.10356 [pdf, other]

Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis

Authors: Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge

Abstract: Length extrapolation permits training a transformer language model on short sequences that preserves perplexities when tested on substantially longer sequences. A relative positional embedding design, ALiBi, has had the widest usage to date. We dissect ALiBi via the lens of receptive field analysis empowered by a novel cumulative normalized gradient tool. The concept of receptive field further allows us to modify the vanilla Sinusoidal positional embedding to create Sandwich, the first parameter-free relative positional embedding design that truly uses length information longer than the training sequence. Sandwich shares with KERPLE and T5 the same logarithmic decaying temporal bias pattern with learnable relative positional embeddings; these elucidate future extrapolatable positional embedding design.

Submitted 23 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted by ACL 2023

arXiv:2210.02694 [pdf, other]

doi 10.1002/nme.7207

Probabilistic partition of unity networks for high-dimensional regression problems

Authors: Tiffany Fan, Nathaniel Trask, Marta D'Elia, Eric Darve

Abstract: We explore the probabilistic partition of unity network (PPOU-Net) model in the context of high-dimensional regression problems and propose a general framework focusing on adaptive dimensionality reduction. With the proposed framework, the target function is approximated by a mixture of experts model on a low-dimensional manifold, where each cluster is associated with a local fixed-degree polynomial. We present a training strategy that leverages the expectation maximization (EM) algorithm. During the training, we alternate between (i) applying gradient descent to update the DNN coefficients; and (ii) using closed-form formulae derived from the EM algorithm to update the mixture of experts model parameters. Under the probabilistic formulation, step (ii) admits the form of embarrassingly parallelizable weighted least-squares solves. The PPOU-Nets consistently outperform the baseline fully-connected neural networks of comparable sizes in numerical experiments of various data dimensions. We also explore the proposed model in applications of quantum computing, where the PPOU-Nets act as surrogate models for cost landscapes associated with variational quantum circuits.

Submitted 11 June, 2023; v1 submitted 6 October, 2022; originally announced October 2022.

arXiv:2210.02099 [pdf, other]

Automated Graph Self-supervised Learning via Multi-teacher Knowledge Distillation

Authors: Lirong Wu, Yufei Huang, Haitao Lin, Zicheng Liu, Tianyu Fan, Stan Z. Li

Abstract: Self-supervised learning on graphs has recently achieved remarkable success in graph representation learning. With hundreds of self-supervised pretext tasks proposed over the past few years, the research community has greatly developed, and the key is no longer to design more powerful but complex pretext tasks, but to make more effective use of those already on hand. This paper studies the problem of how to automatically, adaptively, and dynamically learn instance-level self-supervised learning strategies for each node from a given pool of pretext tasks. In this paper, we propose a novel multi-teacher knowledge distillation framework for Automated Graph Self-Supervised Learning (AGSSL), which consists of two main branches: (i) Knowledge Extraction: training multiple teachers with different pretext tasks, so as to extract different levels of knowledge with different inductive biases; (ii) Knowledge Integration: integrating different levels of knowledge and distilling them into the student model. Without simply treating different teachers as equally important, we provide a provable theoretical guideline for how to integrate the knowledge of different teachers, i.e., the integrated teacher probability should be close to the true Bayesian class-probability. To approach the theoretical optimum in practice, two adaptive knowledge integration strategies are proposed to construct a relatively "good" integrated teacher. Extensive experiments on eight datasets show that AGSSL can benefit from multiple pretext tasks, outperforming the corresponding individual tasks; by combining a few simple but classical pretext tasks, the resulting performance is comparable to other leading counterparts.

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2209.12512 [pdf, other]

Multiscale Latent-Guided Entropy Model for LiDAR Point Cloud Compression

Authors: Tingyu Fan, Linyao Gao, Yiling Xu, Dong Wang, Zhu Li

Abstract: The non-uniform distribution and extremely sparse nature of the LiDAR point cloud (LPC) bring significant challenges to its high-efficient compression. This paper proposes a novel end-to-end, fully-factorized deep framework that encodes the original LPC into an octree structure and hierarchically decomposes the octree entropy model in layers. The proposed framework utilizes a hierarchical latent variable as side information to encapsulate the sibling and ancestor dependence, which provides sufficient context information for the modelling of point cloud distribution while enabling the parallel encoding and decoding of octree nodes in the same layer. Besides, we propose a residual coding framework for the compression of the latent variable, which explores the spatial correlation of each layer by progressive downsampling, and model the corresponding residual with a fully-factorized entropy model. Furthermore, we propose soft addition and subtraction for residual coding to improve network flexibility. The comprehensive experiment results on the LiDAR benchmark SemanticKITTI and MPEG-specified dataset Ford demonstrate that our proposed framework achieves state-of-the-art performance among all the previous LPC frameworks. Besides, our end-to-end, fully-factorized framework is shown by experiment to be highly parallelized and time-efficient, and saves more than 99.8% of decoding time compared to previous state-of-the-art methods on LPC compression.

Submitted 14 February, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

arXiv:2207.13979

Knowing Where and What: Unified Word Block Pretraining for Document Understanding

Authors: Song Tao, Zijian Wang, Tiantian Fan, Canjie Luo, Can Huang

Abstract: Due to the complex layouts of documents, it is challenging to extract information from documents. Most previous studies develop multimodal pre-trained models in a self-supervised way. In this paper, we focus on the embedding learning of word blocks containing text and layout information, and propose UTel, a language model with Unified TExt and Layout pre-training. Specifically, we propose two pre-training tasks: Surrounding Word Prediction (SWP) for the layout learning, and Contrastive learning of Word Embeddings (CWE) for identifying different word blocks. Moreover, we replace the commonly used 1D position embedding with a 1D clipped relative position embedding. In this way, the joint training of Masked Layout-Language Modeling (MLLM) and two newly proposed tasks enables the interaction between semantic and spatial features in a unified way. Additionally, the proposed UTel can process arbitrary-length sequences by removing the 1D position embedding, while maintaining competitive performance. Extensive experimental results show UTel learns better joint representations and achieves better performance than previous methods on various downstream tasks, though requiring no image modality. Code is available at https://github.com/taosong2019/UTel.

Submitted 29 July, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

Comments: incomplete experiments

arXiv:2207.11467 [pdf, other]

CompNVS: Novel View Synthesis with Scene Completion

Authors: Zuoyue Li, Tianxing Fan, Zhenqiang Li, Zhaopeng Cui, Yoichi Sato, Marc Pollefeys, Martin R. Oswald

Abstract: We introduce a scalable framework for novel view synthesis from RGB-D images with largely incomplete scene coverage. While generative neural approaches have demonstrated spectacular results on 2D images, they have not yet achieved similar photorealistic results in combination with scene completion where a spatial 3D scene understanding is essential. To this end, we propose a generative pipeline performing on a sparse grid-based neural scene representation to complete unobserved scene parts via a learned distribution of scenes in a 2.5D-3D-2.5D manner. We process encoded image features in 3D space with a geometry completion network and a subsequent texture inpainting network to extrapolate the missing area. Photorealistic image sequences can be finally obtained via consistency-relevant differentiable rendering. Comprehensive experiments show that the graphical outputs of our method outperform the state of the art, especially within unobserved scene parts.

Submitted 23 July, 2022; originally announced July 2022.

Comments: ECCV 2022

arXiv:2207.11456 [pdf, other]

doi 10.1109/TBDATA.2022.3192898

Accelerating Vertical Federated Learning

Authors: Dongqi Cai, Tao Fan, Yan Kang, Lixin Fan, Mengwei Xu, Shangguang Wang, Qiang Yang

Abstract: Privacy, security and data governance constraints rule out a brute force process in the integration of cross-silo data, which inherits the development of the Internet of Things. Federated learning is proposed to ensure that all parties can collaboratively complete the training task while the data stays local. Vertical federated learning is a specialization of federated learning for distributed features. To preserve privacy, homomorphic encryption is applied to enable encrypted operations without decryption. Nevertheless, together with a robust security guarantee, homomorphic encryption brings extra communication and computation overhead. In this paper, we analyze the current bottlenecks of vertical federated learning under homomorphic encryption comprehensively and numerically. We propose a straggler-resilient and computation-efficient accelerating system that reduces the communication overhead in heterogeneous scenarios by 65.26% at most and reduces the computation overhead caused by homomorphic encryption by 40.66% at most. Our system can improve the robustness and efficiency of the current vertical federated learning framework without loss of security.

Submitted 21 January, 2024; v1 submitted 23 July, 2022; originally announced July 2022.
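
The abstract above relies on homomorphic encryption to compute on ciphertexts without decrypting them. It does not name a scheme, so the sketch below assumes Paillier, a textbook additively homomorphic cryptosystem, purely as an illustration. The tiny hard-coded primes and all function names are assumptions, and the parameters are far too small to be secure.

```python
import math
import secrets
from typing import Tuple

PublicKey = int               # the modulus n
PrivateKey = Tuple[int, int]  # (lambda, mu)


def keygen(p: int, q: int) -> Tuple[PublicKey, PrivateKey]:
    """Build a Paillier key pair from two primes, using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+
    return n, (lam, mu)


def encrypt(n: PublicKey, message: int) -> int:
    """Encrypt an integer in [0, n) with fresh randomness."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n)
        if r > 0 and math.gcd(r, n) == 1:
            break
    return (pow(1 + n, message, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(n: PublicKey, key: PrivateKey, cipher: int) -> int:
    """Recover the plaintext from a ciphertext."""
    lam, mu = key
    u = pow(cipher, lam, n * n)
    return ((u - 1) // n * mu) % n


def add_encrypted(n: PublicKey, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n: PublicKey, cipher: int, factor: int) -> int:
    """Raising a ciphertext to a power multiplies the plaintext by it (mod n)."""
    return pow(cipher, factor, n * n)


if __name__ == "__main__":
    n, private = keygen(1009, 1013)  # toy primes, insecure by design
    total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
    print(decrypt(n, private, total))
```

Every ciphertext in this sketch is an integer modulo n squared, and each operation is a modular exponentiation or multiplication on such integers, which gives a feel for the communication and computation overhead the abstract refers to.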

arXiv:2207.11016 [pdf, other]

Search-based Software Testing Driven by Automatically Generated and Manually Defined Fitness Functions

Authors: Federico Formica, Tony Fan, Claudio Menghi

Abstract: Search-based software testing (SBST) typically relies on fitness functions to guide the search exploration toward software failures. There are two main techniques to define fitness functions: (a) automated fitness function computation from the specification of the system requirements, and (b) manual fitness function design. Both techniques have advantages. The former uses information from the system requirements to guide the search toward portions of the input domain more likely to contain failures. The latter uses the engineers' domain knowledge. We propose ATheNA, a novel SBST framework that combines fitness functions automatically generated from requirements specifications and those manually defined by engineers. We design and implement ATheNA-S, an instance of ATheNA that targets Simulink models. We evaluate ATheNA-S by considering a large set of models from different domains. Our results show that ATheNA-S generates more failure-revealing test cases than existing baseline tools and that the difference between the runtime performance of ATheNA-S and the baseline tools is not statistically significant. We also assess whether ATheNA-S could generate failure-revealing test cases when applied to two representative case studies: one from the automotive domain and one from the medical domain. Our results show that ATheNA-S successfully revealed a requirement violation in our case studies.

Submitted 7 September, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

arXiv:2207.09442 [pdf, other]

Theseus: A Library for Differentiable Nonlinear Optimization

Authors: Luis Pineda, Taosha Fan, Maurizio Monge, Shobha Venkataraman, Paloma Sodhi, Ricky T. Q. Chen, Joseph Ortiz, Daniel DeTone, Austin Wang, Stuart Anderson, Jing Dong, Brandon Amos, Mustafa Mukadam

Abstract: We present Theseus, an efficient application-agnostic open source library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch, providing a common framework for end-to-end structured learning in robotics and vision. Existing DNLS implementations are application specific and do not always incorporate many ingredients important for efficiency. Theseus is application-agnostic, as we illustrate with several example applications that are built using the same underlying differentiable components, such as second-order optimizers, standard costs functions, and Lie groups. For efficiency, Theseus incorporates support for sparse solvers, automatic vectorization, batching, GPU acceleration, and gradient computation with implicit differentiation and direct loss minimization. We do extensive performance evaluation in a set of applications, demonstrating significant efficiency gains and better scalability when these features are incorporated. Project page: https://sites.google.com/view/theseus-ai

Submitted 18 January, 2023; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: Advances in Neural Information Processing Systems (NeurIPS), 2022

arXiv:2206.15102 [pdf, other]

DynamicFilter: an Online Dynamic Objects Removal Framework for Highly Dynamic Environments

Authors: Tingxiang Fan, Bowen Shen, Hua Chen, Wei Zhang, Jia Pan

Abstract: Emergence of massive dynamic objects will diversify spatial structures when robots navigate in urban environments. Therefore, the online removal of dynamic objects is critical. In this paper, we introduce a novel online removal framework for highly dynamic urban environments. The framework consists of the scan-to-map front-end and the map-to-map back-end modules. Both the front- and back-ends deeply integrate the visibility-based approach and map-based approach. The experiments validate the framework in highly dynamic simulation scenarios and real-world datasets.

Submitted 30 June, 2022; originally announced June 2022.

Comments: ICRA 2022

arXiv:2206.07235 [pdf, other]

Training Discrete Deep Generative Models via Gapped Straight-Through Estimator

Authors: Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky, Peter J. Ramadge

Abstract: While deep generative models have succeeded in image processing, natural language processing, and reinforcement learning, training that involves discrete random variables remains challenging due to the high variance of its gradient estimation process. Monte Carlo is a common solution used in most variance reduction approaches. However, this involves time-consuming resampling and multiple function evaluations. We propose a Gapped Straight-Through (GST) estimator to reduce the variance without incurring resampling overhead. This estimator is inspired by the essential properties of Straight-Through Gumbel-Softmax. We determine these properties and show via an ablation study that they are essential. Experiments demonstrate that the proposed GST estimator enjoys better performance compared to strong baselines on two discrete deep generative modeling tasks, MNIST-VAE and ListOps.

Submitted 14 June, 2022; originally announced June 2022.

Comments: Accepted at the International Conference on Machine Learning (ICML) 2022. The first two authors contributed equally

arXiv:2205.09921 [pdf, other]

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

Authors: Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky

Abstract: Relative positional embeddings (RPE) have received considerable attention since RPEs effectively model the relative distance among tokens and enable length extrapolation. We propose KERPLE, a framework that generalizes relative position embedding for extrapolation by kernelizing positional differences. We achieve this goal using conditionally positive definite (CPD) kernels, a class of functions known for generalizing distance metrics. To maintain the inner product interpretation of self-attention, we show that a CPD kernel can be transformed into a PD kernel by adding a constant offset. This offset is implicitly absorbed in the Softmax normalization during self-attention. The diversity of CPD kernels allows us to derive various RPEs that enable length extrapolation in a principled way. Experiments demonstrate that the logarithmic variant achieves excellent extrapolation performance on three large language modeling datasets. Our implementation and pretrained checkpoints are released at https://github.com/chijames/KERPLE.git.

Submitted 13 October, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Comments: Accepted at the 36th Conference on Neural Information Processing Systems (NeurIPS 2022). The first two authors contributed equally to this work
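
The KERPLE abstract above relies on the fact that a constant offset added to every attention logit is absorbed by the Softmax normalization. The following is a minimal sketch of that property only, not the authors' implementation; the logit values and the offset are made-up illustrative numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical attention logits for one query over four keys.
logits = [0.5, -1.2, 2.0, 0.1]

# Arbitrary constant added to every logit (stands in for the kernel offset).
offset = 3.7
shifted = [x + offset for x in logits]

weights = softmax(logits)
weights_shifted = softmax(shifted)

# The two weight vectors agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(weights, weights_shifted))
```

Because the offset cancels in the ratio of exponentials, the attention weights are unchanged, which is why the offset never has to be computed explicitly.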

arXiv:2205.01135 [pdf, other]

D-DPCC: Deep Dynamic Point Cloud Compression via 3D Motion Prediction

Authors: Tingyu Fan, Linyao Gao, Yiling Xu, Zhu Li, Dong Wang

Abstract: The non-uniformly distributed nature of the 3D dynamic point cloud (DPC) brings significant challenges to its efficient inter-frame compression. This paper proposes a novel 3D sparse convolution-based Deep Dynamic Point Cloud Compression (D-DPCC) network to compensate and compress the DPC geometry with 3D motion estimation and motion compensation in the feature space. In the proposed D-DPCC network, we design a Multi-scale Motion Fusion (MMF) module to accurately estimate the 3D optical flow between the feature representations of adjacent point cloud frames. Specifically, we utilize a 3D sparse convolution-based encoder to obtain the latent representation for motion estimation in the feature space and introduce the proposed MMF module for fused 3D motion embedding. Besides, for motion compensation, we propose a 3D Adaptively Weighted Interpolation (3DAWI) algorithm with a penalty coefficient to adaptively decrease the impact of distant neighbors. We compress the motion embedding and the residual with a lossy autoencoder-based network. To our knowledge, this paper is the first work proposing an end-to-end deep dynamic point cloud compression framework. The experimental results show that the proposed D-DPCC framework achieves an average 76% BD-Rate (Bjontegaard Delta Rate) gain against state-of-the-art Video-based Point Cloud Compression (V-PCC) v13 in inter mode.

Submitted 2 May, 2022; originally announced May 2022.

arXiv:2202.03846 [pdf]

The Soft Compiler: A Web-Based Tool for the Design of Modular Pneumatic Circuits for Soft Robots

Authors: Lauryn Whiteside, Savita V. Kendre, Tian Y. Fan, Jovanna A. Tracz, Gus T. Teran, Thomas C. Underwood, Mohammed E. Sayed, Haihui J. Jiang, Adam A. Stokes, Daniel J. Preston, George M. Whitesides, Markus P. Nemitz

Abstract: Developing soft circuits from individual soft logic gates poses a unique challenge: with increasing numbers of logic gates, the design and implementation of circuits leads to inefficiencies due to mathematically unoptimized circuits and wiring mistakes during assembly. It is therefore practically important to introduce design tools that support the development of soft circuits. We developed a web-based graphical user interface, the Soft Compiler, that accepts a user-defined robot behavior as a truth table to generate a mathematically optimized circuit diagram that guides the assembly of a soft fluidic circuit. We describe the design and experimental verification of three soft circuits of increasing complexity, using the Soft Compiler as a design tool and a novel pneumatic glove as an input interface. In one example, we reduce the size of a soft circuit from the original 11 logic gates to 4 logic gates while maintaining circuit functionality. The Soft Compiler is a web-based design tool for fluidic, soft circuits and is published under the open-source MIT License.

Submitted 8 February, 2022; originally announced February 2022.

Comments: Accepted manuscript (journal): Robotics and Automation Letters, 2022

arXiv:2111.15113 [pdf, other]

LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies

Authors: Sandro Lombardi, Bangbang Yang, Tianxing Fan, Hujun Bao, Guofeng Zhang, Marc Pollefeys, Zhaopeng Cui

Abstract: 3D representation and reconstruction of human bodies have been studied for a long time in computer vision. Traditional methods rely mostly on parametric statistical linear models, limiting the space of possible bodies to linear combinations. It is only recently that some approaches try to leverage neural implicit representations for human body modeling, and while demonstrating impressive results, they are either limited by representation capability or not physically meaningful and controllable. In this work, we propose a novel neural implicit representation for the human body, which is fully differentiable and optimizable with disentangled shape and pose latent spaces. Contrary to prior work, our representation is designed based on the kinematic model, which makes the representation controllable for tasks like pose animation, while simultaneously allowing the optimization of shape and pose for tasks like 3D fitting and pose tracking. Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses. Experiments demonstrate the improved 3D reconstruction performance over SoTA approaches and show the applicability of our method to shape interpolation, model fitting, pose tracking, and motion retargeting.

Submitted 29 November, 2021; originally announced November 2021.

Comments: Accepted to 3DV 2021. Project Page: https://latenthuman.github.io/

arXiv:2110.12352 [pdf, other]

DiffSRL: Learning Dynamical State Representation for Deformable Object Manipulation with Differentiable Simulator

Authors: Sirui Chen, Yunhao Liu, Jialong Li, Shang Wen Yao, Tingxiang Fan, Jia Pan

Abstract: Dynamic state representation learning is an important task in robot learning. Latent space that can capture dynamics related information has wide application in areas such as accelerating model free reinforcement learning, closing the simulation to reality gap, as well as reducing the motion planning complexity. However, current dynamic state representation learning methods scale poorly on complex dynamic systems such as deformable objects, and cannot directly embed well defined simulation function into the training pipeline. We propose DiffSRL, a dynamic state representation learning pipeline utilizing differentiable simulation that can embed complex dynamics models as part of the end-to-end training. We also integrate differentiable dynamic constraints as part of the pipeline which provide incentives for the latent state to be aware of dynamical constraints. We further establish a state representation learning benchmark on a soft-body simulation system, PlasticineLab, and our model demonstrates superior performance in terms of capturing long-term dynamics as well as reward prediction.

Submitted 25 July, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

Comments: 8 pages, 9 figures

Journal ref: IEEE Robotics and Automation Letters, 2022

arXiv:2110.10927 [pdf, other]

SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree

Authors: Tao Fan, Weijing Chen, Guoqiang Ma, Yan Kang, Lixin Fan, Qiang Yang

Abstract: Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm, which is widely used in industry, due to its good performance and easy interpretation. Due to the problem of data isolation and the requirement of privacy, many works try to use vertical federated learning to train machine learning models collaboratively with privacy guarantees between different data owners. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, in order to achieve privacy preservation, SecureBoost involves complex training procedures and time-consuming cryptography operations. This causes SecureBoost to be slow to train and does not scale to large scale data. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. SecureBoost+ is secure in the semi-honest model, which is the same as SecureBoost. SecureBoost+ can be scaled up to tens of millions of data samples easily. SecureBoost+ achieves high performance through several novel optimizations for SecureBoost, including ciphertext operation optimization, the introduction of new training mechanisms, and multi-classification training optimization. The experimental results show that SecureBoost+ is 6-35x faster than SecureBoost, but with the same accuracy and can be scaled up to tens of millions of data samples and thousands of feature dimensions.

Submitted 18 June, 2024; v1 submitted 21 October, 2021; originally announced October 2021.

arXiv:2110.02421 [pdf, other]

Explaining Off-Policy Actor-Critic From A Bias-Variance Perspective

Authors: Ting-Han Fan, Peter J. Ramadge

Abstract: Off-policy Actor-Critic algorithms have demonstrated phenomenal experimental performance but still require better explanations. To this end, we show its policy evaluation error on the distribution of transitions decomposes into: a Bellman error, a bias from policy mismatch, and a variance term from sampling. By comparing the magnitude of bias and variance, we explain the success of the Emphasizing Recent Experience sampling and 1/age weighted sampling. Both sampling strategies yield smaller bias and variance and are hence preferable to uniform sampling.

Submitted 5 October, 2021; originally announced October 2021.
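
The abstract above mentions 1/age weighted sampling from a replay buffer. A minimal sketch of one way such a sampler could look is given below. The abstract does not define "age", so this sketch assumes the newest transition has age 1 and the oldest has age equal to the buffer size; the function names are hypothetical and not taken from the paper.

```python
import random


def inverse_age_weights(buffer_size):
    """Sampling probabilities proportional to 1/age.

    Assumption: index 0 is the oldest transition and the newest
    transition (last index) has age 1.
    """
    ages = [buffer_size - i for i in range(buffer_size)]
    raw = [1.0 / age for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


def sample_indices(buffer_size, batch_size, rng):
    """Draw a batch of buffer indices (with replacement) using 1/age weights."""
    probs = inverse_age_weights(buffer_size)
    return rng.choices(range(buffer_size), weights=probs, k=batch_size)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
probs = inverse_age_weights(5)
batch = sample_indices(5, 3, rng)
```

With five stored transitions the newest one is drawn with probability 60/137, about 0.44, compared with 0.2 under uniform sampling, so recent experience is emphasized without discarding old transitions.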

arXiv:2109.08512 [pdf, other]

Soft Actor-Critic With Integer Actions

Authors: Ting-Han Fan, Yubo Wang

Abstract: Reinforcement learning is well-studied under discrete actions. Integer actions setting is popular in the industry yet still challenging due to its high dimensionality. To this end, we study reinforcement learning under integer actions by incorporating the Soft Actor-Critic (SAC) algorithm with an integer reparameterization. Our key observation for integer actions is that their discrete structure can be simplified using their comparability property. Hence, the proposed integer reparameterization does not need one-hot encoding and is of low dimensionality. Experiments show that the proposed SAC under integer actions is as good as the continuous action version on robot control tasks and outperforms Proximal Policy Optimization on power distribution systems control tasks.

Submitted 14 March, 2022; v1 submitted 17 September, 2021; originally announced September 2021.

Comments: The 2022 American Control Conference (ACC)

arXiv:2109.03970 [pdf, other]

PowerGym: A Reinforcement Learning Environment for Volt-Var Control in Power Distribution Systems

Authors: Ting-Han Fan, Xian Yeow Lee, Yubo Wang

Abstract: We introduce PowerGym, an open-source reinforcement learning environment for Volt-Var control in power distribution systems. Following OpenAI Gym APIs, PowerGym targets minimizing power loss and voltage violations under physical networked constraints. PowerGym provides four distribution systems (13Bus, 34Bus, 123Bus, and 8500Node) based on IEEE benchmark systems and design variants for various control difficulties. To foster generalization, PowerGym offers a detailed customization guide for users working with their distribution systems. As a demonstration, we examine state-of-the-art reinforcement learning algorithms in PowerGym and validate the environment by studying controller behaviors. The repository is available at https://github.com/siemens/powergym.

Submitted 14 March, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: The 4th Annual Learning for Dynamics & Control Conference (L4DC) 2022

arXiv:2108.00083 [pdf, other]

Majorization Minimization Methods for Distributed Pose Graph Optimization

Authors: Taosha Fan, Todd Murphey

Abstract: We consider the problem of distributed pose graph optimization (PGO) that has important applications in multi-robot simultaneous localization and mapping (SLAM). We propose the majorization minimization (MM) method for distributed PGO ($\mathsf{MM-PGO}$) that applies to a broad class of robust loss kernels. The $\mathsf{MM-PGO}$ method is guaranteed to converge to first-order critical points under mild conditions. Furthermore, noting that the $\mathsf{MM-PGO}$ method is reminiscent of proximal methods, we leverage Nesterov's method and adopt adaptive restarts to accelerate convergence. The resulting accelerated MM methods for distributed PGO -- both with a master node in the network ($\mathsf{AMM-PGO}^*$) and without ($\mathsf{AMM-PGO}^{\#}$) -- have faster convergence in contrast to the $\mathsf{AMM-PGO}$ method without sacrificing theoretical guarantees. In particular, the $\mathsf{AMM-PGO}^{\#}$ method, which needs no master node and is fully decentralized, features a novel adaptive restart scheme and has a rate of convergence comparable to that of the $\mathsf{AMM-PGO}^*$ method using a master node to aggregate information from all the other nodes. The efficacy of this work is validated through extensive applications to 2D and 3D SLAM benchmark datasets and comprehensive comparisons against existing state-of-the-art methods, indicating that our MM methods converge faster and result in better solutions to distributed PGO.

Submitted 23 January, 2023; v1 submitted 30 July, 2021; originally announced August 2021.

Comments: 33 pages

arXiv:2105.13965 [pdf, other]

Revitalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation

Authors: Taosha Fan, Kalyan Vasudev Alwala, Donglai Xiang, Weipeng Xu, Todd Murphey, Mustafa Mukadam

Abstract: We propose a novel sparse constrained formulation and from it derive a real-time optimization method for 3D human pose and shape estimation. Our optimization method, SCOPE (Sparse Constrained Optimization for 3D human Pose and shapE estimation), is orders of magnitude faster (avg. 4 ms convergence) than existing optimization methods, while being mathematically equivalent to their dense unconstrained formulation under mild assumptions. We achieve this by exploiting the underlying sparsity and constraints of our formulation to efficiently compute the Gauss-Newton direction. We show that this computation scales linearly with the number of joints and measurements of a complex 3D human model, in contrast to prior work where it scales cubically due to their dense unconstrained formulation. Based on our optimization method, we present a real-time motion capture framework that estimates 3D human poses and shapes from a single image at over 30 FPS. In benchmarks against state-of-the-art methods on multiple public datasets, our framework outperforms other optimization methods and achieves competitive accuracy against regression methods. Project page with code and videos: https://sites.google.com/view/scope-human/.

Submitted 4 October, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: 21 pages, including appendix

arXiv:2104.11674 [pdf, other]

Genetic Constrained Graph Variational Autoencoder for COVID-19 Drug Discovery

Authors: Tianyue Cheng, Tianchi Fan, Landi Wang

Abstract: In the past several months, COVID-19 has spread over the globe and caused severe damage to the people and the society. In the context of this severe situation, an effective drug discovery method to generate potential drugs is extremely meaningful. In this paper, we provide a methodology of discovering potential drugs for the treatment of Severe Acute Respiratory Syndrome Corona-Virus 2 (commonly known as SARS-CoV-2). We propose a new model called Genetic Constrained Graph Variational Autoencoder (GCGVAE) to solve this problem. We trained our model based on the data of various viruses' protein structure, including that of the SARS, HIV, Hep3, and MERS, and used it to generate possible drugs for SARS-CoV-2. Several optimization algorithms, including valency masking and genetic algorithm, are deployed to fine tune our model. According to the simulation, our generated molecules have great effectiveness in inhibiting SARS-CoV-2. We quantitatively calculated the scores of our generated molecules and compared them with the scores of existing drugs, and the result shows our generated molecules score much better than those existing drugs. Moreover, our model can be also applied to generate effective drugs for treating other viruses given their protein structure, which could be used to generate drugs for future viruses.

Submitted 23 April, 2021; originally announced April 2021.

arXiv:2103.03505 [pdf]

Prediction of financial time series using LSTM and data denoising methods

Authors: Qi Tang, Tongmei Fan, Ruchen Shi, Jingyan Huang, Yidan Ma

Abstract: In order to further overcome the difficulties of the existing models in dealing with the non-stationary and nonlinear characteristics of high-frequency financial time series data, especially its weak generalization ability, this paper proposes an ensemble method based on data denoising methods, including the wavelet transform (WT) and singular spectrum analysis (SSA), and a long short-term memory neural network (LSTM) to build a data prediction model. The financial time series is decomposed and reconstructed by WT and SSA to denoise. Under the condition of denoising, the smooth sequence with effective information is reconstructed. The smoothed sequence is fed into the LSTM and the predicted value is obtained. With the Dow Jones industrial average index (DJIA) as the research object, the closing price of the DJIA every five minutes is divided into short-term (1 hour), medium-term (3 hours) and long-term (6 hours) respectively. Based on root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE) and absolute percentage error standard deviation (SDAPE), the experimental results show that in the short-term, medium-term and long-term, data denoising can greatly improve the accuracy and stability of the prediction, and can effectively improve the generalization ability of LSTM prediction model. As WT and SSA can extract useful information from the original sequence and avoid overfitting, the hybrid model can better grasp the sequence pattern of the closing price of the DJIA. The WT-LSTM model also outperforms the benchmark LSTM model and the SSA-LSTM model.

Submitted 5 March, 2021; originally announced March 2021.
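
The abstract above evaluates forecasts with RMSE, MAE, MAPE and SDAPE. A minimal sketch of how these four metrics could be computed follows. It assumes that no actual value is zero and that SDAPE is the population standard deviation of the absolute percentage errors; the abstract does not say whether the population or sample form is used, and the function name and sample numbers are illustrative only.

```python
import math


def forecast_errors(actual, predicted):
    """Return RMSE, MAE, MAPE (%) and SDAPE (%) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    # Absolute percentage errors; assumes every actual value is non-zero.
    ape = [abs(e / a) * 100.0 for e, a in zip(errors, actual)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = sum(ape) / n
    # Population standard deviation of the percentage errors (assumption).
    sdape = math.sqrt(sum((x - mape) ** 2 for x in ape) / n)
    return {"rmse": rmse, "mae": mae, "mape": mape, "sdape": sdape}


# Illustrative closing prices, not data from the paper.
metrics = forecast_errors([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

RMSE and MAE are in price units, while MAPE and SDAPE are scale-free percentages, which makes the latter two easier to compare across the short-, medium- and long-term horizons.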

arXiv:2012.10099 [pdf, other]

Crowd-Driven Mapping, Localization and Planning

Authors: Tingxiang Fan, Dawei Wang, Wenxi Liu, Jia Pan

Abstract: Navigation in dense crowds is a well-known open problem in robotics with many challenges in mapping, localization, and planning. Traditional solutions consider dense pedestrians as passive/active moving obstacles that are the cause of all troubles: they negatively affect the sensing of static scene landmarks and must be actively avoided for safety. In this paper, we provide a new perspective: the crowd flow locally observed can be treated as a sensory measurement about the surrounding scenario, encoding not only the scene's traversability but also its social navigation preference. We demonstrate that even using the crowd-flow measurement alone without any sensing about static obstacles, our method still accomplishes good results for mapping, localization, and social-aware planning in dense crowds. Videos of the experiments are available at https://sites.google.com/view/crowdmapping.

Submitted 3 January, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

Comments: Accepted to ISER 2020

arXiv:2012.02709 [pdf, other]

Generalized Proximal Methods for Pose Graph Optimization

Authors: Taosha Fan, Todd Murphey

Abstract: In this paper, we generalize proximal methods that were originally designed for convex optimization on normed vector space to non-convex pose graph optimization (PGO) on special Euclidean groups, and show that our proposed generalized proximal methods for PGO converge to first-order critical points. Furthermore, we propose methods that significantly accelerate the rates of convergence almost without loss of any theoretical guarantees. In addition, our proposed methods can be easily distributed and parallelized with no compromise of efficiency. The efficacy of this work is validated through implementation on simultaneous localization and mapping (SLAM) and distributed 3D sensor network localization, which indicate that our proposed methods are a lot faster than existing techniques to converge to sufficient accuracy for practical use.

Submitted 4 May, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

Comments: 29 pages

Journal ref: International Symposium on Robotics Research (ISRR), 2019

arXiv:2009.08586 [pdf, ps, other]

A Contraction Approach to Model-based Reinforcement Learning

Authors: Ting-Han Fan, Peter J. Ramadge

Abstract: Despite its experimental success, Model-based Reinforcement Learning still lacks a complete theoretical understanding. To this end, we analyze the error in the cumulative reward using a contraction approach. We consider both stochastic and deterministic state transitions for continuous (non-discrete) state and action spaces. This approach doesn't require strong assumptions and can recover the typical quadratic error to the horizon. We prove that branched rollouts can reduce this error and are essential for deterministic transitions to have a Bellman contraction. Our analysis of policy mismatch error also applies to Imitation Learning. In this case, we show that GAN-type learning has an advantage over Behavioral Cloning when its discriminator is well-trained.

Submitted 25 February, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: The 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

arXiv:2008.08889 [pdf, other]

Autonomous Social Distancing in Urban Environments using a Quadruped Robot

Authors: Tingxiang Fan, Zhiming Chen, Xuan Zhao, Jing Liang, Cong Shen, Dinesh Manocha, Jia Pan, Wei Zhang

Abstract: COVID-19 pandemic has become a global challenge faced by people all over the world. Social distancing has been proved to be an effective practice to reduce the spread of COVID-19. Against this backdrop, we propose that the surveillance robots can not only monitor but also promote social distancing. Robots can be flexibly deployed and they can take precautionary actions to remind people of practicing social distancing. In this paper, we introduce a fully autonomous surveillance robot based on a quadruped platform that can promote social distancing in complex urban environments. Specifically, to achieve autonomy, we mount multiple cameras and a 3D LiDAR on the legged robot. The robot then uses an onboard real-time social distancing detection system to track nearby pedestrian groups. Next, the robot uses a crowd-aware navigation algorithm to move freely in highly dynamic scenarios. The robot finally uses a crowd-aware routing algorithm to effectively promote social distancing by using human-friendly verbal cues to send suggestions to over-crowded pedestrians. We demonstrate and validate that our robot can be operated autonomously by conducting several experiments in various urban scenarios.

Submitted 20 August, 2020; originally announced August 2020.

arXiv:2007.13393 [pdf, other]

Ladybird: Quasi-Monte Carlo Sampling for Deep Implicit Field Based 3D Reconstruction with Symmetry

Authors: Yifan Xu, Tianqi Fan, Yi Yuan, Gurprit Singh

Abstract: Deep implicit field regression methods are effective for 3D reconstruction from single-view images. However, the impact of different sampling patterns on the reconstruction quality is not well-understood. In this work, we first study the effect of point set discrepancy on the network training. Based on Farthest Point Sampling algorithm, we propose a sampling scheme that theoretically encourages better generalization performance, and results in fast convergence for SGD-based optimization algorithms. Secondly, based on the reflective symmetry of an object, we propose a feature fusion method that alleviates issues due to self-occlusions which makes it difficult to utilize local image features. Our proposed system Ladybird is able to create high quality 3D object reconstructions from a single input image. We evaluate Ladybird on a large scale 3D dataset (ShapeNet) demonstrating highly competitive results in terms of Chamfer distance, Earth Mover's distance and Intersection Over Union (IoU).

Submitted 27 July, 2020; originally announced July 2020.

Comments: European Conference on Computer Vision 2020 (ECCV 2020)
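
The Ladybird abstract above builds its sampling scheme on the Farthest Point Sampling algorithm. The following is a minimal sketch of plain greedy Farthest Point Sampling only, not the paper's proposed scheme; the function name, starting index and sample points are assumptions made for illustration.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from the selected set.

    Returns the indices of the k selected points, beginning with `start`.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Ties are broken in favour of the lowest index.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Illustrative 2D points: a near-duplicate pair plus three corners of a square.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
chosen = farthest_point_sampling(points, 3)
```

Starting from the first point, the sketch skips the near-duplicate at (0.1, 0.0) and spreads its picks across the square, which is the low-discrepancy behavior that motivates using this algorithm for sampling.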

Showing 1–50 of 68 results for author: Fan, T