-
Confounding-Robust Policy Improvement with Human-AI Teams
Authors:
Ruijiang Gao,
Mingzhang Yin
Abstract:
Human-AI collaboration has the potential to transform various domains by leveraging the complementary strengths of human experts and Artificial Intelligence (AI) systems. However, unobserved confounding can undermine the effectiveness of this collaboration, leading to biased and unreliable outcomes. In this paper, we propose a novel solution to address unobserved confounding in human-AI collaboration by employing the marginal sensitivity model (MSM). Our approach combines domain expertise with AI-driven statistical modeling to account for potential confounders that may otherwise remain hidden. We present a deferral collaboration framework for incorporating the MSM into policy learning from observational data, enabling the system to control for the influence of unobserved confounding factors. In addition, we propose a personalized deferral collaboration system to leverage the diverse expertise of different human decision-makers. By adjusting for potential biases, our proposed solution enhances the robustness and reliability of collaborative outcomes. The empirical and theoretical analyses demonstrate the efficacy of our approach in mitigating unobserved confounding and improving the overall performance of human-AI collaborations.
Submitted 12 October, 2023;
originally announced October 2023.
-
Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations
Authors:
Zhuoyan Li,
Hangxiao Zhu,
Zhuoran Lu,
Ming Yin
Abstract:
The collection and curation of high-quality training data is crucial for developing text classification models with superior performance, but it is often associated with significant costs and time investment. Researchers have recently explored using large language models (LLMs) to generate synthetic datasets as an alternative approach. However, the effectiveness of the LLM-generated synthetic data in supporting model training is inconsistent across different classification tasks. To better understand factors that moderate the effectiveness of the LLM-generated synthetic data, in this study, we look into how the performance of models trained on these synthetic data may vary with the subjectivity of classification. Our results indicate that subjectivity, at both the task level and instance level, is negatively associated with the performance of the model trained on synthetic data. We conclude by discussing the implications of our work on the potential and limitations of leveraging LLMs for synthetic data generation.
Submitted 12 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
BBCA-CHAIN: Low Latency, High Throughput BFT Consensus on a DAG
Authors:
Dahlia Malkhi,
Chrysoula Stathakopoulou,
Maofan Yin
Abstract:
This paper presents a partially synchronous BFT consensus protocol powered by BBCA, a lightly modified Byzantine Consistent Broadcast (BCB) primitive. BBCA provides a Complete-Adopt semantic through an added probing interface to allow either aborting the broadcast by correct nodes or exclusively, adopting the message consistently in case of a potential delivery. It does not introduce any extra types of messages or additional communication costs to BCB.
BBCA is harnessed into BBCA-CHAIN to make direct commits on a chained backbone of a causally ordered graph of blocks, without any additional voting blocks or artificial layering. With the help of Complete-Adopt, the additional knowledge gained from the underlying BCB completely removes the voting latency in popular DAG-based protocols. At the same time, causal ordering allows nodes to propose blocks in parallel and achieve high throughput. BBCA-CHAIN thus closes the gap between protocols built on consistent broadcasts (e.g., Bullshark) and those without such an abstraction (e.g., PBFT/HotStuff), emphasizing their shared fundamental principles.
Using a Bracha-style BCB as an example, we fully specify BBCA-CHAIN with simplicity, serving as a solid basis for high-performance replication systems (and blockchains).
Submitted 24 May, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
The effect of 3D stereopsis and hand-tool alignment on learning effectiveness and skill transfer of a VR-based simulator for dental training
Authors:
Maximilian Kaluschke,
Myat Su Yin,
Peter Haddawy,
Siriwan Suebnukarn,
Gabriel Zachmann
Abstract:
Dental simulators have gained prevalence in recent years. Important aspects distinguishing VR hardware configurations are 3D stereoscopic rendering and visual alignment of the user's hands with the virtual tools. New dental simulators are often evaluated without analysing the impact of these simulation aspects. In this paper, we seek to determine the impact of 3D stereoscopic rendering and of hand-tool alignment on the teaching effectiveness and skill assessment accuracy of a VR dental simulator. We developed a bimanual simulator using an HMD and two haptic devices that provides an immersive environment with both 3D stereoscopic rendering and hand-tool alignment. We then independently controlled for each of the two aspects of the simulation. We trained four groups of students in root canal access opening using the simulator and measured the virtual and real learning gains. We quantified the real learning gains by pre- and post-testing using realistic plastic teeth and the virtual learning gains by scoring the training outcomes inside the simulator. We developed a scoring metric to automatically score the training outcomes that strongly correlates with experts' scoring of those outcomes. We found that hand-tool alignment has a positive impact on virtual and real learning gains, and improves the accuracy of skill assessment. We found that stereoscopic 3D had a negative impact on virtual and real learning gains; however, it improves the accuracy of skill assessment. This finding is counter-intuitive, and we found eye-tooth distance to be a confounding variable of stereoscopic 3D, as it was significantly lower for the monoscopic 3D condition and negatively correlates with real learning gain. The results of our study provide valuable information for the future design of dental simulators, as well as simulators for other high-precision psycho-motor tasks.
Submitted 28 September, 2023;
originally announced September 2023.
-
Enumerating pattern-avoiding permutations by leading terms
Authors:
Ömer Eğecioğlu,
Collier Gaiser,
Mei Yin
Abstract:
The number of $123$-avoiding permutations on $\{1,2,\ldots,n\}$ with a fixed leading term is counted by the ballot numbers. The same holds for $132$-avoiding permutations. These results were proved by Miner and Pak using the Robinson-Schensted-Knuth (RSK) correspondence to connect permutations with Dyck paths. In this paper, we first provide an alternate proof of these enumeration results via a direct counting argument. We then study the number of pattern-avoiding permutations with a fixed prefix of length $t\geq1$, generalizing the $t=1$ case. We find exact expressions for single patterns and pairs of patterns of length three as well as the pair $3412$ and $3421$. These expressions depend on $t$, the extrema, and the order statistics. We also define $r$-Wilf equivalence for permutations with a single fixed leading term $r$, and classify the $r$-Wilf-equivalence classes for both classical and vincular patterns of length three.
Submitted 24 June, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
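The leading-term statement in the abstract above can be checked by brute force for small $n$. The following is a minimal sketch, not the authors' method: the helper names `avoids` and `count_by_leading_term` are hypothetical, and the code simply enumerates all permutations, so it is only practical for small $n$.

```python
from collections import Counter
from itertools import combinations, permutations


def avoids(perm, pattern):
    """Return True if no subsequence of perm is order-isomorphic to pattern."""
    k = len(pattern)
    for idx in combinations(range(len(perm)), k):
        vals = [perm[i] for i in idx]
        # The subsequence matches when every pair compares the same way
        # as the corresponding pair of pattern entries.
        if all(
            (vals[a] < vals[b]) == (pattern[a] < pattern[b])
            for a in range(k)
            for b in range(a + 1, k)
        ):
            return False
    return True


def count_by_leading_term(n, pattern):
    """Count pattern-avoiding permutations of 1..n, grouped by first entry."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        if avoids(perm, pattern):
            counts[perm[0]] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_by_leading_term(n, (1, 2, 3)), count_by_leading_term(n, (1, 3, 2)))
```

For each small $n$, the two dictionaries printed on a line should coincide, in line with the statement that both pattern classes are counted by the same numbers for every fixed leading term.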
-
Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games
Authors:
Songtao Feng,
Ming Yin,
Yu-Xiang Wang,
Jing Yang,
Yingbin Liang
Abstract:
The problem of two-player zero-sum Markov games has recently attracted increasing interest in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $ε$-optimal Nash Equilibrium (NE) with the sample complexity of $O(H^3SAB/ε^2)$, which is optimal in its dependence on the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms. The main improvement of the dependency on $H$ arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency.
Submitted 5 June, 2024; v1 submitted 17 August, 2023;
originally announced August 2023.
-
Let's Give a Voice to Conversational Agents in Virtual Reality
Authors:
Michele Yin,
Gabriel Roccabruna,
Abhinav Azad,
Giuseppe Riccardi
Abstract:
The dialogue experience with conversational agents can be greatly enhanced with multimodal and immersive interactions in virtual reality. In this work, we present an open-source architecture with the goal of simplifying the development of conversational agents operating in virtual environments. The architecture offers the possibility of plugging in conversational agents of different domains and adding custom or cloud-based Speech-To-Text and Text-To-Speech models to make the interaction voice-based. Using this architecture, we present two conversational prototypes operating in the digital health domain developed in Unity for both non-immersive displays and VR headsets.
Submitted 4 August, 2023;
originally announced August 2023.
-
BBCA-LEDGER: High Throughput Consensus meets Low Latency
Authors:
Chrysoula Stathakopoulou,
Michael Wei,
Maofan Yin,
Hongbo Zhang,
Dahlia Malkhi
Abstract:
This paper presents BBCA-LEDGER, a Byzantine log replication technology for partially synchronous networks enabling blocks to be broadcast in parallel, such that each broadcast is finalized independently and instantaneously into an individual slot in the log. Every finalized broadcast is eventually committed to the total ordering, so that all network bandwidth has utility in disseminating blocks. Finalizing log slots in parallel achieves both high throughput and low latency. BBCA-LEDGER is composed of two principal interwoven protocols: a low-latency/high-throughput happy path and a high-throughput DAG-based fallback path. The happy path employs a novel primitive called BBCA, a consistent broadcast enforcing unique slot numbering. In steady state, BBCA ensures that a transaction can be committed with low latency, in just 3 network steps. Under network partitions or faults, we harness recent advances in BFT and build a fallback mechanism on a directed acyclic graph (DAG) created by BBCA broadcasts. In this manner, BBCA-LEDGER exhibits the throughput benefits of DAG-based BFT in the face of gaps.
Submitted 26 June, 2023;
originally announced June 2023.
-
Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
Authors:
Sunil Madhow,
Dan Qiao,
Ming Yin,
Yu-Xiang Wang
Abstract:
Developing theoretical guarantees on the sample complexity of offline RL methods is an important step towards making data-hungry RL algorithms practically viable. Currently, most results hinge on unrealistic assumptions about the data distribution -- namely that it comprises a set of i.i.d. trajectories collected by a single logging policy. We consider a more general setting where the dataset may have been gathered adaptively. We develop theory for the TMIS Offline Policy Evaluation (OPE) estimator in this generalized setting for tabular MDPs, deriving high-probability, instance-dependent bounds on its estimation error. We also recover minimax-optimal offline learning in the adaptive setting. Finally, we conduct simulations to empirically analyze the behavior of these estimators under adaptive and non-adaptive regimes.
Submitted 30 April, 2024; v1 submitted 24 June, 2023;
originally announced June 2023.
-
Some enumerative properties of parking functions
Authors:
Richard P. Stanley,
Mei Yin
Abstract:
A parking function is a sequence $(a_1,\dots, a_n)$ of positive integers such that if $b_1\leq\cdots\leq b_n$ is the increasing rearrangement of $a_1,\dots,a_n$, then $b_i\leq i$ for $1\leq i\leq n$. In this paper we obtain some new results on the enumeration of parking functions. We will consider the joint distribution of several sets of statistics on parking functions. The distribution of most of these individual statistics is known, but the joint distributions are new. Parking functions of length $n$ are in bijection with labelled forests on the vertex set $[n]=\{1,2,\dots,n\}$ (or rooted trees on $[n]_0=\{0,1,\dots,n\}$ with root $0$), so our results can also be applied to labelled forests. Extensions of our techniques are discussed.
Submitted 14 June, 2023;
originally announced June 2023.
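The definition of a parking function given in the abstract above translates directly into a short check. The sketch below is illustrative only; the function names are hypothetical and the enumeration is brute force, so it is meant for small $n$. It compares the brute-force count against the classical formula $(n+1)^{n-1}$ for the number of parking functions of length $n$.

```python
from itertools import product


def is_parking_function(seq):
    """Check b_i <= i for the increasing rearrangement b_1 <= ... <= b_n of seq."""
    return all(b <= i for i, b in enumerate(sorted(seq), start=1))


def count_parking_functions(n):
    """Count parking functions of length n by testing every sequence in {1..n}^n."""
    return sum(
        is_parking_function(seq) for seq in product(range(1, n + 1), repeat=n)
    )


if __name__ == "__main__":
    for n in range(1, 6):
        # Brute-force count next to the classical closed form (n+1)^(n-1).
        print(n, count_parking_functions(n), (n + 1) ** (n - 1))
```

Restricting the search to entries in $\{1,\dots,n\}$ loses nothing, since any entry larger than $n$ would violate $b_n\leq n$.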
-
Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework
Authors:
Minglei Yin,
Bin Liu,
Neil Zhenqiang Gong,
Xin Li
Abstract:
With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications such as e-Commerce and social networks where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic study on how to design secure defense strategies against visual attacks on VARS. In this paper, we attempt to fill this gap by proposing an adversarial image reconstruction and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method can detect adversarial examples with high accuracy.
Submitted 11 June, 2023;
originally announced June 2023.
-
Zero-Shot Wireless Indoor Navigation through Physics-Informed Reinforcement Learning
Authors:
Mingsheng Yin,
Tao Li,
Haozhe Lei,
Yaqi Hu,
Sundeep Rangan,
Quanyan Zhu
Abstract:
The growing focus on indoor robot navigation utilizing wireless signals has stemmed from the capability of these signals to capture high-resolution angular and temporal measurements. Prior heuristic-based methods, based on radio frequency propagation, are intuitive and generalizable across simple scenarios, yet fail to navigate in complex environments. On the other hand, end-to-end (e2e) deep reinforcement learning (RL), powered by advanced computing machinery, can explore the entire state space, delivering surprising performance when facing complex wireless environments. However, the price to pay is the astronomical amount of training samples, and the resulting policy, without fine-tuning (zero-shot), is unable to navigate efficiently in new scenarios unseen in the training phase. To equip the navigation agent with sample-efficient learning and zero-shot generalization, this work proposes a novel physics-informed RL (PIRL) where a distance-to-target-based cost (standard in e2e) is augmented with physics-informed reward shaping. The key intuition is that wireless environments vary, but physics laws persist. After learning to utilize the physics information, the agent can transfer this knowledge across different tasks and navigate in an unknown environment without fine-tuning. The proposed PIRL is evaluated using a wireless digital twin (WDT) built upon simulations of a large class of indoor environments from the AI Habitat dataset augmented with electromagnetic (EM) radiation simulation for wireless signals. It is shown that the PIRL significantly outperforms both e2e RL and heuristic-based solutions in terms of generalization and performance. Source code is available at https://github.com/Panshark/PIRL-WIN.
Submitted 15 September, 2023; v1 submitted 11 June, 2023;
originally announced June 2023.
-
Non-stationary Reinforcement Learning under General Function Approximation
Authors:
Songtao Feng,
Ming Yin,
Ruiquan Huang,
Yu-Xiang Wang,
Jing Yang,
Yingbin Liang
Abstract:
General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in the small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.
Submitted 1 June, 2023;
originally announced June 2023.
-
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Authors:
Jinqi Xiao,
Miao Yin,
Yu Gong,
Xiao Zang,
Jian Ren,
Bo Yuan
Abstract:
Attention-based vision models, such as Vision Transformer (ViT) and its variants, have shown promising performance in various computer vision tasks. However, these emerging architectures suffer from large model sizes and high computational costs, calling for efficient model compression solutions. To date, pruning ViTs has been well studied, while other compression strategies that have been widely applied in CNN compression, e.g., model factorization, are little explored in the context of ViT compression. This paper explores an efficient method for compressing vision transformers to enrich the toolset for obtaining compact attention-based vision models. Based on a new insight on the multi-head attention layer, we develop a highly efficient ViT compression solution, which outperforms the state-of-the-art pruning methods. For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach can achieve 0.45% and 0.76% higher top-1 accuracy even with fewer parameters. Our finding can also be applied to improve the customization efficiency of text-to-image diffusion models, with much faster training (up to $2.6\times$ speedup) and lower extra storage cost (up to $1927.5\times$ reduction) than the existing works.
Submitted 9 June, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Lessons from HotStuff
Authors:
Dahlia Malkhi,
Maofan Yin
Abstract:
This article will take you on a journey to the core of blockchains, their Byzantine consensus engine, where HotStuff emerged as a new algorithmic foundation for the classical Byzantine generals consensus problem.
The first part of the article underscores the theoretical advances HotStuff enabled, including several models in which HotStuff-based solutions closed problems that had been open for decades.
The second part focuses on HotStuff performance in real-life settings, where its simplicity drove adoption of HotStuff as the gold standard for blockchain design, and many variants and improvements built on top of it.
Both parts of this document are meant to describe lessons drawn from HotStuff as well as dispel certain myths.
Submitted 22 May, 2023;
originally announced May 2023.
-
TheoremQA: A Theorem-driven Question Answering dataset
Authors:
Wenhu Chen,
Ming Yin,
Max Ku,
Pan Lu,
Yixin Wan,
Xueguang Ma,
Jianyu Xu,
Xinyi Wang,
Tony Xia
Abstract:
Recent LLMs like GPT-4 and PaLM-2 have made tremendous progress in solving fundamental math problems like GSM8K by achieving over 90% accuracy. However, their capabilities to solve more challenging math problems which require domain-specific knowledge (i.e., theorems) have yet to be investigated. In this paper, we introduce TheoremQA, the first theorem-driven question-answering dataset designed to evaluate AI models' capabilities to apply theorems to solve challenging science problems. TheoremQA is curated by domain experts and contains 800 high-quality questions covering 350 theorems (e.g., Taylor's theorem, Lagrange's theorem, Huffman coding, Quantum Theorem, Elasticity Theorem, etc.) from Math, Physics, EE&CS, and Finance. We evaluate a wide spectrum of 16 large language and code models with different prompting strategies like Chain-of-Thoughts and Program-of-Thoughts. We found that GPT-4's capabilities to solve these problems are unparalleled, achieving an accuracy of 51% with Program-of-Thoughts Prompting. All the existing open-sourced models are below 15%, barely surpassing the random-guess baseline. Given the diversity and broad coverage of TheoremQA, we believe it can be used as a better benchmark to evaluate LLMs' capabilities to solve challenging science problems. The data and code are released at https://github.com/wenhuchen/TheoremQA.
Submitted 5 December, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Moments of Colored Permutation Statistics on Conjugacy Classes
Authors:
Jesse Campion Loth,
Michael Levet,
Kevin Liu,
Sheila Sundaram,
Mei Yin
Abstract:
In this paper, we consider the moments of statistics on conjugacy classes of the colored permutation groups $\mathfrak{S}_{n,r}=\mathbb{Z}_r\wr \mathfrak{S}_n$. We first show that any fixed moment coincides on all conjugacy classes where all cycles have sufficiently long length. Additionally, for permutation statistics that can be realized via a process we call symmetric extensions, these moments are polynomials in $n$. Finally, for the descent statistic on the hyperoctahedral group $B_n\cong \mathfrak{S}_{n,2}$, we show that its distribution on conjugacy classes without short cycles satisfies a central limit theorem. Our results build on and generalize previous work of Fulman (\textit{J. Comb. Theory Ser. A.}, 1998), Hamaker and Rhoades (arXiv, 2022), and Campion Loth, Levet, Liu, Stucky, Sundaram, and Yin (arXiv, 2023). In particular, our techniques utilize the combinatorial framework introduced by Campion Loth, Levet, Liu, Stucky, Sundaram, and Yin.
Submitted 27 December, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Blow-up phenomena for a class of extensible beam equations
Authors:
Gongwei Liu,
Mengyun Yin,
Suxia Xia
Abstract:
In this paper, we investigate the initial boundary value problem of the following nonlinear extensible beam equation with nonlinear damping term $$u_{t t}+Δ^2 u-M\left(\|\nabla u\|^2\right) Δu-Δu_t+\left|u_t\right|^{r-1} u_t=|u|^{p-1} u$$ which was considered by Yang et al. (Advanced Nonlinear Studies 2022; 22:436-468). We consider the problem with the nonlinear damping and establish the finite time blow-up of the solution for the initial data at arbitrarily high energy level, including estimates of lower and upper bounds of the blow-up time. The result provides some affirmative answer to the open problems given in (Advanced Nonlinear Studies 2022; 22:436-468).
Submitted 15 May, 2023;
originally announced May 2023.
-
Interactive Concept Learning for Uncovering Latent Themes in Large Text Collections
Authors:
Maria Leonor Pacheco,
Tunazzina Islam,
Lyle Ungar,
Ming Yin,
Dan Goldwasser
Abstract:
Experts across diverse disciplines are often interested in making sense of large text collections. Traditionally, this challenge is approached either by noisy unsupervised techniques such as topic models, or by following a manual theme discovery process. In this paper, we expand the definition of a theme to account for more than just a word distribution, and include generalized concepts deemed relevant by domain experts. Then, we propose an interactive framework that receives and encodes expert feedback at different levels of abstraction. Our framework strikes a balance between automation and manual coding, allowing experts to maintain control of their study while reducing the manual effort required.
Submitted 8 May, 2023;
originally announced May 2023.
-
A Generative Modeling Framework for Inferring Families of Biomechanical Constitutive Laws in Data-Sparse Regimes
Authors:
Minglang Yin,
Zongren Zou,
Enrui Zhang,
Cristina Cavinato,
Jay D. Humphrey,
George Em Karniadakis
Abstract:
Quantifying biomechanical properties of the human vasculature could deepen our understanding of cardiovascular diseases. Standard nonlinear regression in constitutive modeling requires considerable high-quality data and an explicit form of the constitutive model as prior knowledge. By contrast, we propose a novel approach that combines generative deep learning with Bayesian inference to efficiently infer families of constitutive relationships in data-sparse regimes. Inspired by the concept of functional priors, we develop a generative adversarial network (GAN) that incorporates a neural operator as the generator and a fully-connected neural network as the discriminator. The generator takes a vector of noise conditioned on measurement data as input and yields the predicted constitutive relationship, which is scrutinized by the discriminator in the following step. We demonstrate that this framework can accurately estimate means and standard deviations of the constitutive relationships of the murine aorta using data collected either from model-generated synthetic data or ex vivo experiments for mice with genetic deficiencies. In addition, the framework learns priors of constitutive models without explicitly knowing their functional form, providing a new model-agnostic approach to learning hidden constitutive behaviors from data.
Submitted 4 May, 2023;
originally announced May 2023.
-
Neutral Atom Quantum Computing Hardware: Performance and End-User Perspective
Authors:
Karen Wintersperger,
Florian Dommert,
Thomas Ehmer,
Andrey Hoursanov,
Johannes Klepsch,
Wolfgang Mauerer,
Georg Reuber,
Thomas Strohm,
Ming Yin,
Sebastian Luber
Abstract:
We present an industrial end-user perspective on the current state of quantum computing hardware for one specific technological approach, the neutral atom platform. Our aim is to assist developers in understanding the impact of the specific properties of these devices on the effectiveness of algorithm execution. Based on discussions with different vendors and recent literature, we discuss the performance data of the neutral atom platform. Specifically, we focus on the physical qubit architecture, which affects state preparation, qubit-to-qubit connectivity, gate fidelities, native gate instruction set, and individual qubit stability. These factors determine both the quantum-part execution time and the end-to-end wall clock time relevant for end-users, but also the ability to perform fault-tolerant quantum computation in the future. We end with an overview of which applications have been shown to be well suited for the peculiar properties of neutral atom-based quantum computers.
Submitted 15 September, 2023; v1 submitted 27 April, 2023;
originally announced April 2023.
-
Sampling planar tanglegrams and pairs of disjoint triangulations
Authors:
Alexander E. Black,
Kevin Liu,
Alex Mcdonough,
Garrett Nelson,
Michael C. Wigal,
Mei Yin,
Youngho Yoo
Abstract:
A tanglegram consists of two rooted binary trees and a perfect matching between their leaves, and a planar tanglegram is one that admits a layout with no crossings. We show that the problem of generating planar tanglegrams uniformly at random reduces to the corresponding problem for irreducible planar tanglegram layouts, which are known to be in bijection with pairs of disjoint triangulations of a convex polygon. We extend the flip operation on a single triangulation to a flip operation on pairs of disjoint triangulations. Interestingly, the resulting flip graph is both connected and regular, and hence a random walk on this graph converges to the uniform distribution. We also show that the restriction of the flip graph to the pairs with a fixed triangulation in either coordinate is connected, and give diameter bounds that are near optimal. Our results furthermore yield new insight into the flip graph of triangulations of a convex $n$-gon with a geometric interpretation on the associahedron.
Submitted 11 April, 2023;
originally announced April 2023.
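The regularity-and-connectivity argument in the abstract above can be illustrated on a simpler, classical object. The sketch below builds the flip graph of single triangulations of a convex $n$-gon, not the paper's flip graph on pairs of disjoint triangulations, and runs a lazy random walk on it. The function names (`flip`, `flip_graph`, `lazy_walk_distribution`) and the use of a lazy walk to avoid periodicity are assumptions made for illustration only.

```python
from collections import deque


def polygon_sides(n):
    """Sides of a convex n-gon with vertices 0..n-1."""
    return {frozenset((i, (i + 1) % n)) for i in range(n)}


def flip(n, diagonals, d):
    """Flip diagonal d: replace it by the other diagonal of its quadrilateral."""
    edges = polygon_sides(n) | diagonals
    a, c = tuple(d)
    # The two triangles containing d have apexes adjacent to both endpoints.
    apexes = [v for v in range(n)
              if v not in d
              and frozenset((a, v)) in edges
              and frozenset((c, v)) in edges]
    assert len(apexes) == 2
    return (diagonals - {d}) | {frozenset(apexes)}


def flip_graph(n):
    """Breadth-first exploration of all triangulations, starting from a fan."""
    start = frozenset(frozenset((0, v)) for v in range(2, n - 1))
    graph = {}
    queue = deque([start])
    while queue:
        t = queue.popleft()
        if t in graph:
            continue
        graph[t] = [flip(n, t, d) for d in t]
        queue.extend(graph[t])
    return graph


def lazy_walk_distribution(graph, steps):
    """Distribution after `steps` of a walk that stays put with probability 1/2."""
    nodes = list(graph)
    dist = {t: 0.0 for t in nodes}
    dist[nodes[0]] = 1.0
    for _ in range(steps):
        nxt = {t: 0.5 * p for t, p in dist.items()}
        for t, p in dist.items():
            share = 0.5 * p / len(graph[t])
            for u in graph[t]:
                nxt[u] += share
        dist = nxt
    return dist


graph = flip_graph(6)
dist = lazy_walk_distribution(graph, 2000)
print(len(graph), {len(nbrs) for nbrs in graph.values()})
print(max(abs(p - 1 / len(graph)) for p in dist.values()))
```

For the hexagon, every triangulation has three flippable diagonals, so the graph is 3-regular, and the walk's distribution approaches the uniform one over all triangulations.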
-
A Dual System-Level Parameterization for Identification from Closed-Loop Data
Authors:
Amber Srivastava,
Mingzhou Yin,
Andrea Iannelli,
Roy S. Smith
Abstract:
This work presents a dual system-level parameterization (D-SLP) method for closed-loop system identification. The recent system-level synthesis framework parameterizes all stabilizing controllers via linear constraints on closed-loop response functions, known as system-level parameters. It was demonstrated that several structural, locality, and communication constraints on the controller can be posed as convex constraints on these system-level parameters. In the current work, the identification problem is treated as a {\em dual} of the system-level synthesis problem. The plant model is identified from the dual system-level parameters associated to the plant. In comparison to existing closed-loop identification approaches (such as the dual-Youla parameterization), the D-SLP framework neither requires the knowledge of a nominal plant that is stabilized by the known controller, nor depends upon the choice of factorization of the nominal plant and the stabilizing controller. Numerical simulations demonstrate the efficacy of the proposed D-SLP method in terms of identification errors, compared to existing closed-loop identification techniques.
Submitted 5 April, 2023;
originally announced April 2023.
-
An ontology-aided, natural language-based approach for multi-constraint BIM model querying
Authors:
Mengtian Yin,
Llewellyn Tang,
Chris Webster,
Shen Xu,
Xiongyi Li,
Huaquan Ying
Abstract:
Being able to efficiently retrieve the required building information is critical for construction project stakeholders to carry out their engineering and management activities. Natural language interface (NLI) systems are emerging as a time and cost-effective way to query Building Information Models (BIMs). However, the existing methods cannot logically combine different constraints to perform fine-grained queries, dampening the usability of natural language (NL)-based BIM queries. This paper presents a novel ontology-aided semantic parser to automatically map natural language queries (NLQs) that contain different attribute and relational constraints into computer-readable codes for querying complex BIM models. First, a modular ontology was developed to represent NL expressions of Industry Foundation Classes (IFC) concepts and relationships, and was then populated with entities from target BIM models to assimilate project-specific information. Hereafter, the ontology-aided semantic parser progressively extracts concepts, relationships, and value restrictions from NLQs to fully identify constraint conditions, resulting in standard SPARQL queries with reasoning rules to successfully retrieve IFC-based BIM models. The approach was evaluated based on 225 NLQs collected from BIM users, with a 91% accuracy rate. Finally, a case study about the design-checking of a real-world residential building demonstrates the practical value of the proposed approach in the construction industry.
Submitted 27 March, 2023;
originally announced March 2023.
-
Error Bounds for Kernel-Based Linear System Identification with Unknown Hyperparameters
Authors:
Mingzhou Yin,
Roy S. Smith
Abstract:
The kernel-based method has been successfully applied in linear system identification using stable kernel designs. From a Gaussian process perspective, it automatically provides probabilistic error bounds for the identified models from the posterior covariance, which are useful in robust and stochastic control. However, the error bounds require knowledge of the true hyperparameters in the kernel design and are demonstrated to be inaccurate with estimated hyperparameters for lightly damped systems or in the presence of high noise. In this work, we provide reliable quantification of the estimation error when the hyperparameters are unknown. The bounds are obtained by first constructing a high-probability set for the true hyperparameters from the marginal likelihood function and then finding the worst-case posterior covariance within the set. The proposed bound is proven to contain the true model with a high probability and its validity is verified in numerical simulation.
Submitted 17 March, 2023;
originally announced March 2023.
-
Path Planning Under Uncertainty to Localize mmWave Sources
Authors:
Kai Pfeiffer,
Yuze Jia,
Mingsheng Yin,
Akshaj Kumar Veldanda,
Yaqi Hu,
Amee Trivedi,
Jeff Zhang,
Siddharth Garg,
Elza Erkip,
Sundeep Rangan,
Ludovic Righetti
Abstract:
In this paper, we study a navigation problem where a mobile robot needs to locate a mmWave wireless signal. Using the directionality properties of the signal, we propose an estimation and path planning algorithm that can efficiently navigate in cluttered indoor environments. We formulate Extended Kalman filters for emitter location estimation in cases where the signal is received in line-of-sight or after reflections. We then propose to plan motion trajectories based on belief-space dynamics in order to minimize the uncertainty of the position estimates. The associated non-linear optimization problem is solved by a state-of-the-art constrained iLQR solver. In particular, we propose a method that can handle a large number of obstacles (~300) with reasonable computation times. We validate the approach in an extensive set of simulations. We show that our estimators can help increase navigation success rate and that planning to reduce estimation uncertainty can improve the overall task completion speed.
Submitted 8 March, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
No-Regret Linear Bandits beyond Realizability
Authors:
Chong Liu,
Ming Yin,
Yu-Xiang Wang
Abstract:
We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $ε$ that measures the sup-norm error of the best linear approximation. This results in an unavoidable linear regret whenever $ε> 0$. We describe a more natural model of misspecification which only requires the approximation error at each input $x$ to be proportional to the suboptimality gap at $x$. It captures the intuition that, for optimization problems, near-optimal regions should matter more and we can tolerate larger approximation errors in suboptimal regions. Quite surprisingly, we show that the classical LinUCB algorithm -- designed for the realizable case -- is automatically robust against such gap-adjusted misspecification. It achieves a near-optimal $\sqrt{T}$ regret for problems where the best-known regret is almost linear in the time horizon $T$. Technically, our proof relies on a novel self-bounding argument that bounds the part of the regret due to misspecification by the regret itself.
Submitted 19 July, 2023; v1 submitted 26 February, 2023;
originally announced February 2023.
-
Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs
Authors:
Dan Qiao,
Ming Yin,
Yu-Xiang Wang
Abstract:
In many real-life reinforcement learning (RL) problems, deploying new policies is costly. In those scenarios, algorithms must solve exploration (which requires adaptivity) while switching the deployed policy sparsely (which limits adaptivity). In this paper, we go beyond the existing state-of-the-art on this problem that focused on linear Markov Decision Processes (MDPs) by considering linear Bellman-complete MDPs with low inherent Bellman error. We propose the ELEANOR-LowSwitching algorithm that achieves the near-optimal regret with a switching cost logarithmic in the number of episodes and linear in the time-horizon $H$ and feature dimension $d$. We also prove a lower bound proportional to $dH$ among all algorithms with sublinear regret. In addition, we show the ``doubling trick'' used in ELEANOR-LowSwitching can be further leveraged for the generalized linear function approximation, under which we design a sample-efficient algorithm with near-optimal switching cost.
Submitted 24 February, 2023;
originally announced February 2023.
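To give a feel for why a doubling rule keeps the switching cost logarithmic in the number of episodes, here is a minimal sketch. It is not the ELEANOR-LowSwitching algorithm: the abstract does not state the exact switching criterion, so the sketch assumes the simplest count-based variant, in which the deployed policy is updated only when the episode index reaches the next power of two. The function name `doubling_switch_episodes` is hypothetical.

```python
def doubling_switch_episodes(num_episodes):
    """Episodes (1-indexed) at which a count-based doubling rule updates the policy.

    Assumption for illustration: a switch happens at episodes 1, 2, 4, 8, ...
    """
    switches = []
    next_switch = 1
    for k in range(1, num_episodes + 1):
        if k == next_switch:
            switches.append(k)
            next_switch *= 2  # wait twice as long before the next switch
    return switches


switches = doubling_switch_episodes(1000)
print(len(switches), switches)
```

Under this schedule the number of policy switches over $K$ episodes is $\lfloor \log_2 K \rfloor + 1$, while the policy is still refreshed after every doubling of the collected data.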
-
A Counterfactual Collaborative Session-based Recommender System
Authors:
Wenzhuo Song,
Shoujin Wang,
Yan Wang,
Kunpeng Liu,
Xueyan Liu,
Minghao Yin
Abstract:
Most session-based recommender systems (SBRSs) focus on extracting information from the observed items in the current session of a user to predict a next item, ignoring the causes outside the session (called outer-session causes, OSCs) that influence the user's selection of items. However, these causes widely exist in the real world, and few studies have investigated their role in SBRSs. In this work, we analyze the causalities and correlations of the OSCs in SBRSs from the perspective of causal inference. We find that the OSCs are essentially the confounders in SBRSs, which leads to spurious correlations in the data used to train SBRS models. To address this problem, we propose a novel SBRS framework named COCO-SBRS (COunterfactual COllaborative Session-Based Recommender Systems) to learn the causality between OSCs and user-item interactions in SBRSs. COCO-SBRS first adopts a self-supervised approach to pre-train a recommendation model by designing pseudo-labels of causes for each user's selection of the item in data to guide the training process. Next, COCO-SBRS adopts counterfactual inference to recommend items based on the outputs of the pre-trained recommendation model considering the causalities to alleviate the data sparsity problem. As a result, COCO-SBRS can learn the causalities in data, preventing the model from learning spurious correlations. The experimental results of our extensive experiments conducted on three real-world datasets demonstrate the superiority of our proposed framework over ten representative SBRSs.
Submitted 6 May, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Bi-AM-RRT*: A Fast and Efficient Sampling-Based Motion Planning Algorithm in Dynamic Environments
Authors:
Ying Zhang,
Heyong Wang,
Maoliang Yin,
Jiankun Wang,
Changchun Hua
Abstract:
The efficiency of sampling-based motion planning brings wide application in autonomous mobile robots. The conventional rapidly exploring random tree (RRT) algorithm and its variants have gained significant successes, but there are still challenges for the optimal motion planning of mobile robots in dynamic environments. In this paper, based on Bidirectional RRT and the use of an assisting metric (AM), we propose a novel motion planning algorithm, namely Bi-AM-RRT*. Different from the existing RRT-based methods, the AM is introduced in this paper to optimize the performance of robot motion planning in dynamic environments with obstacles. On this basis, the bidirectional search sampling strategy is employed to reduce the search time. Further, we present a new rewiring method to shorten path lengths. The effectiveness and efficiency of the proposed Bi-AM-RRT* are proved through comparative experiments in different environments. Experimental results show that the Bi-AM-RRT* algorithm can achieve better performance in terms of path length and search time, and always finds near-optimal paths with the shortest search time when the diffusion metric is used as the AM.
Submitted 30 April, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks
Authors:
Jinqi Xiao,
Chengming Zhang,
Yu Gong,
Miao Yin,
Yang Sui,
Lizhi Xiang,
Dingwen Tao,
Bo Yuan
Abstract:
Low-rank compression is an important model compression strategy for obtaining compact neural network models. In general, because the rank values directly determine the model complexity and model accuracy, proper selection of layer-wise rank is critical. To date, although many low-rank compression approaches have been proposed, selecting the ranks either manually or automatically, they suffer from costly manual trials or unsatisfactory compression performance. In addition, none of the existing works is designed in a hardware-aware way, limiting the practical performance of the compressed models on real-world hardware platforms.
To address these challenges, in this paper we propose HALOC, a hardware-aware automatic low-rank compression framework. By interpreting automatic rank selection from an architecture search perspective, we develop an end-to-end solution to determine the suitable layer-wise ranks in a differentiable and hardware-aware way. We further propose design principles and mitigation strategy to efficiently explore the rank space and reduce the potential interference problem.
Experimental results on different datasets and hardware platforms demonstrate the effectiveness of our proposed approach. On CIFAR-10 dataset, HALOC enables 0.07% and 0.38% accuracy increase over the uncompressed ResNet-20 and VGG-16 models with 72.20% and 86.44% fewer FLOPs, respectively. On ImageNet dataset, HALOC achieves 0.9% higher top-1 accuracy than the original ResNet-18 model with 66.16% fewer FLOPs. HALOC also shows 0.66% higher top-1 accuracy increase than the state-of-the-art automatic low-rank compression solution with fewer computational and memory costs. In addition, HALOC demonstrates the practical speedups on different hardware platforms, verified by the measurement results on desktop GPU, embedded GPU and ASIC accelerator.
Submitted 1 February, 2023; v1 submitted 19 January, 2023;
originally announced January 2023.
-
Who Should I Trust: AI or Myself? Leveraging Human and AI Correctness Likelihood to Promote Appropriate Trust in AI-Assisted Decision-Making
Authors:
Shuai Ma,
Ying Lei,
Xinru Wang,
Chengbo Zheng,
Chuhan Shi,
Ming Yin,
Xiaojuan Ma
Abstract:
In AI-assisted decision-making, it is critical for human decision-makers to know when to trust AI and when to trust themselves. However, prior studies calibrated human trust only based on AI confidence indicating AI's correctness likelihood (CL) but ignored humans' CL, hindering optimal team decision-making. To mitigate this gap, we proposed to promote humans' appropriate trust based on the CL of both sides at a task-instance level. We first modeled humans' CL by approximating their decision-making models and computing their potential performance in similar instances. We demonstrated the feasibility and effectiveness of our model via two preliminary studies. Then, we proposed three CL exploitation strategies to calibrate users' trust explicitly/implicitly in the AI-assisted decision-making process. Results from a between-subjects experiment (N=293) showed that our CL exploitation strategies promoted more appropriate human trust in AI, compared with only using AI confidence. We further provided practical implications for more human-compatible AI-assisted decision-making.
Submitted 13 January, 2023;
originally announced January 2023.
-
GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer
Authors:
Miao Yin,
Burak Uzkent,
Yilin Shen,
Hongxia Jin,
Bo Yuan
Abstract:
The recently proposed Vision transformers (ViTs) have shown very impressive empirical performance in various computer vision tasks, and they are viewed as an important type of foundation model. However, ViTs are typically constructed with large-scale sizes, which then severely hinder their potential deployment in many practical resources-constrained applications. To mitigate this challenging problem, structured pruning is a promising solution to compress model size and enable practical efficiency. However, unlike its current popularity for CNNs and RNNs, structured pruning for ViT models is little explored.
In this paper, we propose GOHSP, a unified framework of Graph and Optimization-based Structured Pruning for ViT models. We first develop a graph-based ranking for measuring the importance of attention heads, and the extracted importance information is further integrated to an optimization-based procedure to impose the heterogeneous structured sparsity patterns on the ViT models. Experimental results show that our proposed GOHSP demonstrates excellent compression performance. On CIFAR-10 dataset, our approach can bring 40% parameters reduction with no accuracy loss for ViT-Small model. On ImageNet dataset, with 30% and 35% sparsity ratio for DeiT-Tiny and DeiT-Small models, our approach achieves 1.65% and 0.76% accuracy increase over the existing structured pruning methods, respectively.
Submitted 6 February, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
Multi-View MOOC Quality Evaluation via Information-Aware Graph Representation Learning
Authors:
Lu Jiang,
Yibin Wang,
Jianan Wang,
Pengyang Wang,
Minghao Yin
Abstract:
In this paper, we study the problem of MOOC quality evaluation, which is essential for improving the course materials, promoting students' learning efficiency, and benefiting user services. While achieving promising performances, current works still suffer from the complicated interactions and relationships of entities in MOOC platforms. To tackle the challenges, we formulate the problem as a course representation learning task and develop an Information-aware Graph Representation Learning (IaGRL) method for multi-view MOOC quality evaluation. Specifically, we first build a MOOC Heterogeneous Network (HIN) to represent the interactions and relationships among entities in MOOC platforms. We then decompose the MOOC HIN into multiple single-relation graphs based on meta-paths to depict the multi-view semantics of courses. The course representation learning can be further converted to a multi-view graph representation task. Different from traditional graph representation learning, the learned course representations are expected to match the following three types of validity: (1) the agreement on expressiveness between the raw course portfolio and the learned course representations; (2) the consistency between the representations in each view and the unified representations; (3) the alignment between the course and MOOC platform representations. Therefore, we propose to exploit mutual information for preserving the validity of course representations. We conduct extensive experiments over real-world MOOC datasets to demonstrate the effectiveness of our proposed method.
Submitted 1 January, 2023;
originally announced January 2023.
-
Permutation Statistics in Conjugacy Classes of the Symmetric Group
Authors:
Jesse Campion Loth,
Michael Levet,
Kevin Liu,
Eric Nathan Stucky,
Sheila Sundaram,
Mei Yin
Abstract:
We introduce the notion of a weighted inversion statistic on the symmetric group, and examine its distribution on each conjugacy class. Our work generalizes the study of several common permutation statistics, including the number of inversions, the number of descents, the major index, and the number of excedances. As a consequence, we obtain explicit formulas for the first moments of several statistics by conjugacy class. We also show that when the cycle lengths are sufficiently large, the higher moments of arbitrary permutation statistics are independent of the conjugacy class. Fulman (J. Comb. Theory Ser. A., 1998) previously established this result for major index and descents. We obtain these results, in part, by generalizing the techniques of Fulman (ibid.), and introducing the notion of permutation constraints. For permutation statistics that can be realized via symmetric constraints, we show that each moment is a polynomial in the degree of the symmetric group.
Submitted 17 May, 2023; v1 submitted 2 January, 2023;
originally announced January 2023.
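The claim that moments become independent of the conjugacy class once all cycles are long can be checked by brute force for the first moment of the descent statistic in small symmetric groups. The sketch below is an illustration only and does not use the paper's weighted inversion statistics or permutation constraints; the helper names (`descents`, `cycle_type`, `mean_descents_by_class`) are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def descents(p):
    """Number of positions i with p[i] > p[i+1] in one-line notation."""
    return sum(1 for i in range(len(p) - 1) if p[i] > p[i + 1])


def cycle_type(p):
    """Cycle lengths of the map i -> p[i], sorted in decreasing order."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def mean_descents_by_class(n):
    """Exact mean number of descents on each conjugacy class of S_n."""
    totals = {}
    for p in permutations(range(n)):
        ct = cycle_type(p)
        s, c = totals.get(ct, (0, 0))
        totals[ct] = (s + descents(p), c + 1)
    return {ct: Fraction(s, c) for ct, (s, c) in totals.items()}


for ct, mean in sorted(mean_descents_by_class(4).items()):
    print(ct, mean)
```

For $n = 4$, the class of 4-cycles has mean $3/2$, which equals the mean $(n-1)/2$ over all of $S_4$, whereas classes containing fixed points or 2-cycles can deviate from it.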
-
Network analysis on cortical morphometry in first-episode schizophrenia
Authors:
Mowen Yin,
Weikai Huang,
Zhichao Liang,
Quanying Liu,
Xiaoying Tang
Abstract:
First-episode schizophrenia (FES) results in abnormality of brain connectivity at different levels. Despite some successful findings on functional and structural connectivity of FES, relatively few studies have focused on morphological connectivity, which may provide a potential biomarker for FES. In this study, we aim to investigate cortical morphological connectivity in FES. T1-weighted magnetic resonance image data from 92 FES patients and 106 healthy controls (HCs) are analyzed. We parcellate the brain into 68 cortical regions, calculate the averaged thickness and surface area of each region, construct undirected networks by correlating cortical thickness or surface area measures across the 68 regions for each group, and finally compute a variety of network-related topology characteristics. Our experimental results show that both the cortical thickness network and the surface area network in the two groups are small-world networks; that is, those networks have high clustering coefficients and low characteristic path lengths. At certain network sparsity levels, both the cortical thickness network and the surface area network of FES have significantly lower clustering coefficients and local efficiencies than those of HC, indicating FES-related abnormalities in local connectivity and small-worldness. These abnormalities mainly involve the frontal, parietal, and temporal lobes. Further regional analyses confirm significant group differences in the node betweenness of the posterior cingulate gyrus for both the cortical thickness network and the surface area network. Our work supports the view that cortical morphological connectivity, which is constructed based on correlations across subjects' cortical thickness, may serve as a tool to study topological abnormalities in neurological disorders.
Submitted 26 December, 2022;
originally announced December 2022.
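The two quantities the abstract above uses to characterize small-world networks, the clustering coefficient and the characteristic path length, can be computed on any undirected graph with a few lines of code. The sketch below uses the textbook definitions on a hypothetical toy graph; it assumes the graph is given as a dictionary of neighbor sets and says nothing about how the study thresholded its correlation matrices.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj):
    """Average local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbors. Nodes with fewer
    than two neighbors contribute 0, a common convention.
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that directly connect two neighbors of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs.

    Assumes the graph has at least one edge; unreachable pairs are skipped.
    """
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
```

A network is usually called small-world when its clustering coefficient is high relative to a random graph of the same density while its characteristic path length stays comparably low, so in practice both values are compared against randomized reference networks.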
-
Streaming Traffic Flow Prediction Based on Continuous Reinforcement Learning
Authors:
Yanan Xiao,
Minyu Liu,
Zichen Zhang,
Lu Jiang,
Minghao Yin,
Jianan Wang
Abstract:
Traffic flow prediction is an important part of smart transportation. The goal is to predict future traffic conditions based on historical data recorded by sensors and the traffic network. As the city continues to build, parts of the transportation network will be added or modified. How to accurately predict expanding and evolving long-term streaming networks is of great significance. To this end, we propose a new simulation-based criterion that considers teaching autonomous agents to mimic sensor patterns, planning their next visit based on the sensor's profile (e.g., traffic, speed, occupancy). The data recorded by the sensor is most accurate when the agent can perfectly simulate the sensor's activity pattern. We propose to formulate the problem as a continuous reinforcement learning task, where the agent is the next flow value predictor, the action is the next time-series flow value in the sensor, and the environment state is a dynamically fused representation of the sensor and transportation network. Actions taken by the agent change the environment, which in turn forces the agent's mode to update, while the agent further explores changes in the dynamic traffic network, which helps the agent predict its next visit more accurately. Therefore, we develop a strategy in which sensors and traffic networks update each other and incorporate temporal context to quantify state representations evolving over time.
Submitted 24 December, 2022;
originally announced December 2022.
-
Multi-Frequency Channel Modeling for Millimeter Wave and THz Wireless Communication via Generative Adversarial Networks
Authors:
Yaqi Hu,
Mingsheng Yin,
William Xia,
Sundeep Rangan,
Marco Mezzavilla
Abstract:
Modern cellular systems rely increasingly on simultaneous communication in multiple discontinuous bands for macro-diversity and increased bandwidth. Multi-frequency communication is particularly crucial in the millimeter wave (mmWave) and Terahertz (THz) frequencies, as these bands are often coupled with lower frequencies for robustness. Evaluation of these systems requires statistical models that can capture the joint distribution of the channel paths across multiple frequencies. This paper presents a general neural network based methodology for training multi-frequency double directional statistical channel models. In the proposed approach, each is described as a multi-clustered set, and a generative adversarial network (GAN) is trained to generate random multi-cluster profiles where the generated cluster data includes the angles and delay of the clusters along with the vectors of random received powers, angular, and delay spread at different frequencies. The model can be readily applied for multi-frequency link or network layer simulation. The methodology is demonstrated on modeling urban micro-cellular links at 28 and 140 GHz trained from extensive ray tracing data. The methodology makes minimal statistical assumptions and experiments show the model can capture interesting statistical relationships between frequencies.
Submitted 22 December, 2022;
originally announced December 2022.
-
Source-Free Domain Adaptation for Question Answering with Masked Self-training
Authors:
M. Yin,
B. Wang,
Y. Dong,
C. Ling
Abstract:
Most previous unsupervised domain adaptation (UDA) methods for question answering (QA) require access to source domain data while fine-tuning the model for the target domain. Source domain data may, however, contain sensitive information and may be restricted. In this study, we investigate a more challenging setting, source-free UDA, in which we have only the pretrained source model and target domain data, without access to source domain data. We propose a novel self-training approach to QA models that integrates a unique mask module for domain adaptation. The mask is auto-adjusted to extract key domain knowledge while trained on the source domain. To maintain previously learned domain knowledge, certain mask weights are frozen during adaptation, while other weights are adjusted to mitigate domain shifts with pseudo-labeled samples generated in the target domain. Our empirical results on four benchmark datasets suggest that our approach significantly enhances the performance of pretrained QA models on the target domain, and even outperforms models that have access to the source data during adaptation.
Submitted 17 March, 2024; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Algorithm and Hardware Co-Design of Energy-Efficient LSTM Networks for Video Recognition with Hierarchical Tucker Tensor Decomposition
Authors:
Yu Gong,
Miao Yin,
Lingyi Huang,
Chunhua Deng,
Yang Sui,
Bo Yuan
Abstract:
Long short-term memory (LSTM) is a type of powerful deep neural network that has been widely used in many sequence analysis and modeling applications. However, the large model size of LSTM networks still makes their practical deployment very challenging, especially for video recognition tasks that require high-dimensional input data. Aiming to overcome this limitation and fully unlock the potential of LSTM models, in this paper we propose to perform algorithm and hardware co-design towards high-performance energy-efficient LSTM networks. At the algorithm level, we propose to develop a fully decomposed hierarchical Tucker (FDHT) structure-based LSTM, namely FDHT-LSTM, which enjoys ultra-low model complexity while still achieving high accuracy. In order to fully reap this algorithmic benefit, we further develop the corresponding customized hardware architecture to support the efficient execution of the proposed FDHT-LSTM model. With a carefully designed memory access scheme, the complicated matrix transformation can be efficiently supported by the underlying hardware on the fly without any access conflict. Our evaluation results show that both the proposed ultra-compact FDHT-LSTM models and the corresponding hardware accelerator achieve very high performance. Compared with the state-of-the-art compressed LSTM models, FDHT-LSTM enjoys both an order-of-magnitude reduction in model size and significant accuracy improvement across different video recognition datasets. Meanwhile, compared with the state-of-the-art tensor decomposed model-oriented hardware TIE, our proposed FDHT-LSTM architecture achieves better throughput, area efficiency and energy efficiency on the LSTM-Youtube workload. For the LSTM-UCF workload, our proposed design also outperforms TIE with higher throughput, higher energy efficiency and comparable area efficiency.
Submitted 5 December, 2022;
originally announced December 2022.
-
CSTAR: Towards Compact and STructured Deep Neural Networks with Adversarial Robustness
Authors:
Huy Phan,
Miao Yin,
Yang Sui,
Bo Yuan,
Saman Zonouz
Abstract:
Model compression and model defense for deep neural networks (DNNs) have been extensively and individually studied. Considering the co-importance of model compactness and robustness in practical applications, several prior works have explored improving the adversarial robustness of sparse neural networks. However, the structured sparse models obtained by the existing works suffer severe performance degradation for both benign and robust accuracy, thereby causing a challenging dilemma between robustness and structuredness of the compact DNNs. To address this problem, in this paper, we propose CSTAR, an efficient solution that can simultaneously impose the low-rankness-based Compactness, high STructuredness and high Adversarial Robustness on the target DNN models. By formulating the low-rankness and robustness requirement within the same framework and globally determining the ranks, the compressed DNNs can simultaneously achieve high compression performance and strong adversarial robustness. Evaluations for various DNN models on different datasets demonstrate the effectiveness of CSTAR. Compared with the state-of-the-art robust structured pruning methods, CSTAR shows consistently better performance. For instance, when compressing ResNet-18 on CIFAR-10, CSTAR can achieve up to 20.07% and 11.91% improvement for benign accuracy and robust accuracy, respectively. For compressing ResNet-18 with 16x compression ratio on ImageNet, CSTAR can obtain 8.58% benign accuracy gain and 4.27% robust accuracy gain compared to the existing robust structured pruning method.
Submitted 17 February, 2023; v1 submitted 4 December, 2022;
originally announced December 2022.
-
Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
Authors:
Jiachen Li,
Edwin Zhang,
Ming Yin,
Qinxun Bai,
Yu-Xiang Wang,
William Yang Wang
Abstract:
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while constrained by the behavior policy to avoid a significant distributional shift. In this paper, we propose our closed-form policy improvement operators. We make a novel observation that the behavior constraint naturally motivates the use of first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, as practical datasets are usually collected by heterogeneous policies, we model the behavior policies as a Gaussian Mixture and overcome the induced optimization difficulties by leveraging the LogSumExp's lower bound and Jensen's Inequality, giving rise to a closed-form policy improvement operator. We instantiate offline RL algorithms with our novel policy improvement operators and empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark. Our code is available at https://cfpi-icml23.github.io/.
Submitted 22 July, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Authors:
Thanh Nguyen-Tang,
Ming Yin,
Sunil Gupta,
Svetha Venkatesh,
Raman Arora
Abstract:
Sample-efficient offline reinforcement learning (RL) with linear function approximation has recently been studied extensively. Much of prior work has yielded the minimax-optimal bound of $\tilde{\mathcal{O}}(\frac{1}{\sqrt{K}})$, with $K$ being the number of episodes in the offline data. In this work, we seek to understand instance-dependent bounds for offline RL with function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of \emph{concentrability} with respect to an optimal policy, the proposed algorithm yields a fast rate of $\tilde{\mathcal{O}}(\frac{1}{K})$ for offline RL when there is a positive gap in the optimal Q-value functions, even when the offline data were adaptively collected. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy and the optimal actions are unique, offline RL achieves absolute zero sub-optimality error when $K$ exceeds a (finite) instance-dependent threshold. To the best of our knowledge, these are the first $\tilde{\mathcal{O}}(\frac{1}{K})$ bound and absolute zero sub-optimality bound respectively for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretical lower bounds to complement our upper bounds.
Submitted 27 January, 2023; v1 submitted 23 November, 2022;
originally announced November 2022.
-
TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition
Authors:
Lizhi Xiang,
Miao Yin,
Chengming Zhang,
Aravind Sukumaran-Rajam,
P. Sadayappan,
Bo Yuan,
Dingwen Tao
Abstract:
Tucker decomposition is one of the SOTA CNN model compression techniques. However, unlike the FLOPs reduction, we observe very limited inference time reduction with Tucker-compressed models using existing GPU software such as cuDNN. To this end, we propose an efficient end-to-end framework that can generate highly accurate and compact CNN models via Tucker decomposition and optimized inference code on GPUs. Specifically, we propose an ADMM-based training algorithm that can achieve highly accurate Tucker-format models. We also develop a high-performance kernel for Tucker-format convolutions and analytical performance models to guide the selection of execution parameters. We further propose a co-design framework to determine the proper Tucker ranks driven by practical inference time (rather than FLOPs). Our evaluation on five modern CNNs with A100 demonstrates that our compressed models with our optimized code achieve up to 2.21X speedup over cuDNN, 1.12X speedup over TVM, and 3.27X over the original models using cuDNN with at most 0.05% accuracy loss.
Submitted 4 January, 2023; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Probabilistic Parking Functions
Authors:
Irfan Durmić,
Alex Han,
Pamela E. Harris,
Rodrigo Ribeiro,
Mei Yin
Abstract:
We consider the notion of classical parking functions by introducing randomness and a new parking protocol, as inspired by the work presented in the paper ``Parking Functions: Choose your own adventure,'' (arXiv:2001.04817) by Carlson, Christensen, Harris, Jones, and Rodríguez. Among our results, we prove that the probability of obtaining a parking function, from a length $n$ preference vector, is independent of the probabilistic parameter $p$. We also explore the properties of a preference vector given that it is a parking function and discuss the effect of the probabilistic parameter $p$. Of special interest is when $p=1/2$, where we demonstrate a sharp transition in some parking statistics. We also present several interesting combinatorial consequences of the parking protocol. In particular, we provide a combinatorial interpretation for the array described in OEIS A220884 as the expected number of preference sequences with a particular property related to occupied parking spots, which solves an open problem of Novelli and Thibon posed in 2020 (arXiv:1209.5959). Lastly, we connect our results to other weighted phenomena in combinatorics and provide further directions for research.
Submitted 1 November, 2022;
originally announced November 2022.
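For readers unfamiliar with the classical (deterministic) notion that the abstract above randomizes, a preference vector of length n is a parking function when every car, driving to its preferred spot and then forward to the first free spot, manages to park. The sketch below illustrates only this classical rule and its standard sorted-vector criterion; the helper names are hypothetical, and the paper's probabilistic protocol with parameter $p$ is not implemented here.

```python
from itertools import product

def is_parking_function(prefs):
    """Classical criterion: the i-th smallest preference is at most i (1-based)."""
    return all(b <= i for i, b in enumerate(sorted(prefs), start=1))

def park(prefs):
    """Simulate the classical rule on a one-way street with len(prefs) spots.

    Each car drives to its preferred spot and takes the first free spot at or
    after it. Returns a list giving the car index parked in each spot, or None
    if some car drives off the end of the street.
    """
    n = len(prefs)
    spots = [None] * n
    for car, pref in enumerate(prefs):
        s = pref - 1
        while s < n and spots[s] is not None:
            s += 1
        if s == n:
            return None
        spots[s] = car
    return spots

def count_parking_functions(n):
    """Brute-force count over all n**n preference vectors (small n only)."""
    return sum(
        is_parking_function(p) for p in product(range(1, n + 1), repeat=n)
    )
```

The brute-force count can be compared with the classical formula $(n+1)^{n-1}$ for small n, and the simulation agrees with the sorted criterion on every preference vector it is run on.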
-
Intra-session Context-aware Feed Recommendation in Live Systems
Authors:
Luo Ji,
Gao Liu,
Mingyang Yin,
Hongxia Yang
Abstract:
Feed recommendation allows users to browse items continuously until they lose interest and leave the session, which differs from traditional recommendation scenarios. Within a session, a user's decision to continue browsing or not substantially affects the occurrence of later clicks. However, this type of exposure bias is generally ignored or not explicitly modeled in most feed recommendation studies. In this paper, we model this effect as part of the intra-session context, and propose a novel intra-session Context-aware Feed Recommendation (INSCAFER) framework to maximize the total views and total clicks simultaneously. User click and browsing decisions are jointly learned in a multi-task setting, and the intra-session context is encoded by the session-wise exposed item sequence. We deploy our model online with all key business benchmarks improved. Our method sheds some light on feed recommendation studies that aim to optimize session-level click and view metrics.
Submitted 11 January, 2023; v1 submitted 30 September, 2022;
originally announced October 2022.
-
Finding and Exploring Promising Search Space for the 0-1 Multidimensional Knapsack Problem
Authors:
Jitao Xu,
Hongbo Li,
Minghao Yin
Abstract:
The 0-1 Multidimensional Knapsack Problem (MKP) is a classical NP-hard combinatorial optimization problem with many engineering applications. In this paper, we propose a novel algorithm combining evolutionary computation with the exact algorithm to solve the 0-1 MKP. It maintains a set of solutions and utilizes the information from the population to extract good partial assignments. To find high-quality solutions, an exact algorithm is applied to explore the promising search space specified by the good partial assignments. The new solutions are used to update the population. Thus, the good partial assignments evolve towards a better direction with the improvement of the population. Extensive experimentation with commonly used benchmark sets shows that our algorithm outperforms the state-of-the-art heuristic algorithms, TPTEA and DQPSO, as well as the commercial solver CPlex. It finds better solutions than the existing algorithms and provides new lower bounds for 10 large and hard instances.
Submitted 26 May, 2024; v1 submitted 8 October, 2022;
originally announced October 2022.
-
Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient
Authors:
Ming Yin,
Mengdi Wang,
Yu-Xiang Wang
Abstract:
Offline reinforcement learning, which aims at optimizing sequential decision-making strategies with historical data, has been extensively applied in real-life applications. State-of-the-art algorithms usually leverage powerful function approximators (e.g. neural networks) to alleviate the sample complexity hurdle for better empirical performance. Despite these successes, a more systematic understanding of the statistical complexity of function approximation remains lacking. Towards bridging the gap, we take a step by considering offline reinforcement learning with differentiable function class approximation (DFA). This function class naturally incorporates a wide range of models with nonlinear/nonconvex structures. Most importantly, we show offline RL with differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning (PFQL) algorithm, and our results provide the theoretical basis for understanding a variety of practical heuristics that rely on Fitted Q-Iteration style design. In addition, we further improve our guarantee with a tighter instance-dependent characterization. We hope our work could draw interest in studying reinforcement learning with differentiable function approximation beyond the scope of current research.
Submitted 23 November, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Robot Motion Planning as Video Prediction: A Spatio-Temporal Neural Network-based Motion Planner
Authors:
Xiao Zang,
Miao Yin,
Lingyi Huang,
Jingjin Yu,
Saman Zonouz,
Bo Yuan
Abstract:
Neural network (NN)-based methods have emerged as an attractive approach for robot motion planning due to the strong learning capabilities of NN models and their inherently high parallelism. Despite the current development in this direction, the efficient capture and processing of important sequential and spatial information, in a direct and simultaneous way, is still relatively under-explored. To overcome the challenge and unlock the potential of neural networks for motion planning tasks, in this paper, we propose STP-Net, an end-to-end learning framework that can fully extract and leverage important spatio-temporal information to form an efficient neural motion planner. By interpreting the movement of the robot as a video clip, robot motion planning is transformed to a video prediction task that can be performed by STP-Net in both spatially and temporally efficient ways. Empirical evaluations across different seen and unseen environments show that, with nearly 100% accuracy (i.e., success rate), STP-Net demonstrates very promising performance with respect to both planning speed and path cost. Compared with existing NN-based motion planners, STP-Net achieves at least 5x, 2.6x and 1.8x faster speed with lower path cost on 2D Random Forest, 2D Maze and 3D Random Forest environments, respectively. Furthermore, STP-Net can quickly and simultaneously compute multiple near-optimal paths in multi-robot motion planning tasks.
Submitted 23 August, 2022;
originally announced August 2022.
-
Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping
Authors:
Russell Z. Kunes,
Mingzhang Yin,
Max Land,
Doron Haviv,
Dana Pe'er,
Simon Tavaré
Abstract:
Gradient estimation is often necessary for fitting generative models with discrete latent variables, in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator \textit{bitflip}-1 that has lower variance at the boundaries of the parameter space. As bitflip-1 has complementary properties to existing estimators, we introduce an aggregated estimator, \textit{unbiased gradient variance clipping} (UGC) that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We theoretically prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC achieves the optimal value of the optimization objectives in toy experiments, discrete VAE training, and in a best subset selection problem.
Submitted 12 August, 2022;
originally announced August 2022.