Search | arXiv e-print repository

DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning

Authors: Utsav Singh, Souradip Chakraborty, Wesley A. Suttle, Brian M. Sadler, Vinay P Namboodiri, Amrit Singh Bedi

Abstract: Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, whil… ▽ More Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, while limited human preference data often is; making efficient use of such data to guide learning is therefore essential. Methods for learning to perform complex robotics tasks from human preference data must overcome both these challenges simultaneously. In this work, we introduce DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning, an efficient hierarchical approach that leverages direct preference optimization to learn a higher-level policy and reinforcement learning to learn a lower-level policy. DIPPER enjoys improved computational efficiency due to its use of direct preference optimization instead of standard preference-based approaches such as reinforcement learning from human feedback, while it also mitigates the well-known hierarchical reinforcement learning issues of non-stationarity and infeasible subgoal generation due to our use of primitive-informed regularization inspired by a novel bi-level optimization formulation of the hierarchical reinforcement learning problem. To validate our approach, we perform extensive experimental analysis on a variety of challenging robotics tasks, demonstrating that DIPPER outperforms hierarchical and non-hierarchical baselines, while ameliorating the non-stationarity and infeasible subgoal generation issues of hierarchical reinforcement learning. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2404.13423 [pdf, other]

PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

Authors: Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Singh Bedi

Abstract: In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mit… ▽ More In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mitigate non-stationarity, which is common in existing hierarchical approaches, and demonstrates impressive performance across a range of challenging sparse-reward tasks. Since obtaining human feedback is typically impractical, we propose to replace the human-in-the-loop approach with our primitive-in-the-loop approach, which generates feedback using sparse rewards provided by the environment. Moreover, in order to prevent infeasible subgoal prediction and avoid degenerate solutions, we propose primitive-informed regularization that conditions higher-level policies to generate feasible subgoals for lower-level policies. We perform extensive experiments to show that PIPER mitigates non-stationarity in hierarchical reinforcement learning and achieves greater than 50$\%$ success rates in challenging, sparse-reward robotic environments, where most other baselines fail to achieve any significant progress. △ Less

Submitted 16 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.00538 [pdf, ps, other]

Eclipse Attack Detection on a Blockchain Network as a Non-Parametric Change Detection Problem

Authors: Anurag Gupta, Vikram Krishnamurthy, Brian M. Sadler

Abstract: This paper introduces a novel non-parametric change detection algorithm to identify eclipse attacks on a blockchain network; the non-parametric algorithm relies only on the empirical mean and variance of the dataset, making it highly adaptable. An eclipse attack occurs when malicious actors isolate blockchain users, disrupting their ability to reach consensus with the broader network, thereby dist… ▽ More This paper introduces a novel non-parametric change detection algorithm to identify eclipse attacks on a blockchain network; the non-parametric algorithm relies only on the empirical mean and variance of the dataset, making it highly adaptable. An eclipse attack occurs when malicious actors isolate blockchain users, disrupting their ability to reach consensus with the broader network, thereby distorting their local copy of the ledger. To detect an eclipse attack, we monitor changes in the Fréchet mean and variance of the evolving blockchain communication network connecting blockchain users. First, we leverage the Johnson-Lindenstrauss lemma to project large-dimensional networks into a lower-dimensional space, preserving essential statistical properties. Subsequently, we employ a non-parametric change detection procedure, leading to a test statistic that converges weakly to a Brownian bridge process in the absence of an eclipse attack. This enables us to quantify the false alarm rate of the detector. Our detector can be implemented as a smart contract on the blockchain, offering a tamper-proof and reliable solution. Finally, we use numerical examples to compare the proposed eclipse attack detector with a detector based on the random forest model. △ Less

Submitted 30 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.11925 [pdf, other]

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

Authors: Bhrij Patel, Wesley A. Suttle, Alec Koppel, Vaneet Aggarwal, Brian M. Sadler, Amrit Singh Bedi, Dinesh Manocha

Abstract: In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating… ▽ More In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating mixing time in environments with large state spaces, leading to the necessity of impractically long trajectories for effective gradient estimation in practical applications. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte-Carlo (MLMC) gradient estimator. With our approach, we effectively alleviate the dependency on mixing time knowledge, a first for average-reward MDPs global convergence. Furthermore, our approach exhibits the tightest available dependence of $\mathcal{O}\left( \sqrt{τ_{mix}} \right)$known from prior work. With a 2D grid world goal-reaching navigation experiment, we demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings. △ Less

Submitted 20 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 26 Pages, 2 Figures

arXiv:2403.04007 [pdf, other]

Sampling-based Safe Reinforcement Learning for Nonlinear Dynamical Systems

Authors: Wesley A. Suttle, Vipul K. Sharma, Krishna C. Kosaraju, S. Sivaranjani, Ji Liu, Vijay Gupta, Brian M. Sadler

Abstract: We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints: model-free RL is used to le… ▽ More We develop provably safe and convergent reinforcement learning (RL) algorithms for control of nonlinear dynamical systems, bridging the gap between the hard safety guarantees of control theory and the convergence guarantees of RL theory. Recent advances at the intersection of control and RL follow a two-stage, safety filter approach to enforcing hard safety constraints: model-free RL is used to learn a potentially unsafe controller, whose actions are projected onto safe sets prescribed, for example, by a control barrier function. Though safe, such approaches lose any convergence guarantees enjoyed by the underlying RL methods. In this paper, we develop a single-stage, sampling-based approach to hard constraint satisfaction that learns RL controllers enjoying classical convergence guarantees while satisfying hard safety constraints throughout training and deployment. We validate the efficacy of our approach in simulation, including safe control of a quadcopter in a challenging obstacle avoidance problem, and demonstrate that it outperforms existing benchmarks. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 20 pages, 7 figures

arXiv:2402.10340 [pdf, other]

Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics

Authors: Xiyang Wu, Souradip Chakraborty, Ruiqi Xian, **g Liang, Tianrui Guan, Fuxiao Liu, Brian M. Sadler, Dinesh Manocha, Amrit Singh Bedi

Abstract: In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works focus on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation and navigation. Despite these improvements, analyzing the safety of such systems remains underexplo… ▽ More In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works focus on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation and navigation. Despite these improvements, analyzing the safety of such systems remains underexplored yet extremely critical. LLMs and VLMs are highly susceptible to adversarial inputs, prompting a significant inquiry into the safety of robotic systems. This concern is important because robotics operate in the physical world where erroneous actions can result in severe consequences. This paper explores this issue thoroughly, presenting a mathematical formulation of potential attacks on LLM/VLM-based robotic systems and offering experimental evidence of the safety challenges. Our empirical findings highlight a significant vulnerability: simple modifications to the input can drastically reduce system effectiveness. Specifically, our results demonstrate an average performance deterioration of 19.4% under minor input prompt modifications and a more alarming 29.1% under slight perceptual changes. These findings underscore the urgent need for robust countermeasures to ensure the safe and reliable deployment of advanced LLM/VLM-based robotic systems. △ Less

Submitted 16 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.06552 [pdf, other]

Deceptive Path Planning via Reinforcement Learning with Graph Neural Networks

Authors: Michael Y. Fatemi, Wesley A. Suttle, Brian M. Sadler

Abstract: Deceptive path planning (DPP) is the problem of designing a path that hides its true goal from an outside observer. Existing methods for DPP rely on unrealistic assumptions, such as global state observability and perfect model knowledge, and are typically problem-specific, meaning that even minor changes to a previously solved problem can force expensive computation of an entirely new solution. Gi… ▽ More Deceptive path planning (DPP) is the problem of designing a path that hides its true goal from an outside observer. Existing methods for DPP rely on unrealistic assumptions, such as global state observability and perfect model knowledge, and are typically problem-specific, meaning that even minor changes to a previously solved problem can force expensive computation of an entirely new solution. Given these drawbacks, such methods do not generalize to unseen problem instances, lack scalability to realistic problem sizes, and preclude both on-the-fly tunability of deception levels and real-time adaptivity to changing environments. In this paper, we propose a reinforcement learning (RL)-based scheme for training policies to perform DPP over arbitrary weighted graphs that overcomes these issues. The core of our approach is the introduction of a local perception model for the agent, a new state space representation distilling the key components of the DPP problem, the use of graph neural network-based policies to facilitate generalization and scaling, and the introduction of new deception bonuses that translate the deception objectives of classical methods to the RL setting. Through extensive experimentation we show that, without additional fine-tuning, at test time the resulting policies successfully generalize, scale, enjoy tunable levels of deception, and adapt in real-time to changes in the environment. △ Less

Submitted 9 February, 2024; originally announced February 2024.

Comments: 11 pages, 14 figures

MSC Class: 68T05

arXiv:2310.17660 [pdf, other]

An Invitation to Hypercomplex Phase Retrieval: Theory and Applications

Authors: Roman Jacome, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello

Abstract: Hypercomplex signal processing (HSP) provides state-of-the-art tools to handle multidimensional signals by harnessing intrinsic correlation of the signal dimensions through Clifford algebra. Recently, the hypercomplex representation of the phase retrieval (PR) problem, wherein a complex-valued signal is estimated through its intensity-only projections, has attracted significant interest. The hyper… ▽ More Hypercomplex signal processing (HSP) provides state-of-the-art tools to handle multidimensional signals by harnessing intrinsic correlation of the signal dimensions through Clifford algebra. Recently, the hypercomplex representation of the phase retrieval (PR) problem, wherein a complex-valued signal is estimated through its intensity-only projections, has attracted significant interest. The hypercomplex PR (HPR) arises in many optical imaging and computational sensing applications that usually comprise quaternion and octonion-valued signals. Analogous to the traditional PR, measurements in HPR may involve complex, hypercomplex, Fourier, and other sensing matrices. This set of problems opens opportunities for develo** novel HSP tools and algorithms. This article provides a synopsis of the emerging areas and applications of HPR with a focus on optical imaging. △ Less

Submitted 22 April, 2024; v1 submitted 20 October, 2023; originally announced October 2023.

Comments: 10 pages, 4 figures, 2 tables

arXiv:2310.14167 [pdf, other]

Factor Graph Processing for Dual-Blind Deconvolution at ISAC Receiver

Authors: Roman Jacome, Edwin Vargas, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello

Abstract: Integrated sensing and communications (ISAC) systems have gained significant interest because of their ability to jointly and efficiently access, utilize, and manage the scarce electromagnetic spectrum. The co-existence approach toward ISAC focuses on the receiver processing of overlaid radar and communications signals coming from independent transmitters. A specific ISAC coexistence problem is du… ▽ More Integrated sensing and communications (ISAC) systems have gained significant interest because of their ability to jointly and efficiently access, utilize, and manage the scarce electromagnetic spectrum. The co-existence approach toward ISAC focuses on the receiver processing of overlaid radar and communications signals coming from independent transmitters. A specific ISAC coexistence problem is dual-blind deconvolution (DBD), wherein the transmit signals and channels of both radar and communications are unknown to the receiver. Prior DBD works ignore the evolution of the signal model over time. In this work, we consider a dynamic DBD scenario using a linear state space model (LSSM) such that, apart from the transmit signals and channels of both systems, the LSSM parameters are also unknown. We employ a factor graph representation to model these unknown variables. We avoid the conventional matrix inversion approach to estimate the unknown variables by using an efficient expectation-maximization algorithm, where each iteration employs a Gaussian message passing over the factor graph structure. Numerical experiments demonstrate the accurate estimation of radar and communications channels, including in the presence of noise. △ Less

Submitted 22 October, 2023; originally announced October 2023.

Comments: 13 pages, 4 figures

arXiv:2308.15784 [pdf, other]

Octonion Phase Retrieval

Authors: Roman Jacome, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello

Abstract: Signal processing over hypercomplex numbers arises in many optical imaging applications. In particular, spectral image or color stereo data are often processed using octonion algebra. Recently, the eight-band multispectral image phase recovery has gained salience, wherein it is desired to recover the eight bands from the phaseless measurements. In this paper, we tackle this hitherto unaddressed hy… ▽ More Signal processing over hypercomplex numbers arises in many optical imaging applications. In particular, spectral image or color stereo data are often processed using octonion algebra. Recently, the eight-band multispectral image phase recovery has gained salience, wherein it is desired to recover the eight bands from the phaseless measurements. In this paper, we tackle this hitherto unaddressed hypercomplex variant of the popular phase retrieval (PR) problem. We propose octonion Wirtinger flow (OWF) to recover an octonion signal from its intensity-only observation. However, contrary to the complex-valued Wirtinger flow, the non-associative nature of octonion algebra and the consequent lack of octonion derivatives make the extension to OWF non-trivial. We resolve this using the pseudo-real-matrix representation of octonion to perform the derivatives in each OWF update. We demonstrate that our approach recovers the octonion signal up to a right-octonion phase factor. Numerical experiments validate OWF-based PR with high accuracy under both noiseless and noisy measurements. △ Less

Submitted 1 June, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: 5 pages, 3 figures

arXiv:2306.06192 [pdf, other]

Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation

Authors: Bhrij Patel, Kasun Weerakoon, Wesley A. Suttle, Alec Koppel, Brian M. Sadler, Tianyi Zhou, Amrit Singh Bedi, Dinesh Manocha

Abstract: Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in the training process, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to enhance the training sample efficiency of RL algorithms in roboti… ▽ More Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in the training process, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to enhance the training sample efficiency of RL algorithms in robotic navigation tasks. Unlike traditional approaches that treat trajectory length as a fixed hyperparameter, we propose to dynamically adjust it based on the entropy of the underlying navigation policy. Interestingly, Ada-NAV can be applied to both existing on-policy and off-policy RL methods, which we demonstrate by empirically validating its efficacy on three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). We demonstrate through simulated and real-world robotic experiments that Ada-NAV outperforms conventional methods that employ constant or randomly sampled trajectory lengths. Specifically, for a fixed sample budget, Ada-NAV achieves an 18\% increase in navigation success rate, a 20-38\% reduction in navigation path length, and a 9.32\% decrease in elevation costs. Furthermore, we showcase the versatility of Ada-NAV by integrating it with the Clearpath Husky robot, illustrating its applicability in complex outdoor environments. △ Less

Submitted 20 March, 2024; v1 submitted 9 June, 2023; originally announced June 2023.

Comments: 11 pages, 9 figures, 2 tables

arXiv:2303.13609 [pdf, other]

Multi-Antenna Dual-Blind Deconvolution for Joint Radar-Communications via SoMAN Minimization

Authors: Roman Jacome, Edwin Vargas, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello

Abstract: In joint radar-communications (JRC) applications such as secure military receivers, often the radar and communications signals are overlaid in the received signal. In these passive listening outposts, the signals and channels of both radar and communications are unknown to the receiver. The ill-posed problem of recovering all signal and channel parameters from the overlaid signal is termed as \tex… ▽ More In joint radar-communications (JRC) applications such as secure military receivers, often the radar and communications signals are overlaid in the received signal. In these passive listening outposts, the signals and channels of both radar and communications are unknown to the receiver. The ill-posed problem of recovering all signal and channel parameters from the overlaid signal is termed as \textit{dual-blind deconvolution} (DBD). In this work, we investigate DBD for a multi-antenna receiver. We model the radar and communications channels with a few (sparse) \textit{continuous-valued} parameters such as time delays, Doppler velocities, and directions-of-arrival (DoAs). To solve this highly ill-posed DBD, we propose to minimize the sum of multivariate atomic norms (SoMAN) that depend on unknown parameters. To this end, we devise an exact semidefinite program using theories of positive hyperoctant trigonometric polynomials (PhTP). Our theoretical analyses show that the minimum number of samples and antennas required for perfect recovery is logarithmically dependent on the maximum of the number of radar targets and communications paths rather than their sum. We show that our approach is easily generalized to include several practical issues such as gain/phase errors and additive noise. Numerical experiments show the exact parameter recovery for different JRC scenarios. △ Less

Submitted 28 March, 2024; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 30 pages, 7 figures

arXiv:2301.12083 [pdf, other]

Beyond Exponentially Fast Mixing in Average-Reward Reinforcement Learning via Multi-Level Monte Carlo Actor-Critic

Authors: Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, Brian M. Sadler, Alec Koppel, Dinesh Manocha

Abstract: Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, mak… ▽ More Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, making the step size inoperable. In this work, we propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm. This method, which we call \textbf{M}ulti-level \textbf{A}ctor-\textbf{C}ritic (MAC), is developed especially for infinite-horizon average-reward settings and neither relies on oracle knowledge of the mixing time in its parameter selection nor assumes its exponential decay; it, therefore, is readily applicable to applications with slower mixing times. Nonetheless, it achieves a convergence rate comparable to the state-of-the-art AC algorithms. We experimentally show that these alleviated restrictions on the technical conditions required for stability translate to superior performance in practice for RL problems with sparse rewards. △ Less

Submitted 1 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

arXiv:2212.04088 [pdf, other]

LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

Authors: Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su

Abstract: This study focuses on using large language models (LLMs) as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinders the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a… ▽ More This study focuses on using large language models (LLMs) as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinders the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a novel method, LLM-Planner, that harnesses the power of large language models to do few-shot planning for embodied agents. We further propose a simple but effective way to enhance LLMs with physical grounding to generate and update plans that are grounded in the current environment. Experiments on the ALFRED dataset show that our method can achieve very competitive few-shot performance: Despite using less than 0.5% of paired training data, LLM-Planner achieves competitive performance with recent baselines that are trained using the full training data. Existing methods can barely complete any task successfully under the same few-shot setting. Our work opens the door for develo** versatile and sample-efficient embodied agents that can quickly learn many tasks. Website: https://dki-lab.github.io/LLM-Planner △ Less

Submitted 30 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: 14 pages, 5 figures

Report number: ICCV 2023

arXiv:2211.09253 [pdf, other]

Beurling-Selberg Extremization for Dual-Blind Deconvolution Recovery in Joint Radar-Communications

Authors: Jonathan Monsalve, Edwin Vargas, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello

Abstract: Recent interest in integrated sensing and communications has led to the design of novel signal processing techniques to recover information from an overlaid radar-communications signal. Here, we focus on a spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown to the common receiver. In this dual-blind deconvolution (DBD) probl… ▽ More Recent interest in integrated sensing and communications has led to the design of novel signal processing techniques to recover information from an overlaid radar-communications signal. Here, we focus on a spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown to the common receiver. In this dual-blind deconvolution (DBD) problem, the receiver admits a multi-carrier wireless communications signal that is overlaid with the radar signal reflected off multiple targets. The communications and radar channels are represented by continuous-valued range-times or delays corresponding to multiple transmission paths and targets, respectively. Prior works addressed recovery of unknown channels and signals in this ill-posed DBD problem through atomic norm minimization but contingent on individual minimum separation conditions for radar and communications channels. In this paper, we provide an optimal joint separation condition using extremal functions from the Beurling-Selberg interpolation theory. Thereafter, we formulate DBD as a low-rank modified Hankel matrix retrieval and solve it via nuclear norm minimization. We estimate the unknown target and communications parameters from the recovered low-rank matrix using multiple signal classification (MUSIC) method. We show that the joint separation condition also guarantees that the underlying Vandermonde matrix for MUSIC is well-conditioned. Numerical experiments validate our theoretical findings. △ Less

Submitted 27 October, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

Comments: 5 pages, 3 figures

arXiv:2211.06967 [pdf, ps, other]

Identifying Coordination in a Cognitive Radar Network -- A Multi-Objective Inverse Reinforcement Learning Approach

Authors: Luke Snow, Vikram Krishnamurthy, Brian M. Sadler

Abstract: Consider a target being tracked by a cognitive radar network. If the target can intercept some radar network emissions, how can it detect coordination among the radars? By 'coordination' we mean that the radar emissions satisfy Pareto optimality with respect to multi-objective optimization over each radar's utility. This paper provides a novel multi-objective inverse reinforcement learning approac… ▽ More Consider a target being tracked by a cognitive radar network. If the target can intercept some radar network emissions, how can it detect coordination among the radars? By 'coordination' we mean that the radar emissions satisfy Pareto optimality with respect to multi-objective optimization over each radar's utility. This paper provides a novel multi-objective inverse reinforcement learning approach which allows for both detection of such Pareto optimal ('coordinating') behavior and subsequent reconstruction of each radar's utility function, given a finite dataset of radar network emissions. The method for accomplishing this is derived from the micro-economic setting of Revealed Preferences, and also applies to more general problems of inverse detection and learning of multi-objective optimizing systems. △ Less

Submitted 13 November, 2022; originally announced November 2022.

arXiv:2209.11944 [pdf, other]

Communication-Efficient {Federated} Learning Using Censored Heavy Ball Descent

Authors: Yicheng Chen, Rick S. Blum, Brian M. Sadler

Abstract: Distributed machine learning enables scalability and computational offloading, but requires significant levels of communication. Consequently, communication efficiency in distributed learning settings is an important consideration, especially when the communications are wireless and battery-driven devices are employed. In this paper we develop a censoring-based heavy ball (CHB) method for distribu… ▽ More Distributed machine learning enables scalability and computational offloading, but requires significant levels of communication. Consequently, communication efficiency in distributed learning settings is an important consideration, especially when the communications are wireless and battery-driven devices are employed. In this paper we develop a censoring-based heavy ball (CHB) method for distributed learning in a server-worker architecture. Each worker self-censors unless its local gradient is sufficiently different from the previously transmitted one. The significant practical advantages of the HB method for learning problems are well known, but the question of reducing communications has not been addressed. CHB takes advantage of the HB smoothing to eliminate reporting small changes, and provably achieves a linear convergence rate equivalent to that of the classical HB method for smooth and strongly convex objective functions. The convergence guarantee of CHB is theoretically justified for both convex and nonconvex cases. In addition we prove that, under some conditions, at least half of all communications can be eliminated without any impact on convergence rate. Extensive numerical results validate the communication efficiency of CHB on both synthetic and real datasets, for convex, nonconvex, and nondifferentiable cases. Given a target accuracy, CHB can significantly reduce the number of communications compared to existing algorithms, achieving the same accuracy without slowing down the optimization process. △ Less

Submitted 24 September, 2022; originally announced September 2022.

arXiv:2208.04381 [pdf, other]

Dual-Blind Deconvolution for Overlaid Radar-Communications Systems

Authors: Edwin Vargas, Kumar Vijay Mishra, Roman Jacome, Brian M. Sadler, Henry Arguello

Abstract: The increasingly crowded spectrum has spurred the design of joint radar-communications systems that share hardware resources and efficiently use the radio frequency spectrum. We study a general spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown at the receiver. In this dual-blind deconvolution (DBD) problem, a common receiv… ▽ More The increasingly crowded spectrum has spurred the design of joint radar-communications systems that share hardware resources and efficiently use the radio frequency spectrum. We study a general spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown at the receiver. In this dual-blind deconvolution (DBD) problem, a common receiver admits a multi-carrier wireless communications signal that is overlaid with the radar signal reflected off multiple targets. The communications and radar channels are represented by continuous-valued range-time and Doppler velocities of multiple transmission paths and multiple targets. We exploit the sparsity of both channels to solve the highly ill-posed DBD problem by casting it into a sum of multivariate atomic norms (SoMAN) minimization. We devise a semidefinite program to estimate the unknown target and communications parameters using the theories of positive-hyperoctant trigonometric polynomials (PhTP). Our theoretical analyses show that the minimum number of samples required for near-perfect recovery is dependent on the logarithm of the maximum of number of radar targets and communications paths rather than their sum. We show that our SoMAN method and PhTP formulations are also applicable to more general scenarios such as unsynchronized transmission, the presence of noise, and multiple emitters. Numerical experiments demonstrate great performance enhancements during parameter recovery under different scenarios. △ Less

Submitted 19 June, 2023; v1 submitted 8 August, 2022; originally announced August 2022.

Comments: 26 pages, 13 figures, 1 table

arXiv:2206.10815 [pdf, other]

FedBC: Calibrating Global and Local Models via Federated Learning Beyond Consensus

Authors: Amrit Singh Bedi, Chen Fan, Alec Koppel, Anit Kumar Sahu, Brian M. Sadler, Furong Huang, Dinesh Manocha

Abstract: In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program. The objective of a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considerin… ▽ More In this work, we quantitatively calibrate the performance of global and local models in federated learning through a multi-criterion optimization-based framework, which we cast as a constrained program. The objective of a device is its local objective, which it seeks to minimize while satisfying nonlinear constraints that quantify the proximity between the local and the global model. By considering the Lagrangian relaxation of this problem, we develop a novel primal-dual method called Federated Learning Beyond Consensus (\texttt{FedBC}). Theoretically, we establish that \texttt{FedBC} converges to a first-order stationary point at rates that matches the state of the art, up to an additional error term that depends on a tolerance parameter introduced to scalarize the multi-criterion formulation. Finally, we demonstrate that \texttt{FedBC} balances the global and local model test accuracy metrics across a suite of datasets (Synthetic, MNIST, CIFAR-10, Shakespeare), achieving competitive performance with state-of-the-art. △ Less

Submitted 1 February, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.05166 [pdf, other]

Multi-dimensional dual-blind deconvolution approach toward joint radar-communications

Authors: Roman Jacome, Kumar Vijay Mishra, Edwin Vargas, Brian M. Sadler, Henry Arguello

Abstract: We consider a joint multiple-antenna radar-communications system in a co-existence scenario. Contrary to conventional applications, wherein at least the radar waveform and communications channel are known or estimated \textit{a priori}, we investigate the case when the channels and transmit signals of both systems are unknown. In radar applications, this problem arises in multistatic or passive sy… ▽ More We consider a joint multiple-antenna radar-communications system in a co-existence scenario. Contrary to conventional applications, wherein at least the radar waveform and communications channel are known or estimated \textit{a priori}, we investigate the case when the channels and transmit signals of both systems are unknown. In radar applications, this problem arises in multistatic or passive systems, where transmit signal is not known. Similarly, highly dynamic vehicular or mobile communications may render prior estimates of wireless channel unhelpful. In particular, the radar signal reflected-off multiple targets is overlaid with the multi-carrier communications signal. In order to extract the unknown continuous-valued target parameters (range, Doppler velocity, and direction-of-arrival) and communications messages, we formulate the problem as a sparse dual-blind deconvolution and solve it using atomic norm minimization. Numerical experiments validate our proposed approach and show that precise estimation of continuous-valued channel parameters, radar waveform, and communications messages is possible up to scaling ambiguities. △ Less

Submitted 10 June, 2022; originally announced June 2022.

Comments: 5 pages, 3 figures

arXiv:2206.01306 [pdf, ps, other]

Deceptive Planning for Resource Allocation

Authors: Shenghui Chen, Yagiz Savas, Mustafa O. Karabag, Brian M. Sadler, Ufuk Topcu

Abstract: We consider a team of autonomous agents that navigate in an adversarial environment and aim to achieve a task by allocating their resources over a set of target locations. An adversary in the environment observes the autonomous team's behavior to infer their objective and responds against the team. In this setting, we propose strategies for controlling the density of the autonomous team so that th… ▽ More We consider a team of autonomous agents that navigate in an adversarial environment and aim to achieve a task by allocating their resources over a set of target locations. An adversary in the environment observes the autonomous team's behavior to infer their objective and responds against the team. In this setting, we propose strategies for controlling the density of the autonomous team so that they can deceive the adversary regarding their objective while achieving the desired final resource allocation. We first develop a prediction algorithm based on the principle of maximum entropy to express the team's behavior expected by the adversary. Then, by measuring the deceptiveness via Kullback-Leibler divergence, we devise convex optimization-based planning algorithms that deceive the adversary by either exaggerating the behavior towards a decoy allocation strategy or creating ambiguity regarding the final allocation strategy. A user study with $320$ participants demonstrates that the proposed algorithms are effective for deception and reveal the inherent biases of participants towards proximate goals. △ Less

Submitted 5 October, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

arXiv:2206.01162 [pdf, other]

Posterior Coreset Construction with Kernelized Stein Discrepancy for Model-Based Reinforcement Learning

Authors: Souradip Chakraborty, Amrit Singh Bedi, Alec Koppel, Brian M. Sadler, Furong Huang, Pratap Tokekar, Dinesh Manocha

Abstract: Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting when transition model is Gaussian or Lipschitz, and demands a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assump… ▽ More Model-based approaches to reinforcement learning (MBRL) exhibit favorable performance in practice, but their theoretical guarantees in large spaces are mostly restricted to the setting when transition model is Gaussian or Lipschitz, and demands a posterior estimate whose representational complexity grows unbounded with time. In this work, we develop a novel MBRL method (i) which relaxes the assumptions on the target transition model to belong to a generic family of mixture models; (ii) is applicable to large-scale training by incorporating a compression step such that the posterior estimate consists of a Bayesian coreset of only statistically significant past state-action pairs; and (iii) exhibits a sublinear Bayesian regret. To achieve these results, we adopt an approach based upon Stein's method, which, under a smoothness condition on the constructed posterior and target, allows distributional distance to be evaluated in closed form as the kernelized Stein discrepancy (KSD). The aforementioned compression step is then computed in terms of greedily retaining only those samples which are more than a certain KSD away from the previous model estimate. Experimentally, we observe that this approach is competitive with several state-of-the-art RL methodologies, and can achieve up-to 50 percent reduction in wall clock time in some continuous control environments. △ Less

Submitted 4 May, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

arXiv:2202.07028 [pdf, other]

One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

Authors: Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su

Abstract: We study the problem of develo** autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in… ▽ More We study the problem of develo** autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-TRACK) to guide the agent and monitor its progress. Specifically, we propose a milestone builder that tags the instructions with navigation and interaction milestones which the agent needs to complete step by step, and a milestone checker that systemically checks the agent's progress in its current milestone and determines when to proceed to the next. On the challenging ALFRED dataset, our M-TRACK leads to a notable 33% and 52% relative improvement in unseen success rate over two competitive base models. △ Less

Submitted 10 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: 10 pages, 5 figures. Accepted to CVPR 2022

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15482-15491

arXiv:2202.02580 [pdf, ps, other]

Communication Efficient Federated Learning via Ordered ADMM in a Fully Decentralized Setting

Authors: Yicheng Chen, Rick S. Blum, Brian M. Sadler

Abstract: The challenge of communication-efficient distributed optimization has attracted attention in recent years. In this paper, a communication efficient algorithm, called ordering-based alternating direction method of multipliers (OADMM) is devised in a general fully decentralized network setting where a worker can only exchange messages with neighbors. Compared to the classical ADMM, a key feature of… ▽ More The challenge of communication-efficient distributed optimization has attracted attention in recent years. In this paper, a communication efficient algorithm, called ordering-based alternating direction method of multipliers (OADMM) is devised in a general fully decentralized network setting where a worker can only exchange messages with neighbors. Compared to the classical ADMM, a key feature of OADMM is that transmissions are ordered among workers at each iteration such that a worker with the most informative data broadcasts its local variable to neighbors first, and neighbors who have not transmitted yet can update their local variables based on that received transmission. In OADMM, we prohibit workers from transmitting if their current local variables are not sufficiently different from their previously transmitted value. A variant of OADMM, called SOADMM, is proposed where transmissions are ordered but transmissions are never stopped for each node at each iteration. Numerical results demonstrate that given a targeted accuracy, OADMM can significantly reduce the number of communications compared to existing algorithms including ADMM. We also show numerically that SOADMM can accelerate convergence, resulting in communication savings compared to the classical ADMM. △ Less

Submitted 5 February, 2022; originally announced February 2022.

arXiv:2202.02491 [pdf, ps, other]

Distributed Learning With Sparsified Gradient Differences

Authors: Yicheng Chen, Rick S. Blum, Martin Takac, Brian M. Sadler

Abstract: A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve the communications efficiency in a general worker-server architecture. Motivated by a va… ▽ More A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve the communications efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger-scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector will not be transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communications load compared to the best existing algorithms without slowing down the optimization process. △ Less

Submitted 4 February, 2022; originally announced February 2022.

arXiv:2201.12332 [pdf, other]

On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces

Authors: Amrit Singh Bedi, Souradip Chakraborty, Anjaly Parayil, Brian Sadler, Pratap Tokekar, Alec Koppel

Abstract: We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that app… ▽ More We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function associated with a policy is bounded, which fails to hold even for Gaussian policies. To properly address this issue, one must introduce an exploration tolerance parameter to quantify the region in which it is bounded. Doing so incurs a persistent bias that appears in the attenuation rate of the expected policy gradient norm, which is inversely proportional to the radius of the action space. To mitigate this hidden bias, heavy-tailed policy parameterizations may be used, which exhibit a bounded score function, but doing so can cause instability in algorithmic updates. To address these issues, in this work, we study the convergence of policy gradient algorithms under heavy-tailed parameterizations, which we propose to stabilize with a combination of mirror ascent-type updates and gradient tracking. Our main theoretical contribution is the establishment that this scheme converges with constant step and batch sizes, whereas prior works require these parameters to respectively shrink to null or grow to infinity. Experimentally, this scheme under a heavy-tailed policy parameterization yields improved reward accumulation across a variety of settings as compared with standard benchmarks. △ Less

Submitted 30 January, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.11384 [pdf, other]

Phase Retrieval for Radar Waveform Design

Authors: Samuel Pinilla, Kumar Vijay Mishra, Brian M. Sadler, Henry Arguello

Abstract: The ability of a radar to discriminate in both range and Doppler velocity is completely characterized by the ambiguity function (AF) of its transmit waveform. Mathematically, it is obtained by correlating the waveform with its Doppler-shifted and delayed replicas. We consider the inverse problem of designing a radar transmit waveform that satisfies the specified AF magnitude. This process may be v… ▽ More The ability of a radar to discriminate in both range and Doppler velocity is completely characterized by the ambiguity function (AF) of its transmit waveform. Mathematically, it is obtained by correlating the waveform with its Doppler-shifted and delayed replicas. We consider the inverse problem of designing a radar transmit waveform that satisfies the specified AF magnitude. This process may be viewed as a signal reconstruction with some variation of phase retrieval methods. We provide a trust-region algorithm that minimizes a smoothed non-convex least-squares objective function to iteratively recover the underlying signal-of-interest for either time- or band-limited support. The method first approximates the signal using an iterative spectral algorithm and then refines the attained initialization based on a sequence of gradient iterations. Our theoretical analysis shows that unique signal reconstruction is possible using signal samples no more than thrice the number of signal frequencies or time samples. Numerical experiments demonstrate that our method recovers both time- and band-limited signals from sparsely and randomly sampled, noisy, and noiseless AFs. △ Less

Submitted 9 June, 2024; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: 40 pages, 13 figures, 1 table

arXiv:2111.13670 [pdf, other]

Non-Convex Recovery from Phaseless Low-Resolution Blind Deconvolution Measurements using Noisy Masked Patterns

Authors: Samuel Pinilla, Kumar Vijay Mishra, Brian M. Sadler

Abstract: This paper addresses recovery of a kernel $\boldsymbol{h}\in \mathbb{C}^{n}$ and a signal $\boldsymbol{x}\in \mathbb{C}^{n}$ from the low-resolution phaseless measurements of their noisy circular convolution $\boldsymbol{y} = \left \rvert \boldsymbol{F}_{lo}( \boldsymbol{x}\circledast \boldsymbol{h}) \right \rvert^{2} + \boldsymbolη$, where $\boldsymbol{F}_{lo}\in \mathbb{C}^{m\times n}$ stands fo… ▽ More This paper addresses recovery of a kernel $\boldsymbol{h}\in \mathbb{C}^{n}$ and a signal $\boldsymbol{x}\in \mathbb{C}^{n}$ from the low-resolution phaseless measurements of their noisy circular convolution $\boldsymbol{y} = \left \rvert \boldsymbol{F}_{lo}( \boldsymbol{x}\circledast \boldsymbol{h}) \right \rvert^{2} + \boldsymbolη$, where $\boldsymbol{F}_{lo}\in \mathbb{C}^{m\times n}$ stands for a partial discrete Fourier transform ($m<n$), $\boldsymbolη$ models the noise, and $\lvert \cdot \rvert$ is the element-wise absolute value function. This problem is severely ill-posed because both the kernel and signal are unknown and, in addition, the measurements are phaseless, leading to many $\boldsymbol{x}$-$\boldsymbol{h}$ pairs that correspond to the measurements. Therefore, to guarantee a stable recovery of $\boldsymbol{x}$ and $\boldsymbol{h}$ from $\boldsymbol{y}$, we assume that the kernel $\boldsymbol{h}$ and the signal $\boldsymbol{x}$ lie in known subspaces of dimensions $k$ and $s$, respectively, such that $m\gg k+s$. We solve this problem by proposing a blind deconvolution algorithm for phaseless super-resolution (BliPhaSu) to minimize a non-convex least-squares objective function. The method first estimates a low-resolution version of both signals through a spectral algorithm, which are then refined based upon a sequence of stochastic gradient iterations. We show that our BliPhaSu algorithm converges linearly to a pair of true signals on expectation under a proper initialization that is based on spectral method. Numerical results from experimental data demonstrate perfect recovery of both $\boldsymbol{h}$ and $\boldsymbol{x}$ using our method. △ Less

Submitted 4 December, 2021; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: 5 pages, 4 figures

arXiv:2111.06304 [pdf, other]

Joint Radar-Communications Processing from a Dual-Blind Deconvolution Perspective

Authors: Edwin Vargas, Kumar Vijay Mishra, Roman Jacome, Brian M. Sadler, Henry Arguello

Abstract: We consider a general spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown at the receiver. In this \textit{dual-blind deconvolution} (DBD) problem, a common receiver admits the multi-carrier wireless communications signal that is overlaid with the radar signal reflected-off multiple targets. When the radar receiver is not co… ▽ More We consider a general spectral coexistence scenario, wherein the channels and transmit signals of both radar and communications systems are unknown at the receiver. In this \textit{dual-blind deconvolution} (DBD) problem, a common receiver admits the multi-carrier wireless communications signal that is overlaid with the radar signal reflected-off multiple targets. When the radar receiver is not collocated with the transmitter, such as in passive or multistatic radars, the transmitted signal is also unknown apart from the target parameters. Similarly, apart from the transmitted messages, the communications channel may also be unknown in dynamic environments such as vehicular networks. As a result, the estimation of unknown target and communications parameters in a DBD scenario is highly challenging. In this work, we exploit the sparsity of the channel to solve DBD by casting it as an atomic norm minimization problem. Our theoretical analyses and numerical experiments demonstrate perfect recovery of continuous-valued range-time and Doppler velocities of multiple targets as well as delay-Doppler communications channel parameters using uniformly-spaced time samples in the dual-blind receiver. △ Less

Submitted 11 November, 2021; originally announced November 2021.

Comments: 5 pages, 2 figures, submitted to ICASSP 2022

arXiv:2109.12343 [pdf, other]

Beyond Robustness: A Taxonomy of Approaches towards Resilient Multi-Robot Systems

Authors: Amanda Prorok, Matthew Malencia, Luca Carlone, Gaurav S. Sukhatme, Brian M. Sadler, Vijay Kumar

Abstract: Robustness is key to engineering, automation, and science as a whole. However, the property of robustness is often underpinned by costly requirements such as over-provisioning, known uncertainty and predictive models, and known adversaries. These conditions are idealistic, and often not satisfiable. Resilience on the other hand is the capability to endure unexpected disruptions, to recover swiftly… ▽ More Robustness is key to engineering, automation, and science as a whole. However, the property of robustness is often underpinned by costly requirements such as over-provisioning, known uncertainty and predictive models, and known adversaries. These conditions are idealistic, and often not satisfiable. Resilience on the other hand is the capability to endure unexpected disruptions, to recover swiftly from negative events, and bounce back to normality. In this survey article, we analyze how resilience is achieved in networks of agents and multi-robot systems that are able to overcome adversity by leveraging system-wide complementarity, diversity, and redundancy - often involving a reconfiguration of robotic capabilities to provide some key ability that was not present in the system a priori. As society increasingly depends on connected automated systems to provide key infrastructure services (e.g., logistics, transport, and precision agriculture), providing the means to achieving resilient multi-robot systems is paramount. By enumerating the consequences of a system that is not resilient (fragile), we argue that resilience must become a central engineering design consideration. Towards this goal, the community needs to gain clarity on how it is defined, measured, and maintained. We address these questions across foundational robotics domains, spanning perception, control, planning, and learning. One of our key contributions is a formal taxonomy of approaches, which also helps us discuss the defining factors and stressors for a resilient system. Finally, this survey article gives insight as to how resilience may be achieved. Importantly, we highlight open problems that remain to be tackled in order to reap the benefits of resilient robotic systems. △ Less

Submitted 25 September, 2021; originally announced September 2021.

arXiv:2106.13358 [pdf, other]

Scalable Perception-Action-Communication Loops with Convolutional and Graph Neural Networks

Authors: Ting-Kuei Hu, Fernando Gama, Tianlong Chen, Wenqing Zheng, Zhangyang Wang, Alejandro Ribeiro, Brian M. Sadler

Abstract: In this paper, we present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional and a graph neural network (CNN / GNN), addre… ▽ More In this paper, we present a perception-action-communication loop design using Vision-based Graph Aggregation and Inference (VGAI). This multi-agent decentralized learning-to-control framework maps raw visual observations to agent actions, aided by local communication among neighboring agents. Our framework is implemented by a cascade of a convolutional and a graph neural network (CNN / GNN), addressing agent-level visual perception and feature learning, as well as swarm-level communication, local information aggregation and agent action inference, respectively. By jointly training the CNN and GNN, image features and communication messages are learned in conjunction to better address the specific task. We use imitation learning to train the VGAI controller in an offline phase, relying on a centralized expert controller. This results in a learned VGAI controller that can be deployed in a distributed manner for online execution. Additionally, the controller exhibits good scaling properties, with training in smaller teams and application in larger teams. Through a multi-agent flocking application, we demonstrate that VGAI yields performance comparable to or better than other decentralized controllers, using only the visual input modality and without accessing precise location or motion state information. △ Less

Submitted 5 November, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

arXiv:2106.02892 [pdf, other]

Training Robust Graph Neural Networks with Topology Adaptive Edge Drop**

Authors: Zhan Gao, Subhrajit Bhattacharya, Leiming Zhang, Rick S. Blum, Alejandro Ribeiro, Brian M. Sadler

Abstract: Graph neural networks (GNNs) are processing architectures that exploit graph structural information to model representations from network data. Despite their success, GNNs suffer from sub-optimal generalization performance given limited training data, referred to as over-fitting. This paper proposes Topology Adaptive Edge Drop** (TADropEdge) method as an adaptive data augmentation technique to i… ▽ More Graph neural networks (GNNs) are processing architectures that exploit graph structural information to model representations from network data. Despite their success, GNNs suffer from sub-optimal generalization performance given limited training data, referred to as over-fitting. This paper proposes Topology Adaptive Edge Drop** (TADropEdge) method as an adaptive data augmentation technique to improve generalization performance and learn robust GNN models. We start by explicitly analyzing how random edge drop** increases the data diversity during training, while indicating i.i.d. edge drop** does not account for graph structural information and could result in noisy augmented data degrading performance. To overcome this issue, we consider graph connectivity as the key property that captures graph topology. TADropEdge incorporates this factor into random edge drop** such that the edge-dropped subgraphs maintain similar topology as the underlying graph, yielding more satisfactory data augmentation. In particular, TADropEdge first leverages the graph spectrum to assign proper weights to graph edges, which represent their criticality for establishing the graph connectivity. It then normalizes the edge weights and drops graph edges adaptively based on their normalized weights. Besides improving generalization performance, TADropEdge reduces variance for efficient training and can be applied as a generic method modular to different GNN models. Intensive experiments on real-life and synthetic datasets corroborate theory and verify the effectiveness of the proposed method. △ Less

Submitted 5 June, 2021; originally announced June 2021.

arXiv:2012.09952 [pdf, other]

doi 10.1109/TIFS.2020.3047763

Secrecy of Multi-Antenna Transmission with Full-Duplex User in the Presence of Randomly Located Eavesdroppers

Authors: Ishmam Zabir, Ahmed Maksud, Gaojie Chen, Brian M. Sadler, Yingbo Hua

Abstract: This paper considers the secrecy performance of several schemes for multi-antenna transmission to single-antenna users with full-duplex (FD) capability against randomly distributed single-antenna eavesdroppers (EDs). These schemes and related scenarios include transmit antenna selection (TAS), transmit antenna beamforming (TAB), artificial noise (AN) from the transmitter, user selection based thei… ▽ More This paper considers the secrecy performance of several schemes for multi-antenna transmission to single-antenna users with full-duplex (FD) capability against randomly distributed single-antenna eavesdroppers (EDs). These schemes and related scenarios include transmit antenna selection (TAS), transmit antenna beamforming (TAB), artificial noise (AN) from the transmitter, user selection based their distances to the transmitter, and colluding and non-colluding EDs. The locations of randomly distributed EDs and users are assumed to be distributed as Poisson Point Process (PPP). We derive closed form expressions for the secrecy outage probabilities (SOP) of all these schemes and scenarios. The derived expressions are useful to reveal the impacts of various environmental parameters and user's choices on the SOP, and hence useful for network design purposes. Examples of such numerical results are discussed. △ Less

Submitted 17 December, 2020; originally announced December 2020.

Comments: Paper accepted for publication in IEEE Transactions on Information Forensics and Security

arXiv:2011.07743 [pdf, other]

doi 10.1145/3442381.3449992

Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases

Authors: Yu Gu, Sue Kase, Michelle Vanni, Brian Sadler, Percy Liang, Xifeng Yan, Yu Su

Abstract: Existing studies on question answering on knowledge bases (KBQA) mainly operate with the standard i.i.d assumption, i.e., training distribution over questions is the same as the test distribution. However, i.i.d may be neither reasonably achievable nor desirable on large-scale KBs because 1) true user distribution is hard to capture and 2) randomly sample training examples from the enormous space… ▽ More Existing studies on question answering on knowledge bases (KBQA) mainly operate with the standard i.i.d assumption, i.e., training distribution over questions is the same as the test distribution. However, i.i.d may be neither reasonably achievable nor desirable on large-scale KBs because 1) true user distribution is hard to capture and 2) randomly sample training examples from the enormous space would be highly data-inefficient. Instead, we suggest that KBQA models should have three levels of built-in generalization: i.i.d, compositional, and zero-shot. To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings for all three levels of generalization. In addition, we propose a novel BERT-based KBQA model. The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA. △ Less

Submitted 22 February, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

Comments: Accepted to TheWebConf 2021 (previously WWW)

ACM Class: I.2.7

arXiv:2010.04790 [pdf, other]

Inter-cluster Transmission Control Using Graph Modal Barriers

Authors: Leiming Zhang, Brian M. Sadler, Rick S. Blum, Subhrajit Bhattacharya

Abstract: In this paper we consider the problem of transmission across a graph and how to effectively control/restrict it with limited resources. Transmission can represent information transfer across a social network, spread of a malicious virus across a computer network, or spread of an infectious disease across communities. The key insight is to assign proper weights to bottleneck edges of the graph base… ▽ More In this paper we consider the problem of transmission across a graph and how to effectively control/restrict it with limited resources. Transmission can represent information transfer across a social network, spread of a malicious virus across a computer network, or spread of an infectious disease across communities. The key insight is to assign proper weights to bottleneck edges of the graph based on their role in reducing the connection between two or more strongly-connected clusters within the graph. Selectively reducing the weights (implying reduced transmission rate) on the critical edges helps limit the transmission from one cluster to another. We refer to these as barrier weights and their computation is based on the eigenvectors of the graph Laplacian. Unlike other work on graph partitioning and clustering, we completely circumvent the associated computational complexities by assigning weights to edges instead of performing discrete graph cuts. This allows us to provide strong theoretical results on our proposed methods. We also develop approximations that allow low complexity distributed computation of the barrier weights using only neighborhood communication on the graph. △ Less

Submitted 9 October, 2020; originally announced October 2020.

Comments: 16 pages

arXiv:2006.09310 [pdf, other]

Deep Multimodal Transfer-Learned Regression in Data-Poor Domains

Authors: Levi McClenny, Mulugeta Haile, Vahid Attari, Brian Sadler, Ulisses Braga-Neto, Raymundo Arroyave

Abstract: In many real-world applications of deep learning, estimation of a target may rely on various types of input data modes, such as audio-video, image-text, etc. This task can be further complicated by a lack of sufficient data. Here we propose a Deep Multimodal Transfer-Learned Regressor (DMTL-R) for multimodal learning of image and feature data in a deep regression architecture effective at predicti… ▽ More In many real-world applications of deep learning, estimation of a target may rely on various types of input data modes, such as audio-video, image-text, etc. This task can be further complicated by a lack of sufficient data. Here we propose a Deep Multimodal Transfer-Learned Regressor (DMTL-R) for multimodal learning of image and feature data in a deep regression architecture effective at predicting target parameters in data-poor domains. Our model is capable of fine-tuning a given set of pre-trained CNN weights on a small amount of training image data, while simultaneously conditioning on feature information from a complimentary data mode during network training, yielding more accurate single-target or multi-target regression than can be achieved using the images or the features alone. We present results using phase-field simulation microstructure images with an accompanying set of physical features, using pre-trained weights from various well-known CNN architectures, which demonstrate the efficacy of the proposed multimodal approach. △ Less

Submitted 16 June, 2020; originally announced June 2020.

arXiv:2003.12637 [pdf, other]

Collaborative Beamforming Under Localization Errors: A Discrete Optimization Approach

Authors: Erfaun Noorani, Yagiz Savas, Alec Koppel, John Baras, Ufuk Topcu, Brian M. Sadler

Abstract: We consider a network of agents that locate themselves in an environment through sensor measurements and aim to transmit a message signal to a base station via collaborative beamforming. The agents' sensor measurements result in localization errors, which degrade the quality of service at the base station due to unknown phase offsets that arise in the agents' communication channels. Assuming that… ▽ More We consider a network of agents that locate themselves in an environment through sensor measurements and aim to transmit a message signal to a base station via collaborative beamforming. The agents' sensor measurements result in localization errors, which degrade the quality of service at the base station due to unknown phase offsets that arise in the agents' communication channels. Assuming that each agent's localization error follows a Gaussian distribution, we study the problem of forming a reliable communication link between the agents and the base station despite the localization errors. In particular, we formulate a discrete optimization problem to choose only a subset of agents to transmit the message signal so that the variance of the signal-to-noise ratio (SNR) received by the base station is minimized while the expected SNR exceeds a desired threshold. When the variances of the localization errors are below a certain threshold characterized in terms of the carrier frequency, we show that greedy algorithms can be used to globally minimize the variance of the received SNR. On the other hand, when some agents have localization errors with large variances, we show that the variance of the received SNR can be locally minimized by exploiting the supermodularity of the mean and variance of the received SNR. In numerical simulations, we demonstrate that the proposed algorithms have the potential to synthesize beamformers orders of magnitude faster than convex optimization-based approaches while achieving comparable performances using less number of agents. △ Less

Submitted 17 March, 2021; v1 submitted 27 March, 2020; originally announced March 2020.

arXiv:2003.10550 [pdf, other]

Regret and Belief Complexity Trade-off in Gaussian Process Bandits via Information Thresholding

Authors: Amrit Singh Bedi, Dheeraj Peddireddy, Vaneet Aggarwal, Brian M. Sadler, Alec Koppel

Abstract: Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and has gained prominence for decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem, where the payoff function is sampled from a Gaussian process (GP). Further, we focus on action selections via upper confidence bound (UCB… ▽ More Bayesian optimization is a framework for global search via maximum a posteriori updates rather than simulated annealing, and has gained prominence for decision-making under uncertainty. In this work, we cast Bayesian optimization as a multi-armed bandit problem, where the payoff function is sampled from a Gaussian process (GP). Further, we focus on action selections via upper confidence bound (UCB) or expected improvement (EI) due to their prevalent use in practice. Prior works using GPs for bandits cannot allow the iteration horizon $T$ to be large, as the complexity of computing the posterior parameters scales cubically with the number of past observations. To circumvent this computational burden, we propose a simple statistical test: only incorporate an action into the GP posterior when its conditional entropy exceeds an $ε$ threshold. Doing so permits us to precisely characterize the trade-off between regret bounds of GP bandit algorithms and complexity of the posterior distributions depending on the compression parameter $ε$ for both discrete and continuous action sets. To best of our knowledge, this is the first result which allows us to obtain sublinear regret bounds while still maintaining sublinear growth rate of the complexity of the posterior which is linear in the existing literature. Moreover, a provably finite bound on the complexity could be achieved but the algorithm would result in $ε$-regret which means $\textbf{Reg}_T/T \rightarrow \mathcal{O}(ε)$ as $T\rightarrow \infty$. Experimentally, we observe state of the art accuracy and complexity trade-offs for GP bandit algorithms applied to global optimization, suggesting the merits of compressed GPs in bandit settings. △ Less

Submitted 21 March, 2022; v1 submitted 23 March, 2020; originally announced March 2020.

arXiv:2002.02308 [pdf, other]

VGAI: End-to-End Learning of Vision-Based Decentralized Controllers for Robot Swarms

Authors: Ting-Kuei Hu, Fernando Gama, Tianlong Chen, Zhangyang Wang, Alejandro Ribeiro, Brian M. Sadler

Abstract: Decentralized coordination of a robot swarm requires addressing the tension between local perceptions and actions, and the accomplishment of a global objective. In this work, we propose to learn decentralized controllers based on solely raw visual inputs. For the first time, that integrates the learning of two key components: communication and visual perception, in one end-to-end framework. More s… ▽ More Decentralized coordination of a robot swarm requires addressing the tension between local perceptions and actions, and the accomplishment of a global objective. In this work, we propose to learn decentralized controllers based on solely raw visual inputs. For the first time, that integrates the learning of two key components: communication and visual perception, in one end-to-end framework. More specifically, we consider that each robot has access to a visual perception of the immediate surroundings, and communication capabilities to transmit and receive messages from other neighboring robots. Our proposed learning framework combines a convolutional neural network (CNN) for each robot to extract messages from the visual inputs, and a graph neural network (GNN) over the entire swarm to transmit, receive and process these messages in order to decide on actions. The use of a GNN and locally-run CNNs results naturally in a decentralized controller. We jointly train the CNNs and the GNN so that each robot learns to extract messages from the images that are adequate for the team as a whole. Our experiments demonstrate the proposed architecture in the problem of drone flocking and show its promising performance and scalability, e.g., achieving successful decentralized flocking for large-sized swarms consisting of up to 75 drones. △ Less

Submitted 10 December, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

arXiv:1910.08194 [pdf, other]

HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion

Authors: Jiaming Shen, Zeqiu Wu, Dongming Lei, Chao Zhang, Xiang Ren, Michelle T. Vanni, Brian M. Sadler, Jiawei Han

Abstract: Taxonomies are of great value to many knowledge-rich applications. As the manual taxonomy curation costs enormous human effects, automatic taxonomy construction is in great demand. However, most existing automatic taxonomy construction methods can only build hypernymy taxonomies wherein each edge is limited to expressing the "is-a" relation. Such a restriction limits their applicability to more di… ▽ More Taxonomies are of great value to many knowledge-rich applications. As the manual taxonomy curation costs enormous human effects, automatic taxonomy construction is in great demand. However, most existing automatic taxonomy construction methods can only build hypernymy taxonomies wherein each edge is limited to expressing the "is-a" relation. Such a restriction limits their applicability to more diverse real-world tasks where the parent-child may carry different relations. In this paper, we aim to construct a task-guided taxonomy from a domain-specific corpus and allow users to input a "seed" taxonomy, serving as the task guidance. We propose an expansion-based taxonomy construction framework, namely HiExpan, which automatically generates key term list from the corpus and iteratively grows the seed taxonomy. Specifically, HiExpan views all children under each taxonomy node forming a coherent set and builds the taxonomy by recursively expanding all these sets. Furthermore, HiExpan incorporates a weakly-supervised relation extraction module to extract the initial children of a newly-expanded node and adjusts the taxonomy tree by optimizing its global structure. Our experiments on three real datasets from different domains demonstrate the effectiveness of HiExpan for building task-guided taxonomies. △ Less

Submitted 17 October, 2019; originally announced October 2019.

Comments: KDD 2018 accepted

arXiv:1909.11555 [pdf, other]

Optimally Compressed Nonparametric Online Learning

Authors: Alec Koppel, Amrit Singh Bedi, Ketan Rajawat, Brian M. Sadler

Abstract: Batch training of machine learning models based on neural networks is now well established, whereas to date streaming methods are largely based on linear models. To go beyond linear in the online setting, nonparametric methods are of interest due to their universality and ability to stably incorporate new information via convexity or Bayes' Rule. Unfortunately, when used online, nonparametric meth… ▽ More Batch training of machine learning models based on neural networks is now well established, whereas to date streaming methods are largely based on linear models. To go beyond linear in the online setting, nonparametric methods are of interest due to their universality and ability to stably incorporate new information via convexity or Bayes' Rule. Unfortunately, when used online, nonparametric methods suffer a "curse of dimensionality" which precludes their use: their complexity scales at least with the time index. We survey online compression tools which bring their memory under control and attain approximate convergence. The asymptotic bias depends on a compression parameter that trades off memory and accuracy. Further, the applications to robotics, communications, economics, and power are discussed, as well as extensions to multi-agent systems. △ Less

Submitted 17 January, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

arXiv:1909.10279 [pdf, other]

Nearly Consistent Finite Particle Estimates in Streaming Importance Sampling

Authors: Alec Koppel, Amrit Singh Bedi, Brian M. Sadler, Victor Elvira

Abstract: In Bayesian inference, we seek to compute information about random variables such as moments or quantiles on the basis of {available data} and prior information. When the distribution of random variables is {intractable}, Monte Carlo (MC) sampling is usually required. {Importance sampling is a standard MC tool that approximates this unavailable distribution with a set of weighted samples.} This pr… ▽ More In Bayesian inference, we seek to compute information about random variables such as moments or quantiles on the basis of {available data} and prior information. When the distribution of random variables is {intractable}, Monte Carlo (MC) sampling is usually required. {Importance sampling is a standard MC tool that approximates this unavailable distribution with a set of weighted samples.} This procedure is asymptotically consistent as the number of MC samples (particles) go to infinity. However, retaining infinitely many particles is intractable. Thus, we propose a way to only keep a \emph{finite representative subset} of particles and their augmented importance weights that is \emph{nearly consistent}. To do so in {an online manner}, we (1) embed the posterior density estimate in a reproducing kernel Hilbert space (RKHS) through its kernel mean embedding; and (2) sequentially project this RKHS element onto a lower-dimensional subspace in RKHS using the maximum mean discrepancy, an integral probability metric. Theoretically, we establish that this scheme results in a bias determined by a compression parameter, which yields a tunable tradeoff between consistency and memory. In experiments, we observe the compressed estimates achieve comparable performance to the dense ones with substantial reductions in representational complexity. △ Less

Submitted 5 April, 2021; v1 submitted 23 September, 2019; originally announced September 2019.

arXiv:1909.05442 [pdf, other]

Nonstationary Nonparametric Online Learning: Balancing Dynamic Regret and Model Parsimony

Authors: Amrit Singh Bedi, Alec Koppel, Ketan Rajawat, Brian M. Sadler

Abstract: An open challenge in supervised learning is \emph{conceptual drift}: a data point begins as classified according to one label, but over time the notion of that label changes. Beyond linear autoregressive models, transfer and meta learning address drift, but require data that is representative of disparate domains at the outset of training. To relax this requirement, we propose a memory-efficient \… ▽ More An open challenge in supervised learning is \emph{conceptual drift}: a data point begins as classified according to one label, but over time the notion of that label changes. Beyond linear autoregressive models, transfer and meta learning address drift, but require data that is representative of disparate domains at the outset of training. To relax this requirement, we propose a memory-efficient \emph{online} universal function approximator based on compressed kernel methods. Our approach hinges upon viewing non-stationary learning as online convex optimization with dynamic comparators, for which performance is quantified by dynamic regret. Prior works control dynamic regret growth only for linear models. In contrast, we hypothesize actions belong to reproducing kernel Hilbert spaces (RKHS). We propose a functional variant of online gradient descent (OGD) operating in tandem with greedy subspace projections. Projections are necessary to surmount the fact that RKHS functions have complexity proportional to time. For this scheme, we establish sublinear dynamic regret growth in terms of both loss variation and functional path length, and that the memory of the function sequence remains moderate. Experiments demonstrate the usefulness of the proposed technique for online nonlinear regression and classification problems with non-stationary data. △ Less

Submitted 11 September, 2019; originally announced September 2019.

arXiv:1812.09551 [pdf, other]

TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering

Authors: Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian Sadler, Michelle Vanni, Jiawei Han

Abstract: Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the top… ▽ More Taxonomy construction is not only a fundamental task for semantic analysis of text corpora, but also an important step for applications such as information filtering, recommendation, and Web search. Existing pattern-based methods extract hypernym-hyponym term pairs and then organize these pairs into a taxonomy. However, by considering each term as an independent concept node, they overlook the topical proximity and the semantic correlations among terms. In this paper, we propose a method for constructing topic taxonomies, wherein every node represents a conceptual topic and is defined as a cluster of semantically coherent concept terms. Our method, TaxoGen, uses term embeddings and hierarchical clustering to construct a topic taxonomy in a recursive fashion. To ensure the quality of the recursive process, it consists of: (1) an adaptive spherical clustering module for allocating terms to proper levels when splitting a coarse topic into fine-grained ones; (2) a local embedding module for learning term embeddings that maintain strong discriminative power at different levels of the taxonomy. Our experiments on two real datasets demonstrate the effectiveness of TaxoGen compared with baseline methods. △ Less

Submitted 22 December, 2018; originally announced December 2018.

arXiv:1811.07032 [pdf, other]

Mining Entity Synonyms with Efficient Neural Set Generation

Authors: Jiaming Shen, Ruiliang Lyu, Xiang Ren, Michelle Vanni, Brian Sadler, Jiawei Han

Abstract: Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications. Previous work either rank terms based on their similarity to a given query term, or treats the problem as a two-phase task (i.e., detecting synonymy pairs, followed by organizing these pairs into synonym sets). However, these approaches fail to model the holis… ▽ More Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications. Previous work either rank terms based on their similarity to a given query term, or treats the problem as a two-phase task (i.e., detecting synonymy pairs, followed by organizing these pairs into synonym sets). However, these approaches fail to model the holistic semantics of a set and suffer from the error propagation issue. Here we propose a new framework, named SynSetMine, that efficiently generates entity synonym sets from a given vocabulary, using example sets from external knowledge bases as distant supervision. SynSetMine consists of two novel modules: (1) a set-instance classifier that jointly learns how to represent a permutation invariant synonym set and whether to include a new instance (i.e., a term) into the set, and (2) a set generation algorithm that enumerates the vocabulary only once and applies the learned set-instance classifier to detect all entity synonym sets in it. Experiments on three real datasets from different domains demonstrate both effectiveness and efficiency of SynSetMine for mining entity synonym sets. △ Less

Submitted 16 November, 2018; originally announced November 2018.

Comments: AAAI 2019 camera-ready version

arXiv:1808.06740 [pdf, other]

Interactive Semantic Parsing for If-Then Recipes via Hierarchical Reinforcement Learning

Authors: Ziyu Yao, Xiujun Li, Jianfeng Gao, Brian Sadler, Huan Sun

Abstract: Given a text description, most existing semantic parsers synthesize a program in one shot. However, it is quite challenging to produce a correct program solely based on the description, which in reality is often ambiguous or incomplete. In this paper, we investigate interactive semantic parsing, where the agent can ask the user clarification questions to resolve ambiguities via a multi-turn dialog… ▽ More Given a text description, most existing semantic parsers synthesize a program in one shot. However, it is quite challenging to produce a correct program solely based on the description, which in reality is often ambiguous or incomplete. In this paper, we investigate interactive semantic parsing, where the agent can ask the user clarification questions to resolve ambiguities via a multi-turn dialogue, on an important type of programs called "If-Then recipes." We develop a hierarchical reinforcement learning (HRL) based agent that significantly improves the parsing performance with minimal questions to the user. Results under both simulation and human evaluation show that our agent substantially outperforms non-interactive semantic parsers and rule-based agents. △ Less

Submitted 14 November, 2018; v1 submitted 20 August, 2018; originally announced August 2018.

Comments: 13 pages, 2 figures, accepted by AAAI 2019

arXiv:1808.05933 [pdf, other]

Decentralized Dictionary Learning Over Time-Varying Digraphs

Authors: Amir Daneshmand, Ying Sun, Gesualdo Scutari, Francisco Facchinei, Brian M. Sadler

Abstract: This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be ineffi… ▽ More This paper studies Dictionary Learning problems wherein the learning task is distributed over a multi-agent network, modeled as a time-varying directed graph. This formulation is relevant, for instance, in Big Data scenarios where massive amounts of data are collected/stored in different locations (e.g., sensors, clouds) and aggregating and/or processing all data in a fusion center might be inefficient or unfeasible, due to resource limitations, communication overheads or privacy issues. We develop a unified decentralized algorithmic framework for this class of nonconvex problems, which is proved to converge to stationary solutions at a sublinear rate. The new method hinges on Successive Convex Approximation techniques, coupled with a decentralized tracking mechanism aiming at locally estimating the gradient of the smooth part of the sum-utility. To the best of our knowledge, this is the first provably convergent decentralized algorithm for Dictionary Learning and, more generally, bi-convex problems over (time-varying) (di)graphs. △ Less

Submitted 5 March, 2019; v1 submitted 17 August, 2018; originally announced August 2018.

arXiv:1707.05851 [pdf, other]

doi 10.1109/JSTSP.2017.2730040

Graph Filters and the Z-Laplacian

Authors: Xiaoran Yan, Brian M. Sadler, Robert J. Drost, Paul L. Yu, Kristina Lerman

Abstract: In network science, the interplay between dynamical processes and the underlying topologies of complex systems has led to a diverse family of models with different interpretations. In graph signal processing, this is manifested in the form of different graph shifts and their induced algebraic systems. In this paper, we propose the unifying Z-Laplacian framework, whose instances can act as graph sh… ▽ More In network science, the interplay between dynamical processes and the underlying topologies of complex systems has led to a diverse family of models with different interpretations. In graph signal processing, this is manifested in the form of different graph shifts and their induced algebraic systems. In this paper, we propose the unifying Z-Laplacian framework, whose instances can act as graph shift operators. As a generalization of the traditional graph Laplacian, the Z-Laplacian spans the space of all possible Z-matrices, i.e., real square matrices with nonpositive off-diagonal entries. We show that the Z-Laplacian can model general continuous-time dynamical processes, including information flows and epidemic spreading on a given graph. It is also closely related to general nonnegative graph filters in the discrete time domain. We showcase its flexibility by considering two applications. First, we consider a wireless communications networking problem modeled with a graph, where the framework can be applied to model the effects of the underlying communications protocol and traffic. Second, we examine a structural brain network from the perspective of low- to high-frequency connectivity. △ Less

Submitted 18 July, 2017; originally announced July 2017.

Comments: IEEE Journal of Selected Topics in Signal Processing (September 2017 special issue on Graph Signal Processing)

arXiv:1606.09233 [pdf, other]

doi 10.1109/TIT.2017.2760634

Unequal Error Protection Querying Policies for the Noisy 20 Questions Problem

Authors: Hye Won Chung, Brian M. Sadler, Lizhong Zheng, Alfred O. Hero

Abstract: In this paper, we propose an open-loop unequal-error-protection querying policy based on superposition coding for the noisy 20 questions problem. In this problem, a player wishes to successively refine an estimate of the value of a continuous random variable by posing binary queries and receiving noisy responses. When the queries are designed non-adaptively as a single block and the noisy response… ▽ More In this paper, we propose an open-loop unequal-error-protection querying policy based on superposition coding for the noisy 20 questions problem. In this problem, a player wishes to successively refine an estimate of the value of a continuous random variable by posing binary queries and receiving noisy responses. When the queries are designed non-adaptively as a single block and the noisy responses are modeled as the output of a binary symmetric channel the 20 questions problem can be mapped to an equivalent problem of channel coding with unequal error protection (UEP). A new non-adaptive querying strategy based on UEP superposition coding is introduced whose estimation error decreases with an exponential rate of convergence that is significantly better than that of the UEP repetition coding introduced by Variani et al. (2015). With the proposed querying strategy, the rate of exponential decrease in the number of queries matches the rate of a closed-loop adaptive scheme where queries are sequentially designed with the benefit of feedback. Furthermore, the achievable error exponent is significantly better than that of random block codes employing equal error protection. △ Less

Submitted 28 September, 2017; v1 submitted 29 June, 2016; originally announced June 2016.

Comments: To appear in IEEE Transactions on Information Theory

arXiv:1606.05578 [pdf, other]

doi 10.1109/TSP.2017.2686368

Proximity Without Consensus in Online Multi-Agent Optimization

Authors: Alec Koppel, Brian M. Sadler, Alejandro Ribeiro

Abstract: We consider stochastic optimization problems in multi-agent settings, where a network of agents aims to learn parameters which are optimal in terms of a global objective, while giving preference to locally observed streaming information. To do so, we depart from the canonical decentralized optimization framework where agreement constraints are enforced, and instead formulate a problem where each a… ▽ More We consider stochastic optimization problems in multi-agent settings, where a network of agents aims to learn parameters which are optimal in terms of a global objective, while giving preference to locally observed streaming information. To do so, we depart from the canonical decentralized optimization framework where agreement constraints are enforced, and instead formulate a problem where each agent minimizes a global objective while enforcing network proximity constraints. This formulation includes online consensus optimization as a special case, but allows for the more general hypothesis that there is data heterogeneity across the network. To solve this problem, we propose using a stochastic saddle point algorithm inspired by Arrow and Hurwicz. This method yields a decentralized algorithm for processing observations sequentially received at each node of the network. Using Lagrange multipliers to penalize the discrepancy between them, only neighboring nodes exchange model information. We establish that under a constant step-size regime the time-average suboptimality and constraint violation are contained in a neighborhood whose radius vanishes with increasing number of iterations. As a consequence, we prove that the time-average primal vectors converge to the optimal objective while satisfying the network proximity constraints. We apply this method to the problem of sequentially estimating a correlated random field in a sensor network, as well as an online source localization problem, both of which demonstrate the empirical validity of the aforementioned convergence results. △ Less

Submitted 17 June, 2016; originally announced June 2016.

Showing 1–50 of 56 results for author: Sadler, B