-
Policy-Value Alignment and Robustness in Search-based Multi-Agent Learning
Authors:
Niko A. Grupen,
Michael Hanlon,
Alexis Hao,
Daniel D. Lee,
Bart Selman
Abstract:
Large-scale AI systems that combine search and learning have reached super-human levels of performance in game-playing, but have also been shown to fail in surprising ways. The brittleness of such models limits their efficacy and trustworthiness in real-world deployments. In this work, we systematically study one such algorithm, AlphaZero, and identify two phenomena related to the nature of exploration. First, we find evidence of policy-value misalignment -- for many states, AlphaZero's policy and value predictions contradict each other, revealing a tension between accurate move-selection and value estimation in AlphaZero's objective. Further, we find inconsistency within AlphaZero's value function, which causes it to generalize poorly, despite its policy playing an optimal strategy. From these insights we derive VISA-VIS: a novel method that improves policy-value alignment and value robustness in AlphaZero. Experimentally, we show that our method reduces policy-value misalignment by up to 76%, reduces value generalization error by up to 50%, and reduces average value error by up to 55%.
Submitted 6 February, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
Graph Value Iteration
Authors:
Dieqiao Feng,
Carla P. Gomes,
Bart Selman
Abstract:
In recent years, deep Reinforcement Learning (RL) has been successful in various combinatorial search domains, such as two-player games and scientific discovery. However, directly applying deep RL in planning domains is still challenging. One major difficulty is that without a human-crafted heuristic function, reward signals remain zero unless the learning framework discovers any solution plan. Search space becomes \emph{exponentially larger} as the minimum length of plans grows, which is a serious limitation for planning instances with a minimum plan length of hundreds to thousands of steps. Previous learning frameworks that augment graph search with deep neural networks and extra generated subgoals have achieved success in various challenging planning domains. However, generating useful subgoals requires extensive domain knowledge. We propose a domain-independent method that augments graph search with graph value iteration to solve hard planning instances that are out of reach for domain-specialized solvers. In particular, instead of receiving learning signals only from discovered plans, our approach also learns from failed search attempts where no goal state has been reached. The graph value iteration component can exploit the graph structure of local search space and provide more informative learning signals. We also show how we use a curriculum strategy to smooth the learning process and perform a full analysis of how graph value iteration scales and enables learning.
Submitted 20 September, 2022;
originally announced September 2022.
-
Left Heavy Tails and the Effectiveness of the Policy and Value Networks in DNN-based best-first search for Sokoban Planning
Authors:
Dieqiao Feng,
Carla Gomes,
Bart Selman
Abstract:
Despite the success of practical solvers in various NP-complete domains such as SAT and CSP as well as using deep reinforcement learning to tackle two-player games such as Go, certain classes of PSPACE-hard planning problems have remained out of reach. Even carefully designed domain-specialized solvers can fail quickly due to the exponential search space on hard instances. Recent works that combine traditional search methods, such as best-first search and Monte Carlo tree search, with Deep Neural Networks' (DNN) heuristics have shown promising progress and can solve a significant number of hard planning instances beyond specialized solvers. To better understand why these approaches work, we studied the interplay of the policy and value networks of DNN-based best-first search on Sokoban and show the surprising effectiveness of the policy network, further enhanced by the value network, as a guiding heuristic for the search. To further understand the phenomena, we studied the cost distribution of the search algorithms and found that Sokoban instances can have heavy-tailed runtime distributions, with tails both on the left and right-hand sides. In particular, for the first time, we show the existence of \textit{left heavy tails} and propose an abstract tree model that can empirically explain the appearance of these tails. The experiments show the critical role of the policy network as a powerful heuristic guiding the search, which can lead to left heavy tails with polynomial scaling by avoiding exploring exponentially sized subtrees. Our results also demonstrate the importance of random restarts, as are widely used in traditional combinatorial solvers, for DNN-based search methods to avoid left and right heavy tails.
Submitted 28 June, 2022;
originally announced June 2022.
-
A Novel Automated Curriculum Strategy to Solve Hard Sokoban Planning Instances
Authors:
Dieqiao Feng,
Carla P. Gomes,
Bart Selman
Abstract:
In recent years, we have witnessed tremendous progress in deep reinforcement learning (RL) for tasks such as Go, Chess, video games, and robot control. Nevertheless, other combinatorial domains, such as AI planning, still pose considerable challenges for RL approaches. The key difficulty in those domains is that a positive reward signal becomes {\em exponentially rare} as the minimal solution length increases. So, an RL approach loses its training signal. There has been promising recent progress by using a curriculum-driven learning approach that is designed to solve a single hard instance. We present a novel {\em automated} curriculum approach that dynamically selects from a pool of unlabeled training instances of varying task complexity guided by our {\em difficulty quantum momentum} strategy. We show how the smoothness of the task hardness impacts the final learning results. In particular, as the size of the instance pool increases, the ``hardness gap'' decreases, which facilitates a smoother automated curriculum based learning process. Our automated curriculum approach dramatically improves upon the previous approaches. We show our results on Sokoban, which is a traditional PSPACE-complete planning problem and presents a great challenge even for specialized solvers. Our RL agent can solve hard instances that are far out of reach for any previous state-of-the-art Sokoban solver. In particular, our approach can uncover plans that require hundreds of steps, while the best previous search methods would take many years of computing time to solve such instances. In addition, we show that we can further boost the RL performance with an intricate coupling of our automated curriculum approach with a curiosity-driven search strategy and a graph neural net representation.
Submitted 2 October, 2021;
originally announced October 2021.
-
Automating Crystal-Structure Phase Mapping: Combining Deep Learning with Constraint Reasoning
Authors:
Di Chen,
Yiwei Bai,
Sebastian Ament,
Wenting Zhao,
Dan Guevarra,
Lan Zhou,
Bart Selman,
R. Bruce van Dover,
John M. Gregoire,
Carla P. Gomes
Abstract:
Crystal-structure phase mapping is a core, long-standing challenge in materials science that requires identifying crystal structures, or mixtures thereof, in synthesized materials. Materials science experts excel at solving simple systems but cannot solve complex systems, creating a major bottleneck in high-throughput materials discovery. Herein we show how to automate crystal-structure phase mapping. We formulate phase mapping as an unsupervised pattern demixing problem and describe how to solve it using Deep Reasoning Networks (DRNets). DRNets combine deep learning with constraint reasoning for incorporating scientific prior knowledge and consequently require only a modest amount of (unlabeled) data. DRNets compensate for the limited data by exploiting and magnifying the rich prior knowledge about the thermodynamic rules governing the mixtures of crystals with constraint reasoning seamlessly integrated into neural network optimization. DRNets are designed with an interpretable latent space for encoding prior-knowledge domain constraints and seamlessly integrate constraint reasoning into neural network optimization. DRNets surpass previous approaches on crystal-structure phase mapping, unraveling the Bi-Cu-V oxide phase diagram, and aiding the discovery of solar-fuels materials.
Submitted 21 August, 2021;
originally announced August 2021.
-
Structure Amplification on Multi-layer Stochastic Block Models
Authors:
Xiaodong Xin,
Kun He,
Jialu Bao,
Bart Selman,
John E. Hopcroft
Abstract:
Much of the complexity of social, biological, and engineered systems arises from a network of complex interactions connecting many basic components. Network analysis tools have been successful at uncovering latent structure termed communities in such networks. However, some of the most interesting structure can be difficult to uncover because it is obscured by the more dominant structure. Our previous work proposes a general structure amplification technique called HICODE that uncovers many layers of functional hidden structure in complex networks. HICODE incrementally weakens dominant structure through randomization allowing the hidden functionality to emerge, and uncovers these hidden structure in real-world networks that previous methods rarely uncover. In this work, we conduct a comprehensive and systematic theoretical analysis on the hidden community structure. In what follows, we define multi-layer stochastic block model, and provide theoretical support using the model on why the existence of hidden structure will make the detection of dominant structure harder compared with equivalent random noise. We then provide theoretical proofs that the iterative reducing methods could help promote the uncovering of hidden structure as well as boosting the detection quality of dominant structure.
Submitted 30 July, 2021;
originally announced August 2021.
-
Multi-Agent Curricula and Emergent Implicit Signaling
Authors:
Niko A. Grupen,
Daniel D. Lee,
Bart Selman
Abstract:
Emergent communication has made strides towards learning communication from scratch, but has focused primarily on protocols that resemble human language. In nature, multi-agent cooperation gives rise to a wide range of communication that varies in structure and complexity. In this work, we recognize the full spectrum of communication that exists in nature and propose studying lower-level communication. Specifically, we study emergent implicit signaling in the context of decentralized multi-agent learning in difficult, sparse reward environments. However, learning to coordinate in such environments is challenging. We propose a curriculum-driven strategy that combines: (i) velocity-based environment shaping, tailored to the skill level of the multi-agent team; and (ii) a behavioral curriculum that helps agents learn successful single-agent behaviors as a precursor to learning multi-agent behaviors. Pursuit-evasion experiments show that our approach learns effective coordination, significantly outperforming sophisticated analytical and learned policies. Our method completes the pursuit-evasion task even when pursuers move at half of the evader's speed, whereas the highest-performing baseline fails at 80% of the evader's speed. Moreover, we examine the use of implicit signals in coordination through position-based social influence. We show that pursuers trained with our strategy exchange more than twice as much information (in bits) than baseline methods, indicating that our method has learned, and relies heavily on, the exchange of implicit signals.
Submitted 6 February, 2022; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Cooperative Multi-Agent Fairness and Equivariant Policies
Authors:
Niko A. Grupen,
Bart Selman,
Daniel D. Lee
Abstract:
We study fairness through the lens of cooperative multi-agent learning. Our work is motivated by empirical evidence that naive maximization of team reward yields unfair outcomes for individual team members. To address fairness in multi-agent contexts, we introduce team fairness, a group-based fairness measure for multi-agent learning. We then prove that it is possible to enforce team fairness during policy optimization by transforming the team's joint policy into an equivariant map. We refer to our multi-agent learning strategy as Fairness through Equivariance (Fair-E) and demonstrate its effectiveness empirically. We then introduce Fairness through Equivariance Regularization (Fair-ER) as a soft-constraint version of Fair-E and show that it reaches higher levels of utility than Fair-E and fairer outcomes than non-equivariant policies. Finally, we present novel findings regarding the fairness-utility trade-off in multi-agent settings; showing that the magnitude of the trade-off is dependent on agent skill.
Submitted 19 January, 2022; v1 submitted 10 June, 2021;
originally announced June 2021.
-
"This Browser is Lightning Fast": The Effects of Message Content on Perceived Performance
Authors:
Jess Hohenstein,
Bill Selman,
Gemma Petrie,
Jofish Kaye,
Rebecca Weiss
Abstract:
With technical performance being similar for various web browsers, improving user perceived performance is integral to optimizing browser quality. We investigated the importance of priming, which has a well-documented ability to affect people's beliefs, on users' perceptions of web browser performance. We studied 1495 participants who read either an article about performance improvements to Mozilla Firefox, an article about user interface updates to Firefox, or an article about self-driving cars, and then watched video clips of browser tasks. As the priming effect would suggest, we found that reading articles about Firefox increased participants' perceived performance of Firefox over the most widely used web browser, Google Chrome. In addition, we found that article content mattered, as the article about performance improvements led to higher performance ratings than the article about UI updates. Our findings demonstrate how perceived performance can be improved without making technical improvements and that designers and developers must consider a wider picture when trying to improve user attitudes about technology.
Submitted 10 March, 2021;
originally announced March 2021.
-
Low-Bandwidth Communication Emerges Naturally in Multi-Agent Learning Systems
Authors:
Niko A. Grupen,
Daniel D. Lee,
Bart Selman
Abstract:
In this work, we study emergent communication through the lens of cooperative multi-agent behavior in nature. Using insights from animal communication, we propose a spectrum from low-bandwidth (e.g. pheromone trails) to high-bandwidth (e.g. compositional language) communication that is based on the cognitive, perceptual, and behavioral capabilities of social agents. Through a series of experiments with pursuit-evasion games, we identify multi-agent reinforcement learning algorithms as a computational model for the low-bandwidth end of the communication spectrum.
Submitted 8 December, 2020; v1 submitted 30 November, 2020;
originally announced November 2020.
-
Solving Hard AI Planning Instances Using Curriculum-Driven Deep Reinforcement Learning
Authors:
Dieqiao Feng,
Carla P. Gomes,
Bart Selman
Abstract:
Despite significant progress in general AI planning, certain domains remain out of reach of current AI planning systems. Sokoban is a PSPACE-complete planning task and represents one of the hardest domains for current AI planners. Even domain-specific specialized search methods fail quickly due to the exponential search complexity on hard instances. Our approach based on deep reinforcement learning augmented with a curriculum-driven method is the first one to solve hard instances within one day of training while other modern solvers cannot solve these instances within any reasonable time limit. In contrast to prior efforts, which use carefully handcrafted pruning techniques, our approach automatically uncovers domain structure. Our results reveal that deep RL provides a promising framework for solving previously unsolved AI planning problems, provided a proper training curriculum can be devised.
Submitted 4 June, 2020;
originally announced June 2020.
-
Hidden Community Detection on Two-layer Stochastic Models: a Theoretical Perspective
Authors:
Jialu Bao,
Kun He,
Xiaodong Xin,
Bart Selman,
John E. Hopcroft
Abstract:
Hidden community is a new graph-theoretical concept recently proposed [4], in which the authors also propose a meta-approach called HICODE (Hidden Community Detection) for detecting hidden communities. HICODE is demonstrated through experiments that it is able to uncover previously overshadowed weak layers and uncover both weak and strong layers at a higher accuracy. However, the authors provide no theoretical guarantee for the performance. In this work, we focus on the theoretical analysis of HICODE on synthetic two-layer networks, where layers are independent of each other and each layer is generated by stochastic block model. We bridge their gap through two-layer stochastic block model networks in the following aspects: 1) we show that partitions that locally optimize modularity correspond to grounded layers, indicating modularity-optimizing algorithms can detect strong layers; 2) we prove that when reducing found layers, HICODE increases absolute modularities of all unreduced layers, showing its layer reduction step makes weak layers more detectable. Our work builds a solid theoretical base for HICODE, demonstrating that it is promising in uncovering both weak and strong layers of communities in two-layer networks.
Submitted 12 March, 2020; v1 submitted 16 January, 2020;
originally announced January 2020.
-
A 20-Year Community Roadmap for Artificial Intelligence Research in the US
Authors:
Yolanda Gil,
Bart Selman
Abstract:
Decades of research in artificial intelligence (AI) have produced formidable technologies that are providing immense benefit to industry, government, and society. AI systems can now translate across multiple languages, identify objects in images and video, streamline manufacturing processes, and control cars. The deployment of AI systems has not only created a trillion-dollar industry that is projected to quadruple in three years, but has also exposed the need to make AI systems fair, explainable, trustworthy, and secure. Future AI systems will rightfully be expected to reason effectively about the world in which they (and people) operate, handling complex tasks and responsibilities effectively and ethically, engaging in meaningful communication, and improving their awareness through experience.
Achieving the full potential of AI technologies poses research challenges that require a radical transformation of the AI research enterprise, facilitated by significant and sustained investment. These are the major recommendations of a recent community effort coordinated by the Computing Community Consortium and the Association for the Advancement of Artificial Intelligence to formulate a Roadmap for AI research and development over the next two decades.
Submitted 7 August, 2019;
originally announced August 2019.
-
Understanding Batch Normalization
Authors:
Johan Bjorck,
Carla Gomes,
Bart Selman,
Kilian Q. Weinberger
Abstract:
Batch normalization (BN) is a technique to normalize activations in intermediate layers of deep neural networks. Its tendency to improve accuracy and speed up training have established BN as a favorite technique in deep learning. Yet, despite its enormous success, there remains little consensus on the exact reason and mechanism behind these improvements. In this paper we take a step towards a better understanding of BN, following an empirical approach. We conduct several experiments, and show that BN primarily enables training with larger learning rates, which is the cause for faster convergence and better generalization. For networks without BN we demonstrate how large gradient updates can result in diverging loss and activations growing uncontrollably with network depth, which limits possible learning rates. BN avoids this problem by constantly correcting activations to be zero-mean and of unit standard deviation, which enables larger gradient steps, yields faster convergence and may help bypass sharp local minima. We further show various ways in which gradients and activations of deep unnormalized networks are ill-behaved. We contrast our results against recent findings in random matrix theory, shedding new light on classical initialization schemes and their consequences.
Submitted 30 November, 2018; v1 submitted 31 May, 2018;
originally announced June 2018.
-
XOR-Sampling for Network Design with Correlated Stochastic Events
Authors:
Xiaojian Wu,
Yexiang Xue,
Bart Selman,
Carla P. Gomes
Abstract:
Many network optimization problems can be formulated as stochastic network design problems in which edges are present or absent stochastically. Furthermore, protective actions can guarantee that edges will remain present. We consider the problem of finding the optimal protection strategy under a budget limit in order to maximize some connectivity measurements of the network. Previous approaches rely on the assumption that edges are independent. In this paper, we consider a more realistic setting where multiple edges are not independent due to natural disasters or regional events that make the states of multiple edges stochastically correlated. We use Markov Random Fields to model the correlation and define a new stochastic network design framework. We provide a novel algorithm based on Sample Average Approximation (SAA) coupled with a Gibbs or XOR sampler. The experimental results on real road network data show that the policies produced by SAA with the XOR sampler have higher quality and lower variance compared to SAA with Gibbs sampler.
Submitted 23 May, 2017; v1 submitted 23 May, 2017;
originally announced May 2017.
-
Solving Marginal MAP Problems with NP Oracles and Parity Constraints
Authors:
Yexiang Xue,
Zhiyuan Li,
Stefano Ermon,
Carla P. Gomes,
Bart Selman
Abstract:
Arising from many applications at the intersection of decision making and machine learning, Marginal Maximum A Posteriori (Marginal MAP) Problems unify the two main classes of inference, namely maximization (optimization) and marginal inference (counting), and are believed to have higher complexity than both of them. We propose XOR_MMAP, a novel approach to solve the Marginal MAP Problem, which represents the intractable counting subproblem with queries to NP oracles, subject to additional parity constraints. XOR_MMAP provides a constant factor approximation to the Marginal MAP Problem, by encoding it as a single optimization in polynomial size of the original problem. We evaluate our approach in several machine learning and decision making applications, and show that our approach outperforms several state-of-the-art Marginal MAP solvers.
Submitted 29 November, 2016; v1 submitted 8 October, 2016;
originally announced October 2016.
-
Watch-n-Patch: Unsupervised Learning of Actions and Relations
Authors:
Chenxia Wu,
Jiemi Zhang,
Ozan Sener,
Bart Selman,
Silvio Savarese,
Ashutosh Saxena
Abstract:
There is a large variation in the activities that humans perform in their everyday lives. We consider modeling these composite human activities which comprises multiple basic level actions in a completely unsupervised setting. Our model learns high-level co-occurrence and temporal relations between the actions. We consider the video as a sequence of short-term action clips, which contains human-words and object-words. An activity is about a set of action-topics and object-topics indicating which actions are present and which objects are interacting with. We then propose a new probabilistic model relating the words and the topics. It allows us to model long-range action relations that commonly exist in the composite activities, which is challenging in previous works. We apply our model to the unsupervised action segmentation and clustering, and to a novel application that detects forgotten actions, which we call action patching. For evaluation, we contribute a new challenging RGB-D activity video dataset recorded by the new Kinect v2, which contains several human daily activities as compositions of multiple actions interacting with different objects. Moreover, we develop a robotic system that watches people and reminds people by applying our action patching algorithm. Our robotic setup can be easily deployed on any assistive robot.
Submitted 11 March, 2016;
originally announced March 2016.
-
Watch-Bot: Unsupervised Learning for Reminding Humans of Forgotten Actions
Authors:
Chenxia Wu,
Jiemi Zhang,
Bart Selman,
Silvio Savarese,
Ashutosh Saxena
Abstract:
We present a robotic system that watches a human using a Kinect v2 RGB-D sensor, detects what he forgot to do while performing an activity, and if necessary reminds the person using a laser pointer to point out the related object. Our simple setup can be easily deployed on any assistive robot.
Our approach is based on a learning algorithm trained in a purely unsupervised setting, which does not require any human annotations. This makes our approach scalable and applicable to variant scenarios. Our model learns the action/object co-occurrence and action temporal relations in the activity, and uses the learned rich relationships to infer the forgotten action and the related object. We show that our approach not only improves the unsupervised action segmentation and action cluster assignment performance, but also effectively detects the forgotten actions on a challenging human activity RGB-D video dataset. In robotic experiments, we show that our robot is able to remind people of forgotten actions successfully.
Submitted 14 December, 2015;
originally announced December 2015.
-
Variable Elimination in the Fourier Domain
Authors:
Yexiang Xue,
Stefano Ermon,
Ronan Le Bras,
Carla P. Gomes,
Bart Selman
Abstract:
The ability to represent complex high dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. We show that a large class of probabilistic graphical models have a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.
Submitted 21 June, 2016; v1 submitted 17 August, 2015;
originally announced August 2015.
-
Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery
Authors:
Stefano Ermon,
Ronan Le Bras,
Santosh K. Suram,
John M. Gregoire,
Carla Gomes,
Bart Selman,
Robert B. van Dover
Abstract:
Identifying important components or factors in large amounts of noisy data is a key problem in machine learning and data mining. Motivated by a pattern decomposition problem in materials discovery, aimed at discovering new materials for renewable energy, e.g. for fuel and solar cells, we introduce CombiFD, a framework for factor based pattern decomposition that allows the incorporation of a-priori knowledge as constraints, including complex combinatorial constraints. In addition, we propose a new pattern decomposition algorithm, called AMIQO, based on solving a sequence of (mixed-integer) quadratic programs. Our approach considerably outperforms the state of the art on the materials discovery problem, scaling to larger datasets and recovering more precise and physically meaningful decompositions. We also show the effectiveness of our approach for enforcing background knowledge on other application domains.
Submitted 26 November, 2014;
originally announced November 2014.
-
On the Erdos Discrepancy Problem
Authors:
Ronan Le Bras,
Carla P. Gomes,
Bart Selman
Abstract:
According to the Erdős discrepancy conjecture, for any infinite $\pm 1$ sequence, there exists a homogeneous arithmetic progression of unbounded discrepancy. In other words, for any $\pm 1$ sequence $(x_1,x_2,...)$ and a discrepancy $C$, there exist integers $m$ and $d$ such that $|\sum_{i=1}^m x_{i \cdot d}| > C$. This is an $80$-year-old open problem, and a recent development proved that this conjecture is true for discrepancies up to $2$. Paul Erdős also conjectured that this property of unbounded discrepancy holds even for the restricted case of completely multiplicative sequences (CMSs), namely sequences $(x_1,x_2,...)$ where $x_{a \cdot b} = x_{a} \cdot x_{b}$ for any $a,b \geq 1$. The longest CMS with discrepancy $2$ has been proven to be of size $246$. In this paper, we prove that any completely multiplicative sequence of size $127,646$ or more has discrepancy at least $4$, proving the Erdős discrepancy conjecture for CMSs of discrepancies up to $3$. In addition, we prove that this bound is tight and increase the size of the longest known sequence of discrepancy $3$ from $17,000$ to $127,645$. Finally, we provide inductive construction rules as well as streamlining methods to improve the lower bounds for sequences of higher discrepancies.
Submitted 15 May, 2014;
originally announced July 2014.
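The two definitions in the abstract above (the discrepancy of a finite $\pm 1$ sequence and the completely multiplicative property) can be illustrated with a brute-force sketch. This is only an illustration of the definitions, not the SAT-based method of the paper; the function names `discrepancy` and `is_completely_multiplicative` are hypothetical.

```python
def discrepancy(seq):
    """Largest |x_d + x_{2d} + ... + x_{md}| over all d, m with m*d <= len(seq).

    `seq` is a list of +1/-1 values; seq[k - 1] plays the role of x_k.
    """
    n = len(seq)
    best = 0
    for d in range(1, n + 1):
        partial = 0
        # Walk the homogeneous arithmetic progression d, 2d, 3d, ...
        for k in range(d, n + 1, d):
            partial += seq[k - 1]
            best = max(best, abs(partial))
    return best


def is_completely_multiplicative(seq):
    """Check x_{a*b} == x_a * x_b for every a, b >= 1 with a*b <= len(seq)."""
    n = len(seq)
    return all(
        seq[a * b - 1] == seq[a - 1] * seq[b - 1]
        for a in range(1, n + 1)
        for b in range(1, n // a + 1)
    )


# A length-4 example: x_1..x_4 = +1, -1, -1, +1.
example = [1, -1, -1, 1]
example_discrepancy = discrepancy(example)
example_is_cms = is_completely_multiplicative(example)
```

The double loop in `discrepancy` visits each pair $(d, m)$ once, so it costs roughly $n \log n$ additions for a sequence of length $n$; this is enough to check small sequences by hand but says nothing about how the bounds in the abstract were obtained.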
-
Optimization With Parity Constraints: From Binary Codes to Discrete Integration
Authors:
Stefano Ermon,
Carla P. Gomes,
Ashish Sabharwal,
Bart Selman
Abstract:
Many probabilistic inference tasks involve summations over exponentially large sets. Recently, it has been shown that these problems can be reduced to solving a polynomial number of MAP inference queries for a model augmented with randomly generated parity constraints. By exploiting a connection with max-likelihood decoding of binary codes, we show that these optimizations are computationally hard. Inspired by iterative message passing decoding algorithms, we propose an Integer Linear Programming (ILP) formulation for the problem, enhanced with new sparsification techniques to improve decoding performance. By solving the ILP through a sequence of LP relaxations, we get both lower and upper bounds on the partition function, which hold with high probability and are much tighter than those obtained with variational methods.
Submitted 26 September, 2013;
originally announced September 2013.
-
Synthesizing Manipulation Sequences for Under-Specified Tasks using Unrolled Markov Random Fields
Authors:
Jaeyong Sung,
Bart Selman,
Ashutosh Saxena
Abstract:
Many tasks in human environments require performing a sequence of navigation and manipulation steps involving objects. In unstructured human environments, the location and configuration of the objects involved often change in unpredictable ways. This requires a high-level planning strategy that is robust and flexible in an uncertain environment. We propose a novel dynamic planning strategy, which can be trained from a set of example sequences. High level tasks are expressed as a sequence of primitive actions or controllers (with appropriate parameters). Our score function, based on Markov Random Field (MRF), captures the relations between environment, controllers, and their arguments. By expressing the environment using sets of attributes, the approach generalizes well to unseen scenarios. We train the parameters of our MRF using a maximum margin learning method. We provide a detailed empirical validation of our overall framework demonstrating successful plan strategies for a variety of tasks.
Submitted 24 June, 2014; v1 submitted 24 June, 2013;
originally announced June 2013.
-
Taming the Curse of Dimensionality: Discrete Integration by Hashing and Optimization
Authors:
Stefano Ermon,
Carla P. Gomes,
Ashish Sabharwal,
Bart Selman
Abstract:
Integration is affected by the curse of dimensionality and quickly becomes intractable as the dimensionality of the problem grows. We propose a randomized algorithm that, with high probability, gives a constant-factor approximation of a general discrete integral defined over an exponentially large set. This algorithm relies on solving only a small number of instances of a discrete combinatorial optimization problem subject to randomly generated parity constraints used as a hash function. As an application, we demonstrate that with a small number of MAP queries we can efficiently approximate the partition function of discrete graphical models, which can in turn be used, for instance, for marginal computation or model selection.
Submitted 27 February, 2013;
originally announced February 2013.
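The building block named in the abstract above, an optimization query restricted by randomly generated parity constraints, can be sketched as follows. This is a toy illustration under stated assumptions: the optimization is done by exhaustive enumeration (feasible only for tiny $n$), the helper names are hypothetical, and the way the paper combines many such queries into an estimate of the integral is not reproduced here.

```python
import itertools
import random


def random_parity_constraints(n, k, rng):
    """Draw k random parity (XOR) constraints over n binary variables.

    Each constraint is (coeffs, bit) and means sum(coeffs[i] * x[i]) % 2 == bit.
    """
    return [
        ([rng.randint(0, 1) for _ in range(n)], rng.randint(0, 1))
        for _ in range(k)
    ]


def satisfies(x, constraints):
    """True if the assignment x meets every parity constraint."""
    return all(
        sum(a_i * x_i for a_i, x_i in zip(coeffs, x)) % 2 == bit
        for coeffs, bit in constraints
    )


def constrained_max(weight, n, constraints):
    """Maximum of weight(x) over x in {0,1}^n satisfying the constraints.

    Returns None if no assignment is feasible. Brute force, for illustration only.
    """
    feasible = (
        x for x in itertools.product((0, 1), repeat=n) if satisfies(x, constraints)
    )
    return max((weight(x) for x in feasible), default=None)


# Hypothetical weight function: number of ones plus one.
weight = lambda x: sum(x) + 1
rng = random.Random(0)  # seeded so that the drawn constraints are reproducible
constraints = random_parity_constraints(4, 2, rng)
value = constrained_max(weight, 4, constraints)
```

Each added constraint acts like one bit of a hash: it keeps an assignment with probability one half, so the constrained maximum after $k$ constraints carries information about how much weight sits in the top $2^{-k}$ fraction of the space.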
-
Algorithm Portfolio Design: Theory vs. Practice
Authors:
Carla P. Gomes,
Bart Selman
Abstract:
Stochastic algorithms are among the best for solving computationally hard search and reasoning problems. The runtime of such procedures is characterized by a random variable. Different algorithms give rise to different probability distributions. One can take advantage of such differences by combining several algorithms into a portfolio, and running them in parallel or interleaving them on a single processor. We provide a detailed evaluation of the portfolio approach on distributions of hard combinatorial search problems. We show under what conditions the portfolio approach can have a dramatic computational advantage over the best traditional methods.
Submitted 6 February, 2013;
originally announced February 2013.
-
A Bayesian Approach to Tackling Hard Computational Problems
Authors:
Eric J. Horvitz,
Yongshao Ruan,
Carla P. Gomes,
Henry Kautz,
Bart Selman,
David Maxwell Chickering
Abstract:
We are developing a general framework for using learned Bayesian models for decision-theoretic control of search and reasoning algorithms. We illustrate the approach on the specific task of controlling both general and domain-specific solvers on a hard class of structured constraint satisfaction problems. A successful strategy for reducing the high (and even infinite) variance in running time typically exhibited by backtracking search algorithms is to cut off and restart the search if a solution is not found within a certain amount of time. Previous work on restart strategies has employed fixed cut off values. We show how to create a dynamic cut off strategy by learning a Bayesian model that predicts the ultimate length of a trial based on observing the early behavior of the search algorithm. Furthermore, we describe the general conditions under which a dynamic restart strategy can outperform the theoretically optimal fixed strategy.
Submitted 10 January, 2013;
originally announced January 2013.
-
Uniform Solution Sampling Using a Constraint Solver As an Oracle
Authors:
Stefano Ermon,
Carla P. Gomes,
Bart Selman
Abstract:
We consider the problem of sampling from solutions defined by a set of hard constraints on a combinatorial space. We propose a new sampling technique that, while enforcing a uniform exploration of the search space, leverages the reasoning power of a systematic constraint solver in a black-box scheme. We present a series of challenging domains, such as energy barriers and highly asymmetric spaces, that reveal the difficulties introduced by hard constraints. We demonstrate that standard approaches such as Simulated Annealing and Gibbs Sampling are greatly affected, while our new technique can overcome many of these difficulties. Finally, we show that our sampling scheme naturally defines a new approximate model counting technique, which we empirically show to be very accurate on a range of benchmark problems.
Submitted 16 October, 2012;
originally announced October 2012.
-
Survey Propagation Revisited
Authors:
Lukas Kroc,
Ashish Sabharwal,
Bart Selman
Abstract:
Survey propagation (SP) is an exciting new technique that has been remarkably successful at solving very large hard combinatorial problems, such as determining the satisfiability of Boolean formulas. In a promising attempt at understanding the success of SP, it was recently shown that SP can be viewed as a form of belief propagation, computing marginal probabilities over certain objects called covers of a formula. This explanation was, however, shortly dismissed by experiments suggesting that non-trivial covers simply do not exist for large formulas. In this paper, we show that these experiments were misleading: not only do covers exist for large hard random formulas, SP is surprisingly accurate at computing marginals over these covers despite the existence of many cycles in the formulas. This re-opens a potentially simpler line of reasoning for understanding SP, in contrast to some alternative lines of explanation that have been proposed assuming covers do not exist.
Submitted 20 June, 2012;
originally announced June 2012.
-
Understanding Sampling Style Adversarial Search Methods
Authors:
Raghuram Ramanujan,
Ashish Sabharwal,
Bart Selman
Abstract:
UCT has recently emerged as an exciting new adversarial reasoning technique based on cleverly balancing exploration and exploitation in a Monte-Carlo sampling setting. It has been particularly successful in the game of Go but the reasons for its success are not well understood and attempts to replicate its success in other domains such as Chess have failed. We provide an in-depth analysis of the potential of UCT in domain-independent settings, in cases where heuristic values are available, and the effect of enhancing random playouts to more informed playouts between two weak minimax players. To provide further insights, we develop synthetic game tree instances and discuss interesting properties of UCT, both empirically and analytically.
Submitted 15 March, 2012;
originally announced March 2012.
-
Playing games against nature: optimal policies for renewable resource allocation
Authors:
Stefano Ermon,
Jon Conrad,
Carla P. Gomes,
Bart Selman
Abstract:
In this paper we introduce a class of Markov decision processes that arise as a natural model for many renewable resource allocation problems. Upon extending results from the inventory control literature, we prove that they admit a closed form solution and we show how to exploit this structure to speed up its computation. We consider the application of the proposed framework to several problems arising in very different domains, and as part of the ongoing effort in the emerging field of Computational Sustainability we discuss in detail its application to the Northern Pacific Halibut marine fishery. Our approach is applied to a model based on real world data, obtaining a policy with a guaranteed lower bound on the utility function that is structurally very different from the one currently employed.
Submitted 15 March, 2012;
originally announced March 2012.
-
Unstructured Human Activity Detection from RGBD Images
Authors:
Jaeyong Sung,
Colin Ponce,
Bart Selman,
Ashutosh Saxena
Abstract:
Being able to detect and recognize human activities is essential for several applications, including personal assistive robotics. In this paper, we perform detection and recognition of unstructured human activity in unstructured environments. We use an RGBD sensor (Microsoft Kinect) as the input sensor, and compute a set of features based on human pose and motion, as well as based on image and pointcloud information. Our algorithm is based on a hierarchical maximum entropy Markov model (MEMM), which considers a person's activity as composed of a set of sub-activities. We infer the two-layered graph structure using a dynamic programming approach. We test our algorithm on detecting and recognizing twelve different activities performed by four people in different environments, such as a kitchen, a living room, an office, etc., and achieve good performance even when the person was not seen before in the training set.
Submitted 14 February, 2012; v1 submitted 1 July, 2011;
originally announced July 2011.
-
Structure and Problem Hardness: Goal Asymmetry and DPLL Proofs in SAT-Based Planning
Authors:
Joerg Hoffmann,
Carla Gomes,
Bart Selman
Abstract:
In Verification and in (optimal) AI Planning, a successful method is to formulate the application as boolean satisfiability (SAT), and solve it with state-of-the-art DPLL-based procedures. There is a lack of understanding of why this works so well. Focussing on the Planning context, we identify a form of problem structure concerned with the symmetrical or asymmetrical nature of the cost of achieving the individual planning goals. We quantify this sort of structure with a simple numeric parameter called AsymRatio, ranging between 0 and 1. We run experiments in 10 benchmark domains from the International Planning Competitions since 2000; we show that AsymRatio is a good indicator of SAT solver performance in 8 of these domains. We then examine carefully crafted synthetic planning domains that allow control of the amount of structure, and that are clean enough for a rigorous analysis of the combinatorial search space. The domains are parameterized by size, and by the amount of structure. The CNFs we examine are unsatisfiable, encoding one planning step less than the length of the optimal plan. We prove upper and lower bounds on the size of the best possible DPLL refutations, under different settings of the amount of structure, as a function of size. We also identify the best possible sets of branching variables (backdoors). With minimum AsymRatio, we prove exponential lower bounds, and identify minimal backdoors of size linear in the number of variables. With maximum AsymRatio, we identify logarithmic DPLL refutations (and backdoors), showing a doubly exponential gap between the two structural extreme cases. The reasons for this behavior -- the proof arguments -- illuminate the prototypical patterns of structure causing the empirical behavior observed in the competition benchmarks.
Submitted 26 February, 2007; v1 submitted 29 January, 2007;
originally announced January 2007.
-
From spin glasses to hard satisfiable formulas
Authors:
Haixia Jia,
Cristopher Moore,
Bart Selman
Abstract:
We introduce a highly structured family of hard satisfiable 3-SAT formulas corresponding to an ordered spin-glass model from statistical physics. This model has provably "glassy" behavior; that is, it has many local optima with large energy barriers between them, so that local search algorithms get stuck and have difficulty finding the true ground state, i.e., the unique satisfying assignment. We test the hardness of our formulas with two Davis-Putnam solvers, Satz and zChaff, the recently introduced Survey Propagation (SP), and two local search algorithms, Walksat and Record-to-Record Travel (RRT). We compare our formulas to random 3-XOR-SAT formulas and to two other generators of hard satisfiable instances, the minimum disagreement parity formulas of Crawford et al., and Hirsch's hgen. For the complete solvers the running time of our formulas grows exponentially in sqrt(n), and exceeds that of random 3-XOR-SAT formulas for small problem sizes. SP is unable to solve our formulas with as few as 25 variables. For Walksat, our formulas appear to be harder than any other known generator of satisfiable instances. Finally, our formulas can be solved efficiently by RRT but only if the parameter d is tuned to the height of the barriers between local minima, and we use this parameter to measure the barrier heights in random 3-XOR-SAT formulas as well.
Submitted 17 October, 2012; v1 submitted 9 August, 2004;
originally announced August 2004.
-
2+p-SAT: Relation of Typical-Case Complexity to the Nature of the Phase Transition
Authors:
R. Monasson,
R. Zecchina,
S. Kirkpatrick,
B. Selman,
L. Troyansky
Abstract:
Heuristic methods for solution of problems in the NP-Complete class of decision problems often reach exact solutions, but fail badly at "phase boundaries", across which the decision to be reached changes from almost always having one value to almost always having a different value. We report an analytic solution and experimental investigations of the phase transition that occurs in the limit of very large problems in K-SAT. The nature of its "random first-order" phase transition, seen at values of K large enough to make the computational cost of solving typical instances increase exponentially with problem size, suggests a mechanism for the cost increase. There has been evidence for features like the "backbone" of frozen inputs which characterizes the UNSAT phase in K-SAT in the study of models of disordered materials, but this feature and this transition are uniquely accessible to analysis in K-SAT. The random first order transition combines properties of the 1st order (discontinuous onset of order) and 2nd order (with power law scaling, e.g. of the width of the critical region in a finite system) transitions known in the physics of pure solids. Such transitions should occur in other combinatoric problems in the large N limit. Finally, improved search heuristics may be developed when a "backbone" is known to exist.
Submitted 6 October, 1999;
originally announced October 1999.