-
Reinforcement Learning for Sociohydrology
Authors:
Tirthankar Roy,
Shivendra Srivastava,
Beichen Zhang
Abstract:
In this study, we discuss how reinforcement learning (RL) provides an effective and efficient framework for solving sociohydrology problems. The efficacy of RL for these types of problems is evident because of its ability to update policies in an iterative manner - something that is also foundational to sociohydrology, where we are interested in representing the co-evolution of human-water interactions. We present a simple case study to demonstrate the implementation of RL in a problem of runoff reduction through management decisions related to changes in land-use land-cover (LULC). We then discuss the benefits of RL for these types of problems and share our perspectives on the future research directions in this area.
Submitted 31 May, 2024;
originally announced May 2024.
-
Belief-State Query Policies for Planning With Preferences Under Partial Observability
Authors:
Daniel Bramblett,
Siddharth Srivastava
Abstract:
Planning in real-world settings often entails addressing partial observability while aligning with users' preferences. We present a novel framework for expressing users' preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) preferences in the setting of goal-oriented partially observable Markov decision processes (gPOMDPs). We present the first formal analysis of such preferences and prove that while the expected value of a BSQ preference is not a convex function w.r.t its parameters, it is piecewise constant and yields an implicit discrete parameter search space that is finite for finite horizons. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior while guaranteeing user preference compliance. Theoretical analysis proves that our algorithms converge to the optimal preference-compliant behavior in the limit. Empirical results show that BSQ preferences provide a computationally feasible approach for planning with preferences in partially observable settings.
Submitted 24 May, 2024;
originally announced May 2024.
-
MathDivide: Improved mathematical reasoning by large language models
Authors:
Saksham Sahai Srivastava,
Ashutosh Gandhi
Abstract:
Large language models have been proven to be capable of handling complex linguistic and cognitive tasks. Therefore, their usage has been extended to tasks requiring logical reasoning ability, such as mathematics. In this paper, we propose a prompting technique called MathDivide that breaks down the mathematical problem into simpler subproblems. Each of the subproblems is formulated as an algebraic expression whose value is evaluated by Python code generated by the LLM for the corresponding algebraic expression. The values fed to the Python code are the numerical values provided in the problem statement. The solutions for the subproblems are composed together to obtain the final answer for the problem statement. Finally, the final answer is compared to the correct answer. If the final answer matches the correct answer, it is produced as output; otherwise, a refinement prompt is fed to the LLM. We experiment with this prompting technique on both closed-source and open-source LLMs using the GSM8K dataset. The results obtained demonstrate that MathDivide was able to significantly outperform the leading prompting technique called Math-prompter.
Submitted 12 May, 2024;
originally announced May 2024.
-
BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation
Authors:
Yunhao Ge,
Yihe Tang,
Jiashu Xu,
Cem Gokmen,
Chengshu Li,
Wensi Ai,
Benjamin Jose Martinez,
Arman Aydin,
Mona Anvari,
Ayush K Chakravarthy,
Hong-Xing Yu,
Josiah Wong,
Sanjana Srivastava,
Sharon Lee,
Shengxin Zha,
Laurent Itti,
Yunzhu Li,
Roberto Martín-Martín,
Miao Liu,
Pengchuan Zhang,
Ruohan Zhang,
Li Fei-Fei,
Jiajun Wu
Abstract:
The systematic evaluation and understanding of computer vision models under varying conditions require large amounts of data with comprehensive and customized labels, which real-world vision datasets rarely satisfy. While current synthetic data generators offer a promising alternative, particularly for embodied AI tasks, they often fall short for computer vision tasks due to low asset and rendering quality, limited diversity, and unrealistic physical properties. We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models, based on the newly developed embodied AI benchmark, BEHAVIOR-1K. BVS supports a large number of adjustable parameters at the scene level (e.g., lighting, object placement), the object level (e.g., joint configuration, attributes such as "filled" and "folded"), and the camera level (e.g., field of view, focal length). Researchers can arbitrarily vary these parameters during data generation to perform controlled experiments. We showcase three example application scenarios: systematically evaluating the robustness of models across different continuous axes of domain shift, evaluating scene understanding models on the same set of images, and training and evaluating simulation-to-real transfer for a novel vision task: unary and binary state prediction. Project website: https://behavior-vision-suite.github.io/
Submitted 15 May, 2024;
originally announced May 2024.
-
Catalyzing Social Interactions in Mixed Reality using ML Recommendation Systems
Authors:
Sparsh Srivastava,
Rohan Arora
Abstract:
We create an innovative mixed reality-first social recommendation model, utilizing features uniquely collected through mixed reality (MR) systems to promote social interaction, such as gaze recognition, proximity, noise level, congestion level, and conversational intensity. We further extend these models to include right-time features to deliver timely notifications. We measure performance metrics across various models by creating a new intersection of user features, MR features, and right-time features. We create four model types trained on different combinations of the feature classes, where we compare the baseline model trained on the class of user features against the models trained on MR features, right-time features, and a combination of all of the feature classes. Due to limitations in data collection and cost, we observe performance degradation in the right-time, mixed reality, and combination models. Despite these challenges, we introduce optimizations to improve accuracy across all models by over 14 percentage points, where the best performing model achieved 24% greater accuracy.
Submitted 29 April, 2024;
originally announced April 2024.
-
Quantifying Social Presence in Mixed Reality: A Contemporary Review of Techniques and Innovations
Authors:
Sparsh Srivastava
Abstract:
This literature review investigates the transformative potential of mixed reality (MR) technology, where we explore the intersection of contemporary technological advancements, modern deep learning recommendation systems, and social psychology frameworks. This interdisciplinary study informs the understanding of MR's role in improving social presence, catalyzing novel social interactions, and enhancing the quality of interpersonal communication in the real world. We also discuss the challenges and barriers blocking the wide-spread adoption of social networking in MR, such as device constraints, privacy and accessibility concerns, and social norms. Through carefully structured, closed-environment experiments with diverse participants of varying levels of digital literacy, we measure the differences in social dynamics, frequency, quality, and duration of interactions, and levels of social anxiety between MR-enhanced, mobile-enhanced, and control condition participants.
Submitted 26 April, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
Soil Fertility Prediction Using Combined USB-microscope Based Soil Image, Auxiliary Variables, and Portable X-Ray Fluorescence Spectrometry
Authors:
Shubhadip Dasgupta,
Satwik Pate,
Divya Rathore,
L. G. Divyanth,
Ayan Das,
Anshuman Nayak,
Subhadip Dey,
Asim Biswas,
David C. Weindorf,
Bin Li,
Sergio Henrique Godinho Silva,
Bruno Teixeira Ribeiro,
Sanjay Srivastava,
Somsubhra Chakraborty
Abstract:
This study explored the application of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis to rapidly assess soil fertility, focusing on critical parameters such as available B, organic carbon (OC), available Mn, available S, and the sulfur availability index (SAI). Analyzing 1,133 soil samples from various agro-climatic zones in Eastern India, the research combined color and texture features from microscopic soil images, PXRF data, and auxiliary soil variables (AVs) using a Random Forest model. Results indicated that integrating image features (IFs) with auxiliary variables (AVs) significantly enhanced prediction accuracy for available B (R^2 = 0.80) and OC (R^2 = 0.88). A data fusion approach, incorporating IFs, AVs, and PXRF data, further improved predictions for available Mn and SAI with R^2 values of 0.72 and 0.70, respectively. The study demonstrated how these integrated technologies have the potential to provide quick and affordable options for soil testing, opening up access to more sophisticated prediction models and a better comprehension of the fertility and health of the soil. Future research should focus on the application of deep learning models on a larger dataset of soil images, developed using soils from a broader range of agro-climatic zones under field condition.
Submitted 17 April, 2024;
originally announced April 2024.
-
Transfer Learning with Point Transformers
Authors:
Kartik Gupta,
Rahul Vippala,
Sahima Srivastava
Abstract:
Point Transformers are near state-of-the-art models for classification, segmentation, and detection tasks on point cloud data. They utilize a self-attention-based mechanism to model long-range spatial dependencies between multiple point sets. In this project we explore two things: the classification performance of these attention-based networks on the ModelNet10 dataset, and the use of the trained model to classify the 3D MNIST dataset after fine-tuning. We also train the model from scratch on the 3D MNIST dataset to compare the performance of the fine-tuned and from-scratch models. We observe that since the two datasets have a large difference in the degree of the distributions, transfer-learned models do not outperform the from-scratch models in this case. We do, however, expect transfer-learned models to converge faster, since they already know lower-level features such as edges and corners from the ModelNet10 dataset.
Submitted 31 March, 2024;
originally announced April 2024.
-
Using Explainable AI and Hierarchical Planning for Outreach with Robots
Authors:
Daksh Dobhal,
Jayesh Nagpal,
Rushang Karia,
Pulkit Verma,
Rashmeet Kaur Nayyar,
Naman Shah,
Siddharth Srivastava
Abstract:
Understanding how robots plan and execute tasks is crucial in today's world, where they are becoming more prevalent in our daily lives. However, teaching non-experts the complexities of robot planning can be challenging. This work presents an open-source platform that simplifies the process using a visual interface that completely abstracts the complex internals of hierarchical planning that robots use for performing task and motion planning. Using the principles developed in the field of explainable AI, this intuitive platform enables users to create plans for robots to complete tasks, and provides helpful hints and natural language explanations for errors. The platform also has a built-in simulator to demonstrate how robots execute submitted plans. This platform's efficacy was tested in a user study on university students with little to no computer science background. Our results show that this platform is highly effective in teaching novice users the intuitions of robot task planning.
Submitted 31 March, 2024;
originally announced April 2024.
-
Can LLMs Converse Formally? Automatically Assessing LLMs in Translating and Interpreting Formal Specifications
Authors:
Rushang Karia,
Daksh Dobhal,
Daniel Bramblett,
Pulkit Verma,
Siddharth Srivastava
Abstract:
Stakeholders often describe system requirements using natural language, which are then converted to formal syntax by a domain expert, leading to increased design costs. This paper assesses the capabilities of Large Language Models (LLMs) in converting between natural language descriptions and formal specifications. Existing work has evaluated the capabilities of LLMs in generating formal syntax such as source code, but such experiments are typically hand-crafted, use problems that are likely to be in the training set of LLMs, and often require human-annotated datasets. We propose an approach that can use two copies of an LLM in conjunction with an off-the-shelf verifier to automatically evaluate its translation abilities without any additional human input. Our approach generates formal syntax using language grammars to automatically generate a dataset. We conduct an empirical evaluation to measure the accuracy of this translation task and show that SOTA LLMs cannot adequately solve this task, limiting their current utility in the design of complex systems.
Submitted 27 March, 2024;
originally announced March 2024.
-
BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation
Authors:
Chengshu Li,
Ruohan Zhang,
Josiah Wong,
Cem Gokmen,
Sanjana Srivastava,
Roberto Martín-Martín,
Chen Wang,
Gabrael Levine,
Wensi Ai,
Benjamin Martinez,
Hang Yin,
Michael Lingelbach,
Minjune Hwang,
Ayano Hiranaka,
Sujay Garlanka,
Arman Aydin,
Sharon Lee,
Jiankai Sun,
Mona Anvari,
Manasi Sharma,
Dhruva Bansal,
Samuel Hunter,
Kyu-Young Kim,
Alan Lou,
Caleb R Matthews
, et al. (10 additional authors not shown)
Abstract:
We present BEHAVIOR-1K, a comprehensive simulation benchmark for human-centered robotics. BEHAVIOR-1K includes two components, guided and motivated by the results of an extensive survey on "what do you want robots to do for you?". The first is the definition of 1,000 everyday activities, grounded in 50 scenes (houses, gardens, restaurants, offices, etc.) with more than 9,000 objects annotated with rich physical and semantic properties. The second is OMNIGIBSON, a novel simulation environment that supports these activities via realistic physics simulation and rendering of rigid bodies, deformable bodies, and liquids. Our experiments indicate that the activities in BEHAVIOR-1K are long-horizon and dependent on complex manipulation skills, both of which remain a challenge for even state-of-the-art robot learning solutions. To calibrate the simulation-to-reality gap of BEHAVIOR-1K, we provide an initial study on transferring solutions learned with a mobile manipulator in a simulated apartment to its real-world counterpart. We hope that BEHAVIOR-1K's human-grounded nature, diversity, and realism make it valuable for embodied AI and robot learning research. Project website: https://behavior.stanford.edu.
Submitted 14 March, 2024;
originally announced March 2024.
-
Effective Fault Localization using Probabilistic and Grouping Approach
Authors:
Saksham Sahai Srivastava,
Arpita Dutta,
Rajib Mall
Abstract:
Context: Fault localization (FL) is the key activity while debugging a program. Any improvement to this activity leads to significant improvement in total software development cost. There is an internal linkage between the program spectrum and test execution result. Conditional probability in statistics captures the probability of one event occurring in relationship to one or more other events. Objectives: The aim of this paper is to use the concept of conditional probability to design an effective fault localization technique. Methods: In the paper, we present a fault localization technique that derives the association between statement coverage information and test case execution result using conditional probability statistics. This association with the failed test case result indicates the probability that a specific statement contains the fault. Subsequently, we use a grouping method to refine the obtained statement ranking sequence for better fault localization. Results: We evaluated the effectiveness of the proposed method over eleven open-source data sets. Our results show that, on average, the proposed CGFL method is 24.56% more effective than other contemporary fault localization methods such as D*, Tarantula, Ochiai, Crosstab, BPNN, RBFNN, DNN, and CNN. Conclusion: We devised an effective fault localization technique by combining the conditional probabilistic method with a failed test case execution-based approach. Our experimental evaluation shows that our proposed method outperforms the existing fault localization techniques.
Submitted 7 March, 2024;
originally announced March 2024.
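The conditional-probability idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' CGFL method: the function name `rank_statements`, the binary coverage matrix, and the plain estimate of P(test fails | statement executed) are all assumptions made for illustration, and the grouping refinement step is omitted.

```python
from typing import List, Tuple


def rank_statements(
    coverage: List[List[int]], failed: List[bool]
) -> List[Tuple[int, float]]:
    """Rank statements by an estimate of P(test fails | statement executed).

    coverage[t][s] is 1 if test t executed statement s, else 0 (hypothetical layout).
    failed[t] is True if test t failed.
    Returns (statement index, score) pairs, most suspicious first.
    """
    n_statements = len(coverage[0])
    scores = []
    for s in range(n_statements):
        # Tests that executed statement s.
        executed = [t for t, row in enumerate(coverage) if row[s]]
        if not executed:
            # A statement no test executed gets the lowest score.
            scores.append((s, 0.0))
            continue
        n_failed = sum(1 for t in executed if failed[t])
        scores.append((s, n_failed / len(executed)))
    # Sort by descending score; break ties by statement index.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Three tests over three statements; tests 0 and 2 fail.
example_coverage = [[1, 1, 0], [1, 0, 1], [1, 1, 0]]
example_failed = [True, False, True]
ranking = rank_statements(example_coverage, example_failed)
```

In this toy input, statement 1 is executed only by failing tests, so it ranks first; a real technique would then refine ties and near-ties, which is where a grouping step would come in.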
-
Bridge the Future: High-Performance Networks in Confidential VMs without Trusted I/O devices
Authors:
Mengyuan Li,
Shashvat Srivastava,
Mengjia Yan
Abstract:
Trusted I/O (TIO) is an appealing solution to improve I/O performance for confidential VMs (CVMs), with the potential to eliminate broad sources of I/O overhead. However, this paper emphasizes that not all types of I/O can derive substantial benefits from TIO, particularly network I/O. Given the obligatory use of encryption protocols for network traffic in CVM's threat model, TIO's approach of I/O encryption over the PCIe bus becomes redundant. Furthermore, TIO solutions need to expand the Trusted Computing Base (TCB) to include TIO devices and are commercially unavailable.
Motivated by these insights, the goal of this paper is to propose a software solution that helps CVMs immediately benefit from high-performance networks, while confining trust only to the on-chip CVM. We present FOLIO, a software solution crafted from a secure and efficient Data Plane Development Kit (DPDK) extension compatible with the latest version of AMD Secure Encrypted Virtualization (SEV), a.k.a., Secure Nested Paging (SNP). Our design is informed by a thorough analysis of all possible factors that impact SNP VM's network performance. By extensively removing overhead sources, we arrive at a design that approaches the efficiency of an optimal TIO-based configuration. Evaluation shows that FOLIO has a performance dip less than 6% relative to the optimal TIO configuration, while only relying on off-the-shelf CPUs.
Submitted 5 March, 2024;
originally announced March 2024.
-
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap
Authors:
Saurabh Srivastava,
Annarose M B,
Anto P V,
Shashank Menon,
Ajay Sukumar,
Adwaith Samod T,
Alan Philipose,
Stevin Prince,
Sooraj Thomas
Abstract:
We propose a framework for robust evaluation of reasoning capabilities of language models, using functional variants of benchmarks. Models that solve a reasoning test should exhibit no difference in performance over the static version of a problem compared to a snapshot of the functional variant. We have rewritten the relevant fragment of the MATH benchmark into its functional variant MATH(), with functionalization of other benchmarks to follow. When evaluating current state-of-the-art models over snapshots of MATH(), we find a reasoning gap -- the percentage difference between the static and functional accuracies. We find reasoning gaps from 58.35% to 80.31% among the state-of-the-art closed and open weights models that perform well on static benchmarks, with the caveat that the gaps are likely to be smaller with more sophisticated prompting strategies. Here we show that models which anecdotally have good reasoning performance over real-world tasks have quantifiably lower gaps, motivating the open problem of building "gap 0" models. Code for evaluation and new evaluation datasets, three MATH() snapshots, are publicly available at https://github.com/consequentai/fneval/.
Submitted 29 February, 2024;
originally announced February 2024.
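The "reasoning gap" in the abstract above is described only as the percentage difference between static and functional accuracies. A minimal sketch follows, assuming the gap is measured relative to the static accuracy; that normalization, the function name `reasoning_gap`, and the sample numbers are assumptions for illustration and are not taken from the paper or its code.

```python
def reasoning_gap(static_acc: float, functional_acc: float) -> float:
    """Percentage drop from static to functional accuracy.

    Assumption: the drop is normalized by the static accuracy, so a model
    that scores the same on both variants has a gap of 0.
    """
    if static_acc <= 0:
        raise ValueError("static accuracy must be positive")
    return 100.0 * (static_acc - functional_acc) / static_acc


# Hypothetical accuracies, not results from the paper.
gap = reasoning_gap(static_acc=0.80, functional_acc=0.20)
```

Under this reading, a "gap 0" model is one whose accuracy on functional snapshots matches its accuracy on the static benchmark.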
-
From Reals to Logic and Back: Inventing Symbolic Vocabularies, Actions, and Models for Planning from Raw Data
Authors:
Naman Shah,
Jayesh Nagpal,
Pulkit Verma,
Siddharth Srivastava
Abstract:
Hand-crafted, logic-based state and action representations have been widely used to overcome the intractable computational complexity of long-horizon robot planning problems, including task and motion planning problems. However, creating such representations requires experts with strong intuitions and detailed knowledge about the robot and the tasks it may need to accomplish in a given setting. Removing this dependency on human intuition is a highly active research area.
This paper presents the first approach for autonomously learning generalizable, logic-based relational representations for abstract states and actions starting from unannotated high-dimensional, real-valued robot trajectories. The learned representations constitute auto-invented PDDL-like domain models. Empirical results in deterministic settings show that powerful abstract representations can be learned from just a handful of robot trajectories; the learned relational representations include but go beyond classical, intuitive notions of high-level actions; and that the learned models allow planning algorithms to scale to tasks that were previously beyond the scope of planning without hand-crafted abstractions.
Submitted 4 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Asynchronous Distributed Coordinated Hybrid Precoding in Multi-cell mmWave Wireless Networks
Authors:
Meesam Jafri,
Suraj Srivastava,
Sunil Kumar,
Aditya K. Jagannatham,
Lajos Hanzo
Abstract:
Asynchronous distributed hybrid beamformers (ADBF) are conceived for minimizing the total transmit power subject to signal-to-interference-plus-noise ratio (SINR) constraints at the users. Our design requires only limited information exchange between the base stations (BSs) of the mmWave multi-cell coordinated (MCC) networks considered. To begin with, a semidefinite relaxation (SDR)-based fully-digital (FD) beamformer is designed for a centralized MCC system. Subsequently, a Bayesian learning (BL) technique is harnessed for decomposing the FD beamformer into its analog and baseband components and construct a hybrid transmit precoder (TPC). However, the centralized TPC design requires global channel state information (CSI), hence it results in a high signaling overhead. An alternating direction based method of multipliers (ADMM) technique is developed for a synchronous distributed beamformer (SDBF) design, which relies only on limited information exchange among the BSs, thus reducing the signaling overheads required by the centralized TPC design procedure.
However, the SDBF design is challenging, since it requires the updates from the BSs to be strictly synchronized. As a remedy, an ADBF framework is developed that mitigates the inter-cell interference (ICI) and also controls the asynchrony in the system.
Furthermore, the above ADBF framework is also extended to the robust ADBF (R-ADBF) algorithm that incorporates the CSI uncertainty into the design procedure for minimizing the worst-case transmit power. Our simulation results illustrate both the enhanced performance and the improved convergence properties of the ADMM-based ADBF and R-ADBF schemes.
Submitted 13 February, 2024;
originally announced February 2024.
-
Epistemic Exploration for Generalizable Planning and Learning in Non-Stationary Settings
Authors:
Rushang Karia,
Pulkit Verma,
Alberto Speranzon,
Siddharth Srivastava
Abstract:
This paper introduces a new approach for continual planning and model learning in relational, non-stationary stochastic environments. Such capabilities are essential for the deployment of sequential decision-making systems in the uncertain and constantly evolving real world. Working in such practical settings with unknown (and non-stationary) transition systems and changing tasks, the proposed framework models gaps in the agent's current state of knowledge and uses them to conduct focused, investigative explorations. Data collected using these explorations is used for learning generalizable probabilistic models for solving the current task despite continual changes in the environment dynamics. Empirical evaluations on several non-stationary benchmark domains show that this approach significantly outperforms planning and RL baselines in terms of sample complexity. Theoretical results show that the system exhibits desirable convergence properties when stationarity holds.
Submitted 6 June, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
De-amplifying Bias from Differential Privacy in Language Model Fine-tuning
Authors:
Sanjari Srivastava,
Piotr Mardziel,
Zhikhun Zhang,
Archana Ahlawat,
Anupam Datta,
John C Mitchell
Abstract:
Fairness and privacy are two important values machine learning (ML) practitioners often seek to operationalize in models. Fairness aims to reduce model bias for social/demographic sub-groups. Privacy via differential privacy (DP) mechanisms, on the other hand, limits the impact of any individual's training data on the resulting model. The trade-offs between privacy and fairness goals of trustworthy ML pose a challenge to those wishing to address both. We show that DP amplifies gender, racial, and religious bias when fine-tuning large language models (LLMs), producing models more biased than ones fine-tuned without DP. We find the cause of the amplification to be a disparity in convergence of gradients across sub-groups. Through the case of binary gender bias, we demonstrate that Counterfactual Data Augmentation (CDA), a known method for addressing bias, also mitigates bias amplification by DP. As a consequence, DP and CDA together can be used to fine-tune models while maintaining both fairness and privacy.
Submitted 6 February, 2024;
originally announced February 2024.
-
Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
Authors:
Kerem Zaman,
Leshem Choshen,
Shashank Srivastava
Abstract:
Model fusion research aims to aggregate the knowledge of multiple models to enhance performance by combining their weights. In this work, we study the inverse, investigating whether and how model fusion can interfere with and reduce unwanted knowledge. We delve into the effects of model fusion on the evolution of learned shortcuts, social biases, and memorization capabilities in fine-tuned language models. Through several experiments covering text classification and generation tasks, our analysis highlights that shared knowledge among models is usually enhanced during model fusion, while unshared knowledge is usually lost or forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
Submitted 13 November, 2023;
originally announced November 2023.
-
Leveraging Multiple Teachers for Test-Time Adaptation of Language-Guided Classifiers
Authors:
Kangda Wei,
Sayan Ghosh,
Rakesh R. Menon,
Shashank Srivastava
Abstract:
Recent approaches have explored language-guided classifiers capable of classifying examples from novel tasks when provided with task-specific natural language explanations, instructions or prompts (Sanh et al., 2022; R. Menon et al., 2022). While these classifiers can generalize in zero-shot settings, their task performance often varies substantially between different language explanations in unpredictable ways (Lu et al., 2022; Gonen et al., 2022). Also, current approaches fail to leverage unlabeled examples that may be available in many scenarios. Here, we introduce TALC, a framework that uses data programming to adapt a language-guided classifier for a new task during inference when provided with explanations from multiple teachers and unlabeled test examples. Our results show that TALC consistently outperforms a competitive baseline from prior work by an impressive 9.3% (relative improvement). Further, we demonstrate the robustness of TALC to variations in the quality and quantity of provided explanations, highlighting its potential in scenarios where learning from multiple teachers or a crowd is involved. Our code is available at: https://github.com/WeiKangda/TALC.git.
Submitted 13 November, 2023;
originally announced November 2023.
-
A Comparison of Lexicon-Based and ML-Based Sentiment Analysis: Are There Outlier Words?
Authors:
Siddhant Jaydeep Mahajani,
Shashank Srivastava,
Alan F. Smeaton
Abstract:
Lexicon-based approaches to sentiment analysis of text are based on each word or lexical entry having a pre-defined weight indicating its sentiment polarity. These weights are usually manually assigned, but their accuracy when compared against machine learning based approaches to computing sentiment is not known. It may be that there are lexical entries whose sentiment values cause a lexicon-based approach to give results which are very different to a machine learning approach. In this paper we compute sentiment for more than 150,000 English language texts drawn from 4 domains using the Hedonometer, a lexicon-based technique, and Azure, a contemporary and easy-to-use machine-learning based approach which is part of the Azure Cognitive Services family of APIs. We model differences in sentiment scores between approaches for documents in each domain using a regression and analyse the independent variables (Hedonometer lexical entries) as indicators of each word's importance and contribution to the score differences. Our findings are that the importance of a word depends on the domain and there are no standout lexical entries which systematically cause differences in sentiment scores.
Submitted 10 November, 2023;
originally announced November 2023.
-
OmniVec: Learning robust representations with cross modal sharing
Authors:
Siddharth Srivastava,
Gaurav Sharma
Abstract:
The majority of research in learning based methods has been towards designing and training networks for specific tasks. However, many of the learning based tasks, across modalities, share commonalities and could be potentially tackled in a joint framework. We present an approach in this direction, to learn multiple tasks, in multiple modalities, with a unified architecture. The proposed network is composed of task specific encoders, a common trunk in the middle, followed by task specific prediction heads. We first pre-train it by self-supervised masked training, followed by sequential training for the different tasks. We train the network on all major modalities, e.g. visual, audio, text and 3D, and report results on $22$ diverse and challenging public benchmarks. We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing and this allows us to achieve state-of-the-art results on most of the benchmarks. We also show generalization of the trained network on cross-modal tasks as well as unseen datasets and tasks.
Submitted 7 November, 2023;
originally announced November 2023.
-
Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models
Authors:
Yiyuan Li,
Rakesh R. Menon,
Sayan Ghosh,
Shashank Srivastava
Abstract:
Generalized quantifiers (e.g., few, most) are used to indicate the proportions in which predicates are satisfied (for example, some apples are red). One way to interpret quantifier semantics is to explicitly bind these satisfactions with percentage scopes (e.g., 30%-40% of apples are red). This approach can be helpful for tasks like logic formalization and surface-form quantitative reasoning (Gordon and Schubert, 2010; Roy et al., 2015). However, it remains unclear if recent foundation models possess this ability, as they lack direct training signals. To explore this, we introduce QuRe, a crowd-sourced dataset of human-annotated generalized quantifiers in Wikipedia sentences featuring percentage-equipped predicates. We explore quantifier comprehension in language models using PRESQUE, a framework that combines natural language inference and the Rational Speech Acts framework. Experimental results on the HVD dataset and QuRe illustrate that PRESQUE, employing pragmatic reasoning, performs 20% better than a literal reasoning baseline when predicting quantifier percentage scopes, with no additional training required.
Submitted 8 November, 2023;
originally announced November 2023.
-
List Decoding of Tanner and Expander Amplified Codes from Distance Certificates
Authors:
Fernando Granha Jeronimo,
Shashank Srivastava,
Madhur Tulsiani
Abstract:
We develop new list decoding algorithms for Tanner codes and distance-amplified codes based on bipartite spectral expanders. We show that proofs exhibiting lower bounds on the minimum distance of these codes can be used as certificates discoverable by relaxations in the Sum-of-Squares (SoS) semidefinite programming hierarchy. Combining these certificates with certain entropic proxies to ensure that the solutions to the relaxations cover the entire list, then leads to algorithms for list decoding several families of codes up to the Johnson bound.
We prove the following:
- We show that the LDPC Tanner codes of Sipser-Spielman [IEEE Trans. Inf. Theory 1996] and Zémor [IEEE Trans. Inf. Theory 2001] with alphabet size $q$, block-length $n$ and distance $δ$, based on an expander graph with degree $d$, can be list-decoded up to distance $\mathcal{J}_q(δ) - ε$ in time $n^{O_{d,q}(1/ε^4)}$, where $\mathcal{J}_q(δ)$ denotes the Johnson bound.
- We show that the codes obtained via the expander-based distance amplification procedure of Alon, Edmonds and Luby [FOCS 1995] can be list-decoded close to the Johnson bound using the SoS hierarchy, by reducing the list decoding problem to unique decoding of the base code. In particular, starting from \emph{any} base code unique-decodable up to distance $δ$, one can obtain near-MDS codes with rate $R$ and distance $1-R - ε$, list-decodable up to the Johnson bound in time $n^{O_{ε, δ}(1)}$.
- We show that the locally testable codes of Dinur et al. [STOC 2022] with alphabet size $q$, block-length $n$ and distance $δ$ based on a square Cayley complex with generator sets of size $d$, can be list-decoded up to distance $\mathcal{J}_q(δ) - ε$ in time $n^{O_{d,q}(1/ε^{4})}$, where $\mathcal{J}_q(δ)$ denotes the Johnson bound.
Submitted 3 November, 2023;
originally announced November 2023.
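For reference, the $q$-ary Johnson bound $\mathcal{J}_q(δ)$ that appears in the results above is commonly written in the following standard textbook form. This is not quoted from the paper, whose exact normalization may differ.

```latex
\mathcal{J}_q(δ) = \left(1 - \frac{1}{q}\right)\left(1 - \sqrt{1 - \frac{q\,δ}{q-1}}\right)
```

For binary codes ($q = 2$) this reduces to $\frac{1}{2}\left(1 - \sqrt{1 - 2δ}\right)$.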
-
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving
Authors:
Tushar Choudhary,
Vikrant Dewangan,
Shivam Chandhok,
Shubham Priyadarshan,
Anushka Jain,
Arun K. Singh,
Siddharth Srivastava,
Krishna Murthy Jatavallabhula,
K. Madhava Krishna
Abstract:
Talk2BEV is a large vision-language model (LVLM) interface for bird's-eye view (BEV) maps in autonomous driving contexts. While existing perception systems for autonomous driving scenarios have largely focused on a pre-defined (closed) set of object categories and driving scenarios, Talk2BEV blends recent advances in general-purpose language and vision models with BEV-structured map representations, eliminating the need for task-specific models. This enables a single system to cater to a variety of autonomous driving tasks encompassing visual and spatial reasoning, predicting the intents of traffic actors, and decision-making based on visual cues. We extensively evaluate Talk2BEV on a large number of scene understanding tasks that rely on both the ability to interpret free-form natural language queries and the ability to ground these queries in the visual context embedded into the language-enhanced BEV map. To enable further research in LVLMs for autonomous driving scenarios, we develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the NuScenes dataset.
Submitted 14 November, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
Instances Need More Care: Rewriting Prompts for Instances with LLMs in the Loop Yields Better Zero-Shot Performance
Authors:
Saurabh Srivastava,
Chengyue Huang,
Weiguo Fan,
Ziyu Yao
Abstract:
Large language models (LLMs) have revolutionized zero-shot task performance, mitigating the need for task-specific annotations while enhancing task generalizability. Despite its advancements, current methods using trigger phrases such as "Let's think step by step" remain limited. This study introduces PRomPTed, an approach that optimizes the zero-shot prompts for individual task instances following an innovative manner of "LLMs in the loop". Our comprehensive evaluation across 13 datasets and 10 task types based on GPT-4 reveals that PRomPTed significantly outperforms both the naive zero-shot approaches and a strong baseline (i.e., "Output Refinement") which refines the task output instead of the input prompt. Our experimental results also confirmed the generalization of this advantage to the relatively weaker GPT-3.5. Even more intriguingly, we found that leveraging GPT-3.5 to rewrite prompts for the stronger GPT-4 not only matches but occasionally exceeds the efficacy of using GPT-4 as the prompt rewriter. Our research thus presents a huge value in not only enhancing zero-shot LLM performance but also potentially enabling supervising LLMs with their weaker counterparts, a capability attracting much interest recently. Finally, our additional experiments confirm the generalization of the advantages to open-source LLMs such as Mistral 7B and Mixtral 8x7B.
Submitted 11 June, 2024; v1 submitted 3 October, 2023;
originally announced October 2023.
-
The intersection of video capsule endoscopy and artificial intelligence: addressing unique challenges using machine learning
Authors:
Shan Guleria,
Benjamin Schwartz,
Yash Sharma,
Philip Fernandes,
James Jablonski,
Sodiq Adewole,
Sanjana Srivastava,
Fisher Rhoads,
Michael Porter,
Michelle Yeghyayan,
Dylan Hyatt,
Andrew Copland,
Lubaina Ehsan,
Donald Brown,
Sana Syed
Abstract:
Introduction: Technical burdens and time-intensive review processes limit the practical utility of video capsule endoscopy (VCE). Artificial intelligence (AI) is poised to address these limitations, but the intersection of AI and VCE reveals challenges that must first be overcome. We identified five challenges to address. Challenge #1: VCE data are stochastic and contain significant artifact. Challenge #2: VCE interpretation is cost-intensive. Challenge #3: VCE data are inherently imbalanced. Challenge #4: Existing VCE AIMLT are computationally cumbersome. Challenge #5: Clinicians are hesitant to accept AIMLT that cannot explain their process.
Methods: An anatomic landmark detection model was used to test the application of convolutional neural networks (CNNs) to the task of classifying VCE data. We also created a tool that assists in expert annotation of VCE data. We then created more elaborate models using different approaches including a multi-frame approach, a CNN based on graph representation, and a few-shot approach based on meta-learning.
Results: When used on full-length VCE footage, CNNs accurately identified anatomic landmarks (99.1%), with gradient-weighted class activation mapping showing the parts of each frame that the CNN used to make its decision. The graph CNN with weakly supervised learning (accuracy 89.9%, sensitivity of 91.1%), the few-shot model (accuracy 90.8%, precision 91.4%, sensitivity 90.9%), and the multi-frame model (accuracy 97.5%, precision 91.5%, sensitivity 94.8%) performed well.
Discussion: Each of these five challenges is addressed, in part, by one of our AI-based models. Our goal of producing high performance using lightweight models that aim to improve clinician confidence was achieved.
Submitted 24 August, 2023;
originally announced August 2023.
-
Benchmarking LLM powered Chatbots: Methods and Metrics
Authors:
Debarag Banerjee,
Pooja Singh,
Arjun Avadhanam,
Saksham Srivastava
Abstract:
Autonomous conversational agents, i.e. chatbots, are becoming an increasingly common mechanism for enterprises to provide support to customers and partners. In order to rate chatbots, especially ones powered by Generative AI tools like Large Language Models (LLMs), we need to be able to accurately assess their performance. This is where chatbot benchmarking becomes important. In this paper, we propose the use of a novel benchmark that we call the E2E (End to End) benchmark, and show how the E2E benchmark can be used to evaluate accuracy and usefulness of the answers provided by chatbots, especially ones powered by LLMs. We evaluate an example chatbot at different levels of sophistication based on both our E2E benchmark and other available metrics commonly used in the state of the art, and observe that the proposed benchmark shows better results compared to others. In addition, while some metrics proved to be unpredictable, the metric associated with the E2E benchmark, which uses cosine similarity, performed well in evaluating chatbots. The performance of our best models shows that there are several benefits of using the cosine similarity score as a metric in the E2E benchmark.
Submitted 8 August, 2023;
originally announced August 2023.
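The abstract above names cosine similarity as the score behind the E2E benchmark but does not say how answers are turned into vectors. As a minimal sketch only, assuming each chatbot answer and each reference answer has already been embedded as a list of floats (the embedding step and the function name `cosine_similarity` are hypothetical, not taken from the paper), the score itself could be computed like this:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)
```

A score near 1 means the two vectors point in nearly the same direction, and a score near 0 means they are nearly orthogonal. How such scores are aggregated over a test set is left open here because the abstract does not specify it.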
-
FP-IRL: Fokker-Planck-based Inverse Reinforcement Learning -- A Physics-Constrained Approach to Markov Decision Processes
Authors:
Chengyang Huang,
Siddhartha Srivastava,
Xun Huan,
Krishna Garikipati
Abstract:
Inverse Reinforcement Learning (IRL) is a compelling technique for revealing the rationale underlying the behavior of autonomous agents. IRL seeks to estimate the unknown reward function of a Markov decision process (MDP) from observed agent trajectories. However, IRL needs a transition function, and most algorithms assume it is known or can be estimated in advance from data. It therefore becomes even more challenging when such transition dynamics is not known a priori, since it enters the estimation of the policy in addition to determining the system's evolution. When the dynamics of these agents in the state-action space is described by stochastic differential equations (SDE) in Itô calculus, these transitions can be inferred from the mean-field theory described by the Fokker-Planck (FP) equation. We conjecture there exists an isomorphism between the time-discrete FP and MDP that extends beyond the minimization of free energy (in FP) and maximization of the reward (in MDP). We identify specific manifestations of this isomorphism and use them to create a novel physics-aware IRL algorithm, FP-IRL, which can simultaneously infer the transition and reward functions using only observed trajectories. We employ variational system identification to infer the potential function in FP, which consequently allows the evaluation of reward, transition, and policy by leveraging the conjecture. We demonstrate the effectiveness of FP-IRL by applying it to a synthetic benchmark and a biological problem of cancer cell dynamics, where the transition function is inaccessible.
Submitted 17 June, 2023;
originally announced June 2023.
-
Autonomous Capability Assessment of Sequential Decision-Making Systems in Stochastic Settings (Extended Version)
Authors:
Pulkit Verma,
Rushang Karia,
Siddharth Srivastava
Abstract:
It is essential for users to understand what their AI systems can and can't do in order to use them safely. However, the problem of enabling users to assess AI systems with sequential decision-making (SDM) capabilities is relatively understudied. This paper presents a new approach for modeling the capabilities of black-box AI systems that can plan and act, along with the possible effects and requirements for executing those capabilities in stochastic settings. We present an active-learning approach that can effectively interact with a black-box SDM system and learn an interpretable probabilistic model describing its capabilities. Theoretical analysis of the approach identifies the conditions under which the learning process is guaranteed to converge to the correct model of the agent; empirical evaluations on different agents and simulated scenarios show that this approach is few-shot generalizable and can effectively describe the capabilities of arbitrary black-box SDM agents in a sample-efficient manner.
Submitted 28 October, 2023; v1 submitted 7 June, 2023;
originally announced June 2023.
-
An Autoencoder-based Snow Drought Index
Authors:
Sinan Rasiya Koya,
Kanak Kanti Kar,
Shivendra Srivastava,
Tsegaye Tadesse,
Mark Svoboda,
Tirthankar Roy
Abstract:
In several regions across the globe, snow has a significant impact on hydrology. The amounts of water that infiltrate the ground and flow as runoff are driven by the melting of snow. Therefore, it is crucial to study the magnitude and effect of snowmelt. Snow droughts, resulting from reduced snow storage, can drastically impact the water supplies in basins where snow predominates, such as in the western United States. Hence, it is important to detect the time and severity of snow droughts efficiently. We propose Snow Drought Response Index or SnoDRI, a novel indicator that could be used to identify and quantify snow drought occurrences. Our index is calculated using cutting-edge ML algorithms from various snow-related variables. The self-supervised learning of an autoencoder is combined with mutual information in the model. In this study, we use random forests for feature extraction for SnoDRI and assess the importance of each variable. We use reanalysis data (NLDAS-2) from 1981 to 2021 for the Pacific United States to study the efficacy of the new snow drought index. We evaluate the index by confirming the coincidence of its interpretation and the actual snow drought incidents.
Submitted 22 May, 2023;
originally announced May 2023.
-
MAILEX: Email Event and Argument Extraction
Authors:
Saurabh Srivastava,
Gaurav Singh,
Shou Matsumoto,
Ali Raz,
Paulo Costa,
Joshua Poore,
Ziyu Yao
Abstract:
In this work, we present the first dataset, MailEx, for performing event extraction from conversational email threads. To this end, we first proposed a new taxonomy covering 10 event types and 76 arguments in the email domain. Our final dataset includes 1.5K email threads and ~4K emails, which are annotated with ~8K event instances in total. To understand the task challenges, we conducted a series of experiments comparing three types of approaches, i.e., fine-tuned sequence labeling, fine-tuned generative extraction, and few-shot in-context learning. Our results showed that the task of email event extraction is far from being addressed, due to challenges such as extracting non-continuous, shared trigger spans, extracting non-named entity arguments, and modeling the email conversational history. Our work thus suggests more future investigations in this domain-specific event extraction task.
Submitted 20 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
MaNtLE: Model-agnostic Natural Language Explainer
Authors:
Rakesh R. Menon,
Kerem Zaman,
Shashank Srivastava
Abstract:
Understanding the internal reasoning behind the predictions of machine learning systems is increasingly vital, given their rising adoption and acceptance. While previous approaches, such as LIME, generate algorithmic explanations by attributing importance to input features for individual examples, recent research indicates that practitioners prefer examining language explanations that explain sub-groups of examples. In this paper, we introduce MaNtLE, a model-agnostic natural language explainer that analyzes multiple classifier predictions and generates faithful natural language explanations of classifier rationale for structured classification tasks. MaNtLE uses multi-task training on thousands of synthetic classification tasks to generate faithful explanations. Simulated user studies indicate that, on average, MaNtLE-generated explanations are at least 11% more faithful compared to LIME and Anchors explanations across three tasks. Human evaluations demonstrate that users can better predict model behavior using explanations from MaNtLE compared to other techniques.
Submitted 22 May, 2023;
originally announced May 2023.
-
Beyond Labels: Empowering Human Annotators with Natural Language Explanations through a Novel Active-Learning Architecture
Authors:
Bingsheng Yao,
Ishan Jindal,
Lucian Popa,
Yannis Katsis,
Sayan Ghosh,
Lihong He,
Yuxuan Lu,
Shashank Srivastava,
Yunyao Li,
James Hendler,
Dakuo Wang
Abstract:
Real-world domain experts (e.g., doctors) rarely annotate only a decision label in their day-to-day workflow without providing explanations. Yet, existing low-resource learning techniques, such as Active Learning (AL), that aim to support human annotators mostly focus on the label while neglecting the natural language explanation of a data point. This work proposes a novel AL architecture to support experts' real-world need for label and explanation annotations in low-resource scenarios. Our AL architecture leverages an explanation-generation model to produce explanations guided by human explanations, a prediction model that utilizes generated explanations toward prediction faithfully, and a novel data diversity-based AL sampling strategy that benefits from the explanation annotations. Automated and human evaluations demonstrate the effectiveness of incorporating explanations into AL sampling and the improved human annotation efficiency and trustworthiness with our AL architecture. Additional ablation studies illustrate the potential of our AL architecture for transfer learning, generalizability, and integration with large language models (LLMs). While LLMs exhibit exceptional explanation-generation capabilities for relatively simple tasks, their effectiveness in complex real-world tasks warrants further in-depth study.
Submitted 23 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Robust Hybrid Transceiver Designs for Linear Decentralized Estimation in mmWave MIMO IoT Networks in the Face of Imperfect CSI
Authors:
Priyanka Maity,
Kunwar Pritiraj Rajput,
Suraj Srivastava,
Naveen K. D. Venkategowda,
Aditya K. Jagannatham,
Lajos Hanzo
Abstract:
Hybrid transceivers are designed for linear decentralized estimation (LDE) in a mmWave multiple-input multiple-output (MIMO) IoT network (IoTNe). For a noiseless fusion center (FC), it is demonstrated that the MSE performance is determined by the number of RF chains used at each IoT node (IoTNo). Next, the minimum-MSE RF transmit precoders (TPCs) and receive combiner (RC) matrices are designed for this setup using the dominant array response vectors, and subsequently, a closed-form expression is obtained for the baseband (BB) TPC at each IoTNo using Cauchy's interlacing theorem. For a realistic noisy FC, it is shown that the resultant mean squared error (MSE) minimization problem is non-convex. To address this challenge, a block-coordinate descent-based iterative scheme is proposed to obtain the fully digital TPC and RC matrices followed by the simultaneous orthogonal matching pursuit (SOMP) technique for decomposing the fully-digital transceiver into its corresponding RF and BB components. A theoretical proof of the convergence is also presented for the proposed iterative design procedure. Furthermore, robust hybrid transceiver designs are also derived for a practical scenario in the face of channel state information (CSI) uncertainty. The centralized MMSE lower bound has also been derived that benchmarks the performance of the proposed LDE schemes. Finally, our numerical results characterize the performance of the proposed transceivers as well as corroborate our various analytical propositions.
Submitted 18 May, 2023;
originally announced May 2023.
-
Learning to Simulate Natural Language Feedback for Interactive Semantic Parsing
Authors:
Hao Yan,
Saurabh Srivastava,
Yintao Tai,
Sida I. Wang,
Wen-tau Yih,
Ziyu Yao
Abstract:
Interactive semantic parsing based on natural language (NL) feedback, where users provide feedback to correct the parser mistakes, has emerged as a more practical scenario than the traditional one-shot semantic parsing. However, prior work has heavily relied on human-annotated feedback data to train the interactive semantic parser, which is prohibitively expensive and not scalable. In this work, we propose a new task of simulating NL feedback for interactive semantic parsing. We accompany the task with a novel feedback evaluator. The evaluator is specifically designed to assess the quality of the simulated feedback, based on which we decide the best feedback simulator from our proposed variants. On a text-to-SQL dataset, we show that our feedback simulator can generate high-quality NL feedback to boost the error correction ability of a specific parser. In low-data settings, our feedback simulator can help achieve comparable error correction performance as trained using the costly, full set of human annotations.
Submitted 4 June, 2023; v1 submitted 14 May, 2023;
originally announced May 2023.
-
Learning and Reasoning Multifaceted and Longitudinal Data for Poverty Estimates and Livelihood Capabilities of Lagged Regions in Rural India
Authors:
Atharva Kulkarni,
Raya Das,
Ravi S. Srivastava,
Tanmoy Chakraborty
Abstract:
Poverty is a multifaceted phenomenon linked to the lack of capabilities of households to earn a sustainable livelihood, increasingly being assessed using multidimensional indicators. Its spatial pattern depends on social, economic, political, and regional variables. Artificial intelligence has shown immense scope in analyzing the complexities and nuances of poverty. The proposed project aims to examine the poverty situation of rural India for the period of 1990-2022 based on the quality of life and livelihood indicators. The districts will be classified into `advanced', `catching up', `falling behind', and `lagged' regions. The project proposes to integrate multiple data sources, including conventional national-level large sample household surveys, census surveys, and proxy variables like daytime, and nighttime data from satellite images, and communication networks, to name a few, to provide a comprehensive view of poverty at the district level. The project also intends to examine causation and longitudinal analysis to examine the reasons for poverty. Poverty and inequality could be widening in developing countries due to demographic and growth-agglomerating policies. Therefore, targeting the lagging regions and the vulnerable population is essential to eradicate poverty and improve the quality of life to achieve the goal of `zero poverty'. Thus, the study also focuses on the districts with a higher share of the marginal section of the population compared to the national average to trace the performance of development indicators and their association with poverty in these regions.
Submitted 27 April, 2023;
originally announced April 2023.
-
Machine Learning and the Future of Bayesian Computation
Authors:
Steven Winter,
Trevor Campbell,
Lizhen Lin,
Sanvesh Srivastava,
David B. Dunson
Abstract:
Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty through the posterior distribution. Practical posterior computation is commonly performed via MCMC, which can be computationally infeasible for high dimensional models with many observations. In this article we discuss the potential to improve posterior computation using ideas from machine learning. Concrete future directions are explored in vignettes on normalizing flows, Bayesian coresets, distributed Bayesian inference, and variational inference.
Submitted 21 April, 2023;
originally announced April 2023.
-
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages
Authors:
Sudhanshu Srivastava,
Ishika Gupta,
Anusha Prakash,
Jom Kuriakose,
Hema A. Murthy
Abstract:
Hidden-Markov-model (HMM) based text-to-speech (HTS) offers flexibility in speaking styles along with fast training and synthesis while being computationally less intense. HTS performs well even in low-resource scenarios. The primary drawback is that the voice quality is poor compared to that of E2E systems. A hybrid approach combining HMM-based feature generation and neural-network-based HiFi-GAN vocoder to improve HTS synthesis quality is proposed. HTS is trained on high-resolution mel-spectrograms instead of conventional mel generalized coefficients (MGC), and the output mel-spectrogram corresponding to the input text is used in a HiFi-GAN vocoder trained on Indic languages, to produce naturalness that is equivalent to that of E2E systems, as evidenced from the DMOS and PC tests.
Submitted 13 February, 2023;
originally announced February 2023.
-
Identifying and Manipulating the Personality Traits of Language Models
Authors:
Graham Caron,
Shashank Srivastava
Abstract:
Psychology research has long explored aspects of human personality such as extroversion, agreeableness and emotional stability. Categorizations like the `Big Five' personality traits are commonly used to assess and diagnose personality types. In this work, we explore the question of whether the perceived personality in language models is exhibited consistently in their language generation. For example, is a language model such as GPT2 likely to respond in a consistent way if asked to go out to a party? We also investigate whether such personality traits can be controlled. We show that when provided different types of contexts (such as personality descriptions, or answers to diagnostic questions about personality traits), language models such as BERT and GPT2 can consistently identify and reflect personality markers in those contexts. This behavior illustrates an ability to be manipulated in a highly predictable way, and frames them as tools for identifying personality traits and controlling personas in applications such as dialog systems. We also contribute a crowd-sourced data-set of personality descriptions of human subjects paired with their `Big Five' personality assessment data, and a data-set of personality descriptions collated from Reddit.
Submitted 20 December, 2022;
originally announced December 2022.
-
Predicting Citi Bike Demand Evolution Using Dynamic Graphs
Authors:
Alexander Saff,
Mayur Bhandary,
Siddharth Srivastava
Abstract:
Bike sharing systems often suffer from poor capacity management as a result of variable demand. These bike sharing systems would benefit from models to predict demand in order to moderate the number of bikes stored at each station. In this paper, we attempt to apply a graph neural network model to predict bike demand in the New York City, Citi Bike dataset.
Submitted 18 December, 2022;
originally announced December 2022.
-
LaSQuE: Improved Zero-Shot Classification from Explanations Through Quantifier Modeling and Curriculum Learning
Authors:
Sayan Ghosh,
Rakesh R Menon,
Shashank Srivastava
Abstract:
A hallmark of human intelligence is the ability to learn new concepts purely from language. Several recent approaches have explored training machine learning models via natural language supervision. However, these approaches fall short in leveraging linguistic quantifiers (such as 'always' or 'rarely') and mimicking humans in compositionally learning complex tasks. Here, we present LaSQuE, a method that can learn zero-shot classifiers from language explanations by using three new strategies - (1) modeling the semantics of linguistic quantifiers in explanations (including exploiting ordinal strength relationships, such as 'always' > 'likely'), (2) aggregating information from multiple explanations using an attention-based mechanism, and (3) model training via curriculum learning. With these strategies, LaSQuE outperforms prior work, showing an absolute gain of up to 7% in generalizing to unseen real-world classification tasks.
Submitted 18 December, 2022;
originally announced December 2022.
-
Hierarchical Decomposition and Analysis for Generalized Planning
Authors:
Siddharth Srivastava
Abstract:
This paper presents new methods for analyzing and evaluating generalized plans that can solve broad classes of related planning problems. Although synthesis and learning of generalized plans has been a longstanding goal in AI, it remains challenging due to fundamental gaps in methods for analyzing the scope and utility of a given generalized plan. This paper addresses these gaps by developing a new conceptual framework along with proof techniques and algorithmic processes for assessing termination and goal-reachability related properties of generalized plans. We build upon classic results from graph theory to decompose generalized plans into smaller components that are then used to derive hierarchical termination arguments. These methods can be used to determine the utility of a given generalized plan, as well as to guide the synthesis and learning processes for generalized plans. We present theoretical as well as empirical results illustrating the scope of this new approach. Our analysis shows that this approach significantly extends the class of generalized plans that can be assessed automatically, thereby reducing barriers in the synthesis and learning of reliable generalized plans.
Submitted 26 June, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback
Authors:
Josh Abramson,
Arun Ahuja,
Federico Carnevale,
Petko Georgiev,
Alex Goldin,
Alden Hung,
Jessica Landon,
Jirka Lhotka,
Timothy Lillicrap,
Alistair Muldal,
George Powell,
Adam Santoro,
Guy Scully,
Sanjana Srivastava,
Tamara von Glehn,
Greg Wayne,
Nathaniel Wong,
Chen Yan,
Rui Zhu
Abstract:
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback. Here we demonstrate how to use reinforcement learning from human feedback (RLHF) to improve upon simulated, embodied agents trained to a base level of competency with imitation learning. First, we collected data of humans interacting with agents in a simulated 3D world. We then asked annotators to record moments where they believed that agents either progressed toward or regressed from their human-instructed goal. Using this annotation data we leveraged a novel method - which we call "Inter-temporal Bradley-Terry" (IBT) modelling - to build a reward model that captures human judgments. Agents trained to optimise rewards delivered from IBT reward models improved with respect to all of our metrics, including subsequent human judgment during live interactions with agents. Altogether our results demonstrate how one can successfully leverage human judgments to improve agent behaviour, allowing us to use reinforcement learning in complex, embodied domains without programmatic reward functions. Videos of agent behaviour may be found at https://youtu.be/v_Z9F2_eKk4.
Submitted 21 November, 2022;
originally announced November 2022.
-
Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages
Authors:
Anusha Prakash,
Arun Kumar,
Ashish Seth,
Bhagyashree Mukherjee,
Ishika Gupta,
Jom Kuriakose,
Jordan Fernandes,
K V Vikram,
Mano Ranjith Kumar M,
Metilda Sagaya Mary,
Mohammad Wajahat,
Mohana N,
Mudit Batra,
Navina K,
Nihal John George,
Nithya Ravi,
Pruthwik Mishra,
Sudhanshu Srivastava,
Vasista Sai Lodagala,
Vandan Mujadia,
Kada Sai Venkata Vineeth,
Vrunda Sukhadia,
Dipti Sharma,
Hema Murthy,
Pushpak Bhattacharya
, et al. (2 additional authors not shown)
Abstract:
Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, text-to-speech synthesis followed by isochronous lipsyncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean-opinion-score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation with scores of 4.09 and 3.74, respectively. The human effort also reduces by 75%.
Submitted 1 November, 2022;
originally announced November 2022.
-
LMPriors: Pre-Trained Language Models as Task-Specific Priors
Authors:
Kristy Choi,
Chris Cundy,
Sanjari Srivastava,
Stefano Ermon
Abstract:
Particularly in low-data regimes, an outstanding challenge in machine learning is developing principled techniques for augmenting our models with suitable priors. This is to encourage them to learn in ways that are compatible with our understanding of the world. But in contrast to generic priors such as shrinkage or sparsity, we draw inspiration from the recent successes of large-scale language models (LMs) to construct task-specific priors distilled from the rich knowledge of LMs. Our method, Language Model Priors (LMPriors), incorporates auxiliary natural language metadata about the task -- such as variable names and descriptions -- to encourage downstream model outputs to be consistent with the LM's common-sense reasoning based on the metadata. Empirically, we demonstrate that LMPriors improve model performance in settings where such natural language descriptions are available, and perform well on several tasks that benefit from such prior knowledge, such as feature selection, causal inference, and safe reinforcement learning.
Submitted 22 October, 2022;
originally announced October 2022.
-
What do Large Language Models Learn beyond Language?
Authors:
Avinash Madasu,
Shashank Srivastava
Abstract:
Large language models (LMs) have rapidly become a mainstay in Natural Language Processing. These models are known to acquire rich linguistic knowledge from training on large amounts of text. In this paper, we investigate if pre-training on text also confers these models with helpful `inductive biases' for non-linguistic reasoning. On a set of 19 diverse non-linguistic tasks involving quantitative computations, recognizing regular expressions and reasoning over strings, we find that pretrained models significantly outperform comparable non-pretrained neural models. This remains true also in experiments with training non-pretrained models with fewer parameters to account for model regularization effects. We further explore the effect of text domain on LMs by pretraining models from text from different domains and provenances. Our experiments surprisingly reveal that the positive effects of pre-training persist even when pretraining on multi-lingual text or computer code, and even for text generated from synthetic languages. Our findings suggest a hitherto unexplored deep connection between pre-training and inductive learning abilities of language models.
Submitted 21 October, 2022;
originally announced October 2022.
-
Learning Dynamic Abstract Representations for Sample-Efficient Reinforcement Learning
Authors:
Mehdi Dadvar,
Rashmeet Kaur Nayyar,
Siddharth Srivastava
Abstract:
In many real-world problems, the learning agent needs to learn a problem's abstractions and solution simultaneously. However, most such abstractions need to be designed and refined by hand for different problems and domains of application. This paper presents a novel top-down approach for constructing state abstractions while carrying out reinforcement learning. Starting with state variables and a simulator, it presents a novel domain-independent approach for dynamically computing an abstraction based on the dispersion of Q-values in abstract states as the agent continues acting and learning. Extensive empirical evaluation on multiple domains and problems shows that this approach automatically learns abstractions that are finely-tuned to the problem, yield powerful sample efficiency, and result in the RL agent significantly outperforming existing approaches.
Submitted 8 December, 2022; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Multi-Task Option Learning and Discovery for Stochastic Path Planning
Authors:
Naman Shah,
Siddharth Srivastava
Abstract:
This paper addresses the problem of reliably and efficiently solving broad classes of long-horizon stochastic path planning problems. Starting with a vanilla RL formulation with a stochastic dynamics simulator and an occupancy matrix of the environment, our approach computes useful options with policies as well as high-level paths that compose the discovered options. Our main contributions are (1) data-driven methods for creating abstract states that serve as endpoints for helpful options, (2) methods for computing option policies using auto-generated option guides in the form of dense pseudo-reward functions, and (3) an overarching algorithm for composing the computed options. We show that this approach yields strong guarantees of executability and solvability: under fairly general conditions, the computed option guides lead to composable option policies and consequently ensure downward refinability. Empirical evaluation on a range of robots, environments, and tasks shows that this approach effectively transfers knowledge across related tasks and that it outperforms existing approaches by a significant margin.
Submitted 8 December, 2022; v1 submitted 30 September, 2022;
originally announced October 2022.
-
Generative Visual Prompt: Unifying Distributional Control of Pre-Trained Generative Models
Authors:
Chen Henry Wu,
Saman Motamed,
Shaunak Srivastava,
Fernando De la Torre
Abstract:
Generative models (e.g., GANs, diffusion models) learn the underlying data distribution in an unsupervised manner. However, many applications of interest require sampling from a particular region of the output space or sampling evenly over a range of characteristics. For efficient sampling in these scenarios, we propose Generative Visual Prompt (PromptGen), a framework for distributional control over pre-trained generative models by incorporating knowledge of other off-the-shelf models. PromptGen defines control as energy-based models (EBMs) and samples images in a feed-forward manner by approximating the EBM with invertible neural networks, avoiding optimization at inference. Our experiments demonstrate how PromptGen can efficiently sample from several unconditional generative models (e.g., StyleGAN2, StyleNeRF, diffusion autoencoder, NVAE) in a controlled or/and de-biased manner using various off-the-shelf models: (1) with the CLIP model as control, PromptGen can sample images guided by text, (2) with image classifiers as control, PromptGen can de-bias generative models across a set of attributes or attribute combinations, and (3) with inverse graphics models as control, PromptGen can sample images of the same identity in different poses. (4) Finally, PromptGen reveals that the CLIP model shows a "reporting bias" when used as control, and PromptGen can further de-bias this controlled distribution in an iterative manner. The code is available at https://github.com/ChenWu98/Generative-Visual-Prompt.
Submitted 17 October, 2022; v1 submitted 14 September, 2022;
originally announced September 2022.