
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation

Authors: Gaurav Sahu, Abhay Puri, Juan Rodriguez, Alexandre Drouin, Perouz Taslakian, Valentina Zantedeschi, Alexandre Lacoste, David Vazquez, Nicolas Chapados, Christopher Pal, Sai Rajeswar Mudumba, Issam Hadj Laradji

Abstract: Data analytics is essential for extracting valuable insights from data that can assist organizations in making effective decisions. We introduce InsightBench, a benchmark dataset with three key features. First, it consists of 31 datasets representing diverse business use cases such as finance and incident management, each accompanied by a carefully curated set of insights planted in the datasets. Second, unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics, including formulating questions, interpreting answers, and generating a summary of insights and actionable steps. Third, we conducted comprehensive quality assurance to ensure that each dataset in the benchmark had clear goals and included relevant and meaningful questions and analysis. Furthermore, we implement a two-way evaluation mechanism using LLaMA-3-Eval as an effective, open-source evaluator method to assess agents' ability to extract insights. We also propose AgentPoirot, our baseline data analysis agent capable of performing end-to-end data analytics. Our evaluation on InsightBench shows that AgentPoirot outperforms existing approaches (such as Pandas Agent) that focus on resolving single queries. We also compare the performance of open- and closed-source LLMs and various evaluation strategies. Overall, this benchmark serves as a testbed to motivate further development in comprehensive data analytics and can be accessed here: https://github.com/ServiceNow/insight-bench.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03471 [pdf, other]

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

Authors: Benno Krojer, Dheeraj Vattikonda, Luis Lara, Varun Jampani, Eva Portelance, Christopher Pal, Siva Reddy

Abstract: An image editing model should be able to perform diverse edits, ranging from object replacement, changing attributes or style, to performing actions or movement, which require many forms of reasoning. Current general instruction-guided editing models have significant shortcomings with action and reasoning-centric edits. Object, attribute or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action and reasoning-centric edits is scarce and has to come from entirely different sources that cover e.g. physical dynamics, temporality and spatial reasoning. To this end, we meticulously curate the AURORA Dataset (Action-Reasoning-Object-Attribute), a collection of high-quality training data, human-annotated and curated from videos and simulation engines. We focus on a key aspect of quality training data: triplets (source image, prompt, target image) contain a single meaningful visual change described by the prompt, i.e., truly minimal changes between source and target images. To demonstrate the value of our dataset, we evaluate an AURORA-finetuned model on a new expert-curated benchmark (AURORA-Bench) covering 8 diverse editing tasks. Our model significantly outperforms previous editing models as judged by human raters. For automatic evaluations, we find important flaws in previous metrics and caution against their use for semantically hard editing tasks. Instead, we propose a new automatic metric that focuses on discriminative understanding. We hope that our efforts, (1) curating a quality training dataset and an evaluation benchmark, (2) developing critical evaluations, and (3) releasing a state-of-the-art model, will fuel further progress on general image editing.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Submitted to NeurIPS (Dataset & Benchmarks)

arXiv:2407.02362 [pdf, other]

Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA

Authors: Xuqi Zhu, Huaizhi Zhang, JunKyu Lee, Jiacheng Zhu, Chandrajit Pal, Sangeet Saha, Klaus D. McDonald-Maier, Xiaojun Zhai

Abstract: Modern Neural Network (NN) architectures heavily rely on vast numbers of multiply-accumulate arithmetic operations, constituting the predominant computational cost. Therefore, this paper proposes a high-throughput, scalable and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of the NNs. We first streamline inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication, to design a fast, efficient, scalable approximate matrix multiplication module termed "Approximate Multiplication Unit (AMU)". The AMU optimizes LUT-based matrix multiplications further through dedicated memory management and access design, decoupling computational overhead from input resolution and boosting FPGA-based NN accelerator efficiency significantly. The experimental results show that using our AMU achieves up to 9x higher throughput and 112x higher energy efficiency over the state-of-the-art solutions for the FPGA-based Quantised Neural Network (QNN) accelerators.

Submitted 7 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.11811 [pdf, other]

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Authors: Joao Monteiro, Pierre-Andre Noel, Etienne Marcotte, Sai Rajeswar, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian

Abstract: Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includes encyclopedic documents that harbor a vast amount of general knowledge (e.g., Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.05630 [pdf, other]

Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion

Authors: Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal

Abstract: With recent advances in video prediction, controllable video generation has been attracting more attention. Generating high-fidelity videos according to simple and flexible conditioning is of particular interest. To this end, we propose a controllable video generation model using pixel-level renderings of 2D or 3D bounding boxes as conditioning. We also create a bounding box predictor that, given the initial and ending frames' bounding boxes, can predict up to 15 bounding boxes per frame for all the frames in a 25-frame clip. We perform experiments across 3 well-known AV video datasets: KITTI, Virtual-KITTI 2 and BDD100k.

Submitted 21 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.04940 [pdf, other]

CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

Authors: Matthew Fortier, Mats L. Richter, Oliver Sonnentag, Chris Pal

Abstract: Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO$_2$ emissions. The importance of predicting carbon fluxes has led to the emerging field of data-driven carbon flux modelling (DDCFM), which uses statistical techniques to predict carbon fluxes from biophysical data. However, the field lacks a standardized dataset to promote comparisons between models. To address this gap, we present CarbonSense, the first machine learning-ready dataset for DDCFM. CarbonSense integrates measured carbon fluxes, meteorological predictors, and satellite imagery from 385 locations across the globe, offering comprehensive coverage and facilitating robust model training. Additionally, we provide a baseline model using a current state-of-the-art DDCFM approach and a novel transformer-based model. Our experiments illustrate the potential gains that multimodal deep learning techniques can bring to this domain. By providing these resources, we aim to lower the barrier to entry for other deep learning researchers to develop new models and drive new advances in carbon flux modelling.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 9 content pages, 11 reference pages, 9 appendix pages

arXiv:2405.13022 [pdf, other]

LLMs can learn self-restraint through iterative self-reflection

Authors: Alexandre Piché, Aristides Milios, Dzmitry Bahdanau, Chris Pal

Abstract: In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that can encourage the model to produce responses only when it is confident in them. This utility function can be used to score generations of different lengths as well as abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, our resulting models generate fewer hallucinations overall at no additional inference cost, for both known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.

Submitted 3 July, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.15420 [pdf, other]

XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference

Authors: João Monteiro, Étienne Marcotte, Pierre-André Noël, Valentina Zantedeschi, David Vázquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian

Abstract: In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters. When the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform ICL, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching by two orders of magnitude.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2403.19918 [pdf, other]

CtRL-Sim: Reactive and Controllable Driving Agents with Offline Reinforcement Learning

Authors: Luke Rowe, Roger Girgis, Anthony Gosselin, Bruno Carrez, Florian Golemo, Felix Heide, Liam Paull, Christopher Pal

Abstract: Evaluating autonomous vehicle stacks (AVs) in simulation typically involves replaying driving logs from real-world recorded traffic. However, agents replayed from offline data are not reactive and are hard to control intuitively. Existing approaches address these challenges by proposing methods that rely on heuristics or generative models of real-world data, but these approaches either lack realism or necessitate costly iterative sampling procedures to control the generated behaviours. In this work, we take an alternative approach and propose CtRL-Sim, a method that leverages return-conditioned offline reinforcement learning to efficiently generate reactive and controllable traffic agents. Specifically, we process real-world driving data through a physics-enhanced Nocturne simulator to generate a diverse offline reinforcement learning dataset, annotated with various reward terms. With this dataset, we train a return-conditioned multi-agent behaviour model that allows for fine-grained manipulation of agent behaviours by modifying the desired returns for the various reward components. This capability enables the generation of a wide range of driving behaviours beyond the scope of the initial dataset, including adversarial behaviours. We demonstrate that CtRL-Sim can generate diverse and realistic safety-critical scenarios while providing fine-grained control over agent behaviours.

Submitted 14 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: 21 pages, 9 figures, 8 tables

arXiv:2403.14443 [pdf, other]

Language Models Can Reduce Asymmetry in Information Markets

Authors: Nasim Rahaman, Martin Weiss, Manuel Wüthrich, Yoshua Bengio, Li Erran Li, Chris Pal, Bernhard Schölkopf

Abstract: This work addresses the buyer's inspection paradox for information markets. The paradox is that buyers need to access information to determine its value, while sellers need to limit access to prevent theft. To study this, we introduce an open-source simulated digital marketplace where intelligent agents, powered by language models, buy and sell information on behalf of external participants. The central mechanism enabling this marketplace is the agents' dual capabilities: they not only have the capacity to assess the quality of privileged information but also come equipped with the ability to forget. This ability to induce amnesia allows vendors to grant temporary access to proprietary information, significantly reducing the risk of unauthorized retention while enabling agents to accurately gauge the information's relevance to specific queries or tasks. To perform well, agents must make rational decisions, strategically explore the marketplace through generated sub-queries, and synthesize answers from purchased information. Concretely, our experiments (a) uncover biases in language models leading to irrational behavior and evaluate techniques to mitigate these biases, (b) investigate how price affects demand in the context of informational goods, and (c) show that inspection and higher budgets both lead to higher quality outcomes.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.05236 [pdf, other]

Modeling Fault Recovery and Transient Stability of Grid-Forming Converters Equipped With Current Reference Limitation

Authors: Ali Arjomandi-Nezhad, Yifei Guo, Bikash C. Pal, Guangya Yang

Abstract: When grid-forming (GFM) inverter-based resources (IBRs) face severe grid disturbances (e.g., short-circuit faults), the current limitation mechanism may be triggered. Consequently, the GFM IBRs enter the current-saturation mode, inducing nonlinear dynamical behaviors and posing great challenges to the post-disturbance transient angle stability. This paper presents a systematic study to reveal the fault recovery behaviors of a GFM IBR and identify the risk of instability. A closed-form expression for the necessary condition that a GFM IBR returns from the current-saturation mode to the normal operation mode is presented. Based on these analyses, it has been inferred that the angle of the magnitude-saturated current significantly affects the post-fault recovery and transient stability; with different angle selections, the system may follow one of multiple post-fault trajectories: 1) convergence to a normal stable equilibrium point (SEP), 2) convergence to a saturated stable equilibrium point (satSEP), or 3) divergence (instability). In this paper, the circumstances under which a GFM IBR cannot escape from the current-saturation mode are thoroughly investigated. The theoretical analyses are verified by dynamic simulations.

Submitted 9 July, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 13 pages, 22 figures

arXiv:2402.06143 [pdf, other]

Reinforcement Learning for Blind Stair Climbing with Legged and Wheeled-Legged Robots

Authors: Simon Chamorro, Victor Klemm, Miguel de la Iglesia Valls, Christopher Pal, Roland Siegwart

Abstract: In recent years, legged and wheeled-legged robots have gained prominence for tasks in environments predominantly created for humans across various domains. One significant challenge faced by many of these robots is their limited capability to navigate stairs, which hampers their functionality in multi-story environments. This study proposes a method aimed at addressing this limitation, employing reinforcement learning to develop a versatile controller applicable to a wide range of robots. In contrast to the conventional velocity-based controllers, our approach builds upon a position-based formulation of the RL task, which we show to be vital for stair climbing. Furthermore, the methodology leverages an asymmetric actor-critic structure, enabling the utilization of privileged information from simulated environments during training while eliminating the reliance on exteroceptive sensors during real-world deployment. Another key feature of the proposed approach is the incorporation of a boolean observation within the controller, enabling the activation or deactivation of a stair-climbing mode. We present our results on different quadrupeds and bipedal robots in simulation and showcase how our method allows the balancing robot Ascento to climb 15 cm stairs in the real world, a task that was previously impossible for this robot.

Submitted 8 February, 2024; originally announced February 2024.

Comments: Video: https://youtu.be/Ec6ar8BVJh4

arXiv:2402.01788 [pdf, other]

LitLLM: A Toolkit for Scientific Literature Review

Authors: Shubham Agarwal, Issam H. Laradji, Laurent Charlin, Christopher Pal

Abstract: Conducting literature reviews for scientific papers is essential for understanding research, its limitations, and building on existing work. It is a tedious task which makes an automatic literature review generator appealing. Unfortunately, many existing works that generate such reviews using Large Language Models (LLMs) have significant limitations. They tend to hallucinate (generate non-actual information) and ignore the latest research they have not been trained on. To address these limitations, we propose a toolkit that operates on Retrieval Augmented Generation (RAG) principles, specialized prompting and instructing techniques with the help of LLMs. Our system first initiates a web search to retrieve relevant papers by summarizing user-provided abstracts into keywords using an off-the-shelf LLM. Authors can enhance the search by supplementing it with relevant papers or keywords, contributing to a tailored retrieval process. Second, the system re-ranks the retrieved papers based on the user-provided abstract. Finally, the related work section is generated based on the re-ranked results and the abstract. There is a substantial reduction in time and effort for literature review compared to traditional methods, establishing our toolkit as an efficient alternative. Our open-source toolkit is accessible at https://github.com/shubhamagarwal92/LitLLM and as a Hugging Face Space (https://huggingface.co/spaces/shubhamagarwal92/LitLLM), with the video demo at https://youtu.be/E2ggOZBAFw0.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2312.13876 [pdf, other]

Capture the Flag: Uncovering Data Insights with Large Language Models

Authors: Issam Laradji, Perouz Taslakian, Sai Rajeswar, Valentina Zantedeschi, Alexandre Lacoste, Nicolas Chapados, David Vazquez, Christopher Pal, Alexandre Drouin

Abstract: The extraction of a small number of relevant insights from vast amounts of data is a crucial component of data-driven decision-making. However, accomplishing this task requires considerable technical skills, domain expertise, and human labor. This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data, leveraging recent advances in reasoning and code generation techniques. We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset. We further propose two proof-of-concept agents, with different inner workings, and compare their ability to capture such flags in a real-world sales dataset. While the work reported here is preliminary, our results are sufficiently interesting to mandate future exploration by the community.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 14 pages, 1 figure, Foundation Models for Decision Making Workshop at NeurIPS 2023

arXiv:2312.11556 [pdf, other]

StarVector: Generating Scalable Vector Graphics Code from Images

Authors: Juan A. Rodriguez, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, David Vazquez, Christopher Pal, Marco Pedersoli

Abstract: Scalable Vector Graphics (SVGs) have become integral in modern image rendering applications due to their infinite scalability in resolution, versatile usability, and editing capabilities. SVGs are particularly popular in the fields of web development and graphic design. Existing approaches for SVG modeling using deep learning often struggle with generating complex SVGs and are restricted to simpler ones that require extensive processing and simplification. This paper introduces StarVector, a multimodal SVG generation model that effectively integrates Code Generation Large Language Models (CodeLLMs) and vision models. Our approach utilizes a CLIP image encoder to extract visual representations from pixel-based images, which are then transformed into visual tokens via an adapter module. These visual tokens are prepended to the SVG token embeddings, and the sequence is modeled by the StarCoder model using next-token prediction, effectively learning to align the visual and code tokens. This enables StarVector to generate unrestricted SVGs that accurately represent pixel images. To evaluate StarVector's performance, we present SVG-Bench, a comprehensive benchmark for evaluating SVG methods across multiple datasets and relevant metrics. Within this benchmark, we introduce novel datasets including SVG-Stack, a large-scale dataset of real-world SVG examples, and use it to pre-train StarVector as a large foundation model for SVGs. Our results demonstrate significant enhancements in visual quality and complexity handling over current methods, marking a notable advancement in SVG generation technology. Code and models: https://github.com/joanrod/star-vector

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2310.13817 [pdf, other]

Deep Learning Based Forecasting-Aided State Estimation in Active Distribution Networks

Authors: Malek Alduhaymi, Ravindra Singh, Firdous Ul Nazir, Bikash C. Pal

Abstract: Operating an active distribution network (ADN) in the absence of enough measurements, the presence of distributed energy resources, and poor knowledge of responsive demand behaviour is a huge challenge. This paper introduces systematic modelling of demand response behaviour which is then included in Forecasting Aided State Estimation (FASE) for better control of the network. There are several innovative elements in tuning parameters of FASE-based, demand profiling, and aggregation. The comprehensive case studies for three UK representative demand scenarios in 2023, 2035, and 2050 demonstrated the effectiveness of the proposed approach.

Submitted 20 October, 2023; originally announced October 2023.

arXiv:2310.11382 [pdf]

Condensate droplet roaming on nanostructured superhydrophobic surfaces

Authors: Cheuk Wing Edmond Lam, Kartik Regulagadda, Matteo Donati, Abinash Tripathy, Gopal Chandra Pal, Chander Shekhar Sharma, Athanasios Milionis, Dimos Poulikakos

Abstract: Jumping of coalescing condensate droplets from superhydrophobic surfaces is an interesting phenomenon which yields marked heat transfer enhancement over the more explored gravity-driven droplet removal mode in surface condensation, a phase change process of central interest to applications ranging from energy to water harvesting. However, when condensate microdroplets coalesce, they can also spontaneously propel themselves omnidirectionally on the surface independent of gravity and grow by feeding from droplets they sweep along the way. Here we observe and explain the physics behind this phenomenon of roaming of coalescing condensate microdroplets on solely nanostructured superhydrophobic surfaces, where the microdroplets are orders of magnitude larger than the underlying surface nanotexture. We quantify and show that it is the inherent asymmetries in droplet adhesion during condensation, arising from the stochastic nature of nucleation within the nanostructures, that generates the tangential momentum driving the roaming motion. Subsequent dewetting during this conversion initiates a vivid roaming and successive coalescence process, preventing condensate flooding of the surface, and enhancing surface renewal. Finally, we show that the more efficient conversion process of roaming from excess surface energy to kinetic energy results in significantly improved heat transfer efficiency over condensate droplet jumping, the mechanism currently understood as maximum.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2309.11592 [pdf, other]

Parallel-mentoring for Offline Model-based Optimization

Authors: Can Chen, Christopher Beckham, Zixuan Liu, Xue Liu, Christopher Pal

Abstract: We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots and DNA sequences. A common approach trains a proxy on the static dataset to approximate the black-box objective function and performs gradient ascent to obtain new designs. However, this often results in poor designs due to the proxy inaccuracies for out-of-distribution designs. Recent studies indicate that: (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose parallel-mentoring as an effective and novel method that facilitates mentoring among parallel proxies, creating a more robust ensemble to mitigate the out-of-distribution issue. We focus on the three-proxy case and our method consists of two modules. The first module, voting-based pairwise supervision, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporate ranking supervision signals from all proxies and enable mutual mentoring. However, label noise arises due to possible incorrect consensus. To alleviate this, we introduce an adaptive soft-labeling module with soft-labels initialized as consensus labels. Based on bi-level optimization, this module fine-tunes proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.

Submitted 10 October, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2308.01020 [pdf, other]

A Model Predictive Approach for Enhancing Transient Stability of Grid-Forming Converters

Authors: Ali Arjomandi-Nezhad, Yifei Guo, Bikash C. Pal, Damiano Varagnolo

Abstract: A model predictive control (MPC) method for enhancing post-fault transient stability of grid-forming (GFM) inverter-based resources (IBRs) is developed in this paper. This proposed controller is activated as soon as the converter enters into the post-fault current-saturation mode. It aims at mitigating the instability arising from insufficient deceleration due to current saturation and thus improving the transient stability of a GFM-IBR. The MPC approach optimises the post-fault trajectory of GFM IBRs by introducing appropriate corrective phase angle jumps and active power references where the post-fault dynamics of GFM IBRs are addressed. These two signals provide controllability over GFM IBR's post-fault trajectory. This paper addresses the mitigation of oscillations between current-saturation mode and normal mode by forced saturation if conditions for remaining in the normal mode do not hold. The performance of the proposal is tested via dynamic simulations under various grid conditions and compared with other existing strategies. The results demonstrate significant improvement in transient stability.

Submitted 8 November, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

Comments: 14 pages, 19 figures

arXiv:2306.09539 [pdf, other]

Block-State Transformers

Authors: Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin

Abstract: State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks, in vision and audio; however, SSMs still lag Transformer performance in Language Modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST), that internally combines an SSM sublayer for long-range contextualization, and a Block Transformer sublayer for short-term representation of sequences. We study three different, and completely parallelizable, variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates a more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.

Submitted 30 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: NeurIPS'23 - Thirty-seventh Conference on Neural Information Processing Systems

arXiv:2306.04620 [pdf, other]

Goal-conditioned GFlowNets for Controllable Multi-Objective Molecular Design

Authors: Julien Roy, Pierre-Luc Bacon, Christopher Pal, Emmanuel Bengio

Abstract: In recent years, in-silico molecular design has received much attention from the machine learning community. When designing a new compound for pharmaceutical applications, there are usually multiple properties of such molecules that need to be optimised: binding energy to the target, synthesizability, toxicity, EC50, and so on. While previous approaches have employed a scalarization scheme to turn the multi-objective problem into a preference-conditioned single objective, it has been established that this kind of reduction may produce solutions that tend to slide towards the extreme points of the objective space when presented with a problem that exhibits a concave Pareto front. In this work we experiment with an alternative formulation of goal-conditioned molecular generation to obtain a more controllable conditional model that can uniformly explore solutions along the entire Pareto front.

Submitted 29 June, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: 14 pages

arXiv:2306.01729 [pdf, other]

Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans

Authors: Stefania Raimondo, Christopher Pal, Xiaotian Liu, David Vazquez, Hector Palacios

Abstract: Task-oriented dialogue is difficult in part because it involves understanding user intent, collecting information from the user, executing API calls, and generating helpful and fluent responses. However, for complex tasks one must also correctly do all of these things over multiple steps, and in a specific order. While large pre-trained language models can be fine-tuned end-to-end to create multi-step task-oriented dialogue agents that generate fluent text, our experiments confirm that this approach alone cannot reliably perform new multi-step tasks that are unseen during training. To address these limitations, we augment the dialogue contexts given to text2text transformers with known valid workflow names and action plans. Action plans consist of sequences of actions required to accomplish a task, and are encoded as simple sequences of keywords (e.g. verify-identity, pull-up-account, reset-password, etc.). We perform extensive experiments on the Action-Based Conversations Dataset (ABCD) with T5-small, base and large models, and show that such models: a) are able to more readily generalize to unseen workflows by following the provided plan, and b) are able to generalize to executing unseen actions if they are provided in the plan. In contrast, models are unable to fully accomplish new multi-step tasks when they are not provided action plan information, even when given new valid workflow names.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2306.00637 [pdf, other]

Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models

Authors: Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville

Abstract: We introduce Würstchen, a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness for large-scale text-to-image diffusion models. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation used to guide the diffusion process. This highly compressed representation of an image provides much more detailed guidance compared to latent representations of language and this significantly reduces the computational requirements to achieve state-of-the-art results. Our approach also improves the quality of text-conditioned image generation based on our user preference study. The training requirements of our approach consist of 24,602 A100-GPU hours, compared to Stable Diffusion 2.1's 200,000 GPU hours. Our approach also requires less training data to achieve these results. Furthermore, our compact latent representations allow us to perform inference over twice as fast, slashing the usual costs and carbon footprint of a state-of-the-art (SOTA) diffusion model significantly, without compromising the end performance. In a broader comparison against SOTA models our approach is substantially more efficient and compares favorably in terms of image quality. We believe that this work motivates more emphasis on the prioritization of both performance and computational accessibility.

Submitted 29 September, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Corresponding to "Würstchen v2"

Journal ref: The Twelfth International Conference on Learning Representations (ICLR), 2024

arXiv:2305.16397 [pdf, other]

Are Diffusion Models Vision-And-Language Reasoners?

Authors: Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy

Abstract: Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these diffusion-based generative models to automatic fine-grained quantitative evaluation of high-level phenomena such as compositionality. Towards this goal, we perform two innovations. First, we transform diffusion-based models (in our case, Stable Diffusion) for any image-text matching (ITM) task using a novel method called DiffusionITM. Second, we introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation and detailed analysis. We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground. We further boost its compositional performance with a transfer setup by fine-tuning on MS-COCO while retaining generative capabilities. We also measure the stereotypical bias in diffusion models, and find that Stable Diffusion 2.1 is, for the most part, less biased than Stable Diffusion 1.5. Overall, our results point in an exciting direction bringing discriminative and generative model evaluation closer. We will release code and benchmark setup soon.

Submitted 2 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Accepted to NeurIPS 2023

arXiv:2305.00970 [pdf, other]

ArK: Augmented Reality with Knowledge Interactive Emergent Ability

Authors: Qiuyuan Huang, Jae Sung Park, Abhinav Gupta, Paul Bennett, Ran Gong, Subhojit Som, Baolin Peng, Owais Khan Mohammed, Chris Pal, Yejin Choi, Jianfeng Gao

Abstract: Despite the growing adoption of mixed reality and interactive AI agents, it remains challenging for these systems to generate high quality 2D/3D scenes in unseen environments. The common practice requires deploying an AI agent to collect large amounts of data for model training for every new task. This process is costly, or even impossible, for many domains. In this study, we develop an infinite agent that learns to transfer knowledge memory from general foundation models (e.g. GPT4, DALLE) to novel domains or scenarios for scene understanding and generation in the physical or virtual world. The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK), which leverages knowledge-memory to generate scenes in unseen physical world and virtual reality environments. The knowledge interactive emergent ability (Figure 1) is demonstrated as the observation learns i) micro-action of cross-modality: in multi-modality models to collect a large amount of relevant knowledge memory data for each interaction task (e.g., unseen scene understanding) from the physical reality; and ii) macro-behavior of reality-agnostic: in mix-reality environments to improve interactions that tailor to different characterized roles, target variables, collaborative information, and so on. We validate the effectiveness of ArK on the scene generation and editing tasks. We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes, compared to baselines, demonstrating the potential benefit of incorporating ArK in generative AI for applications such as metaverse and gaming simulation.

Submitted 1 May, 2023; originally announced May 2023.

Report number: EFI-94-11

arXiv:2304.13722 [pdf, other]

Controllable Image Generation via Collage Representations

Authors: Arantxa Casanova, Marlène Careil, Adriana Romero-Soriano, Christopher J. Pal, Jakob Verbeek, Michal Drozdzal

Abstract: Recent advances in conditional generative image models have enabled impressive results. On the one hand, text-based conditional models have achieved remarkable generation quality, by leveraging large-scale datasets of image-text pairs. To enable fine-grained controllability, however, text-based models require long prompts, whose details may be ignored by the model. On the other hand, layout-based conditional models have also witnessed significant advances. These models rely on bounding boxes or segmentation maps for precise spatial conditioning in combination with coarse semantic labels. The semantic labels, however, cannot be used to express detailed appearance characteristics. In this paper, we approach fine-grained scene controllability through image collages which allow a rich visual description of the desired scene as well as the appearance and location of the objects therein, without the need for class or attribute labels. We introduce "mixing and matching scenes" (M&Ms), an approach that consists of an adversarially trained generative image model which is conditioned on appearance features and spatial positions of the different elements in a collage, and integrates these into a coherent image. We train our model on the OpenImages (OI) dataset and evaluate it on collages derived from OI and MS-COCO datasets. Our experiments on the OI dataset show that M&Ms outperforms baselines in terms of fine-grained scene controllability while being very competitive in terms of image quality and sample diversity. On the MS-COCO dataset, we highlight the generalization ability of our model by outperforming DALL-E in terms of the zero-shot FID metric, despite using two orders of magnitude fewer parameters and data. Collage based generative models have the potential to advance content creation in an efficient and effective way as they are intuitive to use and yield high quality generations.

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.05330 [pdf]

Controlled coalescence-induced droplet jumping on flexible superhydrophobic substrates

Authors: Gopal Chandra Pal, Siddharth SS, Manish Agarwal, Chander Shekhar Sharma

Abstract: Sessile droplets coalescing on superhydrophobic substrates spontaneously jump from the surface. In this process, the excess surface energy available at the initiation of coalescence overcomes the minimal surface adhesion and manifests as sufficient kinetic energy to propel the droplets away from the substrate. Here, we show that the coalescence induced droplet jumping velocity is significantly curtailed if the superhydrophobic substrate is flexible in nature. Through detailed experimental measurements and numerical simulations, we demonstrate that the droplet jumping velocity and jumping height can be reduced by as much as 40% and 64%, respectively, by synergistically tuning the substrate stiffness and substrate frequency. We show that this hitherto unexplored aspect of droplet coalescence jumping can be gainfully exploited in water harvesting from dew and fog harvesting. Additionally, through an exemplar butterfly wing substrate, we demonstrate that this effect is likely to manifest on many natural superhydrophobic substrates due to their inherent flexibility.

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2304.03866 [pdf, other]

Conservative objective models are a special kind of contrastive divergence-based energy model

Authors: Christopher Beckham, Christopher Pal

Abstract: In this work we theoretically show that conservative objective models (COMs) for offline model-based optimisation (MBO) are a special kind of contrastive divergence-based energy model, one where the energy function represents both the unconditional probability of the input and the conditional probability of the reward variable. While the initial formulation only samples modes from its learned distribution, we propose a simple fix that replaces its gradient ascent sampler with a Langevin MCMC sampler. This gives rise to a special probabilistic model where the probability of sampling an input is proportional to its predicted reward. Lastly, we show that better samples can be obtained if the model is decoupled so that the unconditional and conditional probabilities are modelled separately.

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2302.07400 [pdf, other]

Score-based Diffusion Models in Function Space

Authors: Jae Hyun Lim, Nikola B. Kovachki, Ricardo Baptista, Christopher Beckham, Kamyar Azizzadenesheli, Jean Kossaifi, Vikram Voleti, Jiaming Song, Karsten Kreis, Jan Kautz, Christopher Pal, Arash Vahdat, Anima Anandkumar

Abstract: Diffusion models have recently emerged as a powerful framework for generative modeling. They consist of a forward process that perturbs input data with Gaussian white noise and a reverse process that learns a score function to generate samples by denoising. Despite their tremendous success, they are mostly formulated on finite-dimensional spaces, e.g. Euclidean, limiting their applications to many domains where the data has a functional form such as in scientific computing and 3D geometric data analysis. In this work, we introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. In DDOs, the forward process perturbs input functions gradually using a Gaussian process. The generative process is formulated by integrating a function-valued Langevin dynamic. Our approach requires an appropriate notion of the score for the perturbed data distribution, which we obtain by generalizing denoising score matching to function spaces that can be infinite-dimensional. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost that is independent of the data resolution. We theoretically and numerically verify the applicability of our approach on a set of problems, including generating solutions to the Navier-Stokes equation viewed as the push-forward distribution of forcings from a Gaussian Random Field (GRF).

Submitted 22 November, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: 52 pages

MSC Class: 46B09 (Primary); 60J22 (Secondary) ACM Class: I.2.6; J.2

arXiv:2302.05507 [pdf, other]

Language Decision Transformers with Exponential Tilt for Interactive Text Environments

Authors: Nicolas Gontier, Pau Rodriguez, Issam Laradji, David Vazquez, Christopher Pal

Abstract: Text-based game environments are challenging because agents must deal with long sequences of text, execute compositional actions using text and learn from sparse rewards. We address these challenges by proposing Language Decision Transformers (LDTs), a framework that is based on transformer language models and decision transformers (DTs). Our LDTs extend DTs with 3 components: (1) exponential tilt to guide the agent towards high obtainable goals, (2) novel goal conditioning methods yielding better results than the traditional return-to-go (sum of all future rewards), and (3) a model of future observations that improves agent performance. LDTs are the first to address offline RL with DTs on these challenging games. Our experiments show that LDTs achieve the highest scores among many different types of agents on some of the most challenging Jericho games, such as Enchanter.

Submitted 17 November, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: 19 pages, 6 figures, 5 tables

arXiv:2212.01639 [pdf, other]

Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests

Authors: Christopher Beckham, Martin Weiss, Florian Golemo, Sina Honari, Derek Nowrouzezahrai, Christopher Pal

Abstract: Different types of mental rotation tests have been used extensively in psychology to understand human visual reasoning and perception. Understanding what an object or visual scene would look like from another viewpoint is a challenging problem that is made even harder if it must be performed from a single image. We explore a controlled setting whereby questions are posed about the properties of a scene if that scene was observed from another viewpoint. To do this we have created a new version of the CLEVR dataset that we call CLEVR Mental Rotation Tests (CLEVR-MRT). Using CLEVR-MRT we examine standard methods, show how they fall short, then explore novel neural architectures that involve inferring volumetric representations of a scene. These volumes can be manipulated via camera-conditioned transformations to answer the question. We examine the efficacy of different model variants through rigorous ablations and demonstrate the efficacy of volumetric representations.

Submitted 3 December, 2022; originally announced December 2022.

Comments: Accepted for publication to Pattern Recognition journal

arXiv:2211.14487 [pdf, other]

Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance

Authors: Mats L. Richter, Christopher Pal

Abstract: Minimal changes to neural architectures (e.g. changing a single hyperparameter in a key layer) can lead to significant gains in predictive performance in Convolutional Neural Networks (CNNs). In this work, we present a new approach to receptive field analysis that can yield these types of theoretical and empirical performance gains across twenty well-known CNN architectures examined in our experiments. By further developing and formalizing the analysis of receptive field expansion in convolutional neural networks, we can predict unproductive layers in an automated manner before ever training a model. This allows us to optimize the parameter-efficiency of a given architecture at low cost. Our method is computationally simple and can be done in an automated manner or even manually with minimal effort for most common architectures. We demonstrate the effectiveness of this approach by increasing parameter efficiency across past and current top-performing CNN-architectures. Specifically, our approach is able to improve ImageNet1K performance across a wide range of well-known, state-of-the-art (SOTA) model classes, including: VGG Nets, MobileNetV1, MobileNetV3, NASNet A (mobile), MnasNet, EfficientNet, and ConvNeXt, leading to a new SOTA result for each model class.

Submitted 26 November, 2022; originally announced November 2022.

arXiv:2211.10747 [pdf, other]

Exploring validation metrics for offline model-based optimisation with diffusion models

Authors: Christopher Beckham, Alexandre Piche, David Vazquez, Christopher Pal

Abstract: In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle, which is expensive to compute since it involves executing a real world process. In offline MBO we wish to do so without assuming access to such an oracle during training or validation, which makes evaluation non-straightforward. While an approximation to the ground oracle can be trained and used in place of it during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples. Measuring the mean reward of generated candidates over this approximation is one such 'validation metric', whereas we are interested in a more fundamental question which is finding which validation metrics correlate the most with the ground truth. This involves proposing validation metrics and quantifying them over many datasets for which the ground truth is known, for instance simulated environments. This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation, which is the ultimate goal behind leveraging generative models for MBO. While our evaluation framework is model agnostic we specifically evaluate denoising diffusion models due to their state-of-the-art performance, as well as derive interesting insights such as ranking the most effective validation metrics as well as discussing important hyperparameters.

Submitted 13 January, 2024; v1 submitted 19 November, 2022; originally announced November 2022.

arXiv:2211.02348 [pdf, other]

A General Purpose Neural Architecture for Geospatial Systems

Authors: Nasim Rahaman, Martin Weiss, Frederik Träuble, Francesco Locatello, Alexandre Lacoste, Yoshua Bengio, Chris Pal, Li Erran Li, Bernhard Schölkopf

Abstract: Geospatial Information Systems are used by researchers and Humanitarian Assistance and Disaster Response (HADR) practitioners to support a wide variety of important applications. However, collaboration between these actors is difficult due to the heterogeneous nature of geospatial data modalities (e.g., multi-spectral images of various resolutions, timeseries, weather data) and diversity of tasks (e.g., regression of human activity indicators or detecting forest fires). In this work, we present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias, pre-trained on large amounts of unlabelled earth observation data in a self-supervised manner. We envision how such a model may facilitate cooperation between members of the community. We show preliminary results on the first step of the roadmap, where we instantiate an architecture that can process a wide variety of geospatial data modalities and demonstrate that it can achieve competitive performance with domain-specific architectures on tasks relating to the U.N.'s Sustainable Development Goals.

Submitted 4 November, 2022; originally announced November 2022.

Comments: Presented at AI + HADR Workshop at NeurIPS 2022

arXiv:2211.01233 [pdf, other]

Attention-based Neural Cellular Automata

Authors: Mattie Tesfaldet, Derek Nowrouzezahrai, Christopher Pal

Abstract: Recent extensions of Cellular Automata (CA) have incorporated key ideas from modern deep learning, dramatically extending their capabilities and catalyzing a new family of Neural Cellular Automata (NCA) techniques. Inspired by Transformer-based architectures, our work presents a new class of attention-based NCAs formed using a spatially localized, yet globally organized, self-attention scheme. We introduce an instance of this class named Vision Transformer Cellular Automata (ViTCA). We present quantitative and qualitative results on denoising autoencoding across six benchmark datasets, comparing ViTCA to a U-Net, a U-Net-based CA baseline (UNetCA), and a Vision Transformer (ViT). When comparing across architectures configured to similar parameter complexity, ViTCA architectures yield superior performance across all benchmarks and for nearly every evaluation metric. We present an ablation study on various architectural configurations of ViTCA, an analysis of its effect on cell states, and an investigation on its inductive biases. Finally, we examine its learned representations via linear probes on its converged cell state hidden representations, yielding, on average, superior results when compared to our U-Net, ViT, and UNetCA baselines.

Submitted 2 November, 2022; originally announced November 2022.

Comments: NeurIPS 2022

arXiv:2210.15251 [pdf, other]

Optimal control for production inventory system with various cost criterion

Authors: Subrata Golui, Chandan Pal, Manikandan R., Abhay Sobhanan

Abstract: In this article, we investigate a dynamic control problem of a production-inventory system. Here, demands arrive at the production unit according to a Poisson process and are processed in an FCFS manner. The processing time of each customer's demand is exponentially distributed. The production manufacturers produce the items on a make-to-order basis to meet customer demands. The production is run until the inventory level becomes sufficiently large. We assume that an item's production time follows an exponential distribution and the amount of time for the produced item to reach the retail shop is negligible. Also, we assume that no new customer joins the queue when there is a void inventory. This yields an explicit product-form solution for the steady-state probability vector of the system. The optimal policy that minimizes the discounted/average/pathwise average total cost per production is derived using a Markov decision process approach. We find the optimal policy using value/policy iteration algorithms. Numerical examples are discussed to verify the proposed algorithms.

Submitted 27 October, 2022; originally announced October 2022.

Comments: 5 figures

MSC Class: 93E20 (Primary) 49L20; 60J27 (Secondary)

arXiv:2210.12282

Bridging the Gap Between Target Networks and Functional Regularization

Authors: Alexandre Piche, Valentin Thomas, Joseph Marino, Rafael Pardinas, Gian Maria Marconi, Christopher Pal, Mohammad Emtiyaz Khan

Abstract: Bootstrapping is behind much of the successes of Deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer. This regularizer has disadvantages such as being inflexible and non convex. To overcome these issues, we propose an explicit Functional Regularization that is a convex regularizer in function space and can easily be tuned. We analyze the convergence of our method theoretically and empirically demonstrate that replacing Target Networks with the more theoretically grounded Functional Regularization approach leads to better sample efficiency and performance improvements.

Submitted 3 January, 2024; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: The published version of this paper (TMLR 2023) is available at arXiv:2106.02613 and https://openreview.net/forum?id=BFvoemrmqX

arXiv:2210.12272 [pdf, other]

Implicit Offline Reinforcement Learning via Supervised Learning

Authors: Alexandre Piche, Rafael Pardinas, David Vazquez, Igor Mordatch, Chris Pal

Abstract: Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels. It is as simple as supervised learning and Behavior Cloning (BC), but takes advantage of return information. On datasets collected by policies of similar expertise, implicit BC has been shown to match or outperform explicit BC. Despite the benefits of using implicit models to learn robotic skills via BC, offline RL via Supervised Learning algorithms have been limited to explicit models. We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets. Furthermore, we show the close relationship between our implicit methods and other popular RL via Supervised Learning algorithms to provide a unified framework. Finally, we demonstrate the effectiveness of our method on high-dimension manipulation and locomotion tasks.

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2210.12254 [pdf, other]

Score-based Denoising Diffusion with Non-Isotropic Gaussian Noise Models

Authors: Vikram Voleti, Christopher Pal, Adam Oberman

Abstract: Generative models based on denoising diffusion techniques have led to an unprecedented increase in the quality and diversity of imagery that is now possible to create with neural generative models. However, most contemporary state-of-the-art methods are derived from a standard isotropic Gaussian formulation. In this work we examine the situation where non-isotropic Gaussian distributions are used. We present the key mathematical derivations for creating denoising diffusion models using an underlying non-isotropic Gaussian noise model. We also provide initial experiments with the CIFAR-10 dataset to help verify empirically that this more general modeling approach can also yield high-quality samples.

Submitted 22 November, 2022; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: NeurIPS 2022 Workshop ; 4 pages, 1 page of references, 18 pages of appendix, 2 figures

Journal ref: NeurIPS 2022 Workshop on Score-Based Methods

arXiv:2210.08031 [pdf, other]

Neural Attentive Circuits

Authors: Nasim Rahaman, Martin Weiss, Francesco Locatello, Chris Pal, Yoshua Bengio, Bernhard Schölkopf, Li Erran Li, Nicolas Ballas

Abstract: Recent work has seen the development of general purpose neural architectures that can be trained to perform tasks across diverse data modalities. General purpose models typically make few assumptions about the underlying data-structure and are known to perform well in the large-data regime. At the same time, there has been growing interest in modular neural architectures that represent the data using sparsely interacting modules. These models can be more robust out-of-distribution, computationally efficient, and capable of sample-efficient adaptation to new data. However, they tend to make domain-specific assumptions about the data, and present challenges in how module behavior (i.e., parameterization) and connectivity (i.e., their layout) can be jointly learned. In this work, we introduce a general purpose, yet modular neural architecture called Neural Attentive Circuits (NACs) that jointly learns the parameterization and a sparse connectivity of neural modules without using domain knowledge. NACs are best understood as the combination of two systems that are jointly trained end-to-end: one that determines the module configuration and the other that executes it on an input. We demonstrate qualitatively that NACs learn diverse and meaningful module configurations on the NLVR2 dataset without additional supervision. Quantitatively, we show that by incorporating modularity in this way, NACs improve upon a strong non-modular baseline in terms of low-shot adaptation on CIFAR and CUBs dataset by about 10%, and OOD robustness on Tiny ImageNet-R by about 2.5%. Further, we find that NACs can achieve an 8x speedup at inference time while losing less than 3% performance. Finally, we find NACs to yield competitive results on diverse data modalities spanning point-cloud classification, symbolic processing and text-classification from ASCII bytes, thereby confirming its general purpose nature.

Submitted 19 October, 2022; v1 submitted 14 October, 2022; originally announced October 2022.

Comments: To appear at NeurIPS 2022

arXiv:2210.07453 [pdf, ps, other]

Using Graph Algorithms to Pretrain Graph Completion Transformers

Authors: Jonathan Pilault, Michael Galkin, Bahare Fatemi, Perouz Taslakian, David Vasquez, Christopher Pal

Abstract: Recent work on Graph Neural Networks has demonstrated that self-supervised pretraining can further enhance performance on downstream graph, link, and node classification tasks. However, the efficacy of pretraining tasks has not been fully investigated for downstream large knowledge graph completion tasks. Using a contextualized knowledge graph embedding approach, we investigate five different pretraining signals, constructed using several graph algorithms and no external data, as well as their combination. We leverage the versatility of our Transformer-based model to explore graph structure generation pretraining tasks (i.e. path and k-hop neighborhood generation), typically inapplicable to most graph embedding methods. We further propose a new path-finding algorithm guided by information gain and find that it is the best-performing pretraining task across three downstream knowledge graph completion datasets. While using our new path-finding algorithm as a pretraining signal provides 2-3% MRR improvements, we show that pretraining on all signals together gives the best knowledge graph completion results. In a multitask setting that combines all pretraining tasks, our method surpasses the latest and strong performing knowledge graph embedding methods on all metrics for FB15K-237, on MRR and Hit@1 for WN18RR, and on MRR and Hit@10 for JF17K (a knowledge hypergraph dataset).

Submitted 27 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

arXiv:2208.08274 [pdf, other]

SMPL-IK: Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows

Authors: Vikram Voleti, Boris N. Oreshkin, Florent Bocquelet, Félix G. Harvey, Louis-Simon Ménard, Christopher Pal

Abstract: Inverse Kinematics (IK) systems are often rigid with respect to their input character, thus requiring user intervention to be adapted to new skeletons. In this paper we aim at creating a flexible, learned IK solver applicable to a wide variety of human morphologies. We extend a state-of-the-art machine learning IK solver to operate on the well known Skinned Multi-Person Linear model (SMPL). We call our model SMPL-IK, and show that when integrated into real-time 3D software, this extended system opens up opportunities for defining novel AI-assisted animation workflows. For example, pose authoring can be made more flexible with SMPL-IK by allowing users to modify gender and body shape while posing a character. Additionally, when chained with existing pose estimation algorithms, SMPL-IK accelerates posing by allowing users to bootstrap 3D scenes from 2D images while allowing for further editing. Finally, we propose a novel SMPL Shape Inversion mechanism (SMPL-SI) to map arbitrary humanoid characters to the SMPL space, allowing artists to leverage SMPL-IK on custom characters. In addition to qualitative demos showing proposed tools, we present quantitative SMPL-IK baselines on the H36M and AMASS datasets.

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2208.02377 [pdf, other]

Improving Meta-Learning Generalization with Activation-Based Early-Stopping

Authors: Simon Guiroy, Christopher Pal, Gonçalo Mordido, Sarath Chandar

Abstract: Meta-Learning algorithms for few-shot learning aim to train neural networks capable of generalizing to novel tasks using only a few examples. Early-stopping is critical for performance, halting model training when it reaches optimal generalization to the new task distribution. Early-stopping mechanisms in Meta-Learning typically rely on measuring the model performance on labeled examples from a meta-validation set drawn from the training (source) dataset. This is problematic in few-shot transfer learning settings, where the meta-test set comes from a different target dataset (OOD) and can potentially have a large distributional shift with the meta-validation set. In this work, we propose Activation Based Early-stopping (ABE), an alternative to using validation-based early-stopping for meta-learning. Specifically, we analyze the evolution, during meta-training, of the neural activations at each hidden layer, on a small set of unlabelled support examples from a single task of the target tasks distribution, as this constitutes a minimal and justifiably accessible information from the target problem. Our experiments show that simple, label agnostic statistics on the activations offer an effective way to estimate how the target generalization evolves over time. At each hidden layer, we characterize the activation distributions, from their first and second order moments, then further summarized along the feature dimensions, resulting in a compact yet intuitive characterization in a four-dimensional space. Detecting when, throughout training time, and at which layer, the target activation trajectory diverges from the activation trajectory of the source data, allows us to perform early-stopping and improve generalization in a large array of few-shot transfer learning settings, across different algorithms, source and target datasets.

Submitted 3 August, 2022; originally announced August 2022.

Comments: Accepted at CoLLAs 2022. To be published in Proceedings of Machine Learning Research (PMLR)

arXiv:2206.12067 [pdf, ps, other]

Nonzero-Sum Risk-Sensitive Stochastic Differential Games: A Multi-parameter Eigenvalue Problem Approach

Authors: Mrinal K. Ghosh, K. Suresh Kumar, Chandan Pal, Somnath Pradhan

Abstract: We study nonzero-sum stochastic differential games with risk-sensitive ergodic cost criterion. Under certain conditions, using multi-parameter eigenvalue approach, we establish the existence of a Nash equilibrium in the space of stationary Markov strategies. We achieve our results by studying the relevant systems of coupled HJB equations. Exploiting the stochastic representation of the principal eigenfunctions we completely characterize Nash equilibrium points in the space of stationary Markov strategies.

Submitted 24 June, 2022; originally announced June 2022.

arXiv:2206.09962 [pdf, other]

Model-Free Optimal Control of Inverter for Dynamic Voltage Support

Authors: Yifei Guo, Bikash C. Pal, Rabih A. Jabr

Abstract: Inverter-based resources (IBRs) are required to provide dynamic voltage support (DVS) during voltage dips to enhance the low-voltage ride-through capability. In this paper, we develop a model-free control method to achieve the optimal DVS (ODVS) without relying on the knowledge of grid parameters. Delving into the optimum trajectory of the ODVS problem, it is found that either both the current constraint and the maximum active power constraint of IBRs are binding, or only one of the two constraints is binding. This inspires us to search for the optimum in a closed-loop way by a perturb-and-observe (P&O)-based optimum seeking (OS) controller with either the power factor angle or the reactive current being the manipulated (perturbed) variable. The system is guaranteed to converge asymptotically to the optimum provided the stepsize sequence is diminishing and non-summable. The proposed model-free optimal control is finally implemented within a single-stage photovoltaic (PV) system, where dynamic simulations demonstrate the optimal and fast DVS performance.

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2205.11690 [pdf, other]

Workflow Discovery from Dialogues in the Low Data Regime

Authors: Amine El Hattami, Stefania Raimondo, Issam Laradji, David Vazquez, Pau Rodriguez, Chris Pal

Abstract: Text-based dialogues are now widely used to solve real-world problems. In cases where solution strategies are already known, they can sometimes be codified into workflows and used to guide humans or artificial agents through the task of helping clients. We introduce a new problem formulation that we call Workflow Discovery (WD) in which we are interested in the situation where a formal workflow may not yet exist. Still, we wish to discover the set of actions that have been taken to resolve a particular problem. We also examine a sequence-to-sequence (Seq2Seq) approach for this novel task. We present experiments where we extract workflows from dialogues in the Action-Based Conversations Dataset (ABCD). Since the ABCD dialogues follow known workflows to guide agents, we can evaluate our ability to extract such workflows using ground truth sequences of actions. We propose and evaluate an approach that conditions models on the set of possible actions, and we show that using this strategy, we can improve WD performance. Our conditioning approach also improves zero-shot and few-shot WD performance when transferring learned models to unseen domains within and across datasets. Further, on ABCD a modified variant of our Seq2Seq method achieves state-of-the-art performance on related but different problems of Action State Tracking (AST) and Cascading Dialogue Success (CDS) across many evaluation metrics.

Submitted 11 February, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

arXiv:2205.09853 [pdf, other]

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Authors: Vikram Voleti, Alexia Jolicoeur-Martineau, Christopher Pal

Abstract: Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction frameworks are typically not capable of simultaneously handling other video-related tasks such as unconditional generation or interpolation. In this work, we devise a general-purpose framework called Masked Conditional Video Diffusion (MCVD) for all of these video synthesis tasks using a probabilistic conditional score-based denoising diffusion model, conditioned on past and/or future frames. We train the model in a manner where we randomly and independently mask all the past frames or all the future frames. This novel but straightforward setup allows us to train a single model that is capable of executing a broad range of video tasks, specifically: future/past prediction -- when only future/past frames are masked; unconditional generation -- when both past and future frames are masked; and interpolation -- when neither past nor future frames are masked. Our experiments show that this approach can generate high-quality frames for diverse types of videos. Our MCVD models are built from simple non-recurrent 2D-convolutional architectures, conditioning on blocks of frames and generating blocks of frames. We generate videos of arbitrary lengths autoregressively in a block-wise manner. Our approach yields SOTA results across standard video prediction and interpolation benchmarks, with computation times for training models measured in 1-12 days using ≤ 4 GPUs. Project page: https://mask-cond-video-diffusion.github.io ; Code: https://github.com/voletiv/mcvd-pytorch

Submitted 12 October, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

Comments: NeurIPS 2022 ; 10 pages, 4 figures, 7 tables

arXiv:2203.16662 [pdf, other]

Overcoming challenges in leveraging GANs for few-shot data augmentation

Authors: Christopher Beckham, Issam Laradji, Pau Rodriguez, David Vazquez, Derek Nowrouzezahrai, Christopher Pal

Abstract: In this paper, we explore the use of GAN-based few-shot data augmentation as a method to improve few-shot classification performance. We perform an exploration into how a GAN can be fine-tuned for such a task (one of which is in a class-incremental manner), as well as a rigorous empirical investigation into how well these models can perform to improve few-shot classification. We identify issues related to the difficulty of training such generative models under a purely supervised regime with very few examples, as well as issues regarding the evaluation protocols of existing works. We also find that in this regime, classification accuracy is highly sensitive to how the classes of the dataset are randomly split. Therefore, we propose a semi-supervised fine-tuning approach as a more pragmatic way forward to address these problems.

Submitted 8 August, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: v3 of the paper, various changes including better figures, CIFAR-100 results, and precision-recall metrics

arXiv:2201.03790 [pdf, ps, other]

Discrete-time Zero-Sum Games for Markov chains with risk-sensitive average cost criterion

Authors: Mrinal K. Ghosh, Subrata Golui, Chandan Pal, Somnath Pradhan

Abstract: We study zero-sum stochastic games for controlled discrete time Markov chains with risk-sensitive average cost criterion with countable state space and Borel action spaces. The payoff function is nonnegative and possibly unbounded. Under a certain Lyapunov stability assumption on the dynamics, we establish the existence of a value and saddle point equilibrium. Further we completely characterize all possible saddle point strategies in the class of stationary Markov strategies. Finally, we present and analyze an illustrative example.

Submitted 11 January, 2022; originally announced January 2022.

Comments: 28 pages

MSC Class: 91A15; 91A25

arXiv:2201.01787

Does Entity Abstraction Help Generative Transformers Reason?

Authors: Nicolas Gontier, Siva Reddy, Christopher Pal

Abstract: We study the utility of incorporating entity type abstractions into pre-trained Transformers and test these methods on four NLP tasks requiring different forms of logical reasoning: (1) compositional language understanding with text-based relational reasoning (CLUTRR), (2) abductive reasoning (ProofWriter), (3) multi-hop question answering (HotpotQA), and (4) conversational question answering (CoQA). We propose and empirically explore three ways to add such abstraction: (i) as additional input embeddings, (ii) as a separate sequence to encode, and (iii) as an auxiliary prediction task for the model. Overall, our analysis demonstrates that models with abstract entity knowledge perform better than those without it. The best abstraction-aware models achieved an overall accuracy of 88.8% and 91.8% compared to the baseline model achieving 62.9% and 89.8% on CLUTRR and ProofWriter respectively. However, for HotpotQA and CoQA, we find that F1 scores improve by only 0.5% on average. Our results suggest that the benefit of explicit abstraction is significant in formally defined logical reasoning settings requiring many reasoning hops, but point to the notion that it is less beneficial for NLP tasks having less formal logical structure.

Submitted 21 November, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: TMLR 2022; 28 pages; 9 tables; 1 figure

Showing 1–50 of 156 results for author: Pal, C