-
LiveBench: A Challenging, Contamination-Free LLM Benchmark
Authors:
Colin White,
Samuel Dooley,
Manley Roberts,
Arka Pal,
Ben Feuer,
Siddhartha Jain,
Ravid Shwartz-Ziv,
Neel Jain,
Khalid Saifullah,
Siddartha Naidu,
Chinmay Hegde,
Yann LeCun,
Tom Goldstein,
Willie Neiswanger,
Micah Goldblum
Abstract:
Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In…
▽ More
Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In this work, we introduce a new benchmark for LLMs designed to be immune to both test set contamination and the pitfalls of LLM judging and human crowdsourcing. We release LiveBench, the first benchmark that (1) contains frequently-updated questions from recent information sources, (2) scores answers automatically according to objective ground-truth values, and (3) contains a wide variety of challenging tasks, spanning math, coding, reasoning, language, instruction following, and data analysis. To achieve this, LiveBench contains questions that are based on recently-released math competitions, arXiv papers, news articles, and datasets, and it contains harder, contamination-free versions of tasks from previous benchmarks such as Big-Bench Hard, AMPS, and IFEval. We evaluate many prominent closed-source models, as well as dozens of open-source models ranging from 0.5B to 110B in size. LiveBench is difficult, with top models achieving below 65% accuracy. We release all questions, code, and model answers. Questions will be added and updated on a monthly basis, and we will release new tasks and harder versions of tasks over time so that LiveBench can distinguish between the capabilities of LLMs as they improve in the future. We welcome community engagement and collaboration for expanding the benchmark tasks and models.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
ALINA: Advanced Line Identification and Notation Algorithm
Authors:
Mohammed Abdul Hafeez Khan,
Parth Ganeriwala,
Siddhartha Bhattacharyya,
Natasha Neogi,
Raja Muthalagu
Abstract:
Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annot…
▽ More
Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annotation framework, Advanced Line Identification and Notation Algorithm (ALINA), which can be used for labeling taxiway datasets that consist of different camera perspectives and variable weather attributes (sunny and cloudy). Additionally, the CIRCular threshoLd pixEl Discovery And Traversal (CIRCLEDAT) algorithm has been proposed, which is an integral step in determining the pixels corresponding to taxiway line markings. Once the pixels are identified, ALINA generates corresponding pixel coordinate annotations on the frame. Using this approach, 60,249 frames from the taxiway dataset, AssistTaxi have been labeled. To evaluate the performance, a context-based edge map (CBEM) set was generated manually based on edge features and connectivity. The detection rate after testing the annotated labels with the CBEM set was recorded as 98.45%, attesting its dependability and effectiveness.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies
Authors:
Junlin Wang,
Siddhartha Jain,
Dejiao Zhang,
Baishakhi Ray,
Varun Kumar,
Ben Athiwaratkun
Abstract:
A diverse array of reasoning strategies has been proposed to elicit the capabilities of large language models. However, in this paper, we point out that traditional evaluations which focus solely on performance metrics miss a key factor: the increased effectiveness due to additional compute. By overlooking this aspect, a skewed view of strategy efficiency is often presented. This paper introduces…
▽ More
A diverse array of reasoning strategies has been proposed to elicit the capabilities of large language models. However, in this paper, we point out that traditional evaluations which focus solely on performance metrics miss a key factor: the increased effectiveness due to additional compute. By overlooking this aspect, a skewed view of strategy efficiency is often presented. This paper introduces a framework that incorporates the compute budget into the evaluation, providing a more informative comparison that takes into account both performance metrics and computational cost. In this budget-aware perspective, we find that complex reasoning strategies often don't surpass simpler baselines purely due to algorithmic ingenuity, but rather due to the larger computational resources allocated. When we provide a simple baseline like chain-of-thought self-consistency with comparable compute resources, it frequently outperforms reasoning strategies proposed in the literature. In this scale-aware perspective, we find that unlike self-consistency, certain strategies such as multi-agent debate or Reflexion can become worse if more compute budget is utilized.
△ Less
Submitted 14 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
Online Fair Allocation of Perishable Resources
Authors:
Siddhartha Banerjee,
Chamsi Hssaine,
Sean R. Sinclair
Abstract:
We consider a practically motivated variant of the canonical online fair allocation problem: a decision-maker has a budget of perishable resources to allocate over a fixed number of rounds. Each round sees a random number of arrivals, and the decision-maker must commit to an allocation for these individuals before moving on to the next round. The goal is to construct a sequence of allocations that…
▽ More
We consider a practically motivated variant of the canonical online fair allocation problem: a decision-maker has a budget of perishable resources to allocate over a fixed number of rounds. Each round sees a random number of arrivals, and the decision-maker must commit to an allocation for these individuals before moving on to the next round. The goal is to construct a sequence of allocations that is envy-free and efficient. Our work makes two important contributions toward this problem: we first derive strong lower bounds on the optimal envy-efficiency trade-off that demonstrate that a decision-maker is fundamentally limited in what she can hope to achieve relative to the no-perishing setting; we then design an algorithm achieving these lower bounds which takes as input $(i)$ a prediction of the perishing order, and $(ii)$ a desired bound on envy. Given the remaining budget in each period, the algorithm uses forecasts of future demand and perishing to adaptively choose one of two carefully constructed guardrail quantities. We demonstrate our algorithm's strong numerical performance - and state-of-the-art, perishing-agnostic algorithms' inefficacy - on simulations calibrated to a real-world dataset.
△ Less
Submitted 4 June, 2024;
originally announced June 2024.
-
Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels
Authors:
Abhay Deshpande,
Liyiming Ke,
Quinn Pfeifer,
Abhishek Gupta,
Siddhartha S. Srinivasa
Abstract:
We consider imitation learning with access only to expert demonstrations, whose real-world application is often limited by covariate shift due to compounding errors during execution. We investigate the effectiveness of the Continuity-based Corrective Labels for Imitation Learning (CCIL) framework in mitigating this issue for real-world fine manipulation tasks. CCIL generates corrective labels by l…
▽ More
We consider imitation learning with access only to expert demonstrations, whose real-world application is often limited by covariate shift due to compounding errors during execution. We investigate the effectiveness of the Continuity-based Corrective Labels for Imitation Learning (CCIL) framework in mitigating this issue for real-world fine manipulation tasks. CCIL generates corrective labels by learning a locally continuous dynamics model from demonstrations to guide the agent back toward expert states. Through extensive experiments on peg insertion and fine gras**, we provide the first empirical validation that CCIL can significantly improve imitation learning performance despite discontinuities present in contact-rich manipulation. We find that: (1) real-world manipulation exhibits sufficient local smoothness to apply CCIL, (2) generated corrective labels are most beneficial in low-data regimes, and (3) label filtering based on estimated dynamics model error enables performance gains. To effectively apply CCIL to robotic domains, we offer a practical instantiation of the framework and insights into design choices and hyperparameter selection. Our work demonstrates CCIL's practicality for alleviating compounding errors in imitation learning on physical robots.
△ Less
Submitted 3 June, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Poseidon: Efficient Foundation Models for PDEs
Authors:
Maximilian Herde,
Bogdan Raonić,
Tobias Rohner,
Roger Käppeli,
Roberto Molinaro,
Emmanuel de Bézenac,
Siddhartha Mishra
Abstract:
We introduce Poseidon, a foundation model for learning the solution operators of PDEs. It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. A novel training strategy leveraging the semi-group property of time-dependent PDEs to allow for significant scaling-up of the training data is also proposed. Poseidon is pretrained on…
▽ More
We introduce Poseidon, a foundation model for learning the solution operators of PDEs. It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. A novel training strategy leveraging the semi-group property of time-dependent PDEs to allow for significant scaling-up of the training data is also proposed. Poseidon is pretrained on a diverse, large scale dataset for the governing equations of fluid dynamics. It is then evaluated on a suite of 15 challenging downstream tasks that include a wide variety of PDE types and operators. We show that Poseidon exhibits excellent performance across the board by outperforming baselines significantly, both in terms of sample efficiency and accuracy. Poseidon also generalizes very well to new physics that is not seen during pretraining. Moreover, Poseidon scales with respect to model and data size, both for pretraining and for downstream tasks. Taken together, our results showcase the surprising ability of Poseidon to learn effective representations from a very small set of PDEs during pretraining in order to generalize well to unseen and unrelated PDEs downstream, demonstrating its potential as an effective, general purpose PDE foundation model. Finally, the Poseidon model as well as underlying pretraining and downstream datasets are open sourced, with code being available at https://github.com/camlab-ethz/poseidon and pretrained models and datasets at https://huggingface.co/camlab-ethz.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
A Patient-Specific Framework for Autonomous Spinal Fixation via a Steerable Drilling Robot
Authors:
Susheela Sharma,
Sarah Go,
Zeynep Yakay,
Yash Kulkarni,
Siddhartha Kapuria,
Jordan P. Amadio,
Reza Rajebi,
Mohsen Khadem,
Nassir Navab,
Farshid Alambeigi
Abstract:
In this paper, with the goal of enhancing the minimally invasive spinal fixation procedure in osteoporotic patients, we propose a first-of-its-kind image-guided robotic framework for performing an autonomous and patient-specific procedure using a unique concentric tube steerable drilling robot (CT-SDR). Particularly, leveraging a CT-SDR, we introduce the concept of J-shape drilling based on a pre-…
▽ More
In this paper, with the goal of enhancing the minimally invasive spinal fixation procedure in osteoporotic patients, we propose a first-of-its-kind image-guided robotic framework for performing an autonomous and patient-specific procedure using a unique concentric tube steerable drilling robot (CT-SDR). Particularly, leveraging a CT-SDR, we introduce the concept of J-shape drilling based on a pre-operative trajectory planned in CT scan of a patient followed by appropriate calibration, registration, and navigation steps to safely execute this trajectory in real-time using our unique robotic setup. To thoroughly evaluate the performance of our framework, we performed several experiments on two different vertebral phantoms designed based on CT scan of real patients.
△ Less
Submitted 27 May, 2024;
originally announced May 2024.
-
How Well Do Deep Learning Models Capture Human Concepts? The Case of the Typicality Effect
Authors:
Siddhartha K. Vemuri,
Raj Sanjay Shah,
Sashank Varma
Abstract:
How well do representations learned by ML models align with those of humans? Here, we consider concept representations learned by deep learning models and evaluate whether they show a fundamental behavioral signature of human concepts, the typicality effect. This is the finding that people judge some instances (e.g., robin) of a category (e.g., Bird) to be more typical than others (e.g., penguin).…
▽ More
How well do representations learned by ML models align with those of humans? Here, we consider concept representations learned by deep learning models and evaluate whether they show a fundamental behavioral signature of human concepts, the typicality effect. This is the finding that people judge some instances (e.g., robin) of a category (e.g., Bird) to be more typical than others (e.g., penguin). Recent research looking for human-like typicality effects in language and vision models has focused on models of a single modality, tested only a small number of concepts, and found only modest correlations with human typicality ratings. The current study expands this behavioral evaluation of models by considering a broader range of language (N = 8) and vision (N = 10) model architectures. It also evaluates whether the combined typicality predictions of vision + language model pairs, as well as a multimodal CLIP-based model, are better aligned with human typicality judgments than those of models of either modality alone. Finally, it evaluates the models across a broader range of concepts (N = 27) than prior studies. There were three important findings. First, language models better align with human typicality judgments than vision models. Second, combined language and vision models (e.g., AlexNet + MiniLM) better predict the human typicality data than the best-performing language model (i.e., MiniLM) or vision model (i.e., ViT-Huge) alone. Third, multimodal models (i.e., CLIP ViT) show promise for explaining human typicality judgments. These results advance the state-of-the-art in aligning the conceptual representations of ML models and humans. A methodological contribution is the creation of a new image set for testing the conceptual alignment of vision models.
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
DiffuBox: Refining 3D Object Detection with Point Diffusion
Authors:
Xiangyu Chen,
Zhenzhen Liu,
Katie Z Luo,
Siddhartha Datta,
Adhitya Polavaram,
Yan Wang,
Yurong You,
Boyi Li,
Marco Pavone,
Wei-Lun Chao,
Mark Campbell,
Bharath Hariharan,
Kilian Q. Weinberger
Abstract:
Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-…
▽ More
Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-based box refinement approach. This method employs a domain-agnostic diffusion model, conditioned on the LiDAR points surrounding a coarse bounding box, to simultaneously refine the box's location, size, and orientation. We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
AGS-GNN: Attribute-guided Sampling for Graph Neural Networks
Authors:
Siddhartha Shankar Das,
S M Ferdous,
Mahantesh M Halappanavar,
Edoardo Serra,
Alex Pothen
Abstract:
We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs) that exploits node features and connectivity structure of a graph while simultaneously adapting for both homophily and heterophily in graphs. (In homophilic graphs vertices of the same class are more likely to be connected, and vertices of different classes tend to be linked in heterophilic graphs.) Wh…
▽ More
We propose AGS-GNN, a novel attribute-guided sampling algorithm for Graph Neural Networks (GNNs) that exploits node features and connectivity structure of a graph while simultaneously adapting for both homophily and heterophily in graphs. (In homophilic graphs vertices of the same class are more likely to be connected, and vertices of different classes tend to be linked in heterophilic graphs.) While GNNs have been successfully applied to homophilic graphs, their application to heterophilic graphs remains challenging. The best-performing GNNs for heterophilic graphs do not fit the sampling paradigm, suffer high computational costs, and are not inductive. We employ samplers based on feature-similarity and feature-diversity to select subsets of neighbors for a node, and adaptively capture information from homophilic and heterophilic neighborhoods using dual channels. Currently, AGS-GNN is the only algorithm that we know of that explicitly controls homophily in the sampled subgraph through similar and diverse neighborhood samples. For diverse neighborhood sampling, we employ submodularity, which was not used in this context prior to our work. The sampling distribution is pre-computed and highly parallel, achieving the desired scalability. Using an extensive dataset consisting of 35 small ($\le$ 100K nodes) and large (>100K nodes) homophilic and heterophilic graphs, we demonstrate the superiority of AGS-GNN compare to the current approaches in the literature. AGS-GNN achieves comparable test accuracy to the best-performing heterophilic GNNs, even outperforming methods using the entire graph for node classification. AGS-GNN also converges faster compared to methods that sample neighborhoods randomly, and can be incorporated into existing GNN models that employ node or graph sampling.
△ Less
Submitted 24 May, 2024;
originally announced May 2024.
-
FUSE: Fast Unified Simulation and Estimation for PDEs
Authors:
Levi E. Lingsch,
Dana Grund,
Siddhartha Mishra,
Georgios Kissas
Abstract:
The joint prediction of continuous fields and statistical estimation of the underlying discrete parameters is a common problem for many physical systems, governed by PDEs. Hitherto, it has been separately addressed by employing operator learning surrogates for field prediction while using simulation-based inference (and its variants) for statistical parameter determination. Here, we argue that sol…
▽ More
The joint prediction of continuous fields and statistical estimation of the underlying discrete parameters is a common problem for many physical systems, governed by PDEs. Hitherto, it has been separately addressed by employing operator learning surrogates for field prediction while using simulation-based inference (and its variants) for statistical parameter determination. Here, we argue that solving both problems within the same framework can lead to consistent gains in accuracy and robustness. To this end, We propose a novel and flexible formulation of the operator learning problem that allows jointly predicting continuous quantities and inferring distributions of discrete parameters, and thus amortizing the cost of both the inverse and the surrogate models to a joint pre-training step. We present the capabilities of the proposed methodology for predicting continuous and discrete biomarkers in full-body haemodynamics simulations under different levels of missing information. We also consider a test case for atmospheric large-eddy simulation of a two-dimensional dry cold bubble, where we infer both continuous time-series and information about the systems conditions. We present comparisons against different baselines to showcase significantly increased accuracy in both the inverse and the surrogate tasks.
△ Less
Submitted 23 May, 2024;
originally announced May 2024.
-
Designing Skill-Compatible AI: Methodologies and Frameworks in Chess
Authors:
Karim Hamade,
Reid McIlroy-Young,
Siddhartha Sen,
Jon Kleinberg,
Ashton Anderson
Abstract:
Powerful artificial intelligence systems are often used in settings where they must interact with agents that are computationally much weaker, for example when they work alongside humans or operate in complex environments where some tasks are handled by algorithms, heuristics, or other entities of varying computational power. For AI agents to successfully interact in these settings, however, achie…
▽ More
Powerful artificial intelligence systems are often used in settings where they must interact with agents that are computationally much weaker, for example when they work alongside humans or operate in complex environments where some tasks are handled by algorithms, heuristics, or other entities of varying computational power. For AI agents to successfully interact in these settings, however, achieving superhuman performance alone is not sufficient; they also need to account for suboptimal actions or idiosyncratic style from their less-skilled counterparts. We propose a formal evaluation framework for assessing the compatibility of near-optimal AI with interaction partners who may have much lower levels of skill; we use popular collaborative chess variants as model systems to study and develop AI agents that can successfully interact with lower-skill entities. Traditional chess engines designed to output near-optimal moves prove to be inadequate partners when paired with engines of various lower skill levels in this domain, as they are not designed to consider the presence of other agents. We contribute three methodologies to explicitly create skill-compatible AI agents in complex decision-making settings, and two chess game frameworks designed to foster collaboration between powerful AI agents and less-skilled partners. On these frameworks, our agents outperform state-of-the-art chess AI (based on AlphaZero) despite being weaker in conventional chess, demonstrating that skill-compatibility is a tangible trait that is qualitatively and measurably distinct from raw performance. Our evaluations further explore and clarify the mechanisms by which our agents achieve skill-compatibility.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
ContextQ: Generated Questions to Support Meaningful Parent-Child Dialogue While Co-Reading
Authors:
Griffin Dietz Smith,
Siddhartha Prasad,
Matt J. Davidson,
Leah Findlater,
R. Benjamin Shapiro
Abstract:
Much of early literacy education happens at home with caretakers reading books to young children. Prior research demonstrates how having dialogue with children during co-reading can develop critical reading readiness skills, but most adult readers are unsure if and how to lead effective conversations. We present ContextQ, a tablet-based reading application to unobtrusively present auto-generated d…
▽ More
Much of early literacy education happens at home with caretakers reading books to young children. Prior research demonstrates how having dialogue with children during co-reading can develop critical reading readiness skills, but most adult readers are unsure if and how to lead effective conversations. We present ContextQ, a tablet-based reading application to unobtrusively present auto-generated dialogic questions to caretakers to support this dialogic reading practice. An ablation study demonstrates how our method of encoding educator expertise into the question generation pipeline can produce high-quality output; and through a user study with 12 parent-child dyads (child age: 4-6), we demonstrate that this system can serve as a guide for parents in leading contextually meaningful dialogue, leading to significantly more conversational turns from both the parent and the child and deeper conversations with connections to the child's everyday life.
△ Less
Submitted 6 May, 2024;
originally announced May 2024.
-
Gecko: Versatile Text Embeddings Distilled from Large Language Models
Authors:
**hyuk Lee,
Zhuyun Dai,
Xiaoqi Ren,
Blair Chen,
Daniel Cer,
Jeremy R. Cole,
Kai Hui,
Michael Boratko,
Rajvi Kapadia,
Wen Ding,
Yi Luan,
Sai Meher Karthik Duddu,
Gustavo Hernandez Abrego,
Weiqiang Shi,
Nithi Gupta,
Aditya Kusupati,
Prateek Jain,
Siddhartha Reddy Jonnalagadda,
Ming-Wei Chang,
Iftekhar Naim
Abstract:
We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each…
▽ More
We present Gecko, a compact and versatile text embedding model. Gecko achieves strong retrieval performance by leveraging a key idea: distilling knowledge from large language models (LLMs) into a retriever. Our two-step distillation process begins with generating diverse, synthetic paired data using an LLM. Next, we further refine the data quality by retrieving a set of candidate passages for each query, and relabeling the positive and hard negative passages using the same LLM. The effectiveness of our approach is demonstrated by the compactness of the Gecko. On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with 768 embedding size. Gecko with 768 embedding dimensions achieves an average score of 66.31, competing with 7x larger models and 5x higher dimensional embeddings.
△ Less
Submitted 29 March, 2024;
originally announced March 2024.
-
Learning to Infer Generative Template Programs for Visual Concepts
Authors:
R. Kenny Jones,
Siddhartha Chaudhuri,
Daniel Ritchie
Abstract:
People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic expressions from a domain-specific language that specify structural and parametric patterns common to an input concept. Our framework supports multiple concept-related ta…
▽ More
People grasp flexible visual concepts from a few examples. We explore a neurosymbolic system that learns how to infer programs that capture visual concepts in a domain-general fashion. We introduce Template Programs: programmatic expressions from a domain-specific language that specify structural and parametric patterns common to an input concept. Our framework supports multiple concept-related tasks, including few-shot generation and co-segmentation through parsing. We develop a learning paradigm that allows us to train networks that infer Template Programs directly from visual datasets that contain concept grou**s. We run experiments across multiple visual domains: 2D layouts, Omniglot characters, and 3D shapes. We find that our method outperforms task-specific alternatives, and performs competitively against domain-specific approaches for the limited domains where they exist.
△ Less
Submitted 9 June, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Multi-Sample Long Range Path Planning under Sensing Uncertainty for Off-Road Autonomous Driving
Authors:
Matt Schmittle,
Rohan Baijal,
Brian Hou,
Siddhartha Srinivasa,
Byron Boots
Abstract:
We focus on the problem of long-range dynamic replanning for off-road autonomous vehicles, where a robot plans paths through a previously unobserved environment while continuously receiving noisy local observations. An effective approach for planning under sensing uncertainty is determinization, where one converts a stochastic world into a deterministic one and plans under this simplification. Thi…
▽ More
We focus on the problem of long-range dynamic replanning for off-road autonomous vehicles, where a robot plans paths through a previously unobserved environment while continuously receiving noisy local observations. An effective approach for planning under sensing uncertainty is determinization, where one converts a stochastic world into a deterministic one and plans under this simplification. This makes the planning problem tractable, but the cost of following the planned path in the real world may be different than in the determinized world. This causes collisions if the determinized world optimistically ignores obstacles, or causes unnecessarily long routes if the determinized world pessimistically imagines more obstacles. We aim to be robust to uncertainty over potential worlds while still achieving the efficiency benefits of determinization. We evaluate algorithms for dynamic replanning on a large real-world dataset of challenging long-range planning problems from the DARPA RACER program. Our method, Dynamic Replanning via Evaluating and Aggregating Multiple Samples (DREAMS), outperforms other determinization-based approaches in terms of combined traversal time and collision cost. https://sites.google.com/cs.washington.edu/dreams/
△ Less
Submitted 17 March, 2024;
originally announced March 2024.
-
Redundancy Transmission in UAV-Aided LoRa Networks Featuring Wake-Up Radios
Authors:
Siddhartha S. Borkotoky
Abstract:
We consider a LoRa sensor network featuring a UAV-mounted gateway for collecting sensor data (messages). Wake-up radios (WuR) are employed to inform the sensors of the UAV's arrival. Building on an existing random access scheme for such setups, we propose and evaluate two redundancy transmission protocols for enhancing the reliability of the data transfer. One protocol employs fountain-coded trans…
▽ More
We consider a LoRa sensor network featuring a UAV-mounted gateway for collecting sensor data (messages). Wake-up radios (WuR) are employed to inform the sensors of the UAV's arrival. Building on an existing random access scheme for such setups, we propose and evaluate two redundancy transmission protocols for enhancing the reliability of the data transfer. One protocol employs fountain-coded transmissions, whereas the other performs message replication. Our results illustrate how redundancy transmission can be beneficial under the time constraints imposed by the UAV's limited hovering duration, and how node density, hovering time, and sensor energy budget impact the performance of the schemes.
△ Less
Submitted 26 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
Authors:
Yu Yang,
Siddhartha Mishra,
Jeffrey N Chiang,
Baharan Mirzasoleiman
Abstract:
Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), whic…
▽ More
Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), which leverages training trajectories from small models to guide the data selection for larger models. We demonstrate through extensive experiments that S2L significantly improves data efficiency in SFT for mathematical problem-solving, reducing the training data to just 11% of the original MathInstruct dataset (Yue et al., 2023) to match full dataset performance while outperforming state-of-the-art data selection algorithms by an average of 4.7% across 6 in- and out-domain evaluation datasets. Remarkably, selecting only 50K data for SFT, S2L achieves a 32.7% accuracy on the most challenging MATH (Hendrycks et al., 2021) benchmark, improving Phi-2 (Li et al., 2023b) by 16.6%. In clinical text summarization on the MIMIC-III dataset (Johnson et al., 2016), S2L again outperforms training on the full dataset using only 50% of the data. Notably, S2L can perform data selection using a reference model 40x smaller than the target model, proportionally reducing the cost of data selection.
△ Less
Submitted 12 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…
▽ More
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
△ Less
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
An Adaptable, Safe, and Portable Robot-Assisted Feeding System
Authors:
Ethan Kroll Gordon,
Rajat Kumar Jenamani,
Amal Nanavati,
Ziang Liu,
Haya Bolotski,
Raida Karim,
Daniel Stabile,
Atharva Kashyap,
Bernie Hao Zhu,
Xilai Dai,
Tyler Schrenk,
Jonathan Ko,
Taylor Kessler Faulkner,
Tapomayukh Bhattacharjee,
Siddhartha Srinivasa
Abstract:
We demonstrate a robot-assisted feeding system that enables people with mobility impairments to feed themselves. Our system design embodies Safety, Portability, and User Control, with comprehensive full-stack safety checks, the ability to be mounted on and powered by any powered wheelchair, and a custom web-app allowing care-recipients to leverage their own assistive devices for robot control. For…
▽ More
We demonstrate a robot-assisted feeding system that enables people with mobility impairments to feed themselves. Our system design embodies Safety, Portability, and User Control, with comprehensive full-stack safety checks, the ability to be mounted on and powered by any powered wheelchair, and a custom web-app allowing care-recipients to leverage their own assistive devices for robot control. For bite acquisition, we leverage multi-modal online learning to tractably adapt to unseen food types. For bite transfer, we leverage real-time mouth perception and interaction-aware control. Co-designed with community researchers, our system has been validated through multiple end-user studies.
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
Multiple Ways of Working with Users to Develop Physically Assistive Robots
Authors:
Amal Nanavati,
Max Pascher,
Vinitha Ranganeni,
Ethan K. Gordon,
Taylor Kessler Faulkner,
Siddhartha S. Srinivasa,
Maya Cakmak,
Patrícia Alves-Oliveira,
Jens Gerken
Abstract:
Despite the growth of physically assistive robotics (PAR) research over the last decade, nearly half of PAR user studies do not involve participants with the target disabilities. There are several reasons for this -- recruitment challenges, small sample sizes, and transportation logistics -- all influenced by systemic barriers that people with disabilities face. However, it is well-established tha…
▽ More
Despite the growth of physically assistive robotics (PAR) research over the last decade, nearly half of PAR user studies do not involve participants with the target disabilities. There are several reasons for this -- recruitment challenges, small sample sizes, and transportation logistics -- all influenced by systemic barriers that people with disabilities face. However, it is well-established that working with end-users results in technology that better addresses their needs and integrates with their lived circumstances. In this paper, we reflect on multiple approaches we have taken to working with people with motor impairments across the design, development, and evaluation of three PAR projects: (a) assistive feeding with a robot arm; (b) assistive teleoperation with a mobile manipulator; and (c) shared control with a robot arm. We discuss these approaches to working with users along three dimensions -- individual vs. community-level insight, logistic burden on end-users vs. researchers, and benefit to researchers vs. community -- and share recommendations for how other PAR researchers can incorporate users into their work.
△ Less
Submitted 7 March, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
The SMART approach to instance-optimal online learning
Authors:
Siddhartha Banerjee,
Alankrita Bhatt,
Christina Lee Yu
Abstract:
We devise an online learning algorithm -- titled Switching via Monotone Adapted Regret Traces (SMART) -- that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader (FTL) policy and the worst case guarantee of any other input policy. We show that the regret of the SMART policy on…
▽ More
We devise an online learning algorithm -- titled Switching via Monotone Adapted Regret Traces (SMART) -- that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader (FTL) policy and the worst case guarantee of any other input policy. We show that the regret of the SMART policy on any input sequence is within a multiplicative factor $e/(e-1) \approx 1.58$ of the smaller of: 1) the regret obtained by FTL on the sequence, and 2) the upper bound on regret guaranteed by the given worst-case policy. This implies a strictly stronger guarantee than typical `best-of-both-worlds' bounds as the guarantee holds for every input sequence regardless of how it is generated. SMART is simple to implement as it begins by playing FTL and switches at most once during the time horizon to the worst-case algorithm. Our approach and results follow from an operational reduction of instance optimal online learning to competitive analysis for the ski-rental problem. We complement our competitive ratio upper bounds with a fundamental lower bound showing that over all input sequences, no algorithm can get better than a $1.43$-fraction of the minimum regret achieved by FTL and the minimax-optimal policy. We also present a modification of SMART that combines FTL with a ``small-loss" algorithm to achieve instance optimality between the regret of FTL and the small loss regret bound.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
GEM3D: GEnerative Medial Abstractions for 3D Shape Synthesis
Authors:
Dmitry Petrov,
Pradyumn Goyal,
Vikas Thamizharasan,
Vladimir G. Kim,
Matheus Gadelha,
Melinos Averkiou,
Siddhartha Chaudhuri,
Evangelos Kalogerakis
Abstract:
We introduce GEM3D -- a new deep, topology-aware generative model of 3D shapes. The key ingredient of our method is a neural skeleton-based representation encoding information on both shape topology and geometry. Through a denoising diffusion probabilistic model, our method first generates skeleton-based representations following the Medial Axis Transform (MAT), then generates surfaces through a s…
▽ More
We introduce GEM3D -- a new deep, topology-aware generative model of 3D shapes. The key ingredient of our method is a neural skeleton-based representation encoding information on both shape topology and geometry. Through a denoising diffusion probabilistic model, our method first generates skeleton-based representations following the Medial Axis Transform (MAT), then generates surfaces through a skeleton-driven neural implicit formulation. The neural implicit takes into account the topological and geometric information stored in the generated skeleton representations to yield surfaces that are more topologically and geometrically accurate compared to previous neural field formulations. We discuss applications of our method in shape synthesis and point cloud reconstruction tasks, and evaluate our method both qualitatively and quantitatively. We demonstrate significantly more faithful surface reconstruction and diverse shape generation results compared to the state-of-the-art, also involving challenging scenarios of reconstructing and synthesizing structurally complex, high-genus shape surfaces from Thingi10K and ShapeNet.
△ Less
Submitted 10 April, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Numerical analysis of physics-informed neural networks and related models in physics-informed machine learning
Authors:
Tim De Ryck,
Siddhartha Mishra
Abstract:
Physics-informed neural networks (PINNs) and their variants have been very popular in recent years as algorithms for the numerical simulation of both forward and inverse problems for partial differential equations. This article aims to provide a comprehensive review of currently available results on the numerical analysis of PINNs and related models that constitute the backbone of physics-informed…
▽ More
Physics-informed neural networks (PINNs) and their variants have been very popular in recent years as algorithms for the numerical simulation of both forward and inverse problems for partial differential equations. This article aims to provide a comprehensive review of currently available results on the numerical analysis of PINNs and related models that constitute the backbone of physics-informed machine learning. We provide a unified framework in which analysis of the various components of the error incurred by PINNs in approximating PDEs can be effectively carried out. A detailed review of available results on approximation, generalization and training errors and their behavior with respect to the type of the PDE and the dimension of the underlying domain is presented. In particular, the role of the regularity of the solutions and their stability to perturbations in the error analysis is elucidated. Numerical results are also presented to illustrate the theory. We identify training errors as a key bottleneck which can adversely affect the overall performance of various models in physics-informed machine learning.
△ Less
Submitted 30 January, 2024;
originally announced February 2024.
-
Online Feature Updates Improve Online (Generalized) Label Shift Adaptation
Authors:
Ruihan Wu,
Siddhartha Datta,
Yi Su,
Dheeraj Baby,
Yu-Xiang Wang,
Kilian Q. Weinberger
Abstract:
This paper addresses the prevalent issue of label shift in an online setting with missing labels, where data distributions change over time and obtaining timely labels is challenging. While existing methods primarily focus on adjusting or updating the final layer of a pre-trained classifier, we explore the untapped potential of enhancing feature representations using unlabeled data at test-time. O…
▽ More
This paper addresses the prevalent issue of label shift in an online setting with missing labels, where data distributions change over time and obtaining timely labels is challenging. While existing methods primarily focus on adjusting or updating the final layer of a pre-trained classifier, we explore the untapped potential of enhancing feature representations using unlabeled data at test-time. Our novel method, Online Label Shift adaptation with Online Feature Updates (OLS-OFU), leverages self-supervised learning to refine the feature extraction process, thereby improving the prediction model. Theoretical analyses confirm that OLS-OFU reduces algorithmic regret by capitalizing on self-supervised learning for feature refinement. Empirical studies on various datasets, under both online label shift and generalized label shift conditions, underscore the effectiveness and robustness of OLS-OFU, especially in cases of domain shifts.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
The Matrix: A Bayesian learning model for LLMs
Authors:
Siddhartha Dalal,
Vishal Misra
Abstract:
In this paper, we introduce a Bayesian learning model to understand the behavior of Large Language Models (LLMs). We explore the optimization metric of LLMs, which is based on predicting the next token, and develop a novel model grounded in this principle. Our approach involves constructing an ideal generative text model represented by a multinomial transition probability matrix with a prior, and…
▽ More
In this paper, we introduce a Bayesian learning model to understand the behavior of Large Language Models (LLMs). We explore the optimization metric of LLMs, which is based on predicting the next token, and develop a novel model grounded in this principle. Our approach involves constructing an ideal generative text model represented by a multinomial transition probability matrix with a prior, and we examine how LLMs approximate this matrix. We discuss the continuity of the map** between embeddings and multinomial distributions, and present the Dirichlet approximation theorem to approximate any prior. Additionally, we demonstrate how text generation by LLMs aligns with Bayesian learning principles and delve into the implications for in-context learning, specifically explaining why in-context learning emerges in larger models where prompts are considered as samples to be updated. Our findings indicate that the behavior of LLMs is consistent with Bayesian Learning, offering new insights into their functioning and potential applications.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM
Authors:
Gabriel Ryan,
Siddhartha Jain,
Mingyue Shang,
Shiqi Wang,
Xiaofei Ma,
Murali Krishna Ramanathan,
Baishakhi Ray
Abstract:
Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent works using large language models (LLMs) for test generation have focused on improving generation quality through optimizing the test generation context and correcting errors in model outputs,…
▽ More
Testing plays a pivotal role in ensuring software quality, yet conventional Search Based Software Testing (SBST) methods often struggle with complex software units, achieving suboptimal test coverage. Recent works using large language models (LLMs) for test generation have focused on improving generation quality through optimizing the test generation context and correcting errors in model outputs, but use fixed prompting strategies that prompt the model to generate tests without additional guidance. As a result LLM-generated testsuites still suffer from low coverage. In this paper, we present SymPrompt, a code-aware prompting strategy for LLMs in test generation. SymPrompt's approach is based on recent work that demonstrates LLMs can solve more complex logical problems when prompted to reason about the problem in a multi-step fashion. We apply this methodology to test generation by deconstructing the testsuite generation process into a multi-stage sequence, each of which is driven by a specific prompt aligned with the execution paths of the method under test, and exposing relevant type and dependency focal context to the model. Our approach enables pretrained LLMs to generate more complete test cases without any additional training. We implement SymPrompt using the TreeSitter parsing framework and evaluate on a benchmark challenging methods from open source Python projects. SymPrompt enhances correct test generations by a factor of 5 and bolsters relative coverage by 26% for CodeGen2. Notably, when applied to GPT-4, SymPrompt improves coverage by over 2x compared to baseline prompting strategies.
△ Less
Submitted 2 April, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Carry Your Fault: A Fault Propagation Attack on Side-Channel Protected LWE-based KEM
Authors:
Suparna Kundu,
Siddhartha Chowdhury,
Sayandeep Saha,
Angshuman Karmakar,
Debdeep Mukhopadhyay,
Ingrid Verbauwhede
Abstract:
Post-quantum cryptographic (PQC) algorithms, especially those based on the learning with errors (LWE) problem, have been subjected to several physical attacks in the recent past. Although the attacks broadly belong to two classes - passive side-channel attacks and active fault attacks, the attack strategies vary significantly due to the inherent complexities of such algorithms. Exploring further a…
▽ More
Post-quantum cryptographic (PQC) algorithms, especially those based on the learning with errors (LWE) problem, have been subjected to several physical attacks in the recent past. Although the attacks broadly belong to two classes - passive side-channel attacks and active fault attacks, the attack strategies vary significantly due to the inherent complexities of such algorithms. Exploring further attack surfaces is, therefore, an important step for eventually securing the deployment of these algorithms. Also, it is important to test the robustness of the already proposed countermeasures in this regard. In this work, we propose a new fault attack on side-channel secure masked implementation of LWE-based key-encapsulation mechanisms (KEMs) exploiting fault propagation. The attack typically originates due to an algorithmic modification widely used to enable masking, namely the Arithmetic-to-Boolean (A2B) conversion. We exploit the data dependency of the adder carry chain in A2B and extract sensitive information, albeit masking (of arbitrary order) being present. As a practical demonstration of the exploitability of this information leakage, we show key recovery attacks of Kyber, although the leakage also exists for other schemes like Saber. The attack on Kyber targets the decapsulation module and utilizes Belief Propagation (BP) for key recovery. To the best of our knowledge, it is the first attack exploiting an algorithmic component introduced to ease masking rather than only exploiting the randomness introduced by masking to obtain desired faults (as done by Delvaux). Finally, we performed both simulated and electromagnetic (EM) fault-based practical validation of the attack for an open-source first-order secure Kyber implementation running on an STM32 platform.
△ Less
Submitted 25 January, 2024;
originally announced January 2024.
-
On Pigeonhole Principles and Ramsey in TFNP
Authors:
Siddhartha Jain,
Jiawei Li,
Robert Robere,
Zhiyang Xun
Abstract:
The generalized pigeonhole principle says that if tN + 1 pigeons are put into N holes then there must be a hole containing at least t + 1 pigeons. Let t-PPP denote the class of all total NP-search problems reducible to finding such a t-collision of pigeons. We introduce a new hierarchy of classes defined by the problems t-PPP. In addition to being natural problems in TFNP, we show that classes in…
▽ More
The generalized pigeonhole principle says that if tN + 1 pigeons are put into N holes then there must be a hole containing at least t + 1 pigeons. Let t-PPP denote the class of all total NP-search problems reducible to finding such a t-collision of pigeons. We introduce a new hierarchy of classes defined by the problems t-PPP. In addition to being natural problems in TFNP, we show that classes in and above the hierarchy are related to the notion of multi-collision resistance in cryptography, and contain the problem underlying the breakthrough average-case quantum advantage result shown by Yamakawa & Zhandry (FOCS 2022).
Finally, we give lower bound techniques for the black-box versions of t-PPP for any t. In particular, we prove that RAMSEY is not in t-PPP, for any t that is sub-polynomial in log (N), in the black-box setting. Goldberg and Papadimitriou conjectured that RAMSEY reduces to 2-PPP, we thus refute it and more in the black-box setting. We also provide an ensemble of black-box separations which resolve the relative complexity of the t-PPP classes with other well-known TFNP classes.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
-
Conceptual Mutation Testing for Student Programming Misconceptions
Authors:
Siddhartha Prasad,
Ben Greenman,
Tim Nelson,
Shriram Krishnamurthi
Abstract:
Context: Students often misunderstand programming problem descriptions. This can lead them to solve the wrong problem, which creates frustration, obstructs learning, and imperils grades. Researchers have found that students can be made to better understand the problem by writing examples before they start programming. These examples are checked against correct and wrong implementations -- analogou…
▽ More
Context: Students often misunderstand programming problem descriptions. This can lead them to solve the wrong problem, which creates frustration, obstructs learning, and imperils grades. Researchers have found that students can be made to better understand the problem by writing examples before they start programming. These examples are checked against correct and wrong implementations -- analogous to mutation testing -- provided by course staff. Doing so results in better student understanding of the problem as well as better test suites to accompany the program, both of which are desirable educational outcomes.
Inquiry: Producing mutant implementations requires care. If there are too many, or they are too obscure, students will end up spending a lot of time on an unproductive task and also become frustrated. Instead, we want a small number of mutants that each correspond to common problem misconceptions. This paper presents a workflow with partial automation to produce mutants of this form which, notably, are not those produced by mutation-testing tools.
Approach: We comb through student tests that fail a correct implementation. The student misconceptions are embedded in these failures. We then use methods to semantically cluster these failures. These clusters are then translated into conceptual mutants. These can then be run against student data to determine whether we they are better than prior methods. Some of these processes also enjoy automation.
Knowledge: We find that student misconceptions illustrated by failing tests can be operationalized by the above process. The resulting mutants do much better at identifying student misconceptions.
Grounding: Our findings are grounded in a manual analysis of student examples and a quantitative evaluation of both our clustering techniques and our process for making conceptual mutants. The clustering evaluation compares against a ground truth using standard cluster-correspondence measures, while the mutant evaluation examines how conceptual mutants perform against student data.
Importance: Our work contributes a workflow, with some automation, to reduce the cost and increase the effectiveness of generating conceptually interesting mutants. Such mutants can both improve learning outcomes and reduce student frustration, leading to better educational outcomes. In the process, we also identify a variation of mutation testing not commonly discussed in the software literature.
△ Less
Submitted 28 December, 2023;
originally announced January 2024.
-
Gradient-based Planning with World Models
Authors:
Jyothir S V,
Siddhartha Jalagam,
Yann LeCun,
Vlad Sobal
Abstract:
The enduring challenge in the field of artificial intelligence has been the control of systems to achieve desired behaviours. While for systems governed by straightforward dynamics equations, methods like Linear Quadratic Regulation (LQR) have historically proven highly effective, most real-world tasks, which require a general problem-solver, demand world models with dynamics that cannot be easily…
▽ More
The enduring challenge in the field of artificial intelligence has been the control of systems to achieve desired behaviours. While for systems governed by straightforward dynamics equations, methods like Linear Quadratic Regulation (LQR) have historically proven highly effective, most real-world tasks, which require a general problem-solver, demand world models with dynamics that cannot be easily described by simple equations. Consequently, these models must be learned from data using neural networks. Most model predictive control (MPC) algorithms designed for visual world models have traditionally explored gradient-free population-based optimisation methods, such as Cross Entropy and Model Predictive Path Integral (MPPI) for planning. However, we present an exploration of a gradient-based alternative that fully leverages the differentiability of the world model. In our study, we conduct a comparative analysis between our method and other MPC-based alternatives, as well as policy-based algorithms. In a sample-efficient setting, our method achieves on par or superior performance compared to the alternative approaches in most tasks. Additionally, we introduce a hybrid model that combines policy networks and gradient-based MPC, which outperforms pure policy based methods thereby holding promise for Gradient-based planning with world models in complex real-world tasks.
△ Less
Submitted 28 December, 2023;
originally announced December 2023.
-
Prompt Expansion for Adaptive Text-to-Image Generation
Authors:
Siddhartha Datta,
Alexander Ku,
Deepak Ramachandran,
Peter Anderson
Abstract:
Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts that are optimized such t…
▽ More
Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts that are optimized such that when passed to a text-to-image model, generates a wider variety of appealing images. We conduct a human evaluation study that shows that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.
△ Less
Submitted 27 December, 2023;
originally announced December 2023.
-
A universal approximation theorem for nonlinear resistive networks
Authors:
Benjamin Scellier,
Siddhartha Mishra
Abstract:
Resistor networks have recently had a surge of interest as substrates for energy-efficient self-learning machines. This work studies the computational capabilities of these resistor networks. We show that electrical networks composed of voltage sources, linear resistors, diodes and voltage-controlled voltage sources (VCVS) can implement any continuous functions. To prove it, we assume that the cir…
▽ More
Resistor networks have recently had a surge of interest as substrates for energy-efficient self-learning machines. This work studies the computational capabilities of these resistor networks. We show that electrical networks composed of voltage sources, linear resistors, diodes and voltage-controlled voltage sources (VCVS) can implement any continuous functions. To prove it, we assume that the circuit elements are ideal and that the conductances of variable resistors and the amplification factors of the VCVS's can take arbitrary values -- arbitrarily small or arbitrarily large. The constructive nature of our proof could also inform the design of such self-learning electrical networks.
△ Less
Submitted 22 December, 2023;
originally announced December 2023.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…
▽ More
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
△ Less
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Conceptualizing Suicidal Behavior: Utilizing Explanations of Predicted Outcomes to Analyze Longitudinal Social Media Data
Authors:
Van Minh Nguyen,
Nasheen Nur,
William Stern,
Thomas Mercer,
Chiradeep Sen,
Siddhartha Bhattacharyya,
Victor Tumbiolo,
Seng Jhing Goh
Abstract:
The COVID-19 pandemic has escalated mental health crises worldwide, with social isolation and economic instability contributing to a rise in suicidal behavior. Suicide can result from social factors such as shame, abuse, abandonment, and mental health conditions like depression, Post-Traumatic Stress Disorder (PTSD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety disorders, and bipolar d…
▽ More
The COVID-19 pandemic has escalated mental health crises worldwide, with social isolation and economic instability contributing to a rise in suicidal behavior. Suicide can result from social factors such as shame, abuse, abandonment, and mental health conditions like depression, Post-Traumatic Stress Disorder (PTSD), Attention-Deficit/Hyperactivity Disorder (ADHD), anxiety disorders, and bipolar disorders. As these conditions develop, signs of suicidal ideation may manifest in social media interactions. Analyzing social media data using artificial intelligence (AI) techniques can help identify patterns of suicidal behavior, providing invaluable insights for suicide prevention agencies, professionals, and broader community awareness initiatives. Machine learning algorithms for this purpose require large volumes of accurately labeled data. Previous research has not fully explored the potential of incorporating explanations in analyzing and labeling longitudinal social media data. In this study, we employed a model explanation method, Layer Integrated Gradients, on top of a fine-tuned state-of-the-art language model, to assign each token from Reddit users' posts an attribution score for predicting suicidal ideation. By extracting and analyzing attributions of tokens from the data, we propose a methodology for preliminary screening of social media posts for suicidal ideation without using large language models during inference.
△ Less
Submitted 30 December, 2023; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Clinical Risk Prediction Using Language Models: Benefits And Considerations
Authors:
Angeela Acharya,
Sulabh Shrestha,
Anyi Chen,
Joseph Conte,
Sanja Avramovic,
Siddhartha Sikdar,
Antonios Anastasopoulos,
Sanmay Das
Abstract:
The utilization of Electronic Health Records (EHRs) for clinical risk prediction is on the rise. However, strict privacy regulations limit access to comprehensive health records, making it challenging to apply standard machine learning algorithms in practical real-world scenarios. Previous research has addressed this data limitation by incorporating medical ontologies and employing transfer learni…
▽ More
The utilization of Electronic Health Records (EHRs) for clinical risk prediction is on the rise. However, strict privacy regulations limit access to comprehensive health records, making it challenging to apply standard machine learning algorithms in practical real-world scenarios. Previous research has addressed this data limitation by incorporating medical ontologies and employing transfer learning methods. In this study, we investigate the potential of leveraging language models (LMs) as a means to incorporate supplementary domain knowledge for improving the performance of various EHR-based risk prediction tasks. Unlike applying LMs to unstructured EHR data such as clinical notes, this study focuses on using textual descriptions within structured EHR to make predictions exclusively based on that information. We extensively compare against previous approaches across various data types and sizes. We find that employing LMs to represent structured EHRs, such as diagnostic histories, leads to improved or at least comparable performance in diverse risk prediction tasks. Furthermore, LM-based approaches offer numerous advantages, including few-shot learning, the capability to handle previously unseen medical concepts, and adaptability to various medical vocabularies. Nevertheless, we underscore, through various experiments, the importance of being cautious when employing such models, as concerns regarding the reliability of LMs persist.
△ Less
Submitted 28 November, 2023;
originally announced December 2023.
-
Improved Algorithms for Minimum-Membership Geometric Set Cover
Authors:
Sathish Govindarajan,
Siddhartha Sarkar
Abstract:
Bandyapadhyay et al. introduced the generalized minimum-membership geometric set cover (GMMGSC) problem [SoCG, 2023], which is defined as follows. We are given two sets $P$ and $P'$ of points in $\mathbb{R}^{2}$, $n=\max(|P|, |P'|)$, and a set $\mathcal{S}$ of $m$ axis-parallel unit squares. The goal is to find a subset $\mathcal{S}^{*}\subseteq \mathcal{S}$ that covers all the points in $P$ while…
▽ More
Bandyapadhyay et al. introduced the generalized minimum-membership geometric set cover (GMMGSC) problem [SoCG, 2023], which is defined as follows. We are given two sets $P$ and $P'$ of points in $\mathbb{R}^{2}$, $n=\max(|P|, |P'|)$, and a set $\mathcal{S}$ of $m$ axis-parallel unit squares. The goal is to find a subset $\mathcal{S}^{*}\subseteq \mathcal{S}$ that covers all the points in $P$ while minimizing $\mathsf{memb}(P', \mathcal{S}^{*})$, where $\mathsf{memb}(P', \mathcal{S}^{*})=\max_{p\in P'}|\{s\in \mathcal{S}^{*}: p\in s\}|$. We study GMMGSC problem and give a $16$-approximation algorithm that runs in $O(m^2\log m + m^2n)$ time. Our result is a significant improvement to the $144$-approximation given by Bandyapadhyay et al. that runs in $\tilde{O}(nm)$ time.
GMMGSC problem is a generalization of another well-studied problem called Minimum Ply Geometric Set Cover (MPGSC), in which the goal is to minimize the ply of $\mathcal{S}^{*}$, where the ply is the maximum cardinality of a subset of the unit squares that have a non-empty intersection. The best-known result for the MPGSC problem is an $8$-approximation algorithm by Durocher et al. that runs in $O(n + m^{8}k^{4}\log k + m^{8}\log m\log k)$ time, where $k$ is the optimal ply value [WALCOM, 2023].
△ Less
Submitted 5 December, 2023;
originally announced December 2023.
-
Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
Authors:
Lukas Schäfer,
Logan Jones,
Anssi Kanervisto,
Yuhan Cao,
Tabish Rashid,
Raluca Georgescu,
Dave Bignell,
Siddhartha Sen,
Andrea Treviño Gavito,
Sam Devlin
Abstract:
Video games have served as useful benchmarks for the decision making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community. Recent progress in the research, development and open release of large vision models has the potential to amortize some of these costs across the community. However, it…
▽ More
Video games have served as useful benchmarks for the decision making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community. Recent progress in the research, development and open release of large vision models has the potential to amortize some of these costs across the community. However, it is currently unclear which of these models have learnt representations that retain information critical for sequential decision making. Towards enabling wider participation in the research of gameplaying agents in modern games, we present a systematic study of imitation learning with publicly available visual encoders compared to the typical, task-specific, end-to-end training approach in Minecraft, Minecraft Dungeons and Counter-Strike: Global Offensive.
△ Less
Submitted 4 December, 2023;
originally announced December 2023.
-
HOUND: An Open-Source, Low-cost Research Platform for High-speed Off-road Underactuated Nonholonomic Driving
Authors:
Sidharth Talia,
Matt Schmittle,
Alexander Lambert,
Alexander Spitzer,
Christoforos Mavrogiannis,
Siddhartha S. Srinivasa
Abstract:
Off-road vehicles are susceptible to rollovers in terrains with large elevation features, such as steep hills, ditches, and berms. One way to protect them against rollovers is ruggedization through the use of industrial-grade parts and physical modifications. However, this solution can be prohibitively expensive for academic research labs. Our key insight is that a software-based rollover-preventi…
▽ More
Off-road vehicles are susceptible to rollovers in terrains with large elevation features, such as steep hills, ditches, and berms. One way to protect them against rollovers is ruggedization through the use of industrial-grade parts and physical modifications. However, this solution can be prohibitively expensive for academic research labs. Our key insight is that a software-based rollover-prevention system (RPS) enables the use of commercial-off-the-shelf hardware parts that are cheaper than their industrial counterparts, thus reducing overall cost. In this paper, we present HOUND, a small-scale, inexpensive, off-road autonomy platform that can handle challenging outdoor terrains at high speeds through the integration of an RPS. HOUND is integrated with a complete stack for perception and control, geared towards aggressive offroad driving. We deploy HOUND in the real world, at high speeds, on four different terrains covering 50 km of driving and highlight its utility in preventing rollovers and traversing difficult terrain. Additionally, through integration with BeamNG, a state-of-the-art driving simulator, we demonstrate a significant reduction in rollovers without compromising turning ability across a series of simulated experiments. Supplementary material can be found on our website, where we will also release all design documents for the platform: https://sites.google.com/view/prl-hound .
△ Less
Submitted 18 November, 2023;
originally announced November 2023.
-
Instruction-Following Evaluation for Large Language Models
Authors:
Jeffrey Zhou,
Tianjian Lu,
Swaroop Mishra,
Siddhartha Brahma,
Sujoy Basu,
Yi Luan,
Denny Zhou,
Le Hou
Abstract:
One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval…
▽ More
One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval (IFEval) for large language models. IFEval is a straightforward and easy-to-reproduce evaluation benchmark. It focuses on a set of "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times". We identified 25 types of those verifiable instructions and constructed around 500 prompts, with each prompt containing one or more verifiable instructions. We show evaluation results of two widely available LLMs on the market. Our code and data can be found at https://github.com/google-research/google-research/tree/master/instruction_following_eval
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning
Authors:
Liyiming Ke,
Yunchu Zhang,
Abhay Deshpande,
Siddhartha Srinivasa,
Abhishek Gupta
Abstract:
We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage…
▽ More
We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage local continuity in the environment dynamics to generate corrective labels. Our method first constructs a dynamics model from the expert demonstration, encouraging local Lipschitz continuity in the learned model. In locally continuous regions, this model allows us to generate corrective labels within the neighborhood of the demonstrations but beyond the actual set of states and actions in the dataset. Training on this augmented data enhances the agent's ability to recover from perturbations and deal with compounding errors. We demonstrate the effectiveness of our generated labels through experiments in a variety of robotics domains in simulation that have distinct forms of continuity and discontinuity, including classic control problems, drone flying, navigation with high-dimensional sensor observations, legged locomotion, and tabletop manipulation.
△ Less
Submitted 3 June, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Conformal Drug Property Prediction with Density Estimation under Covariate Shift
Authors:
Siddhartha Laghuvarapu,
Zhen Lin,
Jimeng Sun
Abstract:
In drug discovery, it is vital to confirm the predictions of pharmaceutical properties from computational models using costly wet-lab experiments. Hence, obtaining reliable uncertainty estimates is crucial for prioritizing drug molecules for subsequent experimental validation. Conformal Prediction (CP) is a promising tool for creating such prediction sets for molecular properties with a coverage g…
▽ More
In drug discovery, it is vital to confirm the predictions of pharmaceutical properties from computational models using costly wet-lab experiments. Hence, obtaining reliable uncertainty estimates is crucial for prioritizing drug molecules for subsequent experimental validation. Conformal Prediction (CP) is a promising tool for creating such prediction sets for molecular properties with a coverage guarantee. However, the exchangeability assumption of CP is often challenged with covariate shift in drug discovery tasks: Most datasets contain limited labeled data, which may not be representative of the vast chemical space from which molecules are drawn. To address this limitation, we propose a method called CoDrug that employs an energy-based model leveraging both training data and unlabelled data, and Kernel Density Estimation (KDE) to assess the densities of a molecule set. The estimated densities are then used to weigh the molecule samples while building prediction sets and rectifying for distribution shift. In extensive experiments involving realistic distribution drifts in various small-molecule drug discovery tasks, we demonstrate the ability of CoDrug to provide valid prediction sets and its utility in addressing the distribution shift arising from de novo drug design models. On average, using CoDrug can reduce the coverage gap by over 35% when compared to conformal prediction sets not adjusted for covariate shift.
△ Less
Submitted 18 October, 2023;
originally announced October 2023.
-
Online Resource Sharing via Dynamic Max-Min Fairness: Efficiency, Robustness and Non-Stationarity
Authors:
Giannis Fikioris,
Siddhartha Banerjee,
Éva Tardos
Abstract:
We study the allocation of shared resources over multiple rounds among competing agents, via a dynamic max-min fair (DMMF) mechanism: the good in each round is allocated to the requesting agent with the least number of allocations received to date. Previous work has shown that when an agent has i.i.d. values across rounds, then in the worst case, she can never get more than a constant strictly les…
▽ More
We study the allocation of shared resources over multiple rounds among competing agents, via a dynamic max-min fair (DMMF) mechanism: the good in each round is allocated to the requesting agent with the least number of allocations received to date. Previous work has shown that when an agent has i.i.d. values across rounds, then in the worst case, she can never get more than a constant strictly less than $1$ fraction of her ideal utility -- her highest achievable utility given her nominal share of resources. Moreover, an agent can achieve at least half her utility under carefully designed `pseudo-market' mechanisms, even though other agents may act in an arbitrary (possibly adversarial and collusive) manner.
We show that this robustness guarantee also holds under the much simpler DMMF mechanism. More significantly, under mild assumptions on the value distribution, we show that DMMF in fact allows each agent to realize a $1 - o(1)$ fraction of her ideal utility, despite arbitrary behavior by other agents. We achieve this by characterizing the utility achieved under a richer space of strategies, wherein an agent can tune how aggressive to be in requesting the item. Our new strategies also allow us to handle settings where an agent's values are correlated across rounds, thereby allowing an adversary to predict and block her future values. We prove that again by tuning one's aggressiveness, an agent can guarantee $Ω(γ)$ fraction of her ideal utility, where $γ\in [0, 1]$ is a parameter that quantifies dependence across rounds (with $γ= 1$ indicating full independence and lower values indicating more correlation). Finally, we extend our efficiency results to the case of reusable resources, where an agent might need to hold the item over multiple rounds to receive utility.
△ Less
Submitted 13 February, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
On the Rational Degree of Boolean Functions and Applications
Authors:
Vishnu Iyer,
Siddhartha Jain,
Matt Kovacs-Deak,
Vinayak M. Kumar,
Luke Schaeffer,
Daochen Wang,
Michael Whitmeyer
Abstract:
We study a natural complexity measure of Boolean functions known as the (exact) rational degree. For total functions $f$, it is conjectured that $\mathrm{rdeg}(f)$ is polynomially related to $\mathrm{deg}(f)$, where $\mathrm{deg}(f)$ is the Fourier degree. Towards this conjecture, we show that symmetric functions have rational degree at least $\mathrm{deg}(f)/2$ and monotone functions have rationa…
▽ More
We study a natural complexity measure of Boolean functions known as the (exact) rational degree. For total functions $f$, it is conjectured that $\mathrm{rdeg}(f)$ is polynomially related to $\mathrm{deg}(f)$, where $\mathrm{deg}(f)$ is the Fourier degree. Towards this conjecture, we show that symmetric functions have rational degree at least $\mathrm{deg}(f)/2$ and monotone functions have rational degree at least $\sqrt{\mathrm{deg}(f)}$. We observe that both of these lower bounds are tight. In addition, we show that all read-once depth-$d$ Boolean formulae have rational degree at least $Ω(\mathrm{deg}(f)^{1/d})$. Furthermore, we show that almost every Boolean function on $n$ variables has rational degree at least $n/2 - O(\sqrt{n})$.
In contrast to total functions, we exhibit partial functions that witness unbounded separations between rational and approximate degree, in both directions. As a consequence, we show that for quantum computers, post-selection and bounded-error are incomparable resources in the black-box model.
△ Less
Submitted 11 October, 2023;
originally announced October 2023.
-
Explorable Mesh Deformation Subspaces from Unstructured Generative Models
Authors:
Arman Maesumi,
Paul Guerrero,
Vladimir G. Kim,
Matthew Fisher,
Siddhartha Chaudhuri,
Noam Aigerman,
Daniel Ritchie
Abstract:
Exploring variations of 3D shapes is a time-consuming process in traditional 3D modeling tools. Deep generative models of 3D shapes often feature continuous latent spaces that can, in principle, be used to explore potential variations starting from a set of input shapes. In practice, doing so can be problematic: latent spaces are high dimensional and hard to visualize, contain shapes that are not…
▽ More
Exploring variations of 3D shapes is a time-consuming process in traditional 3D modeling tools. Deep generative models of 3D shapes often feature continuous latent spaces that can, in principle, be used to explore potential variations starting from a set of input shapes. In practice, doing so can be problematic: latent spaces are high dimensional and hard to visualize, contain shapes that are not relevant to the input shapes, and linear paths through them often lead to sub-optimal shape transitions. Furthermore, one would ideally be able to explore variations in the original high-quality meshes used to train the generative model, not its lower-quality output geometry. In this paper, we present a method to explore variations among a given set of landmark shapes by constructing a map** from an easily-navigable 2D exploration space to a subspace of a pre-trained generative model. We first describe how to find a map** that spans the set of input landmark shapes and exhibits smooth variations between them. We then show how to turn the variations in this subspace into deformation fields, to transfer those variations to high-quality meshes for the landmark shapes. Our results show that our method can produce visually-pleasing and easily-navigable 2D exploration spaces for several different shape categories, especially as compared to prior work on learning deformation spaces for 3D shapes.
△ Less
Submitted 11 October, 2023;
originally announced October 2023.
-
NEWTON: Are Large Language Models Capable of Physical Reasoning?
Authors:
Yi Ru Wang,
Jiafei Duan,
Dieter Fox,
Siddhartha Srinivasa
Abstract:
Large Language Models (LLMs), through their contextualized representations, have been empirically proven to encapsulate syntactic, semantic, word sense, and common-sense knowledge. However, there has been limited exploration of their physical reasoning abilities, specifically concerning the crucial attributes for comprehending everyday objects. To address this gap, we introduce NEWTON, a repositor…
▽ More
Large Language Models (LLMs), through their contextualized representations, have been empirically proven to encapsulate syntactic, semantic, word sense, and common-sense knowledge. However, there has been limited exploration of their physical reasoning abilities, specifically concerning the crucial attributes for comprehending everyday objects. To address this gap, we introduce NEWTON, a repository and benchmark for evaluating the physics reasoning skills of LLMs. Further, to enable domain-specific adaptation of this benchmark, we present a pipeline to enable researchers to generate a variant of this benchmark that has been customized to the objects and attributes relevant for their application. The NEWTON repository comprises a collection of 2800 object-attribute pairs, providing the foundation for generating infinite-scale assessment templates. The NEWTON benchmark consists of 160K QA questions, curated using the NEWTON repository to investigate the physical reasoning capabilities of several mainstream language models across foundational, explicit, and implicit reasoning tasks. Through extensive empirical analysis, our results highlight the capabilities of LLMs for physical reasoning. We find that LLMs like GPT-4 demonstrate strong reasoning capabilities in scenario-based tasks but exhibit less consistency in object-attribute reasoning compared to humans (50% vs. 84%). Furthermore, the NEWTON platform demonstrates its potential for evaluating and enhancing language models, paving the way for their integration into physically grounded settings, such as robotic manipulation. Project site: https://newtonreasoning.github.io
△ Less
Submitted 10 October, 2023;
originally announced October 2023.
-
An operator preconditioning perspective on training in physics-informed machine learning
Authors:
Tim De Ryck,
Florent Bonnet,
Siddhartha Mishra,
Emmanuel de Bézenac
Abstract:
In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitia…
▽ More
In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, it results in slow or infeasible training. Therefore, preconditioning this operator is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator, and consequently improve training.
△ Less
Submitted 3 May, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Molecular De Novo Design through Transformer-based Reinforcement Learning
Authors:
Pengcheng Xu,
Tao Feng,
Tianfan Fu,
Siddhartha Laghuvarapu,
Jimeng Sun
Abstract:
In this work, we introduce a method to fine-tune a Transformer-based generative model for molecular de novo design. Leveraging the superior sequence learning capacity of Transformers over Recurrent Neural Networks (RNNs), our model can generate molecular structures with desired properties effectively. In contrast to the traditional RNN-based models, our proposed method exhibits superior performanc…
▽ More
In this work, we introduce a method to fine-tune a Transformer-based generative model for molecular de novo design. Leveraging the superior sequence learning capacity of Transformers over Recurrent Neural Networks (RNNs), our model can generate molecular structures with desired properties effectively. In contrast to the traditional RNN-based models, our proposed method exhibits superior performance in generating compounds predicted to be active against various biological targets, capturing long-term dependencies in the molecular structure sequence. The model's efficacy is demonstrated across numerous tasks, including generating analogues to a query structure and producing compounds with particular attributes, outperforming the baseline RNN-based methods. Our approach can be used for scaffold hop**, library expansion starting from a single molecule, and generating compounds with high predicted activity against biological targets.
△ Less
Submitted 8 March, 2024; v1 submitted 8 October, 2023;
originally announced October 2023.
-
SLM: Bridge the thin gap between speech and text foundation models
Authors:
Mingqiu Wang,
Wei Han,
Izhak Shafran,
Zelin Wu,
Chung-Cheng Chiu,
Yuan Cao,
Yongqiang Wang,
Nanxin Chen,
Yu Zhang,
Hagen Soltau,
Paul Rubenstein,
Lukas Zilka,
Dian Yu,
Zhong Meng,
Golan Pundak,
Nikhil Siddhartha,
Johan Schalkwyk,
Yonghui Wu
Abstract:
We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserves their capabilities, and only trains a simple adapter with just 1\% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achiev…
▽ More
We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserves their capabilities, and only trains a simple adapter with just 1\% (156M) of the foundation models' parameters. This adaptation not only leads SLM to achieve strong performance on conventional tasks such as speech recognition (ASR) and speech translation (AST), but also introduces the novel capability of zero-shot instruction-following for more diverse tasks: given a speech input and a text instruction, SLM is able to perform unseen generation tasks including contextual biasing ASR using real-time context, dialog generation, speech continuation, and question answering, etc. Our approach demonstrates that the representational gap between pretrained speech and language models might be narrower than one would expect, and can be bridged by a simple adaptation mechanism. As a result, SLM is not only efficient to train, but also inherits strong capabilities already acquired in foundation models of different modalities.
△ Less
Submitted 29 September, 2023;
originally announced October 2023.
-
Efficient tensor network simulation of IBM's largest quantum processors
Authors:
Siddhartha Patra,
Saeed S. Jahromi,
Sukhbinder Singh,
Roman Orus
Abstract:
We show how quantum-inspired 2d tensor networks can be used to efficiently and accurately simulate the largest quantum processors from IBM, namely Eagle (127 qubits), Osprey (433 qubits) and Condor (1121 qubits). We simulate the dynamics of a complex quantum many-body system -- specifically, the kicked Ising experiment considered recently by IBM in Nature 618, p. 500-505 (2023) -- using graph-base…
▽ More
We show how quantum-inspired 2d tensor networks can be used to efficiently and accurately simulate the largest quantum processors from IBM, namely Eagle (127 qubits), Osprey (433 qubits) and Condor (1121 qubits). We simulate the dynamics of a complex quantum many-body system -- specifically, the kicked Ising experiment considered recently by IBM in Nature 618, p. 500-505 (2023) -- using graph-based Projected Entangled Pair States (gPEPS), which was proposed by some of us in PRB 99, 195105 (2019). Our results show that simple tensor updates are already sufficient to achieve very large unprecedented accuracy with remarkably low computational resources for this model. Apart from simulating the original experiment for 127 qubits, we also extend our results to 433 and 1121 qubits, and for evolution times around 8 times longer, thus setting a benchmark for the newest IBM quantum machines. We also report accurate simulations for infinitely-many qubits. Our results show that gPEPS are a natural tool to efficiently simulate quantum computers with an underlying lattice-based qubit connectivity, such as all quantum processors based on superconducting qubits.
△ Less
Submitted 2 April, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.